Archive for January, 2017

API design tweets

Monday, January 30th, 2017

Posted a bunch of API design related tweets last year. Recapturing them here:

Does it make your life or the user’s life easier? Good API design puts the user first. As a lib writer the burden is on you, not on them.

The main trick of good API design is simply to use it yourself in a real-world scenario - a.k.a. eat your own dogfood. Use your own product.

API design: consider offering both a low-level API (harder to use, but no overhead) and a high-level API (more user-friendly, maybe slower).

The trick here is to provide the HL version as source code that uses the LL version. Also serves as example / sample.

API: in any case, don’t break people’s code with your new version. Don’t change the API without a good reason.

If you do, provide old functions (marked as deprecated) as wrappers that re-route to the new functions. Do. Not. Break. Your. Users’. Code.

API: have a dummy CPP file that calls all API functions (without purpose). Ensure it keeps compiling. If it breaks, you did something wrong.
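For instance, the deprecated-wrapper pattern from the tip above can be sketched like this (hypothetical function names, not from any real library):

```cpp
#include <cmath>

// Portable "deprecated" marker.
#if defined(_MSC_VER)
    #define LIB_DEPRECATED __declspec(deprecated)
#else
    #define LIB_DEPRECATED __attribute__((deprecated))
#endif

// v2 of the API renamed this function...
inline float SphereVolume(float radius)
{
    return (4.0f / 3.0f) * 3.14159265f * radius * radius * radius;
}

// ...but the old v1 entry point survives as a deprecated wrapper that
// re-routes to the new one. Old user code keeps compiling and working,
// and gets a compile-time nudge to migrate.
LIB_DEPRECATED inline float ComputeSphereVolume(float radius)
{
    return SphereVolume(radius);
}
```

The dummy CPP file from the last tip would then call both functions: the old one still compiles (with a deprecation warning), so no user code is broken.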

PhysX: bulldozer test (PhysX 3.4, CPU mode)

Tuesday, January 24th, 2017

Another quick test in PEEL 1.1.

PhysX tip: aggregates and MBP

Monday, January 23rd, 2017

This video shows 200 kinematic characters running around. There is no “physics” per se in this scene - zero dynamic objects, everything is kinematic and controlled by the user’s code. But the scene still takes a lot of time in PhysX, because all these shapes have to be updated in the broadphase structure.

This is a worst-case scenario for the default broadphase (SAP): all objects move all the time, and they are all located at the same altitude. As a result, the projections of the objects’ AABBs overlap a lot on the Y axis, which creates a lot of “swaps” in the structure, and this takes a lot of time to update.

Of course this is an artificial scene, but it shows problems that do happen in real-world scenarios, in particular if we add all the extra bounds from the static environment. This is not shown in the video but there are other scenes in the combo box to test this case as well. The mockup static level looks like this:

So how do we make things run faster here? There are two main ways.

The first tip is to use “aggregates”. An aggregate is a collection of actors grouped together to form a single entry in the broadphase. In PhysX you should already be familiar with compound actors that group together multiple shapes within a single actor. It is the same idea: you can group together multiple actors within a single aggregate.

A typical use-case is a ragdoll / character, as shown in this video. In this example each character has 19 body parts, i.e. 19 actors. By default each of these actors has its own broadphase entry. The body parts and their AABBs overlap each other quite a lot all the time, and this puts a lot of stress on the SAP broadphase. But if you put each character in its own aggregate, it suddenly creates 19 times fewer entries in the broadphase, and an overlap is only registered when two characters touch each other - i.e. when the white compound bounds shown in the video at 0:27 overlap each other.

If self-collisions or character-vs-character collisions are needed, additional tests are performed after the broadphase to take care of those. The code becomes more complex, since there is now a two-level hierarchy in the broadphase module, but the results are faster overall than putting everything naïvely in the broadphase. Most notably, when self-collisions within an aggregate are not needed, the filtering is done by testing a single bit (at aggregate level) instead of doing this for each overlapping pair within the aggregate. Generally speaking it is always a good idea to use aggregates, provided you don’t put thousands of actors in each of them.
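In PhysX 3.x code, putting a ragdoll into its own aggregate only takes a few lines. A minimal sketch (assuming `physics`, `scene` and the 19 `bodyParts` actors already exist; error checking omitted):

```cpp
// One broadphase entry for the whole character instead of 19.
// Self-collisions disabled: internal pairs are then filtered out with a
// single bit test at aggregate level.
const PxU32 maxActors = 19;
const bool enableSelfCollision = false;
PxAggregate* aggregate = physics->createAggregate(maxActors, enableSelfCollision);

for(PxU32 i = 0; i < 19; i++)
    aggregate->addActor(*bodyParts[i]);

scene->addAggregate(*aggregate);
```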

The second tip is to consider using “MBP” (for Multi Box Pruning). This is an alternative broadphase implementation that does not suffer from the same pitfalls as SAP, and it tends to be faster when a lot of objects are moving at the same time. On the other hand it is usually slower than SAP when few objects are moving, i.e. when the majority of the scene is sleeping. This implementation is based on my old box pruning code (but rewritten and much much faster), borrows ideas from the “multi-SAP” approach I described here, and then adds an additional layer of code to take care of sleeping objects. I wrote about it before and showed a demo of it, in a post where it was called “broadphase X”. Well, now you know, and you can grab the code on GitHub.

MBP currently works with user-defined regions, i.e. it is more tedious to setup than SAP - and that’s one main reason why it is not enabled by default. PEEL simply takes the scene’s global bounds and divides them into grid cells, which is usually a good default setup. A real game could do something more advanced, but it is not always needed. As you can see in the video, simply using a few default MBP grid cells has a large impact on performance: in this scene it is pretty much the same performance gain as using aggregates.
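Enabling MBP and creating default grid regions, similar to what PEEL does, can be sketched like this (PhysX 3.4; `sceneDesc`, `scene` and the expected world bounds are assumptions you must provide):

```cpp
// Select MBP instead of the default SAP broadphase.
sceneDesc.broadPhaseType = PxBroadPhaseType::eMBP;
// ... create the scene from sceneDesc ...

// Divide the expected global bounds into a 4x4 grid of broadphase regions.
PxBounds3 globalBounds(PxVec3(-1000.0f), PxVec3(1000.0f));
PxBroadPhaseRegion regions[16];
const PxU32 nbRegions = PxBroadPhaseExt::createRegionsFromWorldBounds(
    regions, globalBounds, 4, 1 /* up axis = Y */);
for(PxU32 i = 0; i < nbRegions; i++)
    scene->addBroadPhaseRegion(regions[i]);
```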

Then one can of course use both aggregates and MBP. But it does not always help. In this particular test case for example (200 kinematic characters alone, no mockup static level), combining both does not lead to additional gains compared to simply using one. The performance with the various options looks like this:

YMMV, and things will depend a lot on the scene’s configuration, the percentage of objects moving at any given time compared to the whole scene, etc.

In any case this post was just to introduce two options you can consider when broadphase performance becomes an issue: aggregates and MBP. Both of them have been used and shipped in AAA games, on PC as well as consoles like the PS4. They are viable options that you could experiment with.

PhysX tip: use the new Opcode2-based midphase structure

Friday, January 20th, 2017

PhysX 3.4 has a new mesh structure based on Opcode 2, which is used for the “midphase” queries (i.e. any collision query against a triangle mesh).

It is not enabled by default because it currently has some limitations compared to the previous midphase structure:

  • it does not support deformable meshes (i.e. it does not support PxTriangleMesh::getVerticesForModification() and PxTriangleMesh::refitBVH())
  • it is not implemented on all platforms. It is currently only available on platforms for which PX_INTEL_FAMILY is defined (that includes PCs but also consoles like the Xbox One and PS4).

To enable it, look up the comments for PxCookingParams::midphaseDesc and the PxMidphaseDesc class. Or check the PhysX manual. Or the PEEL code. Basically it will come down to simply adding this line to the cooking params before passing them to PxCreateCooking:

PxCookingParams Params(scale);	// “scale” being your PxTolerancesScale
Params.midphaseDesc = PxMeshMidPhase::eBVH34;

Overall, the new structure should be faster than the previous one. It is also much faster to build, so if for some reason you must cook triangle meshes at runtime, this new structure should help. Memory usage should be roughly the same as before.

You can see the results in PEEL 1.1. The new midphase structure is selected by default in PEEL, but you can go back to the old one using the “cooking” tab in the PhysX 3.4 plugin’s UI. Note that Opcode 2 used as a standalone library still provides significantly faster results (you can see that in PEEL as well), because PhysX has a larger per-query overhead for management, filtering, etc.

The midphase structure is also used in rigid body simulation, for dynamic objects colliding against triangle meshes (to fetch candidate triangles). So switching to the new structure might also give you performance gains there, even if you are not using scene queries.

PhysX tip: make sure debug visualization is really disabled

Friday, January 20th, 2017

In PhysX, debug visualization is enabled or disabled by this call:

PxScene::setVisualizationParameter(PxVisualizationParameter::eSCALE, Value);

With Value = 0.0 to disable it, and usually Value = 1.0 to enable it (but any non-zero value will enable it, and then be used as a scale factor for normals, etc).

Simply setting PxVisualizationParameter::eSCALE to 1.0 does not render anything on its own. Users have to enable additional flags to tell the system which debug gizmos they want to see. For example here is the list I use in PEEL (but there’s more in the SDK):

  • PxVisualizationParameter::eSCALE,
  • PxVisualizationParameter::eBODY_AXES,
  • PxVisualizationParameter::eBODY_MASS_AXES,
  • PxVisualizationParameter::eBODY_LIN_VELOCITY,
  • PxVisualizationParameter::eBODY_ANG_VELOCITY,
  • PxVisualizationParameter::eCONTACT_POINT,
  • PxVisualizationParameter::eCONTACT_NORMAL,
  • PxVisualizationParameter::eACTOR_AXES,
  • PxVisualizationParameter::eCOLLISION_AABBS,
  • PxVisualizationParameter::eCOLLISION_SHAPES,
  • PxVisualizationParameter::eCOLLISION_AXES,
  • PxVisualizationParameter::eCOLLISION_COMPOUNDS,
  • PxVisualizationParameter::eCOLLISION_FNORMALS,
  • PxVisualizationParameter::eCOLLISION_EDGES,
  • PxVisualizationParameter::eCOLLISION_STATIC,
  • PxVisualizationParameter::eCOLLISION_DYNAMIC,
  • PxVisualizationParameter::eJOINT_LOCAL_FRAMES,
  • PxVisualizationParameter::eJOINT_LIMITS,
  • PxVisualizationParameter::eMBP_REGIONS,

Now there is a trap here.

The trap is that if you just set PxVisualizationParameter::eSCALE to 1.0 alone, nothing gets rendered but it still has a performance impact. And it can be invisible to users, until they use a profiler.
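So before shipping, make sure the scale itself is zeroed, not just the individual flags. A one-line sketch (assuming a `PxScene* scene`):

```cpp
// Disabling the individual flags is not enough: the scale itself must be
// zero, otherwise the SDK still does debug-viz work every frame.
scene->setVisualizationParameter(PxVisualizationParameter::eSCALE, 0.0f);
```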

For example in PEEL, it means that this particular UI config has a clear negative impact on performance:

Debug viz options for PhysX in PEEL

And here’s the effect on performance in “ConvexGalore2”:

As you can see this is quite significant. The drop on the blue curve corresponds to the moment I unchecked “Enable debug visualization” in the UI. As soon as I did, performance went back to normal.

So, be aware of this, and make sure you don’t ship your game with PxVisualizationParameter::eSCALE still set to 1.0.

PhysX tip: use tight convex bounds

Thursday, January 19th, 2017

There is a new flag in PhysX 3.4 that tells the system to use tight bounds for convexes: PxConvexMeshGeometryFlag::eTIGHT_BOUNDS.

Use it.
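The flag is passed in the convex geometry when creating the shape. A minimal sketch (assuming `physics`, `convexMesh` and `material` already exist):

```cpp
// eTIGHT_BOUNDS: slightly more expensive bounds computation, but tighter
// AABBs and fewer candidate pairs coming out of the broadphase.
PxConvexMeshGeometry geom(convexMesh, PxMeshScale(),
                          PxConvexMeshGeometryFlag::eTIGHT_BOUNDS);
PxShape* shape = physics->createShape(geom, *material);
```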

Until now PhysX used a fast O(1) piece of code to update the AABB of a convex mesh, from its precomputed local AABB and the current world transform. That one’s a classic from the GD-Algorithms days, and I remember also spotting it in Charles Bloom’s public code years and years ago. If you want the details, see for example this file in the PEEL distro:


And then look up the “Rotate” function, implemented there both for min/max and center/extents AABBs. This stuff is from 2000 or before, nothing new here.

There is no problem with this “Rotate” function: it’s great, we’ve been using it for more than a decade, and we’re still using it.
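For reference, the core of that function can be sketched in a standalone form like this (the real ICE/PEEL code differs in details, and also handles center/extents bounds):

```cpp
#include <algorithm>

struct AABB { float mMin[3], mMax[3]; };

// Classic O(1) AABB update: computes the world-space bounds of a local
// min/max AABB under "world = rot * local + trans" (rot is a row-major
// 3x3 rotation matrix). Each world axis only needs the min/max
// contribution of each local axis, so no vertices are touched.
AABB Rotate(const AABB& local, const float rot[3][3], const float trans[3])
{
    AABB world;
    for(int i = 0; i < 3; i++)
    {
        world.mMin[i] = world.mMax[i] = trans[i];
        for(int j = 0; j < 3; j++)
        {
            const float a = rot[i][j] * local.mMin[j];
            const float b = rot[i][j] * local.mMax[j];
            world.mMin[i] += std::min(a, b);
            world.mMax[i] += std::max(a, b);
        }
    }
    return world;
}
```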

However, it obviously does not create the tightest possible AABB, since it only knows about the local un-transformed bounds, and it doesn’t know anything about the object contained within these bounds. That’s why for simple shapes like boxes or capsules we don’t use it: instead we directly recompute the sphere or capsule’s bounds from its data and current pose. This is just as fast, but also provides tighter bounds.

For convexes though, we never tried that, because computing tight bounds for them was more tedious. You either need to use brute force and iterate over all vertices (a potentially expensive O(N) approach), or do something more complex like hill-climbing, etc.: basically the same machinery as GJK really, which has pros and cons I won’t go into right now.

Long story short: in PhysX 3.4 we tried the brute-force approach, and found that it significantly improves both performance and memory usage in scenes with piles of convex objects. The number of vertices in a convex is limited to 256 in PhysX, so there’s a limit to how expensive the computation can be, and we found it reasonable (iterating over vertices in a linear array is cache & SIMD friendly, so the worst case isn’t that bad).
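The brute-force version is little more than a transform-and-min/max loop. A standalone sketch (types simplified, not the actual PhysX code):

```cpp
#include <algorithm>
#include <cfloat>
#include <vector>

struct Vec3 { float x, y, z; };

// Brute-force tight bounds: transform every convex vertex to world space
// ("world = rot * local + trans", rot row-major) and track the
// component-wise min/max. O(N), but PhysX caps convexes at 256 vertices
// and the linear scan over a contiguous array is cache & SIMD friendly.
void ComputeTightBounds(const std::vector<Vec3>& verts,
                        const float rot[3][3], const float trans[3],
                        Vec3& outMin, Vec3& outMax)
{
    outMin = {  FLT_MAX,  FLT_MAX,  FLT_MAX };
    outMax = { -FLT_MAX, -FLT_MAX, -FLT_MAX };
    for(const Vec3& v : verts)
    {
        const float local[3] = { v.x, v.y, v.z };
        float world[3];
        for(int i = 0; i < 3; i++)
            world[i] = rot[i][0] * local[0]
                     + rot[i][1] * local[1]
                     + rot[i][2] * local[2] + trans[i];
        outMin.x = std::min(outMin.x, world[0]);
        outMin.y = std::min(outMin.y, world[1]);
        outMin.z = std::min(outMin.z, world[2]);
        outMax.x = std::max(outMax.x, world[0]);
        outMax.y = std::max(outMax.y, world[1]);
        outMax.z = std::max(outMax.z, world[2]);
    }
}
```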

Here are some results using PEEL’s “ConvexGalore2” test scene. It’s a giant pile of convex objects falling, and it looks like this:

Here are the performance results with and without tight bounds. Blue is with the default PhysX settings, red is with PxConvexMeshGeometryFlag::eTIGHT_BOUNDS. There’s a pretty clear winner here:

It also improves memory usage, since the tighter bounds mean fewer candidate pairs are generated by the broadphase, etc. This is again a clear win:

Granted: computing the bounds becomes more expensive, and if you have convex objects that don’t ever go close to other convex objects, using the flag might reduce performance overall. But I’d say that all things considered it’s one of these easy flags that you should just use all the time. It’s a small price to pay for potentially big gains. Visually the difference between the regular and tight bounds can be quite dramatic, as you can see for example here:

PhysX tip: cylinder shapes

Thursday, January 19th, 2017

PhysX does not support native, implicit cylinder shapes.

The initial technical reason behind this is clear, and goes back all the way to NovodeX 2.0 around 2003. I was the first NovodeX employee and my job was to implement everything related to collision detection in this engine: broad phase, narrow phase, contact generation, raycasts, etc. We didn’t have a lot of time for research or experimentation, so I only briefly tried GJK / EPA and quickly ran into a wall: these algorithms were just constantly failing for me. Accuracy issues, infinite loops, etc. At the time I was able to reproduce some of these issues in SOLID 3.5 itself, which made me give up: even the reference version was failing and I just had no time to spend on productizing my “research code” (you know… works in the lab, fails in the field).

So instead I went with what I was familiar with, and implemented contact generation using a mixture of SAT and distance tests, with a dedicated function for each possible pair of shapes. So for example box-vs-box would use a dedicated function to generate box-vs-box contacts, sphere-vs-box would use another function, and so on. With this design the cost of adding a new shape type, such as a cylinder, is high. To support cylinder shapes, one needs to write and add several new contact generation functions: cylinder-vs-box, cylinder-vs-sphere, cylinder-vs-mesh, etc. And none of these functions are trivial, since a cylinder is not the most friendly shape to generate contacts for. On top of that, the same problem exists for “scene queries”, i.e. one needs to write explicit raycast, overlap and sweep queries for cylinders. It’s a lot of non-trivial work.

This basic design (the same as in the old ODE library for example) survived until PhysX 3, for which we finally got the time to pause, sit down and revisit everything (not just collision detection: everything). We started playing with GJK / EPA again, mainly because Havok had been using it since the beginning, and it seemed to give them an edge in performance. New people joined, new code got written. It took a while to iron everything out, but in PhysX 3.4 we were happy enough with the results and finally switched to the new implementation, “PCM” (for Persistent Contact Manifolds), as the default contact generation method. Which means that in theory adding support for cylinders should now be as easy as adding a “support function” for cylinders (a trivial task), and that would be it.
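To illustrate how trivial that task is: a support function for a canonical cylinder (radius `r`, half-height `h`, axis along Y) is just a few lines. A standalone sketch, not actual PhysX code:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Support function for a canonical cylinder: radius r, half-height h,
// axis along Y. Returns the cylinder point furthest along direction dir.
// A GJK-based pipeline needs nothing more than this to collide the shape.
Vec3 CylinderSupport(const Vec3& dir, float r, float h)
{
    Vec3 p;
    const float horiz = std::sqrt(dir.x * dir.x + dir.z * dir.z);
    if(horiz > 1e-6f)
    {
        // Furthest point on the rim, in the direction's horizontal plane.
        p.x = r * dir.x / horiz;
        p.z = r * dir.z / horiz;
    }
    else
    {
        // Direction is (almost) vertical: any point of the cap works.
        p.x = p.z = 0.0f;
    }
    p.y = (dir.y >= 0.0f) ? h : -h;
    return p;
}
```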

However in practice, as usual, it is not that easy. Because under the hood, our PCM codepath still relies on the previous non-PCM contact generation routines to create the initial contacts, when two shapes initially touch. We do this to avoid issues and artifacts traditionally coming with the PCM territory. There are other ways to deal with those, as explained in the past IIRC by Erwin (from Bullet) and Gino (from SOLID), but that’s again something that always felt a bit experimental / fragile to me. And in any case that’s more code that we currently don’t have and have no time to research at this point.

That isn’t the main problem though. Beyond these implementation details, adding a new shape type still increases the support burden, the amount of code and classes, the potential for bugs, the testing & maintenance cost, etc. The only way to justify that now would be to instead add a generic “user-defined” shape where users would simply define their own support function, and this could be reused not only for cylinders but also for most existing shapes, or new ones like cones, etc - any convex shape. That’s what several other engines do, so that’s a viable option, and that way at least we would get some more gains & benefit for the price we’d pay. But it would still introduce an inconsistency in the API, since these new shapes would only be supported in the PCM codepath. And it could come with a performance cost if we use virtual calls into users’ code to compute a supporting vertex. And we still wouldn’t have a great way to generate the initial contacts for these shapes. And… and… and this is starting to sound a lot like analysis paralysis, which might explain why user-defined shapes are still an item on our TODO list rather than an actual, current feature.

The final nail in the coffin though is simply that we already have a perfectly fine workaround: just use a convex mesh instead.

People often don’t like this suggestion: they object that a convex shape is not going to be smooth enough to roll convincingly, and/or that a convex mesh tessellated enough to roll nicely would create performance problems.

As far as I can tell, both of these claims are incorrect.

I created a few test scenes in PEEL 1.1 to investigate.

The first scene features a single cylinder starting on an inclined slope and rolling away. The test is configurable and users can select the amount of tessellation for the cylinder. At the same time, since it’s PEEL, you can run the same scene with an engine that features “proper” cylinders (like Bullet or Newton - the Havok case is more questionable, see e.g. here). That way you can directly compare the motions and see if the cylinder that uses a convex mesh rolls more or less convincingly than the real thing. For me, with about 60 vertices in the cylinder’s base circle, I cannot tell the difference. But download PEEL 1.1 and see for yourself. (Use “RollingCylinder”, test 27).

Rolling cylinder

Rolling cylinder

Another way to test this is to try one of the vehicle scenes, for example “ArticulatedVehicle” (test 76). In this test the wheels use convex meshes instead of “real” cylinders. You can change the tessellation level in the test’s UI, and see the effect on the motion. You have to pull the vehicle around with the mouse, as you cannot drive it in this test.

Articulated vehicle

Articulated vehicle

If you want to drive, try the “VehicleTest” scene (number 245). Here again the cylindrical wheels are convex meshes. Use the arrow keys to drive the car around. Do you feel that the tessellation produces a bumpy ride? I don’t.

Now, the performance issue. Yes, certainly, using a highly tessellated convex mesh is more expensive than using an implicit shape. But this is only visible within the same engine. Beyond that, in practice, a tessellated convex mesh in PhysX can be just as fast, or faster, than a “real” implicit cylinder in another engine. Again, don’t take my word for it: download PEEL and see for yourself. (for example with “CylinderStack”, test 28). There is no performance problem using a highly tessellated convex mesh for cylinders, because generally speaking there is no performance problem using highly tessellated convex meshes in PhysX.

Cylinder stack

Cylinder stack

Besides, even if there were a performance issue with cylinders, it would remain isolated, affecting only cylinders. Unless your game features cylinders exclusively, it would be unlikely to make a dent in the framerate. In an actual game, the extra cost would be lost in the noise. Remember that there’s a lot more than physics going on in a game. Physics may have a CPU budget of maybe 20% of the frame. And then only a limited part of the physics budget is eaten up by contact generation (there’s also the broad phase, the solver, scene queries, etc). And then only a limited part of that contact generation time will be dedicated to cylinders. So even if the support function for a convex mesh is 10 times slower than the support function for an implicit cylinder, at the end of the day it doesn’t make the whole game 10 times slower. The way it is now, the overall performance impact should be quite small.

So in theory, yeah, you have a point. In practice, no, this is just not an issue.

PhysX: articulated vehicle in PEEL 1.1

Wednesday, January 18th, 2017

This video is a variation on a test scene I previously posted. This is now a configurable test in PEEL 1.1.

The chassis, wheels and axles have relatively large mass ratio differences (40 to 1), which makes things difficult for the iterative solver.

However, as explained in the previous video, using an articulation solves the problem. This is what you see in the beginning of this new video: the vehicle uses an articulation, and everything behaves as expected.

This also shows that articulations can emulate hinge joints just fine, by just setting the joints’ limits properly. There are for example 5 articulated hinges in this simple vehicle.

Now if you disable articulations in the UI and restart the scene, it gets created with regular PhysX joints. And as you can see, the results are not as convincing. Out-of-the-box, it behaves badly (it’s easy to bend the wheels, etc).

But there are still various ways to improve things.

First, you could just use more homogeneous masses. It may not be ‘realistic’ but if it doesn’t actually change the driving experience in the game, it doesn’t really matter. In the video I just set all masses to 1.0 to show that things look more stable & robust that way, i.e. the mass difference is really what breaks this particular setup. It does not have to be 1.0: things become equally stable if you use ‘mass = 40’ for everything.

Now, if you don’t want to change the masses, the usual tricks can be used. Here I just show one of them: create the constraints multiple times. In the video I show what happens if you create them 8 times: while not becoming as robust as articulations, the joints still become visibly more robust.

You could also create them more than 8 times, or increase the solver iteration counts for the vehicle, and so on. Basically there are multiple ways to make such a scene work fine.
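The “create the same constraint several times” trick is literally just a loop. A sketch for one wheel hinge (assuming `physics`, the two actors and their local joint frames already exist):

```cpp
// Creating N identical joints makes the solver effectively spend more
// work on this constraint, making it stiffer at some extra cost.
const PxU32 nbCopies = 8;
for(PxU32 i = 0; i < nbCopies; i++)
{
    PxRevoluteJoint* joint = PxRevoluteJointCreate(*physics,
        axle, axleLocalFrame,
        wheel, wheelLocalFrame);
    PX_UNUSED(joint);
}
```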

PEEL 1.1

Wednesday, January 18th, 2017

PEEL 1.1 has been released. You can grab it here:

Here’s a copy of the release notes:

* January 2017: v1.1 - the PhysX 3.4 issue.

This new version celebrates the official release of PhysX 3.4 and its built-in support for GPU rigid bodies. There is now a “Use GPU” checkbox in the PhysX 3.4 plugin’s options panel. Just check that box to run the scenes on the GPU. It uses CUDA, so you need an Nvidia graphics card for this to work - and a recent one for good performance. Otherwise the regular CPU version is of course still the default, and it runs everywhere.

Many changes have been made over the last two years. Some of them are listed below:

PEEL app:

  • implemented smoother camera motion (when using cursor keys).
  • added support for prismatic joint limits.
  • added support for distance joints (only exposed in PhysX so far).
  • added support for cylinder shapes.
  • added support for articulations (PhysX).
  • added support for aggregates (PhysX). This improves performance of ragdoll scenes quite a bit.
  • added initial/experimental support for vehicles.
  • various test scenes have been added or revisited.
  • introduced support for configurable tests (per-test UI dialog). As a result, some similar-looking tests have been merged together.
  • added tool-specific UI to edit tool-specific parameters. For example the picking force can now be modified in the main UI.
  • added camera tracking tool.
  • added support for per-test keyboard controls etc.
  • improved wireframe overlay rendering (‘X’ key). Now enabled by default.
  • new Pint caps have been added, as well as comments explaining their role better.
  • new test categories have been added.
  • tests can now access the font renderer to draw some debug text on screen. See for example the “AngularVelocity” test in the API category.
  • tests can now access the debug renderer to draw some debug data on screen. See for example the “COMLocalOffset” test in the API category.
  • replaced “ComputeWorldRay” with its ICE counterpart to fix some accuracy & performance issues in the raytracing test. The raytracing test became an order of magnitude faster after that.

Stats plugin:

  • now displays gravity vector.
  • added support for articulations & aggregates.


Bullet:

  • added new PINT plugin for “Bullet3-2.85.1” (from Oct 15, 2016).
  • Bullet3-2.85.1: exposed more solver settings to the UI for this version. I also set the default “erp” value to 0.8 for this plugin, which improves the behavior a lot in jointed scenes like for example FixedJointsTorusStressTest. In previous Bullet plugins however I kept Bullet’s own default value (0.2) because increasing it to 0.8 creates “dancing stacks” (see e.g. LargeBoxStack30).
  • Bullet3-2.85.1: for this version I used 4 solver iterations by default, since this is what PEEL also uses for other physics engines. It should then be a little bit faster than before, since we used 5 iterations by default in the previous Bullet plugins (2.79 to 2.82). However I noticed that Bullet’s own default value is now 10 iterations. Not sure if this is really necessary. Play with the settings until you find a good compromise.
  • constraints are now properly released.
  • collision margin has been reduced. That way the ConvexStack2 test doesn’t explode anymore.
  • support for cylinders has been added.


Newton:

  • Newton’s PINT plugin has been revisited by Julio Jerez.
  • sleeping can now be deactivated (except in 3.14).
  • the per-test gravity vector is now taken into account (it was previously ignored).
  • we can now use Newton 3.9 and 3.13 at the same time: I just switched to static libs for 3.13.
  • the 3.13 libraries have been updated. It is now using the stable version from Jun 17, 2015.
  • added new PINT plugin for Newton 3.12 (from May 24, 2015).
  • added new PINT plugin for Newton 3.14 (trunk from Jan 3, 2017).


Havok:

  • better tracking of memory usage.
  • started to share code between all Havok plugins.
  • exposed more parameters to UI.
  • initial (incomplete) support for “articulations”. Currently disabled because it crashes in some scenes.
  • support for cylinders has been added.
  • support for kinematics has been enabled in old versions of Havok.
  • free versions of Havok are not available anymore, so I will not be able to add and test newer versions in PEEL.

PhysX 3.x:

  • the UI has been unified for all 3.x PINT plugins except 3.1.
  • new parameters have been exposed to the UI.
  • default contact offset value has been increased (decreasing performance here and there but otherwise increasing stability).
  • max angular velocity has been properly set up to a higher value in all PhysX plugins. The default value in PhysX is quite small, which makes some tests fail and perhaps give the wrong impression.
  • added some quick preconfigured settings to tweak PhysX for either performance or accuracy.
  • the PhysX allocator (on the PINT side) is now thread-safe.

PhysX 3.4:

  • the PCM regression (in PEEL 1.01) has been fixed.
  • the overlap/sweep performance regressions (in PEEL 1.01) have been fixed.
  • the 3.4 architecture has changed significantly since the PEEL 1.01 version, to make the code compatible with GPU rigid bodies.

PhysX: long stable rope example in PEEL 1.1

Tuesday, January 17th, 2017

This test from the soon-to-be-released PEEL 1.1 uses the previously mentioned rope tricks to create a long, stable rope in PhysX.

This scene uses regular joints, no articulations.

There are 256 elements (spheres) in the rope. Each sphere has a mass of 1. The box at the end of the rope has a mass of 100.

Contrary to what some people said, this can work just fine in PhysX.