Archive for May 11th, 2013

The evolution of PhysX (3/12) - Rigid bodies (convex stacks)

Saturday, May 11th, 2013

We continue with stacks, this time stacks of convex objects. This is where things start to become interesting. There are multiple things to be aware of here:

  • Regular contact generation between boxes is a lot easier (and a lot cheaper) than contact generation between convexes. So PCM-based engines should take the lead here in theory.
  • But performance depends a lot on the complexity of the convex objects as well.
  • It also depends on whether the engine uses a “contact cache” or not.

Overall it’s rather hard to predict. So let’s see what we get.

We have two scenes here, for “small” and “large” convexes. Small convexes have few vertices, large convexes have a lot of vertices. The number of vertices has a high impact on performance, especially for engines using SAT-based contact generation.
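To see why vertex count matters so much for SAT, recall that every candidate separating axis requires projecting all vertices of both hulls onto that axis. A minimal sketch of that inner loop (names and layout are mine, not actual PhysX code):

```cpp
#include <cfloat>
#include <vector>

struct V3 { float x, y, z; };

float dot(const V3& a, const V3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Project all vertices of a convex hull onto an axis. This is O(n) in the
// vertex count and runs once per candidate axis, which is why SAT cost
// grows quickly with hull complexity.
void project(const std::vector<V3>& verts, const V3& axis, float& mn, float& mx) {
    mn = FLT_MAX; mx = -FLT_MAX;
    for (const V3& v : verts) {
        const float d = dot(v, axis);
        if (d < mn) mn = d;
        if (d > mx) mx = d;
    }
}

// SAT core idea: the hulls are disjoint iff some axis yields
// non-overlapping projection intervals.
bool intervalsOverlap(float mn0, float mx0, float mn1, float mx1) {
    return mn0 <= mx1 && mn1 <= mx0;
}
```

With F faces and E edges per hull, a full SAT test examines on the order of 2F + E*E axes, each paying this per-vertex projection cost, so "large" convexes hurt twice: more axes, and more vertices per axis.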

---

The first scene (“ConvexStack”) uses the small convexes. Each PhysX version is again a bit faster than the previous one, so that’s good, that’s what we want.

There doesn’t seem to be any speed difference between PCM and regular contact generation. In fact PCM even has a slightly worse worst case, maybe due to the overhead of managing the persistent manifolds. I suppose it shows that SAT-based contact generation is a viable algorithm for small convexes, which is something we knew intuitively.

Now, 3.3 is about 2X faster than 3.2 on average, even when they both use SAT. The perf difference here probably comes from two things: 3.3 uses extra optimizations like “internal objects” (this probably explains the better worst case) and it also uses a “contact cache” to avoid regenerating all contacts all the time (this probably explains the better average case).
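The “contact cache” idea can be sketched in a few lines. The control flow below is illustrative only (all names are mine): the pair caches the last axis that separated the two hulls, and as long as that single axis still separates them, the full SAT sweep over all candidate axes is skipped.

```cpp
#include <functional>

// Per-pair separation cache, assuming a SAT narrow phase.
// While the cached axis keeps separating the pair, one projection test
// per frame replaces a full sweep over all face and edge axes.
struct SeparationCache {
    int cachedAxisId = -1;  // index of the last separating axis, -1 if none
};

// 'testAxis(i)' returns true if candidate axis i separates the pair;
// 'numAxes' is the total number of SAT candidate axes.
bool pairSeparated(SeparationCache& cache, int numAxes,
                   const std::function<bool(int)>& testAxis) {
    // Fast path: re-test only the cached axis.
    if (cache.cachedAxisId >= 0 && testAxis(cache.cachedAxisId))
        return true;
    // Slow path: full SAT sweep, remembering any separating axis found.
    for (int i = 0; i < numAxes; ++i) {
        if (testAxis(i)) { cache.cachedAxisId = i; return true; }
    }
    cache.cachedAxisId = -1;
    return false;  // no separating axis: the hulls overlap
}
```

On a mostly static stack this fast path hits almost every frame, which is consistent with the better average case seen for 3.3.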

As for 2.8.4 and Bullet, they’re of similar speed overall, and both are significantly slower than 3.3. We see that 2.8.4 has the worst worst case of all, which is probably due to the old SAT-based contact generation used there – it lacked a lot of the optimizations we introduced later.

As with box stacks, the initial frame is a lot more expensive than subsequent frames. Contrary to box stacks though, I think a lot of it is due to the initial contact generation. Engines often use temporal coherence to cache various data from one frame to the next, which makes the first contact generation pass comparatively more expensive. It shows clearly on a scene like this, where all objects generate their initial contacts in the same frame and then barely move afterwards. This is not a very natural scenario (most games use “piles” of debris rather than “stacks” of debris), but such artificial scenes are handy to check your worst case.

---

The second scene (“ConvexStack3”) uses large convexes. Now look at that! Very interesting stuff here.

Each PhysX version is again faster than the previous one; that’s good. But the differences are massive now, with PhysX 3.3 an order of magnitude faster than PhysX 2.8.4, and almost 3X faster than 3.2. Well, it’s nice to see that our efforts paid off.

In terms of PCM vs SAT, it seems pretty clear that PCM gives better performance for large convexes. We see this with PhysX 3.3, but also very clearly with Bullet. It is the first time so far that Bullet manages to be faster than PhysX. It is not really a big surprise: SAT is not a great algorithm for large convexes, we knew that.

On the other hand, what we didn’t know is how much slower 2.8.4 could be compared to the others. I think it is time to upgrade to PhysX 3.3 if you are using large convexes.

Another thing to mention is the memory usage for 3.2. We saw this trend before, and it seems to be worse in this scene: 3.2 is using more memory than it should. The issue has been fixed in 3.3 though.

The evolution of PhysX (2/12) - Rigid bodies (box stacks)

Saturday, May 11th, 2013

So let’s start with some basic rigid body simulation: stacks, piles, and mesh collisions. There are two versions of PhysX 3.3 included in these rigid body tests: one using the regular contact generation, and one using the “PCM” method, introduced in 3.3. PCM stands for Persistent Contact Manifold and this algorithm builds contact manifolds iteratively, over several frames. The regular contact generation, on the other hand, builds a full manifold each frame. Both algorithms have pros & cons. Note that as far as I know, both Bullet and Havok have been using the PCM approach for a long time. Anyway the PhysX 3.3 version using PCM is the one marked with “(PVD)” (for completely uninteresting reasons that are too long to explain).


---

The first scene (“LargeBoxStack30”) is a classical box stack with a 30-boxes-large base. This isn’t terribly useful per se in a game, but it is a standard test for the stability and performance of physics engines.

I am only testing fairly mature engines here so there is little surprise: they all handle this quite well. In terms of performance we see that each successive version of PhysX is a little bit faster than the previous one, which is what we ideally expect to see in all tests. Bullet gives decent results but remains slower than all PhysX versions, and in particular up to 2X slower than PhysX 3.3/PCM.

Nothing very interesting yet in terms of memory usage: all libraries are close, which is expected for such a simple scene.

If you look at the graph closely you will see that a whole bunch of things seem to happen in the first frame, which takes significantly more time to execute than subsequent frames. This is due partly to cache effects (nothing is in the cache the first time), and mainly to the initial creation of various scene-related structures (broadphase, etc). This concerns all engines and only affects the first simulation frame, so we can ignore it in this case.

One interesting thing to note is that this box stack eventually collapses after a certain time, for all engines except PhysX 2.8.4. This is not because PhysX 2.8.4 has more stable algorithms. This is only because it uses the “adaptive force” feature, which was a rather crude hack we used to increase the stability of stacks. It is still available in PhysX 3.x, but we found that it created side effects in various cases. Since large stacks aren’t very common in games, we just disabled this feature by default.

Finally, it is worth noting that we have several R&D projects in the pipe, making this stack (and much larger ones) completely stable. I may release a small proof-of-concept demo later, it’s good stuff.

---

The second scene (“MediumBoxStacks20”) is quite similar: there are now 10 stacks, each with a 20-boxes-large base.

Contrary to the larger stack seen before, those stacks don’t collapse in any of the involved engines. Compared to the first one, this test captures performance issues in engines that have a high per-simulation island cost (since there are 10 islands here instead of 1). This does not actually affect any of the engines tested here though, and the profile and performance ratios for this test look similar to the first one. No big surprise there but one must always double-check instead of assuming, right?

In terms of memory usage, we see that PhysX 3.2 is less efficient than the other engines, but the numbers look quite similar otherwise. Comparing memory usage is not always easy since some engines may pre-allocate various buffers to avoid runtime memory allocations, producing an artificially high memory usage number – i.e. they allocate memory that they do not actually need yet. In any case the numbers are roughly similar here. No big winner.

The evolution of PhysX (1/12) - PEEL

Saturday, May 11th, 2013

PEEL is a tool designed to evaluate, compare and benchmark physics engines - it’s “PEEL” for “Physics Engine Evaluation Lab”.

In a way, it is very similar to the old PAL project (Physics Abstraction Layer). But PEEL supports several things that were missing from PAL.

It was initially written to compare PhysX versions against each other and catch performance regressions. One of the recurring questions we got about PhysX 3.x was “how much faster is it than 2.x?”. PEEL was originally built to answer this question easily. However it quickly became clear that adding support for entirely different engines was easy, and so I did just that.

There is a virtual interface called “PINT” (for Physics INTerface), and an implementation of this interface for each physics engine. A number of tests (more than 250 so far) have been written, and they all talk to this interface. As a result, the same test can run on an arbitrary number of physics engines, as long as they properly implement the interface. This can also be used as a complement to PhysX’s migration guide, for people porting 2.x apps to 3.x: you can just look up how the interface is implemented in both.
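The shape of such an abstraction might look like the sketch below. These method names are guesses for illustration, not PEEL’s actual PINT API: the point is simply that tests are written against the pure virtual interface, so the same test runs unchanged on every engine that implements it.

```cpp
// Hypothetical "PINT"-style abstraction (illustrative names, not PEEL's API).
class Pint {
public:
    virtual ~Pint() = default;
    virtual const char* GetName() const = 0;  // e.g. "PhysX 3.3", "Bullet 2.81"
    virtual bool Init() = 0;                  // create scene-level structures
    virtual void Update(float dt) = 0;        // advance the simulation one step
    virtual void Close() = 0;                 // release everything
};

// A test talks only to the interface, never to a concrete engine,
// so it runs identically on any implementation.
void runTest(Pint& engine, int numFrames, float dt) {
    engine.Init();
    for (int i = 0; i < numFrames; ++i)
        engine.Update(dt);
    engine.Close();
}
```

A real interface would of course also expose object creation, raycasts, sweeps, overlaps, and so on, matching the test categories listed later in this post.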

At time of writing, PEEL supports:

  • Bullet 2.79
  • Bullet 2.81
  • Havok 6.6.0
  • Havok 2011_3_0
  • Havok 2011_3_1
  • Havok 2012_1_0
  • Havok 2012_2_0
  • ICE Physics
  • NovodeX 2.1.1
  • Opcode 1.3
  • Opcode 2.0
  • PhysX 2.8.4
  • PhysX 3.1
  • PhysX 3.2
  • PhysX 3.3 (various branches of it)
  • GRB (GPU rigid bodies)

PEEL uses all those physics engines and collision libraries in the same application. This was in itself quite an interesting engineering challenge, since you can’t just link to different versions of the same engine without running into compile and/or link errors about ambiguous or redundant symbols, function names, etc. Namespaces don’t really help when different versions of the same lib use the same namespace - or have the same DLL names, for that matter.

To make this work, each engine ended up in its own PEEL-related DLL (containing the implementation of the PINT interface), and each of these “PINT DLLs” is a plug-in for the PEEL application (reminiscent of the format plug-ins from Flexporter, same story).

For physics engines providing static libs, like Bullet, the whole Bullet-related code ends up in a PINT DLL, and that’s it.

For physics engines providing their own DLLs, like PhysX, the PhysX PINT wrapper ends up in a PINT DLL, which in turn loads the PhysX DLLs. Delay loading is used so that the PhysX DLLs can be renamed, e.g. so that the PhysX 3.2 and PhysX 3.3 DLLs can coexist under different names.
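The plug-in pattern boils down to each PINT DLL exporting one factory function that the PEEL application resolves by name at runtime (LoadLibrary/GetProcAddress on Windows), so the application never links against any engine directly. The sketch below simulates that resolution step with a plain map; the DLL names and factory names are made up for illustration.

```cpp
#include <map>
#include <string>

// Stand-in for the PINT interface a factory would return.
struct PintStub { const char* name; };

using CreatePintFn = PintStub* (*)();

// Each "DLL" exports one factory. Here they are plain functions; in PEEL
// they would live in separate DLLs and be resolved with GetProcAddress.
PintStub* createBullet()  { static PintStub s{"Bullet 2.81"}; return &s; }
PintStub* createPhysX33() { static PintStub s{"PhysX 3.3"};   return &s; }

// Simulated symbol resolution: a map plays the role of the OS loader.
std::map<std::string, CreatePintFn>& pluginRegistry() {
    static std::map<std::string, CreatePintFn> r = {
        {"PINT_Bullet281.dll", &createBullet},    // hypothetical file names
        {"PINT_PhysX33.dll",   &createPhysX33},
    };
    return r;
}

PintStub* loadEngine(const std::string& dllName) {
    auto it = pluginRegistry().find(dllName);
    return it == pluginRegistry().end() ? nullptr : it->second();
}
```

Because each engine is isolated behind its own exported factory, symbol clashes between different versions of the same library never reach the main application.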

The tests cover a wide range of features. There are API tests to just check how the basic API works. There are performance tests. There are behavior tests. There are tests for rigid body scenes, joints, CCD, raycasts, sweep tests, overlap tests, memory usage, for corner cases or degenerate cases, for multi-threaded or single-threaded simulations, etc. There is even a raytracing test rendering the scene from the current camera viewpoint, using the engines’ raycasting functions. This is what I meant above when I said that PEEL supported more things than PAL.

An interesting feature is that all engines run at the same time. All simulations are performed for all selected engines, in the same PEEL frame. Results are rendered on screen with a different color for each engine, making it extremely easy to spot divergences in behavior. Running all engines at the same time also helps to keep the performance numbers “realistic”. In a real game you never run just the physics engine and simple graphics, you have the whole game behind, and a number of subsystems using up resources, trashing the cache, etc. Running all physics engines at the same time replicates this to some extent. In any case, it is always possible to run one engine at a time by just selecting a single one in the initial selection dialog.
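The per-frame loop described above can be sketched as follows (structure only, with illustrative names): every selected engine is stepped inside the same application frame, and per-engine timings are accumulated so the numbers stay directly comparable.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <vector>

// One selected engine in the PEEL frame loop.
struct EngineSlot {
    std::string name;
    std::function<void(float)> step;  // stands in for the PINT update call
    double totalUs = 0.0;             // accumulated simulation time
};

// Step every selected engine within the same application frame,
// timing each one separately.
void runFrame(std::vector<EngineSlot>& engines, float dt) {
    for (EngineSlot& e : engines) {
        const auto t0 = std::chrono::high_resolution_clock::now();
        e.step(dt);
        const auto t1 = std::chrono::high_resolution_clock::now();
        e.totalUs += std::chrono::duration<double, std::micro>(t1 - t0).count();
    }
}
```

Interleaving the engines like this also means they share (and fight over) the same caches within a frame, which is part of what keeps the numbers closer to in-game conditions.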

Excel graphs with benchmark results can be saved by simply pressing a key. PEEL also supports simple scripts that can run desired tests and save the results automatically.

Each engine has a dedicated UI dialog for options & settings. A variety of “actions” are implemented (picking, shooting new objects, applying impulses to existing objects, etc) to let you interact with the scene and double check that things behave as expected.

---

So, now that the stage is set, let’s see what PEEL reveals about PhysX. The following results are for PhysX 2.8.4, PhysX 3.2 and PhysX 3.3. For all these versions I just grabbed the code from their respective trunks at the time of writing. It may or may not exactly map to an officially released build of PhysX, but in any case the results should be reliable enough to give a rough idea of how the library evolved from one version to another.

I felt compelled to also include the latest version of Bullet (2.81), to provide an objective external reference. The thing is, there are many people out there who still think that “PhysX is not optimized for CPU”, “PhysX does not use SIMD”, “PhysX is crippled on purpose”, and so on. So providing performance graphs for just PhysX would not prove much to them, and they could still pretend that all versions are slow anyway. Thus, adding a well-known engine like Bullet – which is often proposed as an alternative to PhysX by the same naysayers – seemed like a good reality check. (I’m talking to you, the guy who wrote “if you’re going cpu, bullet is much more optimized there“).

I have been very fair here, and recompiled the library with the same optimization flags as PhysX (in fact I even reported on the Bullet forums that the default compile options were not optimal). I also wrote PEEL’s Bullet plug-in as best as I could, but admittedly I am not a Bullet expert. I will therefore release the Bullet plug-in source code later, so that you can double-check whether I did something wrong.

I could not include the Havok results since, well, it is forbidden to publish them according to their license. Maybe they are afraid of something here, I don’t know.

In any case, all the graphs in the following posts capture the performance of single-threaded mode. Time units are K-Cycles. Sleeping is disabled for all engines.
