Physics benchmarks for dummies

May 3rd, 2015

(This is a copy of Appendix A of PEEL’s User Manual. I am re-posting it here since people rarely bother reading docs anyway)

Benchmarking on PC is a black art. Benchmarking physics engines is even harder. Use the following notes to avoid the most basic mistakes.

Use the proper power options.

This is typically found in Control Panel => System and security => Power Options. Select the “High performance” power plan. Running benchmarks with the “Balanced” or “Power saver” plans produces unreliable results.

Close all programs except PEEL. Unplug the internet.

Do not let programs like Outlook, Winamp, antivirus software, etc, run in the background. They can start random tasks at random times that will interfere with your benchmarks.

Ideally, start the Task Manager and kill all unnecessary processes. There are too many to list here, but with some experience you will learn which ones can be killed, and which ones are worth killing.

It is of course very tedious to do this each time. So ideally you would take a radical step and use a dedicated PC with a fresh Windows installation and no internet connection. That is exactly what I do, and PEEL’s benchmark results at home are a lot more stable than PEEL’s benchmark results at work. Even when I do unplug the internet cable on the work PC…

Be aware of each engine’s “empty” operating overhead.

In theory, when you run a physics update on an empty scene, all engines should take the same amount of time, i.e. no time at all, since there is nothing to do.

In practice, of course, this is not the case. PEEL’s first test scene measures this operating cost.

Avoid benchmarks with just one object.

As a consequence, avoid running benchmarks with just a few objects or even a single object. The simulation time for just one object is likely to be lower than the engine’s empty operating overhead, because the main internal algorithms are usually a lot more optimized than the glue code that connects them all together. Thus, such benchmarks actually measure this operating overhead more than anything else. While it is an interesting thing to measure, it does not reflect the engines’ performance in real cases: the empty overhead is a constant time cost which is going to be lost in the noise of an actual game.

Thus, for example, it would be very wrong to run a benchmark with a single object and conclude that “engine A is faster than engine B” based on such results.
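To make the distortion concrete, one way to sanity-check small benchmarks is to measure the empty-scene cost first and subtract it from every measurement. The sketch below uses made-up timings for a hypothetical engine (in PEEL you would read the real numbers from the per-frame profiler); only the methodology matters, not the figures.

```python
# Sketch: subtracting the empty-scene overhead from a measured step time.
# All timings below are invented placeholders for a hypothetical engine.

def corrected_step_time(measured_us, empty_scene_us):
    """Return the time actually spent simulating objects.

    measured_us    -- step time for the benchmark scene (microseconds)
    empty_scene_us -- step time for an empty scene (the 'glue code' cost)
    """
    return max(0.0, measured_us - empty_scene_us)

# With one object, most of the measured time is fixed overhead:
empty = 40.0        # hypothetical empty-scene step, in microseconds
one_object = 45.0   # hypothetical one-object step
print(corrected_step_time(one_object, empty))   # only ~5 us of real work

# With many objects, the fixed overhead is lost in the noise:
many_objects = 4000.0
print(corrected_step_time(many_objects, empty))
```

If the corrected time is a small fraction of the measured time, the benchmark is mostly measuring glue code, and any "engine A vs engine B" conclusion drawn from it is suspect.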

Try small scenes and large scenes.

Not all engines scale well. Some engines may be faster with small scenes, but collapse completely with large scenes – because large scenes have a tendency to expose O(N^2) parts of an engine.

Traditionally it is wise to “optimize for the worst case”, so benchmarks involving large scenes tend to have a higher weight than those involving small scenes. Note that “small” and “large” are vague terms on purpose: a large scene in a game today might be considered a small scene in a game tomorrow. And at the end of the day, if it is fast enough for your game, it does not matter that an engine does not scale beyond that. It may matter for your next game though.

The point is: here again it is difficult to conclude from a limited set of benchmarks that “engine A is faster than engine B”. You may have to refine your conclusions on a case-by-case basis.
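A cheap way to spot the O(N^2) parts is to time scenes of increasing size and estimate the growth exponent from the measurements. The "engine" below is a mocked stand-in whose cost is deliberately quadratic; a real test would time the engine's actual update call instead.

```python
import math

def growth_exponent(sizes, times):
    """Estimate k in time ~ N^k from the first and last measurements."""
    return (math.log(times[-1] / times[0]) /
            math.log(sizes[-1] / sizes[0]))

# Mocked engine whose broadphase degenerates to all-pairs checks:
def mocked_step_cost(num_objects):
    return 0.001 * num_objects * num_objects   # deliberately O(N^2)

sizes = [100, 200, 400, 800]
times = [mocked_step_cost(n) for n in sizes]
print(growth_exponent(sizes, times))   # ~2.0 => quadratic, scales badly
```

An exponent near 1 means the engine scales roughly linearly with scene size; anything drifting towards 2 is exactly the "collapses with large scenes" behavior described above.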

Be aware of sleeping.

Virtually all physics engines have “sleeping” algorithms in place to disable work on non-moving, sleeping objects.

While the performance of an engine simulating sleeping objects is important, it is usually not the thing benchmarks should focus on. In the spirit of optimizing the worst case again, what matters more is the engine’s performance when all these objects wake up: they must do so without killing the game’s framerate.

Thus, PEEL typically disables sleeping algorithms entirely in its benchmarks, in order to capture the engines’ ‘real’ performance figures. Unfortunately, some physics engines may not let users disable these sleeping mechanisms, and benchmarks can appear biased as a result – giving an unfair advantage to the engines that put all objects to sleep.

Obviously, concluding that engine A (with sleeping objects) is faster than engine B (with non-sleeping objects) is foolish. Keep your eyes open for this in your experiments and benchmarks.
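The bias is easy to demonstrate with a pair of mocked engines: one keeps simulating everything, the other skips sleeping objects entirely. The names and per-object costs below are invented for illustration; no real engine is this simple.

```python
# Mocked per-object step cost (arbitrary units). An engine that sleeps
# objects simply skips them; one with sleeping disabled pays for all.
COST_PER_ACTIVE_OBJECT = 1.0

def step_time(num_objects, num_sleeping):
    return (num_objects - num_sleeping) * COST_PER_ACTIVE_OBJECT

objects = 1000
engine_a = step_time(objects, 0)     # sleeping disabled: pays for all
engine_b = step_time(objects, 990)   # 99% asleep: looks 100X "faster"
print(engine_a, engine_b)

# The comparison only becomes fair at the worst case, i.e. when
# everything wakes up and engine B must pay the full price too:
print(step_time(objects, 0))         # same cost for both engines
```

The second figure is the one that decides whether the game's framerate survives a mass wake-up, which is why the worst case is what benchmarks should capture.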

Be aware of solver iteration counts.

Most physics engines have a fast iterative solver that uses a default number of iterations. That default value may be different in each engine. For fair comparisons, make sure compared engines use the same number of iterations.

Alternatively, tweak the number of iterations in each engine until they all use roughly the same amount of time, then check which one produces the best simulation quality for the same CPU budget.

If a complex scene (e.g. one with joints) does not work well by default in engine A, but works well with engine B, consider increasing the number of iterations for engine A. It might make the scene work while still remaining cheaper overall than engine B. And so on.

Comparing how engines behave out-of-the-box, with their default values, is only the tip of the iceberg.
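The "same CPU budget" comparison described above can be sketched as follows. The per-iteration costs are hypothetical placeholders; in practice you would time each engine's solver in PEEL and then compare simulation quality at the resulting iteration counts.

```python
# Sketch of the "equal CPU budget" comparison. The per-iteration costs
# below are invented; a real test would measure them per engine.

def iterations_for_budget(budget_us, cost_per_iteration_us):
    """Largest iteration count that fits in the given time budget."""
    return int(budget_us // cost_per_iteration_us)

# Hypothetical solvers: engine A's iterations are cheaper than engine B's.
cost_a = 10.0   # microseconds per solver iteration (made up)
cost_b = 15.0

budget = 120.0  # same CPU budget for both engines
iters_a = iterations_for_budget(budget, cost_a)   # 12 iterations
iters_b = iterations_for_budget(budget, cost_b)   # 8 iterations
print(iters_a, iters_b)

# Now run both engines at these counts and compare simulation quality,
# not raw speed: by construction they already cost the same.
```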

Artificial benchmarks are not an actual game.

What works in the lab does not always work in the field. A good result in an artificial benchmark may not translate to a similarly good result in the final game. Good results in artificial benchmarks are just hints and good signs, not definitive conclusions. Take the results with the proverbial grain of salt.

Benchmarks are often artificial because they capture situations that would not actually happen in a game. At the same time, situations that would actually happen in a game often aren’t complicated enough to expose significant differences between engine A and engine B, or they are too complicated to recreate in a benchmark environment.

Similarly, physics usually only takes a fraction of the game’s frame. Thus, if engine A is “2X faster” than engine B in benchmarks, it does not mean that using engine A will make your game 2X faster overall. If your physics budget is 5% of the frame, then even if you switch to an incredible physics engine that takes no time at all, you still only save 5% of the game’s frame. Thus, it might actually be reasonable and acceptable to switch to a slower engine if it offers other benefits (better support, open source, etc).
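The 5% figure above is just Amdahl’s law applied to a game frame: the overall gain is bounded by the fraction of the frame that physics occupies. A quick way to check the arithmetic:

```python
def overall_frame_speedup(physics_fraction, physics_speedup):
    """Amdahl's law: speedup of the whole frame when only the physics
    part (physics_fraction of the frame) becomes physics_speedup faster."""
    return 1.0 / ((1.0 - physics_fraction)
                  + physics_fraction / physics_speedup)

# Physics is 5% of the frame; the new engine is "2X faster":
print(overall_frame_speedup(0.05, 2.0))            # ~1.026, barely noticeable

# Even an infinitely fast engine only removes that 5%:
print(overall_frame_speedup(0.05, float("inf")))   # ~1.053
```

So a “2X faster” physics engine speeds up this hypothetical frame by about 2.6% overall, which is why other criteria (support, source access, etc) can legitimately outweigh raw benchmark numbers.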

Benchmarks are never “done”.

There is always some possible scenario that you missed. There is always a case that you did not cover. There is maybe a different way to use the engine that you did not think about. There is always the possibility that an engine shining in all available benchmarks performs poorly in some other cases that were not captured.

There are more than 300 tests in PEEL, and still it only scratches the surface of what supported physics engines can do. Already though, in the limited set of available tests, no single engine always ends up “fastest”. Sometimes engine A wins. Sometimes engine B wins.

PEEL 1.01

April 7th, 2015

Version 1.01 has been released. Download link.

Release notes:

* april 2015: v1.01 - the Bullet single-triangle-mesh issue

- the Bullet plugin was crashing or behaving oddly in all scenes featuring the “single triangle” mesh. This has been fixed. The reason was that the triangle’s data was not persistent (contrary to what happens for other meshes), and since Bullet does not copy the data, bad things happened. It looks like all the other engines copy the data, since they were working fine. Thanks to Erwin Coumans for figuring out the root of the problem.

- Opcode2 plugins will not crash anymore in raycast scenes without meshes (they won’t do anything though).

PEEL - public release 1.0

April 4th, 2015

I am very happy to announce the first public release of PEEL - the Physics Engine Evaluation Lab.

I briefly mentioned it on this blog already, here.

Source code is included for the main program and most of the PINT plugins. That way you can create your own test scenes and check that everything is done correctly, and benchmarks are not biased.

Pre-compiled binaries for most of the plugins are provided, for convenience. Some of the binaries (in particular the Havok plugins) have been removed, since it is unclear to me whether I can distribute them or not. On the other hand, some plugins are currently only available as binaries (Opcode2, ICE physics, etc).

Please refer to PEEL’s user manual and release notes for more information.

Have fun!

Download link

(As usual, the bitcoin tip jar is here if you like what you see :))

Googling your own name…

March 7th, 2015

According to the Internet, I am one of the most legendary scene coders but I write unreadable code. LOL.

I totally need to put that on a business card :)

PhysX is Open Source (EDIT: or is it?)

March 5th, 2015

Note that contrary to what the post says, this is only the second best version (3.3.3). We are currently working on 3.4, which already contains significant changes and significant speedups (for example it includes Opcode2-based mesh collision structures, which provide faster raycast, overlap and sweep queries). I think we will eventually open source 3.4 too, when it is released.


I’ve been reading the internet and receiving private emails after that. Apparently what I wrote is technically wrong: it is not “Open Source” because it does not have a proper open source license, it comes with a EULA, etc.

I notice now that both NVIDIA’s press release (above) and EPIC’s (here) are carefully worded. They actually never say “Open Source”, or even “open source”. They just say things like:

“NVIDIA opens PhysX code”

“PhysX source code now available free”

“The PhysX SDK is available free with full source code”

The weird thing, then, is that many internet posts make the same mistake I did, and present the news as if PhysX was indeed “Open Source”:

(etc, etc)

Why is everybody making this mistake, if indeed none of the official press releases actually said that?

I’ll tell you why.

That’s because the distinction between “NVIDIA opens PhysX source” and “PhysX is open source” is so subtle that only pedantic morons, er, misguided souls would be bold enough to complain about it when given something for free.

Give them a finger, they’ll take the whole hand, and slap you with it.

I have the feeling this is the only industry where people are so insane and out of touch with reality. You’re given a free Porsche and then you complain that it is not “really free” because you still need to respect the “strings attached” traffic code. Hello? Just say “thank you”, enjoy what you’re given, or go buy a Ferrari if you don’t like Porsches. Jeeez.

Contact generation for meshes

February 18th, 2015

Here is another paper I wrote some time ago. If you are struggling with invalid contacts against internal edges in your rigid body simulation, this one might help.

As usual, the bitcoin tip jar is here if you like what you read :)

Zero-byte BVH

January 30th, 2015

I wrote this last year. Enjoy.

As usual, the bitcoin tip jar is here if you like what you read :)


“Any sufficiently advanced technology is indistinguishable from magic” :)

More random PhysX stuff

January 16th, 2015

If a game uses PhysX, it does not mean you will notice it. It might not have obvious PhysX effects in it. It might not have cloth, or PhysX particles, or water effects, etc.

These are just flashy/trendy effects that are easy to advertise/sell/etc for the marketing people. The kind of stuff that gamers care about, maybe. But this is not the important part.

The most important part is the one that you don’t see. The one that makes your game playable at all.

When you fire a gun in a FPS, that’s PhysX (“raycast single” scene queries).

When NPCs/AI see you, that’s PhysX (“raycast any” scene queries).

When many NPCs properly interact and avoid going through each other, that’s PhysX (broadphase).

When your character simply moves in the level, that’s PhysX (“sweep” scene queries, or even PhysX’s character controller).

It’s not just about ragdolls or particle effects. PhysX is also there supporting the invisible foundation upon which everything else is built.

While I’m at it….

October 15th, 2014

“Interestingly enough, on desktop-like platforms, PhysX3 is faster than Box2D”

Random PhysX stuff

October 14th, 2014

So, yes, PhysX 3 is well optimized (and PhysX 3.4 even more so). But I’d like to go back to the old “2.8.4 is crippled” myth, since it is mentioned here again (“PhyX 2.x was garbage because a ton of sites outed nVidia for using x87 and lacking multi-threading on the old CPU code”).

This is not what happened, at all. I’m afraid you’ve been spoon-fed utter bullshit by websites that love an easy dramatic headline. I suppose it makes a good story to reveal nasty things about “big” companies like MS, Google, Apple, Nvidia, whatever. But the reality behind it here is terribly mundane and ordinary. There is no story. There is no conspiracy. There is no evil plan to cripple this or that.

NovodeX (on which PhysX was initially based) was written by Adam and me. The code was super optimized, to the best of my knowledge at the time. But it did not support SIMD or multi-threading. At that time, none of us knew how to write SSE code, and I also had no experience with multi-threading. Also, IIRC SSE2 was not widely supported (only SSE). From our perspective the gains from SSE seemed limited, using SSE2 would have made the SDK incompatible with most of our users’ machines, and we simply didn’t have the time or resources to learn SSE, support SIMD and non-SIMD versions, etc.

Then came Ageia. In the beginning, we had to make the SDK feature-complete before thinking about making it run faster. NovodeX did not even support convex objects! And that’s the first thing I had to implement in the post-NovodeX days. NovodeX was the fusion of two hobby projects from two independent developers. In a number of ways the result was still just that: a hobby project. We loved it, we worked very hard on it, but we basically had no customers and no actual games making actual requests for random features that are actually useful in games.

This all changed when the hobby project became PhysX under Ageia. That’s when it became an actual product. An actual middleware. With actual customers actually using the thing in actual games. Especially when it got picked up by Epic and integrated in the Unreal engine. Suddenly we got TONS of bug reports, feedback, feature requests, you name it. Tons of stuff to do. But as far as I can remember nobody asked for “the SSE version”, and so none of us worked on it. There was no time for that, no incentive to worry about it, and we still didn’t have a SIMD expert at the time anyway.

We briefly looked at the new compiler options in MSVC (/arch:SSE2, etc) but the gains were minimal, maybe 15 to 20% at best. If you believe that recompiling with such a flag will magically make your code run 4X faster, I am sorry but you are just a clueless fool, er, misguided soul. At the time, with the available compiler, we never saw more than 20% in the very best of cases. And most of the time, for actual scenes running in actual games, we saw virtually no gains at all. Enabling the flag would have given us marginal gains, but would have increased our support burden significantly (forcing us to provide both SIMD and non-SIMD versions). It would have been stupid and pointless. Hence, no SIMD in PhysX2. Simple as that.

For proper SIMD gains you need to design the data structures accordingly and think about that stuff from the ground up, not as an afterthought. And this is exactly what we did for PhysX3. After making PhysX2 stable and complete enough, after making it a real, useable, feature-complete middleware, it was finally time to think about optimizations again. It took time to make the software mature enough for this to be even on the roadmap. It took time for us (and for me) to actually learn SIMD and multi-threading. It took time for compilers to catch up (/arch:SSE2 on recent versions of MSVC is way, way better and produces way more efficient code than it did when we first tried it). It took time for SSE2 support to spread and become available in all machines (these days we only have a SIMD version - there is no non-SIMD version. That would have been unthinkable before).

And still, even after all this happened, a better algorithm, better data structures or fewer cache misses still give you more gains than SIMD. SIMD itself does not guarantee that your code is any good. Non-SIMD code can kick SIMD code’s ass any day of the week if the SIMD code is clueless about everything else. Anybody claiming that PhysX2 is “garbage” because it doesn’t use SIMD is just a ridiculous moron (pardon my French but hey, I’m French), er, clearly not a programmer worth his salt (or not a programmer at all for that matter).

So there was no crippling. The old version is just that: old. The code I wrote 10 years ago, as fast as it appeared to be at the time, is no match for the code I write today. Opcode 2 (which will be included in PhysX 3.4) is several times faster than Opcode 1.3, even though that collision library is famous for its performance. It’s the same for PhysX. PhysX 2 was faster than NovodeX/PhysX 1. PhysX 3 is faster than PhysX 2. We learn new tricks. We find new ideas. We simply get more time to try more options and select the best one.

As the guy in the article says, PhysX3 is so fast that it changed his mind about the whole GPU Physics thing. Does that sound like we’re trying to promote GPU Physics by crippling PhysX3? Of course not. And in the same way we did not try to promote Ageia Physics by crippling PhysX2. We were and we are a proud bunch of engineers who love to make things go fast - software or hardware.

EDIT: I forgot something. Contrary to what people also claim, PhysX works just fine on consoles: it is a multi-platform library. Which means that writing SIMD is not as easy as hardcoding a bunch of SSE2 intrinsics in the middle of the code. It has to be properly supported on all platforms, including some that do not like basic things like shuffles, or do not support very useful instructions like movemask. Converting something to SIMD means writing the converted code several times, possibly in different ways, making sure that the SIMD versions are faster than their non-SIMD counterparts on each platform – which is not a given at all. It takes a lot of time and a lot of effort, and gains vary a lot from one platform to the next.