May 3rd, 2015
(This is a copy of Appendix A of PEEL’s User Manual. I am re-posting it here since people rarely bother reading docs anyway)
Benchmarking on a PC is a black art. Benchmarking physics engines is even harder. Use the following notes to avoid the most basic mistakes.
Use the proper power options.
This is typically found in Control Panel => System and Security => Power Options. Select the “High performance” power plan. Running benchmarks with the “Balanced” or “Power saver” plans produces unreliable results.
Close all programs except PEEL. Unplug the internet.
Do not let programs like Outlook, Winamp, antivirus software, etc., run in the background. They can start random tasks at random times that will interfere with your benchmarks.
Ideally, start the Task Manager and kill all unnecessary processes. There are so many of them that listing them all is impossible, but with some experience you will learn which ones can be killed, and which ones are worth killing.
Doing this each time is of course very tedious, so ideally you would take the radical step of using a dedicated PC with a fresh Windows installation and no internet connection. That is exactly what I do, and PEEL’s benchmark results at home are a lot more stable than PEEL’s benchmark results at work. Even when I do unplug the internet cable on the work PC…
Be aware of each engine’s “empty” operating overhead.
In theory, when you run a physics update on an empty scene, all engines should take the same amount of time, i.e. no time at all, since there is nothing to do.
In practice, of course, this is not the case. PEEL’s first test scene measures this operating cost.
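Measuring that empty operating cost can be sketched like this. The `step` callable is a hypothetical stand-in for one engine's per-frame update on an empty scene; the median over many frames is used because a mean is easily skewed by OS scheduling spikes.

```python
import time

def measure_empty_step(step, frames=1000):
    """Time step() over many frames and return the median cost in
    microseconds. step stands in for an engine's per-frame update on
    an empty scene (hypothetical -- each engine exposes its own call)."""
    samples = []
    for _ in range(frames):
        t0 = time.perf_counter()
        step()
        samples.append(time.perf_counter() - t0)
    samples.sort()
    return samples[len(samples) // 2] * 1e6  # median, in microseconds

# A no-op stands in for an empty-scene update; a real engine would
# still pay for broad-phase setup, island management, and so on.
overhead_us = measure_empty_step(lambda: None)
```

Even here, the measured "no time at all" will not be zero: the harness itself has a cost, which is exactly the kind of constant overhead PEEL's first test scene exposes per engine.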
Avoid benchmarks with just one object.
As a consequence, avoid running benchmarks with just a few objects or even a single object. The simulation time for just one object is likely to be lower than the engine’s empty operating overhead, because the main internal algorithms are usually a lot more optimized than the glue code that connects them all together. Thus, such benchmarks actually measure this operating overhead more than anything else. While it is an interesting thing to measure, it does not reflect the engines’ performance in real cases: the empty overhead is a constant time cost which is going to be lost in the noise of an actual game.
Thus, for example, it would be very wrong to run a benchmark with a single object and conclude that “engine A is faster than engine B” based on such results.
Try small scenes and large scenes.
Not all engines scale well. Some engines may be faster with small scenes, but collapse completely with large scenes – because large scenes have a tendency to expose the O(N^2) parts of an engine.
Traditionally it is wise to “optimize for the worst case”, so benchmarks involving large scenes tend to have a higher weight than those involving small scenes. Note that “small” and “large” are vague terms on purpose: a large scene in a game today might be considered a small scene in a game tomorrow. And at the end of the day, if it is fast enough for your game, it does not matter that an engine does not scale beyond that. It may matter for your next game though.
The point is: here again it is difficult to conclude from a limited set of benchmarks that “engine A is faster than engine B”. You may have to refine your conclusions on a case-by-case basis.
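The scaling trap is easy to demonstrate with two toy "engines" (hypothetical stand-ins, not real physics updates): one whose cost grows linearly with the object count, and one with an O(N^2) all-pairs phase. At small N the two look comparable; at large N the quadratic one collapses.

```python
import time

def time_step(step, n, repeats=3):
    """Return the best wall-clock time of step(n) over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        step(n)
        best = min(best, time.perf_counter() - t0)
    return best

def linear_engine(n):
    # O(N): e.g. integrating each object once.
    s = 0
    for i in range(n):
        s += i

def quadratic_engine(n):
    # O(N^2): e.g. a naive all-pairs collision check.
    s = 0
    for i in range(n):
        for j in range(n):
            s += 1

for n in (100, 1000):
    t_lin = time_step(linear_engine, n)
    t_quad = time_step(quadratic_engine, n)
    print(f"N={n}: linear={t_lin:.6f}s quadratic={t_quad:.6f}s")
```

Running the same benchmark at several scene sizes, as above, is what reveals the scaling behaviour; a single size cannot.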
Be aware of sleeping.
Virtually all physics engines have “sleeping” algorithms in place to disable work on non-moving, sleeping objects.
While the performance of an engine simulating sleeping objects is important, it is usually not the thing benchmarks should focus on. In the spirit of optimizing the worst case again, what matters more is the engine’s performance when all these objects wake up: they must do so without killing the game’s framerate.
Thus, PEEL typically disables sleeping algorithms entirely in its benchmarks, in order to capture the engines’ ‘real’ performance figures. Unfortunately, some physics engines may not let users disable these sleeping mechanisms, and benchmarks can appear biased as a result – giving an unfair advantage to the engines that put all objects to sleep.
Obviously, concluding that engine A (with sleeping objects) is faster than engine B (with non-sleeping objects) is foolish. Keep your eyes open for this in your experiments and benchmarks.
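A toy model shows how sleeping skews a naive timing. The `Body` class, `step`, and `wake_all` below are hypothetical stand-ins for a real engine's rigid bodies and update loop; the point is simply that timing a scene full of sleeping bodies measures almost nothing, while forcing everything awake first measures the worst case you actually care about.

```python
import time

class Body:
    """Toy rigid body with a sleep flag (hypothetical stand-in)."""
    def __init__(self):
        self.sleeping = True
        self.velocity = 0.0

def step(bodies):
    """Simulate only awake bodies, like an engine's sleeping optimization."""
    for b in bodies:
        if not b.sleeping:
            b.velocity += 9.81 * (1.0 / 60.0)  # integrate gravity

def wake_all(bodies):
    """Force every body awake so compared engines do comparable work."""
    for b in bodies:
        b.sleeping = False

bodies = [Body() for _ in range(100_000)]

t0 = time.perf_counter()
step(bodies)
asleep_cost = time.perf_counter() - t0  # near-free: everything sleeps

wake_all(bodies)
t0 = time.perf_counter()
step(bodies)
awake_cost = time.perf_counter() - t0   # the cost that hits your framerate
```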
Be aware of solver iteration counts.
Most physics engines have a fast iterative solver that uses a default number of iterations. That default value may differ between engines. For fair comparisons, make sure the compared engines use the same number of iterations.
Alternatively, tweak the number of iterations in each engine until they all use roughly the same amount of time, then check which one produces the best simulation quality for the same CPU budget.
If a complex scene (e.g. one with joints) does not work well by default in engine A, but works well in engine B, try increasing the number of iterations for engine A. It might make the scene work while still remaining cheaper overall than engine B. And so on.
Comparing how engines behave out-of-the-box, with their default values, is only the tip of the iceberg.
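The "same CPU budget" approach above can be sketched as a small tuning loop. `toy_solver` is a hypothetical stand-in whose cost grows with its iteration count, as iterative solvers' costs roughly do; for a real engine you would replace it with one timed update at the given iteration setting, then compare simulation quality at the resulting counts.

```python
import time

def time_solver(step, iterations):
    """Wall-clock cost of one update at a given solver iteration count.
    step(iterations) is a hypothetical stand-in for an engine update."""
    t0 = time.perf_counter()
    step(iterations)
    return time.perf_counter() - t0

def match_budget(step, budget, max_iterations=64):
    """Raise the iteration count until the update cost exceeds the
    budget (in seconds), then back off one step."""
    for iters in range(1, max_iterations + 1):
        if time_solver(step, iters) > budget:
            return max(1, iters - 1)
    return max_iterations

def toy_solver(iterations):
    # Toy workload whose cost is roughly linear in the iteration count.
    acc = 0.0
    for _ in range(iterations * 10_000):
        acc += 1.0

iters = match_budget(toy_solver, budget=0.005)  # 5 ms budget
```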
Artificial benchmarks are not an actual game.
What works in the lab does not always work in the field. A good result in an artificial benchmark may not translate to a similarly good result in the final game. Good results in artificial benchmarks are just hints and good signs, not definitive conclusions. Take the results with the proverbial grain of salt.
Benchmarks are often artificial because they capture situations that would not actually happen in a game. At the same time, situations that would actually happen in a game often aren’t complicated enough to expose significant differences between engine A and engine B, or they are too complicated to recreate in a benchmark environment.
Similarly, physics usually only takes a fraction of the game’s frame. Thus, if engine A is “2X faster” than engine B in benchmarks, it does not mean that using engine A will make your game 2X faster overall. If your physics budget is 5% of the frame, even if you switch to an incredible physics engine that takes absolutely no time, you still only save 5% of the game’s frame. Thus, it might actually be reasonable and acceptable to switch to a slower engine if it offers other benefits (better support, open source, etc.).
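The arithmetic here is just Amdahl’s law applied to the frame. A quick worked example, with the 5% figure from above:

```python
def overall_speedup(physics_fraction, physics_speedup):
    """Amdahl's law: overall frame speedup when only the physics
    fraction of the frame gets faster."""
    return 1.0 / ((1.0 - physics_fraction)
                  + physics_fraction / physics_speedup)

# Physics is 5% of the frame and the new engine is 2x faster:
s = overall_speedup(0.05, 2.0)             # about 1.026 -- a 2.6% win, not 2x
# Even an infinitely fast engine caps out at the 5% ceiling:
cap = overall_speedup(0.05, float("inf"))  # about 1.053
```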
Benchmarks are never “done”.
There is always some possible scenario that you missed. There is always a case that you did not cover. There is maybe a different way to use the engine that you did not think about. There is always the possibility that an engine shining in all available benchmarks performs poorly in some other cases that were not captured.
There are more than 300 tests in PEEL, and still it only scratches the surface of what supported physics engines can do. Already though, in the limited set of available tests, no single engine always ends up “fastest”. Sometimes engine A wins. Sometimes engine B wins.