New broad-phase demo

We recently had a friendly-yet-furious internal competition at work, where the goal was to create the fastest broad-phase algorithm ever.

We slowly reached the limits of what we can do with a 3-axes sweep-and-prune (SAP), and so we were trying to finally move away from it. This is not very easy, as SAP is very efficient in typical game scenes. Unfortunately as the total number of objects in games increase, and game worlds become more and more dynamic, the SAP performance increasingly becomes an issue.

Several people attacked the problem with several (and very different) approaches. I submitted two entries: one was the multi-SAP (MSAP) broad-phase I previously mentioned; the other one is something I probably cannot talk about yet, so let’s just call it “broad-phase X” for now. Long story short: this last algorithm won.

Out of curiosity, and since I had done it before with MSAP already, I tried it out in Bullet’s “CDTestFramework” to compare it against Bullet’s algorithms. On my machine the new broad-phase is a little bit more than 4X faster than any of Bullet’s versions, and more than 2X faster than MSAP. Of course the relative performances of all those algorithms vary with the total number of objects in the scene, how many of them are static, how many of them are moving, how fast they are moving, etc. So there isn’t usually a clear winner for all possible scenarios. Still, at work we tested those new algorithms in dozens and dozens of scenes with different characteristics, and this “broad-phase X” always performed remarkably well.

The demo is straight out of Bullet 2.78, compiled in Release mode with full optimizations and the SSE2 flag enabled.

Click here to download

The new broad-phase should be available in PhysX 3.3. Stay tuned! :)

9 Responses to “New broad-phase demo”



  1. David Black Says:

    You should write about this new broadphase algo.

    I do hope NVIDIA isnt going to start patenting basic spatial sorting/searching algorithms… (I was fairly disgusted by the GPU Radix sort patent:-(

  2. your fan's Says:

    i’ve been watching you about ten years, since flipcode age. keep on great work, and make it better.

    say hello to your son, so sweet:)

  3. Dirk Gregorius Says:

    Hi Pierre,

    nice work! Does the new broadphase X also acclerate raycasting or do you need an additional structure for this? In the later case I would argue that your benchmark should also take the update for this structure into account. From my experience only overlapping pairs are very rare these days. I need to look at the Bullet source, but I would argue the same for SAP which should also update the tree structure in the benchmark. I think that Bullet actually supports this.

    Cheers and thanks for sharing your work!
    -Dirk

  4. admin Says:

    No, I would say this broadphase does not support raycasting - not in this current implementation in any case. It’s certainly true that this is only a broadphase benchmark, and not the full story. But then if you go this way, measuring the cost for updating the second structure is *also* not the full story. The full story involves the cost of performing the actual raycasts (i.e. traversing those different structures)…

  5. Dirk Gregorius Says:

    I agree. We should have a benchmark that tells the “full story”.

  6. David Black Says:

    >>> I agree. We should have a benchmark that tells the “full story”.

    I dont think most(any) people could handle the full story(truth:-).

    Better not to complicate things needlessly and have a defined goal when broadening the scope of the metrics.

  7. David Black Says:

    You could just as easily ask if it supports sweeps(angular and linear) or any other form of spatial query…

    (grr these captchas are impossible, I would be impressed if a machine could read them, I cant:-(

  8. David Black Says:

    You could just as easily ask if it supports sweeps(angular and linear) or any other form of spatial query…

    (grr these captchas are impossible, I would be impressed if a machine could read them, I cant:-( gegrewre

  9. Erwin Coumans Says:

    Hey Pierre,

    Thanks for the update, I haven’t done much CPU work recently but I’ll get back at it.

    I started rewriting that benchmark for GPU, with instanced rendering and OpenCL, I’ll add some new GUI too. There is a GPU based parallel 1-axis sweep and prune that can easily handle 100k moving objects (10ms) to 1M (100ms).

    See Windows download here:
    https://github.com/erwincoumans/experiments/downloads

    The OpenCL kernel is here:
    https://github.com/erwincoumans/experiments/blob/master/opencl/broadphase_benchmark/sap.cl

    By the way, that btDbvt thing can run faster if you enable the ‘deferred collide’ option, it is Nathanael Presson’s spelling mistake :)

shopfr.org cialis