Archive for the 'Ideas to try' Category

Static/Bipartite SAP

Wednesday, September 14th, 2011

Another idea for optimizing the SAP when we have a small amount of dynamic objects moving amongst a large amount of static objects:

Put the dynamics in their own vanilla SAP. Get the dynamic-vs-dynamic overlaps from there, nothing new.

Put the dynamics & statics in a special SAP (“Static SAP”, or “Bipartite SAP”) modified to only report dynamic-vs-static overlaps:

  • The statics are stored in the SAP as usual (same box/endpoint structures as before)
  • The dynamics however are stored in a separate array, using a new box structure.
  • When the dynamic objects are inserted in the SAP, their endpoints are not inserted into the same array as for statics. They are stored directly in the new box structure, i.e. in the (dynamic) box class itself. They link to the static arrays of endpoints (their virtual position within those arrays, if they would have been inserted there as usual)
  • When running “updateObject” on those dynamic objects, we move their endpoints virtually instead of for real. That is, we still look for the new sorted positions within the arrays of endpoints, but we don’t actually shift the buffers anymore. The update is a const function for the main SAP data-structure, the only thing that gets modified is the data within the dynamic box we’re updating. In other words we don’t have real “swaps” anymore. No more memmoves, thus reduced number of LHS, etc.

The drawback is that we update dynamic objects twice, once in the “static SAP”, once in the normal SAP.

Didn’t try it, don’t know about the perf, but might be interesting.

Debug console for Win32 app

Monday, July 7th, 2008

I keep forgetting this one so let me post it once and for all. Adding a debug console to a Win32 app takes 4 lines of code.

freopen(”CONOUT$”, “wt”, stdout);

SetConsoleTitle(L”Debug Console”);


You can then printf your way out of trouble.

Mesh granularity

Tuesday, July 1st, 2008

Idea to try: one region = one mesh.

Current system:
- a level is made of different regions connected by portals
- a portal is a convex polygon
- a region is a collection of meshes
- a mesh is a collection of “subsets”. Each mesh has a single PRS transform and is a single object in 3DS MAX.
- a subset is a collection of triangles sharing the same render states (same material).

Currently, things are exported as-is from 3DS MAX, and each region contains several meshes, depending on whatever is in the MAX scene. Visible objects are batched, and sorted to minimize the number of render state changes. However this may not be necessary: we could use the “static batching” that happens for each individual mesh instead, the one creating subsets. We just have to merge all the meshes of a given region together, ending up with a single mesh per region.

If the regions are quite small, i.e. if it is possible to see “all” meshes of a region from a particular viewpoint, this looks like a good idea (following the “optimize your worst case” strategy). We save memory by reducing the number of “meshes” in the system, we save CPU time on culling, batching, etc.

If the engine creates one physics mesh for each render mesh, it might also have some benefits. It certainly saves memory, minimizing the cost associated with each “object” within the physics engine. It also reduces the stress on the broad phase. However it puts it on the midphase (basically OPCODE), since the whole region is now a huge AABB-tree. The old “RAPID vs OPCODE” tests showed that AABB-trees are not too great when a small object is fully contained within a big scene, but on the other hand temporal caches helped a lot for this case, so it might be ok in the end.

Now the only issue with this is probably when you have a room with several open doors, each of them being a portal connected to a different region. If you stand in the middle of the room looking at those doors, you end up rendering all the regions even though a small part of them is visible, whereas before only the first meshes of each region would have been rendered. So you made your worst case worse, in a way.


Not merging meshes at all and keeping the same organization and the same number of objects as in MAX does work, but it puts the burden on the artists, and on the batcher. But even if the batcher does a perfect job, you still waste memory associated with individual meshes. In ICE, sizeof(Mesh) is something like 500 bytes or more, which I know is too much, but that’s how it is now, and a lot of professional engines have worse than this. With ~1000 meshes in a single region,that’s ~500 Kb lost per region: not good. Plus, the batcher is never “perfect” anyway, i.e. it has a runtime cost.

Merging meshes by material regardless of spatial considerations doesn’t work. It makes things easier for the batcher but it kills the physics.

A good alternative is just to cover the region with a fixed-size grid, assign triangles to grid cell (regardless of their original mesh), and create one mesh per cell. It keeps the number of meshes under control, it’s physics-friendly, and also culling-friendly. The main drawback is when the physics does a convex decomposition and relies on it to have good collision response. You can end up with a perfectly flat plane cut to N different, unconnected physics meshes - and bad collision response when an object slides from one part of the mesh to the other.

You can always completely decouple the physics & graphics representations, but it’s a pain to manage. Probably the best option though.

TODO: stack-allocated container?

Thursday, February 14th, 2008

I was browsing a codebase when I noticed this usage pattern, several times:

void SomeFunction()

vector<some class> localVector;

localVector.push_back(some data);


In considered cases, the push_back doesn’t always happen and the number of pushed elements is not known ahead of time (i.e. there’s a good reason why a dynamic array is used). However what happens is that only a few elements are usually written to the array, and overall a small amount of memory is actually needed. So, it’s a bit painful to do an allocation/deallocation for this.

So it might be useful to allocate the initial memory on the stack (i.e. with alloca) and only switch to “real” memory allocations the first time the array is resized. If we stack-allocate say 16K initially, it could significantly reduce the number of small allocations performed by that kind of code. Something to try at some point.

EDIT: done! Works nicely. cialis