Archive for February 22nd, 2008

While I’m at it…

Friday, February 22nd, 2008

What’s the deal with alpha-sorted polygons these days? I hear more and more that engine X or Y doesn’t support them because “it’s too slow”. Errr, hello? We even sorted polygons on 8 MHz Atari STs, I assume you can manage to do the same on a 3 GHz multi-core machine? Sure the number of polygons has increased, but so has the CPU speed. So what’s the problem?

  • the first problem is that they don’t use the right algorithm. If the number of polygons and the CPU speed evolve in parallel, the ratio only remains constant with an O(n) sort (e.g. a radix sort). Since they insist on using qsort, well, they’re doomed before the race even starts.
  • the second problem is that they don’t generate their sort keys efficiently. No, you don’t need that square root. No, you don’t need the distance to the camera position. You need a simple dot-product: the vertex projection on the camera’s forward vector.
  • the third problem is that they sort everything each frame. It’s useless. Sort the index buffer once, and only sort it again when the view direction has changed significantly. You don’t need an exact sort each frame since the whole process is not accurate to begin with (each triangle is sorted according to a single witness point). So it means you need to actually sort stuff only once in a while - or all the time but over N frames, if you prefer.
  • the fourth problem is that they sort at runtime. You don’t have to. As we just said the sorting is approximate, so you can precompute 8 sorts/index buffers for the 8 possible octants in which the view vector can be, and simply pick the correct one at runtime (see the sketch after this list). This has been used countless times in countless demos on Atari and PC. Seriously old skool. It certainly increases the memory used, but here’s my deal for you: drop the STL, and you’ll win back all the memory you need to include those precomputed sorted buffers. The number of transparent objects shouldn’t be huge anyway, so this is usually not a problem. Especially since those sorted buffers can be easily compressed like crazy, and decompressed to the actual D3D index buffer only when the view vector changes.
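
To make the last three points concrete, here is a minimal sketch (the Vec3 type, the function names and the 0.99 threshold are all mine, purely illustrative):

```cpp
#include <cstdint>

struct Vec3 { float x, y, z; };

static inline float dot(const Vec3& a, const Vec3& b)
{
    return a.x*b.x + a.y*b.y + a.z*b.z;
}

// Sort key for one triangle: project its witness point on the view vector.
// No square root, no distance to the camera position.
static inline float sortKey(const Vec3& witnessPoint, const Vec3& viewDir)
{
    return dot(witnessPoint, viewDir);
}

// Re-sort only when the view direction has drifted enough since the last
// sort. The threshold is tunable; the sort is approximate anyway.
static inline bool needsResort(const Vec3& viewDir, const Vec3& lastSortDir)
{
    return dot(viewDir, lastSortDir) < 0.99f;
}

// Octant of the view vector (0..7), one bit per sign.
static inline uint32_t octantIndex(const Vec3& viewDir)
{
    return (viewDir.x < 0.0f ? 1u : 0u)
         | (viewDir.y < 0.0f ? 2u : 0u)
         | (viewDir.z < 0.0f ? 4u : 0u);
}

// Runtime "sort": pick the index buffer precomputed offline for the octant
// the view vector currently falls in.
const uint16_t* pickIndexBuffer(const Vec3& viewDir,
                                const uint16_t* precomputedIndices[8])
{
    return precomputedIndices[octantIndex(viewDir)];
}
```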

So, what’s your excuse already?

It had to happen: the STL rant

Friday, February 22nd, 2008

It’s not just that I don’t like the STL, it’s simply that I don’t need it.

If I need a basic container (linked list, vector…), the STL doesn’t help much because I can use my good old versions from 10 years ago.

If I need a more advanced container (e.g. a map), the STL doesn’t help much because custom versions are way faster anyway.

If I need a sort, I usually want a radix. Here again the STL doesn’t help.
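
For the record, a complete O(n) radix sort fits in a page. A minimal, STL-free sketch for unsigned 32-bit keys (my illustration here, not the ICE version):

```cpp
#include <cstdint>

// Minimal LSD radix sort for unsigned 32-bit keys: O(n), 4 passes, no STL.
void radixSort(uint32_t* keys, uint32_t n)
{
    uint32_t* tmp = new uint32_t[n];
    uint32_t* src = keys;
    uint32_t* dst = tmp;

    for (uint32_t pass = 0; pass < 4; ++pass)   // one pass per byte
    {
        const uint32_t shift = pass * 8;

        uint32_t count[256] = { 0 };            // histogram
        for (uint32_t i = 0; i < n; ++i)
            count[(src[i] >> shift) & 0xFF]++;

        uint32_t offset = 0;                    // exclusive prefix sum
        for (uint32_t j = 0; j < 256; ++j)
        {
            const uint32_t c = count[j];
            count[j] = offset;
            offset += c;
        }

        for (uint32_t i = 0; i < n; ++i)        // stable scatter
            dst[count[(src[i] >> shift) & 0xFF]++] = src[i];

        uint32_t* t = src; src = dst; dst = t;  // ping-pong buffers
    }

    delete [] tmp;  // even number of passes: results end up back in 'keys'
}
```

Float keys can be handled on top of this with the usual sign-bit tricks, and the four histograms can be built in a single pass over the data.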

If I want to disable exceptions for performance reasons, the STL doesn’t help.

If I want to save memory, the STL definitely doesn’t help.

If I want to debug my code easily, if I want to avoid “byzantine syntax” in my code, if I want to keep my compile times low or avoid insane code bloat, again, the STL fails.

So, really, why exactly should I bother? Just because it’s “standard”? Are you insane? What does that buy me exactly? Isn’t that a lot of trouble for very dubious benefits?

STL horror stories off the top of my head:

  • the “contact stream” in the old Novodex SDK: we needed a dynamic vector where we could push_back both floats and ints (something like an int encoding a number of floats, followed by the actual floats). It was originally implemented with a templated array. Replacing it with a vanilla Container from ICE reduced the exe size and gave a small framerate boost.
  • the VC6 implementation was a joke, but the new ones aren’t that much better. The bug is in the template theory: everything is (or can be) inlined. Take a simple push_back. It really has two parts: the one that copies the object to the dynamic array, and the one that resizes the array. Most of the time, say for a vector of ints, you really want the copy to be inlined (it just copies an int after all). However you never want the resizing part to be inlined: it happens rarely, and it’s so slow anyway that inlining it is pretty useless. But how does the compiler know that one part should “always” be inlined, while the other should not? Exactly: it doesn’t. So what do you think happens? Could you swear that the compiler will not happily inline both parts for every push_back in your code? I saw it happen with VC6. Could you swear it doesn’t happen with, say, PS3 compilers? (See the sketch after this list.)
  • speaking of which, can you guess how much we saved on the exe size a few weeks ago on PS3, when we switched from the default STL implementation to uSTL? Hint: give your answers in megabytes.
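
To illustrate the push_back split from the second point above, here is a sketch of how a hand-rolled container can force the issue (a pseudo-container of mine, not actual ICE or STL code):

```cpp
#include <cstdint>
#include <cstring>

class IntArray
{
public:
    IntArray() : mData(0), mSize(0), mCapacity(0) {}
    ~IntArray() { delete [] mData; }

    // Hot path: tiny, safe to inline at every call site.
    inline void pushBack(int value)
    {
        if (mSize == mCapacity)
            grow();                 // rare, cold path, kept out-of-line
        mData[mSize++] = value;
    }

private:
    // Cold path: deliberately NOT inlined. It happens rarely and is slow
    // anyway, so inlining it at every push_back call site is pure bloat.
    // (Some compilers may need a hint like __declspec(noinline) to
    // guarantee this.)
    void grow();

    int*      mData;
    uint32_t  mSize;
    uint32_t  mCapacity;
};

void IntArray::grow()
{
    const uint32_t newCapacity = mCapacity ? mCapacity * 2 : 16;
    int* newData = new int[newCapacity];
    if (mData)
    {
        memcpy(newData, mData, mSize * sizeof(int));
        delete [] mData;
    }
    mData = newData;
    mCapacity = newCapacity;
}
```

The point is simply that with a hand-written container you decide which half gets inlined, instead of hoping the compiler makes the right call for every instantiation.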

Please use and abuse the STL: it makes my life that much easier when it comes to beating the competition. Maybe exceptional products are not written with “standard” code, you know…

Faster convex-convex SAT: partial hulls

Friday, February 22nd, 2008

Here’s a question I received in my mailbox a while back:

I am looking into the SAT code for small convex meshes here at the moment and it doesn’t seem very efficient. Can you share some ideas on how to reduce the number of tests? Especially the number of edge/edge tests seems to grow huge.

There would be a lot to say about this. But one way is to replace colliding convexes with simpler versions (I call them “partial hulls”). Take a look at the following picture:

[Image: partial hulls]

The two overlapping convexes have a red part and a green part. The key observation is that the penetration depth and the penetration direction (the yellow line) do not depend on the green parts - which can be safely ignored. The penetration would be the same if the convexes were “closed” by the blue segments.

So the trick is simple in theory: ignore the green parts, and compute the penetration distance using the “partial hulls” enclosed between the red and blue segments. It has two direct benefits:

  • the number of edge/edge tests is greatly reduced (that’s clearly the main advantage).
  • the green vertices can also be ignored in the projection code.

How to implement this efficiently is another (long) story, but here is a small proof-of-concept test that already runs orders of magnitude faster than the brute-force Separating Axis Theorem.
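
As a rough illustration of the first step, here is a sketch (the centroid-based face selection heuristic is my assumption of how the red/green split might be computed, not necessarily what the proof-of-concept does):

```cpp
#include <cstdint>

struct Vec3 { float x, y, z; };

static inline Vec3 sub(const Vec3& a, const Vec3& b)
{
    Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

static inline float dot(const Vec3& a, const Vec3& b)
{
    return a.x*b.x + a.y*b.y + a.z*b.z;
}

// Marks the "red" faces of a hull: those whose outward normal points
// towards the other hull. The "green" faces - and their vertices and
// edges - can then be skipped by the SAT loops.
uint32_t selectPartialHull(const Vec3* faceNormals, uint32_t nbFaces,
                           const Vec3& ownCentroid, const Vec3& otherCentroid,
                           bool* keepFace)
{
    const Vec3 dir = sub(otherCentroid, ownCentroid);
    uint32_t nbKept = 0;
    for (uint32_t i = 0; i < nbFaces; ++i)
    {
        keepFace[i] = dot(faceNormals[i], dir) > 0.0f;
        if (keepFace[i])
            nbKept++;
    }
    return nbKept;
}
```

The SAT loops then only visit the kept face normals, and only test edge pairs where both edges border at least one kept face - which is where the edge/edge count collapses.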

Free Havok

Friday, February 22nd, 2008

Wow!
