Modulo 3

Ever encountered this one?

int Index; // 0, 1 or 2

int Index2 = (Index+1)%3;

int Index3 = (Index+2)%3;

It appears all over the
place, in particular when you have some algorithm working on vertices, and
"Index" is the index of a particular component, i.e. X, Y or Z.

Each time I see this code I
feel terribly sorry for the poor guy who wrote it. Especially
when I am that guy.

Needless to say (but I'll
say it anyway), this code is evil since in this case it includes two hidden
divisions by 3. Which is slow.

The usual way to get rid of
the modulos is to bite the bullet and write explicit comparisons.
For example:

int Index; // 0, 1 or 2

int Index2 = Index+1; if(Index2>2) Index2 -= 3;

int Index3 = Index+2; if(Index3>2) Index3 -= 3;

This is a bit better. But not much. Of course, you traded the modulos
for comparisons, i.e. you replaced slow divisions with potential branch mispredictions, which are just as evil.

Another popular way to fix
this (used in *3DS MAX*, for example)
is to use a small lookup table:

int LUT[] = { 1, 2, 0, 1 };

int Index; // 0, 1 or 2

int Index2 = LUT[Index];

int Index3 = LUT[Index+1];

It almost looks nice. Almost. But of course it suffers from two issues:

- the
LUT has to be written to memory each time

- then
read back

To get rid of the first
issue you usually do:

static int LUT[] = { 1, 2, 0, 1 };

int Index; // 0, 1 or 2

int Index2 = LUT[Index];

int Index3 = LUT[Index+1];

Hence you just have to read
a static LUT. Well, it's also bad. Accessing memory is bad, no matter what.
Accessing a static array is even worse, since the corresponding memory can be
anywhere, and probably not close to your stack. In short, you replaced slow
divisions with even slower cache misses. This is not really a progress.

So, we have the choice
between 3 evil ways to do this. Hmmm. Is there any hope
left?

Well, maybe. I'd like to
propose this alternative:

int Index; // 0, 1 or 2

int Index3 = 0x01000201;

Index3>>=(Index<<3);

int Index2 = Index3 & 0xff;

Index3>>=8;

Index3&=0xff;

No modulo, no comparisons,
no memory access. Looks good!

In case you don’t actually
need all the 3 indices, but only the next one in the sequence, you can use this
alternative function from

int Modulo(int
edge)

{

return ((edge&1)<<1) | ((~(edge+1)&2)
>>1);

}

**Addendum March 22, 2007:**

Christer Ericson was kind
enough to send two alternative methods. The first one was:

int Index; // 0, 1 or 2

int Index2 = ( 9 >> (Index << 1)) &
3;

int Index3 = (18 >> (Index << 1)) &
3;

int Modulo(int edge)

{

return ( 9 >> (edge << 1)) & 3;

}

And the most recent one is
so simple I think we can just close the topic:

int Index1 = n; // 0, 1 or 2

int Index2 = (1 << Index1) & 3;

int Index3 = (1 << Index2) & 3;

int Modulo(int edge)

{

return (1 <<
edge) & 3;

}

Pierre Terdiman