Rendering the CPU time

 

 

 

How do you profile your code?

 

Nowadays, there are two usual ways:

- RDTSC

- VTune

 

RDTSC reads the processor's time-stamp counter, so the difference between two readings gives the number of cycles elapsed during the profiled sequence. For example, you may use it this way:

 

    udword NbCycles;
    StartProfile(&NbCycles);
    // ...
    // profiled code
    // ...
    EndProfile(&NbCycles);
    // NbCycles now holds the number of elapsed cycles

 

The helper functions could be:

 

    inline void StartProfile(udword* val)
    {
        __asm {
            rdtsc                   // EDX:EAX = time-stamp counter
            mov     ebx, val
            mov     [ebx], eax      // keep the low 32 bits as the start value
        }
    }

    inline void EndProfile(udword* val)
    {
        __asm {
            rdtsc
            mov     ebx, val
            sub     eax, [ebx]      // elapsed = current - start (low 32 bits only)
            mov     [ebx], eax
        }
    }
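The `__asm` blocks above only build with 32-bit Microsoft compilers, and they keep only the low 32 bits of the counter. As a minimal sketch of the same idea for current x86 compilers, the `__rdtsc()` intrinsic can be used instead. The names `ReadCycleCounter` and `ScopedCycleCount` are hypothetical helpers introduced for this example, not part of the original code.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_MSC_VER)
#include <intrin.h>      // __rdtsc on MSVC
#else
#include <x86intrin.h>   // __rdtsc on GCC and Clang (x86 targets only)
#endif

// Read the full 64-bit time-stamp counter.
inline std::uint64_t ReadCycleCounter()
{
    return __rdtsc();
}

// Hypothetical helper: writes the elapsed cycle count into 'result'
// when the object goes out of scope.
class ScopedCycleCount
{
public:
    explicit ScopedCycleCount(std::uint64_t& result)
        : mResult(result), mStart(ReadCycleCounter())
    {
    }

    ~ScopedCycleCount()
    {
        mResult = ReadCycleCounter() - mStart;
    }

    ScopedCycleCount(const ScopedCycleCount&) = delete;
    ScopedCycleCount& operator=(const ScopedCycleCount&) = delete;

private:
    std::uint64_t&  mResult;    // where the elapsed count is stored
    std::uint64_t   mStart;     // counter value at construction
};
```

Usage follows the same pattern as StartProfile/EndProfile: declare a `std::uint64_t NbCycles`, open a scope, create a `ScopedCycleCount` on it, and run the profiled code inside that scope.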

 

VTune is the heavy-duty solution: it gives you cycles, cache misses, etc., basically everything. VTune is a great tool to play with, and it's really worth trying. Unfortunately, you must first install it and learn it, and it may be a bit complicated for basic needs.

 

Sometimes you don't really care about the exact number of cache misses or the exact number of cycles; you just want a coarse idea of how much CPU time you're eating in a given piece of code. Moreover, a number of cycles does not mean much on its own, and what it represents in real time depends greatly on your computer's clock frequency. If RDTSC claims some routine takes 50000 cycles, so what? Is it too much? Is it cheap? Without extra information, you just can't tell.
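To make the "so what?" concrete, here is a small sketch that turns a cycle count into milliseconds and into a share of one frame. The clock frequency and refresh rate are caller-supplied assumptions, and both function names are made up for this example. With an assumed 500 MHz CPU and a 60 Hz display, 50000 cycles come out at about 0.1 ms, or roughly 0.6% of a frame.

```cpp
#include <cassert>
#include <cstdint>

// Convert a raw cycle count into milliseconds.
// cpuHz is an assumed, caller-supplied clock frequency in hertz.
inline double CyclesToMilliseconds(std::uint64_t cycles, double cpuHz)
{
    assert(cpuHz > 0.0);
    return 1000.0 * static_cast<double>(cycles) / cpuHz;
}

// Express the same cost as a percentage of one frame.
// refreshHz is the assumed display refresh rate in hertz.
inline double CyclesToFramePercent(std::uint64_t cycles, double cpuHz, double refreshHz)
{
    assert(cpuHz > 0.0 && refreshHz > 0.0);
    const double cyclesPerFrame = cpuHz / refreshHz;
    return 100.0 * static_cast<double>(cycles) / cyclesPerFrame;
}
```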

 

That's why I'm usually fond of another way of profiling my code, which is a lot more visual. I don't count cycles, I count scanlines. This is actually a well-known method, but it has become quite obsolete since 16-bit and 32-bit frame buffers became the norm... Why? Because the usual way to do it was to change the background color before and after the code to profile. Then you could directly see on screen how many scanlines it took.

 

Now you can't do that anymore, because your frame buffer is 16 or 32 bits deep and there's no longer a palette you can modify immediately.

 

Fortunately, we can still get the same visual profiling thanks to a DirectDraw method: GetScanLine. As the name suggests, it returns the scanline currently being traced by the electron beam. Hence the recipe:

 

FirstLine = GetScanline();
...
... code to profile ...
...
LastLine = GetScanline();
NbScanlines = LastLine - FirstLine;
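The plain subtraction in the recipe goes negative if the beam wraps from the bottom of the screen back to the top between the two readings. A minimal sketch of a wrap-aware difference follows. The function names are hypothetical, and `totalLines` (the number of lines per refresh, vertical blank included) is an assumption the caller has to supply for the current display mode. It still cannot tell apart code that takes more than one full frame.

```cpp
#include <cassert>

// Number of scanlines between two readings, allowing for one wrap
// past the last line of the frame between the two calls.
// totalLines is the assumed number of lines per refresh.
inline unsigned ElapsedScanlines(unsigned firstLine, unsigned lastLine, unsigned totalLines)
{
    assert(totalLines > 0 && firstLine < totalLines && lastLine < totalLines);
    return (lastLine >= firstLine) ? (lastLine - firstLine)
                                   : (totalLines - firstLine + lastLine);
}

// Express a scanline count as a percentage of one full refresh.
inline float ScanlinesToFramePercent(unsigned scanlines, unsigned totalLines)
{
    assert(totalLines > 0);
    return 100.0f * static_cast<float>(scanlines) / static_cast<float>(totalLines);
}
```

The percentage is what makes the measure readable at a glance: a bar a quarter of the screen high means roughly a quarter of the frame time, whatever the CPU frequency.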

 

Then you just have to render the elapsed CPU time as a standard TLVERTEX quad:

 

 

bool Renderer::DrawCPUTime(udword y)
{
    TLVertex    Verts[4];       // Vertices for a rectangle
    uword       Indexes[6];     // Indices

    // Initialize the vertices
    float       sx      = (float)mRenderWidth;
    float       sy      = (float)(mLastScanline - mFirstScanline);
    float       ystart  = float(y);

    // Bottom-left corner
    Verts[0].p.x        = 0.0f;
    Verts[0].p.y        = ystart+sy;
    Verts[0].p.z        = 0.0f;
    Verts[0].rhw        = 1.0f;
    Verts[0].color      = 0x7fffffff;
    Verts[0].specular   = 0;
    Verts[0].u          = 0.0f;
    Verts[0].v          = 1.0f;

    // Top-left corner
    Verts[1].p.x        = 0.0f;
    Verts[1].p.y        = ystart;
    Verts[1].p.z        = 0.0f;
    Verts[1].rhw        = 1.0f;
    Verts[1].color      = 0x7fffffff;
    Verts[1].specular   = 0;
    Verts[1].u          = 0.0f;
    Verts[1].v          = 0.0f;

    // Bottom-right corner
    Verts[2].p.x        = 0.0f+sx;
    Verts[2].p.y        = ystart+sy;
    Verts[2].p.z        = 0.0f;
    Verts[2].rhw        = 1.0f;
    Verts[2].color      = 0x7fffffff;
    Verts[2].specular   = 0;
    Verts[2].u          = 1.0f;
    Verts[2].v          = 1.0f;

    // Top-right corner
    Verts[3].p.x        = 0.0f+sx;
    Verts[3].p.y        = ystart;
    Verts[3].p.z        = 0.0f;
    Verts[3].rhw        = 1.0f;
    Verts[3].color      = 0x7fffffff;
    Verts[3].specular   = 0;
    Verts[3].u          = 1.0f;
    Verts[3].v          = 0.0f;

    // Initialize the indices (two triangles)
    Indexes[0]          = 0;
    Indexes[1]          = 1;
    Indexes[2]          = 2;
    Indexes[3]          = 2;
    Indexes[4]          = 1;
    Indexes[5]          = 3;

    // Plain untextured, unlit quad
    mRS->SetLighting(false);
    mRS->SetAlphaBlending(false);
    mRS->SetTexture(null);
    mRS->SetMaterial();
    mRS->SetCullMode(CULL_NONE);
    return DrawIndexedPrimitive(PRIMTYPE_TRILIST, VF_XYZRHW|VF_DIFFUSE|VF_SPECULAR|VF_TEX1, Verts, 4, Indexes, 6);
}
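The four vertices above differ only in which corner they sit on, so the same quad can be filled in a loop. The sketch below shows that variant with a stand-in vertex type. `BarVertex` and `BuildCPUBar` are hypothetical names, and the flat field layout is an assumption rather than the real TLVertex definition. Vertex order and values match the function above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for TLVertex; the field layout is an assumption.
struct BarVertex
{
    float           x, y, z, rhw;
    std::uint32_t   color, specular;
    float           u, v;
};

// Fill the four corners of the CPU-time bar in the same order as above:
// 0 = bottom-left, 1 = top-left, 2 = bottom-right, 3 = top-right.
inline void BuildCPUBar(float width, float yStart, float scanlines, BarVertex out[4])
{
    assert(out != nullptr);
    for (int i = 0; i < 4; ++i)
    {
        const bool right  = (i & 2) != 0;   // vertices 2 and 3 are on the right edge
        const bool bottom = (i & 1) == 0;   // vertices 0 and 2 are on the bottom edge

        BarVertex& vtx = out[i];
        vtx.x        = right ? width : 0.0f;
        vtx.y        = bottom ? yStart + scanlines : yStart;
        vtx.z        = 0.0f;
        vtx.rhw      = 1.0f;
        vtx.color    = 0x7fffffffu;
        vtx.specular = 0u;
        vtx.u        = right ? 1.0f : 0.0f;
        vtx.v        = bottom ? 1.0f : 0.0f;
    }
}
```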

 

 

Unfortunately, I don't know how to do the same with OpenGL.

 

Historical notes:

The first game I saw using that method was Goldrunner, by Steve Bak, 1987. You had to press the F10 key to discover some strange dancing rasters on your screen: the actual CPU time used.

 

 

 

 

 

Pierre Terdiman