I've run it through VTune. The piece the optimizer is having trouble with is the loop below, especially the pixel munging. Your assembler is much faster here, though it shouldn't be. Everywhere else, the optimizer easily beats the other chunk of assembler.

    d=float(x_start)-xx0;
    red=rr0+dred*d;
    gre=gg0+dgre*d;
    blu=bb0+dblu*d;
    while (pixel_address<=last_address){
        *pixel_address=((int)red<<16)|((int)gre<<8)|(int)blu;
        red+=dred;
        gre+=dgre;
        blu+=dblu;
        pixel_address+=1;
    }
I get about 150ms for 1000 tris with the assembler and about 450ms without. Building in Release or Debug doesn't make any difference.
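One thing that might be worth trying on the pixel loop above: the three float-to-int casts per pixel are a likely suspect, since on some x86 compiler settings each cast is a comparatively expensive conversion. I haven't confirmed that this is what VTune is flagging. Below is a minimal sketch that steps the colours in 16.16 fixed point instead, so the inner loop is integer-only. The function name fill_span_fixed, the helper to_fixed and the parameter list are made up for illustration, and it assumes every channel stays within [0, 256) across the span and that pixels are 32-bit 0x00RRGGBB.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert a colour value in [0, 256) to 16.16 fixed point.
static inline int32_t to_fixed(float v)
{
    return static_cast<int32_t>(v * 65536.0f);
}

// Hypothetical replacement for the span loop. Fills pixel_address..last_address
// (inclusive) with 0x00RRGGBB values, stepping each channel as an integer.
// Assumes each channel stays inside [0, 256) for the whole span.
void fill_span_fixed(uint32_t* pixel_address, uint32_t* last_address,
                     float red, float gre, float blu,
                     float dred, float dgre, float dblu)
{
    assert(pixel_address != nullptr && last_address != nullptr);

    // The float-to-int conversions happen once per span, not once per pixel.
    int32_t r = to_fixed(red);
    int32_t g = to_fixed(gre);
    int32_t b = to_fixed(blu);
    const int32_t dr = to_fixed(dred);
    const int32_t dg = to_fixed(dgre);
    const int32_t db = to_fixed(dblu);

    while (pixel_address <= last_address) {
        // The integer part of a 16.16 value sits in bits 16..23, so red is
        // already in place; green and blue are shifted down into position.
        *pixel_address = (uint32_t(r) & 0xFF0000u)
                       | ((uint32_t(g) >> 8) & 0x00FF00u)
                       | ((uint32_t(b) >> 16) & 0x0000FFu);
        r += dr;
        g += dg;
        b += db;
        pixel_address += 1;
    }
}
```

Because the increments are truncated to 1/65536 steps, the low bit of a channel can occasionally differ from the float version on long spans. Whether it is actually faster depends on the compiler and CPU, so it needs timing the same way as everything else.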
I'm timing it like this:

    #include <windows.h>
    #include <stdio.h>

    unsigned long long thetime(void)
    {
        LARGE_INTEGER li;
        QueryPerformanceCounter(&li);   // current value of the high-resolution counter
        return li.QuadPart;
    }

    unsigned long long freq;            // counter ticks per second, set by inittime()

    void inittime(void)
    {
        LARGE_INTEGER li;
        QueryPerformanceFrequency(&li);
        freq = li.QuadPart;
    }

    void do_graphics_stuff(...)
    {
        ...
        char tmp[256];
        unsigned long long t = thetime();
        ...thing to time
        // elapsed ticks -> whole milliseconds
        sprintf(tmp, "%I64ums\n", ((thetime()-t)*1000)/freq);
        OutputDebugString(tmp);
    }

You need to call inittime() somewhere in WinMain first. I'm not timing the flip, just the triangles.
OutputDebugString() writes its output into the debugger output window in VS Express.
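In case it helps with comparing runs, the tick arithmetic from the sprintf line can be pulled out into small helpers. This is only a sketch: ticks_to_ms and ticks_to_us are names I've made up, and they take the counter values and frequency as plain integers so they don't depend on windows.h. The microsecond version is there because whole milliseconds get coarse once a run drops to a few ms.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: elapsed counter ticks to whole milliseconds.
// freq is the counter frequency in ticks per second.
uint64_t ticks_to_ms(uint64_t start, uint64_t end, uint64_t freq)
{
    assert(freq > 0);
    return ((end - start) * 1000u) / freq;
}

// Hypothetical helper: same conversion, but in whole microseconds.
uint64_t ticks_to_us(uint64_t start, uint64_t end, uint64_t freq)
{
    assert(freq > 0);
    return ((end - start) * 1000000u) / freq;
}
```

With the functions above, the call would look something like ticks_to_ms(t, thetime(), freq), with the result printed using the same "%I64u" format.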
Jim