Dark Bit Factory & Gravity

PROGRAMMING => C / C++ /C# => Topic started by: Stonemonkey on January 13, 2007

Title: Graphics in C++
Post by: Stonemonkey on January 13, 2007
Right, I've been wanting to give C++ a go for a while, but what I've really been having problems with is setting up the graphics and windows side of things. Initially I want to do some pixel stuff without hardware acceleration, so has anyone got any ideas about the best way to set up something that works like PTC?
Title: Re: Graphics in C++
Post by: Jim on January 13, 2007
There's this thread on the old yabasic forum, which is still up to date.  I think the code you're looking for is near the end where I switch away from using Windows' SetPixel and write to a buffer.
http://p205.ezboard.com/fyabasicprogrammingfrm20.showMessageRange?topicID=14.topic

Jim
Title: Re: Graphics in C++
Post by: Stonemonkey on January 13, 2007
Nice one Jim, that's going back a bit.
Title: Re: Graphics in C++
Post by: rdc on January 13, 2007
There is also PixelToaster (http://www.pixeltoaster.com/) if you're looking for a PTC-type thing. I forget who mentioned it, but there is an example program using it around here somewhere.
Title: Re: Graphics in C++
Post by: Stonemonkey on January 13, 2007
Well, I've been messing around with the stuff Jim linked to and I'm getting along with it a lot better this time. Is it ok to do stuff like this:

Code: [Select]
#define VC_EXTRALEAN
#define WIN32_LEAN_AND_MEAN
#include <windows.h>

#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <math.h>


typedef struct
{
        int wwidth,height,bpp;
        unsigned int* argb;
}image_buffer;

typedef struct
{
        HWND window;
        image_buffer* buffer;
}window_struct;


image_buffer* create_image_buffer(int wwidth, int height)
{
              image_buffer* buffer=(image_buffer*)malloc(sizeof (image_buffer));
              buffer->wwidth=wwidth;
              buffer->height=height;
              buffer->argb=(unsigned int*)malloc(sizeof(unsigned int)*wwidth*height);
              buffer->bpp=32;
              return buffer;
}

void flip(window_struct* window)
{
     BITMAPINFO bmi;
     HDC hdc;
     bmi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
     bmi.bmiHeader.biWidth = window->buffer->wwidth;
     bmi.bmiHeader.biHeight = -window->buffer->height;
     bmi.bmiHeader.biPlanes = 1;
     bmi.bmiHeader.biBitCount = window->buffer->bpp;
     bmi.bmiHeader.biCompression = BI_RGB;
     bmi.bmiHeader.biSizeImage = 0;
     bmi.bmiHeader.biXPelsPerMeter = 75;
     bmi.bmiHeader.biYPelsPerMeter = 75;
     bmi.bmiHeader.biClrUsed = 0;
     bmi.bmiHeader.biClrImportant = 0;
     hdc = GetDC(window->window);
     SetDIBitsToDevice(hdc, 0,0, window->buffer->wwidth,window->buffer->height, 0,0, 0,window->buffer->height, window->buffer->argb, &bmi,  DIB_RGB_COLORS);
     ReleaseDC(window->window, hdc);
}

void setpixel(image_buffer* buffer,int x, int y, unsigned int colour)
{
     if (x < 0 || x >= buffer->wwidth) return;
     if (y < 0 || y >= buffer->height) return;
     *(buffer->argb+x+buffer->wwidth*y)=colour;
}

unsigned int getpixel(image_buffer* buffer,int x, int y)
{
     if (x < 0 || x >= buffer->wwidth) return 0;
     if (y < 0 || y >= buffer->height) return 0;
     return *(buffer->argb+x+buffer->wwidth*y);
}

void clear_buffer(image_buffer* buffer,unsigned int colour)
{
     // memset only repeats a single byte, so fill the 32-bit pixels explicitly
     int i, count=buffer->wwidth*buffer->height;
     for (i=0; i<count; i++) buffer->argb[i]=colour;
}

void shutdown(window_struct* window)
{
     free(window->buffer->argb);
     free(window->buffer);
     free(window);
}


void init_fireworks(void);
void do_fireworks(image_buffer*);
void draw_fireworks(image_buffer*);
void do_graphics_stuff(window_struct* window)
{
     clear_buffer(window->buffer,0);
     do_fireworks(window->buffer);
     InvalidateRect(window->window, NULL, FALSE);
     UpdateWindow(window->window);
     flip(window);
     Sleep(16);
}


int quit = FALSE;
LRESULT CALLBACK WindowProc(HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
        switch (uMsg)
        {
                case WM_PAINT:
                        {
                        // BeginPaint fills in the PAINTSTRUCT itself; WM_PAINT's wParam is not an HDC
                        PAINTSTRUCT paint;
                        BeginPaint(hwnd, &paint);
                        EndPaint(hwnd, &paint);
                        break;
                        }
                case WM_DESTROY:
                        quit = TRUE;
                        PostQuitMessage(0);
                        break;
                default:
                        return DefWindowProc(hwnd, uMsg, wParam, lParam);
        }
return 0;
}

window_struct* setup_window(HINSTANCE hInstance,
                     HINSTANCE hPrevInstance,
                     LPSTR     lpCmdLine,
                     int       nCmdShow,
                     int wwidth,
                     int height,
                     const TCHAR title[])
{
        WNDCLASSEX clas;

        // TEXT() makes the literal match TCHAR in both ANSI and Unicode builds
        TCHAR graphics_class[] = TEXT("SM3d");
       
        clas.cbSize = sizeof(WNDCLASSEX);
        clas.style = CS_HREDRAW | CS_VREDRAW;
        clas.lpfnWndProc = WindowProc;
        clas.cbClsExtra = 0;
        clas.cbWndExtra = 0;
        clas.hInstance = hInstance;
        clas.hIcon = NULL;
        clas.hCursor = NULL;
        clas.hbrBackground = (HBRUSH)(COLOR_WINDOW+1);
        clas.lpszMenuName = NULL;
        clas.lpszClassName = graphics_class;
        clas.hIconSm = 0;
        RegisterClassEx(&clas);

        window_struct* new_window=(window_struct*)malloc(sizeof(window_struct));
        {
        RECT rect={0,0,wwidth,height};
        int style = WS_SYSMENU|WS_MINIMIZEBOX|WS_VISIBLE;
        AdjustWindowRectEx(&rect, style, FALSE, WS_EX_TOPMOST);
        new_window->window = CreateWindowEx(0, graphics_class, title, style, CW_USEDEFAULT, CW_USEDEFAULT, rect.right-rect.left, rect.bottom-rect.top, NULL, NULL, hInstance, 0);
        }

        ShowWindow(new_window->window, nCmdShow);
        UpdateWindow(new_window->window);
        new_window->buffer=create_image_buffer(wwidth,height);
        return new_window;
}


int APIENTRY WinMain(HINSTANCE hInstance,
                     HINSTANCE hPrevInstance,
                     LPSTR     lpCmdLine,
                     int       nCmdShow)
{
        MSG msg;

        window_struct* window=setup_window(hInstance,hPrevInstance,lpCmdLine,nCmdShow,640,480,TEXT("Stonemonkey"));
       
        init_fireworks();
       
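        while (!quit)
        {
                // drain all pending messages without blocking; a NULL hwnd also picks up WM_QUIT
                while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
                {
                        if (msg.message == WM_QUIT)
                                quit = TRUE;
                        TranslateMessage(&msg);
                        DispatchMessage(&msg);
                }

                // don't draw into a window that has just been destroyed
                if (!quit)
                        do_graphics_stuff(window);
        }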
        shutdown(window);

        return 0;
}





typedef struct
{
        float x,y,z;
} VECTOR;

typedef struct
{
        unsigned int flags;
        VECTOR posn;
        VECTOR velocity;
        float spin;
        float spin_rate;
        float life_span;
        int age;
} COIN;

enum
{
        COIN_ALIVE_F=1,
        COIN_PARENT_F=2,
};


#define MAX_COINS 5000
#define MAX_GEN 500
COIN fireworks[MAX_COINS];

#define BIG_G -9.81f

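int my_rand()
{
        // combine two 15-bit values; masking keeps the result non-negative where RAND_MAX > 32767
        return (rand()&0x7fff)|((rand()&0x7fff)<<15);
}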

float rand_angle()
{
        return (float)(my_rand()%360);
}

float rand_vel()
{
        return (float)((my_rand()%201)-100);
}

float rand_yvel()
{
        return (float)(my_rand()%750)-BIG_G;
}


float rand_rate()
{
        return (float)((my_rand()%45)-23);
}

float rand_lifespan()
{
        return (float)((my_rand()%3)+1);
}

void create_a_coin(COIN *coin, int always_alive)
{
        if (always_alive)
                coin->flags = COIN_ALIVE_F;
        else
                coin->flags = my_rand()&COIN_ALIVE_F;

        if (coin->flags & COIN_ALIVE_F)
        {
                coin->posn.x = 320.0f;
                coin->posn.y = 0.0f;
                coin->posn.z = 0.0f;
                coin->spin = rand_angle();
                coin->spin_rate = rand_rate();
                coin->velocity.x = rand_vel();
                coin->velocity.z = rand_vel();
                coin->velocity.y = rand_yvel();
                coin->life_span = rand_lifespan();
                coin->age = 0;
                if ((my_rand()%10)==0)
                        coin->flags |= COIN_PARENT_F;
        }
return;
}

void create_a_system(VECTOR *pos, int age)
{
        int amount = 1+(my_rand()%MAX_GEN);
        COIN *coin = fireworks;
        int x;

        age++;
        for (x = 0; x < MAX_COINS && amount; x++, coin++)
        {
                if (!(coin->flags & COIN_ALIVE_F))
                {
                        create_a_coin(coin, TRUE);
                        coin->posn.x = pos->x;
                        coin->posn.y = pos->y;
                        coin->posn.z = pos->z;
                        coin->velocity.x = rand_vel();
                        coin->velocity.z = rand_vel();
                        coin->velocity.y = rand_yvel()/(float)(age+1);
                        coin->life_span/=(float)(age+1);
                        coin->age = age;
                        amount--;
                }
        }
}

void init_fireworks()
{
        int x;
        COIN *coin = fireworks;

        srand(time(NULL));

        for (x = 0; x < MAX_GEN; x++, coin++)
        {
                create_a_coin(coin, FALSE);
        }
        for (x = MAX_GEN, coin = fireworks+MAX_GEN; x < MAX_COINS; x++, coin++)
                coin->flags = 0;

return;
}

void process_fireworks(float elapsed_time)
{
        int x;
        COIN *coin = fireworks;

        elapsed_time /= 1000.0f;

        for (x = 0; x < MAX_COINS; x++, coin++)
        {
                if (coin->flags & COIN_ALIVE_F)
                {
                        coin->spin += coin->spin_rate;
                        coin->posn.x += coin->velocity.x * elapsed_time;
                        coin->posn.y += coin->velocity.y * elapsed_time;
                        coin->posn.z += coin->velocity.z * elapsed_time;
                        coin->velocity.y += BIG_G;
                        coin->life_span -= elapsed_time;
                        if (coin->life_span < 0.0f)
                        {
                                if ((coin->flags & COIN_PARENT_F) && coin->age < 5)
                                {
                                        create_a_system(&coin->posn, coin->age);
                                        coin->flags &= ~COIN_ALIVE_F;
                                }
                                else
                                {
                                        coin->flags &= ~COIN_ALIVE_F;
                                        //create_a_coin(coin, TRUE);
                                }
                        }
                        if (coin->posn.y < 0.0f)
                                coin->flags &= ~COIN_ALIVE_F;
                }
        }

return;
}

int get_blit_size(float z)
{

        z += 320.0f;
        return (int)(64.0f*z/640.0f);
}

void draw_fireworks(image_buffer* buffer)
{
     
        int x;
        COIN *coin = fireworks;
 
        int population = 0;

        for (x = 0; x < MAX_COINS; x++, coin++)
        {
                if (coin->flags & COIN_ALIVE_F)
                {
                        setpixel(buffer,(int)coin->posn.x, 400-(int)coin->posn.y+1, 0x00ff00);
                        population++;                                                                                                   
                }
        }
       
        if (!population) init_fireworks();

return;
}

void do_fireworks(image_buffer* buffer)
{
        process_fireworks(1000.0f/60.0f);
        draw_fireworks(buffer);
}
Title: Re: Graphics in C++
Post by: Jim on January 13, 2007
Looks like it...as long as that PeekMessage loop is called every second or so it'll be fine.  You can basically do whatever you want.

Jim
Title: Re: Graphics in C++
Post by: Stonemonkey on January 14, 2007
I've just taken my FreeBASIC triangle code and put it in this. The code is virtually identical, but it runs a fair bit slower!
Title: Re: Graphics in C++
Post by: ninogenio on January 14, 2007
Yeah, I had the same trouble. GDI wasn't that fast with my triangle renders, but I think it has something to do with vsyncing, so maybe a bit of delta timing will sort things out?
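A minimal sketch of the delta-timing idea, assuming velocities in pixels per second and a frame time measured elsewhere (on Windows, for example, with QueryPerformanceCounter). Gravity and velocity are both scaled by the elapsed time, so the velocities come out the same whether frames arrive at 30 or 60 per second. `Particle`, `step_particle` and the `GRAVITY` value are hypothetical; `GRAVITY` is only chosen to roughly match the fireworks code above, which adds -9.81 to the velocity once per frame at 60 frames per second.

```cpp
// Hypothetical particle: position in pixels, velocity in pixels per second
struct Particle
{
    float x, y;
    float vx, vy;
};

// Assumed value in px/s^2: -9.81 per frame at 60 frames per second
const float GRAVITY = -9.81f * 60.0f;

// Advance one particle by dt_seconds of simulated time (semi-implicit Euler)
void step_particle(Particle& p, float dt_seconds)
{
    p.vy += GRAVITY * dt_seconds;   // acceleration scaled by the frame time
    p.x  += p.vx * dt_seconds;      // velocities scaled by the frame time
    p.y  += p.vy * dt_seconds;
}
```

Positions still differ slightly between frame rates with this simple integration, but the difference shrinks as the time step gets smaller.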
Title: Re: Graphics in C++
Post by: ninogenio on January 14, 2007
Is the Sleep(16) necessary? I removed it in my demo and it still seemed to work fine, except the CPU usage went to 100% all the time, but the code executed faster.

BTW, excellent code post above, it's a great read!
Title: Re: Graphics in C++
Post by: Stonemonkey on January 14, 2007
Nah, the Sleep's just to stop it hogging the CPU. I've set Sleep to 1 in both FB and C++, and FB still looks much faster.

Here's the C++ tri function:
Code: [Select]
void fill_gtriangle(image_buffer* buffer, float x0, float y0, unsigned int argb0,
                                          float x1, float y1, unsigned int argb1,
                                          float x2, float y2, unsigned int argb2)
{
 unsigned int *pixel_address, *last_address, *address=buffer->argb;
 int wwidth=buffer->wwidth, height=buffer->height, p0, p1, p2, y_start, y_end,x_start, x_end;
 float x[3], y[3], r[3], g[3], b[3], xx0, dx0, xx1, dx1, d;
 float rr0, dr0, gg0, dg0, bb0, db0, rr1, dr1, gg1, dg1, bb1, db1;
 float red, dred, gre, dgre, blu, dblu;
 
 x[0]=x0;
 y[0]=y0;
 r[0]=float((argb0 >> 16)&0xff);
 g[0]=float((argb0 >> 8 )&0xff);
 b[0]=float(argb0 &0xff);
 x[1]=x1;
 y[1]=y1;
 r[1]=float((argb1 >> 16)&0xff);
 g[1]=float((argb1 >> 8 )&0xff);
 b[1]=float(argb1 &0xff);
 x[2]=x2;
 y[2]=y2;
 r[2]=float((argb2 >> 16)&0xff);
 g[2]=float((argb2 >> 8 )&0xff);
 b[2]=float(argb2 &0xff);
 
 p0=0; p2=2;
 if (y1<y0) p0=1;
 if (y2<y[p0]) p0=2;
 if (y1>y2) p2=1;
 if (y0>y[p2]) p2=0;
 p1=3-p0-p2;
 
 y_start=(int)y[p0]+1;
 y_end=(int)y[p1];
 if (y_start<0) y_start=0;
 if (y_end>=height) y_end=height-1;
 
 if (y_start<=y_end) {
                    d=1.0f/(y[p1]-y[p0]);
                    dx0=(x[p1]-x[p0])*d;
                    dr0=(r[p1]-r[p0])*d;
                    dg0=(g[p1]-g[p0])*d;
                    db0=(b[p1]-b[p0])*d;
                    d=1.0f/(y[p2]-y[p0]);
                    dx1=(x[p2]-x[p0])*d;
                    dr1=(r[p2]-r[p0])*d;
                    dg1=(g[p2]-g[p0])*d;
                    db1=(b[p2]-b[p0])*d;
                    if (dx0>dx1) {
                                d=dx0;
                                dx0=dx1;
                                dx1=d;
                                d=dr0;
                                dr0=dr1;
                                dr1=d;
                                d=dg0;
                                dg0=dg1;
                                dg1=d;
                                d=db0;
                                db0=db1;
                                db1=d;
                    }
                    d=float(y_start)-y[p0];
                    xx0=x[p0]+dx0*d;
                    rr0=r[p0]+dr0*d;
                    gg0=g[p0]+dg0*d;
                    bb0=b[p0]+db0*d;
                    xx1=x[p0]+dx1*d;
                    d=1.0f/(dx1-dx0);
                    dred=(dr1-dr0)*d;
                    dgre=(dg1-dg0)*d;
                    dblu=(db1-db0)*d;
                    y_start*=wwidth;
                    y_end*=wwidth;
                    while (y_start<=y_end){
                                  x_start=(int)xx0+1;
                                  x_end=(int)xx1;
                                  if (x_start<0)  x_start=0;
                                  if (x_end>=wwidth)  x_end=wwidth-1;
                                  if (x_start<=x_end) {
                                                     pixel_address=address+(x_start+y_start);
                                                     last_address=address+(x_end+y_start);
                                                     d=float(x_start)-xx0;
                                                     red=rr0+dred*d;
                                                     gre=gg0+dgre*d;
                                                     blu=bb0+dblu*d;
                                                     while (pixel_address<=last_address){
                                                           *pixel_address=((int)red<<16)|((int)gre<<8)|(int)blu;
                                                           red+=dred;
                                                           gre+=dgre;
                                                           blu+=dblu;
                                                           pixel_address+=1;
                                                     }
                                  }
                                  xx0+=dx0;
                                  rr0+=dr0;
                                  gg0+=dg0;
                                  bb0+=db0;
                                  xx1+=dx1;
                                  y_start+=wwidth;
                    }
 }
 
 
 y_start=(int)y[p1]+1;
 y_end=(int)y[p2];
 if (y_start<0) y_start=0;
 if (y_end>=height) y_end=height-1;
 
 if (y_start<=y_end) {
                    d=1.0f/(y[p1]-y[p2]);
                    dx0=(x[p1]-x[p2])*d;
                    dr0=(r[p1]-r[p2])*d;
                    dg0=(g[p1]-g[p2])*d;
                    db0=(b[p1]-b[p2])*d;
                    d=1.0f/(y[p0]-y[p2]);
                    dx1=(x[p0]-x[p2])*d;
                    dr1=(r[p0]-r[p2])*d;
                    dg1=(g[p0]-g[p2])*d;
                    db1=(b[p0]-b[p2])*d;
                    if (dx0<dx1) {
                                d=dx0;
                                dx0=dx1;
                                dx1=d;
                                d=dr0;
                                dr0=dr1;
                                dr1=d;
                                d=dg0;
                                dg0=dg1;
                                dg1=d;
                                d=db0;
                                db0=db1;
                                db1=d;
                    }
                    d=float(y_start)-y[p2];
                    xx0=x[p2]+dx0*d;
                    rr0=r[p2]+dr0*d;
                    gg0=g[p2]+dg0*d;
                    bb0=b[p2]+db0*d;
                    xx1=x[p2]+dx1*d;
                    d=1.0f/(dx1-dx0);
                    dred=(dr1-dr0)*d;
                    dgre=(dg1-dg0)*d;
                    dblu=(db1-db0)*d;
                    y_start*=wwidth;
                    y_end*=wwidth;
                    while (y_start<=y_end){
                                  x_start=(int)xx0+1;
                                  x_end=(int)xx1;
                                  if (x_start<0) x_start=0;
                                  if (x_end>=wwidth) x_end=wwidth-1;
                                  if (x_start<=x_end) {
                                                     pixel_address=address+(x_start+y_start);
                                                     last_address=address+(x_end+y_start);
                                                     d=float(x_start)-xx0;
                                                     red=rr0+dred*d;
                                                     gre=gg0+dgre*d;
                                                     blu=bb0+dblu*d;
                                                     while (pixel_address<=last_address){
                                                           *pixel_address=((int)red<<16)|((int)gre<<8)|(int)blu;
                                                           red+=dred;
                                                           gre+=dgre;
                                                           blu+=dblu;
                                                           pixel_address+=1;
                                                     }
                                  }
                                  xx0+=dx0;
                                  rr0+=dr0;
                                  gg0+=dg0;
                                  bb0+=db0;
                                  xx1+=dx1;
                                  y_start+=wwidth;
                    }
 }
}                                   
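The vertex-ordering lines in `fill_gtriangle` (`p0`, `p2`, then `p1=3-p0-p2`) find the top and bottom vertices with two comparisons each and get the middle one for free, because the three indices always sum to 0+1+2=3. A stand-alone sketch of just that step, using the same comparisons; the function name `sort_by_y` is hypothetical.

```cpp
// Return the indices of the top (smallest y), middle and bottom (largest y)
// vertices, using the same comparisons as fill_gtriangle
void sort_by_y(const float y[3], int& p0, int& p1, int& p2)
{
    p0 = 0;
    p2 = 2;
    if (y[1] < y[0])  p0 = 1;   // smaller of y0 and y1
    if (y[2] < y[p0]) p0 = 2;   // ... or y2 if it is smaller still
    if (y[1] > y[2])  p2 = 1;   // larger of y2 and y1
    if (y[0] > y[p2]) p2 = 0;   // ... or y0 if it is larger still
    p1 = 3 - p0 - p2;           // indices sum to 0+1+2 = 3
}
```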
Title: Re: Graphics in C++
Post by: Stonemonkey on January 14, 2007
It's the float to int conversions that are doing it, when writing a constant int value to each pixel the c++ version is now much faster.
Title: Re: Graphics in C++
Post by: ninogenio on January 14, 2007
Cool! I never knew float-to-int conversions were slow. That's probably why my tri routines were slow, as there were quite a lot of conversions going on.

So is it just floats to ints that's slow, or also ints to floats?

Would there be any chance of having a look at the tri routine now you've got the problem fixed?
Title: Re: Graphics in C++
Post by: Stonemonkey on January 14, 2007
Sorry nino, it's not really fixed. I just have it filling with solid colour at the moment, which can be done by changing the 2 pixel-filling lines to:

*pixel_address=argb0;

but it does show how much the conversions are slowing it down.
Title: Re: Graphics in C++
Post by: taj on January 14, 2007
Quote from: ninogenio on January 14, 2007
So is it just floats to ints that's slow, or also ints to floats?

Nino,

It's just float to int; you don't need to worry about int to float. It's not just the speed, the code is also *enormous* in size (60 bytes?). This is because of the very strict IEEE rules for floats.

One solution I found was to use lrintf, which, whilst not ideal (it's still 20 bytes), is much faster. Another solution is to look into using the asm instruction FIST and its variants (float to int and store, I think; ask Rbraz).

When I tried to solve this, I found this page quite useful:
http://mega-nerd.com/FPcast/
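To see the difference being described, compare a plain cast, which C and C++ define as truncation toward zero, with `lrintf` (or the C++ `std::lrint` overload for float), which rounds using the current floating-point rounding mode, round-to-nearest unless it has been changed. A small sketch; `truncate_cast` and `round_current_mode` are illustrative names.

```cpp
#include <cmath>

// A cast always truncates toward zero, as the language requires
int truncate_cast(float f)
{
    return (int)f;
}

// std::lrint uses the current rounding mode (round-to-nearest by default),
// so the compiler has no reason to switch the FPU into truncation mode
long round_current_mode(float f)
{
    return std::lrint(f);
}
```

Note the results differ for values like 2.7, so swapping one for the other changes which pixel or colour value you get, not just the speed.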

Title: Re: Graphics in C++
Post by: Stonemonkey on January 14, 2007
Thanks taj, I just tried the asm method from that and it's going pretty well. There was another method Jim mentioned before using some magic number.
Title: Re: Graphics in C++
Post by: Jim on January 15, 2007
It's the C rounding rules: the FPU's default mode is round to nearest, but C demands round towards zero for float-to-int conversions. So the compiler has to switch the FPU control word every time a conversion happens. Best thing to do is not do that. Either use fixed point, or all int, or all float.

Jim
Title: Re: Graphics in C++
Post by: Stonemonkey on January 15, 2007
Well here's what I've got so far:

Title: Re: Graphics in C++
Post by: Jim on January 16, 2007
Is it still slow?  You got a couple of things that might be worth fixing.
In a lot of places you do something like
Code: [Select]
float x;
int ix;
ix=5;
x=float(ix);
That's unusual.  I've never seen float used as a macro or function before.  It's supposed to look like this
Code: [Select]
float x;
int ix;
ix=5;
x=(float)ix;
Since I don't know what the macro/function does, that might or might not make a difference.
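For reference, in C++ `float(ix)` is a functional-style cast: with a single argument it means exactly the same as `(float)ix`, which is why Dev-C++ accepts it. A small sketch checking that the spellings agree; the function names are illustrative.

```cpp
#include <type_traits>

// float(ix) is a functional-style cast; with one argument it is equivalent to (float)ix
float convert_functional(int ix) { return float(ix); }
float convert_c_style(int ix)    { return (float)ix; }
float convert_static(int ix)     { return static_cast<float>(ix); }

// Both cast spellings yield a float
static_assert(std::is_same<decltype(float(5)), float>::value, "functional cast gives float");
static_assert(std::is_same<decltype((float)5), float>::value, "C-style cast gives float");
```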

In another place you have
Code: [Select]
float x;
x=1.0f;
x=x+0.4999;
That's a problem because 0.4999 is a double, not a float (64 bits instead of 32), so x will get converted to a double, 0.4999 will be added, and then it'll get converted back to a float.  So change it to be
Code: [Select]
float x;
x=1.0f;
x=x+0.4999f;
That will make a difference.  I think you got it right in most places.
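A compile-time check of the literal types described above, using `std::is_same` from `<type_traits>`. This only demonstrates the type rules; how much the extra conversions actually cost depends on the compiler and target. `add_offset` is an illustrative name.

```cpp
#include <type_traits>

// An unsuffixed floating literal is a double; the f suffix makes it a float
static_assert(std::is_same<decltype(0.4999), double>::value, "0.4999 is a double");
static_assert(std::is_same<decltype(0.4999f), float>::value, "0.4999f is a float");

// float + double is evaluated as double and must be narrowed back on assignment
static_assert(std::is_same<decltype(1.0f + 0.4999), double>::value, "promoted to double");
static_assert(std::is_same<decltype(1.0f + 0.4999f), float>::value, "stays float");

float add_offset(float x)
{
    return x + 0.4999f; // float arithmetic throughout
}
```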

You can remove the Sleep() if you don't want to yield any CPU to the rest of the PC, but that's very unfriendly.  Try Sleep(0) as a minimum.  I set it at 16 because that's ~60Hz (1000/16) and since the drawing I was doing took well under 1ms it was the thing to do.

Are you using djgpp?   Which version?

Jim
Title: Re: Graphics in C++
Post by: Stonemonkey on January 16, 2007
Thanks Jim, it's running much better now (maybe around twice as fast as FB) that I've changed the float-int conversions, and I'd already noticed the missing f at the end of some numbers. I'm using Dev-C++ and it didn't complain about the x=float(ix) or anything, but I've changed it now.
Title: Re: Graphics in C++
Post by: rdc on January 17, 2007
FB will be a bit faster than GDI, since it uses DirectX if it's installed. If DX is not installed, it falls back to GDI. You might want to look at GDI+, since it's supposed to be optimized for drawing routines. I have used it in xblite and it seems to work quite well, plus it has a number of extra functions that GDI doesn't have.
Title: Re: Graphics in C++
Post by: Jim on January 17, 2007
I think GDI's SetDIBitsToDevice is at least as quick as FB memcpying into a DirectX surface. I'd have to try it to be sure, though. Never looked at GDI+ before. How easy is it to move a 32-bit, window-sized memory buffer into a window and have it displayed?

->Fryer.  Shame you're not using Visual Studio Express.  Now you could turn on SSE or SSE2 and get another 2-4x speed up :P  Which version of devc++ is it (which is what I meant to ask last time, doh!)

Jim
Title: Re: Graphics in C++
Post by: Stonemonkey on January 19, 2007
I'm using VC++ Express now, but for this it's slower than Dev-C++, and for some reason much slower in release mode, even with SSE/SSE2 turned on.
Anyway, I've also compared with using a bit of asm for the scanline loops which is a massive improvement but could there be any way to get the compiler to compete with this? I'm not wanting to get into fixed point with this btw.
Title: Re: Graphics in C++
Post by: Jim on January 19, 2007
How are you measuring the performance?

Jim
Title: Re: Graphics in C++
Post by: Jim on January 19, 2007
I've stuck it in VTune. The piece the optimizer is having trouble with is this, especially the pixel munging. Your assembler is much faster, but it shouldn't be. Otherwise, the optimizer is beating the other chunk of assembler easily.
Code: [Select]
d=float(x_start)-xx0;
red=rr0+dred*d;
gre=gg0+dgre*d;
blu=bb0+dblu*d;
while (pixel_address<=last_address){
    *pixel_address=((int)red<<16)|((int)gre<<8)|(int)blu;
    red+=dred;
    gre+=dgre;
    blu+=dblu;
    pixel_address+=1;
}

I get about 150ms for 1000 tris with assembler, and 450ms without, Release or Debug doesn't make any difference.
I'm timing it like this

Code: [Select]
unsigned long long thetime(void)
{
LARGE_INTEGER li;
QueryPerformanceCounter(&li);
return li.QuadPart;
}

unsigned long long freq;

void inittime(void)
{
LARGE_INTEGER li;
QueryPerformanceFrequency(&li);
freq = li.QuadPart;
}

void do_graphics_stuff(...)
{
...
char tmp[256];
unsigned long long t = thetime();
...thing to time
sprintf(tmp, "%I64ums\n", ((thetime()-t)*1000)/freq);
OutputDebugString(tmp);
}


You need to call inittime() somewhere in WinMain first.  I'm not timing the flip, just the triangles.
OutputDebugString() writes its output into the debugger output window in VS Express.

Jim
Title: Re: Graphics in C++
Post by: Jim on January 19, 2007
Almost certainly to do with float->int rounding for red,gre,blu, which isn't in the asm version, and is inside this tight loop.  Need to look at that.  I know you don't want to, but I'd seriously consider fixed point for the colours, because of this.  Keep it in float and see what you can do.

Jim
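The fixed-point suggestion could look something like this for the colour interpolation in the inner loop: each channel is carried as a 16.16 fixed-point integer, so stepping along the span and extracting the 8-bit value are just an add and a shift, with no float-to-int conversion per pixel. This is a minimal sketch, assuming the span's start colour and per-pixel steps have already been worked out, so the float-to-fixed conversion happens once per span rather than once per pixel; `to_fixed` and `fill_span_fixed` are hypothetical names.

```cpp
#include <cstdint>

// Convert a float to 16.16 fixed point; done once per span, not per pixel
inline std::int32_t to_fixed(float f)
{
    return (std::int32_t)(f * 65536.0f);
}

// Fill 'count' pixels of a 32-bit buffer with a red/green/blue gradient.
// r, g, b are the start values and dr, dg, db the per-pixel steps, all in
// 16.16 fixed point; the caller must keep every channel within 0..255.
void fill_span_fixed(std::uint32_t* dst, int count,
                     std::int32_t r, std::int32_t g, std::int32_t b,
                     std::int32_t dr, std::int32_t dg, std::int32_t db)
{
    for (int i = 0; i < count; ++i)
    {
        // the integer part of each channel is the value shifted down 16 bits
        dst[i] = ((std::uint32_t)(r >> 16) << 16)
               | ((std::uint32_t)(g >> 16) << 8)
               |  (std::uint32_t)(b >> 16);
        r += dr;
        g += dg;
        b += db;
    }
}
```

The trade-off is precision: with 16 fractional bits, very long spans can drift slightly, which is usually invisible at 8 bits per channel.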
Title: Re: Graphics in C++
Post by: taj on January 21, 2007
If you want, I have a C routine that does float to int using asm. I have one for cl somewhere and a macro for gcc somewhere too. Let me know...
Title: Re: Graphics in C++
Post by: Jim on January 21, 2007
Here's what I've done.  I've got this rounding routine
Code: [Select]
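#include <string.h>

static inline unsigned int myfrnd(float f)
{
static const double magic = 6755399441055744.0; // 2^52 + 2^51
double d = f + magic;   // the addition rounds f to the nearest integer
unsigned int i;
// low 32 bits of the double on little-endian x86; memcpy avoids the
// strict-aliasing problem of reading it through *(unsigned int *)&d
memcpy(&i, &d, sizeof i);
return i;
}

#define float_to_int(in) myfrnd(in)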
And I've changed the pixel line to read
Code: [Select]
*pixel_address=(float_to_int(red)<<16)|(float_to_int(gre)<<8)|float_to_int(blu);
That's just as quick as the asm version.

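The magic constant is 2^51+2^52.  Adding it pushes the value into the range where doubles are exactly 1 apart, so the addition rounds the number to the nearest integer, and the lower 32 bits of the double's mantissa end up equal to that rounded integer (the 2^51 part keeps negative numbers working too).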

Jim
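A worked version of the magic-number explanation, assuming IEEE 754 doubles, the default round-to-nearest mode, and an input with $|x| \le 2^{31}-1$. Write $M = 2^{52} + 2^{51} = 6755399441055744$ and $\operatorname{rnd}(x)$ for $x$ rounded to the nearest integer (ties to even):

```latex
% x + M lies in [2^{52}, 2^{53}), where consecutive doubles are exactly 1 apart
\mathrm{fl}(x + M) = 2^{52} + 2^{51} + \operatorname{rnd}(x)

% the stored 52-bit fraction field drops the implicit leading 2^{52}
\text{fraction} = 2^{51} + \operatorname{rnd}(x)

% 2^{51} is a multiple of 2^{32}, so it vanishes from the low 32 bits
\text{fraction} \bmod 2^{32} = \operatorname{rnd}(x) \bmod 2^{32}
```

So the low 32 bits hold $\operatorname{rnd}(x)$ in two's complement: for example $x = -3$ gives $2^{32} - 3$ (0xFFFFFFFD), which reads back as -3 as a signed 32-bit int. The $2^{51}$ term is what keeps negative inputs inside $[2^{52}, 2^{53})$.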
Title: Re: Graphics in C++
Post by: Stonemonkey on January 22, 2007
Cool, thanks Jim. I was just looking into doing it that way but it wasn't quite as neat as that.
Title: Re: Graphics in C++
Post by: Jim on January 22, 2007
Sadly, that trick doesn't work with SSE2 enabled, so here's how to do it using SSE2 intrinsic functions
Code: [Select]
#include <intrin.h>
static inline unsigned int myfrnd(float f)
{
return _mm_cvt_ss2si(_mm_load_ss(&f));
}
_mm_load_ss loads a float into an SSE register, and _mm_cvt_ss2si converts the single-precision float to a 32-bit int.
I had to comment out 2 lines in the intrin.h that conflict with winnt.h (_interlockedbittestandset and _interlockedbittestandreset) to get it to work.  The great thing is that these intrinsics work well with the optimiser, meaning this version is about 25% faster than the asm.

You can use the _M_IX86_FP macro to detect what mode it's building in
Code: [Select]
#if defined(_M_X64)
//x64: sse2 always used (_M_IX86_FP is only defined for x86 targets)
#elif _M_IX86_FP == 0
//fpu used
#elif _M_IX86_FP == 1
//sse used
#elif _M_IX86_FP == 2
//sse2 used
#else
#error unknown fpu architecture
#endif

That way you can have different builds for different platforms.

Jim

Actually, this works in SSE mode too.  Both the instructions it generates are SSE not SSE2.
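For GCC or Clang (as used by Dev-C++), the same SSE intrinsics are declared in `<xmmintrin.h>`, which MSVC also ships and which should avoid pulling in the conflicting declarations from `<intrin.h>`. A minimal sketch, assuming an x86 or x86-64 target with SSE; `_mm_cvtss_si32` is the standard name for the conversion and `_mm_cvt_ss2si` is an older alias for it, while `float_to_int_sse` is a hypothetical name.

```cpp
#include <xmmintrin.h>

// cvtss2si rounds using the MXCSR rounding mode (round-to-nearest by default),
// so no rounding-mode switch is needed as with a C cast on the x87 FPU
inline int float_to_int_sse(float f)
{
    return _mm_cvtss_si32(_mm_set_ss(f));
}
```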
Title: Re: Graphics in C++
Post by: taj on February 09, 2007
Quote from: Jim on January 21, 2007 (the magic-number float_to_int rounding routine above)

Karma++++++++
Nothing else to say, except that it goes on my list for best post of the year.
Title: Re: Graphics in C++
Post by: Shockwave on February 09, 2007
Jim, you are awesome.
Title: Re: Graphics in C++
Post by: Rbz on February 09, 2007
Jim, you are awesome.
Indeed  8)
Title: Re: Graphics in C++
Post by: Stonemonkey on March 15, 2007
Example of what I've got so far with a couple of 2d drawing functions, this works in both DevC and VC but there is some asm (sorry Jim) that can be used if it's in VC.
Title: Re: Graphics in C++
Post by: Shockwave on March 15, 2007
Mmm, it wouldn't compile in Dev-C++:

Undefined reference to SetDIBitsToDevice@48

I'm thinking that it can't initialise the DirectX screen... I need to install this. Thanks for posting though! I'll look forward to seeing it when I have my compiler set up!
Title: Re: Graphics in C++
Post by: Stonemonkey on March 15, 2007
It's not DirectX. Did you create it as a Windows application?
Title: Re: Graphics in C++
Post by: Shockwave on March 15, 2007
lol, no I will try that.
I created it as an empty project! Sorry.
Title: Re: Graphics in C++
Post by: Shockwave on March 15, 2007
Mmm. Created a windows app, it's over my head at the moment, I'll have to come back to this when I know what I am doing as I don't even know where to insert the code :P

I'm sure it looks great  :)
Title: Re: Graphics in C++
Post by: Stonemonkey on March 15, 2007
delete all the code from the project and paste mine in. It doesn't really look great though.
Title: Re: Graphics in C++
Post by: Shockwave on March 15, 2007
Thanks :)

Looks a lot better than my hello worlds!!
Title: Re: Graphics in C++
Post by: Stonemonkey on March 15, 2007
There's a lot of that sort of thing I should look into, doubt i could manage hello world without looking it up.
Title: Re: Graphics in C++
Post by: Shockwave on March 15, 2007
Triangles are more fun than printf mate :)