It always predicts jmps correctly, since an unconditional jump is always taken. gotos are considered bad because they're not structured code and they make it hard for compilers to optimise; they lead to spaghetti code that can jump all over the place. Under the hood, though, every if/then, switch/case or loop is compiled down to jumps and branches, since that's how CPUs work.
Branch prediction works a bit like card counting at blackjack. You want to know if you are likely to bust with the next card. Since roughly 50% of the cards are 7 or over and 50% are under 7, you can take a guess at whether to twist if you have been dealt 14 or fewer (since that's where your odds of going bust are about 50:50). Every time you see a card of 7 or more come up, add 1 to a running total; every time you see one under 7, subtract 1. Then, to decide whether to twist on 14: if your total is >0 you've seen more high cards, if it's <0 you've seen more low cards. If you've seen lots of high cards you should twist, since the next card is more likely to be under 7. Don't do this in a real casino, they'll throw you out.

With branch prediction, you can do something similar. For every branch instruction in the program, keep a counter. When the branch is taken, add 1 to its counter; when it's not taken, subtract 1. Next time the CPU sees the branch, it checks the counter to see if the branch is more likely to be taken than not taken, based on its history.
1. Obviously, you can't have a counter for EVERY branch instruction. Instead, the CPU uses the bottom N bits of the instruction's address to index an array of counters; the value of N depends on the CPU architecture.
2. The real algorithm used is far better than the +1/-1 algorithm - I think even the Pentium MMX could predict patterns of taken/not taken up to 16 long!
3. Branch prediction allows the CPU to speculatively execute code down the predicted path. If it ends up being predicted incorrectly, then all that work is lost and has to be undone before execution restarts down the alternate path.
So, Stonemoneky - if your 'if' is always taken, the branch prediction will always be correct. If it's taken in a regular pattern, it will probably be predicted. If it's random, it won't be. As CPUs have got more complicated, mis-predicting a branch has become more and more expensive - I think it was something like 90+ cycles on a P4 (but it's less on a Core). You also can't compute gotos in standard C (GCC's "labels as values" is a non-standard extension).
I think a jump table might work better for you: have a table of routines, one for each render type. But you still have the problem of maintaining the code if you have lots and lots of different versions of inner loops - you'll end up with dozens or hundreds, and if you discover an optimisation you'll have to update every single version.

One way round this is to use macros. Something like
#define USE_X float x, dx
#define USE_Z float z, dz
#define USE_RGB float r,g,b, dr,dg,db
#define USE_UV float u,v, du,dv
etc.
#define INIT_X dx = x1-x0
#define INIT_Z dz = (z1-z0)/dx
#define INIT_RGB { dr = (r1-r0)/dx; dg = (g1-g0)/dx; db = (b1-b0)/dx; }
#define INIT_UV { du = (u1-u0)/dx; dv = (v1-v0)/dx; }
etc.
#define ADD_X x++
#define ADD_Z z += dz
#define ADD_RGB r+=dr, g+=dg, b+=db
#define ADD_UV u+=du, v+=dv
etc.
Then every routine can be written in terms of macros...
gouraud_poly(...)
{
    USE_X;
    USE_RGB;
    ...
    INIT_X;
    INIT_RGB;
    while (dy)
    {
        ...custom pixel plot
        ADD_X;
        ADD_RGB;
    }
}

gouraud_tex_z_rgb_poly(...)
{
    USE_X;
    USE_Z;
    USE_RGB;
    USE_UV;
    ...
    INIT_X;
    INIT_Z;
    INIT_RGB;
    INIT_UV;
    while (dy)
    {
        ...pixel plot
        ADD_X;
        ADD_Z;
        ADD_RGB;
        ADD_UV;
    }
}
That way, you only ever have to update your macros.

Jim