Dark Bit Factory & Gravity
PROGRAMMING => General coding questions => Topic started by: Optimus on July 05, 2006
-
Hi.
I am searching for an alternative algorithm for deciding whether to render a polygon (what is that called, in one or two words?). I really need it because it might help me optimize some bits of my code and write different clipping algorithms too, and avoid z-buffering for some objects without having to write a polygon sorting routine (which I also have to do for other objects).
The first algorithm checked the polygon normal, and if it pointed away from the view (either by computing its dot product with the view vector, or simply if its Z component was below zero) it wouldn't call the drawtriangle function I have.
But there is another way that doesn't use the normals and can be more accurate in some cases (for example, polygons of a cube that are hidden even though, because of the projection, their normal hasn't yet passed below 0 in angle to the view). It sounded simple in thought but puzzled me when I tried to work it out (maybe I think too complicated), and I'd need some explanation or source code for how to do it (but mostly a good explanation). In that one, simply the orientation of the 3 points of a triangle on the screen (the final 2d points, after projection) could tell me whether I should render the triangle or not. Even though it seemed simple, I was never able to make it work. Have you done this kind of algorithm?
-
I don't know if I am reading this right Optimus but it seems to me that all you need to use is the cross product formula.
You can decide whether to draw or eliminate the face depending on the cross product being positive or negative.
Here is the formula in FreeBasic code:
vx1 = x1 - x2
vy1 = y1 - y2
vx2 = x3 - x2
vy2 = y3 - y2
n = vx1*vy2 - vx2*vy1
If n < 0 Then
    ' Draw polygon
Else
    ' Don't draw polygon, make a cup of tea and watch the football instead.
End If
A few caveats: this will work without a hitch as long as you are using convex objects like cubes, pyramids, other polyhedra etc. If you have a non-convex object where the faces can overlap, then you will need to do some kind of sort or a w- or z-buffer routine. For demo objects like cubes etc, this is fast and is an accepted way of doing things.
Basically, the sign of the number tells you whether the points are in clockwise or anti-clockwise order.
You can even fake fast lightsourcing with this method, because you can scale the cross product by some factor to shade the faces.
You could of course normalise the cross product, i.e. make it a vector with a length of 1, and still do the light sourcing properly too. I'm waffling now; hope it helps.
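Shockwave's sign test translates almost directly; here is a minimal sketch in Python (the function name and test points are mine, not from the thread):

```python
def front_facing(x1, y1, x2, y2, x3, y3):
    """Sign test on the 2D cross product of the projected triangle edges.

    Mirrors the FreeBasic snippet above: a negative cross product means
    the screen points wind one way (draw), positive the other (cull).
    """
    vx1, vy1 = x1 - x2, y1 - y2
    vx2, vy2 = x3 - x2, y3 - y2
    return vx1 * vy2 - vx2 * vy1 < 0

# Opposite vertex orderings give opposite answers:
print(front_facing(0, 0, 10, 0, 0, 10))   # True
print(front_facing(10, 0, 0, 0, 0, 10))   # False
```

This is why the face connections have to be defined consistently: the same geometry with two vertices swapped flips the sign.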
-
NB: One more thing, make sure that you define all your face connections in a consistent manner, either all clockwise or all counter-clockwise, or it won't work.
-
I've used a similar thing to Shockwave, but instead of comparing to 0 I compared the two parts of the calculation:
if (sy1-sy0)*(sx2-sx0)>(sy2-sy0)*(sx1-sx0) then draw_triangle
But to make use of that method it has to (in my case) come after the near-plane clipping and perspective transform. So what I'm using is similar to what Optimus was doing, but instead of just using the z value of the normal (which, as you've pointed out, isn't always right) I take the dot product of the triangle's normal and the 3d position of any of the triangle's points in camera space (without any perspective transform), which can be done much earlier.
While I've been writing this I've just realised that I'm rotating every triangle normal but don't need to; it's possible to carry out the reverse rotation on the camera in relation to the object first, then use the untransformed triangle normals, to be able to test even earlier.
EDIT:
I do still use the cross product of the screen triangle but only to find the area of the triangle for mipmapping but that happens much later.
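The "reverse-rotate the camera instead of every normal" idea works because the dot product is invariant under rotation: testing the stored, untransformed normal against the back-rotated camera gives the same culling decision as rotating everything. A minimal Python sketch with a single-axis rotation (all names and numbers are mine):

```python
import math

def rotate_z(x, y, z, a):
    """Rotate a point by angle a (radians) around the Z axis."""
    c, s = math.cos(a), math.sin(a)
    return x * c - y * s, x * s + y * c, z

def camera_in_object_space(cam, a):
    """Apply the inverse object rotation (here just -a) to the camera,
    so the untransformed triangle normals can be tested directly."""
    return rotate_z(*cam, -a)

a = 0.7
cam = (3.0, 1.0, -5.0)
n = (0.6, 0.0, 0.8)    # a face normal, stored in object space
v = (1.0, 2.0, 0.0)    # a vertex of that face, in object space

# Test in object space, against the back-rotated camera:
cx, cy, cz = camera_in_object_space(cam, a)
d_object = n[0]*(v[0]-cx) + n[1]*(v[1]-cy) + n[2]*(v[2]-cz)

# Same test done the expensive way, rotating normal and vertex:
nr = rotate_z(*n, a)
vr = rotate_z(*v, a)
d_world = nr[0]*(vr[0]-cam[0]) + nr[1]*(vr[1]-cam[1]) + nr[2]*(vr[2]-cam[2])

print(abs(d_object - d_world) < 1e-9)  # True: same culling decision
```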
-
Here's a little snippet to show what I'm on about; here I'm using linked lists:
'cx,cy,cz=coordinates of the camera in object/mesh space
tri=mesh->first_triangle
while tri<>0
  if (tri->nx*(tri->v0->x-cx)+tri->ny*(tri->v0->y-cy)+tri->nz*(tri->v0->z-cz))<0.0 then
    draw_textured_triangle tri,buffer
  end if
  tri=tri->next_triangle
wend
At the moment I rotate all the vertices before this, but that's not necessary for this bit of the code to work. It might be possible to only rotate the vertices as they're required and set some kind of flag so I'm not rotating each vertex more than once, since they're used by multiple triangles.
-
Heh! It works now!!!
Well thanks! Hmm, so it's just a 2d version of the cross product, done upon the 2d screen points, not the 3d vectors (that was my main problem). I think I get it now. That really works well on its own now, with the Z-buffer disabled, for two of my demo objects like the plasma cube or the led blur energy drink cylinder ;D
For the rest of the objects I should also code a little polygon sorting routine. Mmm, I can take the average Z of the three points of each triangle and then use a quicksort I already have for my vectorballs.
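That z-average painter's sort can be sketched like this (a Python sketch, names mine; it assumes larger z means farther from the camera, so those get drawn first):

```python
def painter_sort(triangles):
    """Sort triangles back to front by the average Z of their vertices.

    Each triangle is a tuple of three (x, y, z) points; larger Z is
    assumed to be farther away, so those come first in draw order.
    """
    def avg_z(tri):
        return (tri[0][2] + tri[1][2] + tri[2][2]) / 3.0
    return sorted(triangles, key=avg_z, reverse=True)

near = ((0, 0, 1), (1, 0, 1), (0, 1, 1))
far  = ((0, 0, 9), (1, 0, 9), (0, 1, 9))
mid  = ((0, 0, 5), (1, 0, 5), (0, 1, 5))
print([t[0][2] for t in painter_sort([near, far, mid])])  # [9, 5, 1]
```

In practice the key would be precomputed and a quicksort or radix sort run over indices, but the ordering criterion is the same.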
My original purpose was to remove the Z-buffer in my GP32 demo. Yesterday I accidentally disabled the Z-buffer on one object in my demo and was puzzled at first why it was so much faster (too much memory reading and writing with the Z-buffer here; the GP32 also has only 16kb of data cache ;PPP). And then I thought that objects like the cube or the cylinder especially could do without the z-buffer, and even without z-sorting, if I had the thing in question. I had the other method with the 3d dot product after projection, but that one would even display the polygons that are close to 90 degrees, which because of the projection should be hidden, so even on these objects you could see some glitches! Now I get double speed on flat and gouraud objects without the z-buffer, but just a 25% increase on textured objects, because the texture reading still kills the cache. Although my plasma cube got almost double speed too, because I calculate the plasma in realtime by putting U,V into some plasma function directly in my rasterizer routine ;)
Yey! Some parts of my demo had started crawling on my GP32, but now I'm saved. I am still impressed by how many more optimizations I can do on this machine, but I won't do them now because I have to release the demo lol. But maybe for the new GP2X I'll optimize the engine much more. There is hidden power I guess, if it's coded correctly ::)
-
Sorting by poly could be quite slow, particularly with objects with a lot of polys or scenes with lots of objects, and there will often be errors. You mention that texture reading is also something that slows it down.
Something to consider would be to still use a z-buffer but sort objects from front to back. That way you draw the stuff at the front first, and when drawing the stuff behind, if the z test fails there's no need to calculate u,v, read from the texture, or write to either the depth or colour buffers.
If you don't use a z-buffer and you sort from back to front, you'll have to calculate u,v, read from the texture, and write to the screen buffer for every pixel of every poly.
EDIT:
That also goes for any shading (or plasma) calculations you have in your rasteriser.
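The front-to-back early-reject idea amounts to putting the cheap depth test in front of the expensive per-pixel work. A Python sketch of one horizontal span (all names, the flat-z span, and the tiny texture are illustrative only):

```python
def shade_pixel(u, v, texture):
    """Stand-in for the expensive part: texture fetch / plasma calc."""
    return texture[v % len(texture)][u % len(texture[0])]

def draw_span(x0, x1, z, u, v, du, dv, depth, color, texture):
    """Draw one horizontal span with the depth test done FIRST.

    Because objects arrive front to back, occluded pixels fail the
    cheap test and skip the texture read and buffer writes entirely.
    (Constant z across the span keeps the sketch short.)
    """
    for x in range(x0, x1):
        if z < depth[x]:          # cheap test first...
            depth[x] = z
            color[x] = shade_pixel(int(u), int(v), texture)  # ...expensive work only on pass
        u += du
        v += dv

texture = [[1, 2], [3, 4]]
depth = [float("inf")] * 8
color = [0] * 8
draw_span(0, 8, 5.0, 0.0, 0.0, 1.0, 0.0, depth, color, texture)  # near span
draw_span(2, 6, 9.0, 0.0, 0.0, 1.0, 0.0, depth, color, texture)  # far span: all pixels rejected
print(depth)  # every surviving depth is 5.0
```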
-
I'd also add a little caution here: you'll get much better results from making sure that your texture routine is as fast as it can be than from your sort routine. Quicksort or radix sort will be pretty fast, but you'll generally get much better value from optimising the parts of your code that are really intensive, such as your texturing.
I've personally found that the sort routine used and the hidden surface removal method makes far less difference to the final speed than spending a good long time on the draw routine and using as many integers as possible and getting rid of as many divides as possible.
-
In my case, this worked out well. The triangle rasterizer was already optimized enough (fixed point, no divisions), though I was searching for some more speed. For objects like the cube I wouldn't even need z-sorting or z-buffering. In some other objects there are very few hidden polygon areas, so I generally gained more than I lost. But Stonemonkey's ideas are something I liked and will still think about, maybe for specific objects. I think I am currently doing the U,V interpolation outside the z-buffer check in the inner loop; at least the U could maybe be done inside, but not the V. Also, today I decided to reduce the texture size in some parts from 256*256 to 128*128 or 64*64 and I got a good speed improvement, because on the GP32 the cache is always the issue :)
And yes, I am afraid of using the sorting routine on the beethoven object with 5000 polygons. I have to test that one and see the loss. Maybe switching to the z-buffer is important when there are too many polygons.
p.s. I am currently thinking that I should maybe check out the S-buffer or the other method (W-buffer? What is this???).
-
A w-buffer is just a perspective-correct z-buffer, storing 1/z. If you're using perspective-correct textures then, if I remember right, it's really simple to implement the w-buffer.
-
As you are using smaller textures and getting the speed gains, you might as well include mip-mapping. Get the area of the triangle from the cross product; you'll need to generate several textures from the original, then texture map using the texture that most closely matches the area of the triangle.
That way you'll have a good compromise between quality and speed. "Dancing pixels" will become much, much less noticeable on smaller polygons too, so in some ways your routine will look better and even closer to hardware stuff.
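Choosing the mip from the triangle's screen area might look like this (a Python sketch; the level-selection heuristic and all numbers are my own, not from the thread):

```python
import math

def mip_level(screen_area, base_size=256, num_levels=4):
    """Pick a mip level from the triangle's screen area (in pixels).

    Heuristic: use the full base_size x base_size texture when the
    triangle covers about that many pixels, and drop to the next
    (half-size) level for every factor-of-4 drop in area.
    """
    if screen_area <= 0:
        return num_levels - 1
    level = int(math.log2(base_size * base_size / screen_area) / 2)
    return max(0, min(num_levels - 1, level))

print(mip_level(256 * 256))  # 0 -> use the 256x256 texture
print(mip_level(64 * 64))    # 2 -> use the 64x64 texture
print(mip_level(10))         # 3 -> clamped to the smallest level
```

The screen area itself falls out of the same 2D cross product already used for the backface test (half its absolute value).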
-
Mipmapping would be a cool feature to add, and it would save me from both the cache and display problems, as you say. It's used in the Quake and Doom ports on GP32 for the same reasons. Mmm, the cache hasn't changed even on the GP2X (it's actually 16kb for data plus 16kb for code).
W-buffer. I get it, but what is the benefit of doing it in W instead of Z? Maybe since you are doing perspective correction and already interpolating W, you wouldn't need to interpolate Z separately too? That's what I can think of...
-
In perspective-correct mapping (there are other methods) you interpolate u/z, v/z and 1/z, and to find the value of z for each pixel you divide 1 by the interpolated value of 1/z. So if you were to store the value of z in the depth buffer, you'd need the divide even if the test is going to fail; with w you can compare the value in the depth buffer directly against the interpolated value, and you only need the divide if the test passes. Another thing is that the buffer can be cleared with 0 instead of a large z value.
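That point can be sketched in a few lines of Python (names mine): the interpolated 1/z is compared directly, the divides for u and v would only happen on a pass, and the buffer clears to 0.

```python
def w_test_pixel(w_interp, w_buffer, x):
    """Depth test using the linearly interpolated 1/z ('w') directly.

    Larger 1/z means closer to the camera, so a pixel passes when its
    interpolated value exceeds what is stored; no divide is needed to
    test, and the buffer clears to 0 (infinitely far away).
    """
    if w_interp > w_buffer[x]:
        w_buffer[x] = w_interp
        return True   # only now would the divides for u and v happen
    return False

w_buffer = [0.0] * 4          # cleared with 0 instead of a huge z value
print(w_test_pixel(1.0 / 5.0, w_buffer, 0))   # True: z=5 is in front
print(w_test_pixel(1.0 / 10.0, w_buffer, 0))  # False: z=10 is behind
```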
-
Keep meaning to mention: the GP32/ARM chip doesn't have a hardware integer divide instruction, so unless you're using floating point, this method of perspective correction will suck really hard on that platform. Much better to come up with a polygon subdivision technique based on u/v error correction.
Jim
-
That rules out a w-buffer then. It depends on what you're rendering, but for scenes with many textured objects I'd still suggest some kind of depth buffer and front-to-back sorting of the objects. For single objects, or only a few basic ones, you might get away with poly sorting, but anything more complex is likely to give errors and could also be slow if there are a lot of polys.
-
> W-buffer. I get it, but what is the benefit from doing it in W instead of Z? Maybe since you are doing perspective correct and already interpolating W, you wouldn't need to do it twice for Z too? That's what I can think of..
A w-buffer gives a linear representation of depth, whereas a z-buffer gives a non-linear one: you get lots of accuracy for z values close to the camera but lose accuracy further away (which is actually desirable in most cases, since nearby surfaces need the precision). I guess a w-buffer would probably work well in combination with perspective texture mapping...
Nice demo btw Optimus. I'm almost tempted to get a Gamepark myself now :)
-
Thanks! If you decide to buy one, get the new GP2X, which rules more and is where the Gamepark scene has moved to. There is almost no new stuff released for the GP32 these days :(
I get the w-buffer now. I guess the next thing to try in my engine is coding a perspective-correct tmapper and trying out the w-buffer. As for divisions on ARM, there must be a way to avoid them, as I did with the other 3d code and the Julia effect in the GP32 demo. Fixed point and a fairly small division table did the job well enough, apart from making my code very bloated. I wasn't expecting that I could avoid floats or divisions in such maths, so maybe a perspective-correct tmapper can be done in a similar way too. I'll first code a floating-point version that works and then try to optimize it for ARM. The only problem with fixed point was that my values were very close to overflow; I had some bugs because of this, which I corrected by changing the precision a bit (not noticeably). It would be even harder to try to render big worlds with this engine, I believe...
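The division-table trick mentioned above might look roughly like this (a Python sketch of the idea only; the 16.16 format and table size are illustrative, and on ARM the lookup and multiply would of course be integer assembly):

```python
FP = 16          # 16.16 fixed point
TABLE_SIZE = 1024

# Precomputed reciprocals: recip[i] ~ (1 << FP) / i, so a runtime
# divide becomes a table lookup plus a multiply and a shift.
recip = [0] + [(1 << FP) // i for i in range(1, TABLE_SIZE)]

def fixed_div(a, b):
    """Approximate a / b for positive integer b < TABLE_SIZE, no divide."""
    return (a * recip[b]) >> FP

print(fixed_div(1000, 8))   # 125
print(fixed_div(30000, 7))  # 4285
```

The table keeps the memory cost bounded, but as noted in the thread, large intermediate products are exactly where fixed point gets close to overflow, so the precision has to be chosen carefully.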