Topic: WIP 3D Engine (Update: v0.07 screenie preview) [v0.06]

Shockwave

I think that it looks great Hezad, it really does!

As Hellfire says, it really needs optimising though, please see the attachment.

This attachment was written by Stonemonkey, who developed an engine similar to yours. The example doesn't have fog etc., but take a look: this is the sort of speed you can achieve with optimisation.

Hezad

@hellfire: I get the point, thanks :) Couldn't I create an Exp() lookup table? I mean, precalculate a huge Exp() table and look up the right value in it?


@Shockwave: Thanks for the demo... those ~55 FPS make me dream ^^


@Both:
Thanks for the feedback :) Yep, optimizing is bloody necessary now if I want to keep adding features :S

Shockwave

Your new features will not look their best without those optimisations, Hezad, but take heart.

Stonemonkey's demo there has quite a lot of triangles, Gouraud shading, perspective-correct mapping, clipping and mip mapping, and the FPS is capped too. It can go even faster than that, so there's no reason why you shouldn't be able to at least treble your FPS in time :)

hellfire

There seems to be some lack of precision.

Hezad

@Shockwave:
Thanks :) I hope I'll manage to reach such a goal ^^

@hellfire:
Ouch! Never noticed it o_O Could it come from data types that aren't precise enough (e.g. single instead of double)?

hellfire

Quote
Could it come from not precise enough data types ? (eg : single instead of double)
No. One reason is missing subpixel and subtexel precision (I wrote something about that earlier).
The problem is that the coordinates you interpolate along the slopes of your triangle are valid for a floating-point position (cyan dots) on the edge, but the actual (integer) pixel-position is significantly different:

(image assumes all coordinates are rounded up)

Concerning optimization, I restructured your source a bit.
It's still far from optimal, but maybe you can get some ideas out of it:
Code:
#Macro SmoothLine_Tex_Fog(x1,x2,y, _
                    U_z1,V_z1,_
                    U_z2,V_z2,_
                    Texture, _
                    r1,r2,g1,g2,b1,b2,_
                    _z1,_z2,_
                    CurOpacity)
   
    dim as integer dR,dG,dB
    dim as integer r,g,b
    dim as single sr1,sr2,sg1,sg2,sb1,sb2
    dim as single dU_z,dV_z,U_z,V_z
    dim as single Uu_z1,Vv_z1,Uu_z2,Vv_z2
    dim as integer itx1,itx2
    dim as single xDiv,Cur_Z,_zz1,_zz2,d_Z
    dim as integer it, LODVal, MipU, MipV
    dim as integer CurTex,Cr,Cg,Cb
    dim as single tmp_Z

    ' this isn't required when all polygons have equal orientation!
    if x1<x2 then
        _zz1 = _z1
        _zz2 = _z2
       
        itx1 = x1
        itx2 = x2
       
        Uu_z1 = u_z1
        Uu_z2 = u_z2
        Vv_z1 = v_z1
        Vv_z2 = v_z2
       
        sr1 = r1
        sr2 = r2
        sg1 = g1
        sg2 = g2
        sb1 = b1
        sb2 = b2
    else
        _zz1 = _z2
        _zz2 = _z1
       
        itx1 = x2
        itx2 = x1
       
        Uu_z1 = u_z2
        Uu_z2 = u_z1
        Vv_z1 = v_z2
        Vv_z2 = v_z1
       
        sr1 = r2
        sr2 = r1
        sg1 = g2
        sg2 = g1
        sb1 = b2
        sb2 = b1
    end if
   
    ' deltas can be computed once per triangle!

    xDiv = 1/(itx2 - itx1)

    ' delta rgb (integer)
    dR = xDiv * (sr2 - sr1) * 256.0
    dG = xDiv * (sg2 - sg1) * 256.0
    dB = xDiv * (sb2 - sb1) * 256.0

    ' delta uvz (float)
    d_Z = xDiv * (_zz2 - _zz1)
    dU_z = xDiv * (Uu_z2 - Uu_z1)
    dV_z = xDiv * (Vv_z2 - Vv_z1)
   
    Cur_Z = _zz1

    U_z = Uu_z1
    V_z = Vv_z1
   
    r = sr1 * 256.0
    g = sg1 * 256.0
    b = sb1 * 256.0
           
    it = y*xRes + itx1
       
    dim length as integer
    length= itx2-itx1
   
    dim u1 as integer
    dim v1 as integer
    dim u2 as integer
    dim v2 as integer
    dim du as integer
    dim dv as integer

    dim shiftu as integer
    dim shiftv as integer

    ' get first set of perspective correct u,v
    tmp_Z = 65536.0/Cur_Z
    u1 = U_z*tmp_Z*Texture.SizeX
    v1 = V_z*tmp_Z*Texture.SizeY
   
    while length>0
      ' draw 16 pixels (or rest of scanline)
      dim size as integer
      size= 16
      if size > length then size= length
      length-=size

      ' step 16 pixels further
      U_z += dU_z*16
      V_z += dV_z*16
     
      ' get 2nd set of perspective u,v
      tmp_Z = 65536.0/(Cur_Z+d_Z*16)
      u2 = U_z*tmp_Z*Texture.SizeX
      v2 = V_z*tmp_Z*Texture.SizeY
     
      ' linear deltas
      du= (u2-u1) shr 4
      dv= (v2-v1) shr 4

      ' mip-level from deltas
      mipu = log2( abs(du shr 16) )
      mipv = log2( abs(dv shr 16) )
      LODVal = iif(mipU>mipV,MipU,mipV)
      If LODVal>Texture.NbLOD then LODVal = Texture.NbLod

      ' store this in Texture
      shiftu= log2(Texture.SizeX)
      shiftv= log2(Texture.SizeY)

      shiftu-= LODVal
      shiftv-= LODVal
     
      dim masku as integer
      dim maskv as integer
           
      masku= (1 shl shiftu)-1
      maskv= (1 shl shiftv)-1

      for i as integer = 1 to size
       
        If Cur_Z > ZBuffer(it) then
               
            ' store 1/z in zbuffer (zbuffer is now single and cleared to 0!)
            ZBuffer(it) = Cur_Z
           
            dim u as integer
            dim v as integer

            u= (u1 shr (16+LODVal)) and masku
            v= (v1 shr (16+LODVal)) and maskv
           
'            CurTex = Texture.GFX[ (v shl 8) or u ]
            CurTex = Texture.MipMap(LODVal)[ (v shl shiftu) or u ]
           
            Cb = (CurTex and 255)
            Cg = ((CurTex shr 8) and 255)
            Cr = ((CurTex shr 16) and 255)
               
            Cb= Cb*b shr 16
            Cg= Cg*g shr 16
            Cr= Cr*r shr 16
           
            ScrPtr[it] = rgb(Cr,Cg,Cb)
           
        end if
       
        it+=1
           
        r += dR
        g += dG
        b += dB
        u1+= du
        v1+= dv
        Cur_Z+=d_Z
           
      Next
     
      ' reuse 2nd set of u,v
      u1= u2
      v1= v2
     
    Wend
   
#EndMacro
(Fog and transparency parts skipped for readability.)
« Last Edit: November 06, 2008 by hellfire »

Hezad

thanks, Thanks, THANKS ;D I'm reading it, and it's really interesting :)

Edit: I just tried it in my code... DUDE, YOU SHOULD BE CANONIZED BY THE CODERS' CHURCH :goodpost: I get between 15 and 25 FPS (35 if only ~500 triangles are in view) instead of the 9 FPS I got before :P Yehookey, v0.06 will have a big speed improvement, thanks to you mate.

I have a question though,
In a comment, you say :

Quote
' deltas can be computed once per triangle!

But I don't understand how that could be possible :S Since I use 1/(x2-x1) to calculate the deltas, don't I need to recalculate them on each scanline, given that (x2-x1) changes on every line?


hellfire

Quote
Quote
deltas can be computed once per triangle!
But I don't understand how it could be possible :S
Well, the reason why this is possible is a bit difficult to explain without rolling out a lot of math.
It's probably easier to understand with a picture:

On the left you see a textured cube with some perspective.
The texture coordinates are interpolated as u/z, v/z, 1/z and reconstructed as u' = (u/z)/(1/z), v' = (v/z)/(1/z) (just the same as you do now).
In the middle you can see the perspective-correct texture coordinates u', v' shown as red and green (scaled a bit differently so you can see a gradient).
On the right you can see the interpolated u/z and v/z (also in red and green).
What you can see here is that the gradients are parallel(!)
That's the reason why you can interpolate them linearly (instead of the original u,v).
This also means that the per-pixel deltas of u/z, v/z (and 1/z) don't change from one scanline to another but are *always* the same.
To get the highest precision, you can simply calculate the deltas at the longest scanline.
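The constant-delta argument can be sketched in C (the thread's code is FreeBASIC). Because u/z, v/z and 1/z vary linearly in screen space, their per-pixel steps follow directly from the three projected vertices. This uses the plane-equation form of the gradient, an alternative to measuring the deltas on the longest scanline; ScreenVertex and the function names are hypothetical:

```c
#include <assert.h>

/* One projected vertex: screen position plus an attribute that varies
   linearly in screen space (e.g. u/z, v/z or 1/z). */
typedef struct {
    float x, y;
    float a;
} ScreenVertex;

/* d(a)/dx for the whole triangle, identical on every scanline.
   Returns 0 for degenerate (zero-area) triangles. */
float attr_gradient_x(ScreenVertex v0, ScreenVertex v1, ScreenVertex v2)
{
    float dx1 = v1.x - v0.x, dy1 = v1.y - v0.y, da1 = v1.a - v0.a;
    float dx2 = v2.x - v0.x, dy2 = v2.y - v0.y, da2 = v2.a - v0.a;
    float area2 = dx1 * dy2 - dx2 * dy1;   /* twice the signed area */
    if (area2 == 0.0f)
        return 0.0f;
    return (da1 * dy2 - da2 * dy1) / area2;
}

/* d(a)/dy, the step along the y direction. */
float attr_gradient_y(ScreenVertex v0, ScreenVertex v1, ScreenVertex v2)
{
    float dx1 = v1.x - v0.x, dy1 = v1.y - v0.y, da1 = v1.a - v0.a;
    float dx2 = v2.x - v0.x, dy2 = v2.y - v0.y, da2 = v2.a - v0.a;
    float area2 = dx1 * dy2 - dx2 * dy1;
    if (area2 == 0.0f)
        return 0.0f;
    return (dx1 * da2 - dx2 * da1) / area2;
}

/* Perspective-correct reconstruction: u = (u/z) / (1/z). */
float perspective_u(float u_over_z, float one_over_z)
{
    assert(one_over_z > 0.0f);   /* point must be in front of the camera */
    return u_over_z / one_over_z;
}
```

Each attribute's x-step is computed once per triangle and reused on every scanline; perspective_u() is then applied per pixel, or per 16-pixel block as in the restructured macro.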

Quote
I get between 15 and 25 FPS (35 if just ~500 triangles are in view) instead of the 9 FPS I got before
A big part of the speed improvement came from removing float-to-integer conversions.
Conversion is expensive because the CPU and FPU don't have a direct connection: values have to be written to memory and read back.
There are many more places in your source that could benefit significantly from this kind of optimization.
My personal favourite is this:
Code:
For i as integer = 0 to xRes-1 step 2
    For j as integer = 0 to yRes-1 step 2
        RenderPtr[i*.5+xRes_2*j*.5] = ...
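For comparison, a C sketch of the same kind of loop with no floating-point arithmetic at all. The post elides the right-hand side, so this assumes, hypothetically, that RenderPtr is a half-resolution buffer of width xRes/2 (xRes_2) filled with every second source pixel; it also walks rows in the outer loop so memory is accessed sequentially:

```c
#include <assert.h>
#include <stdint.h>

/* Copy every second pixel of every second row from a full-resolution
   buffer into a half-resolution one, using only integer arithmetic:
   no multiplies by 0.5 and no float-to-int conversions in the loop. */
void downsample_2x(const uint32_t *src, int xres, int yres, uint32_t *dst)
{
    assert(xres % 2 == 0 && yres % 2 == 0);
    int half_w = xres / 2;                     /* plays the role of xRes_2 */
    for (int j = 0; j < yres; j += 2) {        /* rows outside: sequential access */
        const uint32_t *src_row = src + j * xres;
        uint32_t *dst_row = dst + (j / 2) * half_w;
        for (int i = 0; i < half_w; i++)
            dst_row[i] = src_row[2 * i];
    }
}
```

If the elided expression averages a 2x2 block instead of picking one pixel, the same structure works with src_row and src_row + xres.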

The key to proper optimization is to find the functions where your CPU spends most of its time.
Luckily there's a counter in your CPU which is incremented with each processor cycle.
You can simply query it before and after a function call; the difference is the number of cycles spent inside the function.
To query the counter you can use this:
Code:
function getCpuTick() as double
    dim as double ticks
    asm
        lea edi,[ticks]
        rdtsc
        mov [edi],eax
        mov [edi+4],edx
        fild qword ptr [edi]
        fstp qword ptr [edi]
    end asm
    return ticks
end function
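The routine above is x86- and FreeBASIC-specific. A portable C sketch of the same before/after measurement uses POSIX clock_gettime(CLOCK_MONOTONIC) and reports nanoseconds instead of cycles; with GCC or Clang on x86, __rdtsc() from <x86intrin.h> reads the time-stamp counter directly. Note that on many newer CPUs that counter ticks at a constant rate rather than once per actual core cycle. The helper and the example workload are hypothetical:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Call fn(arg) `runs` times and return the fastest single call in nanoseconds. */
double time_call_ns(void (*fn)(void *), void *arg, int runs)
{
    assert(runs > 0);
    double best = -1.0;
    for (int r = 0; r < runs; r++) {
        double t0 = now_ns();
        fn(arg);
        double elapsed = now_ns() - t0;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Hypothetical workload: sums 1..n, with n passed through arg. */
void example_sum(void *arg)
{
    long n = *(const long *)arg;
    volatile long total = 0;    /* volatile keeps the loop from being optimized away */
    for (long k = 1; k <= n; k++)
        total += k;
}
```

Taking the minimum of several runs rather than a single measurement is a design choice that reduces the influence of interrupts and other background noise.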
« Last Edit: November 06, 2008 by hellfire »

Hezad

Thanks for all those tips, it'll definitely help me to optimize the engine :)

I implemented the restructured code you shared two or three posts ago in every render procedure, and it works pretty well :) As I said, the speed improvement is really noticeable, thanks again for that. But there's a little problem: it seems the texture tiling doesn't work anymore! Well, in fact it works perfectly on the terrain, but the sphere I use for the sky is tiled really weirdly. I've attached a screenshot to show what I mean, and here is the render procedure used for the skysphere:

Code:
#Macro FlatLine_Tex(x1,x2,y, _
                    U_z1,V_z1,_
                    U_z2,V_z2,_
                    Texture, _
                    r_,g_,b_,_
                    _z1,_z2,_
                    CurOpacity)
   
    dim as integer r,g,b
    dim as single dU_z,dV_z,U_z,V_z
    dim as single Uu_z1,Vv_z1,Uu_z2,Vv_z2
    dim as integer itx1,itx2
    dim as single xDiv,Cur_Z,_zz1,_zz2,d_Z
    dim as integer it, LODVal, MipU, MipV
    dim as integer CurTex,Cr,Cg,Cb
    dim as single tmp_Z

    ' this isn't required when all polygons have equal orientation!
    if x1<x2 then
        _zz1 = _z1
        _zz2 = _z2
       
        itx1 = x1
        itx2 = x2
       
        Uu_z1 = u_z1
        Uu_z2 = u_z2
        Vv_z1 = v_z1
        Vv_z2 = v_z2
       
    else
        _zz1 = _z2
        _zz2 = _z1
       
        itx1 = x2
        itx2 = x1
       
        Uu_z1 = u_z2
        Uu_z2 = u_z1
        Vv_z1 = v_z2
        Vv_z2 = v_z1
       
    end if
   
    ' deltas can be computed once per triangle!

    xDiv = 1/(itx2 - itx1)

    ' delta uvz (float)
    d_Z = xDiv * (_zz2 - _zz1)
    dU_z = xDiv * (Uu_z2 - Uu_z1)
    dV_z = xDiv * (Vv_z2 - Vv_z1)
   
    Cur_Z = _zz1

    U_z = Uu_z1
    V_z = Vv_z1
   
    r = r_ * 256.0
    g = g_ * 256.0
    b = b_ * 256.0
           
    it = y*xRes + itx1
       
    dim length as integer
    length= itx2-itx1
   
    dim u1 as integer
    dim v1 as integer
    dim u2 as integer
    dim v2 as integer
    dim du as integer
    dim dv as integer

    dim shiftu as integer
    dim shiftv as integer

    ' get first set of perspective correct u,v
    tmp_Z = 65536.0/Cur_Z
    u1 = U_z*tmp_Z*Texture.SizeX
    v1 = V_z*tmp_Z*Texture.SizeY
   
    while length>0
      ' draw 16 pixels (or rest of scanline)
      dim size as integer
      size= 16
      if size > length then size= length
      length-=size

      ' step 16 pixels further
      U_z += dU_z*16
      V_z += dV_z*16
     
      ' get 2nd set of perspective u,v
      tmp_Z = 65536.0/(Cur_Z+d_Z*16)
      u2 = U_z*tmp_Z*Texture.SizeX
      v2 = V_z*tmp_Z*Texture.SizeY
     
      ' linear deltas
      du= (u2-u1) shr 4
      dv= (v2-v1) shr 4

      ' mip-level from deltas
      mipu = eng_log2( abs(du shr 16) )
      mipv = eng_log2( abs(dv shr 16) )
      LODVal = iif(mipU>mipV,MipU,mipV)
      If LODVal>Texture.NbLOD then LODVal = Texture.NbLod

      ' store this in Texture
      shiftu= eng_log2(Texture.SizeX)
      shiftv= eng_log2(Texture.SizeY)

      shiftu-= LODVal
      shiftv-= LODVal
     
      dim masku as integer
      dim maskv as integer
           
      masku= (1 shl shiftu)-1
      maskv= (1 shl shiftv)-1

      for i as integer = 1 to size
       
        If Cur_Z > ZBuffer(it) then
               
            ' store 1/z in zbuffer (zbuffer is now single and cleared to 0!)
            ZBuffer(it) = Cur_Z
           
            dim u as integer
            dim v as integer

            u= (u1 shr (16+LODVal)) and masku
            v= (v1 shr (16+LODVal)) and maskv
           
'            CurTex = Texture.GFX[ (v shl 8) or u ]
            CurTex = Texture.MipMap(LODVal)[ (v shl shiftu) or u ]
           
            Cb = (CurTex and 255)
            Cg = ((CurTex shr 8) and 255)
            Cr = ((CurTex shr 16) and 255)
               
            Cb = Cb*b shr 16
            Cg = Cg*g shr 16
            Cr = Cr*r shr 16
           
            If CurOpacity<>1 then
                dim as integer br,bg,bb,ptVal
               
                ptVal = ScrPtr[it]
               
                br = (ptVal shr 16) and 255
                bg = (ptVal shr 8) and 255
                bb = ptVal and 255

                Cr = CurOpacity * (Cr - Br) + Br
                Cg = CurOpacity * (Cg - Bg) + Bg
                Cb = CurOpacity * (Cb - Bb) + Bb
           
            end if
           
            if Use_Fog then
                Dim as single BlendFunc
               
                select case fog_method
                case LINEAR
                   
                    blendfunc = 1-_MAX_Z/ZBuffer(it)
                       
                    Cr = BlendFunc * (Cr - Fog_Color.r) + Fog_Color.r
                    Cg = BlendFunc * (Cg - Fog_Color.g) + Fog_Color.g
                    Cb = BlendFunc * (Cb - Fog_Color.b) + Fog_Color.b
                   
                case EXPONENTIAL
                   
                    blendfunc = exp(-(_MAX_Z/ZBuffer(it)) * fog_Density)
                       
                    Cr = BlendFunc * (Cr - Fog_Color.r) + Fog_Color.r
                    Cg = BlendFunc * (Cg - Fog_Color.g) + Fog_Color.g
                    Cb = BlendFunc * (Cb - Fog_Color.b) + Fog_Color.b
                       
                case HETEROGENOUS
                    '' placeholder
                end select
            end if
           
            ScrPtr[it] = rgb(Cr,Cg,Cb)
           
        end if
       
        it+=1

        u1+= du
        v1+= dv
        Cur_Z+=d_Z
           
      Next
     
      ' reuse 2nd set of u,v
      u1= u2
      v1= v2
     
    Wend
   
#EndMacro


(It's the macro used for flat shading, but I tried with Gouraud shading and got the same problem :S)

As you may notice, I haven't changed the delta calculations to be done once per triangle yet; I'd like to understand what's going on with the tiling first :)

For information, I tried changing the tiling coefficient (both higher and lower), and it seems the selected mipmap level is wrong for big triangles (like the ones on the skysphere). Don't know if that helps.

Thanks again!

hellfire

There's some ugly inconsistency in your mipmap handling:
TextureT.MipMap(x) starts at x=1 (which contains the first mipmap).
x can be 0, though (when the original texture is to be used), which causes an out-of-bounds array access.
By accident, the correct pointer "GFX" happens to be stored at that address ;D
I strongly recommend handling that more cleanly.
This doesn't seem to cause your problem, though.
Where do you scale your texture-coordinates according to MaterialT.TileX/.TileY ?
And what exactly does eng_log2() do and how does it differ from Log2() ?
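One cleaner layout, sketched in C with hypothetical names (MipChain, mip_level): keep the original texture as level 0 in the same array as the reduced levels, so a LOD of 0 is a valid index by construction, and clamp every computed LOD to the levels that actually exist:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_MIP_LEVELS 16

typedef struct {
    int levels;                        /* valid entries, level 0 included */
    int width[MAX_MIP_LEVELS];
    int height[MAX_MIP_LEVELS];
    uint32_t *pixels[MAX_MIP_LEVELS];  /* pixels[0] is the original texture */
} MipChain;

/* Clamp a computed LOD to the valid range [0, levels-1]. */
int mip_clamp(const MipChain *m, int lod)
{
    assert(m->levels > 0);
    if (lod < 0)
        return 0;
    if (lod > m->levels - 1)
        return m->levels - 1;
    return lod;
}

/* Pixel pointer for a LOD; level 0 is the full-size texture, never out of bounds. */
uint32_t *mip_level(const MipChain *m, int lod)
{
    return m->pixels[mip_clamp(m, lod)];
}
```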

Hezad

Hey :)

Quote
x can be 0 though (the original texture is to be used), which causes an out-of-bounds array-access.
Accidentally the correct pointer "GFX" is stored at that address ;D
I strongly recommend to handle that a bit cleaner.

Yep, i'll change that :)

Quote
Where do you scale your texture-coordinates according to MaterialT.TileX/.TileY ?

Well, in fact each UV coordinate of a triangle is multiplied by the TileX/Y variable before the textures are rendered. That's why I used U = U-int(U) before (to get the U value between 0 and 1). I tried to use it with your code, but I got a bit lost among the shifts, UV calcs, etc. :P

Quote
And what exactly does eng_log2() do and how does it differ from Log2() ?

Oh sorry, I forgot to tell you about it :P In fact eng_log2() is EXACTLY the same as the previous Log2() ^^ Only the name changed: I tried the crt/maths.bi version of exp_() to see if it was faster (it was not), and I renamed the sub because maths.bi also has a sub called Log2().
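The thread never shows the body of Log2()/eng_log2(). For the mip-level and shift calculations an integer floor(log2(x)) is sufficient; a C sketch under that assumption (ilog2 and mip_from_step are hypothetical names) that returns 0 for inputs of 0 or 1, so a per-pixel step below one texel (du shr 16 = 0) selects level 0:

```c
#include <assert.h>

/* floor(log2(x)) for x >= 1; returns 0 for x <= 1. */
int ilog2(unsigned int x)
{
    int r = 0;
    while (x > 1) {
        x >>= 1;
        r++;
    }
    return r;
}

/* Mip level for a per-pixel step of `texels_per_pixel` whole texels
   (like abs(du shr 16) in the macro), clamped to max_level. */
int mip_from_step(int texels_per_pixel, int max_level)
{
    assert(max_level >= 0);
    int step = texels_per_pixel < 0 ? -texels_per_pixel : texels_per_pixel;
    int lod = ilog2((unsigned int)step);
    return lod > max_level ? max_level : lod;
}
```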

hellfire

Sorry Hazard, I forgot to mention that texture addressing assumes the texture size to be a power of 2 (your sky texture is 300x300 pixels, though).
I believe this is a reasonable limitation and matches well with the concept of mipmapping.
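Why the power-of-two requirement matters for tiling: in the restructured macro the integer part of a 16.16 fixed-point texture coordinate is wrapped with `and masku` (masku = size-1), which takes over the job of the old U = U-int(U) but is only a correct modulo when the size is a power of two. A C sketch with hypothetical helper names, comparing the mask against a true modulo:

```c
#include <assert.h>
#include <stdint.h>

/* Texel column for a 16.16 fixed-point coordinate (in texels), wrapped
   with a mask. Only valid when size is a power of two. Assumes an
   arithmetic right shift for negative values, as on mainstream compilers. */
int wrap_mask(int32_t u_fixed, int size)
{
    assert(size > 0 && (size & (size - 1)) == 0);
    return (u_fixed >> 16) & (size - 1);
}

/* Same wrap with a true modulo: works for any size (e.g. 300), but costs
   a division per pixel and needs a fix-up for negative coordinates. */
int wrap_mod(int32_t u_fixed, int size)
{
    assert(size > 0);
    int t = (u_fixed >> 16) % size;
    return t < 0 ? t + size : t;
}
```

Assuming Log2() rounds down, a 300x300 texture gets shift 8 and mask 255, so the macro addresses rows of 256 texels (v shl shiftu) while the data has 300 per row, which is consistent with the skewed tiling on the sky.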
Another thing I noticed is that, when calculating the scanline-deltas, you use the integer-coordinates of x1,x2:
Code:
    dim as integer itx1,itx2
    ...
    itx1 = x1
    itx2 = x2
    ...
    xDiv = 1/(itx2 - itx1)
    d_Z = xDiv * (_zz2 - _zz1)
    ...
That makes you lose a lot of precision.

Hezad

I changed the size of the texture to be 256*256 and it works PERFECTLY :D thanks :) I don't even know how I could forget it (the power of 2 thing) >_<

Quote
[...] That makes you lose a lot of precision.
Hmm... once again I don't understand, lol. Since the x delta is based on the screen-pixel delta, shouldn't it be integer coords anyway?

hellfire

Quote
since the x delta is based on the screen pixels delta, shouldn't it be integers coords anyway ?

Have a look at the image (triangle) above again...
You interpolated u,v,z along the left and right sides of your triangle.
These coordinates are valid on the edge of the triangle (precisely at the x-coordinate which was interpolated, too).
The first pixel starts somewhat later (or earlier, depending on rounding; later you should also take into account the space "skipped" from the edge to the first pixel of the scanline).
So when you divide by the number of pixels, you're dividing by something slightly different from the actual distance between the two sets of coordinates.

In other words:
If you have 7.9 and want to know how much is 1.0 you wouldn't divide it by 7 ;)

Of course the same applies to the interpolation along the edges (y-direction).
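A C sketch of a scanline setup along these lines: the per-pixel step is divided by the real floating-point span width (7.9, not 7), and the start value is advanced ("prestepped") from the exact edge position to the centre of the first covered pixel. It assumes pixel centres at x + 0.5 and a fill rule that draws a pixel when its centre lies in [xl, xr); these conventions and all names are hypothetical, not taken from the engine:

```c
#include <assert.h>

typedef struct {
    int first_x;    /* first pixel to draw */
    int count;      /* number of pixels to draw */
    float a_start;  /* attribute value at the centre of first_x */
    float a_step;   /* attribute change per pixel */
} SpanSetup;

/* ceil() for floats in int range, without needing libm. */
static int ceil_to_int(float f)
{
    int i = (int)f;                  /* truncates toward zero */
    return (f > (float)i) ? i + 1 : i;
}

/* xl, xr: exact edge positions (xl <= xr); al, ar: attribute values there. */
SpanSetup setup_span(float xl, float xr, float al, float ar)
{
    SpanSetup s;
    assert(xl <= xr);
    float width = xr - xl;                              /* not truncated */
    s.a_step = width > 0.0f ? (ar - al) / width : 0.0f;
    s.first_x = ceil_to_int(xl - 0.5f);                 /* first centre >= xl */
    int end_x = ceil_to_int(xr - 0.5f);                 /* first centre >= xr (excluded) */
    s.count = end_x - s.first_x;
    float prestep = ((float)s.first_x + 0.5f) - xl;     /* edge -> first centre */
    s.a_start = al + prestep * s.a_step;
    return s;
}
```

The same prestep applies along the edges in the y direction, with the centre of the first scanline taking the place of the first pixel centre.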


« Last Edit: November 07, 2008 by hellfire »

Hezad

Quote
Have a look at the image (triangle) above again...

Erm... yeah, sorry, I spoke before thinking >< I modified the render subs and macros and it looks way better :) Now I notice a small gap when I look at clipped triangles from a really close viewpoint. But it must be a precision issue again; I'll check that :)

I'll keep optimizing things with the advice you shared :)


Oh, by the way, my nickname's not "hazard" :2funny:

Anyway, thanks again :)

hellfire

Quote
oh by the way, my nickname's not "hazard"
Oops, sorry dude.
I also just realized that I have three folders on my pc, each containing a different version of your engine.
Their names are "hazad", "hezard" and "hazerd"  ;D
Feel honoured, they're usually called "crap" or "shit" ;)
But I promise to pay some more attention from now on :)

Hezad

Don't worry, I don't feel outraged ^^ And I am honored to have my own folders on your hard-drive :D

hellfire

Now that the zbuffer contains floating-point values, the z-test has become a bit suboptimal:
Code:
fld dword ptr [ebp-380]    ' load Cur_Z (floating-point)
fcomp dword ptr [eax]      ' compare with ZBuffer(it) and pop
push eax                   ' save register eax
fnstsw ax                  ' copy the FPU status word to ax
test ah, 0b01000001        ' test C0 ("less than") and C3 ("equal")
pop eax                    ' restore register eax
jnz .Lt_05EA               ' skip the pixel if Cur_Z <= ZBuffer(it)
  ' render pixel
.Lt_05EA:
  ' continue
(a piece of assembly output from the FreeBASIC compiler)

As this overhead applies even to pixels that fail the z-test, I suggest scaling your interpolated z-value to a reasonable range and storing it in an integer zbuffer:
Code:
dim as integer Cur_Z,d_Z
...
d_Z = xDiv * (_zz2 - _zz1) * 2147483647.0
Cur_Z = _zz1 * 2147483647.0

for i as integer = 1 to size
  If Cur_Z > ZBuffer(it) then
    ZBuffer(it) = Cur_Z
    ...
  end if
  ...
  Cur_Z+=d_Z
Next
(Using 2147483647.0 [2^31-1] assumes your near clip plane is at z >= 1, so that 1/z never exceeds 1.)
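A C sketch of the scaled integer z-buffer, under the same near-plane assumption (1/z in (0, 1]). 1/z is converted to fixed point once per span and the per-pixel test becomes a plain integer compare. This sketch scales by 2^30 rather than 2^31-1 to leave headroom for rounding in the accumulated steps; that choice and the names are assumptions:

```c
#include <assert.h>
#include <stdint.h>

/* Scale for 1/z in (0, 1]: 2^30 leaves headroom so that rounding in the
   accumulated per-pixel steps cannot overflow a 32-bit int. */
#define ZSCALE 1073741824.0f

/* Depth-test one span. inv_z0 is 1/z at the first pixel, inv_z_step the
   per-pixel change; zbuf is cleared to 0 (= infinitely far away).
   Returns the number of pixels that passed. */
int zspan(int32_t *zbuf, int count, float inv_z0, float inv_z_step)
{
    assert(inv_z0 > 0.0f && inv_z0 <= 1.0f);
    int32_t z = (int32_t)(inv_z0 * ZSCALE);       /* one conversion per span */
    int32_t dz = (int32_t)(inv_z_step * ZSCALE);
    int passed = 0;
    for (int i = 0; i < count; i++) {
        if (z > zbuf[i]) {                        /* larger 1/z = closer */
            zbuf[i] = z;
            passed++;                             /* ...shade the pixel here... */
        }
        z += dz;
    }
    return passed;
}
```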

Alternatively, because the bit patterns of positive floating-point values increase strictly along with the values, you can simply treat them as integers:
Make integer pointers to your (single) zbuffer and (single) z-value and compare those.
Both ways end up the same:
Code:
mov eax, dword ptr [ebp-380] ' load Cur_Z (integer)
cmp eax, dword ptr [ecx]     ' compare with ZBuffer(it) (ecx = address of ZBuffer(it))
jle .Lt_05EA                 ' jump if less or equal
 ' render
.Lt_05EA:
 ' skip
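The same idea in C, as a sketch. Reading a float through an integer pointer, as in the FreeBASIC suggestion, violates C's strict-aliasing rules, so this copies the bits with memcpy, which compilers typically reduce to a register move. It assumes 32-bit IEEE-754 floats and only holds for non-negative values, which 1/z and a zbuffer cleared to 0.0 satisfy; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit pattern of a 32-bit float as a signed 32-bit integer. */
static int32_t float_bits(float f)
{
    int32_t i;
    memcpy(&i, &f, sizeof i);
    return i;
}

/* For non-negative IEEE-754 floats, integer order of the bit patterns
   equals numeric order, so the depth test can be an integer compare. */
int depth_passes(float inv_z, float stored)
{
    assert(inv_z >= 0.0f && stored >= 0.0f);
    return float_bits(inv_z) > float_bits(stored);
}
```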

If your sub (including macros) is called often and has a lot of local variables, declare them static.
Otherwise the compiler will initialize all local variables to zero on every call, which looks like this:
Code:
mov dword ptr [ebp-4], 0
mov dword ptr [ebp-8], 0
mov dword ptr [ebp-12], 0
mov dword ptr [ebp-16], 0
mov dword ptr [ebp-20], 0
mov dword ptr [ebp-24], 0
mov dword ptr [ebp-28], 0
mov dword ptr [ebp-32], 0
mov dword ptr [ebp-36], 0
mov dword ptr [ebp-40], 0
mov dword ptr [ebp-44], 0
mov dword ptr [ebp-48], 0
mov dword ptr [ebp-52], 0
mov dword ptr [ebp-56], 0
mov dword ptr [ebp-60], 0
mov dword ptr [ebp-64], 0
mov dword ptr [ebp-68], 0
mov dword ptr [ebp-72], 0
mov dword ptr [ebp-76], 0
mov dword ptr [ebp-80], 0
mov dword ptr [ebp-84], 0
mov dword ptr [ebp-88], 0
mov dword ptr [ebp-92], 0
mov dword ptr [ebp-96], 0
mov dword ptr [ebp-100], 0
mov dword ptr [ebp-104], 0
mov dword ptr [ebp-108], 0
mov dword ptr [ebp-112], 0
mov dword ptr [ebp-116], 0
mov dword ptr [ebp-120], 0
mov dword ptr [ebp-124], 0
mov dword ptr [ebp-128], 0
mov dword ptr [ebp-132], 0
mov dword ptr [ebp-136], 0
mov dword ptr [ebp-140], 0
mov dword ptr [ebp-144], 0
mov dword ptr [ebp-148], 0
mov dword ptr [ebp-152], 0
mov dword ptr [ebp-156], 0
mov dword ptr [ebp-160], 0
mov dword ptr [ebp-164], 0
mov dword ptr [ebp-168], 0
mov dword ptr [ebp-172], 0
mov dword ptr [ebp-176], 0
mov dword ptr [ebp-180], 0
mov dword ptr [ebp-184], 0
mov dword ptr [ebp-188], 0
mov dword ptr [ebp-192], 0
mov dword ptr [ebp-196], 0

In TextureT.GenerateMipMap you should take into account that textures don't need to be square.
Especially for sky domes it's common to have 1024x256 or something like that.
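A C sketch of one mip step that handles non-square sizes: width and height are halved independently and never drop below 1, so 1024x256 goes to 512x128, ..., 4x1, 2x1, 1x1. It assumes power-of-two dimensions and 32-bit 0x00RRGGBB pixels, averages the (up to 2x2) covered source texels, and uses a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Build the next mip level of a w x h image into dst and return its size
   through *nw, *nh. A dimension that is already 1 stays 1. */
void next_mip(const uint32_t *src, int w, int h, uint32_t *dst, int *nw, int *nh)
{
    assert(w >= 1 && h >= 1);
    assert((w & (w - 1)) == 0 && (h & (h - 1)) == 0);   /* powers of two */
    int dw = w > 1 ? w / 2 : 1;
    int dh = h > 1 ? h / 2 : 1;
    int sx = w > 1 ? 2 : 1;    /* source texels per output texel, x */
    int sy = h > 1 ? 2 : 1;    /* source texels per output texel, y */
    for (int y = 0; y < dh; y++) {
        for (int x = 0; x < dw; x++) {
            uint32_t r = 0, g = 0, b = 0;
            for (int j = 0; j < sy; j++) {
                for (int i = 0; i < sx; i++) {
                    uint32_t p = src[(y * sy + j) * w + (x * sx + i)];
                    r += (p >> 16) & 255;
                    g += (p >> 8) & 255;
                    b += p & 255;
                }
            }
            uint32_t n = (uint32_t)(sx * sy);
            dst[y * dw + x] = ((r / n) << 16) | ((g / n) << 8) | (b / n);
        }
    }
    *nw = dw;
    *nh = dh;
}
```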
« Last Edit: November 10, 2008 by hellfire »

Hezad

Cool :D thanks a lot for this tip, I'll code it immediately :)

Edit: erm... since d_Z is used for the U,V coordinates and the LOD calculation, and these are already subject to integer conversions (from the reorganization you shared some posts ago), if I modify d_Z, all the following calculations involving it become wrong :S Do I have to convert d_Z back first?

A little preview of a new feature of v0.06 before I release it: Klein bottle generation :D

(You can see a small lighting problem in the screenshot. It's caused by the normals: since the Klein bottle has only one side, some normals are wrong. I don't really know how to fix that for now, but it's not a problem; the Klein bottle generation is more a gadget than anything else :D)
« Last Edit: November 10, 2008 by Hezad »

hellfire

Nice pipe ;D

d_Z isn't required for LOD calculation if you lay out your code as I suggested before.
If it's a problem to scale the interpolated z, you can also avoid the float-compare as described above:
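Code:
dim as long ptr izbuf = cptr(long ptr, @zbuffer(0))  ' zbuffer viewed as 32-bit integers
dim as long ptr izval = cptr(long ptr, @Cur_Z)       ' Cur_Z (single) viewed as a 32-bit integer

...
for i as integer = 1 to size
  if *izval > izbuf[it] then
    zbuffer(it) = Cur_Z
    ...
  end if

  Cur_Z += d_Z
  ...
next
(untested)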

Another thing you should look into: your triangle structure is 696 bytes big!
It doesn't make sense to store a copy(!) of the material in each triangle.
I suggest keeping an array of (different) materials, each with a list of triangles.
Also try to reduce the vertex size to a minimum. Remember: a "vertex" is the data necessary for rasterization, not a trash bin of coordinates ;)
Personally I would create different types of materials, each doing a specific job.
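A C sketch of that layout with hypothetical types: a triangle holds only vertex indices and a material index, and each material records which triangles use it, so the material state is set up once per group:

```c
#include <assert.h>
#include <stdint.h>

/* Slim triangle: references shared data by index instead of copying it. */
typedef struct {
    uint32_t v[3];      /* indices into a shared vertex array */
    uint32_t material;  /* index into a shared material array */
} Triangle;

typedef struct {
    uint32_t texture_id;  /* hypothetical material data, stored once */
    int first;            /* start of this material's triangles in `order` */
    int count;            /* number of triangles using this material */
} Material;

/* Fill `order` with triangle indices grouped by material and record each
   material's range in it. `order` must have room for num_tris entries. */
void group_by_material(const Triangle *tris, int num_tris,
                       Material *mats, int num_mats, int *order)
{
    for (int t = 0; t < num_tris; t++)
        assert((int)tris[t].material < num_mats);
    int k = 0;
    for (int m = 0; m < num_mats; m++) {
        mats[m].first = k;
        for (int t = 0; t < num_tris; t++)
            if (tris[t].material == (uint32_t)m)
                order[k++] = t;
        mats[m].count = k - mats[m].first;
    }
}
```

The grouping loop is O(materials x triangles), which is fine for a handful of materials; a counting sort would do it in one pass if there were many.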

Notice that the compare in GenerateTore and GenerateSphere doesn't work due to rounding errors:
Code:
dPhi = PIx2/nbFaces
for theta as single = 0 to PIx2 step dphi
  ...
  if theta<>PIx2 then  ' this is always the case!
As a result, the sphere seems to have an additional row of triangles (tested with this texture).
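A C sketch of the usual fix, with hypothetical names: drive the loop with an integer counter and compute the angle from it, so the number of iterations is exact and the seam test is an integer compare instead of theta <> PIx2:

```c
#include <assert.h>

#define PIX2 6.28318530717958647692

/* Visit nb_faces+1 ring positions (0 .. 2*pi inclusive). The last one is
   the seam and is not stored, replacing the `theta <> PIx2` test.
   Returns the number of angles written to theta_out. */
int generate_rings(int nb_faces, double *theta_out)
{
    assert(nb_faces > 0);
    int rows = 0;
    for (int k = 0; k <= nb_faces; k++) {
        double theta = PIX2 * (double)k / (double)nb_faces;  /* computed, not accumulated */
        if (k != nb_faces)
            theta_out[rows++] = theta;
    }
    return rows;
}
```

How the seam row is handled in GenerateSphere/GenerateTore isn't shown, so treating k == nb_faces as the seam (reusing the k = 0 ring) is an assumption.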
« Last Edit: November 10, 2008 by hellfire »