Author Topic: half-life 1 software renderer (OMFG)[BB2D]  (Read 12806 times)


Offline Devils Child

  • C= 64
  • **
  • Posts: 66
  • Karma: 2
have you ever seen the hl1 software renderer?
at a screen resolution of 1280x1024 i get the full 75 FPS!
and when i make a simple for..next loop in bb where i fill the whole screen i only get 40 FPS
something's wrong there...
half-life gets full FPS and i only get 40 fps on a much simpler fill-screen routine.
what is hl doing different than i?
« Last Edit: July 21, 2007 by Shockwave »

Offline Stonemonkey

  • Pentium
  • *****
  • Posts: 1315
  • Karma: 96
Re: half-life 1 software renderer (OMFG)
« Reply #1 on: July 09, 2007 »
"what is hl doing different than i?"

You'd probably have to ask Abrash or Carmack.

I always had problems getting Blitz to do much at anything over 320*240, other than maybe a couple of objects, and certainly not the larger scenes with some effects going on that I managed at 320*240. That was a couple of years ago and on a lower spec machine than I have now, but moving over to FreeBASIC at the time made 640*480 a bit more of a realistic possibility.

Now that I'm using C++, I think 640*480 is fine speedwise but 800*600 is still pushing it a bit. That's not to say it can't be done other ways, but it's still a bit beyond me.

HL was very heavily optimised, from the type of maps they used right down to the low level code, so it comes down to a lot of different things.

Offline Devils Child

  • C= 64
  • **
  • Posts: 66
  • Karma: 2
Re: half-life 1 software renderer (OMFG)
« Reply #2 on: July 09, 2007 »
hmh, since i found out that blitz3d is totally un-optimized, i'm thinking of converting my virtualGL lib to bmax and testing it there.
i think i'll learn it within 3 days. (don't worry, i learned php within a week)
hm, i'm so jealous of carmack's crew. they made a top quality software renderer. they also made doom3. it uses multiple stencil lights and lots of shaders, and it runs at 75 FPS all the time without dropping under 60 fps
so jealous ...

Offline Shockwave

  • good/evil
  • Founder Member
  • DBF Aficionado
  • ********
  • Posts: 17412
  • Karma: 498
  • evil/good
Re: half-life 1 software renderer (OMFG)
« Reply #3 on: July 09, 2007 »
Mmm, I have to echo your sentiment there, and Stonemonkey's too. I used to use Blitz a lot but got totally disenchanted with it because of the massive file sizes and, in particular, the slow render speed of writepixelfast. I did write one or two demos with texture mapping at 640 X 480, but I had to use all sorts of tricks to get them up to speed.

In my experience anything over 640 X 480 was a waste of time in Blitz. You will certainly be better served with another language and with someone of your obvious ability I am sure you'll have a ball when you have some proper power to play with.

Btw Stonemonkey, I might be far off the beam here but I think I read somewhere that there are some compiler options you can use to increase the execution speed of VC++ and I'm sure it was Jim who posted it.
Shockwave ^ Codigos

Offline ninogenio

  • Pentium
  • *****
  • Posts: 1668
  • Karma: 133
Re: half-life 1 software renderer (OMFG)
« Reply #4 on: July 09, 2007 »
yeah, you'll be talking about turning the sse1/2 switches on in visual studio. that can make a huge difference in speed. switching to release mode makes a big speed difference as well.

Offline Stonemonkey

  • Pentium
  • *****
  • Posts: 1315
  • Karma: 96
Re: half-life 1 software renderer (OMFG)
« Reply #5 on: July 09, 2007 »
Hmm, I tried the SSE options but it ended up being slower :( .

Offline ninogenio

  • Pentium
  • *****
  • Posts: 1668
  • Karma: 133
Re: half-life 1 software renderer (OMFG)
« Reply #6 on: July 09, 2007 »
really? i haven't used it much in visual studio, but in dev c i get between 7x and 10x speed increases on perspective correct mapping and matrix ops when setting it up to generate code for a pentium 4 with sse2.

Offline Devils Child

  • C= 64
  • **
  • Posts: 66
  • Karma: 2
Re: half-life 1 software renderer (OMFG)
« Reply #7 on: July 09, 2007 »
my hope was some pixel dlls like this:
http://dbfinteractive.com/index.php?topic=1121.0

but this is 4 times slower than blitz!
the reason might be that for each pixel i have to call a function in a dll...

writepixelfast writes some bytes to a buffer of an image, the screen, a texture or whatever.
is it faster to write it directly using PokeInt? and if so, how?

Offline Stonemonkey

  • Pentium
  • *****
  • Posts: 1315
  • Karma: 96
Re: half-life 1 software renderer (OMFG)
« Reply #8 on: July 09, 2007 »
There is a way to access the screen memory as a bank in B+, but I'm not sure if that's the case for BB or B3D. What I usually did was render everything into a 1D array of ints, then copy that to the screen buffer using writepixelfast once the scene was drawn.

Offline Paul

  • Pentium
  • *****
  • Posts: 1490
  • Karma: 47
Re: half-life 1 software renderer (OMFG)
« Reply #9 on: July 09, 2007 »
This is the best i can find on the BB site:
http://www.blitzbasic.com/codearcs/codearcs.php?code=1104
//Paul

Offline Devils Child

  • C= 64
  • **
  • Posts: 66
  • Karma: 2
Re: half-life 1 software renderer (OMFG)
« Reply #10 on: July 09, 2007 »
hm this sounds cool. i tried it.
[...]
thank you
« Last Edit: July 09, 2007 by Devils Child »

Offline Jim

  • Founder Member
  • DBF Aficionado
  • ********
  • Posts: 5301
  • Karma: 402
Re: half-life 1 software renderer (OMFG)
« Reply #11 on: July 09, 2007 »
I managed to get about 25fps out of blitz at 800x600 with a wbuffer filling about 2/3 screen, but it was so stupid.  Everything was hampered by the 'copy to screen and clear the wbuffer' stuff, which took nearly all the time.
If you were writing halflife, you wouldn't bother trying to do any of that stuff at high level, you'd dive straight in at the 'write direct to memory' level.  You should be able to hammer several hundred frames a second over to a modern video card, so do all the work in system RAM and then blast the finished frames over using an optimised copy.

Jim

Offline Devils Child

  • C= 64
  • **
  • Posts: 66
  • Karma: 2
Re: half-life 1 software renderer (OMFG)
« Reply #12 on: July 10, 2007 »
hmh. i've seen the pixel-to-RAM code above, but it is slower than writepixelfast on my machine and faster than writepixelfast on another machine.

also, on my machine both run at the same speed at 320x240, but at 1280x1024 writepixelfast is 5x faster?!?

so which one should i use?

Offline Jim

  • Founder Member
  • DBF Aficionado
  • ********
  • Posts: 5301
  • Karma: 402
Re: half-life 1 software renderer (OMFG)
« Reply #13 on: July 10, 2007 »
Totally forget writepixelfast.  It's useless because each time you write a pixel you supply the x and y coordinate, so to write a pixel it has to calculate:
address = screen_base_address + (x + y * screen_width) * sizeof_pixel
This is futile, since more than likely the last pixel you drew was at x-1 (the previous pixel just next to the current one) and you could get its address by just adding sizeof_pixel (usually 4 for a 32bit screen) to the previous address!  This is how you can get far more speed by going in at a lower level.
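To make that concrete, here is a minimal sketch in C rather than Blitz. The names `fill_by_xy`, `fill_by_pointer`, `WIDTH` and `HEIGHT` are made up for illustration. The first routine recomputes the offset from x and y for every pixel, the way a put-pixel call has to; the second just steps a pointer along by one pixel each time.

```c
#include <stdint.h>
#include <stddef.h>

enum { WIDTH = 320, HEIGHT = 240 };   /* hypothetical buffer size */

/* Recomputes base + (x + y * width) for every pixel, like a put-pixel call. */
void fill_by_xy(uint32_t *base, uint32_t colour)
{
    for (int y = 0; y < HEIGHT; y++)
        for (int x = 0; x < WIDTH; x++)
            *(base + (x + (size_t)y * WIDTH)) = colour;
}

/* Steps one pointer through the buffer: one add per pixel, no multiply. */
void fill_by_pointer(uint32_t *base, uint32_t colour)
{
    uint32_t *p = base;
    uint32_t *end = base + (size_t)WIDTH * HEIGHT;
    while (p < end)
        *p++ = colour;
}
```

Both routines produce the same buffer. A C compiler may well turn the first into the second on its own, but a separate function call per pixel usually cannot be optimised that way.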

Best thing to do is to write to a system memory buffer and use memcpy() to move the data up to the screen in one go, that will be within 10% of optimum.
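A rough C sketch of that idea, assuming you have somehow obtained a pointer to a locked screen surface and its pitch (bytes per row). `present_frame` and its parameters are hypothetical names, not part of any real API. When rows are padded, a single memcpy of the whole frame is not safe, so the sketch falls back to one memcpy per row.

```c
#include <stdint.h>
#include <string.h>

/* Copy a finished frame from system RAM to a (hypothetical) locked screen
   surface. `pitch` is the surface's bytes per row, which can exceed
   width * 4 when rows are padded. */
void present_frame(void *screen, size_t pitch,
                   const uint32_t *back, int width, int height)
{
    size_t row_bytes = (size_t)width * sizeof(uint32_t);

    if (pitch == row_bytes) {
        /* Rows are contiguous: the whole frame goes up in one memcpy. */
        memcpy(screen, back, row_bytes * (size_t)height);
        return;
    }
    /* Padded rows: still only one memcpy per row, never per pixel. */
    for (int y = 0; y < height; y++)
        memcpy((uint8_t *)screen + (size_t)y * pitch,
               back + (size_t)y * width, row_bytes);
}
```

All the drawing happens in `back`, which lives in ordinary system memory, and the screen is only touched once per frame.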

By the way, 1280x1024x32@75fps is a smidge under 400 megabytes/second.  A modern PCI Express x16 card can in theory move about 4GB/s, roughly ten times that.  Reading from the screen is far, far slower.
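The sum behind that figure, as a small C helper (`required_bytes_per_second` is just an illustrative name):

```c
#include <stdint.h>

/* Bytes per second needed to push full frames to the screen. */
uint64_t required_bytes_per_second(uint32_t width, uint32_t height,
                                   uint32_t bits_per_pixel, uint32_t fps)
{
    uint64_t frame_bytes = (uint64_t)width * height * (bits_per_pixel / 8);
    return frame_bytes * fps;
}

/* required_bytes_per_second(1280, 1024, 32, 75) is 393216000,
   i.e. about 393 million bytes (375 MiB) per second. */
```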

Jim

Offline Devils Child

  • C= 64
  • **
  • Posts: 66
  • Karma: 2
Re: half-life 1 software renderer (OMFG)
« Reply #14 on: July 10, 2007 »
this is a test i made:
Code:
; the next three lines are userlib declarations and belong in a .decls file in Blitz's userlibs folder
.lib "kernel32.dll"
apiRtlMoveMemory(Destination*,Source,Length):"RtlMoveMemory"
apiRtlMoveMemory2(Destination,Source*,Length):"RtlMoveMemory"

Const gw = 1024
Const gh = 768
Const gw1 = gw - 1
Const gh1 = gh - 1
Const gw2 = gw / 2 - 1
Const gw21 = gw / 2
Graphics gw, gh, 32, 2
SetBuffer BackBuffer()

; system memory buffer, 4 bytes per pixel
bnkVideo = CreateBank(gw * gh * 4)

; method 1: poke the left half of the frame into the bank, 30 passes
w4 = gw * 4
start = MilliSecs()
For i = 1 To 30
  For x = 0 To gw2
    For y = 0 To gh1
      PokeInt bnkVideo, y * w4 + x * 4, $00FF00
    Next
  Next
Next
; then copy the bank to the back buffer once (this is included in time1)
LockBuffer()
bnkInfo = CreateBank(32)
apiRtlMoveMemory bnkInfo, GraphicsBuffer() + 72, 32
size = PeekInt(bnkInfo, 20) * PeekInt(bnkInfo, 24) * PeekInt(bnkInfo, 28) / 8
If BankSize(bnkVideo) < size Or PeekInt(bnkInfo, 0) = 0 Then
  FreeBank bnkInfo
  RuntimeError "Failed to draw on buffer."
Else
  ; PeekInt(bnkInfo, 0) is used as the destination address of the pixel data
  apiRtlMoveMemory2 PeekInt(bnkInfo, 0), bnkVideo, size
  FreeBank bnkInfo
EndIf
UnlockBuffer()
time1 = MilliSecs() - start

; method 2: WritePixelFast straight to the right half of the back buffer, 30 passes
LockBuffer()
start = MilliSecs()
For i = 1 To 30
  For x = gw21 To gw1
    For y = 0 To gh1
      WritePixelFast x, y, $FF0000
    Next
  Next
Next
time2 = MilliSecs() - start
UnlockBuffer()

Text 10, 10, "Poke: " + time1
Text gw2 + 10, 10, "WritePixelFast: " + time2

Flip
WaitKey()
End

on my machine, writepixelfast is 4 times faster at high resolutions...
why am i not just writing >>DIRECTLY<< to the backbuffer? i mean - this code is modified from the link above
so on some machines the poke method is faster and on some machines the pixel method is faster. also, at high resolutions the writepixel method is faster...

Offline Jim

  • Founder Member
  • DBF Aficionado
  • ********
  • Posts: 5301
  • Karma: 402
Re: half-life 1 software renderer (OMFG)
« Reply #15 on: July 10, 2007 »
You're not going to get the best speed with blitz.  Both ways are going to be too slow.

Some suggestions though.

Try changing the PokeInt loop a bit
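Code:
For x = 0 To gw2
  For y = 0 To gh1
    PokeInt bnkVideo, y * w4 + x * 4, $00FF00
  Next
Next
to
Code:
size% = (gw2 + 1) * (gh1 + 1) - 1 ; same number of pixels as the two loops above
address% = 0
; note: this fills one contiguous block of the bank, not the left half of the screen
For c% = 0 To size
  PokeInt bnkVideo, address, $00FF00
  address = address + 4
Next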
It's still not going to be fast, but it omits a heap of extra multiplication PER PIXEL which you must avoid at all costs.

Also, try moving the CreateBank/FreeBank outside the loop.  I suspect the allocation is pretty slow.

The (obvious) problem though is here you're comparing apples with oranges.  The bank version is writing to system memory, reading back from system memory, then writing to video memory - the wpf version is writing direct to video memory.  Clearly even with the hideous overheads of wpf it's going to be faster.  The point is supposed to be that writing to system memory is supposed to be so quick that the extra write/read is hidden.  Problem is PokeInt is too slow.  Also, this isn't a real life scenario.  Often you write each pixel more than once, and read it back to do alpha effects.  If you tried that with ReadPixelFast it would be a disaster, with PeekInt it's much more possible.

Given the theoretical maximum you can blast across to the video card is 4GB/s, can you work out roughly how much bandwidth you're getting from each of these 2 methods to see how far off theoretical it is?  That would be pretty interesting.
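One rough way to do that sum, sketched in C. The function names are made up, and the peak figure is whatever you choose to pass in, for example 4e9 for the 4GB/s quoted above.

```c
#include <stdint.h>

/* Bytes written per second, from a pixel count and an elapsed time in ms. */
double achieved_bytes_per_second(uint64_t pixels, uint32_t bytes_per_pixel,
                                 uint32_t elapsed_ms)
{
    if (elapsed_ms == 0)
        return 0.0;              /* timer too coarse to measure */
    return (double)pixels * bytes_per_pixel * 1000.0 / elapsed_ms;
}

/* Fraction of a quoted peak, e.g. 4e9 for a 4GB/s bus. */
double fraction_of_peak(double bytes_per_second, double peak_bytes_per_second)
{
    return bytes_per_second / peak_bytes_per_second;
}
```

In the test above each method writes 30 x 512 x 768 = 11,796,480 pixels. With a purely hypothetical reading of 1000 ms, that is about 47 million bytes per second, or roughly 1.2% of 4GB/s. Bear in mind that the Poke timing also covers the bank-to-screen copy.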

<edit>Just realised you're writing your pixels in vertical columns.  That's really bad for the CPU cache.  Much better to write in horizontal rows where possible.

Jim
« Last Edit: July 10, 2007 by Jim »

Offline Devils Child

  • C= 64
  • **
  • Posts: 66
  • Karma: 2
Re: half-life 1 software renderer (OMFG)
« Reply #16 on: July 10, 2007 »
hm, seems like i'd better trust you on that ;)
why is your software renderer using wpf then?
i can't implement vertical columns anymore. it would be too difficult to manage...

thank you for these fast replies :)

Offline Devils Child

  • C= 64
  • **
  • Posts: 66
  • Karma: 2
Re: half-life 1 software renderer (OMFG)
« Reply #17 on: July 10, 2007 »
here is my (slower) vgl lib using the poke method

http://dc.freecoder-portal.de/upload/files/Devils%20Child/VirtualGL.zip

am i doing everything right?

Offline Stonemonkey

  • Pentium
  • *****
  • Posts: 1315
  • Karma: 96
Re: half-life 1 software renderer (OMFG)
« Reply #18 on: July 10, 2007 »
Quote
i can't implement vertical columns anymore. it would be too difficult to manage...

Jim meant that in your tests you have the y loop as the inner loop, which fills the buffer column by column, and that is bad for the cache.
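The two loop orders side by side, as a C sketch (`fill_columns` and `fill_rows` are illustrative names). Both write exactly the same pixels; only the order in which memory is touched differs.

```c
#include <stdint.h>
#include <stddef.h>

/* Column by column (y innermost): consecutive writes are `width` pixels
   apart, so each one tends to land on a different cache line. */
void fill_columns(uint32_t *buf, int width, int height, uint32_t colour)
{
    for (int x = 0; x < width; x++)
        for (int y = 0; y < height; y++)
            buf[(size_t)y * width + x] = colour;
}

/* Row by row (x innermost): consecutive writes are adjacent in memory. */
void fill_rows(uint32_t *buf, int width, int height, uint32_t colour)
{
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            buf[(size_t)y * width + x] = colour;
}
```

In the Blitz test that just means swapping the two For lines so that x is the inner loop.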

Offline Devils Child

  • C= 64
  • **
  • Posts: 66
  • Karma: 2
Re: half-life 1 software renderer (OMFG)
« Reply #19 on: July 10, 2007 »
you mean this one?
Code:
mi = GraphicsWidth() * GraphicsHeight() * 4
i = 0
While i < mi
  PokeInt vglScreenBank, i, col
  i = i + 4
Wend
this is as optimized as it could be...