Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.


Messages - nystep

Pages: [1]
1
Projects / Re: realtime raytracer project
« on: October 25, 2007 »
So, I've made quite a few changes that bring performance up to about 1.3 MRPS on a particular scene. The average is more often around 1 MRPS in the general case. I've done quite a lot of optimisation to get there. Here is what I did:

- packed the kd-tree node structure to 8 bytes, and found a way to do it in 4 bytes..
- precomputed 1/direction_vector, which removes the division in the kd-tree traversal; overall speed bumped by 30% from this.
- tweaked my termination criteria for kd-tree building; I got various speed improvements from it, depending on the scene.
- made the iterative kd-tree traversal 25% faster by removing the stack template and implementing a hard-coded stack...
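For illustration, here is the classic 8-byte node layout this kind of packing usually refers to (a hypothetical sketch along the lines of Wald's thesis, not the poster's actual code):

```cpp
#include <cassert>
#include <cstdint>

// 8-byte kd-tree node: the two low bits of `data` store the split
// axis (0..2) or 3 for a leaf; the remaining 30 bits store a child
// or primitive-list offset into a flat node/primitive array.
struct KdNode {
    float    split;  // split plane position (unused in a leaf)
    uint32_t data;   // bits 0..1: axis or leaf tag, bits 2..31: offset

    void initInterior(int axis, uint32_t rightChild, float pos) {
        split = pos;
        data  = uint32_t(axis) | (rightChild << 2);
    }
    void initLeaf(uint32_t firstPrim) { data = 3u | (firstPrim << 2); }

    bool     isLeaf() const { return (data & 3u) == 3u; }
    int      axis()   const { return int(data & 3u); }
    uint32_t offset() const { return data >> 2; }
};
```

Getting down to 4 bytes typically means making the child placement implicit so the explicit offset can be dropped, which depends on how the node array is laid out.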

All in all, this tracer is now as fast as what Ingo Wald describes in his thesis, given that it is a single-ray tracer, single core, and plain C++ code. I'm quite satisfied with it currently; next would be to implement a packet tracer, and MLRTA. MLRTA is quite tricky from what I've read so far... And I don't think the benefits are quite those listed in the tables on the first page, since it only finds an entry point for the kd-tree traversal deeper in the tree.. Even at the end of the paper they admit they only divide the number of kd-tree traversal steps by 3 in some cases, so the performance boost cannot be that big...
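As a side note, the 1/direction precomputation mentioned above boils down to this (a hypothetical minimal illustration, not the shared source):

```cpp
#include <cassert>

// The traversal needs the ray parameter t at which the ray crosses a
// split plane; multiplying by a precomputed inverse direction replaces
// one division per visited node with one multiply.
struct Ray {
    float orig[3], dir[3], invDir[3];
    Ray(const float* o, const float* d) {
        for (int a = 0; a < 3; ++a) {
            orig[a]   = o[a];
            dir[a]    = d[a];
            invDir[a] = 1.0f / d[a];  // computed once per ray
        }
    }
};

// Distance along the ray to the plane x[axis] == split (no division).
inline float planeDistance(const Ray& r, int axis, float split) {
    return (split - r.orig[axis]) * r.invDir[axis];
}
```

With IEEE floats this even degrades gracefully for axis-parallel rays: the inverse becomes infinity and the plane distance sorts itself out of range.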

I'm sharing my current sources, of course.. Hope it can help someone.. :)

2
Projects / Re: realtime raytracer project
« on: October 25, 2007 »
Sure Chris,

Building a kd-tree is conceptually simple. It's a recursive process that takes a list of primitives as argument. If you meet an end criterion, you store the primitives in this leaf and return. Otherwise, you decide where to split the current axis-aligned bounding box (AABB) into two children, intersect your primitive list with each child's AABB, and recurse on each child to keep building the tree.
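The recursion described above can be sketched like this (my own simplified version with spheres and a naive midpoint split as a placeholder, not the poster's code; it leaks nodes for brevity):

```cpp
#include <cassert>
#include <vector>

struct AABB   { float min[3], max[3]; };
struct Sphere { float c[3]; float r; };

struct BuildNode {
    int axis = -1;                // -1 marks a leaf
    float split = 0.0f;
    std::vector<int> prims;       // filled only for leaves
    BuildNode *left = nullptr, *right = nullptr;
};

// Does the sphere overlap the box? (squared distance from centre to box)
static bool overlaps(const Sphere& s, const AABB& b) {
    float d2 = 0.0f;
    for (int a = 0; a < 3; ++a) {
        if (s.c[a] < b.min[a]) { float d = b.min[a] - s.c[a]; d2 += d * d; }
        if (s.c[a] > b.max[a]) { float d = s.c[a] - b.max[a]; d2 += d * d; }
    }
    return d2 <= s.r * s.r;
}

BuildNode* build(const std::vector<Sphere>& scene,
                 const std::vector<int>& ids, AABB box, int depth) {
    BuildNode* n = new BuildNode;
    if (ids.size() <= 1 || depth == 0) {   // end criterion: make a leaf
        n->prims = ids;
        return n;
    }
    // placeholder heuristic: split the longest axis at its midpoint
    int axis = 0;
    for (int a = 1; a < 3; ++a)
        if (box.max[a] - box.min[a] > box.max[axis] - box.min[axis]) axis = a;
    float split = 0.5f * (box.min[axis] + box.max[axis]);

    AABB lbox = box, rbox = box;
    lbox.max[axis] = rbox.min[axis] = split;
    std::vector<int> lids, rids;            // a primitive may land in both
    for (int i : ids) {
        if (overlaps(scene[i], lbox)) lids.push_back(i);
        if (overlaps(scene[i], rbox)) rids.push_back(i);
    }
    n->axis = axis; n->split = split;
    n->left  = build(scene, lids, lbox, depth - 1);
    n->right = build(scene, rids, rbox, depth - 1);
    return n;
}
```

Note the primitive classification is against the child AABBs, not against the split plane, so a sphere straddling the plane correctly ends up in both subtrees.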

The biggest problem is where to split the current AABB. There are various algorithms for it, and performance varies greatly among them. Two straightforward ways to split come to mind: put the split at the median point of the primitives, or cut the AABB in half. One might think the first is a good heuristic, because it yields a balanced tree. However, if your rays travel through large regions of empty space, they will still have to test triangles they will most likely never intersect. As a result, the median cut was the heuristic that produced the slowest results on my test scenes. On the other hand, if you cut the AABB in half, you do carve out some empty space, but you can't handle the teapot-in-a-stadium case efficiently: the algorithm stores the teapot's geometry very deep in the tree, so it is very slow to reach.

The idea behind the surface area heuristic is to associate a raytracing cost with an axis-aligned bounding box. The SAH assumes that rays are uniformly distributed in space and are not blocked by objects. Given an AABB named A, it is then possible to calculate the probability that a ray passing through A also passes through each of its sub-AABBs, call them AL and AR.

P(VL|V) = SA(AL) / SA(A)
P(VR|V) = SA(AR) / SA(A)

where P is a probability and SA is the surface area of each sub-AABB. Once this is known, it is possible to associate a cost with the split.

Let Ctrav be the cost of traversing a node in the kd-tree, and Cinter the cost of intersecting a primitive:

Csplit = Ctrav + Cinter * ( P(VL|V)*NL + P(VR|V)*NR )

where NL is the number of primitives that intersect the left AABB, and NR the number that intersect the right AABB. (From Ingo Wald's thesis.)

The idea of the heuristic is to pick the split with the minimum Csplit. Conveniently, this cost function only changes at the boundaries of primitives, so you only need to test candidate planes at each triangle's xmin and xmax. For a single-axis scan that still makes 2*N cost evaluations, and you have to determine NL and NR at each one. So it is a pretty slow process; accelerating it is one of today's research topics in raytracing.

If the minimum cost you obtain is lower than the cost of keeping the current AABB as a leaf, you perform the split; otherwise you store the primitives and tag the current node as a leaf. There is some pretty readable further information in Jacco Bikker's excellent tutorial: http://www.devmaster.net/articles/raytracing_series/part7.php.
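To make the cost formula concrete, here is a sketch of evaluating one split candidate (the Ctrav and Cinter values are assumptions for illustration, not from the post; the post's formula corresponds to Cinter = 1):

```cpp
#include <cassert>
#include <cmath>

struct Box { float min[3], max[3]; };

// Surface area of an axis-aligned box.
float surfaceArea(const Box& b) {
    float dx = b.max[0] - b.min[0];
    float dy = b.max[1] - b.min[1];
    float dz = b.max[2] - b.min[2];
    return 2.0f * (dx * dy + dx * dz + dy * dz);
}

// SAH cost of splitting `parent` at `pos` on `axis`, with NL/NR
// primitives falling to each side. Ctrav/Cinter are tunable constants.
float sahCost(const Box& parent, int axis, float pos, int NL, int NR,
              float Ctrav = 1.0f, float Cinter = 1.5f) {
    Box L = parent, R = parent;
    L.max[axis] = R.min[axis] = pos;
    float invSA = 1.0f / surfaceArea(parent);   // P(child) = SA(child)/SA(parent)
    return Ctrav + Cinter * (surfaceArea(L) * invSA * NL +
                             surfaceArea(R) * invSA * NR);
}
```

A builder would call this for every candidate plane on every axis and keep the minimum, which is exactly the 2*N-evaluations-per-axis scan described above.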




3
General chat / Re: England are t3h fux0rs
« on: October 20, 2007 »
the bloomin ozzies?  ???

it was a nice match, too bad there were no tries though.. Though the disallowed one could have changed the shape of the match..

4
Projects / Re: realtime raytracer project
« on: October 20, 2007 »
Actually, I have Visual Studio 2005 Professional installed here, yet I don't use it. It's too bloated, it installs an SQL server behind your back, and my versions of Visual Assist X don't install on it. So I keep rocking the good old VC6, which furthermore is the last Visual Studio with a profiler integrated in the IDE. :)

And quite sincerely, I don't expect the compiler to do the SSE job for me; even when you have proper SoA layouts, it doesn't detect how to use them.. So intrinsics and nasm still work fine. At least for hobby projects... At work it's another story ;)

5
Projects / Re: realtime raytracer project
« on: October 20, 2007 »
> What does x rays per second per core actually mean?
> Does it mean my core2duo gets 2fps at 1024*768 resolution?
> Surely it must depend on the speed of the cores, so what speeds are these cores you're referring to running at?

Actually my CPU is a Centrino Duo T2300 clocked at 1.66GHz... So you'd probably get more fps on a Core2Duo CPU.
I think that talking in rays per second per core makes it easier to compare raytracer performance. For example, I came across www.rayforce.net; the number of rays per second is really impressive, as they mention 40-60 million rays per second, but on the other hand that result is from an 8-core machine, so if you really want to compare with your own tracer you'd better take the number of rays per core.. Still, there are many types of CPUs around.. Should we talk in CPU cycles per ray, then? ;)
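To make the comparison concrete, here is the arithmetic as a tiny sketch (pure illustration; the numbers are the ones quoted in the thread):

```cpp
#include <cassert>

// Normalise a throughput figure by core count.
double raysPerSecondPerCore(double totalRaysPerSec, int cores) {
    return totalRaysPerSec / cores;
}

// Convert rays/s into fps at a given resolution, assuming exactly one
// primary ray per pixel and no secondary rays.
double framesPerSecond(double raysPerSec, int width, int height) {
    return raysPerSec / (double(width) * height);
}
```

So the 40 Mrays/s figure on 8 cores is 5 Mrays/s per core, and a 1 Mray/s tracer gives a bit over 3 fps at 640*480.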

I've implemented the SAH (Surface Area Heuristic), but there must be something seriously fucked up in my code, as I didn't notice any really impressive performance gain... After a while I realized I was never generating empty leaves high up in the tree, so I fixed that. Even then, the performance was disappointing. On some scenes the number of visited leaves per ray was close to 1, and there I got 2M rays per second per core. But on more random scenes it's more mixed: performance is still around 1M rays per second (more likely under; 600K-900K is what I usually get on average..). The problem is still the number of leaves traversed before finding an intersection. It tends to blow up if the "eye" of the camera is inside the kd-tree, and the renders are insanely slow then. I've also been profiling the program: only 33% of the CPU time goes to intersection tests, against 60% in the kd-tree traversal. That means that if I optimize the primitive intersection tests in assembler with SSE, I'll only get a limited performance boost. This is, once again, a little disappointing..

So next I'll take more care of my kd-tree, as the render speed really depends on the quality of the tree that is built. I'll try to implement another heuristic, which aims at maximizing the empty space cut off from the remaining geometry when splitting along a plane. That would give bigger empty leaves, higher up in the tree. However, it's not necessarily good to split as soon as you find a little empty room somewhere in the scene, since deeper trees mean more traversal time.

I decided to share the source; hope it can be useful to someone. Do you mind if it doesn't come along with my whole engine? I'm too lazy to make a clean package now.. The code has few dependencies on the engine anyway; it only uses the logfile and a self-resizing vector class...

Still, it can be interesting, as it shows how to do a clean and "not too slow" recursive kd-tree traversal (I've really been fighting with it; I'm sure it can help). It also shows how to build a kd-tree from a set of primitives.

Next, I think I'll read this MLRTA (Multi-Level Ray Tracing Algorithm) paper I just downloaded. It shows off really mindblowing performance results, but from what I've been reading on another forum, the people who have implemented it have been disappointed ;) .. I'll also implement this other tree-building heuristic and see what results it gets.

6
Projects / Re: realtime raytracer project
« on: October 19, 2007 »
I made quite a lot of progress today.

First I fixed the sphere-splitting bug, which was caused by tree building.
Then I ran additional benchmarks using both of the two tree-building heuristics I have now..
The speed can be catastrophic as well as pretty good: I saw performance vary by a factor of 1 to 10 depending on where I put the spheres, which is kind of.. weird. :) The statistics show that the traversal algorithm visits 2 to 20 leaves per ray depending on the scene. This is most likely the cause of the slowdown.

I improved the recursive traversal algorithm with early ray termination.

The shooter performance is now around 600K-700K rays per second per core, on "good" scenes of 16k spheres... I used this speed gain to render at 1024*768 instead of 640*480. This performance is still rather poor, so more work awaits me soon... ;)

It has been pretty much a pain in the ass to build a "correct" tree that doesn't split spheres, but I guess more pain is on the way. I'll work on the SAH heuristic, as said, and for starters I'll try to make a tree whose empty leaves cover as much of the empty space as possible. That lets rays travel through them at minimal cost.

I'm also currently limited by kd-tree building time. At the moment it takes 7-8 seconds to build the tree for 16k spheres. I don't include it in the render time of course, as it's pretty much a preprocessing step, but since I mean to do some animation in the end, it's going to be a problem. I fear the SAH will be even slower at building kd-trees.

About the quantity of sources: I'm sorry if it hurt you in some way, Chris; I didn't mean to offend anyone. Furthermore, I just noticed I'm at around 1k lines of code now for this project, and it's likely to increase as I add other tree-building algorithms and shading... I still can hardly imagine ending up at 10k lines of code for a raytracer though. I'd like to keep the code as simple and straightforward as possible.

I promise I'll share the sources as soon as I have a working SAH for building kd-trees.

7
General chat / Re: PS3
« on: October 19, 2007 »
Yes well, as the wikipedia article says, it's more like one CPU and 6 SPEs that you can code on. The SPEs are hell to code for, because they each have only 256k of directly accessible local memory, for both code and data. You're forced to do DMA transfers from main RAM by hand to "feed" them, and DMA transfers back to main memory to get the results... So it's really challenging to get anything out of it...  :)

Though the Linux install, with its Cell and framebuffer access, could mean... raytracing on PS3 ;) That'd be a real challenge...

8
Projects / Re: realtime raytracer project
« on: October 18, 2007 »
> I suspect that it probably isn't the way you want to tackle the raytracer though, as you seem to want perfect accuracy

Thanks for the hint, I see those guys have made some great adaptive RT tricks! Impressive, actually.
However, yes, I'll try to avoid any approximations in the tracer. So no fixed point, no adaptive subsampling.. Furthermore, as I'll probably add textures at some point, I don't want any weird-looking artifacts.

One of my ideas was basically to do HDR rendering and implement a tone-mapping algorithm in realtime. I have no idea yet whether it's really possible though. Wait and see...

9
Projects / realtime raytracer project
« on: October 18, 2007 »
Hey,

I just wanted to post about one of my projects.
I'm attempting to code a state of the art raytracer.

Why raytracing?

  • Well, it's extremely easy to code a raytracer, for starters.
    Even a state-of-the-art raytracer with all optimisations fits in a few hundred lines of code. If you want to do a state-of-the-art 3D engine with OpenGL, it will more likely be thousands of lines.
  • The fastest raytracers use the kd-tree structure. It's an extremely convenient structure, with the very interesting property of scaling very well with large scenes. I.e. you can add as much geometry as you wish; it doesn't hurt the framerate. It has been shown that raytracing has logarithmic complexity in the number of objects in the scene, whereas hardware-accelerated and software rasterisers have linear complexity in the number of primitives.
  • Some may argue that the complexity of a hardware-accelerated or software 3D engine can also be made logarithmic in the number of objects, thanks to techniques such as LOD, occlusion queries, impostors... However, these techniques can also be implemented in a raytracer.
  • A raytracer can render pretty much any object that can be tested for intersection. Not only spheres, cylinders, conics and the like, but also pixel-accurate volumes such as isosurfaces, volumetric effects, *accurate* reflections, *accurate* refractions, point and area light sources, and global illumination. Furthermore, it is intuitive to add all these effects once you have a recursive procedure to trace a ray. However, these techniques lead to ray multiplication, which hurts performance a lot.
  • You don't need a z-buffer to get correct object occlusion in the viewport. In fact, the inherent nature of a raytracer takes care of visibility determination.

So basically, a raytracer's performance is measured in rays per second, as the scene geometry has little impact on render speed. I wish I could reach the state-of-the-art performance of 10M rays per second per core. It means a state-of-the-art raytracer can potentially render 800*600 scenes with any number of primitives in realtime, on today's hardware. Moreover, the number of cores per CPU is rapidly increasing: dual cores are widespread already, and quad cores such as Intel's Core2Quad are getting really affordable. Furthermore, AMD's Phenom is coming along, as well as Intel's Nehalem with 8 hardware cores plus hyperthreading in late 2008. As an endnote, Intel demonstrated an 80-core research CPU this year, and the Larrabee project, expected to hit the market around 2010, should be a true killer at raytracing! There is also some experimental work on hardware raytracing units. So yes, I think raytracing is the future of realtime 3D visualisation.

My starting point is Ingo Wald's excellent thesis on realtime raytracing and realtime global illumination. However, I forked from the state-of-the-art technique to test a variant.

Here's the thesis: http://www.mpi-inf.mpg.de/~wald/PhD/

There are only a few points that have to be optimized for a state of the art raytracer:

- ray/primitive intersections
- kd-tree construction
- kd-tree traversal
- shading

Of course, these tasks have to be done with SIMD instructions, such as SSE. I'm starting the code in plain C++, however, to get the thing running before optimising it.

Here is the progress of the project currently, for each of the points above:

- ray/sphere intersections are implemented, and done in batches of 4. This will make it possible to test 4 spheres at once with SSE later.
- kd-tree building is implemented with the naive and least efficient algorithm for raytracing: middle cut.
- kd-tree traversal has been implemented both in an iterative+stack way and in a recursive fashion. The recursive one is the fastest, because it has no stack data-structure overhead. Furthermore, I traverse a single ray at a time, whereas most state-of-the-art raytracers traverse a packet of almost-collinear rays together.
- shading is done with a basic diffuse (Gouraud-style) model, but I mostly display only normals for now.
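The ray/sphere test in the first bullet is the standard quadratic one; here is a scalar sketch of the single-sphere case (my own illustration, assuming a normalized direction, not the project's batched SSE version):

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

// Returns the nearest positive hit distance along the ray, or -1 on a
// miss. With |dir| == 1 the quadratic a*t^2 + 2b*t + c has a == 1.
float intersectSphere(Vec3 orig, Vec3 dir, Vec3 center, float radius) {
    Vec3 oc = sub(orig, center);
    float b = dot(oc, dir);
    float c = dot(oc, oc) - radius * radius;
    float disc = b * b - c;
    if (disc < 0.0f) return -1.0f;   // no real root: ray misses the sphere
    float s = std::sqrt(disc);
    float t = -b - s;                // try the nearer root first
    if (t < 0.0f) t = -b + s;        // origin inside the sphere
    return t >= 0.0f ? t : -1.0f;
}
```

Batching 4 of these amounts to running the same arithmetic on 4 spheres' data laid out in SoA form, one SSE lane per sphere.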

The bottom line is performance: I get 250K rays per second per core. That means a 16k-spheres scene takes roughly 1 second to render at 640*480. This poor performance is most likely due to the number of intersection tests I do per ray: 57 (though I test 4 spheres per leaf, instead of 1). It is also due to the poor quality of the kd-tree building algorithm I'm using. So there is a lot of room for performance improvement.

Next, I'll work on a SAH-based heuristic to build kd-trees.
I noticed interesting research from September 2006 (fresh, isn't it? ;)) on this topic: http://www.sci.utah.edu/~wald/RT06/papers/2006-09-18-RT06-Hunt.pdf

To finish, I'm attaching a screenshot of the current raytraced scenes, and I'll post further improvements to the raytracer in this thread.

10
sounds good, hope I'll have some time to enter the compo :)  :stirrer:

11
Hi,

Well, first of all, how much does the resolution affect your framerate? What framerate do you get at 640*480, 800*600, 1024*768...? If the framerate varies a lot with resolution, it means you're fillrate-limited.
That can be the case with 160 quads if they cover the whole screen, for example.

If you're fillrate-limited, you should also check how texture resolution affects performance, though your 32*32 textures should comfortably fit in the GPU's texture cache.

Then, there is a test to do with the number of quads, to see if you're vertex-limited, which I believe is not the case since you only use 160*4 vertices. Also, you're using immediate mode (specifying each vertex with function calls on the CPU). In this mode the hardware vertex processor is most likely sitting idle, waiting for data from the CPU. There is no real risk of being vertex-processing limited, since the GPU needs several thousand triangles per CPU call to be kept really busy.

The last possibility, if your framerate does not depend on resolution or vertex count, is that your application is CPU-limited. The language you use may be too slow, or you're making too many OpenGL calls per primitive. Can you get extensions in BlitzMax?
If you can, consider how you send vertex data to OpenGL. If the vertices are constant across frames, you can store them in video memory and render them with a single OpenGL call, without a loop.
Otherwise, if the vertices are moving, you're forced to update the video card's memory every frame, so that sounds a bit pointless. You can nonetheless use vertex arrays, since they're client-side anyway.

Coming to your code: first, I find it odd that you're using:
glLoadIdentity()
glTranslatef (x - radius, y - radius, 0)
to translate, since you could just as well apply the translation in the glVertex calls. You should avoid changing the modelview matrix inside a loop for just a few vertices: changing a matrix sends 16 floats to the GPU, as much data as your 4 vertices themselves.

As has been said, you should enable texturing and bind your texture outside the loop too. Keep the OpenGL calls inside your loop to a minimum, because they are quite CPU-expensive.

Well, I hope those explanations were clear and will be useful. :)

12
C / C++ /C# / Re: C++ loading PNG files
« on: September 17, 2007 »
wow :)

I've been fighting with libpng and zlib, and more recently with the Independent JPEG Group's library, to get PNG and JPEG readers. It took aaaages to make the low-level PNG reader work fine. I can't believe it can be made so simple with OLE. Great work Jim.

just in case you're interested, my image loader is here, but it requires external libraries of course :/ :
http://glsilent.svn.sourceforge.net/viewvc/glsilent/src/img/img_Image.cxx?revision=10&view=markup

feel free to browse around the rest of my engine sources too. I doubt the OpenGL wrapper will interest anyone; this engine is getting old-fashioned. I've been thinking about a redesign from scratch, but I don't know whether I should make a software 3D engine and play with it on Symbian/PSP/NDS, or switch to managed DX and C#.

wait and see..

13
ok, thanks to both of you for helping out :) nice hint, Auld.. I think this tool will come in handy, if I may say so..

14
true, I can already save as TGA, but I meant that Print Screen seems not to work.. weird..  ???

15
Useful links / Re: Ideas for your next demo...
« on: August 22, 2007 »
Thanks for the ideas.. I needed some ;)

16
Hey va!n,

here's how I did my entry (based on Auld's framework):

Code: [Select]
DEVMODE dmScreenSettings = {0};            // zero-init so unset fields are ignored
dmScreenSettings.dmSize       = sizeof(dmScreenSettings);
dmScreenSettings.dmPelsWidth  = 800;
dmScreenSettings.dmPelsHeight = 600;
dmScreenSettings.dmBitsPerPel = 32;
dmScreenSettings.dmFields     = DM_BITSPERPEL|DM_PELSWIDTH|DM_PELSHEIGHT;
ChangeDisplaySettings(&dmScreenSettings, CDS_FULLSCREEN);

PIXELFORMATDESCRIPTOR pfd = {0};           // zero-init, then set only what matters
pfd.nSize      = sizeof(pfd);
pfd.nVersion   = 1;
pfd.cColorBits = 32;
pfd.cDepthBits = 24;
pfd.dwFlags    = PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER;
// borderless maximized "edit" window used as a cheap fullscreen surface
HDC hDC = GetDC( CreateWindow("edit", 0,
                 WS_POPUP|WS_VISIBLE|WS_MAXIMIZE,
                 0, 0, 0, 0, 0, 0, 0, 0) );
SetPixelFormat( hDC, ChoosePixelFormat(hDC, &pfd), &pfd );
wglMakeCurrent( hDC, wglCreateContext(hDC) );

ShowCursor(FALSE);

for screen init,

Code: [Select]
wglMakeCurrent ( NULL, NULL );

for deinit...

though, I haven't found out how to take a screenshot of it.. It fails every time; any idea how to make screenshots possible with this setup?

17
really nice photos :)

18
Hey! Sorry, I'm back home now; I didn't check for answers in here.. :/ We were in the frenchy corner, all at the back..

19
greetings to all from the buenzli as well ;)

it was indeed pretty funny that they ran out of beer last night :D
