Hey,
I just wanted to post about one of my projects.
I'm attempting to code a state-of-the-art raytracer.
Why raytracing?
- Well, for starters, it's extremely easy to code a raytracer. Even a state-of-the-art raytracer with all the optimisations fits in a few hundred lines of code, whereas a state-of-the-art 3D engine with OpenGL will more likely be thousands of lines of code.
- The fastest raytracers use the kd-tree structure. It's an extremely convenient structure, with the very interesting property of scaling well to large scenes: you can add as much geometry as you wish without hurting the framerate. It has been shown that raytracers have logarithmic complexity in the number of objects in the scene, whereas hardware-accelerated and software rasterizers have linear complexity in the number of primitives.
- Some may argue that the complexity of a hardware-accelerated or software 3D engine can also be made logarithmic in the number of objects, thanks to complex techniques such as LOD, occlusion queries, impostors... However, these techniques can also be implemented in a raytracer.
- A raytracer can render pretty much any object that can be tested for intersection: not only spheres, cylinders, conics and the like, but also pixel-accurate volumes such as isosurfaces, volumetric effects, *accurate* reflections, *accurate* refractions, point and area light sources, and global illumination. Furthermore, adding these effects is intuitive once you have a recursive procedure to trace a ray. However, these rendering techniques lead to ray multiplication, which hurts performance a lot.
- You don't need a z-buffer to get correct object occlusion on the viewport: visibility determination is inherent to how a raytracer works.
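To illustrate the point about recursion: once you have a trace() routine that fires a ray and returns a colour, reflections are just another call to trace(). Here's a minimal sketch with made-up types (nothing here is my actual code; `intersect` stands in for the real scene query):

```cpp
struct Vec3 {
    double x, y, z;
    Vec3 operator*(double s) const { return {x * s, y * s, z * s}; }
    Vec3 operator+(const Vec3& o) const { return {x + o.x, y + o.y, z + o.z}; }
};
struct Ray { Vec3 origin, dir; };
struct Hit { bool found; Vec3 color; double reflectivity; Ray reflected; };

// The recursive core: shade the nearest hit, then recurse on the
// reflected ray. Refraction and shadow rays would just be more calls
// here -- which is exactly the "ray multiplication" mentioned above.
template <class Intersect>
Vec3 trace(const Ray& r, int depth, Intersect&& intersect) {
    if (depth <= 0) return {0, 0, 0};      // recursion cut-off
    Hit h = intersect(r);
    if (!h.found) return {0, 0, 0};        // background colour
    Vec3 c = h.color * (1.0 - h.reflectivity);
    if (h.reflectivity > 0.0)
        c = c + trace(h.reflected, depth - 1, intersect) * h.reflectivity;
    return c;
}
```

Every extra effect bolts on at the same place, which is why raytracers stay so small.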
So basically, a raytracer's performance is measured in rays per second, since the scene geometry has little impact on render speed. I wish I could reach the state-of-the-art performance of 10M rays per second per core: at 800*600 that's 480,000 primary rays per frame, i.e. around 20 fps per core, so such a raytracer can potentially render scenes with any number of primitives in realtime on today's hardware.

Moreover, the number of cores per CPU is rapidly increasing: dual cores are widespread already, quad cores such as Intel's Core 2 Quad are really affordable, AMD's Phenom is coming along, and Intel's Nehalem, with 8 hardware cores plus hyperthreading, is due in late 2008. As an endnote, Intel demonstrated Larrabee this year, an 80-core CPU which would be a true killer at raytracing! We expect it to hit the market around 2010. There is also some experimental work on hardware raytracing units. So yes, I think raytracing is the future of realtime 3D visualisation.
My starting point is Ingo Wald's excellent thesis on realtime raytracing and realtime global illumination. However, I forked from the state-of-the-art technique to test a variant.
Here's the thesis:
http://www.mpi-inf.mpg.de/~wald/PhD/

There are only a few points that have to be optimized in a state-of-the-art raytracer:
- ray/primitive intersections
- kd-tree construction
- kd-tree traversal
- shading
Of course, these tasks have to be done with SIMD instructions such as SSE. I'm starting in plain C++, however, to get the thing running before optimising it.
Here is the current progress of the project, for each of the points above:
- ray/sphere intersections are implemented, and done in batches of 4. This will later let me test 4 spheres at once with SSE.
- kd-tree building is implemented with the naive algorithm that is also the least efficient for raytracing: the middle cut.
- kd-tree traversal has been implemented both iteratively with a stack and recursively. The recursive version is the fastest, because it has no stack data structure overhead. Also, I traverse one ray at a time, whereas most state-of-the-art raytracers traverse a packet of almost-collinear rays at once.
- shading uses a basic Gouraud diffuse model, but I'd rather display only normals for now.
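For reference, the scalar version of the ray/sphere test looks roughly like this (a sketch with made-up names, not my actual code; the real version evaluates it for 4 spheres per call so that SSE can later map one sphere to each lane):

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

static double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Returns the distance t >= 0 along origin + t*dir (dir assumed
// normalized) to the nearest intersection with the sphere, or -1 on miss.
double raySphere(const Vec3& origin, const Vec3& dir,
                 const Vec3& center, double radius) {
    Vec3 oc{origin.x - center.x, origin.y - center.y, origin.z - center.z};
    double b = dot(oc, dir);                 // half the usual quadratic b
    double c = dot(oc, oc) - radius * radius;
    double disc = b * b - c;                 // discriminant / 4
    if (disc < 0.0) return -1.0;             // ray misses the sphere
    double s = std::sqrt(disc);
    double t = -b - s;                       // try the near root first
    if (t < 0.0) t = -b + s;                 // origin inside the sphere
    return (t >= 0.0) ? t : -1.0;
}
```

It's branch-light on purpose: everything up to the discriminant test vectorises trivially across 4 spheres.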
The bottom line is performance: I get 250K rays per second per core. That means a 16k-sphere scene takes roughly 1 second to render at 640*480. This poor performance is most likely due to the number of intersection tests I do per ray: 57 (though I test 4 spheres per leaf instead of 1). It is also due to the poor quality of the kd-tree building algorithm I'm using. So there is a lot of room for performance improvement.
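For what it's worth, the recursive traversal boils down to this shape (hypothetical node layout, not my actual one; `leafTest` stands in for the batched sphere test):

```cpp
#include <vector>
#include <limits>

// Hypothetical kd-tree node layout -- illustration only.
struct KdNode {
    int axis;               // 0/1/2 = split axis, -1 = leaf
    double split;           // split plane position (inner nodes)
    int left, right;        // child indices (inner nodes)
    std::vector<int> prims; // primitive indices (leaves)
};

// Returns the nearest hit t from leafTest, or a negative value on miss.
template <class LeafTest>
double traverse(const std::vector<KdNode>& nodes, int node,
                const double orig[3], const double dir[3],
                double tMin, double tMax, LeafTest&& leafTest) {
    const KdNode& n = nodes[node];
    if (n.axis < 0)
        return leafTest(n.prims, tMin, tMax);   // leaf: test primitives

    // Distance along the ray to the split plane.
    double d = dir[n.axis];
    double t = (d != 0.0) ? (n.split - orig[n.axis]) / d
                          : std::numeric_limits<double>::infinity();

    // Visit the child containing the origin first (front-to-back order).
    int near = (orig[n.axis] < n.split) ? n.left : n.right;
    int far  = (near == n.left) ? n.right : n.left;

    if (t < 0.0 || t > tMax)   // plane behind the ray or beyond the range
        return traverse(nodes, near, orig, dir, tMin, tMax, leafTest);
    if (t < tMin)              // plane before the range: far side only
        return traverse(nodes, far, orig, dir, tMin, tMax, leafTest);

    // Ray spans both children: near first, far only if nothing was hit.
    double hit = traverse(nodes, near, orig, dir, tMin, t, leafTest);
    if (hit >= 0.0) return hit;
    return traverse(nodes, far, orig, dir, t, tMax, leafTest);
}
```

The early-out in the "spans both children" case is what makes a good tree pay off: with tight leaves, the far child is usually never visited.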
Next, I'll work on a SAH-based (surface area heuristic) kd-tree builder.
I noticed interesting research from September 2006 (fresh, isn't it?) on this topic:
http://www.sci.utah.edu/~wald/RT06/papers/2006-09-18-RT06-Hunt.pdf

To finish, I'm attaching a screenshot of the currently raytraced scenes, and I'll post further improvements to the raytracer in this thread.