At some point Taj, it would be really very cool if you could do a tutorial of how the maths works for this, and then how to go from there to the asm, and the asm to this miniscule exe. The thought process is as interesting as the code and end result - to me anyway 
Jim
OK you asked for it.
Brain Dump -
how to get curved surfaces in 1kOK first off, a little 1k background. A windows 1k can be split into 450 bytes for windowing, a counter, swapping buffers etc. This leaves less than 500 bytes to do the dirty work. So we are looking for a routine around 350 bytes, plus a few models, plus an effect, plus colours and lighting. Ye gods.
So what can we dismiss quickly? NURBS wont fit, too much data. Subdivision surfaces are too big code. OK so standard stuff wont work. We could fall back to doing something abstract from Paul Bourkes web pages (again) or maybe this :
http://mathworld.wolfram.com/SurfaceofRevolution.html. Surfaces of revolution. These are simple and work by defining a curve in 2d and the rotating this around an axis to get 3d. Hence the name. So we have a viable technique for producing complex curved surfaces in 3d in 1k.
However what I wanted to do was a specific object, not an abstract shape. It turns out surfaces of revolution can do this. If we define a set of specific points rather than a curve in 2d, we can rotate these around the axis and join them up into a mesh of triangles. See image 1 below. So we define 6 points and we have the outline of a pillar. Its not curved yet, its just a set of points of course.
To rotate around the y axis, we can use sin and cos. The x position of the point is the radius of rotation. So now to get nice smooth surfaces around the object (in this case a pillar from a church) we simply use very small angles (say 5 degrees) and step each point in the object around a circle. That simple! We then join these points into triangles. This is cool because a single assembly command fsincos can return the sin and cosine of the angle so we know this code will be very small.
But wait we now have a problem. We know we can do smooth surfaces around the object but what about up and down it. Surely just joinging the points will result in straight edges up and down. What we really want is more like image2 below whnere the outline of the object (the pillar) is curved too.
So lets consider some options:
1. We could define a lot of points (PRO, easy to do, no more code, CON large amount of data, only one object maximum possible)
2. Use a curved interpolator between the points like bezier or spline in 2d. (PRO, known to work, CON Too much code)
3. Use a simpler curved interpolator.
Here is a very simple curved interpolator and an explanation of how to interpolate a set of points into a curve:
http://local.wasp.uwa.edu.au/~pbourke/other/interpolation/Notice the cosine interpolator. We may be able to fit this into 1k as cos can be a single instruction of assembler in our code. So now we have a scheme for stepping slowly along *between* the points in small incremental steps using an interpolator. This means we can now go up and around the object in smooth curves! We are almost there.
OK but I'm
still not ready to start as I dont know how to store objects yet. Image 3 below shows a normal/typical way. Each point would be stored as an (x,y) pair. Hmm but each float is 4 bytes so the goblet would be 14 (points) x 2 (x,y) x 4 bytes. = 112 bytes. No it wont compress well as each point is quite different. Integers too are 4 bytes, though they will compress better (reasons are complex so I wont detail here). But still 100 bytes give or take is too much when our total budget for code and everything is <500!
So lets get smart:
1. Lets limit the range of values to say 1..16. Thats 4 bits per co-ordinate.
2. Lets store x,y in the same byte. Thats an entire 2d vertex in 1 byte.
3. Lets use 1 byte to store the numbr of vertices in the outline (4 bits = up to 16 vertices) and also store how to draw it (lines, triangles, quads etc, another 4 bits).
This will complicate code a little but its basically & and >> to extract the x,y from a single byte and those functions are very very cheap.
So now the goblet fits into 15 bytes! The pillar in 7. We are in business.
OK you have a working theory so now you write the code. This is what happens: its way too big. Normally its around 1200-1300 bytes. This is depressing and scary the first time. So now you need to know how to get things smaller.
Rule 1: Do not use math library, use your own (asm) math functions. Here are some useful ones for GCC:
asm ("fsin" : "=t" (a): "0" (a)); // not a=sin(a);
asm ("fcos" : "=t" (a): "0" (a)); // not a=cos(a);
asm ("fsincos" : "=t" (newct), "=u" (newst) : "0" (t)); // not newct=cos(t); newst=sin(t);
Yes thats real GCC C code above. It will save loads of bytes. You save pulling in the maths lib (thats 20+ bytes saved) you save the first function call into maths lib (that could be 14 bytes per function (ie 14 for cos, 14 for sin)) and you save the second fucntion call which is going to be maybe 6 bytes in this case after compression. It doesnt sound a lot but believe me it all helps.
I discovered the cosine interpolator was too much floating point maths. So I got clever and defined a very very simple curved interpolator. I'm sorry but I think its my major advantage here so I'm not going to reveal it. However the point is when you get stuck, understand the basics of the algorith and ditch what you really dont need. In my case Do I really need the curves to be continuous through points or do I just want curves between any two points? OK then well given two points any curve between them will do.
Some things I discovered. We can go up and around the object smoothly in as many steps as we like. Lets call the up value t and the around value s. This is great because now to get texture co-ordinates on the object we simply call glTexture2f(s,t); One line of code to generate our texture co-ordinates. Alternatively you can use s and t in some creative way to define colour (glColour3f(s,t,s+t); ) at each vertex. Normals were tricky. I discovered one last trick.
Using s,t as input to our up and around co-ordinate generators, we get back a a point in space (x,y,z). In my case these are in the range (-1.0..1.0, 0..1.0,-1.0..1.0). So lets do this: glNormal3f (x,2*y-1,z); What would the normals look like? Anyone? No? Anyone? No? Anyone? They would be spherical normals. This means *any* object we define would be lit as if it was a sphere(*). Guess what? You cant tell. The eye is completely fooled (if you dont add specular light). It is quite impossible to tell and is very cheap.
(*) this was misleading as Jim pointed out, the lighting on the object is calculated in the same way it would be for a sphere is a better way to say it...
Result. We have curved objects, texture co-ordinates and fake normals for very very little code.
In the end I plumped for more objects and dumped the texture and lighting. Mainly this was because setting up good textures and lighting was too expensive. Nontheless you can see how sometimes with some creativity, cutting corners etc. things get really really small.
I will provide source code for this article to members who request it- with no warranty at all. Its a mess but the two routines I use to generate the objects are pretty clear when read with this article.
Thats it for now, maybe this should be moved to the general discussion section?
Anyway I hope you have at least gained some insight into how hard it is to do a decent 1k intro. Its not a matter of just reducing code size but also of finding 2-3 tricks for making algorithms smaller and also knowing where to start. Data types and stored data must be reduced to the minimum and this requires creativity and an understanding of how data types are stored/converted and how they convert.
The same is true of 4k of course.