I've been planning a multithreaded Mandelbrot for some time, mostly as an example for this place. I did a very basic experiment on my HT CPU (2 threads generating half the set each) and got very nearly the optimum 2x speedup. That surprised me, as I thought HT was a gimmick, but it's obvious that for CPU-bound stuff, especially floating-point stuff, it's a big gain.
The plan is to have 2 or more worker threads that each get given a square of the screen to generate. If a square is more than a single thread should handle, the thread will queue up sub-squares of it for other threads to work on. There are some really cool ways of estimating which squares are 'complicated' and which are 'simple'. They probably apply to Buddhabrot too.
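Here's a rough sketch of that queueing scheme, just to show the shape of it. It's in Python for brevity, and everything in it is my own assumption rather than a finished design: the names (render, escape_count), the MAX_TILE threshold, and the rule "split anything bigger than MAX_TILE into four" standing in for a proper complexity estimate. It also assumes the image size is a power of two so squares halve cleanly. Note that in CPython the GIL stops threads from running this kind of number crunching in parallel, so this shows the structure only and won't reproduce the speedup.

```python
import queue
import threading

MAX_ITER = 100   # iteration cap per pixel (assumed value)
MAX_TILE = 16    # hypothetical limit: bigger squares get split, not rendered


def escape_count(cx, cy, max_iter=MAX_ITER):
    """Iterations before z = z*z + c leaves |z| <= 2 (max_iter if it never does)."""
    zx = zy = 0.0
    for i in range(max_iter):
        if zx * zx + zy * zy > 4.0:
            return i
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
    return max_iter


def render(size=64, n_threads=2, x0=-2.0, y0=-1.5, span=3.0):
    """Render a size x size grid of escape counts using a shared work queue."""
    assert size > 0 and size & (size - 1) == 0, "size must be a power of two"
    image = [[0] * size for _ in range(size)]
    work = queue.Queue()
    work.put((0, 0, size))  # one job covering the whole screen: (left, top, side)

    def worker():
        while True:
            left, top, side = work.get()
            try:
                if side > MAX_TILE:
                    # Too much for one thread: queue four sub-squares instead.
                    half = side // 2
                    for dy in (0, half):
                        for dx in (0, half):
                            work.put((left + dx, top + dy, half))
                else:
                    # Small enough: generate every pixel in this square.
                    for py in range(top, top + side):
                        for px in range(left, left + side):
                            cx = x0 + span * px / size
                            cy = y0 + span * py / size
                            image[py][px] = escape_count(cx, cy)
            finally:
                # Children are queued before the parent is marked done,
                # so join() cannot return while work is still pending.
                work.task_done()

    for _ in range(n_threads):
        threading.Thread(target=worker, daemon=True).start()
    work.join()  # blocks until every queued square has been processed
    return image
```

Calling render(size=64, n_threads=2) gives back a 64x64 list of iteration counts. Each square writes to its own pixels only, so the threads never need a lock on the image; the queue is the only shared structure.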
Multithreading is going to be huge, especially as the number of cores goes from HT (1 core, multiple hardware threads) to Core Duo (2 cores) and quad-core chips (4 cores) to the PS3's Cell (a PPE plus 8 SPEs!) to Mesh computing (effectively unlimited cores scattered all over the world). To get the best performance on modern CPUs, everything is going to have to be multithreaded. Intel keep sending me application notes on how it all works, if anyone's interested.
Jim