If you want to plot the whole screen, the fastest way I know of is to use asm and blit from the start of the screen to the end, much like you do with the for loop. Also, if you're just plotting something that isn't full screen, asm is again way faster than nesting two for loops. I don't know why, but that was the case in the tests I did.
Here's a routine I used to clear the screen to any colour when I first started messing around with asm. There are still some things you could do to make it a tad faster, but it's fast enough.
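sub pgl_cls(byval col as integer = rgb(0, 0, 0))
    asm
        mov ebx, [pgl_screen_ptr]     'start of the screen buffer
        mov edx, [pgl_screen_length]  'buffer length (assumed to be in bytes)
        mov ecx, [col]                'fill colour
        lea eax, [ebx + edx]          'eax = one past the last pixel
    cls_loop:
        mov [ebx], ecx                'write one 32-bit pixel
        add ebx, 4                    'move to the next pixel
        cmp ebx, eax
        jb cls_loop                   'jb rather than jbe, so nothing is written past the end
    end asm
end sub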
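For comparison, here is roughly what the same clear looks like in plain FreeBASIC, as a baseline to time the asm against. This is only a sketch: `cls_plain`, `buf` and `pixel_count` are names made up for the example and stand in for the library's own screen pointer and length.

```freebasic
' Hypothetical stand-in for the library's screen buffer: 'buf' and
' 'pixel_count' are names invented for this sketch.
sub cls_plain(byval buf as ulong ptr, byval pixel_count as integer, byval col as ulong)
    ' One flat loop over the whole buffer instead of nested x/y loops.
    for i as integer = 0 to pixel_count - 1
        buf[i] = col
    next
end sub

dim as integer w = 320, h = 240
dim as ulong ptr buf = allocate(w * h * sizeof(ulong))
cls_plain(buf, w * h, rgb(255, 0, 255))
print hex(buf[0]), hex(buf[w * h - 1])   ' first and last pixel
deallocate(buf)
```

Treating the screen as one flat run of pixels is the same idea the asm uses. The loop stops at `pixel_count - 1`, so it never touches memory past the end of the buffer.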
And to blit a full-screen image with an alpha value using MMX: again, this was written a while ago, so there are probably some optimisations that could be done. It also relies on some variables that are set up at the start of the graphics lib I put together, so it's pretty useless as code to copy and paste, but you get the idea. Jim helped out a lot with this when I was putting it together. Cheers again, Jim. Also, don't be surprised if the comments don't make any sense; it's been chopped and changed a bit while I was doing it.
sub pgl_blit_screen(byval src_buffer as integer ptr, byval alpha as ubyte)
    dim p_alpha as integer = (alpha shl 16) or (alpha shl 8) or alpha 'alpha copied into each colour byte
    asm
        mov eax, [src_buffer]
        mov ebx, [pgl_screen_ptr]
        mov ecx, [pgl_screen_length]
        lea ecx, [ebx + ecx]          'ecx = one past the last screen pixel (length assumed to be in bytes)
        add eax, 12                   'skip the first 12 bytes of the source buffer
        pxor mm6, mm6                 'mm6 = 0, used as the interleave for unpacking
        movd mm2, [p_alpha]           'mm2 has alpha value
        punpcklbw mm2, mm6            'unpack it to words for processing
    dloop:
        movd mm0, [eax]               'image pixel col in mm0
        movd mm1, [ebx]               'screen pixel col in mm1
        punpcklbw mm0, mm6            'unpack mm0 using blank mm6 register as interleave
        punpcklbw mm1, mm6            'same for screen color
        psubw mm0, mm1                'subtract screen color from image color
        pmullw mm0, mm2               'multiply the difference by alpha
        psrlw mm0, 8                  'divide that by 256
        add eax, 4                    'increment image position
        paddb mm0, mm1                'add screen color back in (the byte add wraps, so negative differences come out right)
        packuswb mm0, mm0             'repack pixel back to lower 32 bits of mm0
        movd [ebx], mm0               'move result back to screen
        add ebx, 4                    'increment screen position
        cmp ebx, ecx                  'check for end of screen
        jb dloop                      'not there yet, go round again (jb rather than jbe, so nothing is written past the end)
        emms                          'restore the floating point state after MMX; expensive, so only done once after the loop
    end asm
end sub
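If the MMX is hard to follow, the per-channel maths it does can be written out in plain FreeBASIC. This is only a reference sketch for checking results, not the routine from the lib: `blend_channel` and `blend_pixel` are names made up for the example, and it works on single pixels rather than the screen buffer.

```freebasic
' Reference version of the per-channel maths in the MMX loop:
'   result = (dst + (((src - dst) * alpha) shr 8)) and 255
function blend_channel(byval src as long, byval dst as long, byval alpha as long) as long
    ' Keep only the low 16 bits of the product, as pmullw does.
    dim as long prod = ((src - dst) * alpha) and &hFFFF
    ' psrlw is a logical shift, so a negative difference ends up
    ' as its value modulo 256 in the low byte.
    dim as long delta = prod shr 8
    ' paddb wraps at 8 bits, which turns that back into a subtraction.
    return (dst + delta) and &hFF
end function

function blend_pixel(byval src as ulong, byval dst as ulong, byval alpha as ubyte) as ulong
    ' The top byte is multiplied by zero in the asm, so it stays as the destination's.
    dim as ulong result = dst and &hFF000000ul
    for sh as integer = 0 to 16 step 8
        dim as long s = (src shr sh) and &hFF
        dim as long d = (dst shr sh) and &hFF
        result or= culng(blend_channel(s, d, alpha)) shl sh
    next
    return result
end function

print hex(blend_pixel(&h00FFFFFF, &h00000000, 128))   ' prints 7F7F7F
print hex(blend_pixel(&h00000000, &h00C8C8C8, 128))   ' prints 646464
```

The second call is the interesting one: the image is darker than the screen, so the difference is negative, and the wrap-around in the final add is what makes it land on the right value. Because the shift divides by 256 rather than 255, an alpha of 255 gets very close to the image colour but not always exactly there.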