Author Topic: (Sort_table)Sort without compare. Very fast soting algo.  (Read 7177 times)

0 Members and 1 Guest are viewing this topic.

Offline relsoft

  • DBF Aficionado
  • ******
  • Posts: 3303
  • Karma: 47
    • View Profile
Some people already benchmarked this against crt's quicksort and quick sort is a lot slower than this. Anyways, this would be great for non-z buffered sw 3d engines.

Happy easter to everyone!!! Yo shockie, sup man?

Code: [Select]
''Table sort
''A very fast sorting algo
''I use it for like every sorting stuff I need (mostly 3d during the DOS days)
''I heard this was used to sort polys on the playstation
''pros:
''1. No messy swaps, pivot points(Quick), half-pivots(shell)
''2. very fast (fastest general purpose sort algo I've come to know)
''cons:
''1. Maximum value in an element should be <= MAX_DEPTH
''2. More memory needed(directly propotional to MAX_DEPTH)
''
''Relminator 2010
''http://rel.betterwebber.com
''
''
''Notes:
''1. Array to sort need not be ordinal (ie 1,2,3,4... or 2,1,4,3..)
''They can be (1,4,6,11,99).
''
''2. Max Elements of array to sort need not be equal to MAX_DEPTH.
''this algo would work as long as values the array holds <=MAX_DEPTH
''

'' EDIT: duplicate handling added using a very simple implementation
'' of a singly linked-list


'' max number of array elements for our table
const as integer MAX_DEPTH = 1023

'' max array elements for our array to sort
const as integer MAX_ELEM = 127

'' filler for Tsort_table array
'' if Tsort_table(i).index = dummy means that the index is unoccupied
const as integer DUMMY = -(2^16)

'' the sort table
type Tsort_table
index as integer
next_index as Tsort_table ptr '' use this for duplicates ;*)
start_index as Tsort_table ptr '' foo speed
end type


'' our sort table array
dim as Tsort_table s_table(MAX_DEPTH)

'' our unsorted array
dim as integer int_list(MAX_ELEM)

'' clear or table
for i as integer = 0 to MAX_DEPTH
s_table(i).index = dummy
s_table(i).next_index = new Tsort_table ''create list
if s_table(i).next_index = 0 then
print "Out of memory!!!"
''exit(1)
endif
s_table(i).start_index = s_table(i).next_index ''foo speed
next i


randomize

'' our unsorted array
for i as integer = 0 to MAX_ELEM
int_list(i) =  int(rnd*MAX_ELEM) '' comment this for a no duplicate array
next i



'' sort via sortable
for i as integer = 0 to MAX_ELEM

'' index in not occupied
if s_table(int_list(i)).index = dummy then
s_table(int_list(i)).index = int_list(i)
else '' index is occupied so make special linked list

'' iterate until found a free element

'' get last index
dim as Tsort_table ptr p_temp
    p_temp = @s_table(int_list(i))
    while (p_temp->next_index)
        p_temp = p_temp->next_index 
    wend
   
    '' add next
    p_temp->next_index = new Tsort_table
    if p_temp->next_index = 0 then
print "Out of memory!!!"
''exit(1)
    endif

'' fill current with duplicate value
    p_temp->index = int_list(i)
    s_table(int_list(i)).next_index = p_temp ' increment the list

endif

next i




'' print
print "index","unsorted" , "sorted"
dim as integer j = 0
for i as integer = 0 to MAX_DEPTH
'' print only if our table index is has a value in it ie != 0
if s_table(i).index <> dummy then
print  j, int_list(j), s_table(i).index
j += 1
if s_table(i).start_index then
dim as Tsort_table ptr p_stable = s_table(i).start_index
while p_stable->next_index '' do the list
print j,int_list(j),p_stable->index
j += 1
p_stable = p_stable->next_index
wend
endif
endif

next i




sleep

''delete linked list
for i as integer = 0 to MAX_DEPTH

'' delete only if depth buffer <> dummy
if s_table(i).index <> dummy then
dim as Tsort_table ptr p_temp, p_current
    p_temp = s_table(i).start_index
    p_current = p_temp
    while (p_current)
        p_temp = p_current
        p_current = p_current->next_index       
        delete p_temp       
        p_temp = 0
    wend   
   
    delete s_table(i).start_index
    delete s_table(i).next_index
   
    s_table(i).start_index = 0
    s_table(i).next_index = 0

endif

next i


end




Challenge Trophies Won:

Offline Shockwave

  • good/evil
  • Founder Member
  • DBF Aficionado
  • ********
  • Posts: 17409
  • Karma: 498
  • evil/good
    • View Profile
    • My Homepage
Nicely done Rel, that's worth a karma boost :)

Thanks for posting it.
What's prompting you to look into fast sorts? Writing something new?
Shockwave ^ Codigos
Challenge Trophies Won:

Offline relsoft

  • DBF Aficionado
  • ******
  • Posts: 3303
  • Karma: 47
    • View Profile
Nicely done Rel, that's worth a karma boost :)

Thanks for posting it.
What's prompting you to look into fast sorts? Writing something new?

Well. my daughter wanted me to port one of my games to the NDS so I was trying to relearn some stuff. C++, Fixed point math, software rendering etc.  My main prob right now is the very big DL of devkit pro arm.  My connection here is limited. I wish they'd have a torrent DL. :*(

What's new buddy?
Challenge Trophies Won:

Offline hellfire

  • Sponsor
  • Pentium
  • *******
  • Posts: 1294
  • Karma: 466
    • View Profile
    • my stuff
Quote
this would be great for non-z buffered sw 3d engines.
Funny; I implemented almost the same a week ago.
The Playstation 1 ordered polygons this way, too, but it usually requires a bit of fine-tuning.
For polygons it makes sense to store 1/z to keep more accuracy in the front and less in the back.
A linked list often kills a bit of performance due to pointer-chasing and cache-misses.

Quote
quick sort is a lot slower than this.
The order of polygons doesn't change completely from one frame to the next.
QuickSort behaves much better if you just resort an existing ordering.
Challenge Trophies Won:

Offline Jim

  • Founder Member
  • DBF Aficionado
  • ********
  • Posts: 5301
  • Karma: 402
    • View Profile
In Sony talk, this is called OTAG sorting.  I think the hardware could handle stuff in this format directly so there was some kind of win if you did it.  I remember deliberately emulating the algorithm for the PC software renderer I had written so we could debug scenes with big polygons (which is where this algorithm dies in real life - the problem being picking a good z value for a polygon, especially ones that are long and thin in depth or z) on the PC before they got to the PS.

Jim
Challenge Trophies Won:

Offline hellfire

  • Sponsor
  • Pentium
  • *******
  • Posts: 1294
  • Karma: 466
    • View Profile
    • my stuff
big polygons [...] is where this algorithm dies in real life
True, but that's not much better with a common sorting algorithm either.
And in case of the PSX its' affine mapping usually died earlier so the hardware was perfectly well designed ;)
Challenge Trophies Won:

Offline Jim

  • Founder Member
  • DBF Aficionado
  • ********
  • Posts: 5301
  • Karma: 402
    • View Profile
Just reminiscing :)  Chopping big polys for this algo so that the splits ended up on whole-numbered uv boundaries so the textures didn't slide around all over the place is no fun at all!
K+ to relsoft.
Jim
Challenge Trophies Won:

Offline relsoft

  • DBF Aficionado
  • ******
  • Posts: 3303
  • Karma: 47
    • View Profile
Thanks for the K guys!

BTW,

HellFire: Do you know of a better way to handle duplicates?  Linked lists are baaaad.

Jim: Thanks for the original name.  What does OTAG mean?
Challenge Trophies Won:

Offline Jim

  • Founder Member
  • DBF Aficionado
  • ********
  • Posts: 5301
  • Karma: 402
    • View Profile
I think it just means Order Tag.
Jim
Challenge Trophies Won:

Offline hellfire

  • Sponsor
  • Pentium
  • *******
  • Posts: 1294
  • Karma: 466
    • View Profile
    • my stuff
Do you know of a better way to handle duplicates?  Linked lists are baaaad.
Well, you'll usually use ordering tables on hardware that can't afford regular sorting so the number of polygons you want to draw will be rather small.
In this case it can be an option to make the table big enough for all situations.
This won't work out for eg a massive multiple-player game, though...

Generally:
Keep the data that you want to store in the table as small as possible (eg keep a shared vertex-buffer and just store indices).
Don't allocate single elements but blocks of reasonable size. If a block is full link a new block.
Manage your own heap which provides contiguous data to avoid jumping randomly in memory.
Analyse your scenes (what depth usually holds the most/fewest polygons) and preallocate your tables accordingly.

We usually take a "layered" approach destinguishing back- and foreground.
This way related polygons are still drawn together instead of mixing them all up.

But ordering (painter's algorithm) also means worst-case overdraw.
At some level of scene-complexity a bit more "thought" starts to pay out.
Challenge Trophies Won:

Offline Stonemonkey

  • Pentium
  • *****
  • Posts: 1315
  • Karma: 96
    • View Profile
Here's something like the sort I usually use, again it doesn't use comparisons and can be customised for use. In this case it can deal with any number of elements with 16 bit integer depth values. It won't be as fast as yours Rel but maybe some advantages and I think still faster than most other sorts, especially with a large number of elements.

Code: [Select]
'Radix sort - by Stonemonkey
sub radix_sort(byval depth_list as integer ptr,byval sort_list as integer ptr,byval max_elements as integer)
   
    dim as integer radix_count1(0 to 255),radix_count2(0 to 255),depth
    for i as integer ptr=@depth_list[0] to @depth_list[max_elements-1]
        radix_count1(*i and &hff)+=1
        radix_count2((*i shr 8)and &hff)+=1
    next
   
    dim as integer radix_position1(0 to 255),radix_position2(0 to 255)
    for i as integer=0 to 254
        radix_position1(i+1)=radix_position1(i)+radix_count1(i)
        radix_position2(i+1)=radix_position2(i)+radix_count2(i)
    next
   
    dim as integer temp_sort_list(0 to max_elements-1)
    for i as integer=0 to max_elements-1
        depth=depth_list[i] and &hff
        temp_sort_list(radix_position1(depth))=i
        radix_position1(depth)+=1
    next
   
    for i as integer=0 to max_elements-1
        depth=(depth_list[temp_sort_list(i)] shr 8)and &hff
        sort_list[radix_position2(depth)]=temp_sort_list(i)
        radix_position2(depth)+=1
    next
   
end sub


sub main
   
    dim as double timed
   
    dim as integer ptr depth_list,sort_list
    dim as integer max_elements=1000000 'make this 10 to display results
   
    depth_list=new integer[max_elements]
    sort_list=new integer[max_elements]
   
   
    for i as integer=0 to max_elements-1
        depth_list[i]=rnd*65535
        if max_elements<11 then print depth_list[i]
    next
   
    timed=timer
    radix_sort(depth_list,sort_list,max_elements)
    timed=timer-timed
   
    print
   
    for i as integer=0 to max_elements-1
        if max_elements<11 then print sort_list[i];" ";depth_list[sort_list[i]]
    next
   
    print "sorted ";max_elements;" in ";timed;" seconds"
end sub


main
sleep
end
« Last Edit: April 07, 2010 by Stonemonkey »

Offline relsoft

  • DBF Aficionado
  • ******
  • Posts: 3303
  • Karma: 47
    • View Profile
Cool stuff stone.  Very useful.
Challenge Trophies Won:

Offline Stonemonkey

  • Pentium
  • *****
  • Posts: 1315
  • Karma: 96
    • View Profile
Another one that I think is kind of in between the two methods:

Code: [Select]
'Monkey sort - by StoneMonkey
'
'depth_list = integer pointer to list of depth values in the range 0->max_depth
'
'sort_list = integer pointer to destination for sorted list of indices, initial values do not matter
'
'elements = integer value of the number of elements in the list to be sorted
'           both depth_list and sort_list MUST contain at least this number of elements
'
'max_depth = the max integer value an element can have in the depth list
'            default resolution=16 bit (0-65535), reduce for less memory use and higher speed
'
'code will crash if a depth value is outside the range 0->max_depth or if
'the number of elements allocated to either the sort or depth list is less than
'the value of 'elements'
'
sub monkey_sort(byval depth_list as integer ptr,_
                byval sorted_list as integer ptr,_
                byval elements as integer,_
                byval max_depth=65535)
               
    dim as integer count(0 to max_depth),position(0 to max_depth)
    dim as integer pointer c=@count(0),p=@depth_list[elements-1]
   
'the action of this loop can be performed when writing into depth list for extra speed
    for i as integer ptr=@depth_list[0] to p
        *(c+*i)+=1
    next
   
    p=@position(max_depth-1)
    for i as integer ptr=@position(0) to p
        *(i+1)=*i+*c
        c+=1
    next
   
    for i as integer=0 to elements-1
        c=@position(depth_list[i])
        sorted_list[*c]=i
        *c+=1
    next
   
end sub

Offline relsoft

  • DBF Aficionado
  • ******
  • Posts: 3303
  • Karma: 47
    • View Profile
Monkey sort?. LOL. Great algo though.
Challenge Trophies Won:

Offline Stonemonkey

  • Pentium
  • *****
  • Posts: 1315
  • Karma: 96
    • View Profile
from your OP:
Quote
great for non-z buffered sw 3d engines.

Not only for non z-buffered sw but z-buffered sw rendering can benefit hugely with fast sorting too.