Dark Bit Factory & Gravity
PROGRAMMING => Other languages => ASM => Topic started by: Stonemonkey on September 10, 2016
-
Has anyone here done any ARM NEON assembly?
At the moment the inner loop of my gradient-filled triangles looks like this:
tri_loop:
vcvt.s32.f32 q1,q0 //convert 4 floats to ints
vmov r2,r3,s4,s5 //move r&g ints to arm registers
vmov r4,s6 //move b int to arm register
orr r3,r2,lsl #8 //shift red and or with green
orr r4,r3,lsl #8 //shift r&g and or with blue
str r4,[r0],#4 //write to pixel address and modify pointer
vadd.f32 q0,q2 //add colour deltas
cmp r0,r1 //compare pointer with final address
ble tri_loop //repeat if less or equal
But I'm wondering if there's a way to shuffle the bytes in q1, taking the low byte of each 32-bit lane and packing them together so the pixel can be written straight to memory, instead of moving the values into the ARM registers and doing the shifting and ORing.
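For reference, a minimal C sketch of what the convert, shift and OR sequence in that loop works out to per pixel. It assumes each channel stays in the range 0..255 and that the pixel layout is 0x00RRGGBB; the function name pack_rgb_float is hypothetical and only for illustration.

```c
#include <stdint.h>

/* Scalar model of one iteration of the float loop (illustrative only).
   The C cast truncates toward zero, like vcvt.s32.f32.
   Assumption: r, g, b are each within 0..255, otherwise a channel
   spills into its neighbour, just as it would in the assembly. */
static uint32_t pack_rgb_float(float r, float g, float b)
{
    uint32_t ri = (uint32_t)(int32_t)r;
    uint32_t gi = (uint32_t)(int32_t)g;
    uint32_t bi = (uint32_t)(int32_t)b;

    uint32_t rg = gi | (ri << 8);   /* orr r3,r2,lsl #8 */
    return bi | (rg << 8);          /* orr r4,r3,lsl #8 */
}
```

Calling pack_rgb_float(255.9f, 128.5f, 0.0f) gives 0x00FF8000 under these assumptions.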
-
Found a way. This is now the inner loop of the gradient triangle, using 16:16 fixed point:
triangle_loop:
vtbl.u8 d2,{d0,d1},d6 //shuffle bytes from d0/d1 to d2, d6=table
fsts s4,[r6] //write to pixel address
add r6,#4 //modify pixel pointer
vadd.s32 q0,q2 //add colour deltas
cmp r6,r7 //compare pixel address to final address
ble triangle_loop //repeat if less or equal
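The post doesn't give the contents of the table in d6, so here is a rough C model of how the byte shuffle could work, under some assumptions: the lanes of q0 hold r, g, b in that order as 16:16 fixed point, memory is little-endian, and the pixel is 0x00RRGGBB. With that layout the integer part of each channel is byte 2 of its lane, so the table would pick bytes 10, 6 and 2. VTBL returns 0 for any index outside the table, which gives the zero top byte for free. The names vtbl2_model and pack_rgb_fixed are hypothetical.

```c
#include <stdint.h>

/* Scalar model of VTBL with a two-register (16-byte) table:
   out[i] = table[index[i]], or 0 when index[i] is out of range. */
static void vtbl2_model(uint8_t out[8], const uint8_t table[16],
                        const uint8_t index[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (index[i] < 16) ? table[index[i]] : 0;
}

/* Pack four 16:16 lanes (r, g, b, unused) into one 0x00RRGGBB pixel. */
static uint32_t pack_rgb_fixed(const int32_t lanes[4])
{
    /* Assumed table contents for d6: blue, green, red integer bytes,
       then out-of-range indices that produce zero bytes. */
    static const uint8_t index[8] = { 10, 6, 2, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF };
    uint8_t table[16];
    uint8_t out[8];

    /* Lay the lanes out as little-endian bytes, as d0/d1 would hold them. */
    for (int lane = 0; lane < 4; lane++)
        for (int k = 0; k < 4; k++)
            table[4 * lane + k] = (uint8_t)((uint32_t)lanes[lane] >> (8 * k));

    vtbl2_model(out, table, index);

    /* fsts s4 stores the low 32 bits of d2, i.e. out[0..3]. */
    return (uint32_t)out[0] | ((uint32_t)out[1] << 8) |
           ((uint32_t)out[2] << 16) | ((uint32_t)out[3] << 24);
}
```

For example, lanes of 0x00FF8000, 0x00800000, 0 and 0 (255.5, 128.0 and 0.0 in 16:16) pack to 0x00FF8000. The fractional bits are simply dropped by the shuffle, so no separate convert step is needed.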