atm it's just using a single cube being repositioned so I can't share the verts yet.
Trying to think of ways not to have to loop through every position in the 3d grid too.
It's much faster to fill only the relevant parts of the volume-grid using a bounding box for each sphere.
The box just needs to be big enough to make 1/distance insignificant to the test-threshold.
Since the boxes will overlap you have to make sure to visit every sub-cube only once when extracting the surface.
You can even use an "inner" bounding box for all distances which are definately bigger than the test-threshold. This area doesn't need any distance calculation (there can't be any surface anyway) and can be fill with a constant value instead (I never tried this, though and maybe it's more hassle than advantage).
There are also several approximations for 1/sqr which are sufficiently precise for this task, the most prominent is probably
this one and SSE's
RSQRTPS can even calculate 4 in parallel.
As you've already calculated 1/sqr of
(x-cx)^2 + (y-cy)^2 + (z-cz)^2 for the previous voxel, you can use that as a pretty good approximation for
(x-cx+1)^2 + (y-cy)^2 + (z-cz)^2 and just do one newton-iteration for refinement (this is not very precise near 0 but that area isn't very interesting anyway).
With your 2x2x2 appraoch you're now doing 8x the work.
When parsing the sub-cubes along the x-axis, you can always reuse the 4 left test-flags and just shift them to another position in the bitmask.
If you save the flags of the previous row you only need to update two flags.
And if you also save the flags of the last slice you only need to update a single flag.
Try to add some early-out-test if the flags signal a complete full (all bits set) or empty (all bits zero) sub-cube as it's nothing to do then.
You don't need face-connectivity information to calculate the normals.
You can get the normal-vector at each grid-point from the gradient of the volume field:
normal(x,y,z)= normalize(
grid(x+1, y, z ) - grid(x-1, y, z ),
grid(x, y+1, z ) - grid(x, y-1, z ),
grid(x, y, z+1) - grid(x, y, z-1)
);The normals can then be interpolated like the vertex-positions.