With mipmaps you're calculating one resulting level from both deltaU and deltaV, so each pixel lies somewhere between two mip levels.
With ripmaps you select different maps for different deltaU,deltaV, so each pixel is essentially an average of four maps (and you probably don't want to do that in a software renderer).
However, you probably do perspective correction every "n" pixels and interpolate linearly in between.
So what you've got is one set of (deltaU,deltaV) for each span:
prev current next
+-------+-------+-------+
(du,dv) (du,dv) (du,dv)
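To make the per-span deltas concrete, here is a rough sketch of how they could be computed. It assumes the usual scheme of interpolating u/z, v/z and 1/z linearly along the scanline and dividing only at span boundaries. The function name and all parameters are hypothetical, not taken from any particular renderer.

```python
def span_deltas(uz0, vz0, iz0, duz, dvz, diz, span_len, num_spans):
    """Return one (du, dv) pair per span along a scanline.

    uz0, vz0, iz0 : u/z, v/z and 1/z at the first pixel (assumed inputs)
    duz, dvz, diz : their per-pixel gradients along the scanline
    span_len      : the "n" pixels between perspective-correct points
    """
    deltas = []
    # Perspective-correct texture coordinates at the start of the scanline.
    u_prev, v_prev = uz0 / iz0, vz0 / iz0
    for i in range(1, num_spans + 1):
        x = i * span_len
        iz = iz0 + diz * x
        # One divide per span boundary gives the exact (u, v) there.
        u = (uz0 + duz * x) / iz
        v = (vz0 + dvz * x) / iz
        # Inside the span, u and v are stepped linearly by these deltas.
        deltas.append(((u - u_prev) / span_len, (v - v_prev) / span_len))
        u_prev, v_prev = u, v
    return deltas
```

With diz = 0 (no perspective) every span gets the same deltas; with perspective they drift from span to span, which is what makes the ripmap selection below change along the scanline.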
From each set of (du,dv) you can extract one ripmap (the integer part of the deltas) and a blend factor for trilinear filtering (the fractional part).
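A minimal sketch of that split, under one assumption that is mine rather than the post's: the deltas are raw texels per pixel, so the level along each axis is log2 of the delta, and it is this log2 value that gets split into integer and fractional parts. The names rip_level and select_ripmap are hypothetical.

```python
import math

def rip_level(delta, max_level):
    """Split one texture delta into (integer level, blend fraction)."""
    # Assumption: delta is in texels per screen pixel, level = log2(delta).
    # Deltas below 1 (magnification) clamp to level 0.
    lod = math.log2(max(abs(delta), 1.0))
    lod = min(lod, float(max_level))
    level = int(lod)
    return level, lod - level

def select_ripmap(du, dv, max_u, max_v):
    """Return ((level_u, level_v), (frac_u, frac_v)) for one span."""
    level_u, frac_u = rip_level(du, max_u)
    level_v, frac_v = rip_level(dv, max_v)
    return (level_u, level_v), (frac_u, frac_v)
```

For example, du = 4 and dv = 1 selects the ripmap that is reduced twice in u and not at all in v.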
As long as two neighbouring sets of (du,dv) indicate the same ripmap, everything is fine and no further handling is needed.
Otherwise you have to subdivide the current span again and calculate where exactly *one* of the deltas reaches a new ripmap level (the point between delta[n] and delta[n+1] where the fractional part of the interpolated delta becomes 0).
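The subdivision point can be sketched as follows. This assumes the level value (lod) for each axis is interpolated linearly across the span, which is a simplification I am making for illustration; the function names are hypothetical.

```python
import math

def first_level_crossing(lod0, lod1):
    """Fraction t in [0, 1] of the span at which the level value,
    interpolated linearly from lod0 to lod1, first hits an integer
    boundary. Returns None if both ends select the same level."""
    if math.floor(lod0) == math.floor(lod1):
        return None
    if lod1 > lod0:
        boundary = math.floor(lod0) + 1   # next level up
    else:
        boundary = math.floor(lod0)       # bottom of the current level
    return (boundary - lod0) / (lod1 - lod0)

def split_point(lod_u0, lod_u1, lod_v0, lod_v1):
    """Earliest crossing along either axis, or None if no split is needed."""
    crossings = [
        t
        for t in (first_level_crossing(lod_u0, lod_u1),
                  first_level_crossing(lod_v0, lod_v1))
        if t is not None
    ]
    return min(crossings) if crossings else None
```

Splitting at the earliest crossing and repeating on the remainder keeps each sub-span inside a single ripmap along both axes.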
As only one level of anisotropy changes at a time in this scheme, you just have to interpolate between two ripmaps, which is essentially just like trilinear interpolation with mipmaps.
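A sketch of that final blend, assuming a hypothetical sample(level_u, level_v, u, v) callback that fetches a (bilinearly filtered) texel from one ripmap. Only the axis whose level is changing gets a second fetch.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t

def sample_between_ripmaps(sample, level_u, level_v, axis, frac, u, v):
    """Blend two neighbouring ripmaps along one axis.

    sample : hypothetical callback sample(level_u, level_v, u, v) -> value
    axis   : "u" or "v", the axis whose level is in transition
    frac   : fractional part of that axis' level, in [0, 1)
    """
    near = sample(level_u, level_v, u, v)
    if axis == "u":
        far = sample(level_u + 1, level_v, u, v)
    else:
        far = sample(level_u, level_v + 1, u, v)
    # Same cost as trilinear mipmapping: two fetches and one lerp.
    return lerp(near, far, frac)
```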