Catalase Website IndexJamesTechnoOptimisation •3D Optimisation •

Fast 3D Transform for 68000

Molecular Biologists may want to view 3D models of protein molecules on a computer screen and to rotate them to look at them from different viewpoints

Each atom of the molecule has some 3D coordinates, (x,y,z).  In standard implementations of molecular viewers, to convert to positions on a screen these (x,y,z ) coordinates are multiplied by a 3 by 3 rotation matrix R to get screen coordinates (X,Y,Z ).  We have:

X = R11 x + R12 y + R13 z
Y = R21 x + R22 y + R23 z
Z = R31 x + R32 y + R33 z

If we were applying perspective we would then scale X and Y down by the distance from the viewer, i.e. divide each of these by Z .  However, a non perspective 'orthogonal projection' is just fine for this kind of a molecular model.

The above with a direct implemntation takes nine multiplications and six additions per transformed atom's coordinates.  On a 68000 processor multiplications take 77 cycles and additions take 4 cycles, so each atom transformed takes at least 725 cycles.

There are several ways to reduce this work.  The matrix R can be decomposed into two 2D rotation matricese.  The calculation then requires eight multiplications and four additions taking just 632 cycles.  However, a much larger saving can be made on this problem.  The approach presented here for rapid rotation eliminates multiplication entirely from point transformation calculations.

The trick is to notice that most of the time the molecule is being rotated about a fixed axis.  We can use this to pre-prepare our coordinates.

A point moving in a unit circle has coordinates ( cos t, sin t, 0 ).  After being transformed by R we get:

X = R11cos t + R12sin t

Similar equations hold for Y and Z .  These are the parametric equations for an elliptical path on the screen - which is what a circle looks like in orthogonal perspective.

For a more general circle we need to be able to adjust the radius A and the position of the centre of the circle, B:

X = A(R11cos t + R12sin t + B

By some trigonometric magic this is equivalent to:

X = M(cos( t + θ)) + B

The cosine and sine waves have here been combined to create a single cosine wave with a new amplitude M and phase θ. We need to do this calculation for each coordinate, so there are three multiplications, six additions and three look ups of cosines from a prebuilt cosine table.  We only need ten bit accuracy on the cosines, so a table is perfectly O.K.  The 68000 code for this is:

Cycles
movem.w (a1)+,d0-d2  ;fetch m, theta and base value            (15)
muls    0(ao,d1),d0  ;replace d0 with m*cos(t+theta)           (77)
clr.w   d0           ;clear out insignificant bits             (04)
swap    d0           ;bring sig bits into lower word           (04)
move.w  d0,(a2)+     ;ship result out                          (05)
TOTAL: 109

Since the same calculation is also done for Y and Z the time taken is 327 cycles per atom.

Can we do better?

With some more trig the X equation is converted to:

X = 512( cos( t + θ + φ) + cos( t + θ - φ)) + B

The trick here is in choosing φ. If φ is zero the two cosine waves reinforce each other and the X has maximum possible variation as t varies.  We need this if 'M' was large.  As φ is increased towards 90o the two cosines become increasingly out of phase with each other, decreasing the variation in X as t varies.  We need this if 'M' was small.  The scale factor of 512 is in the equation because without it M would be restricted to the range -2 < M < 2.  The 512 scale factor is absorbed into the precomputed table of cosines.  The 68000 code now looks like this:

Cycles
movem.w (a1)+,d0-d2  ;fetch basevalue, theta+phi, theta-phi    (15)
move.w  d0,(a2)+     ;ship result out                          (05)
TOTAL:  42

Notice that there are no multiplications.  This method reduces the time taken per atom to 126 cycles, a near six fold reduction over the direct implementation.

It turns out that the calculations needed to get the φ's, θ's and B's are about twice as 'expensive' as the original matrix multiplication scheme.  Typically a molecule rotates for at least 30 frames about one axis before a new axis is chosen so the cost of computing these φ's, θ's and B's is paid back many times over.

The method was well suited to processors of the mid 80's with slow (or even software-only) multiply instructions.  At that time it made sense to trade multiply instructions for the relatively much faster table look up of cosines.  The technique is not well suited to pentium class processors.  On pentiums using a look up table for cosines, which will cause many cache misses, is an expensive alternative to multiplication.

However, the technique of pre-processing coordinate sets prior to rendering a sequence of frames from related viewpoints is still useful in current graphics programming.  With the right pre processing one can eliminate 'hidden objects' and set levels of object detail for a whole sequence of frames rather than calculate it on a per-frame basis.

Summary

For a 68000 processor a progression of optimisations was as follows:

 Direct implementation 725 Cycles Factored rotation matrix 632 Cycles M cos(phi) method 327 Cycles cos(phi+theta) + cos(phi-theta) method 126 Cycles

The near six fold saving is through recognising that a calculation can be done once on the coordinates that then makes significant savings on a sequence of subsequent frames.