Charles Petzold

Taper Transforms with Matrix3DProjection: An Analytical Approach

July 31, 2009
Roscoe, N.Y.

The new Matrix3DProjection class in Silverlight 3 allows programmers to apply 3D transforms to 2D elements, but it also makes available non-affine 2D transforms, including a type of transform sometimes known as a "taper transform." In a blog entry last week I showed how to derive non-affine transforms from rectangles whose corners are moved to arbitrary coordinates. That's a rather interactive approach. Today I'd like to look at taper transforms in a more analytical manner.

Let's begin with the basics. A two-dimensional linear transform can be represented by a 2×2 matrix:

$$
\begin{bmatrix} \mathrm{M11} & \mathrm{M12} \\ \mathrm{M21} & \mathrm{M22} \end{bmatrix}
$$

I've identified the cells using property names of the Matrix structure defined in the Windows Presentation Foundation and Silverlight. You can use this matrix to transform points (x, y) to points (x', y') with standard matrix multiplication:

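$$
\begin{bmatrix} x & y \end{bmatrix}
\times
\begin{bmatrix} \mathrm{M11} & \mathrm{M12} \\ \mathrm{M21} & \mathrm{M22} \end{bmatrix}
=
\begin{bmatrix} x' & y' \end{bmatrix}
$$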

The transform formulas are:

x' = M11•x + M21•y
y' = M12•x + M22•y

With this 2×2 matrix transform, you can scale in the horizontal direction (by setting M11) or the vertical direction (M22), and you can perform rotation and shear by various combinations of the values of the four cells. The default matrix that performs no transformation has a diagonal of 1's:

$$
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
$$

But there's a problem here: Although you can scale, rotate, and shear, you cannot perform the type of transform known as translation, which simply shifts an object to another location on the 2D plane.

The mathematician August Ferdinand Möbius (1790–1868) realized that translation could be included in the 2D transform by adding an extra dimension to make what are called homogeneous coordinates. Basically, two-dimensional translation is equivalent to three-dimensional shear. (A more extensive discussion appears on pages 300-306 of my book 3D Programming for Windows.) The transform matrix looks like this (again, using property names of the WPF and Silverlight Matrix structure):

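$$
\begin{bmatrix}
\mathrm{M11} & \mathrm{M12} & 0 \\
\mathrm{M21} & \mathrm{M22} & 0 \\
\mathrm{OffsetX} & \mathrm{OffsetY} & 1
\end{bmatrix}
$$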

Translation factors named OffsetX and OffsetY have been added as a third row. When applying this transform matrix, a two-dimensional point (x, y) is first converted to a three-dimensional point (x, y, 1) on the XY plane where Z equals 1, and then that point is multiplied by the matrix:

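$$
\begin{bmatrix} x & y & 1 \end{bmatrix}
\times
\begin{bmatrix}
\mathrm{M11} & \mathrm{M12} & 0 \\
\mathrm{M21} & \mathrm{M22} & 0 \\
\mathrm{OffsetX} & \mathrm{OffsetY} & 1
\end{bmatrix}
=
\begin{bmatrix} x' & y' & 1 \end{bmatrix}
$$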

The 3D point that results from this calculation is also on the XY plane where Z equals 1, so the Z coordinate can simply be ignored. The transform formulas are:

x' = M11•x + M21•y + OffsetX
y' = M12•x + M22•y + OffsetY
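
The multiplication above is easy to try out numerically. What follows is a minimal Python sketch rather than Silverlight code; the helper name `row_times_matrix` and the sample matrix values are made up for illustration.

```python
def row_times_matrix(vec, matrix):
    """Multiply a row vector by a matrix (row-vector convention, as in the text)."""
    return [
        sum(vec[i] * matrix[i][j] for i in range(len(vec)))
        for j in range(len(matrix[0]))
    ]

# Hypothetical affine matrix: double the width, then shift by (10, 5).
# Rows are (M11, M12, 0), (M21, M22, 0), (OffsetX, OffsetY, 1).
affine = [
    [2, 0, 0],
    [0, 1, 0],
    [10, 5, 1],
]

# The 2D point (100, 50) becomes the homogeneous point (100, 50, 1).
result = row_times_matrix([100, 50, 1], affine)
```

The third component of `result` stays at 1 whatever point goes in, because the third column of the matrix is fixed at (0, 0, 1).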

This is the standard two-dimensional affine transform supported by the Windows Presentation Foundation and Silverlight. The values in the third column of the matrix are constants and cannot be changed. These values are necessary to keep the entire transform restricted to the XY plane. Consequently, the affine transform can never transform a square into anything other than a parallelogram.

But suppose you could change those values in the third column. What would happen? What would a non-affine transform look like?

Here's a hypothetical two-dimensional 3×3 matrix capable of non-affine transforms:

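$$
\begin{bmatrix}
\mathrm{M11} & \mathrm{M12} & \mathrm{M13} \\
\mathrm{M21} & \mathrm{M22} & \mathrm{M23} \\
\mathrm{OffsetX} & \mathrm{OffsetY} & \mathrm{M33}
\end{bmatrix}
$$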

Just as with the affine transform matrix, we can represent a two-dimensional point (x, y) as a three-dimensional point (x, y, 1) and multiply the point by that transform:

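$$
\begin{bmatrix} x & y & 1 \end{bmatrix}
\times
\begin{bmatrix}
\mathrm{M11} & \mathrm{M12} & \mathrm{M13} \\
\mathrm{M21} & \mathrm{M22} & \mathrm{M23} \\
\mathrm{OffsetX} & \mathrm{OffsetY} & \mathrm{M33}
\end{bmatrix}
=
\begin{bmatrix} x' & y' & z' \end{bmatrix}
$$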

Now we're in big trouble. We've broken out of two dimensions, or at least the XY plane where Z equals 1, as the transform formulas show:

x' = M11•x + M21•y + OffsetX
y' = M12•x + M22•y + OffsetY
z' = M13•x + M23•y + M33

This is a problem because we're still trying to work in two dimensions, and somehow we have to project that three-dimensional point back on the XY plane where Z equals 1. The standard solution (and perhaps the simplest) is to divide all three coordinates by z':

(x', y', z') → (x'/z', y'/z', z'/z') → (x'/z', y'/z', 1)

Now we're back on the XY plane where Z equals 1, but with a potential problem: Division is involved, so there might well be singularities where z' equals zero, which would result in infinite coordinates. This is what makes the transform "non-affine": an affine transform always maps finite points to finite points.

Try to get an intuitive feel for the effect of the third column on the transformed coordinates. By default, M33 is 1; decreasing that value toward zero makes everything uniformly larger, and increasing it makes everything uniformly smaller. That's not very interesting, and usually M33 is kept at 1 for convenience. When M33 equals 1, a positive value of M13 causes the transformed X and Y values to shrink as X gets larger, so that the image tapers in the positive X direction. Similarly, a positive value of M23 causes the transformed X and Y values to shrink as Y gets larger. Negative values of M13 and M23 cause the transformed values to grow instead, for a reverse tapering.
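
These effects can be checked with a few lines of Python. The sketch below applies the hypothetical 3×3 matrix and the division by z'; the function names `project_2d` and `identity_with` and the sample values are my own inventions for illustration, not part of any Silverlight API.

```python
def project_2d(point, matrix):
    """Apply a 3x3 non-affine transform to (x, y), then divide by z'."""
    x, y = point
    (m11, m12, m13), (m21, m22, m23), (offset_x, offset_y, m33) = matrix
    x_new = m11 * x + m21 * y + offset_x
    y_new = m12 * x + m22 * y + offset_y
    z_new = m13 * x + m23 * y + m33
    if z_new == 0:
        # The singularity mentioned in the text: the point goes to infinity.
        raise ZeroDivisionError("z' is zero: the point maps to infinity")
    return (x_new / z_new, y_new / z_new)

def identity_with(m13=0.0, m23=0.0, m33=1.0):
    """Identity matrix with only the third column changed."""
    return [[1.0, 0.0, m13], [0.0, 1.0, m23], [0.0, 0.0, m33]]

point = (4.0, 2.0)
larger = project_2d(point, identity_with(m33=0.5))      # M33 below 1 enlarges
smaller = project_2d(point, identity_with(m33=2.0))     # M33 above 1 shrinks
tapered = project_2d(point, identity_with(m13=0.25))    # z' = 0.25*4 + 1 = 2
reverse = project_2d(point, identity_with(m13=-0.125))  # z' = -0.125*4 + 1 = 0.5
```

With M13 set to -0.25, the same point gives a z' of zero, and the function raises an error instead of returning infinite coordinates.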

Silverlight 3 has not introduced a 3×3 matrix where the values of the third column can be set for non-affine transforms. However, Silverlight 3 has lifted the Matrix3D structure from WPF, and made it available for creating Matrix3DProjection objects, which can then be set to the Projection property of 2D Silverlight elements.

As you've seen, 2D graphics requires 3D homogeneous coordinates and a 3×3 matrix to allow translation to be defined along with the linear transforms. Analogously, translation in three-dimensional space is equivalent to skewing in four-dimensional space, so the 3D transform is a 4×4 matrix. These are the property names of the Matrix3D structure:

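$$
\begin{bmatrix}
\mathrm{M11} & \mathrm{M12} & \mathrm{M13} & \mathrm{M14} \\
\mathrm{M21} & \mathrm{M22} & \mathrm{M23} & \mathrm{M24} \\
\mathrm{M31} & \mathrm{M32} & \mathrm{M33} & \mathrm{M34} \\
\mathrm{OffsetX} & \mathrm{OffsetY} & \mathrm{OffsetZ} & \mathrm{M44}
\end{bmatrix}
$$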

A point in 3D space (x, y, z) is represented as a 4D point (x, y, z, 1) for multiplication by the matrix:

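$$
\begin{bmatrix} x & y & z & 1 \end{bmatrix}
\times
\begin{bmatrix}
\mathrm{M11} & \mathrm{M12} & \mathrm{M13} & \mathrm{M14} \\
\mathrm{M21} & \mathrm{M22} & \mathrm{M23} & \mathrm{M24} \\
\mathrm{M31} & \mathrm{M32} & \mathrm{M33} & \mathrm{M34} \\
\mathrm{OffsetX} & \mathrm{OffsetY} & \mathrm{OffsetZ} & \mathrm{M44}
\end{bmatrix}
=
\begin{bmatrix} x' & y' & z' & w' \end{bmatrix}
$$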

Notice that the fourth dimension is represented by the letter W because we've run out of letters after Z. The transform formulas are:

x' = M11•x + M21•y + M31•z + OffsetX
y' = M12•x + M22•y + M32•z + OffsetY
z' = M13•x + M23•y + M33•z + OffsetZ
w' = M14•x + M24•y + M34•z + M44

Points are projected back into 3D space with the following process:

(x', y', z', w') → (x'/w', y'/w', z'/w', w'/w') → (x'/w', y'/w', z'/w', 1)
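
The same calculation in four dimensions can be sketched in Python as follows. The function name `transform_3d` is hypothetical, and the M34 value of 0.5 is chosen only so that the arithmetic is easy to follow; it says nothing about the sign or magnitude a real camera matrix would use.

```python
def transform_3d(point, m):
    """Multiply (x, y, z, 1) by a 4x4 matrix, then divide by w'."""
    x, y, z = point
    vec = (x, y, z, 1.0)
    # Row vector times matrix: one sum per column of the matrix.
    x_new, y_new, z_new, w_new = (
        sum(vec[i] * m[i][j] for i in range(4)) for j in range(4)
    )
    if w_new == 0:
        raise ZeroDivisionError("w' is zero: the point maps to infinity")
    return (x_new / w_new, y_new / w_new, z_new / w_new)

# Identity matrix except for M34 (third row, fourth column).
m = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]

near = transform_3d((100.0, 100.0, 0.0), m)  # w' = 1, so nothing changes
far = transform_3d((100.0, 100.0, 2.0), m)   # w' = 0.5*2 + 1 = 2
```

Because w' grows with z in this example, the point with the larger Z coordinate ends up with smaller X and Y coordinates after the division.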

Non-affine transforms are required in 3D graphics for perspective effects: Objects seem to get smaller as they recede from the viewer's vantage point. This is a three-dimensional taper transform. Additionally, at some point prior to rendering, all the Z coordinates are collapsed for the two-dimensional video display or printer.

In Silverlight 3, we're not really dealing with 3D space. There aren't even Point3D and Vector3D structures to help us manipulate points in the third dimension. We're really only transforming 2D points of UIElement derivatives. Matrix3D is a bit of overkill for this purpose, but it's a familiar entity (at least to WPF programmers). Conceptually, the two-dimensional coordinate point (x, y) is treated as a four-dimensional coordinate point (x, y, 0, 1) for the matrix multiplication:

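$$
\begin{bmatrix} x & y & 0 & 1 \end{bmatrix}
\times
\begin{bmatrix}
\mathrm{M11} & \mathrm{M12} & \mathrm{M13} & \mathrm{M14} \\
\mathrm{M21} & \mathrm{M22} & \mathrm{M23} & \mathrm{M24} \\
\mathrm{M31} & \mathrm{M32} & \mathrm{M33} & \mathrm{M34} \\
\mathrm{OffsetX} & \mathrm{OffsetY} & \mathrm{OffsetZ} & \mathrm{M44}
\end{bmatrix}
=
\begin{bmatrix} x' & y' & z' & w' \end{bmatrix}
$$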

Notice that the third row of the matrix has no effect on the result. The transform formulas are consequently somewhat simpler:

x' = M11•x + M21•y + OffsetX
y' = M12•x + M22•y + OffsetY
z' = M13•x + M23•y + OffsetZ
w' = M14•x + M24•y + M44

Points are projected back into 3D space normally by dividing by w':

(x', y', z', w') → (x'/w', y'/w', z'/w', w'/w') → (x'/w', y'/w', z'/w', 1)

You might assume that the Z coordinate is simply ignored for rendering these points on the two-dimensional surface of the video display, but that is not the case. My experimentation reveals that any point where the Z coordinate is less than zero, or greater than one, is clipped. I will try to explore this problem and solutions in future blog entries.

For two-dimensional taper transforms, you can leave cells in the third column (as well as the third row) at their default values so the matrix transform looks like this:

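$$
\begin{bmatrix} x & y & 0 & 1 \end{bmatrix}
\times
\begin{bmatrix}
\mathrm{M11} & \mathrm{M12} & 0 & \mathrm{M14} \\
\mathrm{M21} & \mathrm{M22} & 0 & \mathrm{M24} \\
0 & 0 & 1 & 0 \\
\mathrm{OffsetX} & \mathrm{OffsetY} & 0 & \mathrm{M44}
\end{bmatrix}
=
\begin{bmatrix} x' & y' & 0 & w' \end{bmatrix}
$$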

Because Z coordinates aren't involved at all in this calculation, you can visualize the transform as a non-affine 3×3 matrix like the one I showed earlier, just with slightly different property names for the last column:

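$$
\begin{bmatrix}
\mathrm{M11} & \mathrm{M12} & \mathrm{M14} \\
\mathrm{M21} & \mathrm{M22} & \mathrm{M24} \\
\mathrm{OffsetX} & \mathrm{OffsetY} & \mathrm{M44}
\end{bmatrix}
$$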

The transform formulas are:

x' = M11•x + M21•y + OffsetX
y' = M12•x + M22•y + OffsetY
w' = M14•x + M24•y + M44

The final transformed point is projected back onto the plane from which it came by dividing by w':

(x', y', w') → (x'/w', y'/w', w'/w') → (x'/w', y'/w', 1)
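
To see that the 4×4 matrix and the reduced 3×3 matrix really give the same result, here is a small Python sketch. All three function names and the sample matrix are hypothetical, and the check for a zero w' is left out to keep the example short.

```python
def project_with_matrix3d(point, m):
    """Treat (x, y) as (x, y, 0, 1), multiply by a 4x4 matrix, divide by w'."""
    x, y = point
    vec = (x, y, 0.0, 1.0)
    x_new, y_new, _z_new, w_new = (
        sum(vec[i] * m[i][j] for i in range(4)) for j in range(4)
    )
    return (x_new / w_new, y_new / w_new)

def reduce_to_3x3(m):
    """Drop the third row and third column of the 4x4 matrix."""
    keep = (0, 1, 3)
    return [[m[r][c] for c in keep] for r in keep]

def project_with_3x3(point, m3):
    """Treat (x, y) as (x, y, 1), multiply by a 3x3 matrix, divide by w'."""
    x, y = point
    vec = (x, y, 1.0)
    x_new, y_new, w_new = (
        sum(vec[i] * m3[i][j] for i in range(3)) for j in range(3)
    )
    return (x_new / w_new, y_new / w_new)

# Sample values only: M11 = 2 and M14 = 0.005, everything else at its default.
matrix3d = [
    [2.0, 0.0, 0.0, 0.005],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

full = project_with_matrix3d((200.0, 100.0), matrix3d)
reduced = project_with_3x3((200.0, 100.0), reduce_to_3x3(matrix3d))
```

Both calls divide by the same w' (here 0.005•200 + 1, or 2), so `full` and `reduced` hold the same pair of coordinates.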

For simplicity (and to help avoid singularities), I will keep M44 at its default value of 1, but this isn't a requirement.

Let's begin with a UIElement derivative, such as an Image element containing a bitmap:

As with RenderTransform, the Projection transform is relative to the upper-left corner of the element to which it's applied. For purposes of calculating the transformed points, the upper-left corner of the element is (0, 0). The element has a width (which I'll refer to as W) and a height (H). The upper-right corner is (W, 0), the lower-left corner is (0, H), and the lower-right corner is (W, H).

Suppose you want to define a taper transform that causes the right side of the image to be half its untransformed height, and you want the top edge of the image to remain horizontal. In other words, when x equals 0, you want y to be unchanged, but when x equals W, you want values of y to be halved. That means w' should be 2 when x equals W. With M24 at 0 and M44 at 1, w' equals M14•W + 1, so M14 should be 1/W:

$$
\begin{bmatrix} 1 & 0 & 1/W \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

If you use this Matrix3D with the Matrix3DProjection class applied to the Projection property of the Image element, you'll get:

It's actually not quite what I envisaged. I wanted the width of the Image to remain the same, but here it's being halved just like the height at the right side. This can be corrected by setting M11:

$$
\begin{bmatrix} 2 & 0 & 1/W \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

And now the image is correct:

If you want the tapered side to be 1/3 of its normal height, you can set M14 to 2/W, and M11 to 3:

$$
\begin{bmatrix} 3 & 0 & 2/W \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

And here it is:

If you prefer to taper at the top rather than the bottom (and let's go back to a right side that is half its normal height), the matrix is a little more complex. The value of M14 is still 1/W, and M11 is still 2, but a skewing factor must be introduced to shift the thing down:

$$
\begin{bmatrix} 2 & H/W & 1/W \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

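Before the division by w', y' equals y when x is zero, but y + H when x is W. After dividing by w' (which is 2 at the right edge), the top-right corner lands at H/2 and the bottom-right corner stays at H: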

It's also possible to taper towards the center of the right side with a matrix that looks much the same but with a slightly different skew value:

$$
\begin{bmatrix} 2 & H/(2W) & 1/W \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

Here it is:

You can generalize all the formulas for the matrix cells, and then derive similar matrices for tapering on the bottom, top, or left sides in the three variations, or you can use a class I've written called TaperTransform. This class has the following properties:

TaperFraction of type double: Normally set to a value greater than 0 but less than 1, although values can be greater than 1 for reverse tapering.
TaperSide, a member of the TaperSide enumeration: Left, Top, Right, Bottom.
TaperCorner, a member of the TaperCorner enumeration: LeftOrTop, RightOrBottom, or Both.
TargetWidth of type double: The width of the target element.
TargetHeight of type double: The height of the target element.
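
The three right-side matrices derived earlier follow one pattern, which can be sketched in Python. This is not the TaperTransform class and makes no claim about how that class is implemented. The function names, the `fixed_edge` parameter, and the generalization to an arbitrary fraction are my own assumptions, worked out from the requirement that w' equal 1/fraction at the right edge.

```python
def right_taper_matrix(width, height, fraction, fixed_edge="top"):
    """Reduced 3x3 matrix for a taper on the right side.

    Rows are (M11, M12, M14), (M21, M22, M24), (OffsetX, OffsetY, M44).
    fixed_edge is "top" or "bottom" for the edge that stays horizontal,
    or "none" to taper towards the center of the right side.
    """
    grow = 1.0 / fraction          # value w' must reach when x equals width
    m14 = (grow - 1.0) / width     # so that M14*width + 1 equals grow
    m11 = grow                     # restores the full width after the division
    if fixed_edge == "top":
        m12 = 0.0
    elif fixed_edge == "bottom":
        m12 = height * (grow - 1.0) / width
    elif fixed_edge == "none":
        m12 = height * (grow - 1.0) / (2.0 * width)
    else:
        raise ValueError("fixed_edge must be 'top', 'bottom', or 'none'")
    return [[m11, m12, m14], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def project(point, m):
    """Multiply (x, y, 1) by the 3x3 matrix and divide by w'."""
    x, y = point
    x_new = m[0][0] * x + m[1][0] * y + m[2][0]
    y_new = m[0][1] * x + m[1][1] * y + m[2][1]
    w_new = m[0][2] * x + m[1][2] * y + m[2][2]
    return (x_new / w_new, y_new / w_new)

# Right-hand corners of a 200-by-100 element, tapered to half height.
W, H = 200.0, 100.0
top_fixed = [project(p, right_taper_matrix(W, H, 0.5, "top")) for p in [(W, 0.0), (W, H)]]
bottom_fixed = [project(p, right_taper_matrix(W, H, 0.5, "bottom")) for p in [(W, 0.0), (W, H)]]
centered = [project(p, right_taper_matrix(W, H, 0.5, "none")) for p in [(W, 0.0), (W, H)]]
```

For a fraction of 1/2 the function reproduces the cells shown earlier (M11 of 2, M14 of 1/W, and M12 of 0, H/W, or H/(2W)). To use such a matrix with Matrix3DProjection, the nine values would go into the correspondingly named Matrix3D properties, with the third row and third column left at their defaults.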