Charles Petzold



Rudimentary 3D on the 2D HTML Canvas

September 29, 2024
New York, N.Y.

I’ve been working on a graphical web project that requires an interactive roll of a pair of dice. Dice are cubes and would be most convincingly rendered with 3D graphics. But because these dice are only a tiny part of this project, I was reluctant to call out the big guns. I didn’t want to wrestle with the low-level WebGL to do 3D graphics or even a friendlier WebGL wrapper such as three.js.

Moreover, most of the project involving the dice uses conventional HTML5 Canvas graphics, and you can’t mix those 2D graphics and WebGL on the same canvas element. (WebGL supports 2D graphics as well as 3D but it’s a whole different animal.) For these reasons, I felt I could simulate 3D dice with much simpler 2D graphics and a little math.

I started with a wireframe display of a cube:

[Interactive demo: a wireframe cube with three rotation sliders, initially set to 15, 30, and 45 degrees.]

You can use the three sliders to rotate the cube around three orthogonal axes: the horizontal X axis, the vertical Y axis, and the Z axis that conceptually extends out of the screen. Because these rotations are compounded, the effects are mathematically rather complex and might be puzzling.

This is a familiar optical illusion known as the Necker Cube. Foreground and background are ambiguous, and sometimes as you’re rotating the cube, foreground and background seem to switch back and forth.

This cube is rendered using simple 2D graphics available on the HTML5 Canvas element: each face is a square path constructed with moveTo and lineTo calls and rendered with stroke. The 3D illusion is accomplished entirely by positioning the points in the right place, and that’s done with matrix algebra.

The CubeWireFrame.js file contains the code to define and render the cube. The definition of the 3D vertices of the cube at the start of the CubeWireFrame constructor makes use of DOMPoint objects, which can accommodate either 2D (x, y) points or 3D (x, y, z) points. Conceptually, the cube is centered on the point (0, 0, 0) and extends 1 unit in all dimensions. I use a right-hand coordinate system, which means that positive values of the Z axis extend out of the screen toward the viewer. (If the index finger of your right hand points in the direction of increasing X values, and your middle finger points to increasing Y values, then your thumb points to the direction of increasing Z values.) The four points that define each face of the cube are given in a clockwise order. (This is important later.)
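Here’s a rough sketch of how that vertex data might be organized (the names and groupings are illustrative, not the actual CubeWireFrame code):

```js
// Sketch: a cube of side 2 centered at the origin in right-hand coordinates.
const v = [
    new DOMPoint(-1,  1,  1),   // 0: left,  top,    front
    new DOMPoint( 1,  1,  1),   // 1: right, top,    front
    new DOMPoint( 1, -1,  1),   // 2: right, bottom, front
    new DOMPoint(-1, -1,  1),   // 3: left,  bottom, front
    new DOMPoint(-1,  1, -1),   // 4: left,  top,    back
    new DOMPoint( 1,  1, -1),   // 5: right, top,    back
    new DOMPoint( 1, -1, -1),   // 6: right, bottom, back
    new DOMPoint(-1, -1, -1)    // 7: left,  bottom, back
];

// Six faces, each listing its four corners in clockwise order
// as seen from outside the cube.
const faces = [
    [v[0], v[1], v[2], v[3]],   // front  (z = +1)
    [v[5], v[4], v[7], v[6]],   // back   (z = -1)
    [v[4], v[5], v[1], v[0]],   // top    (y = +1)
    [v[3], v[2], v[6], v[7]],   // bottom (y = -1)
    [v[4], v[0], v[3], v[7]],   // left   (x = -1)
    [v[1], v[5], v[6], v[2]]    // right  (x = +1)
];
```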

Four private properties are defined prior to the constructor and are exposed through public getters and setters. The Render method in the CubeWireFrame class draws the cube. This job is facilitated by a DOMMatrix object, which can represent a 4-by-4 transform matrix suitable for 3D graphics. The matrix is built through calls to rotateSelf and scaleSelf. (Alternatively, rotate and scale calls could be used to create a new matrix with each call.) However, this matrix isn’t used in any API calls. I could have defined my own 4-by-4 matrix class and used that. Instead, each of the 3D points in the cubeVertices array is subjected to this 3D transform by a call to the transformPoint method of DOMMatrix. These are then converted to 2D points for the drawing calls by simply ignoring the Z coordinate and using the negative of the Y coordinate. (In 3D, positive values of Y go up; in 2D, they go down.)
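A minimal sketch of that approach, assuming a 2D context whose origin has already been translated to the center of the canvas, and using the illustrative faces array from above:

```js
// Sketch: one 3D transform for the whole cube, then each face becomes
// a 2D stroked path. (Illustrative; not the actual Render method.)
function renderWireframe(ctx, faces, xRotate, yRotate, zRotate, scale) {
    // rotateSelf takes rotations in degrees about the X, Y, and Z axes;
    // scaleSelf blows the unit cube up to pixel dimensions.
    const matrix = new DOMMatrix()
        .rotateSelf(xRotate, yRotate, zRotate)
        .scaleSelf(scale, scale, scale);

    for (const face of faces) {
        // Run every 3D vertex through the transform.
        const pts = face.map(p => matrix.transformPoint(p));

        // Flatten to 2D by ignoring Z and negating Y
        // (3D Y goes up; canvas Y goes down).
        ctx.beginPath();
        ctx.moveTo(pts[0].x, -pts[0].y);
        for (let i = 1; i < pts.length; i++) {
            ctx.lineTo(pts[i].x, -pts[i].y);
        }
        ctx.closePath();
        ctx.stroke();
    }
}
```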

All the slider logic is handled in the DieDemo class in the DieDemo.js file, which is responsible for obtaining the graphics context and calling the Render method in the CubeWireFrame class. The DieDemo class is instantiated in the first part of the Demos3D.js file, which instantiates everything else in this blog entry.

But the goal here is to draw a die rather than a wireframe cube. Each face must be rendered with a familiar dot pattern. And at any time, not all six faces should be displayed: Any face of the die that is not oriented towards the viewer should not be drawn because it can’t be seen.

For drawing the six faces of the die, I had a few choices: The fillStyle property of the HTML Canvas drawing context is customarily set to a solid color or to a linear or radial gradient, but it can also be set to a CanvasPattern object created by the createPattern method. The source of the pattern can be a bitmap, an SVG image, a video frame, or an off-screen canvas element created specifically for this purpose. I chose this final option, which allowed me to draw the six faces of the die with familiar 2D graphics calls.
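A minimal sketch of the off-screen-canvas idea for a single face, assuming ctx is the on-screen drawing context (the actual Die3D code organizes this differently):

```js
// Sketch: draw the "one" face of a die on an off-screen canvas,
// then wrap that canvas in a CanvasPattern.
const faceCanvas = document.createElement('canvas');
faceCanvas.width = faceCanvas.height = 128;
const faceCtx = faceCanvas.getContext('2d');

// Background and black outline so adjacent faces remain distinct.
faceCtx.fillStyle = 'white';
faceCtx.fillRect(0, 0, 128, 128);
faceCtx.strokeStyle = 'black';
faceCtx.strokeRect(0, 0, 128, 128);

// A single centered dot.
faceCtx.fillStyle = 'black';
faceCtx.beginPath();
faceCtx.arc(64, 64, 12, 0, 2 * Math.PI);
faceCtx.fill();

// The off-screen canvas becomes a fill pattern for the on-screen context.
ctx.fillStyle = ctx.createPattern(faceCanvas, 'no-repeat');
```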

As you rotate the wireframe cube, you’ll notice that each face stops being a square and instead becomes a parallelogram. This means that any image used for a die face has to be scaled and skewed. Fortunately the HTML5 Canvas supports the standard 2D affine transform that allows skewing. This type of transform is represented by a 3-by-3 matrix multiplication:

$$\begin{bmatrix} x & y & 1 \end{bmatrix} \times \begin{bmatrix} a & b & 0 \\ c & d & 0 \\ e & f & 1 \end{bmatrix} = \begin{bmatrix} x' & y' & 1 \end{bmatrix}$$

This results in the following transform formulas:

$$x' = ax + cy + e \qquad y' = bx + dy + f$$

You can define such a 2D matrix using the constructor of the same DOMMatrix object that I used for the 3D transforms. That matrix is then passed to the setTransform method of the object returned from createPattern to transform a pattern in addition to any transforms set on the graphics context. In the general case, a, b, c, d, e, and f must be obtained by solving two sets of three simultaneous equations that map three source points to their corresponding transformed points.

You only need to solve for three sets of points because the fourth comes along for the ride.

However, in the case where you’re using an entire bitmap or off-screen canvas for your pattern, the mapping simplifies considerably because the source coordinates are mostly zero. If the image is w pixels wide and h pixels high, its corners (0, 0), (w, 0), (w, h), and (0, h) map to the transformed points (x0, y0), (x1, y1), (x2, y2), and (x3, y3).

Using the upper-left, upper-right, and lower-left points, the simultaneous equations crumble into something almost trivial and you can derive:

$$a = \frac{x_1 - x_0}{w} \qquad b = \frac{y_1 - y_0}{w} \qquad c = \frac{x_3 - x_0}{h} \qquad d = \frac{y_3 - y_0}{h} \qquad e = x_0 \qquad f = y_0$$
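In code, a hypothetical helper (not the actual blog code) could compute those six values from the three transformed corners of a w-by-h image and hand them to the pattern:

```js
// Sketch: map an entire w-by-h image onto a parallelogram whose upper-left,
// upper-right, and lower-left corners land at (x0, y0), (x1, y1), and (x3, y3).
function setPatternTransform(pattern, w, h, x0, y0, x1, y1, x3, y3) {
    const a = (x1 - x0) / w;
    const b = (y1 - y0) / w;
    const c = (x3 - x0) / h;
    const d = (y3 - y0) / h;
    const e = x0;
    const f = y0;

    // A six-element array gives DOMMatrix the 2D affine components in
    // a, b, c, d, e, f order; the pattern combines this with any
    // transform already set on the graphics context.
    pattern.setTransform(new DOMMatrix([a, b, c, d, e, f]));
}
```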

Another issue is that surfaces in 3D space conceptually have both a front and a back. For the six faces of a die, the front of each face is the outside of the cube and the back is the inside. These images of the die dots must be oriented on the front side. Moreover, only those faces with front sides facing the user (either fully or partially) should be drawn.

The orientation of a surface in 3D space is described by something called a surface normal. This is a 3D vector that is orthogonal to the surface; in the case of a face of a cube, the normal points outward from the face. In my simple coordinate system, if the Z coordinate of the normal is positive, then the normal is conceptually pointing out of the screen and the viewer should be able to see that surface.

This normal can be easily calculated by the vector cross product. The cross product of two 3D vectors A and B is symbolized as A × B. The result is a third vector that is orthogonal to both A and B. For a right-hand coordinate system, the direction of this vector is given by another right-hand rule: Curve the fingers of your right hand to sweep from A to B. The thumb points in the direction of the cross product. The cross product is not commutative.

Because I thought I might need some additional vector functions, I defined a Vector3 class in the Vector3.js file. This is not a complete Vector3 class but has some essentials that I use in the blog entry.
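The essentials might look something like this sketch (not the actual Vector3.js):

```js
// Sketch of a minimal Vector3 with the operations used in this entry.
class Vector3 {
    constructor(x, y, z) {
        this.x = x; this.y = y; this.z = z;
    }
    // Cross product: orthogonal to both vectors, direction by the right-hand rule.
    cross(v) {
        return new Vector3(
            this.y * v.z - this.z * v.y,
            this.z * v.x - this.x * v.z,
            this.x * v.y - this.y * v.x);
    }
    // Dot product, used later to get the angle between two vectors.
    dot(v) {
        return this.x * v.x + this.y * v.y + this.z * v.z;
    }
    get length() {
        return Math.sqrt(this.dot(this));
    }
}
```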

Here’s the result:

[Interactive demo: the die with three rotation sliders (15, 30, and 45 degrees) and a readout of the current die value, initially 1.]

Notice that the value of the die is also displayed.

The Die3D class that implements this graphic extends the CubeWireFrame class and is defined in the Die3D.js file. The constructor is devoted to creating six canvas objects on which are rendered the six standard faces of a die. Because each of these faces has the same background color, each of these images is outlined in black to make the faces distinct. Otherwise the background of the faces would blend into each other and you wouldn’t be able to see the edges.

The Render method loops through the cube vertices and calculates a cross product from the left and top sides of each face. This is the normal vector pointing outward from the face. The surface is only visible if the Z coordinate of this normal is positive. By keeping track of the maximum Z coordinate, it’s also possible to determine which face is oriented most directly towards the viewer. This is displayed as the Die Value. The Render method creates a path based on the 3D transformed vertex points and uses those transformed points to calculate a matrix to transform the pattern as described above.
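A rough sketch of that test, reusing the illustrative faces array, a DOMMatrix built as above, and the Vector3 sketch (the actual Render method differs in detail):

```js
// Sketch: cull back faces and track the face pointing most directly at the viewer.
let dieValue = 0;
let maxZ = -Infinity;

faces.forEach((face, index) => {
    const pts = face.map(p => matrix.transformPoint(p));

    // Vectors along the top and left edges of the transformed face.
    const top  = new Vector3(pts[1].x - pts[0].x, pts[1].y - pts[0].y, pts[1].z - pts[0].z);
    const left = new Vector3(pts[3].x - pts[0].x, pts[3].y - pts[0].y, pts[3].z - pts[0].z);
    const normal = left.cross(top);     // points out of the face

    if (normal.z <= 0) return;          // back face: not visible, skip it

    if (normal.z > maxZ) {              // most nearly facing the viewer
        maxZ = normal.z;
        dieValue = index + 1;           // assumes faces are stored in 1-to-6 order
    }

    // ... build the 2D path and fill it with the face's transformed pattern ...
});
```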

This is truly “rudimentary” 3D, as the title of this blog entry indicates. It would be possible to use this same technique to display other types of polyhedra, but only convex polyhedra. If one face partially obscures another face, then rather complex clipping would have to be implemented. Similar problems would arise when displaying multiple overlapping polyhedra. This is stuff that a 3D system like WebGL handles automatically.

Also, there’s no perspective. Objects in the background are the same size as objects in the foreground. Perspective in 3D is accomplished through a camera transform, but perspective has the effect of transforming rectangles not to parallelograms but to irregular quadrilaterals, sometimes with infinite dimensions, in which case it would not be possible to define a 2D transform to stretch the patterns to cover the die faces.

There’s also no concept of light providing different levels of illumination of the faces of the die. But that’s possible to implement using the surface normals. For example, here’s a dodecahedron whose 12 faces are illuminated based on a hypothetical light source from the upper left:

Click the Animate button to see it spin and to watch the sides change shade based on their orientation.

This job was facilitated by some code I wrote for my book 3D Programming for Windows: Three-Dimensional Graphics Programming for the Windows Presentation Foundation (Microsoft Press, 2008). In addition to the demonstration programs for the book, I also created a 3D media library, all of which is available in this ZIP file. Unfortunately, many of these demos are standalone XAML files, which no longer run in the browser, and even the WPF EXE files don’t seem to be running properly. But the media library includes definitions of a bunch of primitive 3D objects, including the dodecahedron.

In 3D programming, three-dimensional objects are constructed from triangles rather than the squares that I used for the die. Triangles are the simplest form of polygon and are guaranteed to lie entirely on a plane. Each of the pentagonal faces of the dodecahedron is constructed from five triangles that share a point in the center of the pentagon. A collection of such triangles (along with other information) is known as a mesh. In WPF 3D, the MeshGeometry3D class defines the following properties:

  • Positions, a collection of all the 3D vertices of the triangles in the object
  • Normals, a corresponding collection of the 3D surface normals for each vertex
  • TextureCoordinates, a corresponding collection of 2D coordinates for displaying 2D images (such as a bitmap) on the triangles
  • TriangleIndices, a collection of groups of three indices into the other three collections
The equivalents of these are common in other 3D programming environments.

For my WPF 3D book, I created a bunch of mesh classes to generate these mesh geometries, and for this exercise, I converted some of these classes from C# to JavaScript, simplifying them in the process: DodecahedronMesh derives from PolyhedronMeshBase, which derives from FlatSurfaceMeshBase, which derives from MeshGeneratorBase, which originally derived from the WPF Animatable class but into which I incorporated the four property definitions from MeshGeometry3D.
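For orientation, a generated mesh on the JavaScript side might simply carry those four collections (an illustrative shape, not the actual classes):

```js
// Sketch: the data a mesh generator produces.
const mesh = {
    positions: [],           // a DOMPoint per vertex
    normals: [],             // a Vector3 per vertex
    textureCoordinates: [],  // a 2D point (u, v) per vertex
    triangleIndices: []      // three indices per triangle into the arrays above
};
```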

The DodecahedronDemo class instantiates the DodecahedronMesh class and handles the input event from the checkbox to initiate a requestAnimationFrame animation to increase the value of the rotationAngle field. The render method uses that value along with three rotationAxis constants to create a matrix transform implementing an axis-angle rotation.

The render method then loops through all triangle indices three at a time. These are used to index and transform both the mesh positions that define the triangles and the normals. The transformed positions are used to draw each triangle. The normals are used for lighting effects.
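A rough sketch of that loop, assuming the mesh shape sketched above with DOMPoint positions and a DOMMatrix and 2D context as in the earlier sketches (not the actual render code):

```js
// Sketch: walk the triangle indices three at a time and draw each triangle.
for (let i = 0; i < mesh.triangleIndices.length; i += 3) {
    const i0 = mesh.triangleIndices[i];
    const i1 = mesh.triangleIndices[i + 1];
    const i2 = mesh.triangleIndices[i + 2];

    // Transform the three positions (the normals get transformed similarly
    // and feed the lighting calculation).
    const p0 = matrix.transformPoint(mesh.positions[i0]);
    const p1 = matrix.transformPoint(mesh.positions[i1]);
    const p2 = matrix.transformPoint(mesh.positions[i2]);

    ctx.beginPath();
    ctx.moveTo(p0.x, -p0.y);
    ctx.lineTo(p1.x, -p1.y);
    ctx.lineTo(p2.x, -p2.y);
    ctx.closePath();
    ctx.fill();   // fillStyle set from the lighting calculation
}
```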

The constructor defines several fields that govern the appearance of each surface. For this demo, I also wrote a rudimentary Color class, and the materialBrush field is set to a Color object for cyan. This is the basic color of each face. The ambientLight field is a dark gray with color components of 0x40, while the directionalLight is a light gray with color components of 0xC0. In addition, the directionalLight has a direction vector from the upper left.

The direction of directional light is specified with a 3D vector. The intensity of directional light on a surface can be calculated as the cosine of the angle between the direction vector and the negative of the surface normal. Fortunately, the angle between two vectors is available in the Vector3 class in connection with the vector dot product.

For example, if the light direction vector and the negative of the surface normal are the same, the cosine of the angle between these vectors is 1 and light is shining directly on the surface. If the light direction vector is at right angles to the surface normal, then the cosine is 0 and the directional light has no effect. The surface would then be black were it not for ambient light. The full calculation of the color of the faces is in the Render method of DodecahedronDemo.
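A minimal sketch of that intensity calculation, using the hypothetical Vector3 from earlier (the full color math is in DodecahedronDemo):

```js
// Sketch: directional light shining from the upper left, toward the scene.
const lightDirection = new Vector3(1, -1, -1);

function faceIntensity(normal) {
    // Cosine of the angle between the light direction and the reversed normal,
    // obtained from the dot product; clamp at zero for faces turned away.
    const negNormal = new Vector3(-normal.x, -normal.y, -normal.z);
    const cosine = Math.max(0,
        lightDirection.dot(negNormal) / (lightDirection.length * negNormal.length));

    // Ambient (0x40) plus cosine-weighted directional (0xC0) light,
    // as a 0-to-1 factor applied to each channel of the cyan material color.
    return Math.min(1, (0x40 + 0xC0 * cosine) / 255);
}
```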

Can this technique be extended to render curved surfaces such as a sphere?

Not in the general case. Here’s the problem: When a triangle mesh for a sphere is defined, the normals are not orthogonal to each triangle unless you want something that looks like this:

It’s almost a sphere but it’s not smooth. When a mesh geometry for a sphere is defined (as you can see in SphereMesh), the surface normals are not orthogonal to each triangle but instead point out from the center of the sphere. These surface normals are different for each vertex of the triangle, so each triangle needs to be rendered with a color gradient that in an exaggerated way looks something like this:

In real life, the three vertices of the triangle wouldn’t be three distinct colors like red, green, and blue, but only slightly different shades of the same color. In the HTML5 Canvas, only two types of gradients are available: a linear gradient and a radial gradient, and neither is adequate for this job. I can imagine such a triangle being filled by a series of lines, each of which has a different linear gradient, but I didn’t attempt such a job and, in fact, this tri-colored triangle is the only image in this blog entry created using WebGL. (See the WebGLTriangle.js file if you’re curious.)

However, if the curvature of an object is only in one direction — such as a cylinder — then the triangles are probably defined so that two vertices in each triangle have the same normal, in which case the gradient looks more like this:

And that is a linear gradient. One edge of this triangle is a line of constant color:

To color a triangle with such a gradient, the createLinearGradient method must be called with two points that specify the start and end of the gradient line. This gradient line extends from the third vertex of the triangle and is at right angles to the constant-color line:

All that’s required now is the point where those two lines intersect.

Lines can be mathematically defined in several different ways. Perhaps my favorite is a parametric equation based on a point (p) and a vector (v):

$$\text{line}(t) = p + t \cdot v$$

Because lines are conceptually infinite in length, t ranges over all the positive and negative reals rather than just from 0 to 1.

A line can be defined with two points p0 and p1 by calculating the vector corresponding to those points:

$$\text{line}(t) = p_0 + t\,(p_1 - p_0)$$

which in two dimensions corresponds to these two parametric equations:

$$x(t) = x_0 + t\,(x_1 - x_0) \qquad y(t) = y_0 + t\,(y_1 - y_0)$$

One of the conveniences of defining two-dimensional lines in this way is that it’s easy to find a perpendicular line. In two dimensions, if a vector is (a, b) then a perpendicular vector can be either (–b, a) or (b, –a), which point in opposite directions to each other.

The line of constant color can be expressed like this with a C subscript on t:

$$x_0 + t_C\,(x_1 - x_0) \qquad y_0 + t_C\,(y_1 - y_0)$$

The perpendicular gradient line has a G subscript on t:

$$x_2 + t_G\,(y_1 - y_0) \qquad y_2 + t_G\,(x_0 - x_1)$$

But notice the expressions in parentheses: The first equation uses the Y component of the constant-color vector, and the second is the negative of the X component of that vector. That’s the perpendicular vector.

The intersection of those lines is where the corresponding pairs of parametric equations equal each other:

$$x_0 + t_C\,(x_1 - x_0) = x_2 + t_G\,(y_1 - y_0) \qquad y_0 + t_C\,(y_1 - y_0) = y_2 + t_G\,(x_0 - x_1)$$

And now it’s just two equations with two unknowns (tC and tG), only one of which needs to be determined to get that intersection point. This is done in the CylinderDemo file, which makes use of the CylinderMesh class, and here it is:
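As a rough sketch of that calculation (a hypothetical helper, not the actual CylinderDemo code), solving the equations above for tG and building the gradient might look like this:

```js
// Sketch: gradient from the triangle's third vertex (x2, y2) to the point where
// the perpendicular gradient line meets the constant-color edge from (x0, y0) to (x1, y1).
function triangleGradient(ctx, x0, y0, x1, y1, x2, y2, edgeColor, vertexColor) {
    const dx = x1 - x0;                 // vector along the constant-color edge
    const dy = y1 - y0;

    // Solve the two simultaneous equations for tG (tC isn't needed).
    const tG = ((y2 - y0) * dx - (x2 - x0) * dy) / (dx * dx + dy * dy);

    // Intersection point on the constant-color line.
    const xi = x2 + tG * dy;
    const yi = y2 - tG * dx;

    // The gradient runs from the third vertex to that intersection point.
    const gradient = ctx.createLinearGradient(x2, y2, xi, yi);
    gradient.addColorStop(0, vertexColor);
    gradient.addColorStop(1, edgeColor);
    return gradient;
}
```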

We can’t do something similar with a sphere, but we can decorate a sphere with a texture, for example, a bitmap, perhaps this map of the earth that I downloaded from the NASA website:

Normally if you’re using an image like this for a 3D texture, you’d simply set a style of display: none on the img element so it wouldn’t be visible.

This bitmap is 1440 pixels wide and 720 pixels high, but the big advantage is that it’s an equirectangular projection: If it were superimposed on a Cartesian coordinate system, the coordinates would correspond linearly to lines of longitude and latitude, 4 pixels per degree. This is good because when calculating a sphere mesh such as in SphereMesh, relative texture coordinates are calculated so that an image like this can be wrapped around the sphere.
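For an equirectangular image, the relative texture coordinate for a vertex at a given longitude and latitude reduces to a linear mapping, roughly like this sketch (SphereMesh itself works in terms of its own stack and slice loops):

```js
// Sketch: map longitude and latitude (in degrees) to the 0-to-1 range of the image.
function textureCoordinate(longitude, latitude) {
    const u = (longitude + 180) / 360;   // left edge = 180° W, right edge = 180° E
    const v = (90 - latitude) / 180;     // top edge = north pole, bottom edge = south pole
    return { u, v };
}
```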

In a real 3D graphics system, the overlaying of an image on a mesh is done pretty much automatically. When simulating 3D in a 2D system, each triangular fragment of the image needs to be mapped to a triangle of the mesh. Now there’s no avoiding solving two sets of three equations in three unknowns to derive a transform for the pattern. The mesh my code generates for the sphere has 1,224 triangles, all of which require a different transform for the pattern, which is why the SphereDemo class defines a calculateTransform method that uses Cramer’s Rule for the number crunching. Here’s the result:

The animation is a compound transform of three rotations: a rotation around the Y axis at 1 revolution every 4 seconds to simulate the daily rotation of the Earth on its axis, then a rotation of 23° around the Z axis to simulate the axial tilt of the Earth, then another rotation around the Y axis every 24 minutes as the Earth makes its way around the Sun.

To clarify this, I added an axis that seems to extend from the north and south poles, but what a hassle: Depending on the tilt, the two components had to be drawn either before or after the Earth itself, and even then it just looks like a line instead of an axis.

So at this point, one might question whether WebGL is really that much harder than simulating 3D graphics using 2D.