Orthographic Projection

In which you develop a mechanism for rendering a different chunk of the world than the unit cube.

You already know how to transform objects in model space to world space through rotation, scaling, and translation. Later you will learn how to set up a matrix that goes from world space to eye space. What you want to focus on right now is how to transform objects from eye space to normalized space.

First you must decide what chunk of the world you want to be visible. One way to do this is to define a box in eye space coordinates using six parameters. You decide the left edge of the box, the right, the bottom, the top, the near, and the far.

Another name for this box is the viewing volume. Explore how the six parameters define the viewing volume in this interactive renderer:

The white sphere represents the viewer's eye. The wireframe box reflects the current parameters. Move the box around by editing the inputs. The preview in the bottom-left shows what the viewer sees. Ask yourself a few questions:

You pick x-coordinates for the box's left and right faces, y-coordinates for its bottom and top faces, and z-coordinates for its near and far faces. Recall that in eye space the viewer is located at the origin and by convention looking down the negative z-axis. That means that \(\mathrm{near}\) and \(\mathrm{far}\) should be negative, since they define planes in front of the viewer. However, many graphics libraries expect you to pass them as positive numbers, with \(\mathrm{near}\) less than \(\mathrm{far}\). The libraries negate the values internally. Following the same convention, the box spans these intervals:

$$ \begin{aligned} x &\rightarrow \mathrm{left} \ldots \mathrm{right} \\ y &\rightarrow \mathrm{bottom} \ldots \mathrm{top} \\ z &\rightarrow -\mathrm{near} \ldots -\mathrm{far} \\ \end{aligned} $$

From these six parameters you compute the box's dimensions:

$$ \begin{aligned} \mathrm{width} &= \mathrm{right} - \mathrm{left} \\ \mathrm{height} &= \mathrm{top} - \mathrm{bottom} \\ \mathrm{depth} &= \mathrm{far} - \mathrm{near} \\ \end{aligned} $$
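To make the arithmetic concrete, here is a small sketch that computes the dimensions of one sample box. The helper name `boxDimensions` and the sample values are assumptions chosen for illustration. They do not come from any library.

```javascript
// Hypothetical helper: computes the viewing volume's dimensions from the
// six parameters. near and far are passed as positive distances.
function boxDimensions(left, right, bottom, top, near, far) {
  return {
    width: right - left,
    height: top - bottom,
    depth: far - near,
  };
}

// Sample box: x spans 0..8, y spans -1..3, and z spans -2..-18 in eye space.
const dimensions = boxDimensions(0, 8, -1, 3, 2, 18);
console.log(dimensions); // { width: 8, height: 4, depth: 16 }
```

Because near and far are positive distances, the depth comes out positive even though the box sits on the negative z-axis.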

The box that you have defined must be transformed into the unit box of normalized space. This is done in two steps:

  1. Translate the center of your box to the origin.
  2. Scale your box so it fits into the range [-1, 1] on all dimensions.

To translate your box to the origin, you subtract away the box's midpoint. The midpoint is the average of its coordinates:

$$ \begin{aligned} \mathrm{midpoint}_x &= \frac{\mathrm{right} + \mathrm{left}}{2} \\ \mathrm{midpoint}_y &= \frac{\mathrm{top} + \mathrm{bottom}}{2} \\ \mathrm{midpoint}_z &= -\frac{\mathrm{near} + \mathrm{far}}{2} \end{aligned} $$

Note that the z-coordinate is negative to correct for \(\mathrm{near}\) and \(\mathrm{far}\) being positive. The transformation that subtracts away the midpoint is a translation represented by this matrix:

$$ \begin{bmatrix} 1 & 0 & 0 & -\frac{\mathrm{right} + \mathrm{left}}{2} \\ 0 & 1 & 0 & -\frac{\mathrm{top} + \mathrm{bottom}}{2} \\ 0 & 0 & 1 & \frac{\mathrm{near} + \mathrm{far}}{2} \\ 0 & 0 & 0 & 1 \end{bmatrix} $$

After applying this translation, your box is centered around the origin, with half of the box on each side of the origin. That means your box now has the following bounds:

$$ \begin{aligned} x &\rightarrow -\frac{\mathrm{width}}{2} \ldots \frac{\mathrm{width}}{2} \\ y &\rightarrow -\frac{\mathrm{height}}{2} \ldots \frac{\mathrm{height}}{2} \\ z &\rightarrow \frac{\mathrm{depth}}{2} \ldots -\frac{\mathrm{depth}}{2} \\ \end{aligned} $$

The near face of the box now intersects the positive z-axis, while the far face intersects the negative z-axis.
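You can check these bounds numerically. The following sketch builds the translation matrix as a plain row-major array of rows and applies it to two opposite corners of a sample box. The function names `centerBox` and `transformPoint` and the sample values are assumptions for illustration. Your own matrix library will likely look different.

```javascript
// Builds the centering translation as a row-major array of rows.
// near and far are positive distances, so the z offset is positive.
function centerBox(left, right, bottom, top, near, far) {
  return [
    [1, 0, 0, -(right + left) / 2],
    [0, 1, 0, -(top + bottom) / 2],
    [0, 0, 1, (near + far) / 2],
    [0, 0, 0, 1],
  ];
}

// Multiplies a 4x4 matrix by the point (x, y, z, 1) and returns [x, y, z].
function transformPoint(matrix, [x, y, z]) {
  const column = [x, y, z, 1];
  return matrix
    .slice(0, 3)
    .map(row => row.reduce((sum, entry, i) => sum + entry * column[i], 0));
}

// Sample box: width 8, height 4, depth 16.
const translation = centerBox(0, 8, -1, 3, 2, 18);

// Near-left-bottom corner in eye space, where z is -near.
const nearCorner = transformPoint(translation, [0, -1, -2]);
// Far-right-top corner in eye space, where z is -far.
const farCorner = transformPoint(translation, [8, 3, -18]);

console.log(nearCorner); // [ -4, -2, 8 ]
console.log(farCorner); // [ 4, 2, -8 ]
```

The corners land at plus or minus half of each dimension, and the near corner ends up with the positive z-coordinate.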

The unit box into which you are trying to squeeze your box spans these intervals:

$$ \begin{aligned} x &\rightarrow -1 \ldots 1 \\ y &\rightarrow -1 \ldots 1 \\ z &\rightarrow -1 \ldots 1 \\ \end{aligned} $$

Note that WebGL expects the near face to map to -1 and the far face to 1. This is flipped from the eye space convention.

You go from your box to the unit box by dividing by the half-width, half-height, and negative half-depth. This scale matrix performs that division by multiplying by the reciprocals of the half-dimensions:

$$ \begin{bmatrix} \frac{2}{\mathrm{width}} & 0 & 0 & 0 \\ 0 & \frac{2}{\mathrm{height}} & 0 & 0 \\ 0 & 0 & -\frac{2}{\mathrm{depth}} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} $$

The z-factor is negated in order to flip the z-coordinates and match what WebGL expects. If only your computer graphics forebears had decided to make the viewer look along the positive z-axis, you wouldn't have to deal with these inconsistencies.
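The effect of the negated z-factor is easy to see on a centered box. This sketch scales the corners of a centered box whose width is 8, height is 4, and depth is 16. The names `scaleBox` and `transformPoint` and the sample values are illustrative assumptions, not part of any library.

```javascript
// Builds the scale matrix as a row-major array of rows. The z-factor is
// negated so that the near face maps to -1 and the far face maps to 1.
function scaleBox(width, height, depth) {
  return [
    [2 / width, 0, 0, 0],
    [0, 2 / height, 0, 0],
    [0, 0, -2 / depth, 0],
    [0, 0, 0, 1],
  ];
}

// Multiplies a 4x4 matrix by the point (x, y, z, 1) and returns [x, y, z].
function transformPoint(matrix, [x, y, z]) {
  const column = [x, y, z, 1];
  return matrix
    .slice(0, 3)
    .map(row => row.reduce((sum, entry, i) => sum + entry * column[i], 0));
}

const scale = scaleBox(8, 4, 16);

// After centering, the near face sits at z = depth / 2 = 8.
const nearCorner = transformPoint(scale, [-4, -2, 8]);
// The far face sits at z = -depth / 2 = -8.
const farCorner = transformPoint(scale, [4, 2, -8]);

console.log(nearCorner); // [ -1, -1, -1 ]
console.log(farCorner); // [ 1, 1, 1 ]
```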

Since both the translation and scaling are represented as 4x4 matrices, you can chain them together:

$$ \begin{bmatrix} \frac{2}{\mathrm{width}} & 0 & 0 & 0 \\ 0 & \frac{2}{\mathrm{height}} & 0 & 0 \\ 0 & 0 & -\frac{2}{\mathrm{depth}} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \times \begin{bmatrix} 1 & 0 & 0 & -\frac{\mathrm{right} + \mathrm{left}}{2} \\ 0 & 1 & 0 & -\frac{\mathrm{top} + \mathrm{bottom}}{2} \\ 0 & 0 & 1 & \frac{\mathrm{near} + \mathrm{far}}{2} \\ 0 & 0 & 0 & 1 \end{bmatrix} $$

The translation must be applied first, so it goes on the right. Multiplying them through yields this matrix:

$$ \begin{bmatrix} \frac{2}{\mathrm{width}} & 0 & 0 & -\frac{\mathrm{right} + \mathrm{left}}{\mathrm{width}} \\ 0 & \frac{2}{\mathrm{height}} & 0 & -\frac{\mathrm{top} + \mathrm{bottom}}{\mathrm{height}} \\ 0 & 0 & -\frac{2}{\mathrm{depth}} & -\frac{\mathrm{near} + \mathrm{far}}{\mathrm{depth}} \\ 0 & 0 & 0 & 1 \end{bmatrix} $$

You will often see this matrix expressed strictly in terms of the original six parameters:

$$ \begin{bmatrix} \frac{2}{\mathrm{right} - \mathrm{left}} & 0 & 0 & -\frac{\mathrm{right} + \mathrm{left}}{\mathrm{right} - \mathrm{left}} \\ 0 & \frac{2}{\mathrm{top} - \mathrm{bottom}} & 0 & -\frac{\mathrm{top} + \mathrm{bottom}}{\mathrm{top} - \mathrm{bottom}} \\ 0 & 0 & \frac{2}{\mathrm{near} - \mathrm{far}} & \frac{\mathrm{near} + \mathrm{far}}{\mathrm{near} - \mathrm{far}} \\ 0 & 0 & 0 & 1 \end{bmatrix} $$
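If you want to convince yourself that the closed form really equals the scale matrix times the translation matrix, a sketch like the following can help. It uses plain row-major arrays of rows rather than a matrix class. The names `ortho`, `multiply`, and `transformPoint` and the sample box are assumptions for illustration. This is one possible way to write the matrix, not a definitive implementation.

```javascript
// Closed-form orthographic matrix built from the six parameters.
// near and far are positive distances with near less than far.
function ortho(left, right, bottom, top, near, far) {
  return [
    [2 / (right - left), 0, 0, -(right + left) / (right - left)],
    [0, 2 / (top - bottom), 0, -(top + bottom) / (top - bottom)],
    [0, 0, 2 / (near - far), (near + far) / (near - far)],
    [0, 0, 0, 1],
  ];
}

// Standard 4x4 matrix product a * b.
function multiply(a, b) {
  return a.map(row =>
    b[0].map((_, c) => row.reduce((sum, entry, k) => sum + entry * b[k][c], 0)));
}

// Multiplies a 4x4 matrix by the point (x, y, z, 1) and returns [x, y, z].
function transformPoint(matrix, [x, y, z]) {
  const column = [x, y, z, 1];
  return matrix
    .slice(0, 3)
    .map(row => row.reduce((sum, entry, i) => sum + entry * column[i], 0));
}

const [left, right, bottom, top, near, far] = [0, 8, -1, 3, 2, 18];

const scale = [
  [2 / (right - left), 0, 0, 0],
  [0, 2 / (top - bottom), 0, 0],
  [0, 0, -2 / (far - near), 0],
  [0, 0, 0, 1],
];
const translation = [
  [1, 0, 0, -(right + left) / 2],
  [0, 1, 0, -(top + bottom) / 2],
  [0, 0, 1, (near + far) / 2],
  [0, 0, 0, 1],
];

// The translation is applied first, so it goes on the right.
const product = multiply(scale, translation);
const closedForm = ortho(left, right, bottom, top, near, far);

// Compare entry by entry, with a small tolerance for rounding error.
const matches = product.every((row, r) =>
  row.every((entry, c) => Math.abs(entry - closedForm[r][c]) < 1e-12));
console.log(matches); // true

// Opposite corners of the viewing volume land on opposite corners of the unit box.
const nearCorner = transformPoint(closedForm, [left, bottom, -near]);
const farCorner = transformPoint(closedForm, [right, top, -far]);
console.log(nearCorner); // [ -1, -1, -1 ]
console.log(farCorner); // [ 1, 1, 1 ]
```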

This is the matrix that turns your viewing volume into the unit cube. It is called an orthographic projection, a term that contrasts it with a perspective projection. In a perspective projection, distant objects appear smaller than near ones and parallel lines converge toward the horizon. In an orthographic projection, an object's distance from the viewer does not change its apparent size and parallel lines remain parallel.
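The claim about depth can be checked with two points that differ only in their distance from the viewer. In this sketch, the names `ortho` and `transformPoint`, the sample box, and the sample points are all assumptions for illustration.

```javascript
// Orthographic matrix as a row-major array of rows. near and far are
// positive distances with near less than far.
function ortho(left, right, bottom, top, near, far) {
  return [
    [2 / (right - left), 0, 0, -(right + left) / (right - left)],
    [0, 2 / (top - bottom), 0, -(top + bottom) / (top - bottom)],
    [0, 0, 2 / (near - far), (near + far) / (near - far)],
    [0, 0, 0, 1],
  ];
}

// Multiplies a 4x4 matrix by the point (x, y, z, 1) and returns [x, y, z].
function transformPoint(matrix, [x, y, z]) {
  const column = [x, y, z, 1];
  return matrix
    .slice(0, 3)
    .map(row => row.reduce((sum, entry, i) => sum + entry * column[i], 0));
}

const projection = ortho(0, 8, -1, 3, 2, 18);

// Two eye-space points with the same x and y but different depths.
const closePoint = transformPoint(projection, [2, 1, -3]);
const distantPoint = transformPoint(projection, [2, 1, -15]);

console.log(closePoint); // [ -0.5, 0, -0.875 ]
console.log(distantPoint); // [ -0.5, 0, 0.625 ]
```

Both points land on the same normalized x and y. Only the z-coordinate, which is used for depth sorting, differs.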

Since orthographic projections don't provide the cues that viewers expect as they make sense of a 3D scene, they are mostly used when the viewer needs to precisely manipulate objects in a Cartesian grid, such as in a 3D modeling program or a game with an isometric view.

Exercises

Try adding an orthographic projection to one of your renderers by completing the following exercises.

  1. Add a Matrix4.ortho method to your library of code. Have it accept the six parameters: left, right, bottom, top, near, and far. Return the 4x4 matrix described above.
  2. In JavaScript, upload an orthographic projection matrix as a uniform with code like the following:

         const normalizeFromEye = Matrix4.ortho(left, right, bottom, top, near, far);
         shaderProgram.setUniformMatrix4('normalizeFromEye', normalizeFromEye);

  3. Adjust the vertex shader so that it applies this matrix to the incoming vertex position.
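
For the last exercise, a vertex shader along these lines is one possibility. It is a sketch that assumes GLSL ES 3.00, an attribute named `position`, and vertex positions that are already in eye space. Only the uniform name `normalizeFromEye` comes from the upload code in the exercise. The constant name `vertexSource` is hypothetical, and your renderer may compile its shaders differently.

```javascript
// Hypothetical vertex shader source, stored as a JavaScript string.
// The attribute name "position" is an assumption, not a required name.
const vertexSource = `#version 300 es
uniform mat4 normalizeFromEye;
in vec3 position;

void main() {
  // Promote the position to homogeneous coordinates and move it from
  // eye space to normalized space.
  gl_Position = normalizeFromEye * vec4(position, 1.0);
}
`;
```

Once you have a matrix that goes from world space to eye space, it would be applied to the position before `normalizeFromEye`.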