LookAt

In which you are introduced to the look-at matrix, which lets you put the camera anywhere and is part of the graphics canon.

Imagine putting a camera in a virtual world. What information must you provide in order to situate it in the world? These two pieces of information might come to mind: the camera's position and the location of the object it is looking at.

For example, you might position a player at the top of a hill looking down at the charred ruins of a village. But these two pieces of information alone are not quite enough to uniquely specify the view. You don't know if the player's feet are on the ground or if the player is suspended upside down in the talons of an eagle. These two situations lead to very different views of the world.

To nail down which view you want, you must provide one additional piece of information: which direction is up.

With these three pieces of information, all of which are in world space coordinates, you can build a matrix that transforms world space into eye space. Many graphics libraries call this matrix the look-at matrix and provide a function to build it:

function lookAt(from, to, worldUp)
  // build a matrix

The parameters correspond to the three pieces of information needed to situate the camera.

The matrix that transforms world space into eye space is made of two operations: a translation that puts the camera at the origin, and a rotation that swings the focal direction to the negative z-axis. These two matrices combine in pseudocode to form the matrix:

function lookAt(from, to, worldUp)
  // ...
  matrix = rotater * translater
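
The order of the product matters. As a sketch, assuming points are treated as column vectors (the convention the matrices in this section appear to follow), and using \(\mathbf{p}_\mathrm{world}\) and \(\mathbf{p}_\mathrm{eye}\) as illustrative names for a point in each space, the product is read right to left:

$$ \mathbf{p}_\mathrm{eye} = \mathrm{rotater} \times \mathrm{translater} \times \mathbf{p}_\mathrm{world} $$

The translation touches the point first, so it sits on the right. The rotation is applied to the already-translated point.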

The translation must turn the camera's world space position into \(\begin{bmatrix}0&0&0\end{bmatrix}\), which it does by subtracting away the camera's position:

function lookAt(from, to, worldUp)
  translater = translate(-from.x, -from.y, -from.z)
  // ...
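
To see the translation at work, here is a minimal Python sketch. The helper names translate and transform and the sample camera position are assumptions made for illustration. They are not taken from any particular graphics library.

```python
def translate(x, y, z):
    """Return a 4x4 translation matrix as nested lists (row-major)."""
    return [
        [1.0, 0.0, 0.0, x],
        [0.0, 1.0, 0.0, y],
        [0.0, 0.0, 1.0, z],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform(matrix, point):
    """Multiply a 4x4 matrix by a homogeneous column vector."""
    return [sum(matrix[r][c] * point[c] for c in range(4)) for r in range(4)]

camera = (3.0, 5.0, -2.0)  # hypothetical camera position in world space
translater = translate(-camera[0], -camera[1], -camera[2])

# The camera's own position lands on the origin.
moved = transform(translater, [*camera, 1.0])
print(moved)  # [0.0, 0.0, 0.0, 1.0]
```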

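The rotation is more involved. Before you can construct the rotation matrix, you must be aware of these helpful and non-obvious properties of all rotation matrices: every row is a unit vector, the rows are perpendicular to one another, and the vector in the first row is the one that rotates to \(\begin{bmatrix}1&0&0\end{bmatrix}\), the vector in the second row rotates to \(\begin{bmatrix}0&1&0\end{bmatrix}\), and the vector in the third row rotates to \(\begin{bmatrix}0&0&1\end{bmatrix}\). For example, this matrix rotates the unit vector \(\begin{bmatrix}a&b&c\end{bmatrix}\) onto the x-axis: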

$$ \begin{bmatrix} a & b & c & 0 \\ \ldots & \ldots & \ldots & 0 \\ \ldots & \ldots & \ldots & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} $$
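
A quick numeric check may make this more concrete. The Python sketch below assumes the properties in question are that the rows have unit length, that they are mutually perpendicular, and that the matrix carries row i onto axis i. The rotation about the y-axis by 30 degrees is an arbitrary choice for illustration.

```python
import math

angle = math.radians(30.0)  # arbitrary angle, chosen for illustration
c, s = math.cos(angle), math.sin(angle)

# 3x3 rotation about the y-axis (the upper-left block of the 4x4 matrix).
rotation = [
    [c, 0.0, s],
    [0.0, 1.0, 0.0],
    [-s, 0.0, c],
]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

for i, row in enumerate(rotation):
    # Each row has unit length.
    assert math.isclose(dot(row, row), 1.0)
    # Each row is perpendicular to the other rows.
    for j, other in enumerate(rotation):
        if i != j:
            assert math.isclose(dot(row, other), 0.0, abs_tol=1e-12)
    # Multiplying the matrix by row i yields axis i.
    image = [dot(r, row) for r in rotation]
    expected = [1.0 if k == i else 0.0 for k in range(3)]
    assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(image, expected))
```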

You derive the rows using these facts and the three parameters. The world vector that maps to eye vector \(\begin{bmatrix}0&0&1\end{bmatrix}\) is the normalized vector that leads from the object of focus to the camera's position:

normalize(from - to)

However, later on you will need a vector that points in the opposite direction. You might as well compute this opposite vector now. The vector that leads from the camera to the object of focus is often called the forward vector:

forward = normalize(to - from)

The forward vector is the camera's focal direction. It states which way the camera is pointing.
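
As a small worked example in Python, take the player on the hill. The coordinates are made up for illustration, and the names from_pos and to_pos stand in for the from and to parameters because from is a reserved word in Python.

```python
import math

def subtract(a, b):
    return tuple(x - y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)

from_pos = (0.0, 10.0, 0.0)   # hypothetical: player on a hilltop
to_pos = (0.0, 0.0, -10.0)    # hypothetical: village below and ahead

# Points from the camera toward the object of focus.
forward = normalize(subtract(to_pos, from_pos))

# Points from the object of focus back to the camera.
backward = normalize(subtract(from_pos, to_pos))
```

Here forward points down and away from the player, with equal negative y and z components, and backward is its negation.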

The negation of this forward vector forms the third row of the rotation matrix since it becomes the positive z-axis in eye space:

$$ \begin{bmatrix} ? & ? & ? & 0 \\ ? & ? & ? & 0 \\ -\mathrm{forward}_x & -\mathrm{forward}_y & -\mathrm{forward}_z & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} $$

You need the world vector that becomes eye vector \(\begin{bmatrix}1&0&0\end{bmatrix}\). This vector aligns with the viewer's outstretched right arm. At first blush, your parameters don't seem to offer much information about this right direction. However, you do know the forward and up directions. The right vector is perpendicular to both of these. If you cross them, you get a vector that points to the right. Its length is 1 only when forward and worldUp are perpendicular, so you normalize the result:

right = normalize(cross(forward, worldUp))
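
The Python sketch below works through the cross product for two sample cameras. The vectors are chosen for illustration. It also shows one caveat: the cross product of two unit vectors has unit length only when they are perpendicular, so this sketch normalizes the result before treating it as a matrix row.

```python
import math

def cross(a, b):
    """Return the cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def length(v):
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    return tuple(x / length(v) for x in v)

world_up = (0.0, 1.0, 0.0)

# Looking along the negative z-axis: forward and up are perpendicular,
# so the cross product is already the unit vector along positive x.
level = cross((0.0, 0.0, -1.0), world_up)

# Looking down at 45 degrees: forward and up are not perpendicular,
# so the cross product is shorter than 1 and must be normalized.
s = math.sqrt(0.5)
tilted = cross((0.0, -s, -s), world_up)
right = normalize(tilted)
```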

The right vector forms the first row of the matrix:

$$ \begin{bmatrix} \mathrm{right}_x & \mathrm{right}_y & \mathrm{right}_z & 0 \\ ? & ? & ? & 0 \\ -\mathrm{forward}_x & -\mathrm{forward}_y & -\mathrm{forward}_z & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} $$

The only row missing is the world vector that becomes eye vector \(\begin{bmatrix}0&1&0\end{bmatrix}\). One of the parameters to your lookAt function is the world's up vector, so it must be the one to form the middle row, right? No, not always.

Remember your player standing at the top of the hill? If the player's feet are planted on the ground, you'd send along \(\begin{bmatrix}0&1&0\end{bmatrix}\) as the world's up vector. But if the player is looking down at the village, then the forward vector and up vector are not perpendicular to each other. In a rotation matrix, the rows must all be perpendicular to one another. This means that you can't use worldUp directly in your matrix.

The up vector that goes in your matrix must be perpendicular to the right and forward vectors you've already computed. You therefore cross them to get the camera's up vector:

up = cross(right, forward)
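
A short Python check illustrates the difference between the two up vectors. The camera here looks down at 45 degrees. The specific vectors are assumptions chosen so the arithmetic is easy to follow.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

s = math.sqrt(0.5)
forward = (0.0, -s, -s)      # looking down at 45 degrees
right = (1.0, 0.0, 0.0)      # unit right vector for this forward
world_up = (0.0, 1.0, 0.0)

# The world's up vector is not perpendicular to forward.
world_up_dot_forward = dot(world_up, forward)

# The camera's up vector is perpendicular to both right and forward.
up = cross(right, forward)
```

The camera's up vector comes out tilted toward the village, with a positive y component and a negative z component. It already has unit length because right and forward are perpendicular unit vectors.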

The camera's up vector forms the second row of your matrix:

$$ \begin{bmatrix} \mathrm{right}_x & \mathrm{right}_y & \mathrm{right}_z & 0 \\ \mathrm{up}_x & \mathrm{up}_y & \mathrm{up}_z & 0 \\ -\mathrm{forward}_x & -\mathrm{forward}_y & -\mathrm{forward}_z & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} $$

This rotation matrix effectively swings the focal direction so that it aligns with the negative z-axis of eye space. When combined with the translation matrix described earlier, you have your eyeFromWorld matrix. With it, you gain the ability to place the eye anywhere in the world and point it in any direction.
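
Putting the steps together, the whole function might be sketched in Python as follows. This is one possible arrangement, not a reference implementation. The matrices are plain nested lists, the helper names are assumptions, the right vector is normalized after the cross product, and from_pos and to_pos stand in for from and to because from is a reserved word in Python.

```python
import math

def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]

def multiply(a, b):
    """Multiply two 4x4 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def transform(matrix, point):
    """Multiply a 4x4 matrix by a homogeneous column vector."""
    return [sum(matrix[r][c] * point[c] for c in range(4)) for r in range(4)]

def look_at(from_pos, to_pos, world_up):
    forward = normalize([t - f for t, f in zip(to_pos, from_pos)])
    right = normalize(cross(forward, world_up))
    up = cross(right, forward)

    # Translation that moves the camera to the origin.
    translater = [
        [1.0, 0.0, 0.0, -from_pos[0]],
        [0.0, 1.0, 0.0, -from_pos[1]],
        [0.0, 0.0, 1.0, -from_pos[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]

    # Rotation whose rows are right, up, and the negated forward vector.
    rotater = [
        [right[0], right[1], right[2], 0.0],
        [up[0], up[1], up[2], 0.0],
        [-forward[0], -forward[1], -forward[2], 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

    return multiply(rotater, translater)

# Hypothetical scene: player on a hilltop looking down at a village.
eye_from_world = look_at([0.0, 10.0, 0.0], [0.0, 0.0, -10.0], [0.0, 1.0, 0.0])
camera_in_eye = transform(eye_from_world, [0.0, 10.0, 0.0, 1.0])
target_in_eye = transform(eye_from_world, [0.0, 0.0, -10.0, 1.0])
```

In this scene the camera lands on the origin, and the village lands on the negative z-axis at a distance equal to its distance from the camera. The sketch does not guard against a forward vector that is parallel to the world's up vector. In that case the cross product is the zero vector and normalizing it divides by zero.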