Derivations of the perspective projection matrix, whether in books or on the web, always feel either overly complicated or completely lacking in detail–sometimes the perspective projection matrix is just stated without much explaination. In surveys of image projection, that is projection of a 3D scene to 2D, orthographic projection is presented as a contrasting method without relation to perspective projection. While the light model–how light travels to the image plane–underlying the different projection types differ, both can be formulated as projective transformations from their respective view volumes to the canonical view volume. Once in the canonical frame, $x$ and $y$ coordinates are already considered projected, which is to say they are in image plane coordinates; so both perspective projection and orthographic projection are image projections on each depth slice of their respective view volumes. When rendering to the screen, depth information is required to determine those objects and parts of objects not occluded by other objects.

At least at the level of formulas perspective projection and orthographic projection look similiar enough to believe perspective projection could be written in terms of orthographic projection, which IMHO greatly simplifies its derivation. In this note I derive the perspective projection matrix by factoring it through the orthographic view volume.

Given a 3D scene in world coordinates, how does an image of the scene form in the camera, and how do 3D scene points project to the image plane? The short answer to the later question is the camera matrix. And this short note is devoted to deriving the camera matrix. First some mathematical preliminaries.

From Pencils to Points

Consider the generic scenario $\vec{x} \in \mathbb{R}^{n+1}$ and a plane $x_{n+1} = f$. Pencils in $\mathbb{R}^{n+1}$, that is lines which pass through both the origin and $x_{n+1} = f$, map to unique points in plane $x_{n+1} = f$. Figure #1 shows a pencil intersecting plane $x_{n+1} = f$ at unique point $\tilde{x}$.

Under the guise of rotating solid spherical harmonic functions, I want to investigate the computational aspects of representing rotations of homogeneous polynomials. The connection being that harmonic homogeneous polynomials are solid spherical harmonics. My goal in this note is to rotate solid homogeneous polynomials ( in variables $x$, $y$, and $z$ ) by computing representations of $SO(3)$ on \(\mathcal{P}_d\), homogeneous polynomials of degree $d$. One nice aspect of computing representations of $SO(3)$ is that only algebra is required. Differential equations typically used to describe/derive spherical harmonics make no appearance in what follows.
... read more

is the area form of $S^n$, in terms of its coordinate functions. While trying to understand this formula, I was reminded of the cross product formula, which is an alternating sum of determinants. Well, sure, alternating forms like \(\Gamma_i = dx_1 \wedge \cdots \wedge \widehat{dx_{i}} \wedge \cdots \wedge dx_{n+1}\) are really determinants in disguise. Suppose $v_1, \ldots,v_n$ are tangent vectors at some point on $S^n$. Individually, 1-forms $dx_{j}(v)$ are linear functionals and act on tangent vectors $v \in T_{p}S^n$. Essentially, $dx_{j}(v_k )$ is a portion of the derivative $Dx_j$ in the $v_k$ direction; the directional derivative. Alternatively, it is the action of $v_k$ on coordinate $x_j$. I summarize these two descriptions below.
... read more

3D medical images in the NifTI format store data from MRI acquisitions of some anatomy, like a human brain or heart. When I was a postdoc @Penn my lab was primarily interested in studying brains. Each image associates grayscale values ( in the simplest case ) to discrete coordinates $(i,j,k)$, which describe voxel locations. And for reasons best explained by NifTI FAQ it’s useful/important to align the acquired image to some other coordinate system. This alignment is stored in the image header, as a rigid motion plus an offset. Here’s a bit of NifTI documentation motivating the need to keep the alignment.

This method can also be used to represent "aligned"
coordinates, which would typically result from some post-acquisition
alignment of the volume to a standard orientation (e.g., the same
subject on another day, or a rigid rotation to true anatomical
orientation from the tilted position of the subject in the scanner).

The NifTI standard allows for orientation reversing transforms, but in this post I focus on proper rotations. These rigid motions must preserve volume and orientation; rigidity necessitates linearity. Being a geometric property, volume is preserved when this linear transformation is an isometry; which is to say, for some matrix $A$, $A A^T = I$. Moreover, orientation is preserved when $\det(A) > 0$. The intersection of all these requirements is the group of 3D rotations, usually referred to as the special orthogonal group,
... read more

While learning about Voronoi diagrams from Computational Geometry, I had trouble justifying a single statement from a proof regarding the complexity ( as a function of $n$ sites ) of Voronoi diagrams. For completeness I’ll restate the theorem here.

Theorem. For $n \geq 3$, the number of vertices in the Voronoi diagram of a set of $n$ point sites in the plane is at most $2n-5$ and the number of edges is at most $3n-6$.

To establish the stated bounds, the proof hinges on the constancy of the Euler’s characteristic of graphs (that embed in $S^2$) and every vertex in the Voronoi diagram has degree at least three. Really? At the point in the proof when this fact is invoked we know that all the $n$ sites are not colinear; so given any pair of sites there’s a third site that does not lie on the line passing through them. Three points induce a vertex. But why?? Voronoi vertices have a specific meaning/interpretation; they are equidistant from sites, which means that a Voronoi vertex is the center of circle that passes through at least three sites.
... read more

In a post a few years ago I discussed primality testing. For some reason I thought I described the basic prime number sieve (I’m pretty sure there is only one) there, but apparently I didn’t. No worries. The prime number sieve solves a different problem; that of generating a list of the first $N$ primes. One could use a primality testing method to do this as well; namely filter a list of numbers with the primality test as a predicate; but sieve based methods are faster and conceptually much easier.

The prime number sieve iteratively filters out all the composite numbers, with the caveat that the leading number in the input is a prime. For \(\{ 2,3,4,5,6,7,8,9,10 \}\), we put aside $2$, the smallest number, and filter out all multiples of $2$; then put aside $3$ and filter out multiples of $3$, and so on. This process, shown below as a decreasing sequence, terminates with a set of primes.
... read more

While trying to grok $SO(3) \cong \mathbb{RP}^3$ and understand various parametrizations of $SO(3)$, I wandered a bit (okay maybe more that litte:) ) and started to think about projective space itself. It is well know that real projective spaces alternate between being orientable and non-orientable, as dimension increases. Specifically odd dimensional projective spaces are orientable but even ones are not. For example, $\mathbb{RP}^1$ and $\mathbb{RP}^3$ are orientable but $\mathbb{RP}^2$ is not.

Jeffery Week’s Shape of Space gives a Flatland inspired explanation of orientations on the projective plane and projective 3-space. It’s a fantastic book and a wonderful example of storytelling as a teaching method. In this note, I want to invoke Week’s style and describe orientations on real projective spaces of any dimension.
... read more