Saturday, September 5, 2015

Distributing Points on Spheres and other Manifolds

One of the classic problems of random-number generation and generally representing probability distributions is the problem of uniform distribution of points on a ($2$-)sphere (we, of course, clarify the dimension, having gone on too many extradimensional journeys lately—but we'll quickly see that these methods are good for those cases too!). Namely, how does one pick points at random on a sphere such that every spot on it is equally likely? Naïvely, one tries to jump to latitude and longitude. But this is because we are trained to think of the sphere in terms of parametrizations (yes, we like those, obviously, but they are only a means of representation but not the be-all and end-all of geometric objects), and not in terms of quantities defined directly on the sphere itself. Uniform for points directly on the sphere need not correspond to uniform with respect to some real parameters defining those points. After all, the whole point of parametrization is to distort very simple domains into something more complicated. What will happen, for example, with latitude and longitude, if one distributes points uniformly in latitude, i.e., the interval $[-\pi/2,\pi/2]$ and then also distributes points uniformly in longitude, the interval $[-\pi,\pi]$, then, after mapping to the corresponding points on a sphere, one will get points concentrated near the poles:

 Sphere with a distribution of 2000 points, uniform in latitude and longitude, generated by MATLAB. Notice it clusters at the poles. (Here the north pole of the sphere is tipped slightly out of the plane of the page so we can see the points getting denser there. Click to enlarge.)

This occurs essentially because, for example, at different latitudes, a degree of longitude can be a very different physical distance (~70 miles at the equator and goes to zero at the poles): the numerical correspondence does not match up to other more relevant physical measures. In more fancy-schmancy speak, uniformity in latitude and longitude (or the equivalent spherical $\varphi$ and $\theta$) means uniformly in some imaginary rectangle that we deform to a sphere. Actually, without much work, we can already intuitively see what we have to do to achieve uniformity on the sphere in terms of those coordinates: make the distribution thin out when the latitude indicates we're near the poles, i.e., we want the distribution of points in the parameter rectangle to look like this:

 Nonuniform distribution of points on the parameter rectangle $(\theta,\varphi)$. Notice the points get more sparse at the top and bottom, corresponding to $\varphi$ at its extreme values.
How do we express this in formulas? We need to take a look at the area element. The naïve uniform distribution gives $\frac{1}{2\pi^2}d\varphi\; d\theta$, $\frac{1}{2\pi^2}$ times the area element of the rectangle $[0,\pi]\times[0,2\pi]$, where the $1/2\pi^2$ is there to make the integral come out to $1$, as required for all probability densities. But anyone who has spent a minute in a multivariable calculus class probably got it drilled into their heads that the area element of the sphere, namely, that which measures the area of a piece of sphere of radius $r$, is $r^2 \sin \varphi\; d\varphi\; d\theta$ (or $r^2\sin \theta\; d\theta\; d\phi$ if you're using the physicists' convention… that's a whole 'nother headache right there). We'll take $r = 1$ here, and of course, we have to now multiply by $\frac{1}{4\pi}$ to make the density integrate to $1$. It is reasonable that actual physical area would correspond to an actual uniform distribution, because the area of some plot of land on a sphere will be the same no matter how the sphere is rotated: to distribute one sample point per square inch everywhere on a sphere would indeed truly give a uniform distribution. So this gives us the solution: we can still uniformly distribute in "longitude" $\theta \in [0,2\pi)$ (which accounts for the factor of $\frac{1}{2\pi} d\theta$), since the area element doesn't have any functional dependence on $\theta$ other than $d\theta$. For the $\varphi$, however, we need to weight the density, in such a manner that if we take $d\varphi$ for the moment to be the classical interpretation as an infinitesimal or just small increment in $\varphi$, the proportion of points landing between $\varphi$ and $\varphi+ d\varphi$ is approximately $\frac{1}{2}\sin\varphi \;d\varphi$ (the $\frac{1}{2}$ comes from removing the previous factor of $1/2\pi$ from the total $1/4\pi$). But that, we recognize, as $\frac{1}{2}d(-\cos \varphi)$, so if we define a new variable $u = -\cos \varphi$, then our area form is $\frac{1}{2} du\; \frac{1}{2\pi} d\theta$. Thus if we uniformly distribute $u$ in $[-1,1]$ (an interval of length $2$, so justifying the $\frac{1}{2}$), and only then calculate $\varphi = \cos^{-1}(-u)$ (and uniformly distribute $\theta$ in $[0,2\pi]$ as before), we do in fact get a uniform distribution on the sphere.

 Sphere with a uniform distribution of 2000 points (click to enlarge), generated by MATLAB

So What's the More General Thing Here?

As veteran readers might have expected, this situation is not as specific as it may seem. Transforming probability density functions (pdfs), even just on a line, sometimes seems very mysterious, because it doesn't work the same way as other coordinate changes for functions one often encounters. It is not enough to simply evaluate the pdf at a new point (corresponding to the composition of the pdf with the coordinate change). Instead what is reported in most books, and sometimes proved using the Change of Variables theorem, is some funny formula involving a lot of Jacobians. But this is the real reason: it's because the natural object associated to probability densities is a top-dimensional differential form (actually, a differential pseudoform, which I have spent some time advocating, for the simple reason that modeling things in a geometrically correct manner really clarifies things—just think, for example, how one may become confused by a right-hand rule?). A way to view such objects, besides as a function times a standardized "volume" form, is a swarm, which is precisely one of these distributions of points (although they don't have to be random, generating them this way is an excellent way to get a handle on it).

If we have some probability distribution on our space, and near some point it is specified (parametrized) by variables $x = (x_1, x_2,\dots,x_n)$, then the pdf should (locally) look like $\rho(x_1,x_2,\dots,x_n) dx_1 dx_2\dots dx_n$. Most probability texts, of course, just consider the $\rho$ part without the differential $n$-form part $dx_1\dots dx_n$. But when transforming coordinates to $y$ such that $x = f(y)$, then the standard sources simply state that $\rho$ in the new coordinates is $\rho(f(y)) \left|\det \left(\frac{\partial f}{\partial y}\right)\right|$. But really, it's because the function part $\rho(x)$ becomes $\rho(f(y))$ as a function usually would, and the $dx_1 \dots dx_n$ becomes $\left|\det \left(\frac{\partial f}{\partial y}\right) \right| dy_1\dots dy_n$. In total, we have the transformation law
$\rho(x_1,\dots,x_n) dx_1 \dots dx_n=\rho(f(y_1,\dots,y_n)) \left|\det \left(\frac{\partial f}{\partial y}\right) \right| dy_1\dots dy_n.$
The usual interpretation of the $n$-form $\rho(x) dx_1 \dots dx_n$ is simply that, when integrated over some region of space, gives the total quantity of whatever such a form is trying to measure (volume, mass, electric charge, or here, probability).

Another Curious Example

One perennial interesting example is the computation of two independent, normally distributed random variables (with mean $0$ and variance $\sigma^2$), and considering the distribution of their polar coordinates. For one variable, the normal distribution is, using our fancy-schmancy form notation,
$\rho(x)\; dx = \frac{1}{\sqrt{2\pi \sigma^2}} e^{-x^2/2\sigma^2} dx$
From the definition of independence, two normally distributed variables have a density obtained by multiplying them together:
$\omega = \rho(x)\rho(y) \; dx\;dy = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/2\sigma^2} dx\;dy.$
So if we want the distribution in polar coordinates, where $x = r \cos \theta$ and $y = r \sin \theta$, this means we transform the form part as the usual $dx\;dy = r\; dr\; d\theta$, and write the density part as $\frac{1}{2\pi\sigma^2}e^{-r^2/2\sigma^2}$. In total,
$\omega = \frac{1}{2\pi\sigma^2} r e^{-r^2/2\sigma^2} dr \; d\theta$
This factors back out into two independent pdfs (i.e., $r$ and $\theta$ are independent random variables) as
$\omega = \left(\frac{1}{\sigma^2} r e^{-r^2/2\sigma^2} dr\right) \left( \frac{1}{2\pi} d\theta\right)$
which means we can take $\theta$ and uniformly distribute it in $[0,2\pi]$ as before. To get $r$, we can proceed by two ways. First, we could imitate what we did earlier with the sphere: use integration by substitution or the Chain Rule, to deduce that $\frac{1}{\sigma^2} r e^{-r^2/2\sigma^2} dr= d\left(- e^{-r^2/2\sigma^2}\right)$, and thus, making the substitution $v = - e^{-r^2/2\sigma^2}$, we now have $\omega = \frac{1}{2\pi} dv\; d\theta$. The range of $v$ has to be in $[-1,0)$, since $e^{-r^2/2\sigma^2}$ goes to zero as $r \to \infty$ and has a maximum at $1$. With this, we uniformly distribute $v \in [-1,0)$ and invert the function:
$r=\sqrt{-2\sigma^2 \log (-v)}.$

Alternatively, we can substitute $u = r^2/2\sigma^2$ (and the inverse, $r = \sqrt{2\sigma^2 u}$). Then $du = (r/\sigma^2) dr$, and $\frac{1}{\sigma^2} re^{-r^2/2\sigma^2} dr = e^{-u} du$, with $u \geq 0$. This means $u$ is distributed as an exponential random variable with parameter $\lambda = 1$. However, this is often computed using uniform $[0,1]$ variables as well, for which we simply end up using something similar to the above method. Still, however, it gives another way of understanding the distributions: it certainly is very interesting that two normally distributed random variables becomes, via such an ordinary transformation such as polar coordinates, (the square root of) an exponentially distributed random variable and a uniform random variable.

What about three independent normally distributed variables (with mean zero and all the same variance)? It turns out the $r$ coordinate is not so easy (it's a gamma distribution), but the $\varphi$ and $\theta$ variables yield a uniform distribution on the sphere! However, it shouldn't be too surprising: given three independent such variables, there should be no bias on their directionality in $3$-space. We can also turn this around to give ourselves an alternate method of uniformly distributing points on a sphere (since normal random variables are very easy to generate): take three such variables, consider it a point in $\mathbb R^3$, and divide by its magnitude (I thank Donald Knuth's Art of Computer Programming, Volume 2 for this trick; it has an excellent discussion on random number generation).