Monday, September 21, 2015

The Configuration Space of All Lines In the Plane

Time for another classic post from the old site, while I'm at it… As we have seen in a bunch of previous posts, the notion of configuration space has held a prominent place in my mathematical explorations because, I can't emphasize this enough, what matters is not just the geometry of things you can directly see; it's the use of geometric methods to model many other things.

Consider all possible straight lines in the plane (by straight line I mean those that are infinitely long). This collection is a configuration space, a catalogue of these lines. What does that mean? Of course, in short, it's a manifold, and we've talked about many of those before. But it always helps to examine examples in detail to develop familiarity. Being a manifold means we can find local parametrizations of our collection by $n$-tuples of real numbers. How do we do this for lines in the plane?

Finding Some Coordinates: The Slope-Intercept Equation

First off, almost anyone reading this, regardless of mathematical background, probably was drilled at one time or another on the following famous equation, called the slope-intercept equation:
$y = mx + b.$ The variable $m$, if you recall, is the slope of the line, and $b$ is the $y$-intercept, which is where the line crosses the $y$-axis. However, few people realize that what they're really doing is indexing every non-vertical line in the plane by a point in another plane (called, say, the mb-plane rather than the traditional xy-plane). That is, we have a correspondence between points in an abstract mb-plane and lines in the xy-plane, where $m$ is the slope and $b$ the $y$-intercept. It doesn't get all lines, though: vertical lines have infinite slope, and no point of a plane has an infinite coordinate. So what you have charted out is the space of all nonvertical lines, using points in a different plane.

Of course, how would we catch the vertical lines? We chart them out using different coordinates. One possibility is to use inverse slope and $x$-intercept, which merely means we write the equation with $x$ as a function of $y$. In other words, all non-horizontal lines are given by:
$x=ky + c.$ This is essentially obtained by reflecting everything about the diagonal line $x = y$ and taking the usual slope and intercept. In other words, we can chart out all non-horizontal lines on this new kc-plane. So you could represent the collection of all lines in the plane with two charts: an atlas, or catalogue, of all lines with two sheets, the mb-plane and the kc-plane.

However, note that most lines in the plane, namely those that are neither vertical nor horizontal, can be represented in both forms. That is, they correspond to points on both sheets. So, in our usual terminology, we would say that these two sheets, or charts, determine a manifold of all possible lines; we just need to check that the transition between them is smooth. But let's stop to think about what that means. We can think of it as gluing both sheets together to form a single catalogue, where we glue together the points that represent the same line in the plane. There is a nice, exact formula for the gluing: $(m,b)$ and $(k,c)$ represent the same line if and only if $y = mx + b$ and $x = ky + c$ are equations of the same line. All we have to do to convert from one to the other is solve for $x$ in terms of $y$, that is, invert the function. So, for $m \neq 0$, we solve
$y = mx + b \iff y-b = mx \iff y/m - b/m = x \iff x = (1/m) y - b/m;$ that is, if $k = 1/m$ and $c = -b/m$, then $(m,b)$ and $(k,c)$ represent the same line in the two sheets. So to "glue" the sheets together, we glue every $(m,b)$ with $m \neq 0$ in the mb-plane to $(1/m,-b/m)$ in the kc-plane. Obviously the sheets need to be made of some very stretchable material, because these points are awfully hard to glue together otherwise. Actually, it's pretty hard to do this physically, so don't try this at home, but just try to imagine it (don't you just love thought experiments?). For example, the points $(1,1)$, $(2,2)$, $(3,3)$, and $(4,4)$ in the mb-plane get glued to the corresponding points $(1,-1)$, $(1/2,-1)$, $(1/3,-1)$, and $(1/4,-1)$ in the kc-plane. You glue them in a very weird way, but if you allow all sorts of moving, rotating, shrinking, and stretching in this process (topological deformations), without tearing, creasing, or collapsing, you preserve the "shape" of this space and yet can make it look like something more familiar. This would be our new catalogue of lines. The catalogue also has an additional property: nearby points in the catalogue correspond to very similar-looking lines.
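To make the gluing rule concrete, here is a minimal Python sketch (the function names `mb_to_kc` and `same_line` are hypothetical, chosen for illustration). It converts a line that is neither horizontal nor vertical from mb-coordinates to kc-coordinates via $k = 1/m$, $c = -b/m$, and checks numerically that sample points of $y = mx + b$ also satisfy $x = ky + c$, reproducing the four example points.

```python
def mb_to_kc(m, b):
    """Coordinates (k, c) of the line y = m x + b when written as x = k y + c.

    Only defined for m != 0: horizontal lines are missing from the kc-chart.
    """
    if m == 0:
        raise ValueError("horizontal lines (m = 0) have no kc-coordinates")
    return 1.0 / m, -b / m


def same_line(m, b, k, c, xs=(-2.0, 0.5, 3.0), tol=1e-12):
    """Numerically check that y = m x + b and x = k y + c describe one line:
    each sample point (x, m x + b) must also satisfy x = k y + c."""
    return all(abs(x - (k * (m * x + b) + c)) < tol for x in xs)


# The example points from the text: (n, n) is glued to (1/n, -1).
for n in (1, 2, 3, 4):
    k, c = mb_to_kc(n, n)
    print((n, n), "->", (k, c), same_line(n, n, k, c))
```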

So, What Is It?

One might wonder what overall "shape" our nice spiffy catalogue has, after gluing together the two charts we've made for it. As it turns out, its shape is the Möbius strip! That's right, the classic one-sided surface (without, as it turns out, its boundary circle).

That is to say, if you give me a point on the Möbius strip, it specifies one and only one line in the plane. Initially, one would not be able to see why non-orientability enters the picture, so a little interpretation is in order. First, if we take a particular line and rotate it through 180 degrees, we get the same line back. Everything in between gives every possible (finite) slope. As far as slopes of lines are concerned, $\infty=-\infty$, and if you go "past" this single projective infinity, as it is called, you go to negative slopes. In other words, if you rotate a line through 180 degrees, from vertical back to vertical, you come back to the same line, except with its orientation reversed (what started out pointing up now points down).

If you fix an origin and declare that it corresponds to a certain special line in the plane, and then select a "core circle" for the Möbius strip, then as you travel around this circle, the distance traveled represents the rotation angle of this special line. Traveling from the origin along the core circle and making one full loop should correspond to rotating the special line by 180 degrees. If you instead move up or down, off the core circle, you end up sliding the line in a perpendicular direction, without changing its angle. So moving up and down the strip corresponds to parallel sliding of lines, and moving around the strip along a circle corresponds to rotating a line.

The Derivation

The specific formula we use is $F(m,b) = \left(\cot^{-1} m, \frac{b}{\sqrt{m^2 + 1}}\right),$ which sends the line to the angle it makes with the $y$-axis and its signed perpendicular distance to the origin (the sign is determined by $b$); here $\cot^{-1}$ takes values in $(0,\pi)$. For the other chart, $F(k,c) = \begin{cases}\left( \cot^{-1} \left( \frac{1}{k}\right), \dfrac{c}{\operatorname{sgn}(-k)\sqrt{k^2+1}}\right) \quad & \text{ if } k \neq 0 \\ (0,-c) \quad & \text{ if } k = 0 \end{cases}$ For $k \neq 0$, this can be readily checked by substituting the transition maps $k = 1/m$ and $c = -b/m$; we'll show how we got these formulas in an update to this post, or perhaps a "Part 2." The extra case for $k=0$ is simply obtained by taking the limit as $k$ goes to zero from above in the other case. What proves that it is a Möbius strip is that if we take the limit as $k$ goes to zero from below, the formula approaches $(\pi,c)$ instead of $(0,-c)$. This would make it discontinuous, unless we decide to identify $(\pi,c)$ with $(0,-c)$: the $c$ going to $-c$ means we take the strip at $\pi$ and flip it around to glue it to the strip at $0$ (see this post for another example of defining a Möbius strip this way). Technically, we need an infinitely wide Möbius strip for this, but we can always scrunch it down into a finite-width strip without its boundary circle (using something like arctangent). It's just that the closer you get to the edge, the faster things go off to infinity.
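As a sanity check on these formulas, the following Python sketch (function names hypothetical) evaluates $F$ in both charts, confirms that they agree on lines in the overlap, and shows the two one-sided limits at $k = 0$. It assumes, as those limits require, that $\cot^{-1}$ takes values in $(0,\pi)$, implemented here as $\pi/2 - \tan^{-1}$.

```python
import math


def arccot(t):
    """Inverse cotangent with values in (0, pi); also accepts t = +/-inf."""
    return math.pi / 2 - math.atan(t)


def F_mb(m, b):
    """Line y = m x + b -> (angle with the y-axis, signed distance to origin)."""
    return arccot(m), b / math.sqrt(m * m + 1)


def F_kc(k, c):
    """Line x = k y + c -> the same pair, following the two-case formula."""
    if k == 0:
        return 0.0, -c
    sgn_minus_k = -1.0 if k > 0 else 1.0  # sgn(-k) for k != 0
    return arccot(1 / k), c / (sgn_minus_k * math.sqrt(k * k + 1))


# Both charts agree (up to rounding) on a line in the overlap, y = 2x + 2,
# which in the kc-chart is x = y/2 - 1:
print(F_mb(2, 2), F_kc(0.5, -1))

# Approaching k = 0 from the two sides gives (0, -c) and (pi, c):
c = 0.7
print(F_kc(1e-9, c))   # near (0, -0.7)
print(F_kc(-1e-9, c))  # near (pi, 0.7)
```

The jump between the last two outputs is exactly the identification of $(\pi, c)$ with $(0, -c)$ that produces the Möbius twist.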

The animation is an example "path" through "line space." The blue dot travels around the white circle, and the line in the plane that corresponds to it is the blue line. The red line is a reference line perpendicular to the blue one, and always passes through the origin. The distance of the blue dot from the core circle (in turquoise) indicates how far from the origin the blue and red lines intersect. Because of the "scrunching down," though, the closer to the edge of the strip we get, the more dramatically the blue line's distance from the origin changes. Here it is again in the plane, with the Möbius strip as a picture-in-picture reference:

The more general object we are describing here is closely related to the Grassmannian manifolds, which parametrize all $k$-dimensional subspaces of an $n$-dimensional vector space (the only difference is that Grassmannians only consider subspaces through the origin, while our lines need not pass through it).

Friday, September 18, 2015

Two Classic Clifford Tori Animations

After much rummaging around my hard drive, I finally found some Clifford tori animations from my old site that give a much better sense of how the (stereographically projected) tori change as the angle $\varphi$ changes from $0$ to $\pi/2$ (in the notation of the last 2 posts on this subject). We've also drawn the Clifford circles, to show the effect on them as well.

It starts off with the first degenerate case of one single unit circle, and expands from there. We see that it eventually comes very close to the other degenerate case, that of the straight line, the $z$-axis.

Our next video requires more explanation. This time, we take one particular torus, namely the one whose two radii are both $\frac{1}{\sqrt{2}}$ (in the video, these two particular circles are highlighted red and blue). Now, a $3$-sphere, like any sphere, can be rotated (by a matrix, or a whole path of matrices, in $SO(4)$). Such a rotation can always be realized as a rigid motion of the ambient $4$-space containing this $3$-sphere. It is possible to continuously rotate it so that the torus within has beginning and ending configurations that look the same, except that the red and blue circles have been swapped. If we restrict ourselves to $3$-space, such a rigid motion is impossible, but if we allow the torus to pass through itself, then it, too, can be done. When we visualize the $3$-sphere version in stereographic projection, however, a $4$-space rotation effectively lets distances be distorted (actually the $4$-space distance is not distorted; the distortion we see is an artifact of the stereographic projection), and adds a "point at infinity," so a continuous rotation can take things through that point. The rotation of the ambient $3$-sphere does not preserve our usual set of nested tori, as can be seen by letting a matrix in $SO(4)$ act directly on the coordinates of our parametrization: it jumbles up all the components. So, of course, the torus undergoes a completely different kind of motion than in our previous "expander" video.

What happens is we inflate our inner tube, so a part of it gets puffed up to infinity, and wraps back around, turning the torus inside-out. In fact, after wrapping back around, we're "inflating" the outside of the torus. Or equivalently, getting back to donuts with frosting, the dough gets bigger and bigger, and when wrapping back around, almost all of space (plus a point at infinity) is dough, and the frosting bounds an inner-tube-shaped pocket of air.

Anyway, the full turning inside-out (which also swaps the red and blue circles, as promised) occurs exactly halfway through the movie (the rotation continues to restore the torus to its original state in the second half). Notice how the stripes on the torus which started out horizontal now are vertical, and what used to be the "apple core" shape which surrounds the donut hole now has become a "donut segment." Plus it just looks totally awesome!

Friday, September 11, 2015

Clifford Circles

 30 Clifford circles with $\varphi = \pi/8$ and $\theta_0$ ranging from $0$ to $2\pi$
A discussion about Clifford tori would never be complete without a corresponding discussion about Clifford circles. These were featured as the logo of the UCSD math site for many years (not the case anymore, though, but I saved a screenshot!):

Just as the Clifford tori foliate $S^3$, the Clifford circles foliate each of the Clifford tori. As part of our ongoing efforts to revisit algebraic topology, this is one of the best examples to explore. We begin with a classical map, called the Hopf fibration, which maps $S^3$ to $S^2$ by
$p\begin{pmatrix}x_1\\x_2\\x_3\\ x_4\end{pmatrix} =\begin{pmatrix}2(x_1 x_3 + x_2 x_4)\\ 2(x_2 x_3 -x_1x_4)\\ x_1^2 + x_2^2 -x_3^2 -x_4^2\end{pmatrix}.$
In fancy-schmancy homotopy-theory speak, the homotopy class of $p$ generates $\pi_3(S^2)$. Considering our previous parametrization $F$ from last time:
$F\begin{pmatrix}\alpha \\ \beta \\ \varphi\end{pmatrix} = \begin{pmatrix} \cos \alpha \cos \varphi \\ \sin\alpha\cos\varphi \\ \cos\beta\sin\varphi \\ \sin\beta\sin\varphi \end{pmatrix},$
we recall that the last coordinate of $p$ is simply $A^2 - B^2$, or $\cos^2\varphi - \sin^2\varphi$; we derived that from $\sin^2 \alpha + \cos^2 \alpha = 1$ and $\sin^2 \beta + \cos^2 \beta = 1$. Although I tell my students not to bother memorizing trigonometric identities, we can further simplify that last coordinate to $\cos(2\varphi)$.

The key property that we want to demonstrate is that each point of $S^2$ corresponds to a whole circle in $S^3$ (for any $\boldsymbol \xi$ in $S^2$, the circle in question is the inverse image $p^{-1}(\boldsymbol\xi)$), and that any two of these circles corresponding to distinct points $\boldsymbol \xi$, though disjoint, are nevertheless linked together.

To do this, we visit the first two coordinates:
$2(x_1x_3 + x_2x_4) = 2(\cos\alpha \cos\varphi \cos\beta \sin\varphi+ \sin\alpha\cos\varphi \sin\beta\sin\varphi)$ $= 2\sin\varphi\cos\varphi(\cos\alpha\cos\beta +\sin\alpha\sin\beta) = \sin(2\varphi) \cos(\alpha-\beta),$ where the last equation is gotten either by trolling the back of a calculus book for some trig identities, or using complex numbers. Similarly,
$2(x_2x_3 - x_1x_4) = 2(\sin\alpha \cos\varphi \cos\beta \sin\varphi- \cos\alpha\cos\varphi \sin\beta\sin\varphi)$ $= 2\sin\varphi\cos\varphi(\sin\alpha\cos\beta -\cos\alpha\sin\beta) = \sin(2\varphi) \sin(\alpha-\beta).$
All together, we have $p \circ F\begin{pmatrix}\alpha\\ \beta\\ \varphi\end{pmatrix} = \begin{pmatrix}\sin(2\varphi)\cos(\alpha-\beta) \\ \sin(2\varphi)\sin(\alpha-\beta) \\ \cos(2\varphi)\end{pmatrix}.$
Letting $\theta = \alpha-\beta$, we see that this looks almost like our standard parametrization of the $2$-sphere, except that the polar angle $\varphi$ is off by a factor of $2$. No matter, it's still a sphere; we just have to take care to remember that the range of this $\varphi$ is $[0,\pi/2]$ rather than $[0,\pi]$. This confirms, incidentally, that $p$ really maps onto the sphere, rather than merely into some amorphous blob in $\mathbb{R}^3$, as one might at first suspect, since $p$ has $3$ output coordinates. The important thing to realize is that given any $\boldsymbol \xi$ in $S^2$, there is a unique $\varphi_0$ in $[0,\pi/2]$ and $\theta_0$ in $[0,2\pi)$ that correspond, under the usual parametrization, to $\boldsymbol \xi$ (at the two poles, $\theta_0$ is arbitrary). So, to calculate $p^{-1}(\boldsymbol \xi)$, we have to see how much we can change $\alpha$, $\beta$, and $\varphi$ while always getting $(\varphi_0,\theta_0)$. By our above computations, this is easy: $\alpha$ and $\beta$ must satisfy $\alpha - \beta = \theta_0$, and $\varphi = \varphi_0$ is already completely determined. So our only degree of freedom here is a common shift of $\alpha$ and $\beta$: given some point in the fiber $p^{-1}(\boldsymbol \xi)$, if we add the same amount to both $\alpha$ and $\beta$, we stay in the fiber. This means the fiber has a parametrization that looks like
$\beta \mapsto F\begin{pmatrix}\theta_0 + \beta \\ \beta \\ \varphi_0\end{pmatrix} = \begin{pmatrix} \cos (\theta_0 + \beta) \cos \varphi_0 \\ \sin(\theta_0 + \beta)\cos\varphi_0 \\ \cos\beta\sin\varphi_0 \\ \sin\beta\sin\varphi_0 \end{pmatrix}.$
We finally finish things off by composing with the stereographic projection $P$ as before:
$\beta \mapsto \frac{1}{1-\sin\beta\sin\varphi_0 } \begin{pmatrix}\cos (\theta_0 + \beta) \cos \varphi_0 \\ \sin(\theta_0 + \beta)\cos\varphi_0 \\ \cos\beta\sin\varphi_0 \end{pmatrix}.$
Plotting this out, this gives us the nice circles shown at the start of the post. We'll continue to explore the properties of $p$ and its visualizations as we move along in algebraic topology.
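For readers who like to check such computations numerically, here is a Python sketch. All names are hypothetical, and the argument order $(\alpha,\beta,\varphi)$ follows the fiber parametrization above. It compares $p \circ F$ against the closed form $(\sin 2\varphi\cos(\alpha-\beta), \sin 2\varphi\sin(\alpha-\beta), \cos 2\varphi)$ coming from the coordinate computations, then stereographically projects fibers with $\varphi_0 = \pi/8$ to produce point lists for $30$ circles like those in the opening figure. Plotting is left to whatever tool you prefer.

```python
import math
import random


def F(alpha, beta, phi):
    """Point of S^3 with angles (alpha, beta, phi), as in the parametrization above."""
    return (math.cos(alpha) * math.cos(phi), math.sin(alpha) * math.cos(phi),
            math.cos(beta) * math.sin(phi), math.sin(beta) * math.sin(phi))


def hopf(x1, x2, x3, x4):
    """The Hopf map p: S^3 -> S^2."""
    return (2 * (x1 * x3 + x2 * x4),
            2 * (x2 * x3 - x1 * x4),
            x1 ** 2 + x2 ** 2 - x3 ** 2 - x4 ** 2)


def closed_form(alpha, beta, phi):
    """Expected p(F(...)): polar angle 2*phi, azimuth alpha - beta."""
    theta = alpha - beta
    return (math.sin(2 * phi) * math.cos(theta),
            math.sin(2 * phi) * math.sin(theta),
            math.cos(2 * phi))


def stereo(x1, x2, x3, x4):
    """Stereographic projection from (0, 0, 0, 1) to R^3."""
    s = 1.0 - x4
    return (x1 / s, x2 / s, x3 / s)


def clifford_circle(theta0, phi0, n=200):
    """n points of the projected fiber over (phi0, theta0), using alpha = theta0 + beta.
    Requires 0 <= phi0 < pi/2 so the circle misses the projection point."""
    if not 0 <= phi0 < math.pi / 2:
        raise ValueError("phi0 must lie in [0, pi/2)")
    betas = [2 * math.pi * j / n for j in range(n)]
    return [stereo(*F(theta0 + b, b, phi0)) for b in betas]


# Numerical check of the closed form on random angles.
rng = random.Random(0)
worst = 0.0
for _ in range(1000):
    a = rng.uniform(0, 2 * math.pi)
    b = rng.uniform(0, 2 * math.pi)
    ph = rng.uniform(0, math.pi / 2)
    diffs = (abs(g - w) for g, w in zip(hopf(*F(a, b, ph)), closed_form(a, b, ph)))
    worst = max(worst, max(diffs))
print(worst)  # only floating-point rounding error should remain

# The 30 circles of the opening figure: phi0 = pi/8, theta0 spread over [0, 2 pi).
circles = [clifford_circle(2 * math.pi * i / 30, math.pi / 8) for i in range(30)]
```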

Wednesday, September 9, 2015

Some Cool Views of Projective Space

While finally revisiting one of the geometry books on my shelf, Glen Bredon's Topology and Geometry, I encountered an exercise about showing that the (real) projective plane is homeomorphic to the mapping cone of a map that wraps a circle twice around itself (the complex squaring map $z \mapsto z^2$ on the unit circle). The mapping cone has a nice visualization. First we form the mapping cylinder, which takes a space $X$ and crosses it with the interval $I$ to form $X \times I$ (thus forming a "cylinder"), and then glues the bottom of it to another space $Y$ using a given continuous map $f : X \to Y$. Finally, to make the cone, we collapse the top to a single point. Of course, this can be visualized as deforming the bottom part of $X \times I$ through whatever contortion $f$ does, which might include self-intersection (and of course, the deformation can be made more gradual). So I used a good old friend, parametrizations, to help set up an explicit example. Take a look!

 An immersion of projective space into $\mathbb R^3$. Shown as a mesh to make the self-intersecting portion visible. Looks a little like a molar, although one would hope I take good enough care of my teeth to not have that many holes in it…
 A cutaway view, now as a more solid surface, basically illustrating it now as a mapping cylinder (it is homeomorphic to projective space minus a disk, which is a Möbius strip). Anyone know a good glassblower so we can make vases that look like this?
 A view from the open top, allowing us to see the self-intersecting part of the surface from above
The equation, in "cylindrical coordinates", is $r = (2z+\cos\theta) \sqrt{1-z^2}$ for $0\leq z \leq 1$ (for the closed surface), $0 \leq z \leq 0.97$ (for the cutaway), and $0 \leq \theta \leq$ (what else?) $2\pi$. I say it in scare quotes because technically, it allows negative values of $r$. For fun, though (and to make it totally legit, even if you have qualms about negative radii), we rewrite it as a (Cartesian) parametrization (by substituting for $r$):
$\begin{pmatrix} x \\ y \\ z\end{pmatrix} = \begin{pmatrix} (2u + \cos v)\sqrt{1-u^2} \cos v\\ (2u + \cos v) \sqrt{1-u^2}\sin v \\ u\end{pmatrix}$
with $0\leq u \leq 1$ or $0.97$, and $0 \leq v \leq 2\pi$.

The motivation for this is that the curves $r = a + \cos\theta$, as $a$ decreases from $2$ to $0$, go from a loop traversed once to a loop that folds over itself exactly in a $2:1$ manner:

but now visualized stacked on top of one another. Then, to collapse the top, we shrink the diameter by a function, $\sqrt{1-z^2}$, that vanishes at $1$ and approaches $0$ with infinite slope, to make it smooth (topologically speaking, though, it is valid to let it collapse to a corner point). Flowers, however, are not included (you'll have to fiddle with things like $r = \cos(k\theta)$ if you want that…)
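Here is a minimal Python sketch (names hypothetical; no particular plotting library assumed) that samples the parametrization above as a grid of points. It also exhibits the two features the construction relies on: at $u = 0$ the parameters $v$ and $v + \pi$ give the same point, so the bottom circle $r = \cos v$ is traced twice, as for $z \mapsto z^2$; and at $u = 1$ everything collapses to the cone point $(0, 0, 1)$.

```python
import math


def surface_point(u, v):
    """Point (x, y, z) of r = (2z + cos(theta)) * sqrt(1 - z^2), with u in the
    role of z and v in the role of theta; r is allowed to be negative."""
    r = (2 * u + math.cos(v)) * math.sqrt(1 - u * u)
    return (r * math.cos(v), r * math.sin(v), u)


def surface_grid(u_max=1.0, nu=40, nv=80):
    """Rows of points for 0 <= u <= u_max and 0 <= v <= 2*pi.
    Use u_max = 1.0 for the closed surface and u_max = 0.97 for the cutaway."""
    return [[surface_point(u_max * i / nu, 2 * math.pi * j / nv)
             for j in range(nv + 1)]
            for i in range(nu + 1)]


grid = surface_grid(0.97)
```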

Saturday, September 5, 2015

Distributing Points on Spheres and other Manifolds

One of the classic problems of random-number generation, and of representing probability distributions generally, is the problem of uniformly distributing points on a ($2$-)sphere (we, of course, clarify the dimension, having gone on too many extradimensional journeys lately, but we'll quickly see that these methods are good for those cases too!). Namely, how does one pick points at random on a sphere such that every spot on it is equally likely? Naïvely, one jumps to latitude and longitude. But this is because we are trained to think of the sphere in terms of parametrizations (yes, we like those, obviously, but they are only a means of representation, not the be-all and end-all of geometric objects), and not in terms of quantities defined directly on the sphere itself. Uniformity for points directly on the sphere need not correspond to uniformity with respect to the real parameters defining those points. After all, the whole point of a parametrization is to distort a very simple domain into something more complicated. For example, if one distributes points uniformly in latitude, i.e., in the interval $[-\pi/2,\pi/2]$, and also uniformly in longitude, the interval $[-\pi,\pi]$, then after mapping to the corresponding points on a sphere, one gets points concentrated near the poles:

 Sphere with a distribution of 2000 points, uniform in latitude and longitude, generated by MATLAB. Notice it clusters at the poles. (Here the north pole of the sphere is tipped slightly out of the plane of the page so we can see the points getting denser there. Click to enlarge.)

This occurs essentially because, at different latitudes, a degree of longitude can be a very different physical distance (~70 miles at the equator, going to zero at the poles): the numerical correspondence does not match up with the more relevant physical measures. In more fancy-schmancy speak, uniformity in latitude and longitude (or the equivalent spherical $\varphi$ and $\theta$) means uniformity in some imaginary rectangle that we deform into a sphere. Actually, without much work, we can already intuitively see what we have to do to achieve uniformity on the sphere in terms of those coordinates: make the distribution thin out when the latitude indicates we're near the poles, i.e., we want the distribution of points in the parameter rectangle to look like this:

 Nonuniform distribution of points on the parameter rectangle $(\theta,\varphi)$. Notice the points get more sparse at the top and bottom, corresponding to $\varphi$ at its extreme values.
How do we express this in formulas? We need to take a look at the area element. The naïve uniform distribution gives $\frac{1}{2\pi^2}d\varphi\; d\theta$, that is, $\frac{1}{2\pi^2}$ times the area element of the rectangle $[0,\pi]\times[0,2\pi]$, where the $\frac{1}{2\pi^2}$ is there to make the integral come out to $1$, as required for all probability densities. But anyone who has spent a minute in a multivariable calculus class probably got it drilled into their heads that the area element of the sphere, namely, that which measures the area of a piece of a sphere of radius $r$, is $r^2 \sin \varphi\; d\varphi\; d\theta$ (or $r^2\sin \theta\; d\theta\; d\phi$ if you're using the physicists' convention… that's a whole 'nother headache right there). We'll take $r = 1$ here, and of course, we now have to multiply by $\frac{1}{4\pi}$ to make the density integrate to $1$. It is reasonable that actual physical area would correspond to an actual uniform distribution, because the area of some plot of land on a sphere will be the same no matter how the sphere is rotated: distributing one sample point per square inch everywhere on a sphere would indeed give a truly uniform distribution. So this gives us the solution: we can still uniformly distribute in "longitude" $\theta \in [0,2\pi)$ (which accounts for the factor of $\frac{1}{2\pi} d\theta$), since the area element doesn't have any functional dependence on $\theta$ other than $d\theta$. For $\varphi$, however, we need to weight the density in such a manner that, if we take $d\varphi$ for the moment in the classical interpretation as an infinitesimal, or just a small increment in $\varphi$, the proportion of points landing between $\varphi$ and $\varphi+ d\varphi$ is approximately $\frac{1}{2}\sin\varphi \;d\varphi$ (the $\frac{1}{2}$ comes from removing the previous factor of $\frac{1}{2\pi}$ from the total $\frac{1}{4\pi}$).
But that, we recognize, is $\frac{1}{2}d(-\cos \varphi)$, so if we define a new variable $u = -\cos \varphi$, then our normalized area form is $\frac{1}{2} du\; \frac{1}{2\pi} d\theta$. Thus if we uniformly distribute $u$ in $[-1,1]$ (an interval of length $2$, which justifies the $\frac{1}{2}$), and only then calculate $\varphi = \cos^{-1}(-u)$ (distributing $\theta$ uniformly in $[0,2\pi)$ as before), we do in fact get a uniform distribution on the sphere.

 Sphere with a uniform distribution of 2000 points (click to enlarge), generated by MATLAB
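The recipe is easy to try. The following Python sketch (function names hypothetical; it uses only the standard `random` module, not the MATLAB code behind the figures) compares the naïve sampler with the $u = -\cos\varphi$ sampler by counting points in the polar cap $z > 0.9$. That cap has area $2\pi(1 - 0.9)$ out of $4\pi$, so a uniform distribution should put about $5\%$ of the points there.

```python
import math
import random


def to_xyz(phi, theta):
    """Unit-sphere point with polar angle phi and longitude theta."""
    s = math.sin(phi)
    return (s * math.cos(theta), s * math.sin(theta), math.cos(phi))


def naive_point(rng):
    """Uniform in phi and theta separately: clusters at the poles."""
    return to_xyz(rng.uniform(0, math.pi), rng.uniform(0, 2 * math.pi))


def uniform_point(rng):
    """Uniform on the sphere: draw u uniformly in [-1, 1], set phi = arccos(-u)."""
    u = rng.uniform(-1, 1)
    return to_xyz(math.acos(-u), rng.uniform(0, 2 * math.pi))


def cap_fraction(sampler, n=20000, height=0.9, seed=1):
    """Fraction of n samples landing in the polar cap z > height."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n) if sampler(rng)[2] > height) / n


# A uniform distribution should put (1 - 0.9)/2 = 5% of points in this cap;
# the naive one puts about arccos(0.9)/pi, roughly 14%, there.
print(cap_fraction(naive_point), cap_fraction(uniform_point))
```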

So What's the More General Thing Here?

As veteran readers might have expected, this situation is not as specific as it may seem. Transforming probability density functions (pdfs), even just on a line, often seems mysterious, because it doesn't work the same way as other coordinate changes for functions one often encounters. It is not enough to simply evaluate the pdf at a new point (that is, to compose the pdf with the coordinate change). Instead, what most books report, sometimes proving it with the Change of Variables theorem, is some funny formula involving a lot of Jacobians. But here is the real reason: the natural object associated to a probability density is a top-dimensional differential form (actually, a differential pseudoform, which I have spent some time advocating, for the simple reason that modeling things in a geometrically correct manner really clarifies things; just think of how easily one can get confused by a right-hand rule!). One way to view such objects, besides as a function times a standardized "volume" form, is as a swarm, which is precisely one of these distributions of points (they don't have to be random, but generating them randomly is an excellent way to get a handle on them).

If we have some probability distribution on our space, and near some point it is specified (parametrized) by variables $x = (x_1, x_2,\dots,x_n)$, then the pdf should (locally) look like $\rho(x_1,x_2,\dots,x_n)\, dx_1 dx_2\dots dx_n$. Most probability texts, of course, consider just the $\rho$ part without the differential $n$-form part $dx_1\dots dx_n$. When transforming coordinates to $y$ such that $x = f(y)$, the standard sources simply state that $\rho$ in the new coordinates is $\rho(f(y)) \left|\det \left(\frac{\partial f}{\partial y}\right)\right|$. But really, this is because the function part $\rho(x)$ becomes $\rho(f(y))$ as a function usually would, while the $dx_1 \dots dx_n$ becomes $\left|\det \left(\frac{\partial f}{\partial y}\right) \right| dy_1\dots dy_n$. In total, we have the transformation law
$\rho(x_1,\dots,x_n) dx_1 \dots dx_n=\rho(f(y_1,\dots,y_n)) \left|\det \left(\frac{\partial f}{\partial y}\right) \right| dy_1\dots dy_n.$
The usual interpretation of the $n$-form $\rho(x)\, dx_1 \dots dx_n$ is simply that, when integrated over some region of space, it gives the total quantity of whatever the form is trying to measure (volume, mass, electric charge, or here, probability).
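Here is a small one-dimensional check of the transformation law in Python. The density $\rho(x) = e^{-x}$ on $x \ge 0$ and the coordinate change $x = f(y) = e^y$ are hypothetical choices for illustration. Integrating over $y \in [-1,1]$ with the factor $|f'(y)|$ reproduces the probability that $x \in [e^{-1}, e]$; dropping that factor, i.e., just composing $\rho$ with $f$, does not.

```python
import math


def midpoint_integral(g, a, b, n=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def rho(x):
    """Example density e^{-x} on x >= 0 (a hypothetical choice)."""
    return math.exp(-x)


def f(y):
    """Example coordinate change x = f(y) = e^y (also hypothetical)."""
    return math.exp(y)


def df(y):
    """Derivative f'(y): the 1-dimensional Jacobian determinant."""
    return math.exp(y)


a, b = -1.0, 1.0
exact = math.exp(-f(a)) - math.exp(-f(b))  # integral of rho over [f(a), f(b)]
with_jacobian = midpoint_integral(lambda y: rho(f(y)) * abs(df(y)), a, b)
without_jacobian = midpoint_integral(lambda y: rho(f(y)), a, b)
print(exact, with_jacobian, without_jacobian)
```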

Another Curious Example

One perennially interesting example is to take two independent, normally distributed random variables (with mean $0$ and variance $\sigma^2$) and consider the distribution of their polar coordinates. For one variable, the normal distribution is, using our fancy-schmancy form notation,
$\rho(x)\; dx = \frac{1}{\sqrt{2\pi \sigma^2}} e^{-x^2/2\sigma^2} dx$
By the definition of independence, the joint density of two such variables is obtained by multiplying their densities together:
$\omega = \rho(x)\rho(y) \; dx\;dy = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/2\sigma^2} dx\;dy.$
So if we want the distribution in polar coordinates, where $x = r \cos \theta$ and $y = r \sin \theta$, this means we transform the form part as the usual $dx\;dy = r\; dr\; d\theta$, and write the density part as $\frac{1}{2\pi\sigma^2}e^{-r^2/2\sigma^2}$. In total,
$\omega = \frac{1}{2\pi\sigma^2} r e^{-r^2/2\sigma^2} dr \; d\theta$
This factors back out into two independent pdfs (i.e., $r$ and $\theta$ are independent random variables) as
$\omega = \left(\frac{1}{\sigma^2} r e^{-r^2/2\sigma^2} dr\right) \left( \frac{1}{2\pi} d\theta\right)$
which means we can take $\theta$ and uniformly distribute it in $[0,2\pi)$ as before. To get $r$, we can proceed in two ways. First, we could imitate what we did earlier with the sphere: use integration by substitution or the Chain Rule to deduce that $\frac{1}{\sigma^2} r e^{-r^2/2\sigma^2} dr= d\left(- e^{-r^2/2\sigma^2}\right)$, and thus, making the substitution $v = - e^{-r^2/2\sigma^2}$, we now have $\omega = \frac{1}{2\pi} dv\; d\theta$. The range of $v$ is $[-1,0)$, since $e^{-r^2/2\sigma^2}$ goes to zero as $r \to \infty$ and has its maximum value $1$ at $r = 0$. With this, we uniformly distribute $v \in [-1,0)$ and invert the function:
$r=\sqrt{-2\sigma^2 \log (-v)}.$
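This first method is the classic Box–Muller transform. A minimal Python sketch (names hypothetical) draws $v$ uniformly from $[-1,0)$ and $\theta$ from $[0,2\pi)$, inverts as above, and returns the Cartesian pair; the sample mean and variance should come out near $0$ and $\sigma^2$.

```python
import math
import random


def polar_normal_pair(rng, sigma=1.0):
    """Two independent N(0, sigma^2) samples via the polar construction above
    (the Box-Muller transform)."""
    v = -(1.0 - rng.random())            # uniform on [-1, 0), so -v is in (0, 1]
    r = math.sqrt(-2 * sigma ** 2 * math.log(-v))
    theta = rng.uniform(0, 2 * math.pi)  # uniform angle
    return r * math.cos(theta), r * math.sin(theta)


rng = random.Random(42)
samples = [x for _ in range(20000) for x in polar_normal_pair(rng, sigma=2.0)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, var)  # should be close to 0 and sigma^2 = 4
```

Writing $v$ as $-(1 - U)$ with $U$ from `random()` keeps $-v$ strictly positive, so the logarithm is always defined.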

Alternatively, we can substitute $u = r^2/2\sigma^2$ (with inverse $r = \sqrt{2\sigma^2 u}$). Then $du = (r/\sigma^2)\, dr$, and $\frac{1}{\sigma^2} re^{-r^2/2\sigma^2} dr = e^{-u} du$, with $u \geq 0$. This means $u$ is an exponential random variable with parameter $\lambda = 1$. However, exponential random variables are themselves usually generated from uniform $[0,1]$ variables, which brings us back to essentially the method above. Still, it gives another way of understanding the distributions: it certainly is very interesting that two independent normally distributed random variables become, via such an ordinary transformation as polar coordinates, (a scaled square root of) an exponentially distributed random variable and a uniform random variable.

What about three independent normally distributed variables (with mean zero and all the same variance)? It turns out the $r$ coordinate is not so easy (it follows the Maxwell–Boltzmann distribution, and $r^2/2\sigma^2$ has a gamma distribution with shape parameter $3/2$), but the $\varphi$ and $\theta$ variables yield a uniform distribution on the sphere! This shouldn't be too surprising: given three independent such variables, there should be no bias in their direction in $3$-space. We can also turn this around to give ourselves an alternate method of uniformly distributing points on a sphere (since normal random variables are very easy to generate): take three such variables, regard them as a point in $\mathbb R^3$, and divide by its magnitude (I thank Donald Knuth's Art of Computer Programming, Volume 2 for this trick; it has an excellent discussion of random number generation).
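The normalization trick takes only a few lines. In this Python sketch (names hypothetical), `random.Random.gauss` supplies the normal variables. As a check, a uniform distribution puts a fraction $(1-0.9)/2 = 5\%$ of the points in the cap $z > 0.9$, and each coordinate should average out near $0$.

```python
import math
import random


def gaussian_direction(rng):
    """Uniform random point on the unit 2-sphere: normalize three independent
    standard normal samples."""
    while True:
        x, y, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0.0:  # the zero vector has probability 0; skip it if it occurs
            return x / norm, y / norm, z / norm


rng = random.Random(7)
points = [gaussian_direction(rng) for _ in range(20000)]
cap = sum(1 for p in points if p[2] > 0.9) / len(points)
print(cap)  # should be near (1 - 0.9)/2 = 0.05
```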

Tuesday, September 1, 2015

Clifford Tori

As part of the ongoing celebration that is the relaunch, we finally give here the long-awaited in-depth look at the site's namesake, hinted at in the opening post. Perhaps surprisingly for a post that ostensibly is about tori, we begin our story with a different well-known (and hopefully loved) family of spaces: the spheres. The $1$- and $2$-dimensional spheres, the ordinary circle and sphere, hardly need introduction, being undoubtedly the first curved geometrical objects that one studies (for mathematicians, "sphere" refers only to the boundary surface of a ball, not its interior, so it is $2$-dimensional). These shapes of course appear all the time in nature, since they satisfy many optimality properties. Much, of course, has been written about their "perfect form."

Higher-dimensional spheres are easy to define: the set of all points at a given distance from a given point, where, of course, "all points" now means all points of some higher-dimensional space, say $\mathbb{R}^{n+1}$, giving the $n$-sphere $S^n$. High-dimensional spheres have interesting applications, such as in statistical mechanics, where the dimension of the state space in question is on the order of $10^{24}$ (every particle gets its own set of 3 dimensions! Again, this is why state space is awesome). Spheres occur in this context because the total energy of the system remains constant, so the sum of the squares of all their momenta has to be constant; namely, the momenta are all a certain "distance" from the origin. As crazy as it may sound, this is, at its root, a description (using Hamiltonian mechanics) of many more than the two particles that we spent a whole 5-part series talking about! But of course, it would take us a bit far afield to explore this (at least in this post; we should eventually feature some statistical-mechanical calculations here, because they illustrate how we can deal with overwhelmingly high-dimensional systems and, rather amazingly, extract useful information from them!).

So let's get back to nearly familiar territory: $n = 3$, the $3$-sphere $S^3$ (the boundary of a ball in 4-dimensional space $\mathbb{R}^4$). Since it is $3$-dimensional, one would think we could visualize it or experience it viscerally somehow. As noted by Bill Thurston, the key to visualizing $3$-manifolds is to imagine living inside a universe shaped like one (this is also elaborated upon, working up from $2$-dimensional examples, by Jeffrey Weeks, with interactive demos). This isn't so straightforward to visualize (literally), because it is a curved 3-dimensional space. There are a number of ways of doing this; they convey the essence of $S^3$ in terms of more familiar objects from lower dimensions, for example by slicing with hyperplanes and, of course, by tori.

The Clifford Tori as a Foliation of $S^3$

So where do tori (nested or not) come in? Let's get back to the defining formula of $S^3$,
$x_1^2 + x_2^2 + x_3^2 + x_4^2 = 1.$ If we group the terms in pairs, $x_1^2 + x_2^2 = A^2$ and $x_3^2 + x_4^2 = B^2$, we have $A^2 + B^2 = 1$. But for each fixed $A$ and $B$ satisfying that relation, we get two circles: one for the coordinates $(x_1,x_2)$ and another for the coordinates $(x_3,x_4)$. We can use this information to help parametrize $S^3$. Let's first ask: what are some familiar things satisfying $A^2+B^2 = 1$? If we let $A = \cos \varphi$ and $B = \sin \varphi$, then $A$ and $B$ will always satisfy this relation for any $\varphi \in \mathbb{R}$: so $\varphi$ is one possible parameter of this system, and the full possible range of $A^2$ and $B^2$ is attained by letting $\varphi$ vary from $0$ to $\frac \pi 2$. This gives us: $x_1^2 + x_2^2 =\cos^2 \varphi$ and $x_3^2 + x_4^2 = \sin^2 \varphi.$
In other words, as $\varphi$ varies, the two sets of coordinates represent one circle that grows in radius and another that shrinks, in such a way that the sum of the squares of the two radii is always $1$. This realizes the $3$-sphere as a collection of sets of the form $S^1(A) \times S^1(B) = S^1(\cos\varphi) \times S^1(\sin\varphi)$, the Cartesian product of circles of radii $A$ and $B$. But what is the Cartesian product of two circles? A torus. These tori fill up all of $S^3$ (except for two degenerate cases, $\varphi = 0$ and $\varphi = \frac{\pi}{2}$, which give circles: the Cartesian product of a unit circle and a single point, that is, a circle of radius $0$). The technical term for this is that they foliate $S^3$ (from the Latin folium for "leaf"). They are called Clifford tori. Hard as it may be to believe, their intrinsic geometry, as inherited from $\mathbb{R}^4$, is flat (although this is not true of their extrinsic geometry). This means that we could lay a sheet of paper flat on a Clifford torus in $\mathbb{R}^4$ without stretching or crumpling it (you can't do that in $\mathbb{R}^3$ with your garden-variety donut-shaped torus). However, a full study of the geometry of these tori will have to wait for another time. Here, we will be content to visualize them (which, unfortunately, will not preserve the flat geometry we are claiming they have). For the visualization, we use another tool, the stereographic projection. Before we get to that, we finally note that, for each given $\varphi$, each of the circles $S^1(\cos \varphi)$ and $S^1(\sin\varphi)$ can be further parametrized with other angles; we take
$(x_1,x_2) = (\cos \alpha \cos \varphi, \sin \alpha \cos \varphi)$ and $(x_3,x_4) = (\cos \beta \sin \varphi, \sin \beta \sin \varphi),$
where $0\leq \alpha,\beta \leq 2\pi$. So this means we have parametrized $S^3$ by 3 variables, $(\varphi,\alpha,\beta)$ varying over $[0,\pi/2]\times [0,2\pi]\times [0,2\pi]$.
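
As a quick numerical sanity check of this parametrization, here is a minimal Python sketch (the name `clifford_point` is our own, not from any library). For any choice of angles, the resulting point should satisfy the defining equation of $S^3$, and the two coordinate pairs should trace circles of radii $\cos\varphi$ and $\sin\varphi$.

```python
import math

def clifford_point(phi, alpha, beta):
    """Return (x1, x2, x3, x4) for the angles (phi, alpha, beta), following
    the parametrization of S^3 by nested tori described above."""
    return (math.cos(alpha) * math.cos(phi),
            math.sin(alpha) * math.cos(phi),
            math.cos(beta) * math.sin(phi),
            math.sin(beta) * math.sin(phi))

phi, alpha, beta = 0.3, 1.1, 2.5
x1, x2, x3, x4 = clifford_point(phi, alpha, beta)
# Should be 1 up to rounding: the point lies on the unit 3-sphere.
print(x1**2 + x2**2 + x3**2 + x4**2)
# Radius of the (x1, x2) circle next to cos(phi), and of the (x3, x4) circle next to sin(phi).
print(math.hypot(x1, x2), math.cos(phi))
print(math.hypot(x3, x4), math.sin(phi))
```

Setting $\varphi = 0$ or $\varphi = \frac{\pi}{2}$ collapses one of the two circles to a point, giving the two degenerate leaves mentioned above.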

The Stereographic Projection

There's a way to project (almost) the whole $3$-sphere into ordinary $3$-space $\mathbb{R}^3$. To understand how we can project the $3$-sphere (minus a point) into $3$-space, let's first look at the analogous problem for the $2$-sphere. It so happens that the stereographic projection is extremely useful in that case as well, and is the source of many arguments involving "the point at infinity" in complex analysis. The way it works is to imagine screwing in a light bulb at the top of a sphere resting on a plane; given a point on the sphere, its shadow cast on the plane by this light is the corresponding point (shown in the figure below for the analogous projection of a circle to a line: the dots connected by the blue rays correspond).

 Some corresponding points for the stereographic projection of a circle (here of radius $\frac 1 2$) to a line.
Notice that the closer it gets to the top, the farther out the rays go (thus, the farther out the corresponding point). And the top point falls on a horizontal line, which will never intersect the corresponding plane of projection: it is said to be sent to the "point at infinity". But the thing is, that extra point at infinity is, at least for the plane as we've defined it, truly extra, so the stereographic projection really simply maps the sphere minus one single point to the plane. In formulas, for a sphere of radius $\frac{1}{2}$ centered at $(0,0,\frac{1}{2})$, this is
$\left(\frac{x}{1-z},\frac{y}{1-z}\right).$
Of course, if we want to map the unit (radius 1 and origin-centered) sphere, we have to do a little finessing with an extra transformation, namely, $(x',y',z') = (2x, 2y, 2z-1)$. It so happens that we get the exact same formula back, just a different domain:
$\left(\frac{x}{1-z},\frac{y}{1-z}\right) = \left(\frac{\frac{1}{2}x'}{1-\frac{1}{2}z'-\frac{1}{2}},\frac{\frac{1}{2}y'}{1-\frac{1}{2}z'-\frac{1}{2}}\right) =\left(\frac{x'}{1-z'},\frac{y'}{1-z'}\right).$
For this reason, many people start off with this formula for the unit sphere instead. The picture associated to it is actually nice: it projects rays of light from the top, through the sphere, onto the plane that slices the sphere exactly in half at its equator. As a consequence, for a point in the lower hemisphere, the light hits the corresponding point in the plane before reaching the point on the sphere (but, mathematically, we always keep the mapping from the sphere to the plane, regardless of which one the light beam would hit first). It should also be noted that the inverse mapping will, of course, be given by a different formula (we won't need the inverse mapping here, but it is useful in other contexts, and can be derived with elementary, if tedious, algebra).
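
The "shadow" description translates directly into code. The following Python sketch (function names are our own) casts a ray from a light source through a point and intersects it with a horizontal plane. With the light at $(0,0,1)$ it reproduces the formula $\left(\frac{x}{1-z},\frac{y}{1-z}\right)$ both for the radius-$\frac12$ sphere resting on the plane $z=0$ and for the unit sphere projected to its equatorial plane, and it confirms that the two agree under the rescaling $(x',y',z') = (2x,2y,2z-1)$.

```python
def project_from_light(point, light, plane_z):
    """Intersect the ray from `light` through `point` with the plane z = plane_z
    and return the (x, y) coordinates of the intersection (the 'shadow')."""
    (x, y, z), (lx, ly, lz) = point, light
    if z == lz:
        raise ValueError("ray is parallel to the plane: point at infinity")
    t = (plane_z - lz) / (z - lz)   # ray parameter where it meets the plane
    return (lx + t * (x - lx), ly + t * (y - ly))

def stereographic(x, y, z):
    """Closed-form stereographic projection from the north pole (0, 0, 1)."""
    return (x / (1 - z), y / (1 - z))

# A point on the unit sphere: 0.36^2 + 0.48^2 + 0.8^2 = 1.
xp, yp, zp = 0.36, 0.48, 0.8
# The same point after undoing the rescaling, on the radius-1/2 sphere
# centred at (0, 0, 1/2): (x, y, z) = (x'/2, y'/2, (z' + 1)/2).
half = (xp / 2, yp / 2, (zp + 1) / 2)

print(project_from_light((xp, yp, zp), (0, 0, 1), 0.0))  # unit sphere -> equatorial plane
print(project_from_light(half, (0, 0, 1), 0.0))          # small sphere -> plane z = 0
print(stereographic(xp, yp, zp))                         # the shared closed form
```

All three lines should print approximately $(1.8, 2.4)$. The explicit check for `z == lz` is where the "point at infinity" shows up: the ray through the north pole never meets the plane.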

 Stereographic projection of the unit, origin-centered sphere to the plane containing its equator. The projection associates the point $P$ to the point $Q$.

So we generalize this formula: for the unit 3-sphere, given as $x_1^2 + x_2^2 + x_3^2 + x_4^2 = 1$, we have a stereographic projection $P$ of all of its points except the point $(0,0,0,1)$, to $\mathbb{R}^3$, as follows:
$P \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4\end{pmatrix} =\frac 1 {1-x_4}\begin{pmatrix} x_1\\ x_2 \\ x_3\end{pmatrix}.$
In order to visualize our Clifford tori, then, we recall the parametrization derived above:
$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4\end{pmatrix} = F\begin{pmatrix}\varphi \\ \alpha \\ \beta\end{pmatrix} = \begin{pmatrix} \cos \alpha \cos \varphi \\ \sin\alpha\cos\varphi \\ \cos\beta\sin\varphi \\ \sin\beta\sin\varphi \end{pmatrix}.$
Each of the two circles is traced out as $\alpha$ or $\beta$ varies, so if we select a few values of $\varphi$ and draw the surface parametrized by the remaining two variables, we get a different torus for each value of $\varphi$. But, of course, this still gives the torus as a subset of $\mathbb R^4$. So we compose this with the stereographic projection: for fixed $\varphi$, and letting $\alpha, \beta$ vary, we consider parametrizations
$\Phi \begin{pmatrix} \varphi \\ \alpha \\ \beta \end{pmatrix} = P \circ F\begin{pmatrix} \varphi \\ \alpha \\ \beta \end{pmatrix} = \frac{1}{1-\sin\beta\sin\varphi} \begin{pmatrix} \cos \alpha \cos \varphi \\ \sin\alpha\cos\varphi \\ \cos\beta\sin\varphi \end{pmatrix}.$
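
To double-check the composite formula, and to generate points for a picture like the ones in the opening post, here is a Python sketch (names are our own; plotting is left to whatever tool you prefer). It compares the closed form of $\Phi$, with denominator $1-\sin\beta\sin\varphi$, against applying $P$ after $F$. It also checks that the distance of an image point from the $x_3$-axis does not depend on $\alpha$: rotating $(x_1,x_2)$ leaves the factor $\frac{1}{1-x_4}$ unchanged, so each projected torus is a surface of revolution about that axis.

```python
import math

def F(phi, alpha, beta):
    """Parametrization of S^3 by (phi, alpha, beta), as above."""
    return (math.cos(alpha) * math.cos(phi), math.sin(alpha) * math.cos(phi),
            math.cos(beta) * math.sin(phi), math.sin(beta) * math.sin(phi))

def P(x1, x2, x3, x4):
    """Stereographic projection of S^3 minus (0, 0, 0, 1) into R^3."""
    s = 1.0 / (1.0 - x4)
    return (s * x1, s * x2, s * x3)

def Phi(phi, alpha, beta):
    """Closed form of P(F(phi, alpha, beta))."""
    d = 1.0 - math.sin(beta) * math.sin(phi)
    return (math.cos(alpha) * math.cos(phi) / d,
            math.sin(alpha) * math.cos(phi) / d,
            math.cos(beta) * math.sin(phi) / d)

def torus_mesh(phi, n_alpha=24, n_beta=24, beta_max=2 * math.pi):
    """Grid of points on the projected torus for a fixed phi. Choosing
    beta_max < 2*pi gives a cut-away torus. Avoid phi = pi/2 together with
    beta = pi/2, which is the projection point (0, 0, 0, 1) itself."""
    return [[Phi(phi, 2 * math.pi * i / n_alpha, beta_max * j / n_beta)
             for j in range(n_beta + 1)] for i in range(n_alpha + 1)]

mesh = torus_mesh(math.pi / 4, beta_max=1.5 * math.pi)
print(len(mesh), len(mesh[0]))  # 25 rows of 25 points each
```

Calling `torus_mesh` for a few values of $\varphi$ and plotting the resulting grids as surfaces should show the nested, partially opened tori.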

These are the tori shown in the opening post (the tori are deliberately shown incomplete, with $\beta$ not going full circle, so that you can see how they change with different $\varphi$ and that they are nested). Note that this transformation does not preserve sizes, so even though we said that "one circle gets bigger as the other gets smaller", this is not what happens in the projection. Think of a sphere with circles of latitude; they grow from one pole to the equator and shrink back down from the equator to the other pole, but their stereographic projections to the plane just keep growing. The other view of this, and the site's logo, is simply a cutaway view of the stereographic projections by one vertical plane. As a note of thanks, bridging years, one of the sources of inspiration for learning about Clifford tori and their visualizations has come from Ivars Peterson's The Mathematical Tourist. (He maintains a blog as well.)

We should finally note that this is not the same parametrization as our usual torus of revolution. Parametrizations are not unique. The Clifford tori have many interesting, topologically important properties that will certainly be fodder for future posts.

Saturday, August 29, 2015

Double Pendulum Concluded (Part 5): Discretization

 Path traced out in state space by a double pendulum
In this final part, we discretize the equations for the double pendulum, covered in Parts 1, 2, 3, and 4. As frequently noted, for this problem, we are in favor of symplectic methods, because qualitatively, the algorithms have very good long-time energy behavior, and you don't get nonsense like the pendulum speeding up uncontrollably. And in order to do this, we rephrased in terms of Hamiltonian mechanics in Part 4, from which we recall (setting $\mathbf{q} = (\theta_1,\theta_2)$):
$\dot{\mathbf{q}} = I^{-1} \mathbf{p}$ and $\dot{\mathbf{p}} = \frac{1}{2} \dot{\boldsymbol{\theta}}^t \frac{\partial I}{\partial \mathbf{q}}\dot{\boldsymbol{\theta}} - \frac{\partial V}{\partial\mathbf q}$
which, for this system, is
$\dot{p}_1 = \frac{1}{2}\dot{\mathbf{q}}^t \begin{pmatrix} 0 & m_2\ell_1\ell_2 \sin(\theta_2-\theta_1) \\ m_2\ell_1\ell_2 \sin(\theta_2-\theta_1) &0\end{pmatrix} \dot{\mathbf{q}} - (m_1 + m_2) g\ell_1 \sin \theta_1$ and $\dot{p}_2 = \frac{1}{2}\dot{\mathbf{q}}^t \begin{pmatrix} 0 & -m_2\ell_1\ell_2 \sin(\theta_2-\theta_1) \\ -m_2\ell_1\ell_2 \sin(\theta_2-\theta_1) &0\end{pmatrix} \dot{\mathbf{q}} - m_2 g\ell_2 \sin \theta_2,$
with the symmetric moment of inertia
$I = \begin{pmatrix} (m_1 +m_2) \ell_1^2 & m_2\ell_1\ell_2\cos(\theta_2-\theta_1) \\ m_2 \ell_1 \ell_2 \cos(\theta_2 -\theta_1) & m_2 \ell_2^2\end{pmatrix}.$
We use a one-step, first-order method to simulate (with time step $h$). The usual thing to do is to replace the derivatives $\dot{\mathbf{q}}$ and $\dot{\mathbf{p}}$ by difference quotients and solve for the later time step (denoted with the superscript $n+1$) in terms of the earlier one (superscript $n$):
$\frac{\mathbf{q}^{n+1} - \mathbf{q}^n}{h} = I^{-1} \mathbf{p}$ and $\frac{\mathbf{p}^{n+1} - \mathbf{p}^n}{h} = \frac{1}{2} \mathbf{p}^t I^{-1} \frac{\partial I}{\partial \mathbf{q}} I^{-1} \mathbf{p} - \frac{\partial V}{\partial \mathbf{q}}.$
Now of course, the subtlety is how to evaluate the quantities on the right-hand side; since now we don't assume that we have direct access to $\dot{\mathbf q}$, we have re-expressed it in terms of $\mathbf{p}$ in the second equation. Also, since $\partial I/\partial{\mathbf{q}}$ is some higher-order tensor, the notation becomes very confusing, very fast. If we write
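$B = I^{-1}\frac{\partial I}{\partial \theta_1}I^{-1}$ (which, by the identity for the derivative of an inverse matrix, equals $-\partial (I^{-1})/\partial \theta_1$), then, since $I$ depends only on $\theta_2 -\theta_1$, we have $\partial I/\partial \theta_2 = -\partial I/\partial \theta_1$, so the corresponding matrix for $\theta_2$ is $-B$. Therefore, we rewrite it as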
$\frac{\mathbf{p}^{n+1} - \mathbf{p}^n}{h} = \frac{1}{2}\begin{pmatrix} \mathbf{p}^t B \mathbf{p} \\ -\mathbf{p}^t B \mathbf{p} \end{pmatrix}- \frac{\partial V}{\partial \mathbf{q}}.$
The Euler-B method uses $\mathbf{p}^{n+1}$ and $\mathbf{q}^n$ on the right-hand side (we choose it instead of Euler-A, which would use $\mathbf{q}^{n+1}$ and $\mathbf{p}^n$, because the moment-of-inertia tensor $I$ depends nonlinearly on $\mathbf{q}$, which would make the resulting equations harder to solve; we'll see where that comes in shortly).

Once we solve for $\mathbf{p}^{n+1}$, then it is clear how to solve for $\mathbf{q}^{n+1}$ (noting we evaluate $I$ at $\mathbf{q}^n$):
$\mathbf{q}^{n+1} = \mathbf{q}^n + h I^{-1} \mathbf{p}^{n+1}.$
Now $\mathbf{p}^{n+1}$ is harder:
$\mathbf{p}^{n+1} = \mathbf{p} ^n + h \left( \frac{1}{2} \begin{pmatrix} (\mathbf{p}^{n+1})^t B \mathbf{p}^{n+1} \\-(\mathbf{p}^{n+1})^t B \mathbf{p}^{n+1}\end{pmatrix}- \frac{\partial V}{\partial \mathbf{q}}\right)$
where, again, $B$ and $\partial V/\partial{\mathbf{q}}$ are evaluated at $\mathbf{q}^n$. The trouble, of course, is that $\mathbf{p}^{n+1}$ appears on both sides, and the equation is nonlinear in $\mathbf{p}^{n+1}$ (the nonlinearity is only quadratic, but it is a nonlinearity nonetheless). How do we solve this? By...

Using Newton's Method.

To clear the clutter, we let $\mathbf{x}$ be the unknown, namely $\mathbf{p}^{n+1}$, and simply write $\mathbf{p} = \mathbf{p}^n$ as known given data. Thus, bringing all of that junk to one side of the equation, we want to solve $F(\mathbf{x}) = 0$ where $F$ is:
$F(\mathbf{x}) = \mathbf{x} - \mathbf{p} - h\left( \frac{1}{2} \mathbf{x}^t B \mathbf{x}\begin{pmatrix} 1 \\ - 1\end{pmatrix}- \frac{\partial V}{\partial \mathbf{q}}\right).$

Newton's method is to consider the iteration $\mathbf{x}_{m+1} = \mathbf{x}_m - F'(\mathbf{x}_m)^{-1}F(\mathbf{x}_m)$. Now
$F'(\mathbf{x}) = \mathrm{Id} + h \begin{pmatrix} -\mathbf{x}^t B \\ \mathbf{x}^t B \end{pmatrix}$
where $\mathrm{Id}$ is the identity matrix. For small enough $h$, this is certainly invertible. In fact, the Newton iterations tend to be well behaved simply because the identity matrix dominates for small $h$, which makes the problem well-conditioned. As a starting point for Newton's method, we can use $\mathbf{x}_0 = \mathbf{p} = \mathbf{p}^n$, the value of $\mathbf{p}$ at the current time step; it should be reasonably close, because $\mathbf{p}$ solves a differential equation with a smooth vector field and so changes by only $O(h)$ over one step. Finally, we stop the Newton iterations when the residual $F(\mathbf{x}_m)$ falls within a small tolerance, say, $10^{-7}$, and simply accept $\mathbf{x}_m$ as the new value $\mathbf{p}^{n+1}$. Here's how to say that in MATLAB:

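   % Assumes m1, m2, the rod lengths r1 (= l1) and r2 (= l2), g, the time
   % step h, the iteration cap itmax, and the current state q, p are defined.
   mm = m1 + m2;
   A = [ mm*r1^2, m2*r1*r2*cos(q(1)-q(2));
        m2*r1*r2*cos(q(1)-q(2)), m2*r2^2]; % matrix I
   C = [0 , -m2*r1*r2*sin(q(1)-q(2)) ;
         -m2*r1*r2*sin(q(1)-q(2)), 0];     % dI/dtheta_1 (dI/dtheta_2 = -C)
   B = -A\(C/A);                           % derivative of the inverse matrix,
                                           % -I^-1 dI/dtheta_1 I^-1: this is the
                                           % NEGATIVE of the B in the text, hence
                                           % the flipped signs in F and DF below
   dVdq = [mm*g*r1*sin(q(1)); m2*g*r2*sin(q(2))];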

   x = p;          % initialize Newton iteration
   F = [1.0; 1.0]; % set initially to something that'll give a big norm
   iter = 0;
   while (norm(F) > 1.0e-7 && iter <= itmax)
      F = x - p + h *(1/2* [x'*B*x; -x'*B*x] + dVdq); % current F
      DF = eye(2) + h*([x'*B; -x'*B]);                % Jacobian
      x = x - DF\F;                                   % update iteration
      iter = iter+1;
   end
   p = x;            % update p
   q = q + h* (A\p); % update q, using updated p
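
Since MATLAB may not be at hand, here is a hedged Python transcription of the same step using only the standard library (the $2\times 2$ linear algebra is written out by hand). It is a sketch, not the code used for the video: the parameter values, tolerance, iteration cap, and all function names are our own illustrative choices. Unlike the MATLAB variable `B`, it uses $B = I^{-1}\frac{\partial I}{\partial\theta_1}I^{-1}$, so that $F(\mathbf x) = \mathbf{x} - \mathbf{p} - h\left(\frac12\mathbf{x}^tB\mathbf{x}\,(1,-1)^t - \frac{\partial V}{\partial\mathbf q}\right)$ and $F'(\mathbf x) = \mathrm{Id} + h\begin{pmatrix}-\mathbf x^tB\\ \mathbf x^tB\end{pmatrix}$ appear term by term.

```python
import math

def inv2(M):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(X, Y):
    """Product of two 2x2 matrices."""
    return [[X[i][0] * Y[0][j] + X[i][1] * Y[1][j] for j in range(2)]
            for i in range(2)]

def apply2(M, v):
    """Matrix-vector product M v for a 2x2 matrix."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def inertia(q, m1, m2, l1, l2):
    """The moment-of-inertia matrix I(q)."""
    c = m2 * l1 * l2 * math.cos(q[1] - q[0])
    return [[(m1 + m2) * l1 ** 2, c], [c, m2 * l2 ** 2]]

def energy(q, p, m1, m2, l1, l2, g):
    """H(q, p) = 1/2 p^t I^{-1} p + V(q)."""
    qdot = apply2(inv2(inertia(q, m1, m2, l1, l2)), p)
    V = -(m1 + m2) * g * l1 * math.cos(q[0]) - m2 * g * l2 * math.cos(q[1])
    return 0.5 * (p[0] * qdot[0] + p[1] * qdot[1]) + V

def step(q, p, h, m1, m2, l1, l2, g, tol=1e-10, itmax=20):
    """One symplectic Euler step: Newton's method for p^{n+1}, then q^{n+1}.
    I, B and dV/dq are all evaluated at the old q."""
    Iinv = inv2(inertia(q, m1, m2, l1, l2))
    s = m2 * l1 * l2 * math.sin(q[1] - q[0])
    dI1 = [[0.0, s], [s, 0.0]]               # dI/dtheta_1 (dI/dtheta_2 = -dI1)
    B = mul2(mul2(Iinv, dI1), Iinv)          # B = I^-1 dI/dtheta_1 I^-1
    dVdq = [(m1 + m2) * g * l1 * math.sin(q[0]), m2 * g * l2 * math.sin(q[1])]

    x = list(p)                              # Newton initial guess x_0 = p^n
    for _ in range(itmax):
        Bx = apply2(B, x)
        quad = x[0] * Bx[0] + x[1] * Bx[1]   # x^t B x
        F = [x[0] - p[0] - h * (0.5 * quad - dVdq[0]),
             x[1] - p[1] - h * (-0.5 * quad - dVdq[1])]
        if math.hypot(F[0], F[1]) < tol:
            break
        # Jacobian F'(x) = Id + h [-x^t B; x^t B]  (B is symmetric)
        J = [[1.0 - h * Bx[0], -h * Bx[1]], [h * Bx[0], 1.0 + h * Bx[1]]]
        dx = apply2(inv2(J), F)
        x = [x[0] - dx[0], x[1] - dx[1]]
    qdot = apply2(Iinv, x)                   # I^{-1} p^{n+1}, with I at the old q
    return [q[0] + h * qdot[0], q[1] + h * qdot[1]], x

# Usage: m1 = m2 = l1 = l2 = 1, g = 9.81, released from rest.
params = (1.0, 1.0, 1.0, 1.0, 9.81)
q, p, h = [1.0, 0.5], [0.0, 0.0], 1e-3
E0 = energy(q, p, *params)
worst = 0.0
for n in range(5000):
    q, p = step(q, p, h, *params)
    worst = max(worst, abs(energy(q, p, *params) - E0))
print("initial energy", E0, "largest deviation", worst)
```

With these choices the reported deviation should stay small compared with the total energy and oscillate rather than grow; that bounded error, rather than exact conservation, is what a symplectic method buys. The hand-written inverse is safe for $I$ because $\det I = m_2\ell_1^2\ell_2^2\,(m_1 + m_2\sin^2(\theta_2-\theta_1)) > 0$, and for the Jacobian because it is close to the identity when $h$ is small.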

Of course, we should save up the values of $q$ as a history if we want to see where it's been in state space. Here's a video of just that: we save the values of $q$ and plot them as a trail following the current point. But in this time we show the angles as being on the abstract phase space, a torus, rather than as the actual angular displacements of a pendulum (as shown in Part 1):

The picture that opened this post is simply the final frame of the movie, showing everywhere in state space where the pendulum has been. It is this abstract representation of a physical system that leads to intriguing visualizations! The idea is, hopefully, in some other physical system, the state space of the variables might more clearly point to interesting features of the system. We shall frequently keep this philosophy in mind in our posts!

Wednesday, August 26, 2015

Hamiltonian Mechanics is Awesome (Double Pendulum, Part 4)

In this post, we work toward discretizing the equations for (and simulating) the double pendulum, considered in Parts 1, 2, and 3 of this series. It turns out that one very important qualitative behavior we want is for the total energy to be conserved (one simulation I ran of this, directly discretizing the equations for $\ddot{\boldsymbol\theta}$, eventually ran amok, getting faster and faster, clearly something that would not occur in a real system). Thus, one place to look is symplectic methods, which we've remarked on before. However, the symplectic methods I've learned seem more conceptually geared toward solving differential equations in (generalized) position and momentum, rather than position and velocity. This is not strictly true, as I have seen symplectic methods at work with position and velocity; however, this might be an artifact of the relation between velocity and momentum often being very trivial (e.g., just multiplying by some mass). In any case, the position-and-momentum viewpoint is more than important enough to be worth diving into now. This is referred to as Hamiltonian mechanics. We look at $H = T+V$ instead of $L = T-V$ and consider our variables as generalized position and momentum instead of generalized position and velocity (more generally, Hamiltonian mechanics can be derived from Lagrangian mechanics via the Legendre transform).

In fact, most basic physics courses tend to focus on the sum of kinetic and potential energies, saying energy is "converted" between these two forms; they just never use the word "Hamiltonian." It's not clear, at first, what the difference between the two approaches is, especially with simple examples where it's pretty trivial to go from one to the other. In mathematical (or veteran Nested Tori reader) terms, the difference is that of being equations on the tangent bundle vs. the cotangent bundle (called phase space). Momentum takes on a life of its own in more complicated systems (and in fact, in quantum mechanics, it's not even clear what would have a velocity, so something called "momentum" crops up as its own fundamental quantity without reference to the velocity of anything).

So, what is this mystical momentum for our system? We've already calculated something which we called "angular momentum" for the double pendulum: $\mathbf{p} = \partial L/\partial \dot{\boldsymbol \theta}$, the partial "gradient" vector with respect to the angular velocities. This is what it is generally. (As a check, we note for a single particle of mass $m$ and kinetic energy $\frac 1 2 m v^2$, $\partial L/\partial \dot{\mathbf{x}}=\partial L/\partial {\mathbf{v}}$ is indeed the usual $m\mathbf{v}$.) For our example, we derived
$\mathbf{p} = \frac{\partial L}{\partial \dot{\boldsymbol{ \theta}}} = \begin{pmatrix} (m_1 +m_2) \ell_1^2 & m_2\ell_1\ell_2\cos(\theta_2-\theta_1) \\ m_2 \ell_1 \ell_2 \cos(\theta_2 -\theta_1) & m_2 \ell_2^2\end{pmatrix} \begin{pmatrix}\dot \theta_1 \\ \dot\theta_2 \end{pmatrix} = I\dot{\boldsymbol{\theta}}$
That's almost something like the case $\mathbf{p} = m\mathbf{v}$ for a single particle, with the rotational analogues of mass and velocity, though the "mass" here is much more complicated (depending on the positions). In particular, as a vector, it doesn't have to point in the same direction as the velocity, unlike in the particle case (that misalignment is also the cause of washing machines thumping with an unbalanced load). With this in mind, we rewrite the kinetic energy in terms of $\mathbf{p}$ as follows: assuming the matrix $I$ is invertible, we write $\dot{\boldsymbol\theta} = I^{-1} \mathbf{p}$, substitute in $T = \frac 1 2 \dot{\boldsymbol{\theta}}^t I \dot{\boldsymbol{\theta}}$, and write $\mathbf{q} = \boldsymbol{\theta}$ (the usual notation), to get
$H(\mathbf{q},\mathbf{p}) = T+V= \frac{1}{2} (\mathbf{p}^t I^{-1}) I (I^{-1} \mathbf{p}) + V(\mathbf q) =\frac{1}{2} \mathbf{p}^t I^{-1} \mathbf{p} + V(\mathbf q).$
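
A small numerical check of this substitution, as a Python sketch with arbitrarily chosen masses, lengths, and state (all names are our own): computing $T$ from the angular velocities and from the momenta $\mathbf p = I\dot{\boldsymbol\theta}$ should give the same number.

```python
import math

def inertia(theta, m1, m2, l1, l2):
    """Moment-of-inertia matrix I(theta) of the double pendulum."""
    c = m2 * l1 * l2 * math.cos(theta[1] - theta[0])
    return [[(m1 + m2) * l1 ** 2, c], [c, m2 * l2 ** 2]]

def kinetic_from_velocity(theta, thetadot, m1, m2, l1, l2):
    """T = 1/2 thetadot^t I thetadot."""
    I = inertia(theta, m1, m2, l1, l2)
    Iw = [I[0][0] * thetadot[0] + I[0][1] * thetadot[1],
          I[1][0] * thetadot[0] + I[1][1] * thetadot[1]]
    return 0.5 * (thetadot[0] * Iw[0] + thetadot[1] * Iw[1])

def kinetic_from_momentum(theta, p, m1, m2, l1, l2):
    """T = 1/2 p^t I^{-1} p, solving I v = p by Cramer's rule."""
    (a, b), (c, d) = inertia(theta, m1, m2, l1, l2)
    det = a * d - b * c
    v = [(d * p[0] - b * p[1]) / det, (a * p[1] - c * p[0]) / det]
    return 0.5 * (p[0] * v[0] + p[1] * v[1])

m1, m2, l1, l2 = 1.5, 0.7, 1.2, 0.8
theta, thetadot = [0.4, -1.1], [2.0, -0.5]
I = inertia(theta, m1, m2, l1, l2)
p = [I[0][0] * thetadot[0] + I[0][1] * thetadot[1],
     I[1][0] * thetadot[0] + I[1][1] * thetadot[1]]
# The two numbers should agree up to rounding.
print(kinetic_from_velocity(theta, thetadot, m1, m2, l1, l2),
      kinetic_from_momentum(theta, p, m1, m2, l1, l2))
```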

Hamilton's Equations

Now that we have derived $H$, the Hamiltonian, in order to relate it to our actual equations of motion, we have Hamilton's equations:
$\dot{\mathbf{q}} = \frac{\partial H}{\partial \mathbf{p}}$ and $\dot{\mathbf{p}} = -\frac{\partial H}{\partial \mathbf{q}},$
which are very nice: $\mathbf{p}$ and $\mathbf{q}$ are almost "dual" to each other in some sense. There's just that extra minus sign; this can be remembered by recalling that $H$ includes potential energy and thus differentiating with respect to $\mathbf{q}$ gives the gradient. But force is conventionally given as the negative of the gradient of the potential in physics. We thus find, plugging in the above:
$\dot{\mathbf{q}} = I^{-1} \mathbf{p}$
and
$\dot{\mathbf{p}} = -\frac 1 2 \mathbf{p}^t \frac{\partial I^{-1}}{\partial \mathbf{q}} \mathbf{p} -\frac{\partial V}{\partial \mathbf{q}}.$
The part that really complicates things and makes this different from the simpler examples is that the moment of inertia $I$ is dependent on $\mathbf{q}$, so we can't ignore that term when differentiating with respect to $\mathbf{q}$ (it is often just the $-\partial V/\partial\mathbf{q}$ term by itself). How do we calculate that derivative of $I^{-1}$? We step back down to components:
$\dot{p}_i = -\frac 1 2 \mathbf{p}^t \frac{\partial I^{-1}}{\partial \theta_i} \mathbf{p} -\frac{\partial V}{\partial \theta_i}= +\frac 1 2 \mathbf{p}^t I^{-1}\frac{\partial I}{\partial \theta_i}I^{-1} \mathbf{p} -\frac{\partial V}{\partial \theta_i} = \frac{1}{2} \dot{\mathbf q}^t \frac{\partial I}{\partial \theta_i}\dot{\mathbf q}-\frac{\partial V}{\partial \theta_i}$
where we have used the identity for the derivative of the inverse of a matrix, $\frac{\partial I^{-1}}{\partial \theta_i} = -I^{-1}\frac{\partial I}{\partial \theta_i}I^{-1}$ (multiplication of the derivative of the matrix by the inverse on both sides, with a sign reversal), together with $I^{-1}\mathbf{p} = \dot{\mathbf q}$ and the symmetry of $I$. Let's write out what all of this is. We note that $\partial I/\partial \theta_1 = - \partial I/\partial \theta_2$, because the only way the angles appear in $I$ is through the difference $\theta_2 - \theta_1$. Thus, we have
$\dot {\mathbf q} = I^{-1} \mathbf{p},$
and using that as an intermediate variable (because to write it all out explicitly is a real mess, and we will need to use it when discretizing it, anyway):
$\dot{p}_1 = \frac{1}{2}\dot{\mathbf{q}}^t \begin{pmatrix} 0 & m_2\ell_1\ell_2 \sin(\theta_2-\theta_1) \\ m_2\ell_1\ell_2 \sin(\theta_2-\theta_1) &0\end{pmatrix} \dot{\mathbf{q}} - (m_1 + m_2) g\ell_1 \sin \theta_1$ and $\dot{p}_2 = \frac{1}{2}\dot{\mathbf{q}}^t \begin{pmatrix} 0 & -m_2\ell_1\ell_2 \sin(\theta_2-\theta_1) \\ -m_2\ell_1\ell_2 \sin(\theta_2-\theta_1) &0\end{pmatrix} \dot{\mathbf{q}} - m_2 g\ell_2 \sin \theta_2.$
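
Since sign slips are easy to make here, a finite-difference check against Hamilton's equation $\dot{\mathbf p} = -\partial H/\partial\mathbf q$ is worthwhile. In this Python sketch (parameter values and names are our own), `pdot` uses $\dot{\mathbf q} = I^{-1}\mathbf p$ as the intermediate variable, with the potential entering as $-\partial V/\partial\theta_i$, that is, $-(m_1+m_2)g\ell_1\sin\theta_1$ and $-m_2 g\ell_2\sin\theta_2$. The result is compared with a central difference of $H(\mathbf q,\mathbf p) = \frac12\mathbf p^t I^{-1}\mathbf p + V(\mathbf q)$, which implicitly also tests the inverse-matrix derivative identity.

```python
import math

M1, M2, L1, L2, G = 1.5, 0.7, 1.2, 0.8, 9.81   # illustrative parameters

def inertia(q):
    c = M2 * L1 * L2 * math.cos(q[1] - q[0])
    return [[(M1 + M2) * L1 ** 2, c], [c, M2 * L2 ** 2]]

def qdot_from_p(q, p):
    """qdot = I^{-1} p, by Cramer's rule."""
    (a, b), (c, d) = inertia(q)
    det = a * d - b * c
    return [(d * p[0] - b * p[1]) / det, (a * p[1] - c * p[0]) / det]

def hamiltonian(q, p):
    """H = 1/2 p^t I^{-1} p + V(q)."""
    v = qdot_from_p(q, p)
    V = -(M1 + M2) * G * L1 * math.cos(q[0]) - M2 * G * L2 * math.cos(q[1])
    return 0.5 * (p[0] * v[0] + p[1] * v[1]) + V

def pdot(q, p):
    """Right-hand side for p: 1/2 qdot^t (dI/dtheta_i) qdot - dV/dtheta_i."""
    v = qdot_from_p(q, p)
    s = M2 * L1 * L2 * math.sin(q[1] - q[0])
    quad = s * v[0] * v[1]   # equals 1/2 qdot^t [[0, s], [s, 0]] qdot
    return [quad - (M1 + M2) * G * L1 * math.sin(q[0]),
            -quad - M2 * G * L2 * math.sin(q[1])]

q, p, eps = [0.4, -1.1], [3.0, -0.6], 1e-6
fd = []
for i in range(2):
    qp, qm = list(q), list(q)
    qp[i] += eps
    qm[i] -= eps
    fd.append(-(hamiltonian(qp, p) - hamiltonian(qm, p)) / (2 * eps))
print(pdot(q, p), fd)  # the two should agree closely
```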
In the final part, we discretize these equations. That'll have to wait for next time.

Monday, August 24, 2015

The Euler-Lagrange equations for the Double Pendulum (Config Spaces, Part 3)

In this post, continuing the explorations of the double pendulum (see Part 1 and Part 2), we concentrate on deriving its equations of motion (the Euler-Lagrange equations). These differential equations are the heart of Lagrangian mechanics, and indeed really what one tries to get to when applying its methods (it's essentially a way of getting Newton's 2nd Law for complicated systems), which is why it is awesome. This part, like the previous one, is going to be more technical than visual, but we'll get back to that in Part 4.

Recall from Part 2 that for the double pendulum,
$L(\theta,\dot \theta) = T - V = \frac 1 2 (m_1+m_2) \ell_1^2 \dot{\theta}_1^2 +\frac 1 2 m_2\left( 2 \dot{\theta}_1 \dot{\theta}_2\ell_1 \ell_2 \cos(\theta_2 - \theta_1) + \ell_2^2 \dot{\theta}_2^2\right)$ $+ (m_1+m_2) g \ell_1 \cos \theta_1 + m_2 g \ell_2 \cos \theta_2,$
where $\theta_1$, $\theta_2$ are the angles of the two pendulums from vertical, $\dot \theta_1$, $\dot \theta_2$ are their angular velocities, $\ell_1$, $\ell_2$ are the lengths of the rods, and $m_1$, $m_2$ are the masses of the bobs. This looks very imposing, so let's remark a little on the structure of that equation. In simpler situations, one often sees only separate squared-speed terms, but here we have the interaction of the two parts of the system, in terms with products like $\dot \theta_1 \dot \theta_2$, as well as the factor $\cos(\theta_2 - \theta_1)$. This distinguishes the system from two completely isolated (uncoupled) pendulums; the Lagrangian already captures the fact that energy gets transferred back and forth between the two pendulums.

The equations of motion are derived by considering an optimization problem, namely, which paths $\gamma$ through state space connecting two states optimize (more precisely, make stationary) the action integral
$\int_a^b L(\gamma(t),\dot\gamma(t)) dt.$
In a process quite similar to what's done in first-year calculus, we "take a derivative and set it to zero", which derives the equations
$\frac{\partial L}{\partial \theta_i} - \frac d {dt} \frac{\partial L}{\partial \dot \theta_i} = 0$
where $\partial L/\partial \dot \theta_i$ means the partial derivative with respect to $\dot \theta_i$, treated as if it were an independent variable. In fact, one approach is to start off with a formula involving two independent variables $\theta$ and $\omega$; only after solving for the extremal path do we actually find $\omega = \dot \theta$ (this generalization is called the Hamilton-Pontryagin principle).

$\partial L/\partial \dot {\boldsymbol \theta}$, angular momentum, and the moment of inertia

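Let's now compute $\frac{\partial L}{\partial \dot \theta_1}$, treating everything except $\dot \theta_1$ as a constant. This quantity, which we call $p_1$, is the generalized momentum conjugate to $\theta_1$; it plays the role of an angular momentum (though it is not simply the angular momentum of the first bob: one can check that $p_1 + p_2$ is the total angular momentum of the system about the pivot):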
$p_1 = \frac{\partial L}{\partial \dot \theta_1} = (m_1 + m_2) \ell_1^2 \dot \theta_1 + m_2 \ell_1\ell_2 \cos(\theta_2-\theta_1)\dot \theta_2 .$
Note that, unlike in the more basic examples of angular momentum, this depends not only on the (angular) velocity $\dot \theta_1$, but also on $\dot \theta_2$, and nonlinearly on $\theta_1$ and $\theta_2$. Similarly,
$p_2 = \frac{\partial L}{\partial \dot \theta_2} = m_2 \ell_2^2 \dot \theta_2 + m_2\ell_1\ell_2 \cos(\theta_2 - \theta_1) \dot \theta_1.$
This is more compactly expressed as
$\frac{\partial L}{\partial \dot{\boldsymbol{\theta}}}=\begin{pmatrix}p_1\\p_2\end{pmatrix} = \begin{pmatrix} (m_1 +m_2) \ell_1^2 & m_2\ell_1\ell_2\cos(\theta_2-\theta_1) \\ m_2 \ell_1 \ell_2 \cos(\theta_2 -\theta_1) & m_2 \ell_2^2\end{pmatrix} \begin{pmatrix}\dot \theta_1 \\ \dot\theta_2 \end{pmatrix}.$
This helps keep the unruliness under one umbrella. The square matrix in the above will be important, so let's call it $I$ (it is something like a moment of inertia for our system—but it's more complicated, principally because there is an interaction between its two component parts... It's not even a rigid body!). It should also be noted that the kinetic energy is $T = \frac 1 2 \dot{\boldsymbol{\theta}}^t I \dot{\boldsymbol \theta}$, so it still is quadratic in $\dot {\boldsymbol \theta}$, just like good ol' $\frac{1}{2} mv^2$.
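
Here is a Python sketch (parameter values and names are our own) that checks the matrix form against a finite-difference derivative of the Lagrangian with respect to the angular velocities, and checks that $\frac12\dot{\boldsymbol\theta}^t I\dot{\boldsymbol\theta}$ reproduces the kinetic part of $L$.

```python
import math

M1, M2, L1, L2, G = 1.5, 0.7, 1.2, 0.8, 9.81   # illustrative parameters

def kinetic(th, w):
    """Kinetic part of L, written exactly as in the Lagrangian above."""
    t1, t2 = th
    w1, w2 = w
    return (0.5 * (M1 + M2) * L1 ** 2 * w1 ** 2
            + 0.5 * M2 * (2 * w1 * w2 * L1 * L2 * math.cos(t2 - t1) + L2 ** 2 * w2 ** 2))

def potential(th):
    return -(M1 + M2) * G * L1 * math.cos(th[0]) - M2 * G * L2 * math.cos(th[1])

def lagrangian(th, w):
    return kinetic(th, w) - potential(th)

def inertia(th):
    c = M2 * L1 * L2 * math.cos(th[1] - th[0])
    return [[(M1 + M2) * L1 ** 2, c], [c, M2 * L2 ** 2]]

th, w, eps = [0.4, -1.1], [2.0, -0.5], 1e-6
I = inertia(th)
p = [I[0][0] * w[0] + I[0][1] * w[1], I[1][0] * w[0] + I[1][1] * w[1]]  # p = I thetadot
fd = []
for i in range(2):
    wp, wm = list(w), list(w)
    wp[i] += eps
    wm[i] -= eps
    fd.append((lagrangian(th, wp) - lagrangian(th, wm)) / (2 * eps))
print(p, fd)                                              # I thetadot vs dL/dthetadot
print(kinetic(th, w), 0.5 * (w[0] * p[0] + w[1] * p[1]))  # T vs 1/2 thetadot^t I thetadot
```

Because $L$ is quadratic in $\dot{\boldsymbol\theta}$, the central difference is exact up to rounding here.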

The forces, $\partial L/\partial \boldsymbol{\theta}$

Due to the nonlinear dependence of the Lagrangian on the angular variables, the derivatives are trickier. But we forge on, because sticking to just the simplest examples won't help us gain as much insight into how things are used in practice. We have
$\frac{\partial L}{\partial \theta_1} = m_2 \ell_1 \ell_2 \dot \theta_1 \dot \theta_2 \sin(\theta_2-\theta_1) - (m_1 + m_2) g \ell_1 \sin \theta_1$
and
$\frac{\partial L}{\partial \theta_2} = -m_2 \ell_1 \ell_2 \dot \theta_1 \dot \theta_2 \sin(\theta_2-\theta_1) - m_2 g \ell_2 \sin \theta_2.$
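
The same kind of finite-difference check works for the angular derivatives. In this Python sketch (illustrative parameters; names are our own), `dL_dtheta` codes the two formulas above and is compared with central differences of the Lagrangian in $\theta_1$ and $\theta_2$.

```python
import math

M1, M2, L1, L2, G = 1.5, 0.7, 1.2, 0.8, 9.81   # illustrative parameters

def lagrangian(th, w):
    """L = T - V for the double pendulum, as written above."""
    t1, t2 = th
    w1, w2 = w
    T = (0.5 * (M1 + M2) * L1 ** 2 * w1 ** 2
         + 0.5 * M2 * (2 * w1 * w2 * L1 * L2 * math.cos(t2 - t1) + L2 ** 2 * w2 ** 2))
    V = -(M1 + M2) * G * L1 * math.cos(t1) - M2 * G * L2 * math.cos(t2)
    return T - V

def dL_dtheta(th, w):
    """Closed-form partial derivatives of L with respect to theta_1 and theta_2."""
    t1, t2 = th
    w1, w2 = w
    cross = M2 * L1 * L2 * w1 * w2 * math.sin(t2 - t1)
    return [cross - (M1 + M2) * G * L1 * math.sin(t1),
            -cross - M2 * G * L2 * math.sin(t2)]

def fd_dL_dtheta(th, w, eps=1e-6):
    """Central-difference approximation of the same derivatives."""
    out = []
    for i in range(2):
        tp, tm = list(th), list(th)
        tp[i] += eps
        tm[i] -= eps
        out.append((lagrangian(tp, w) - lagrangian(tm, w)) / (2 * eps))
    return out

th, w = [0.4, -1.1], [2.0, -0.5]
print(dL_dtheta(th, w), fd_dL_dtheta(th, w))  # should agree closely
```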

Putting it together

Now we take the time derivatives of the momenta derived above, subtract them from $\partial L/\partial \boldsymbol\theta$, and set the result to zero to obtain the Euler-Lagrange equations. That's straightforward calculus, but it differs from simpler examples in that the chain rule must be applied to $\cos(\theta_2 - \theta_1)$, which produces an extra factor of $\dot \theta_2 - \dot \theta_1$:
$\frac{d}{dt} \frac{\partial L}{\partial \dot{\boldsymbol{\theta}}} = \begin{pmatrix} 0 & -m_2\ell_1\ell_2\sin(\theta_2-\theta_1)(\dot \theta_2 - \dot \theta_1) \\ -m_2 \ell_1 \ell_2 \sin(\theta_2 -\theta_1)(\dot \theta_2 - \dot \theta_1) & 0\end{pmatrix} \begin{pmatrix}\dot \theta_1 \\ \dot\theta_2 \end{pmatrix}$
$+\begin{pmatrix} (m_1 +m_2) \ell_1^2 & m_2\ell_1\ell_2\cos(\theta_2-\theta_1) \\ m_2 \ell_1 \ell_2 \cos(\theta_2 -\theta_1) & m_2 \ell_2^2\end{pmatrix} \begin{pmatrix}\ddot \theta_1 \\ \ddot\theta_2 \end{pmatrix}.$
The nice thing about it is that when it is subtracted from $\partial L/\partial \boldsymbol \theta$, the cross terms with the $\dot \theta_1 \dot\theta_2$ cancel. Also, the matrix multiplying the $\ddot{\theta}_i$'s is none other than our moment-of-inertia matrix $I$. This finally gives us

$0=\frac{\partial L}{\partial \boldsymbol\theta} - \frac{d}{dt} \frac{\partial L}{\partial \dot{\boldsymbol\theta}} =\begin{pmatrix}-(m_1+m_2) g \ell_1 \sin \theta_1 + m_2 \ell_1\ell_2 \sin(\theta_2-\theta_1)\dot{\theta}_2^2 \\ -m_2 g \ell_2 \sin \theta_2 - m_2 \ell_1\ell_2 \sin(\theta_2-\theta_1)\dot{\theta}_1^2 \end{pmatrix} -I \begin{pmatrix} \ddot \theta_1 \\ \ddot \theta_2 \end{pmatrix}$
as the Euler-Lagrange equations. As the final equation of motion, we bring the $\ddot {\boldsymbol \theta}$ term to the other side:
$I \begin{pmatrix} \ddot \theta_1 \\ \ddot \theta_2 \end{pmatrix} = \begin{pmatrix}-(m_1+m_2) g \ell_1 \sin \theta_1 + m_2 \ell_1\ell_2 \sin(\theta_2-\theta_1)\dot{\theta}_2^2 \\ -m_2 g \ell_2 \sin \theta_2 - m_2 \ell_1\ell_2 \sin(\theta_2-\theta_1)\dot{\theta}_1^2 \end{pmatrix}$
(keep in mind that $I$ varies with $\theta_1$ and $\theta_2$, however). We discretize and solve this next time. For now, simply take note of the interesting nonlinear interactions: in the positions as the sine of the difference, and quadratic nonlinearity in the derivatives (note the reversal: the $\ddot\theta_1$ gets a $\dot \theta_2^2$ and vice versa).
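
To put the final equation to work, one solves the $2\times2$ linear system for $\ddot{\boldsymbol\theta}$ at each instant. The Python sketch below (illustrative parameters; names are our own) does this with Cramer's rule, coding the right-hand side with $+m_2\ell_1\ell_2\sin(\theta_2-\theta_1)\dot\theta_2^2$ in the first row and $-m_2\ell_1\ell_2\sin(\theta_2-\theta_1)\dot\theta_1^2$ in the second. As a consistency check, it evaluates the rate of change of the total energy $E = \frac12\dot{\boldsymbol\theta}^t I\dot{\boldsymbol\theta} + V$ along the motion, $\frac{dE}{dt} = \dot{\boldsymbol\theta}^t I\ddot{\boldsymbol\theta} + \frac12\dot{\boldsymbol\theta}^t\dot I\dot{\boldsymbol\theta} + \frac{\partial V}{\partial\boldsymbol\theta}\cdot\dot{\boldsymbol\theta}$, which must vanish for a correct equation of motion.

```python
import math

M1, M2, L1, L2, G = 1.5, 0.7, 1.2, 0.8, 9.81   # illustrative parameters

def accelerations(th, w):
    """Solve I(theta) thetaddot = rhs(theta, thetadot) for thetaddot."""
    t1, t2 = th
    w1, w2 = w
    a, d = (M1 + M2) * L1 ** 2, M2 * L2 ** 2
    b = M2 * L1 * L2 * math.cos(t2 - t1)      # off-diagonal entry of I
    s = M2 * L1 * L2 * math.sin(t2 - t1)
    r1 = -(M1 + M2) * G * L1 * math.sin(t1) + s * w2 ** 2
    r2 = -M2 * G * L2 * math.sin(t2) - s * w1 ** 2
    det = a * d - b * b                        # always positive
    return [(d * r1 - b * r2) / det, (a * r2 - b * r1) / det]

def energy_rate(th, w):
    """dE/dt = w^t I wdd + 1/2 w^t (dI/dt) w + grad V . w, using wdd from above."""
    t1, t2 = th
    w1, w2 = w
    wdd = accelerations(th, w)
    b = M2 * L1 * L2 * math.cos(t2 - t1)
    Iwdd = [(M1 + M2) * L1 ** 2 * wdd[0] + b * wdd[1],
            b * wdd[0] + M2 * L2 ** 2 * wdd[1]]
    bdot = -M2 * L1 * L2 * math.sin(t2 - t1) * (w2 - w1)  # d/dt of off-diagonal
    gradV = [(M1 + M2) * G * L1 * math.sin(t1), M2 * G * L2 * math.sin(t2)]
    return (w1 * Iwdd[0] + w2 * Iwdd[1] + bdot * w1 * w2
            + gradV[0] * w1 + gradV[1] * w2)

print(accelerations([0.0, 0.0], [0.0, 0.0]))   # hanging at rest: no acceleration
print(energy_rate([0.4, -1.1], [2.0, -0.5]))    # zero up to rounding
```

Flipping the signs of the two $\sin(\theta_2-\theta_1)\dot\theta_i^2$ terms makes `energy_rate` nonzero, which is a quick way to catch that particular slip.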

Friday, August 21, 2015

Lagrangian Mechanics is Awesome (Double Pendulum, Part 2)

In this post, we give some calculational details of the double pendulum (introduced in Part 1). This derivation is available in several physics books at the undergraduate (upper division) level, usually fleshed out using Lagrangian mechanics, another relevant subject that is highly useful to conceptualizing these types of problems by putting them in a setting using notions of state spaces. More cool-looking demos (and also a streamlined derivation) are available in David Eberly's excellent book, Game Physics. As a warning, this will be considerably more technical than visual; I post this because I feel that some insight into the workings of these things helps us see where geometry can be useful in contexts that are not just literally pictures. Nevertheless, here's the picture for the upcoming analysis:

 $\ell_i$ are the length of the pendulum rods, $m_i$ are the masses, and $\theta_i$ are their angular displacements from vertical.
The use of Lagrangian mechanics encourages thinking in terms of state spaces by considering different kinds of coordinates that may be more convenient for the problem at hand. It is easier to think about the angular positions $\theta_1$ and $\theta_2$ of the pendulums than to work with x- and y-coordinates directly (and for our problem, the physical constraints on the x- and y-coordinates, which come from assuming that the rods are rigid, make them somewhat less straightforward to deal with). Of course, we might use the x- and y-coordinates in parts of our derivation, but it's not so easy to solve for them.

Lagrangian mechanics also has us think of things in terms of energy, a quantity whose properties crop up in math a lot. Our goal in this post is to derive the Lagrangian $L$ of the system, $L = T - V$, the difference between the total kinetic energy $T$ and the total potential energy $V$. The total kinetic energy of our system should be the sum of that of the two particles,
$T = \frac 1 2 m_1 v_1^2 + \frac 1 2 m_2 v_2^2,$
and the potential energy (assuming that our pendulums are subject to downward acceleration g) depends only on the height of the two pendulum bobs:
$V = m_1 g y_1 + m_2 g y_2.$
However, if we want to express this in terms of the angular coordinates, we find, via the usual trigonometric arguments, that $y_1 = -\ell_1\cos \theta_1$ and since $y_2$ includes the height of the first pendulum, $y_2 = -\ell_1 \cos\theta_1 - \ell_2 \cos \theta_2$. In total, this means
$V = -(m_1+m_2) g \ell_1 \cos \theta_1 - m_2 g \ell_2 \cos \theta_2.$
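To see the change of coordinates concretely, here is a minimal Python sketch (not part of the original derivation) that computes both bob positions from the angles, shows that the rigid-rod constraints hold automatically, and checks that the Cartesian and angular forms of $V$ agree. It assumes the pivot is at the origin with y pointing up, angles measured from the downward vertical with x increasing as $\theta$ increases, and $g = 9.81$; the function names are hypothetical.

```python
import math

def bob_positions(theta1, theta2, l1, l2):
    """Positions of both bobs; pivot at the origin, y pointing up,
    angles measured from the downward vertical (assumed conventions)."""
    x1, y1 = l1 * math.sin(theta1), -l1 * math.cos(theta1)
    # The second bob hangs off the first, so its position includes the first's.
    x2, y2 = x1 + l2 * math.sin(theta2), y1 - l2 * math.cos(theta2)
    return (x1, y1), (x2, y2)

def potential_cartesian(theta1, theta2, m1, m2, l1, l2, g=9.81):
    # V = m1 g y1 + m2 g y2, with the heights computed from the angles
    (_, y1), (_, y2) = bob_positions(theta1, theta2, l1, l2)
    return m1 * g * y1 + m2 * g * y2

def potential_angular(theta1, theta2, m1, m2, l1, l2, g=9.81):
    # V = -(m1 + m2) g l1 cos(theta1) - m2 g l2 cos(theta2)
    return -(m1 + m2) * g * l1 * math.cos(theta1) - m2 * g * l2 * math.cos(theta2)

# Any pair of angles yields positions that respect the rigid rods:
p1, p2 = bob_positions(0.3, -1.1, 1.0, 0.5)
print(math.hypot(*p1), math.hypot(p2[0] - p1[0], p2[1] - p1[1]))
```

This is the sense in which the angles are the "right" coordinates: the constraints never have to be imposed separately.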
Now the kinetic energy is a little trickier. Here, we assume that the first pendulum bob has linear velocity $\mathbf{v}_1$ and speed $|\mathbf{v}_1| = v_1$. If we let the second pendulum bob have linear velocity $\mathbf{v}_{1,2}$ relative to the first, then the total velocity of the second bob is $\mathbf{v}_2 = \mathbf{v}_1 + \mathbf{v}_{1,2}$. The reason $\mathbf{v}_{1,2}$ is useful is that it is calculated in exactly the same way as the velocity of the first bob (because the second pendulum is, well, a pendulum).

So the total (squared) speed of the second pendulum bob is $v_2^2= v_1^2 + 2 \mathbf{v}_1 \cdot \mathbf{v}_{1,2} + v_{1,2}^2$.

Now, to figure out $\mathbf{v}_1 \cdot \mathbf{v}_{1,2}$, we use the geometric (length and angle) formulation of the dot product:
$\mathbf{v}_1 \cdot \mathbf{v}_{1,2} = v_1 v_{1,2} \cos \varphi$
where $\varphi$ is the angle between $\mathbf{v}_1$ and $\mathbf{v}_{1,2}$. But the angle between $\mathbf{v}_1$ and $\mathbf{v}_{1,2}$, since both of them are perpendicular to their respective rods, must be the same as the angle between the rods: $\varphi=\theta_2 - \theta_1$. Finally, to express the magnitudes $v_1$ and $v_{1,2}$, we use the relations $v_1 = \dot{\theta}_1 \ell_1$ and $v_{1,2} = \dot{\theta}_2 \ell_2$. This gives
$v_2^2 = \ell_1^2 \dot{\theta}_1^2 + 2 \dot{\theta}_1 \dot{\theta}_2\ell_1 \ell_2 \cos(\theta_2 - \theta_1) + \ell_2^2 \dot{\theta}_2^2$
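A quick numerical sanity check of this formula: differentiating the Cartesian position of the second bob directly, via the chain rule, should give the same squared speed. This sketch assumes the pivot is at the origin, y points up, and angles are measured from the downward vertical; `w1` and `w2` stand for $\dot{\theta}_1$ and $\dot{\theta}_2$, and the function names are made up for illustration.

```python
import math

def v2_squared_cartesian(th1, th2, w1, w2, l1, l2):
    # Time derivative of x2 = l1 sin th1 + l2 sin th2 and
    # y2 = -l1 cos th1 - l2 cos th2 by the chain rule (w_i = d theta_i / dt).
    vx = l1 * math.cos(th1) * w1 + l2 * math.cos(th2) * w2
    vy = l1 * math.sin(th1) * w1 + l2 * math.sin(th2) * w2
    return vx ** 2 + vy ** 2

def v2_squared_formula(th1, th2, w1, w2, l1, l2):
    # The formula from the text, obtained from the dot-product argument.
    return (l1 ** 2 * w1 ** 2
            + 2 * w1 * w2 * l1 * l2 * math.cos(th2 - th1)
            + l2 ** 2 * w2 ** 2)

for args in [(0.3, -1.1, 0.7, 2.0, 1.0, 0.5), (2.5, 1.0, -1.3, 0.4, 0.8, 1.2)]:
    print(v2_squared_cartesian(*args), v2_squared_formula(*args))
```

The two agree because $\cos\theta_1\cos\theta_2 + \sin\theta_1\sin\theta_2 = \cos(\theta_2 - \theta_1)$, which is exactly the cross term in the expansion.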

This finally gives
$T =\frac 1 2 (m_1+m_2) \ell_1^2 \dot{\theta}_1^2 +\frac 1 2 m_2\left( 2 \dot{\theta}_1 \dot{\theta}_2\ell_1 \ell_2 \cos(\theta_2 - \theta_1) + \ell_2^2 \dot{\theta}_2^2\right),$
or,
$L = T-V = \frac 1 2 (m_1+m_2) \ell_1^2 \dot{\theta}_1^2 +\frac 1 2 m_2\left( 2 \dot{\theta}_1 \dot{\theta}_2\ell_1 \ell_2 \cos(\theta_2 - \theta_1) + \ell_2^2 \dot{\theta}_2^2\right) + (m_1+m_2) g \ell_1 \cos \theta_1 + m_2 g \ell_2 \cos \theta_2.$
(If you're reading this on a mobile device, please turn it to landscape mode to see the full equations.) Actually solving the resulting equations of motion for $\theta_1$ and $\theta_2$ will have to wait for another entry!
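For readers who like to poke at formulas numerically, here is a minimal Python transcription of $T$, $V$, and $L$ as derived above. It assumes $g = 9.81$ (SI units) and uses `w1`, `w2` for $\dot{\theta}_1$, $\dot{\theta}_2$; the names are illustrative, and this is not the code behind the animations on this blog.

```python
import math

def kinetic(th1, th2, w1, w2, m1, m2, l1, l2):
    # T = 1/2 (m1 + m2) l1^2 w1^2 + 1/2 m2 (2 w1 w2 l1 l2 cos(th2 - th1) + l2^2 w2^2)
    return (0.5 * (m1 + m2) * l1 ** 2 * w1 ** 2
            + 0.5 * m2 * (2 * w1 * w2 * l1 * l2 * math.cos(th2 - th1)
                          + l2 ** 2 * w2 ** 2))

def potential(th1, th2, m1, m2, l1, l2, g=9.81):
    # V = -(m1 + m2) g l1 cos(th1) - m2 g l2 cos(th2)
    return -(m1 + m2) * g * l1 * math.cos(th1) - m2 * g * l2 * math.cos(th2)

def lagrangian(th1, th2, w1, w2, m1, m2, l1, l2, g=9.81):
    # L = T - V
    return (kinetic(th1, th2, w1, w2, m1, m2, l1, l2)
            - potential(th1, th2, m1, m2, l1, l2, g))

# Hanging straight down at rest: T = 0, so L = (m1 + m2) g l1 + m2 g l2.
print(lagrangian(0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0))
```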

As a final note, kudos to MathJax, which allows rendering of math formulas using $\LaTeX$!

Monday, August 17, 2015

The Configuration Space of a Double Pendulum

One of the classic ways of understanding the torus is through anything that involves two circles, such as a system with two independent angular variables. Roughly speaking, this works because the torus has two families of circles, one surrounding the tube and another surrounding the hole. The classic example of such a system is a double pendulum, which is one pendulum hung off the pendulum bob of another.

Double pendulum: the path traced by the second bob fills a region within an annulus. If we imagine that the second circle goes into and out of the page instead, we can see it as a path on a torus.

Of course, it is not immediately apparent why one would find it advantageous to consider a whole torus, as opposed to two distinct circles, other than for cool visualization purposes, so we should explain this. First, many interesting possibilities arise when the two angular variables are not independent: they bear some complicated relationship imposed by, in this case, physics. Not every combination of angles is reached, at least not in a short amount of time; the system merely traces some path through this grand torus. The dynamics of the interaction is what makes the system transcend merely "having two angles".

This notion of "where all the variables live together" is historically what led to the development of the theory of manifolds. Think back to parametrizations and equations: variables that initially live in some Euclidean space either get warped into more complicated spaces or, when constrained, land in funny subsets where, and only where, the constraint is satisfied. These state spaces (or configuration spaces) are everywhere, at least implicitly, in any kind of modeling of real-world objects and their dynamics.
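Here is one way to make "the state lives on a torus" concrete, as a small Python sketch with hypothetical function names: each pair of angles is reduced modulo $2\pi$ (angles differing by full turns describe the same pendulum configuration) and then placed on a standard torus in 3D, with $\theta_1$ running around the hole and $\theta_2$ around the tube. The radii `R` and `r` are arbitrary display choices, not anything physical.

```python
import math

TWO_PI = 2 * math.pi

def wrap(angle):
    # Reduce an angle to [0, 2*pi) (up to floating-point rounding):
    # angles that differ by full turns are the same physical state.
    return angle % TWO_PI

def state_on_torus(theta1, theta2, R=2.0, r=0.7):
    """Place the state (theta1, theta2) on a torus in 3D.
    theta1 runs around the central hole (radius R), theta2 around the tube (radius r)."""
    a, b = wrap(theta1), wrap(theta2)
    x = (R + r * math.cos(b)) * math.cos(a)
    y = (R + r * math.cos(b)) * math.sin(a)
    z = r * math.sin(b)
    return (x, y, z)

# A trajectory of angle pairs becomes a curve on the torus:
path = [state_on_torus(0.05 * k, 0.13 * k) for k in range(200)]
```

Feeding in the angle history of a simulated double pendulum, instead of the made-up linear path here, gives exactly the kind of curve described in the caption.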

This visualization was coded in MATLAB, using a symplectic integrator in order to more accurately reflect energy conservation. We delve into the calculational specifics of this example in a future post.
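The MATLAB code behind the animation isn't shown here, so as a stand-in, here is a minimal Python sketch of why a symplectic integrator helps with energy conservation. For simplicity it uses a single pendulum rather than the double one, with energy measured per unit $m\ell^2$ and `k` standing for $g/\ell$. It compares explicit Euler with symplectic (semi-implicit) Euler, the simplest symplectic method, which is not necessarily the one used for the visualization.

```python
import math

def pendulum_energy(theta, omega, k):
    # Energy per unit m*l^2 of a simple pendulum, with k = g / l.
    return 0.5 * omega ** 2 - k * math.cos(theta)

def explicit_euler(theta, omega, k, h, steps):
    for _ in range(steps):
        # Both updates use the old state.
        theta, omega = theta + h * omega, omega - h * k * math.sin(theta)
    return theta, omega

def symplectic_euler(theta, omega, k, h, steps):
    for _ in range(steps):
        omega = omega - h * k * math.sin(theta)  # kick: momentum from the old angle
        theta = theta + h * omega                # drift: angle from the NEW momentum
    return theta, omega

if __name__ == "__main__":
    k, h, steps = 9.81, 0.01, 10_000  # g/l, time step, 100 simulated seconds
    E0 = pendulum_energy(1.0, 0.0, k)
    for name, method in [("explicit", explicit_euler), ("symplectic", symplectic_euler)]:
        th, om = method(1.0, 0.0, k, h, steps)
        print(name, "energy drift:", pendulum_energy(th, om, k) - E0)
```

The only difference between the two loops is which momentum the angle update uses, yet explicit Euler steadily pumps energy into the system, while symplectic Euler keeps the energy error bounded (it oscillates rather than drifts), which is what matters for long-running animations.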

Saturday, August 15, 2015

Parametric Pasta: Relaunch of Nested Tori

Today we relaunch Nested Tori, and celebrate with a bit of history. Visualization has run strong in my mathematical quests ever since I declared mathematics my major at UCLA, and it is really why I chose to take up differential geometry and go to graduate school at UCSD. Of course, that subject is far more abstract than what can simply be pictured, so in an effort to stay grounded in the motivations and keep it real, I took up some numerical analysis. That was a refreshing viewpoint and, in fact, what eventually enabled me to get that PhD finished. Well, that, and other things. Suffice it to say, my journey has brought me to a very interesting intersection of geometry, topology, real analysis, and numerical analysis (and despite a joke from a colleague, that intersection is not empty!).

During that time, of course, I had to teach, and I enjoyed showing several visualizations to, in essence, breathe a soul into some of the calculations we had students doing. Not that those calculations would be soulless without such pictures, but it was evident that such a soul became more "accessible" after some form of visualization. And one of the favorites, and in fact apparently what I'm most famous for (according to Google), is my Pasta Parametrization Quiz: Match the following pasta [UPDATE: See here for higher-res photos and more readable formulas]

to the following parametrizations ("equations"):

(the numbers are given by De Cecco). Despite the media coverage, no, I didn't actually administer it as a "pop quiz" ... handling student rebellions isn't exactly my favorite task. But I bring this back not just to provide a follow-up, or as a bid to get more attention after my supposed 15 minutes have expired (but it should be pretty scandalous that it hasn't been featured here on Nested Tori, right? Right??), but also to provide an answer key and talk about some of the mathematical ideas surrounding it. If you haven't tried the quiz, please try it before reading the spoilers below.

Really, I want to explain a bit about the philosophy behind such a thing, because it motivates much of my work. Parametrizations are one of the building blocks of visualization and, indeed, of a lot of other science, principally anything that has to do with generating numerical data. This is because numerical data often satisfies certain constraints: it lies on some (possibly high-dimensional) surface inside an even higher-dimensional space. The power of geometry is not (just) in its literal descriptions, but also in its metaphorical ones. Parametrizations are a way of translating numbers roaming around in easy, planar (flat) sets into more complicated, warped versions. In each of the above parametrizations, the domain of the parameters is something like a rectangle, or in one case, the union of three distinct rectangles. The formulas then smoothly contort those rectangles. More complicated objects could be made by simply taking several of these together, with adjustments such as rotation and translation.

The other common way to describe such sets is to start with a high-dimensional space and add constraints by requiring the coordinates to satisfy certain equations. We've actually covered an example of this (a way to generate nested tori).
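To illustrate the two descriptions side by side, here is a minimal Python sketch using a standard torus (the nested tori construction mentioned above isn't reproduced here): a parametrization that maps a rectangle of $(s, t)$ values onto the surface, and an implicit equation that is zero exactly on it. `R` (distance from the axis to the center of the tube) and `r` (tube radius) are illustrative values.

```python
import math

def torus_param(s, t, R=2.0, r=0.5):
    # Parametric description: s goes around the tube, t around the central hole.
    return ((R + r * math.cos(s)) * math.cos(t),
            (R + r * math.cos(s)) * math.sin(t),
            r * math.sin(s))

def torus_constraint(x, y, z, R=2.0, r=0.5):
    # Implicit description: zero exactly when the distance from (x, y, z)
    # to the core circle of radius R equals r.
    return (math.hypot(x, y) - R) ** 2 + z ** 2 - r ** 2

# Every parametrized point satisfies the constraint (up to rounding);
# a point off the surface, like the origin, does not.
print(torus_constraint(*torus_param(1.0, 2.0)), torus_constraint(0.0, 0.0, 0.0))
```

The parametrization is convenient for drawing (just sweep the rectangle), while the constraint is convenient for testing whether a given point is on the surface.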

Cavatappi: Equation (0.3)
I thought of this parametrization by first considering a torus (of course), which differs from this example only in how the z-coordinate is treated. Imagine a torus continually wrapping around itself multiple times (hence the greater range of the parameter t); if at the same time we start changing the height, it no longer wraps onto itself but rather moves "out of its own way", vertically. The "β + α cos(s)" part is the horizontal component of the cross-sectional circle, and "α sin(s)" is the vertical component.
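Since the quiz's formulas aren't reproduced in this text, here is a hypothetical Python sketch built only from the description above, not Equation (0.3) itself: the torus formulas with cross-sectional circle (β + α cos(s), α sin(s)), plus an assumed linear climb γt in height, with t run over several turns. The parameter values, the linear form of the climb, and the function names are my assumptions.

```python
import math

def cavatappi(s, t, alpha=0.3, beta=1.0, gamma=0.15):
    """Hypothetical cavatappi-style surface: a torus tube whose height rises with t.
    alpha: tube radius, beta: coil radius, gamma: rise per radian of t (assumed)."""
    radial = beta + alpha * math.cos(s)   # horizontal part of the cross-sectional circle
    x = radial * math.cos(t)
    y = radial * math.sin(t)
    z = alpha * math.sin(s) + gamma * t   # vertical part, plus the steady climb
    return (x, y, z)

def grid(n_s=24, n_t=200, turns=4, **params):
    # Sample s over one circle and t over several turns (the "greater range" of t).
    pts = []
    for i in range(n_s):
        s = 2 * math.pi * i / n_s
        for j in range(n_t):
            t = 2 * math.pi * turns * j / (n_t - 1)
            pts.append(cavatappi(s, t, **params))
    return pts
```

With `gamma=0` this is just a torus traced over and over. For the coils to actually get out of their own way, each full turn should lift the tube by more than its diameter, i.e. 2πγ > 2α; the defaults above satisfy this (2π · 0.15 ≈ 0.94 > 0.6).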

Fusilli: Equation (0.5)
This was conceived of as a variation of a helicoid, which in turn is like parametrizing a flat disk by polar coordinates—think of a rotating ray sweeping out a disk uniformly in time. Except, now, we add in a variation in the same way as we did for Cavatappi: move the entire ray vertically as it sweeps. The s²/2 term adds some additional curviness to the pasta surface as we move out. Finally, n=0,1,2 means that we take three of these separate helicoids, rotate them so that they're spaced evenly, and put all three of them together (this increases the density of the spirals).
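A hypothetical Python sketch in the same spirit, again built from the description rather than copied from Equation (0.5): a ray at angle t sweeps around the axis while rising at an assumed rate `pitch`, an s²/2 term (placed in the height here, which is my guess) curls it as it moves out, and n = 0, 1, 2 rotates three copies by 2π/3 each. The function name and parameter values are illustrative.

```python
import math

def fusilli_blade(s, t, n, pitch=0.5):
    """One of three hypothetical helicoid-like blades.
    s: distance along the sweeping ray, t: sweep angle, n in {0, 1, 2}.
    pitch (assumed): height gained per radian of sweep."""
    phase = t + 2 * math.pi * n / 3          # space the three blades evenly
    x = s * math.cos(phase)
    y = s * math.sin(phase)
    z = pitch * t + s ** 2 / 2               # lift the ray as it sweeps; curve it outward
    return (x, y, z)

# Sample all three blades over s in [0, 1] and a few turns of t.
surface = [fusilli_blade(0.1 * i, 0.05 * j, n)
           for n in range(3) for i in range(11) for j in range(400)]
```

Setting `pitch=0`, dropping the s²/2 term, and keeping only n = 0 recovers the flat disk swept out in polar coordinates.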

Conchiglie Rigate: Equation (0.2)
This graphs a parabola in the radial direction, varying with respect to height. In addition, variation in the angle is used to moderate how sharp the parabola should be. The "rigate" part simply comes from the coloring scheme which uses stripes.

Penne Rigate: Equation (0.1)
The simplest parametrization: simply a skewed cylinder, in which the radial distance from the z-axis is constant while the angle and height are free to vary. The "rigate" part is as in the previous example.

Farfalle: Equation (0.4)
My students provided this to me as a challenge! "But what about bow-tie?" Each of the three coordinates has a different tweak to make things work. The easiest is the x-coordinate, which yields the end ruffles. For the y-coordinate, we wanted to model the "pinch" as an inverted bell-like curve, giving y as a function of t (no, not the Gaussian curve; this one decays only quadratically rather than superexponentially). As the parameter s varies, this rescales the curve until it is flat (s = 0), and then inverts it on the other side (s < 0). Finally, the z-coordinate should be wavy and most pronounced in the center. This is the perfect job for that bugbear of beginning calculus students everywhere: sin(θ)/θ.
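In code, sin(θ)/θ needs one small bit of care: at θ = 0 the formula is 0/0, but the limit is 1, so we fill in that value. Here is a minimal Python sketch; the `ruffle_height` profile (its amplitude and frequency) is a hypothetical example of a wavy coordinate that is most pronounced in the center, not the actual Equation (0.4).

```python
import math

def sinc(theta):
    """sin(theta)/theta, with the removable singularity at 0 filled in by its limit 1."""
    if abs(theta) < 1e-8:
        # Taylor series: sin(x)/x = 1 - x^2/6 + ..., accurate to machine precision here
        return 1.0 - theta * theta / 6.0
    return math.sin(theta) / theta

def ruffle_height(u, amplitude=0.2, frequency=6.0):
    # Hypothetical wavy z-profile: ripples are largest at u = 0 and fade outward,
    # since |sin(x)/x| <= 1 with equality only at x = 0.
    return amplitude * sinc(frequency * u)

print([round(ruffle_height(0.1 * i), 4) for i in range(-5, 6)])
```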

All in all, this shows that putting together a lot of simple concepts can generate some cool-looking results, even if they are not exact (and indeed, dealing with less easily modeled objects is why I got into numerical analysis, where approximation is possible even without a precise formula).

Friday, February 13, 2015

Origami Dodecahedron

Physically made this one. 60 folded pieces. Took a couple of months.