The inverse-square law of magnetism

The observation that opposite charges attract while like charges repel, with a force proportional to the inverse square of distance, motivates the study of electrostatics. Although we often don’t solve problems in electrostatics using Coulomb’s law directly, relying instead on Gauss’s law or on techniques for solving Poisson’s equation for the electric scalar potential, there is no physical information contained in those techniques that isn’t already in Coulomb’s law. Therefore, the entire field of electrostatics, including the introduction of the electric field E itself, may be viewed as an exploration of the consequences of this basic fact about stationary electric charges (and continuous distributions thereof).

At the beginning of the chapter on magnetostatics in Introduction to Electrodynamics, Griffiths notes that parallel currents attract while antiparallel currents repel. You might think that he would go on to write down a law similar to Coulomb’s law that described this force between currents directly. However, this approach is not pursued. Instead, the magnetic field B and the Lorentz force law are introduced right away. In a subsequent section, the Biot–Savart law is used to compute the magnetic field of a long straight wire, which is then immediately used to find the force between two wires.

There is a good reason for this, which I’ll get to at the end. There’s just one small problem, which is that it’s not clear that magnetism actually obeys an inverse-square law and that the force that wire 1 exerts on wire 2 is equal and opposite to the force that wire 2 exerts on wire 1. Students are also confused by being asked to use the right-hand rule to assign an apparently arbitrary direction to the magnetic field. It may appear that the universe prefers one handedness over the other for some strange reason (which, for what it’s worth, is true for the weak interactions, but not the electromagnetic).

If there were an analogue to Coulomb’s law for magnetostatics, it would become clear that the magnetic force obeys an inverse-square law, obeys Newton’s third law, and does not involve an arbitrary choice of handedness. Obviously I don’t mean the Biot–Savart law; I mean a law that gives the force between currents directly. In fact, we can derive such a law using basic knowledge of vector calculus. Recall that the Biot–Savart law gives the magnetic field at a point due to some given source distribution:
\displaystyle B(x) = \frac{\mu_0}{4\pi} \int \frac{J(x') \times (x - x')}{\|x - x'\|^3} \, \mathrm{d}^3x'
The force this exerts on some other current distribution J(x) is given by applying the Lorentz force law and integrating again over x:
\displaystyle F = \frac{\mu_0}{4\pi} \iint J(x) \times \frac{J(x') \times (x - x')}{\|x - x'\|^3} \, \mathrm{d}^3x' \, \mathrm{d}^3 x
We can simplify this using the vector identity A \times (B \times C) = B(A \cdot C) - C(A \cdot B):
\displaystyle F = \frac{\mu_0}{4\pi} \iint J(x') \left(J(x) \cdot \frac{x - x'}{\|x - x'\|^3}\right) - (J(x) \cdot J(x')) \frac{x - x'}{\|x - x'\|^3} \, \mathrm{d}^3x' \, \mathrm{d}^3 x
To see that the integral over the first term vanishes, we can first exchange the order of integration and pull J(x') out of the inner integral, then observe that (x - x')/\|x - x'\|^3 is the gradient of -1/\|x - x'\|:
\displaystyle \iint J(x') \left(J(x) \cdot \frac{x - x'}{\|x - x'\|^3}\right) \, \mathrm{d}^3 x' \, \mathrm{d}^3 x = \int J(x') \int J(x) \cdot \nabla(-\|x - x'\|^{-1}) \, \mathrm{d}^3 x \, \mathrm{d}^3 x'
This allows us to perform integration by parts on the inner integral:
\displaystyle \int J(x) \cdot \nabla(-\|x - x'\|^{-1}) \, \mathrm{d}^3 x = \\ -\int \|x - x'\|^{-1} J(x) \cdot \mathrm{d}a + \int (\nabla \cdot J(x))\|x - x'\|^{-1} \, \mathrm{d}^3 x
where the surface integral is taken over some sufficiently large surface that completely encloses the current distribution J(x) acted on and therefore vanishes, and the volume integral vanishes because the divergence is zero for a steady current. We conclude that:
\displaystyle F = -\frac{\mu_0}{4\pi} \iint (J(x) \cdot J(x')) \frac{x - x'}{\|x - x'\|^3} \, \mathrm{d}^3x' \, \mathrm{d}^3 x
This now reads very much like Coulomb’s law:
\displaystyle F = \frac{1}{4\pi\epsilon_0} \iint \rho(x) \rho(x') \frac{x - x'}{\|x - x'\|^3} \, \mathrm{d}^3x' \, \mathrm{d}^3 x
Both laws describe inverse-square forces that act along the line joining the two elements; the difference in signs is because like charges repel while like (parallel) currents attract. It is now easy to see that the force exerted on J(x) by J(x') is equal and opposite to the force exerted on J(x') by J(x). The right-hand rule does not appear anywhere. Parallel currents attract and antiparallel currents repel, period.

We can directly use this law to compute the force between two long, straight, parallel current-carrying wires. For such a problem we would use the version of the law expressed in terms of currents, I, rather than current densities, J. (I derived the law using current densities because it’s more general that way, and we can easily argue that we can go from J to I but it’s less obvious why the form involving J‘s should be true given the form involving I‘s.)
\displaystyle F = -\frac{\mu_0 I_1 I_2 }{4\pi} \iint \frac{x - x'}{\|x - x'\|^3} \, \mathrm{d}\ell \cdot \mathrm{d}\ell'
To get the force per unit length, we just compute the inner integral at x = 0. We know by symmetry that the force will be perpendicular to the wires, so we can just throw away the longitudinal component. And the two wires are everywhere parallel so \mathrm{d}\ell \cdot \mathrm{d}\ell' = 1. It’s then clear that
\displaystyle \|F\|/L = \frac{\mu_0 I_1 I_2 }{4\pi} \int_{-\infty}^\infty \frac{d}{(d + x)^{3/2}} \, \mathrm{d}x = \frac{\mu_0 I_1 I_2 }{2\pi d}
This is basically the same calculation that we would do with the Biot–Savart law and the Lorentz force law; it’s simply a bit more direct this way.

Still, if magnetism were introduced in this way, it would be necessary to get from the inverse square law to the field abstraction somehow. With electrostatics this is easy: we just pull out \rho(x) and call everything else E(x). With the inverse-square law for the magnetic force, we can try to do something like this. It’s a little bit trickier because the force also depends on the direction of the current element J(x) (while charge density, of course, doesn’t have a direction). To represent an arbitrary linear function of J(x) that gives the force density f on the current element, we need to use a matrix field rather than a vector field:
\displaystyle F = -\frac{\mu_0}{4\pi} \iint \left[\frac{x - x'}{\|x - x'\|^3} J^t(x')\right] J(x) \, \mathrm{d}^3x' \, \mathrm{d}^3 x = \int B(x)J(x) \, \mathrm{d}^3 x
where the matrix field B evidently satisfies
\displaystyle B(x) = -\frac{\mu_0}{4\pi} \int \frac{x - x'}{\|x - x'\|^3} J^t(x')  \, \mathrm{d}^3x'
The problem is that what we’ve done here is, in some sense, misleading. The local law f = BJ we may be tempted to conclude is not really true, in that B(x)J(x)\, \mathrm{d}^3 x is not the real force density on the infinitesimal current element J(x). In performing an integration by parts earlier, we essentially averaged out the term we wanted to get rid of over the entire target distribution. But this destroyed the information about what the true distribution of force is over the target current loop, with observable consequences unless the loop is rigid. In other words, we can only compute the total force on the loop, but not the stress tending to deform the loop. The true Lorentz force law that has the correct local information also has the form f = BJ, but the matrix B is antisymmetric; we also need an antisymmetrization (wedge product) in the equation for B. It’s not clear to me whether it’s possible to get from the inverse-square law to the true Lorentz force law and the true Biot–Savart law, or whether the loss of local information was irreversible. So, as I said, there is a very good reason why we don’t just postulate the inverse-square law and take that as the basis for magnetostatics: it is (probably) actually incomplete in a way that Coulomb’s law is not. Still, I feel that it contains some useful intuition.

Posted in Uncategorized | 1 Comment


I was recently reading the C99 rationale (because why the fuck not) and I was intrigued by some comments that certain features were retired in C89, so I started wondering what other weird features existed in pre-standard C. Naturally, I decided to buy a copy of the first edition of The C Programming Language, which was published in 1978.

"The C programming Language" by Kernighan and Ritchie

Here are the most interesting things I found in the book:

  • The original hello, world program appears in this book on page 6.

    In C, the program to print “hello, world” is

         printf("hello, world\n");

    Note that return 0; was not necessary.

  • Exercise 1-8 reads:

    Write a program to replace each tab by the three-column sequence >, backspace, -, which prints as ᗒ, and each backspace by the similar sequence ᗕ. This makes tabs and backspaces visible.

    … Wait, what? This made no sense to me at first glance. I guess the output device is assumed to literally be a teletypewriter, and backspace moves the carriage to the left and then the next character just gets overlaid on top of the one already printed there. Unbelievable!

  • Function definitions looked like this:
    power(x, n)
    int x, n;

    The modern style, in which the argument types are given within the parentheses, didn’t exist in K&R C. The K&R style is still permitted even as of C11, but has been obsolete for many years. (N.B.: It has never been valid in C++, as far as I know.) Also note that the return value has been enclosed in parentheses. This was not required so the authors must have preferred this style for some reason (which is not given in the text). Nowadays it’s rare for experienced C programs to use parentheses when returning a simple expression.

  • Because of the absence of function prototypes, if, say, you wanted to take the square root of an int, you had to explicitly cast to double, as in: sqrt((double) n) (p. 42). There were no implicit conversions to initialize function parameters; they were just passed as-is. Failing to cast would result in nonsense. (In modern C, of course, it’s undefined behaviour.)
  • void didn’t exist in the book (although it did exist at some point before C89; see here, section 2.1). If a function had no useful value to return, you just left off the return type (so it would default to int). return;, with no value, was allowed in any function. The general rule is that it’s okay for a function not to return anything, as long as the calling function doesn’t try to use the return value. A special case of this was main, which is never shown returning a value; instead, section 7.7 (page 154) introduces the exit function and states that its argument’s value is made available to the calling process. So in K&R C it appears you had to call exit to do what is now usually accomplished by returning a value from main (though of course exit is useful for terminating the program when you’re not inside main).
  • Naturally, since there was no void, void* didn’t exist, either. Stroustrup’s account (section 2.2) appears to leave it unclear whether C or C++ introduced void* first, although he does say it appeared in both languages at approximately the same time. The original implementation of the C memory allocator, given in section 8.7, returns char*. On page 133 there is a comment:

    The question of the type declaration for alloc is a vexing one for any language that takes its type-checking seriously. In C, the best procedure is to declare that alloc returns a pointer to char, then explicitly coerce the pointer into the desired type with a cast.

    (N.B.: void* behaves differently in ANSI C and C++. In C it may be implicitly converted to any other pointer type, so you can directly do int* p = malloc(sizeof(int)). In C++ an explicit cast is required.)

  • It appears that stdio.h was the only header that existed. For example, strcpy and strcmp are said to be part of the standard I/O library (section 5.5). Likewise, on page 154 exit is called in a program that only includes stdio.h.
  • Although printf existed, the variable arguments library (varargs.h, later stdarg.h) didn’t exist yet. K&R says that printf is … non-portable and must be modified for different environments. (Presumably it peeked directly at the stack to retrieve arguments.)
  • The authors seemed to prefer separate declaration and initialization. I quote from page 83:

    In effect, initializations of automatic variables are just shorthand for assignment statements. Which form to prefer is largely a matter of taste. We have generally used explicit assignments, because initializers in declarations are harder to see.

    These days, I’ve always been told it’s good practice to initialize the variable in the declaration, so that there’s no chance you’ll ever forget to initialize it.

  • Automatic arrays could not be initialized (p. 83)
  • The address-of operator could not be applied to arrays. In fact, when you really think about it, it’s a bit odd that ANSI C allows it. This reflects a deeper underlying difference: arrays are not lvalues in K&R C. I believe in K&R C lvalues were still thought of as expressions that can occur on the left side of an assignment, and of course arrays do not fall into this category. And of course the address-of operator can only be applied to lvalues (although not to bit fields or register variables). In ANSI C, arrays are lvalues so it is legal to take their addresses; the result is of type pointer to array. The address-of operator also doesn’t seem to be allowed before a function in K&R C, and the decay to function pointer occurs automatically when necessary. This makes sense because functions aren’t lvalues in either K&R C or ANSI C. (They are, however, lvalues in C++.) ANSI C, though, specifically allows functions to occur as the operand of the address-of operator.
  • The standard memory allocator was called alloc, not malloc.
  • It appears that it was necessary to dereference function pointers before calling them; this is not required in ANSI C.
  • Structure assignment wasn’t yet possible, but the text says [these] restrictions will be removed in forthcoming versions. (Section 6.2) (Likewise, you couldn’t pass structures by value.) Indeed, structure assignment is one of the features Stroustrup says existed in pre-standard C despite not appearing in K&R (see here, section 2.1).
  • In PDP-11 UNIX, you had to explicitly link in the standard library: cc ... -lS (section 7.1)
  • Memory allocated with calloc had to be freed with a function called cfree (p. 157). I guess this is because calloc might have allocated memory from a different pool than alloc, one which is pre-zeroed or something. I don’t know whether such facilities exist on modern systems.
  • Amusingly, creat is followed by [sic] (p. 162)
  • In those days, a directory in UNIX was a file that contains a list of file names and some indication of where they are located (p. 169). There was no opendir or readdir; you just opened the directory as a file and read a sequence of struct direct objects directly. Example is given on page 172. You can’t do this in modern Unix-like systems, in case you were wondering.
  • There was an unused keyword, entry, said to be reserved for future use. No indication is given as to what use that might be. (Appendix A, section 2.3)
  • Octal literals were allowed to contain the digits 8 and 9, which had the octal values 10 and 11, respectively, as you might expect. (Appendix A, section 2.4.1)
  • All string literals were distinct, even those with exactly the same contents (Appendix A, section 2.5). Note that this guarantee does not exist in ANSI C, nor C++. Also, it seems that modifying string literals was well-defined in K&R C; I didn’t see anything in the book to suggest otherwise. (In both ANSI C and C++ modifying string literals is undefined behaviour, and in C++11 it is not possible without casting away constness anyway.)
  • There was no unary + operator (Appendix A, section 7.2). (Note: ANSI C only allows the unary + and - operators to be applied to arithmetic types. In C++, unary + can also be applied to pointers.)
  • It appears that unsigned could only be applied to int; there were no unsigned chars, shorts, or longs. Curiously, you could declare a long float; this was equivalent to double. (Appendix A, section 8.2)
  • There is no mention of const or volatile; those features did not exist in K&R C. In fact, const was originally introduced in C++ (then known as C With Classes); this C++ feature was the inspiration for const in C, which appeared in C89. (More info here, section 2.3.) volatile, on the other hand, originated in C89. Stroustrup says it was introduced in C++ [to] match ANSI C (The Design and Evolution of C++, p. 128)
  • The preprocessing operators # and ## appear to be absent.
  • The text notes (Appendix A, section 17) that earlier versions of C had compound assignment operators with the equal sign at the beginning, e.g., x=-1 to decrement x. (Supposedly you had to insert a space between = and - if you wanted to assign -1 to x instead.) It also notes that the equal sign before the initializer in a declaration was not present, so int x 1; would define x and initialize it with the value 1. Thank goodness that even in 1978 the authors had had the good sense to eliminate these constructs… :P
  • A reading of the grammar on page 217 suggests that trailing commas in initializers were allowed only at the top level. I have no idea why. Maybe it was just a typo.
  • I saved the biggest WTF of all for the end: the compilers of the day apparently allowed you to write something like even when foo does not have the type of a structure that contains a member named bar. Likewise the left operand of -> could be any pointer or integer. In both cases, the compiler supposedly looks up the name on the right to figure out which struct type you intended. So foo->bar if foo is an integer would do something like ((Foo*)foo)->bar where Foo is a struct that contains a member named bar, and would be like ((Foo*)&foo)->bar. The text doesn’t say how ambiguity is resolved (i.e., if there are multiple structs that have a member with the given name).
Posted in Uncategorized | 16 Comments

I’m now a grownup, apparently

Whoa, it’s been a long time since my last update. (My previous post seems to have been received exceptionally poorly. I guess people don’t like negativity.)

Part of the reason why I update less frequently now is that I spend a lot of time on Quora and it gives me the opportunity to write a lot of things about science that I probably would’ve written here instead. Some people even blog on Quora, but I don’t think it makes sense for me to do so. I have a lot of followers on Quora, but they probably want to see my thoughts on science, and may or may not be interested in my personal life. On the other hand, a lot of people who read my blog aren’t on Quora.

I had planned to update in September, because that’s when I had planned to start work. For anyone who doesn’t know yet, I’m now living in Sunnyvale, California and working full-time as a software engineer at Google! Due to problems with getting a visa, I wasn’t actually able to start until the past Monday—more than four months after my initially planned start date. I guess you could say I had a rather long vacation—longer than I could handle, as I was starting to get quite bored at home.

This is, of course, my first full-time job, and for the first time, I feel that I have to exercise a lot of self-restraint. During my internships, I skipped breakfast, stuffed myself at dinner, drank a lot of pop, and frequently ate unhealthy snacks. I allowed myself to get away with it because I knew that the internship would end in a few months, and then I wouldn’t be tempted as much. Things are a lot different now, because, well, who knows how long I’m going to be here? If I develop unhealthy habits, it’s going to be hard to break them. So I now wake up early enough to catch breakfast at the office, eat reasonable portions at lunch and dinner, limit my consumption of pop, and try to eat at least one piece of fresh fruit per day. (Google doesn’t have as many unhealthy snacks as Facebook; I haven’t yet seen beef jerky at the office.) Now I just have to make myself exercise… urgh, I’ll get to that eventually >_>

You know, when I was a little kid, like all other kids, I hated having a bedtime, and I wondered why adults didn’t have bedtimes, and my mom told me that adults do have bedtimes, and they’re just self-imposed. Well, now I understand. I guess that makes me an adult, eh?

(Oh, and one last note: The girl mentioned in the previous post is no longer my girlfriend. I suppose it’s a little bit embarrassing to excitedly announce that you’re dating someone and then break up shortly afterward, but, oh well, that’s life. I’m not going to retroactively edit the post to hide my embarrassment or anything like that. She’s still an awesome person; just not the right one for me.)

Posted in Uncategorized | 1 Comment

Done exams

I thought my Real Analysis exam was going to be the hardest one, but it turned out to be probably the easiest exam I have ever written in my entire undergrad. Several questions just asked for definitions or statements of theorems. One question was true/false, with no proof required. Two questions were nontrivial, but actually quite trivial because they appeared on the 2012 exam so I already knew how to solve them. Also, the exam has 39 marks total, but you only need 30 marks to get 100%.

Looking over past exams, I would say that there has been a marked decrease in difficulty over time. Not just for this course, mind you—this has been the case for most other courses I’ve looked at too. For example, this was certainly the case for CHM342 and CHM440 (both organic synthesis). PHY489 (high energy physics) didn’t seem to show any variation in difficulty. I’ve never seen a course for which the exams became harder over time. It’s funny how people complain all the time about how hard U. of T. is. Would they have even passed in previous years?? (The counterargument here, perhaps, is that the high school curriculum has also become more and more watered-down over time, so perhaps students in the past were more competent in college and could handle the harder exams. I’m not sure whether this is true.)

Anyway; wow. School is finally over for me… at least until grad school. Not to say that studying wasn’t satisfying… but maybe it will be even more so now that I have the freedom to set my own curriculum.

The next order of business is the ACM-ICPC World Finals, which are going to be held in Russia from June 22 to June 26. This gives me about two months to practice. But sadly, I’m unlikely to make good use of it. I know people can improve a lot in a few months; Tyson (my teammate in first year) is a great example as he got really good really quickly over the summer after high school. Unfortunately, I get distracted by Reddit on a regular basis. Coding algorithm problems just hasn’t been fun for a long time now; I love solving them, but coding is a pain. Also, reading ACM problem statements is a pain. I’m adequately motivated during an actual contest, but when I’m practicing by myself that’s a different matter… sigh. It’s too bad Richard Peng isn’t here to remind me to do problems :P

Posted in Uncategorized | 1 Comment

Relativistic electrodynamics cheat sheet

I was bored, so I decided to LaTeX up the cheat sheet I brought to my PHY450 (relativistic electrodynamics) exam. It wasn’t actually cheating, of course; we were permitted to bring in a two-sided exam aid sheet. I originally used both sides, but when I typed it up I was able to cram all the relevant formulae onto one sheet. This has not been thoroughly vetted for errors, so use at your own risk. Page numbers are references to lecture notes and will not be useful for anyone not taking the course, but I still think these formulae will be generally useful. Gaussian units are used throughout because that’s what was used in class. (I don’t like Gaussian units but that’s just how it is; see previous post about what I think would be a better unit system—but of course nobody will ever use it.)


Posted in Uncategorized | 1 Comment

Unit systems in electrodynamics

I learned electrodynamics, like most other undergraduate students of my generation, in SI units. They seemed like the natural choice, because we use SI units for everything else. But then I took PHY450, “Relativistic Electrodynamics”, where we use cgs-Gaussian units. At first I found this unit system strange and unnatural, but then I realized it does have some advantages over the SI system. For example, the E and B fields have the same units, as do the scalar and vector potentials. This is attractive because the scalar and vector potentials together constitute the four-current, A^i = (\phi, \mathbf{A}), and the electric and magnetic fields together form the electromagnetic field tensor,

\displaystyle F^{ij} = \begin{pmatrix} 0 & -E_x & -E_y & -E_z \\ E_x & 0 & -B_z & B_y \\ E_y & B_z & 0 & -B_x \\ E_z & -B_y & B_x & 0 \end{pmatrix}

Still, I find it very strange that cgs units don’t have a separate unit for electric charge or electric current. I think it is unintuitive to measure electric charge in terms of length, mass, and time, because electric charge isn’t defined in that way. We define speed to be distance over time, acceleration to be change in velocity over time, force to be mass times acceleration, pressure to be force over area, energy to be force times distance, power to be energy over time. We don’t define electric charge to be anything times anything or anything over anything; it’s as much a fundamental quantity as any other. For what it’s worth, there are some contexts in which it makes sense not to treat charge as a separate unit. For example, in quantum electrodynamics, the charge of the electron is often expressed in Planck units, where its value is the square root of the fine structure constant. But then again, we also measure distance in units of time sometimes (light years) and time in units of distance sometimes (think x^0 in relativity). I think that it makes as much sense to have a separate unit for charge or current as it does to have separate units for distance and time, or mass, momentum, and energy—most of the time it’s better, with some exceptions.

I think that in the nicest possible system of units, then, the equations of electrodynamics would look like this. k denotes the reciprocal of the electric constant \epsilon_0, and is 4\pi times greater than Coulomb’s constant. The magnetic field and magnetic vector potentials are c times their values in SI units. The metric convention is (+,-,-,-).

Maxwell’s equations:
\displaystyle \nabla \cdot \mathbf{E} = k\rho
\displaystyle \nabla \cdot \mathbf{B} = 0
\displaystyle \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial(ct)}
\displaystyle \nabla \times \mathbf{B} = k\frac{\mathbf{J}}{c} + \frac{\partial\mathbf{E}}{\partial(ct)}

Lorentz force law:
\displaystyle \mathbf{F} = q\left(\mathbf{E} + \frac{\mathbf{v}}{c} \times \mathbf{B}\right)

Coulomb’s law:
\displaystyle \mathbf{F} = \frac{k q_1 q_2}{4\pi r^2} \hat{\mathbf{r}}
\displaystyle \mathbf{E} = \frac{k q}{4\pi r^2} \hat{\mathbf{r}} = \iiint \frac{k \rho}{4\pi r^2} \hat{\mathbf{r}} \, dV

Biot–Savart law:
\displaystyle \mathbf{B} = \int \frac{k}{4\pi r^2} \frac{I \, d\ell \times \hat{\mathbf{r}}}{c} = \iiint \frac{k}{4\pi r^2} \frac{\mathbf{J} }{c} \times \hat{\mathbf{r}} \, dV

Scalar and vector potentials
\displaystyle \mathbf{E} = -\nabla V - \frac{\partial \mathbf{A}}{\partial(ct)}
\displaystyle \mathbf{B} = \nabla \times \mathbf{A}
\displaystyle A^\mu = (V, \mathbf{A})

Field tensor
\displaystyle F^{\mu\nu} = \begin{pmatrix} 0 & -E_x & -E_y & -E_z \\ E_x & 0 & -B_z & B_y \\ E_y & B_z & 0 & -B_x \\ E_z & -B_y & B_x & 0 \end{pmatrix}

Potentials in electrostatics and magnetostatics (Coulomb gauge)
\displaystyle V = \frac{k}{4\pi r}q = \iiint \frac{k}{4\pi r} \rho \, dV
\displaystyle \mathbf{A} = \int \frac{k}{4\pi r} \frac{I \, d\ell}{c} = \iiint \frac{k}{4\pi r} \frac{\mathbf{J}}{c} \, dV

Retarded potentials, Lorenz gauge
\displaystyle A^\mu(\mathbf{r}, t) = \iiint \frac{k}{4\pi r} \frac{J^\mu(\mathbf{r}', t - \|\mathbf{r} - \mathbf{r}'\|/c)}{c} \, d^3\mathbf{r}'
\displaystyle A^\mu(x^\nu) = \iiiint \frac{k}{4\pi} \frac{J^\mu(x^{\nu\prime})}{c} \, 2\delta((x^\nu - x^{\nu\prime})^2) \, \theta(x^0 - x^{0\prime}) \, d^4 x^{\nu\prime}

Liénard–Wiechert potentials
\displaystyle V = \frac{kq}{4\pi r} \frac{1}{1-\vec{\beta} \cdot \hat{\mathbf{r}}}
\displaystyle \mathbf{A} = \frac{kq}{4\pi r} \frac{\vec{\beta}}{1 - \vec{\beta} \cdot \hat{\mathbf{r}}}
(\mathbf{r} is the vector from the retarded position of the charge to the observation point, r is the magnitude of \mathbf{r}, and \vec\beta is 1/c times the velocity of the charge at the retarded time.)

Poynting vector; Maxwell stress tensor; electromagnetic stress-energy tensor
\displaystyle T^{\alpha\beta} = -\frac{1}{k}F^{\alpha \gamma} F^\beta{}_\gamma + \frac{\eta^{\alpha\beta}}{4k} F^{\gamma\delta}F_{\gamma\delta}
\displaystyle T^{00} = \frac{1}{2k}(E^2 + B^2)
\displaystyle T^{0i} = T^{i 0} = \frac{1}{k} \epsilon_{ijk} E_j B_k = \frac{1}{c} S_i
\displaystyle T^{ij} = -\frac{1}{k}\left[E_i E_j + B_i B_j - \frac{1}{2}\delta_{ij} (E^2 + B^2)\right] = -\sigma_{ij}
\displaystyle \mathbf{S} = \frac{c}{k} \mathbf{E} \times \mathbf{B}
Poynting’s theorem and the statement of the conservation of momentum in electrodynamics take their usual form and are the same in all unit systems:
\displaystyle \nabla \cdot \mathbf{S} = -\mathbf{J} \cdot \mathbf{E} - \frac{\partial}{\partial t} \left[\frac{1}{2k}(E^2 + B^2)\right]
\displaystyle \partial_i \sigma_{ij} = f_j + \frac{1}{c^2} \frac{\partial S_j}{\partial t}

Electromagnetic Lagrangian density
\displaystyle \mathcal{L} = -\frac{1}{4k}F^{\alpha\beta}F_{\alpha\beta} - \frac{1}{c}J^\alpha A_\alpha

Posted in Uncategorized | Leave a comment

On perspective

Reasoning objectively is difficult because we are all biased by our own subjective experiences. There are two ways I can see to address this. The first is to consider others’ subjective experiences in addition to your own. This gives you what many would call a more balanced point of view. You might say that reading about another person’s experiences gave you new perspective on an issue. The second is to attempt to unconsider your own subjective experiences by training yourself to recognize cognitive bias.

I think the conventional wisdom is that you should always try to seek out others’ perspectives, but I question how useful that really is. I’m biased enough already; why would I want to inject even more bias into my thoughts? Emotion is really important in day-to-day life, but I think that it shouldn’t enter the debate on important moral issues facing humanity. When people talk about how their perspectives have been changed, often they mean something like how they visited an impoverished country and were moved by what they saw. Honestly, I don’t care. I already know that poverty causes a great deal of suffering, and, like many others, I would really like to know how it can be reduced. I don’t want to hear your sob story, though.

On the other hand, talking to other people about issues can be rewarding in that they may provide you with facts that you didn’t already know. So that’s the strategy I try to use: talk to other people to learn things you didn’t know, and, at the same time, work by yourself to become less biased. Unfortunately, if there’s one thing I’ve learned, it’s that becoming less biased is really hard. I’m definitely still a very biased person.

Posted in Uncategorized | 1 Comment

Done organic chemistry forever!

I used to love organic chemistry. Especially synthesis problems, which I felt gave me a chance to exercise my creativity and problem-solving skills. When I was competing for a spot on the Canadian IChO team, organic chemistry was probably my strong suit (although it’s also Richard’s strong suit, and in fact he apparently beat me on every section). When I decided to study chemistry at U. of T., I decided to take mostly physical and organic chemistry courses. I avoided biochemistry, because there’s too much memorization, and analytical chemistry, because I thought it would be boring. And I couldn’t fit inorganic into my schedule initially, and I ended up just not taking any inorganic courses at all.

I’m sorry to say that I don’t love organic chemistry anymore. In fact, I’m sick of it. I’ve taken CHM249 (organic chemistry II), CHM342 (organic synthesis), and CHM440 (more organic synthesis, with a focus on pharmaceutical agents). Every time I took another organic chemistry course, there were more and more reactions to memorize. The memorization was not too bad in CHM249, but there also weren’t any challenging problems of the sort I used to love solving in high school. In CHM342 the amount of memorization increased significantly, and I had an especially hard time with drawing three-dimensional transition states and predicting stereochemistry. In CHM440 there were a total of 90 named reactions. I was actually scared of synthesis problems, because, with that many reactions, there is simply no way I would be able to see the right disconnections. Luckily, there weren’t any. Suffice it to say that this course confirmed my suspicion that it was the right choice not to go to grad school to study organic synthesis…

Anyway, I had my CHM440 final exam last week, and my CHM347 exam today (biochemistry; I didn’t want to take it but if I took only courses I liked then I wouldn’t be able to graduate). Next term I’m only taking statistical mechanics. This means I’ll never have to look at another curved arrow for the rest of my life. Yippee!

In retrospect, I greatly underestimated the difficulty of fourth year—or perhaps I just bit off more than I could chew. Three of the five courses I took (CHM440, high-energy physics, general relativity I) were cross-listed as grad courses, so I guess it was foolish of me to expect them not to be so hard. There was also a grad student in my (time-dependent) quantum chem course, and at any rate it was very grad-style, with a take-home final (which was way too hard) and a term paper. I was incredibly busy toward the end of the term, trying to keep up my GPA (not that it actually matters, but I guess it’s just an irrational drive of mine).

I’ve noticed something about myself: when I’m busy doing stuff I don’t want to do, I think of all the productive things I could be doing if I had free time, such as developing the PEG Judge, studying quantum field theory, learning Haskell, or maybe learning a new (human) language. Alas, once free time arrives, I basically waste time browsing Reddit and playing games. Anyone else do this?

Now for some more awesome news. We made ACM World Finals! I’ll be representing the University of Toronto in Russia in June, together with my teammates Hao Wei and Saman Samikermani. We weren’t sure whether we were going to make it, since CMU and UMich creamed us at the regionals, and solved more problems than we did. But I guess ACM just stuck to the recent trend of inviting the top three teams from our region. We’ve got a long road ahead of us; other teams are good—really, really good—and we’re all a bit out of practice. I just hope we don’t do too badly!

Posted in Uncategorized | 1 Comment

WKB approximation

I have a confession to make: I’m scared of approximations.

Scandalous, isn’t it? For there are few things in physics that can be solved exactly, and so if we want to ever be able to do any calculations, rather than just sitting around and writing down one PDE after another that nobody will ever solve, we simply can’t do without approximations. What kind of physics major doesn’t like approximations?

And yet, they’ve never come easily to me. In contrast to theorems and exact solutions, which make a lot of sense to me, approximations always confuse me and feel like they’re something I just have to memorize, and I’m not very good at memorizing things. This started in high school, when we were discussing double-slit diffraction patterns. In order to get an expression for the approximate spacing between the bands, you’ve got to argue that because the screen is far away, you have nearly a right triangle, and \sin \theta \approx \tan \theta \approx \theta, as shown here. In the end, I just memorized the formula and felt dirty about it.

In my second year of college, I took an intro quantum mechanics course. It began with a discussion of the wave-like nature of matter, the photoelectric effect, Compton scattering, bremsstrahlung, hydrogen atom energy levels, all that good stuff. Then we did the Schrödinger equation for a particle in a box, a free particle, and plenty of super annoying problems where you have a potential jump and you have to match coefficients at the discontinuity and compute transmission and reflection coefficients. At the very end of term, we were introduced to the WKB approximation for the first time. Now, the prof for this course is notoriously bad (fortunately, he no longer teaches it), so I could barely understand what was going on; in the end he just derived an approximate formula for the tunneling amplitude through a potential barrier, and said WKB wouldn’t be on the final exam. I was relieved, and hoped I’d never come across it again.

Fast forward to present. I’m taking a course called Applications of Quantum Mechanics, and it’s a thinly veiled physics course about time-dependent QM which happens to be classified as CHM (luckily for me, because I didn’t want to take any more organic courses). Naturally, the WKB approximation shows up. There’s a lengthy discussion in the text about assuming an exponential form, expanding in a power series, and then plugging it into the Schrödinger equation. It was terribly dry, so I ended up just looking at the formula. That’s when it finally made sense to me.

The WKB approximation gives the following expression for the wave function of a scattering momentum eigenstate (i.e., E > V) subject to a spatially varying potential V(x):

\displaystyle \psi(x) \approx \frac{A}{\sqrt{p}} e^{i \int p/\hbar \, dx} + \frac{B}{\sqrt{p}} e^{-i \int p /\hbar \, dx}

where A and B are constants, and the momentum function p is defined as you would expect: p(x) = \sqrt{2m(E-V(x))}.

In order to see why this formula makes sense, compare it to the case where V = 0 and we have a free particle. Here we have an exact solution for the momentum eigenstates:

\displaystyle \psi(x) = C e^{ipx/\hbar} + D e^{-ipx/\hbar}

From the free-particle solution we can see that as the wave travels, it picks up a phase shift proportional to its momentum. If it travels a distance dx, then it incurs a phase shift of p/\hbar \, dx.

The WKB approximation is nothing more than the extension of this to a spatially varying potential. Here p is a function of x, so the total phase shift up to a particular point isn’t just px/\hbar, but has to be replaced by an integral, \int p/\hbar \, dx.

There’s a twist, however; in order to conserve probability, the amplitude has to vary spatially. Because we have a stationary state, the probability current has to be constant. Now, the probability current transported by the wave A' e^{ipx/\hbar} is proportional to A'^2 p. For this to remain constant, we must have A' = A/\sqrt{p}.

And really, that’s all there is to it: WKB is what you get when you assume that a free particle propagating through a potential maintains basically the same form; it just doesn’t accumulate a phase shift at a constant rate now since the potential isn’t constant, and its amplitude varies in order to conserve probability (just like how a classical wave’s amplitude decreases when it passes to a denser medium). There wasn’t anything to be scared of!

(In regions where E < V, we instead have exponential decay of the wave function. The formula given above is still correct, but a factor of i from the square root cancels the factor of i already in the exponential, and you get a real argument. The factor of (-1)^{1/4} in the denominator can be absorbed into the constant.)

Posted in Uncategorized | Leave a comment

ACM regionals, problems and solutions

To avoid possible copyright-related issues, all problem statements have been paraphrased or summarized.

Problem A

You are given a 10 by 10 grid and told the sum in each row and in each column. The sum tells you how many cells in that row or column are occupied by a ship. Each ship is aligned to the grid, meaning that for any given cell it either occupies it completely or does not occupy it at all, and all cells occupied by a particular ship are contiguous within a single row or column (as in the game Battleship). Ships are completely contained within the grid. There are four one-cell ships, three two-cell ships, two three-cell ships, and one four-cell ship. Two different ships cannot overlap, nor can two vertically, horizontally, or diagonally adjacent cells be occupied by two different ships.

  1. How many possible configurations of the ten ships yield the given row and column sums?
  2. If this number is greater than one, can we force the solution to be unique by further specifying the contents of some cell? We can specify it to be empty, occupied by the interior of a ship, occupied by the leftmost cell of a horizontal ship, occupied by the topmost cell of a vertical ship, occupied by the rightmost cell of a horizontal ship, occupied by the bottommost cell of a vertical ship, or occupied by a one-cell ship. If this is possible, find the “best” cell that allows us to do this, ranking first by lowest row number, then breaking ties by lowest column number, and then by type of contents, in the order listed in the previous sentence. If it is not possible, can we specify the contents of two cells and hence force the solution to be unique? Among multiple pairs of cells, find the best pair, that is, the one whose best cell is the best according to the rules for ranking single cells; ties are broken by the second best cells.


No team solved this problem during the contest. However, we may be reasonably certain that it can only be solved by a recursive backtracking approach, in which we will recursively generate all possible combinations of locations for the ten ships by adding one ship at a time, and pruning wherever adding a ship in a particular location would exceed the specified number of occupied cells in some row or column, and remembering that ships can’t overlap or be adjacent; whenever we manage to place all ships, increment the count of possible solutions. When trying to specify one or two cells in order to force a unique solution, we simply try all possibilities, starting from the best (i.e., empty cell in first row and first column, the empty cell in first row and second column, and so on) and again recursively generating all solutions, until we find something that give us exactly one solution. Because we didn’t attempt this problem, I can’t say what kind of optimizations are necessary to get AC.

Problem B

This problem asks to simulate cuckoo hashing. There are two arrays, of sizes n_1 \leq 1000 and n_2 \leq 1000 (n_1 \neq n_2). A non-negative integer value x would be stored at position x \mod n_1 in the first, and x \mod n_2 in the second. These two arrays together constitute a hash table. To insert a value into the hash table x, insert it into the first array. If there is already some value y in the position x \mod n_1, then we have to move y to the second array, storing it in position y \mod n_2. And if there is already a value z in that position, then we move it to the first array, in position z \mod n_1, and so on—we keep shuffling values back and forth between the two hash tables until a value lands in an empty spot.

You are given a sequence of non-negative integers. Output the states of the two arrays after inserting all the values in order. It is guaranteed that each insertion will terminate.


Straightforward simulation.

To see that this is fast enough, first observe that it’s impossible to insert more than n_1 + n_2 values, so there will be at most 1999 integers given to insert. Furthermore, since it’s guaranteed that each insertion will terminate, it’s not too hard to see that there’s a linear limit on how many steps each insertion can take. To prove this, consider an undirected graph in which each vertex corresponds to a location in one of the two arrays and each edge corresponds to either an existing value or the value we’re inserting; the edge is incident upon the two vertices corresponding to the positions where the value can be stored (thus, the graph is bipartite). When we insert a value, we trace out a path along the edges, which terminates once it lands on an empty vertex. If an insertion is to terminate, then there can’t be more edges than vertices in the connected component of the edge corresponding to the value we’re inserting, because then there would be too few spots for all the values. So there are two cases:

  1. Case 1: the number of edges is one less than the number of vertices (there are two empty spots). So the connected component is a tree. Because a tree has no cycles, we can never visit an edge twice (i.e., move a value that we’ve already moved). This gives the desired linear time bound.
  2. Case 2: the number of edges equals the number of vertices (there is one empty spot). There is then exactly one cycle. In this case, observe that if we ever visit a vertex twice, this must be the first vertex on the cycle we visited. When we visit it the second time, we kick it back to its original position which must be outside the cycle since it was the first. After this, we are directed away from the cycle and so cannot enter it again. Thus in this case the number of iterations can’t be more than twice the number of edges.

I’m not sure whether I could’ve come up with this (extremely hand-wavy) proof during the contest, but I think you’re just supposed to intuitively realize the number of iterations is linearly bounded. The total number of iterations here can’t exceed 2 \times 1999^2, or about 8 million, so we would expect this to get in under the time limit.

Problem C

This problem deals with a modified version of the Playfair cipher. To encrypt a message, you first need to pick a secret key. Then you write the letters of the key, one at a time, in a 5 by 5 grid, with each letter occupying one cell. Squares are filled left to right, top to bottom. Duplicate letters in the key are skipped after the first time they’re written down. After this, unused squares in the grid are filled left to right, top to bottom, by the letters of the alphabet not present in the key, in order. I and J are considered the same letter, so there are only 25 letters in the alphabet, and thus the grid will end up containing a permutation of the alphabet.

A message is encrypted as digraphs, that is, two-letter pairs. For example, to encrypt ECNA, you would break it down as EC NA and then encrypt EC and NA separately, and concatenate the results. A special case occurs when the two letters in a pair are identical; in this case, in the original Playfair cipher, you insert the letter X after the first, and then the second actually becomes the first letter of the next pair; for example, HELLO would become HE LX LO and then these three pairs would be encrypted. If you end up with a single letter at the end, add an X to the end. (Evidently, a message such as FOX cannot be encrypted with this method.) In the modified version used in this problem, instead of always inserting the letter X, the first time you insert a letter you use A, the second time, you use B, and so on, skipping J, and wrapping around after Z. You also skip a letter if inserting it would give a pair of identical letters; for example, AARDVARK becomes AB AR DV AR KC, where we have skipped A.

After breaking down the message into letter pairs, we use the grid to encrypt each pair. If the two letters appear in the same row, then we replace each letter by the one immediately to the right of it in the grid, wrapping around if necessary. If they are in the same column, we replace each one by the one immediately below it, again wrapping around if necessary. If they are in neither the same row nor the same column, replace the first letter by the letter that lies in the same row as the first letter and same column as the second, and replace the second letter by the letter that lies in the same row as the second letter and same column as the first. Concatenate all the encrypted pairs to give the ciphertext. You are given a series of key-plaintext pairs. In each case output the corresponding ciphertext.


Simulation. I would say straightforward simulation, but I guess this problem isn’t exactly the most straightforward. Luckily, I wasn’t the one stuck coding this one :P

Problem D

On a web page of dimensions up to 109 by 109 pixels, up to 10000 disjoint axis-aligned squares are given, each of which has side length 10 pixels. Each square’s upper-left pixel has coordinates that are multiples of 10. Find the smallest set of pixels which both contains all the squares and is orthogonally convex, meaning that if two pixels in the same row belong to the set, then so do all the pixels between them in that row; likewise, if two pixels in the same column belong to the set, then so do all the pixels between them in that column. Output the vertices of this set in clockwise order starting from the topmost one (leftmost in case of a tie).


First, notice that we can replace each 10 by 10 square by its four vertices. For example, the square from (50, 60) to (59, 69) can be replaced by (50, 60), (50, 69), (59, 60), and (59, 69). The orthogonal convex hull of the 4n points thus obtained will be the same as that of the original set of squares, so we’ll just work with these 4n points instead.

To proceed, we basically walk around the hull in clockwise order. We start by identifying the topmost pixel (leftmost in case of a tie). This must be a vertex. To find the next vertex, we locate the topmost point which lies strictly to the right of it (in case of a tie, pick the rightmost). Then we find the topmost point which lies strictly to the right of that, and so on. Eventually we reach a point in the rightmost occupied column. Now we switch gears, and find the rightmost point that lies strictly below our current point (bottommost in case of a tie). Keep doing this until we get to the bottommost occupied column. Then switch gears again, repeatedly moving to the bottommost point that lies strictly to the left of the current point (leftmost in case of a tie). Finally, once we reach the leftmost occupied column, repeatedly move to the leftmost point that lies strictly above the current point. Eventually we reach the starting point, and we are done. Whenever we move from a point to another point that is in both a different row and different column, we have to insert an additional vertex. For example, in the first phase, when we are moving from a point to the topmost point strictly to the right, if the new point is strictly below the current point, then we insert a point which is in the same row as the new point but same column as the current point; the other three cases are analogous.

To efficiently move from one point to the next, all we have to do is precompute, for each of the (up to) 40000 points we have to work with, the topmost point to the right, the rightmost point below, the bottommost point to the left, and the leftmost point above. We can precompute the topmost point to the right of all points by sorting the points by x-coordinate and scanning from right to left, remembering the topmost point we’ve seen so far; the other three cases are analogous.

Problem E

In an n \times n grid (n \leq 25), some squares are occupied by obstacles. The leftmost and rightmost columns do not contain obstacles. There is initially a game piece in each cell of the leftmost column. You need to move them all to the rightmost column, using the minimum number of turns possible. In each turn, you can choose, for each piece, whether to move it or to leave it where it is; if you move a piece then you must move it into a vertically or horizontally adjacent cell that does not contain an obstacle. At the end of a turn, no two pieces can occupy the same square. What is the minimum number of turns necessary?


We didn’t solve this problem, but it seemed to us that we could do it with max flow. However, it’s somewhat unclear whether this approach would even fit in memory, let alone be fast enough.

First, we consider the problem of determining whether it is possible to solve the problem in k turns. We can reduce this to max flow with vertex constraints. There is one vertex for each (cell, turn) combination. For example, if k = 3, then there are four vertices for each cell: the cell at the beginning of the game, the cell after the first move, the cell after the second move, and the cell after the third move. Each vertex has a capacity of one. Each vertex has up to five outgoing edges (which we can take to have capacity one), each of which goes to some vertex corresponding to the next turn, and either the same cell, or a horizontally or vertically adjacent cell, as long as it does not contain an obstacle. There is an edge (which we can again take to have capacity one) from each vertex corresponding to the final state and a cell in the rightmost column to the sink. There is also an edge (which we can again take to have capacity one) from the source to each vertex corresponding to the initial state that is in the leftmost column. It is not hard to see that an augmenting path in this flow network corresponds to the movement of a game piece from the leftmost column in the initial state to the rightmost column in the final state; each edge that is not from the source or to the sink corresponds to the movement of the game piece from a particular cell in a particular turn to an adjacent cell in the next turn (or leaving it in the same cell); the vertex capacity constraint enforces not being able to have two pieces in the same cell in a given turn. If the max flow is n, then we have moved n pieces, that is, won the game.

Before we proceed, what is the complexity of this approach? Well, there are up to 625 cells, so there are about 625k internal vertices, each with up to about 5 outgoing edges, so 3125k edges. Furthermore, to enforce the vertex capacity constraints, we’ll have to split each vertex up into an incoming and outgoing vertex, with an edge between them, so now we’re up to 3750k edges. Because the graph has unit capacities and the max flow is at most 25, we can use DFS to find each augmenting path, and we have to go through 25 augmenting paths before we find the max flow. The complexity of DFS is E+V = 5000k so the total number of operations is about 125000k.

Now, how do we find the minimum k? Well, we didn’t get that far, but it seems that a reasonable approach would be as follows: first just set k = 0 and create the corresponding flow network. Then, as long as the max flow of n hasn’t been achieved yet because you can’t find a path in the residual network, add the next set of vertices and edges, corresponding to increasing k by allowing an extra move (that is, add n^2 more vertices and join the current final state vertices to them, making the vertices just added the new final state vertices; also add edges from the rightmost column cells in the new vertex set to the sink) and check whether we can now find an augmenting path; if not, increase k again, and so on. To efficiently determine whether an augmenting path exists in the new graph, all we have to do is run another DFS starting from reachable vertices in the previous final state cells, rather than doing a huge DFS from the source that will go through the entire graph again. This essentially means that every time we have to increase k once or more, the total time taken for all the DFSing is about the same as that taken to do a single DFS in the graph corresponding to the final k value before we find another augmenting path. That means the total number of operations is bounded by about 125000m, where m is the answer (minimum number of turns). I’m not quite sure how large m could be; it could certainly be as large as about n^2/2, in the case that the obstacles force you to take a winding path through the grid. Putting n = 25, we obtain a figure of about 39 million. Is this fast enough? I’m not sure.

Jacob argues that binary search on k would be faster. At some point I’ll probably code up both and see who’s right.

Problem F

In a directed acyclic graph of up to 200 vertices, we say an edge is redundant if removing the edge does not alter the connectivity of the graph (i.e., if a path existed from one vertex to another in the original graph, a path must also exist when the redundant edge is removed). Find all redundant edges.


A naive approach is to simply try deleting each edge in turn, and then running a DFS from each vertex to determine whether the connectivity has changed. Unfortunately, this approach takes O(EV(E+V)) time, which is O(V^5) in dense graphs, and is too slow for V = 200.

A faster approach is to run Warshall’s algorithm for computing the transitive closure of the graph. Whenever we find that there is a path from i to k and a path from k to j, if there is an edge directly from i to j, that edge must be redundant. This is only true because the graph is acyclic! Deleting that edge can’t alter the connectivity of i to k or k to j, because i < k < j in topological order, and thus the path from i to k can't use the edge from i to j, nor can the path from k to j. The running time of this approach is O(V^3).

Problem G

There is a horizontal line segment (the house) located above the origin on the Cartesian plane, and up to 10 decorations, each of which is also a line segment, and each of which is either reflective or non-reflective. There is a lamp located at the origin, which emits rays at all angles from -\theta/2 to \theta/2 to the vertical. Non-reflective decorations absorb all incident light, and reflective decorations act as mirrors that obey the law of reflection (and are reflective on both sides). It is guaranteed that a ray can only be reflected up to 100 times. What fraction of the house is actually illuminated?


I don’t think any teams solved this problem during the contest, so I have no idea how actually to do it. My suspicion is that it can be done using a brute force simulation approach, in which we simply sample the interval of angles and cast out a ray at each sampled angle, following its course until it either gets absorbed, reaches the house, or escapes to infinity. The fraction of the house illuminated only needs to be accurate to within one part in 104 (a percentage rounded to the nearest hundredth). I’m not sure how finely you’d have to sample in order to get the right answer, but of course, if nobody else solves the problem, then the rational strategy is to just keep resubmitting until you get it accepted. The simulation is pretty easy to do; in each iteration you just have to check whether the ray intersects any of the (up to) 11 line segments specified. If it’s the house or one of the non-reflective decorations, or if it doesn’t intersect anything, you’re done; if it’s a mirror, then you have to iterate again, but you only have to do up to 100 iterations for each angle, so that’s 1100 operations. Let’s say you decided to sample 5\times 10^4 angles; then you’d have to do a total of about 55 million operations. I suppose it’s plausible that this approach would work, but I can’t say for sure until I see the test data or official solutions.

The other, non-hackish approach would be to compute the illuminated fraction exactly by considering only interesting angles. An interesting angle is an initial angle at which the ray grazes the endpoint of a some line segment. It is easy to see that between any two consecutive interesting angles, either the ray does not illuminate the house at all, or the image of the ray on the house varies monotonically and continuously, so that whatever portion of the house lies between the image of the ray at the first angle and the image of the ray at the second angle is completely illuminated. Taking the union of all such intervals would then give the total length illuminated. To do the actual angle sweep, we would first sort a list of all angles at which the ray would graze the endpoint of some interval. We would then consider each sub-interval between a pair of consecutive interesting angles. If, in this interval, the ray hits a mirror, then we have to recursively subdivide the interval by considering all the angles at which the ray could hit the mirror, reflect, and then graze the endpoint of another interval. If in one of those intervals the ray hits a second mirror, then we would have to recursively subdivide again, and so on.

Two questions remain. First of all, how do we compute the initial angle required to hit some target point after reflection from a sequence of mirrors m_1, m_2, ..., m_k? The answer is pretty clever: we reflect the target point p across mirror m_k to get point p_k, and reflect p_k across mirror m_{k-1} to get point p_{k-1}, and so on, finally reflecting across m_1 to get point p_1. The ray from the origin to p_1 is then the same ray which would (potentially) reflect from all the mirrors in sequence and finally hit p. (I imagine anyone who plays pool is familiar with this trick.) The second question is what the worst case running time of this approach is. I have no idea. I don’t have any idea whether this was the intended approach, or the brute force one. When the test data come out, maybe we’ll see. If I’m not too lazy, that is.

Problem H

In a 3\times n grid of squares (n \leq 1000), each square contains an integer. We can place a domino across two squares and gain a number of points equal to the product of the numbers of the two squares. What is the maximum possible score, if we can place as many dominoes as we want but no two dominoes can overlap?


This problem can be solved using dynamic programming. We will define f(k, o_1, o_2, o_3) as the maximum score obtainable given that all squares covered by dominoes are left of column k, and o_1 specifies whether the top square in column k-1 is covered, likewise for o_2 and o_3 with the middle and bottom squares. The total number of states is then 2^3 n.

Computing the transitions is annoying. If we number the columns starting from zero, our base case is k \leq 0; here, no dominoes can be placed and the maximum score is 0. To compute f(k, o_1, o_2, o_3) for k > 0 , we consider all possible arrangements of dominoes in column k-1:

  • o_1 = o_2 = o_3 = 0: column k-1 is empty. Then f(k, 0, 0, 0) = \max_{o_1, o_2, o_3} f(k-1, o_1, o_2, o_3): we have license to cover as many or as few squares in column k-2 as we wish, and then we’ll simply not put anything in column k-1, and hence not gain any additional points.
  • o_1 = 1; o_2 = o_3 = 0: Only the top square in column k-1 is filled. In this case the domino covering that square must also occupy the top square in column k-2. The score we get is the value of this domino plus \max_{o_2, o_3} f(k-1, 0, o_2, o_3), where the 0 indicates that the top square must be initially uncovered in column k-2, but we don’t care whether the other two squares are.
  • o_1 = 0; o_2 = 1; o_3 = 0 or o_1 = 0; o_2 = 0; o_3 = 1: These are analogous to the previous case.
  • o_1 = o_2 = 1; o_3 = 0: There are two ways we can cover the top and middle squares in column k-1 while leaving the bottom one empty. Either we can place a domino across these two squares, giving a maximum total score equal to the value of that domino plus f(k, 0, 0, 0) (because once we remove this domino from the board we’ll have an empty column k-1). Or, we can place a domino on each that also covers the square to the left. The maximum score we could then obtain is the sum of the values of these two dominoes and \max(f(k-1, 0, 0, 0), f(k-1, 0, 0, 1)) (where we see that the bottom square in the previous column could be either covered or uncovered). The case o_1 = 0; o_2 = o_3 = 1 is analogous.
  • o_1 = 1; o_2 = 0; o_3 = 1: This is only obtainable by placing a domino on the top square that also covers the square to its left, and a domino on the bottom square that also covers the square to its left. The maximum score you can get is then the sum of the values of these two dominoes plus \max(f(k-1, 0, 0, 0), f(k-1, 0, 1, 0)), where we don’t care whether the middle square is covered in the previous column.
  • o_1 = o_2 = o_3 = 1: There are three ways to obtain this. We could lay a domino across the top and middle squares, and have the domino covering the bottom square also cover the square to its left. The score here is the value of the first domino plus f(k, 0, 0, 1). We could also lay a domino across the bottom and middle squares, and have the domino covering the top square also cover the square to its left. The score here is the value of the first domino plus f(k, 1, 0, 0). Finally, we could have three different dominoes in this column, each also covering the square to the left. The score here is the sum of the values of the three dominoes plus f(k-1, 0, 0, 0).

The final answer is then \max_{o_1, o_2, o_3} f(n, o_1, o_2, o_3), since we ultimately don’t care which squares are covered in the last column.

Problem I

Consider the regular language (a|ab|bb)*. Place all strings in this language in order of increasing length, where words of the same length are ordered lexicographically. Suppose there are n words per page in the dictionary (n \leq 30). What are the first and last words on page number m (m \leq 10^{18})?


The first word on the mth page has (one-based) rank n(m-1) + 1. The last word on the mth page has rank nm. If we can figure out how to compute the kth-ranked word in the dictionary, we just set k = n(m-1)+1, nm to get the desired words. Note that 30 \times 10^{18} is too large to fit in a 64-bit integer, so I suggest using Java’s native bignums.

This problem, like most problems of the type find the kth lexicographically smallest element, is solved by using the ordering criteria to narrow down the characteristics of the sought element from coarsest to finest. That is, first we will determine the length of the sought word, then we will determine the first letter of the word, and so on, because words are ranked first by length, then by first letter, and so on.

To determine the length of the word, we first ask ourselves whether it could be of length 1. To do this, we compute the number of words of length 1. If this is less than or equal to k, then we know that the kth word has length 1. If not, then we compute the number of words of length 2. If the sum of the number of words of length 1 and 2 is less than or equal to k, then we know that the kth word has length 2. And so on. To do this efficiently, we would first subtract the number of words of length 1 from k, then subtract the number of words of length 2, and so on, until we find l such that the value of k we have left over is less than or equal to the number of words of length l. Then the desired word is the kth word of length l (where, remember, we have subtracted out the word counts for lengths 1, 2, ..., l-1 already from k).

Next, we find the first character of the word, by computing the number of words of length l that begin with the letter a. If k is less than or equal to this, the word begins with a, otherwise, the word begins with b, and subtract the number of words beginning with a from k. Then move on to the second letter; compute the number of words of length l, which begin with the first letter we computed in the previous step, and compare this with k, and so on. In this way we successively determine one letter after another, until we have constructed the entire word.

(I take this opportunity to reiterate that this approach is very general, and is almost always the approach used to solve find the kth lexicographically smallest element problems, or generally any kind of problem in which a set of items is ranked according to a succession of criteria and we have to find an item in a particular position of the sequence.)

Finding the number of words of a given length or the number of words of a given length and a given prefix—the computation you need to do over and over again in the preceding procedure—is actually easy once you realize that a word is in the language if and only if it starts with an even number of bs. Thus, if the prefix contains at least one a, you have free rein with the rest of the letters and there are 2^r words, where r is the number of remaining letters. If there is no a in the prefix and an even number of bs, then there are 2^{r-1} + 2^{r-3} + ... words, where 2^{r-1} consist of a followed by whatever, 2^{r-3} correspond to bba followed by whatever, and so on. If there is no a in the prefix and an odd number of bs, then there are 2^{r-2} + 2^{r-4} + ... words, where 2^{r-2} correspond to ba followed by whatever, 2^{r-4} correspond to bbba followed by whatever, and so on.

In the contest, we solved problems B, C, D, F, H, and I, and ran out of time trying to code E. After solving B, C, D, F, H, and I (not in that order), we had about 50 minutes left (?), and figured that E would be quickest to code. However, E turned out to be difficult to debug, and the nature of the algorithm meant we couldn’t use prewritten max flow code from our printed materials. In the end, we couldn’t even get it to work on the sample data.

Posted in Uncategorized | 2 Comments