If you keep up with this blog, you’re probably the type who knows partial derivatives inside and out. If I were to ask you about the partial derivative of $e^y$ with respect to $x$, you would probably blurt out, “zero”, without skipping a beat. On the other hand, you might not have come across the *total derivative*.

The total derivative gives the rate of change of a variable in terms of another variable, without assuming that all other variables are held constant. Typically, to compute it, we use the chain rule to express dependencies in terms of other dependencies. Notationally, this is very simple and intuitive, and it is probably covered in the first lecture of a course in thermodynamics^{1}:

$$\frac{\mathrm{d}f}{\mathrm{d}x} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\,\frac{\mathrm{d}y}{\mathrm{d}x}$$

Notice that a roman $\mathrm{d}$ is used, which distinguishes the total derivative (at least, when printed) from both the partial derivative $\partial$ and the ordinary derivative, which uses an italic $d$. (The latter is also used in the expression $\frac{df}{dx}$.)

If you are good with math, you can start using this immediately. For example, let’s say

$$f = e^y, \qquad y = x^2.$$

Then,

$$\frac{\mathrm{d}f}{\mathrm{d}x} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\,\frac{\mathrm{d}y}{\mathrm{d}x} = 0 + e^y \cdot 2x = 2x\,e^{x^2}.$$
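
As a quick sanity check, a computer algebra system applies the same chain rule mechanically. Here is a small SymPy sketch: first with the dependence of $y$ on $x$ left symbolic, then with the illustrative concrete choice $y = x^2$ (both choices are just examples for this post, not anything canonical):

```python
import sympy as sp

x = sp.Symbol('x')
y = sp.Function('y')(x)          # declare that y depends on x

# SymPy applies the chain rule: d/dx e^y = e^y * dy/dx
total = sp.diff(sp.exp(y), x)
assert total == sp.exp(y) * sp.Derivative(y, x)

# An illustrative concrete dependence, y = x^2:
print(sp.diff(sp.exp(x**2), x))  # 2*x*exp(x**2)
```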

The intuition behind the terminology is not too hard to grasp. The partial derivative only captures *some* of the information regarding the dependence of one variable upon another at a given point. The total derivative is built up from several partial derivatives in order to capture *all* of the information, right?

Okay, sure. You can think of it that way if you want, and, indeed, that will suffice for practically any application of partial and total differentiation. But the real meaning is more subtle than that, and that’s what this post is about.

Now, let’s start over. What’s the derivative of $e^y$ with respect to $x$?

Aha! Now I’ve primed you to be all clever here and ask for clarification! The *partial* derivative, which is 0, or the *total* derivative, which is $e^y\,\frac{\mathrm{d}y}{\mathrm{d}x}$? Say *that*, and you’ll be guaranteed to get the trick question right.

But wait. There is something here that should bother you. If $y$ really is a function of $x$, doesn’t that render the partial derivative completely meaningless? It looks as though the partial derivative simply operates on the *notation*, whereas the total derivative gives you the *actual variation* between the quantities being studied. So perhaps a partial derivative is nothing but a notational convenience used to build up the total derivative, which is the expression you actually care about when calculating variation. And in the case that $y$ is actually independent of $x$, the total derivative expression reduces to 0, which is still correct.

Actually, this is not true. The partial derivative has meaning, but only if you think in terms of *functions*. People who are not mathematicians don’t really deal with functions. They deal with relationships between variables, and use the machinery of functions for convenience. But mathematicians deal with functions. Indeed, the entire field of real analysis deals with functions from $\mathbb{R}^n$ to $\mathbb{R}^m$.

Let’s review a basic fact about functions, which is not usually made clear outside of a post-secondary curriculum in pure mathematics. A function itself is nothing more than some set of ordered pairs of elements from the domain and codomain, subject to the restriction that each element of the domain is paired with exactly one element of the codomain. Fundamentally, a function has absolutely nothing to do with variables like $x$. Variables are simply convenient tools for writing down the definitions of specific functions, such as $f(x) = e^x$. However, notice that we can simply refer to this function as “the exponential function”, without mentioning any variables at all, which indicates that functions have an intrinsic, variable-free identity. Indeed, the vast majority of functions are non-computable and have no defining expression at all (we usually call those functions “pathological”, though). You can see this because the set of functions from $\mathbb{R}$ to $\mathbb{R}$ has cardinality $2^{\mathfrak{c}}$, whereas the set of all expressions that we can use to write down the definitions of functions is merely countable.
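
To make the “set of ordered pairs” picture concrete in code (a toy sketch, restricted to a finite domain so the set is actually writable):

```python
# A function on the finite domain {0, 1, 2, 3}: literally a set of
# input-output pairs, stored here as a Python dict. Note that no
# variable "x" appears anywhere in the object itself.
square = {0: 0, 1: 1, 2: 4, 3: 9}

# "Applying" the function is just looking up the pair:
print(square[3])  # 9
```

The dict has an identity independent of any variable name, just as “the exponential function” does.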

By the way, this is very much like how coordinates are not part of the identity of a vector space; a vector space is just a collection of things that satisfy the vector space axioms. We think of vectors in terms of coordinates, perhaps, because we most often deal with the vector spaces $\mathbb{R}^n$ with the standard basis. But coordinates don’t emerge until you’ve *selected* a basis, and the definition of a vector space gives you no clues about how to select some unique, canonical basis. (This, by the way, is why higher physics uses such confusing objects as tensors, raising and lowering indices, and covariant derivatives—it needs to respect the observer’s arbitrary choice of a coordinate system.)

Now then, **partial derivatives are intrinsic properties of functions**. A function whose domain is $\mathbb{R}^n$ has up to $n$ partial derivatives. We can notate these partial derivatives without using any variables at all. This is precisely what is done in the well-known textbook *Calculus on Manifolds* by Michael Spivak. There, the notations $D_1 f, D_2 f, \ldots$ are used for the partial derivatives of the function $f$ with respect to the first, second, *etc.* elements of (the tuple type of) the domain. Spivak goes so far as to refer to $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, and so on as “classical notation”, with the implication that they are beautiful and evocative but ultimately imprecise.
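
To illustrate the variable-free view (this is my own numerical sketch, not anything from Spivak): a partial derivative operator can be written as a higher-order function that cares only about argument *positions*, never about the letters used to define its input:

```python
import math

def D(i, h=1e-6):
    """Return the operator D_i: it takes a function of tuples and returns
    a (forward-difference) approximation of its partial derivative with
    respect to the i-th argument position. No variable names involved."""
    def operator(f):
        def partial(*args):
            bumped = list(args)
            bumped[i - 1] += h          # nudge only the i-th slot
            return (f(*bumped) - f(*args)) / h
        return partial
    return operator

f = lambda u, v: u * math.exp(v)        # an illustrative two-argument function
print(round(D(1)(f)(2.0, 0.0), 3))      # 1.0  (= e^0)
print(round(D(2)(f)(2.0, 0.0), 3))      # 2.0  (= 2*e^0)
```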

So let’s say for example we have a function $f\colon \mathbb{R}^2 \to \mathbb{R}$. This function is the unique function with this domain and codomain such that a given element of the domain is associated with that element of the codomain that equals the product of the first element and the exponential of the second element of the former. (We would normally write this down, for convenience, as $f(x, y) = x e^y$.) Then, this function has two partial derivatives, $D_1 f$ and $D_2 f$. The partial derivative $D_1 f$ is the function in which a given element of the domain is associated with that element of the codomain that equals the exponential of the second element of the former. (In classical notation, $\frac{\partial f}{\partial x} = e^y$.) The other partial derivative, $D_2 f$, is identical to $f$ itself.
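
This example is easy to check symbolically; here is a quick SymPy sketch, with `diff` playing the role of $D_1$ and $D_2$:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x * sp.exp(y)                 # f(x, y) = x * e^y

D1f = sp.diff(f, x)               # partial w.r.t. the first slot
D2f = sp.diff(f, y)               # partial w.r.t. the second slot

print(D1f)        # exp(y)
print(D2f == f)   # True: the second partial is identical to f itself
```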

Indeed, then, you will see that the symbol $\frac{\partial}{\partial x}$ has no meaning whatsoever on its own. It is not a true operator, despite what quantum mechanics might have you think.^{2} An operator is a higher-order function, whose domain itself contains functions (or other operators). $\frac{\partial}{\partial x}$ can’t just be fed a function and spit out another function. What is $\frac{\partial}{\partial x}$ applied to the exponential function? You can’t answer, because you’re expecting me to tell you which variable is used in the definition of the exponential function. But it’s all the same *function*, whichever variable I pick. On the other hand, the symbol $D_1$ is a true operator, whose domain is differentiable functions of at least one real. (For functions of exactly one real, $D_1$ is the ordinary single-variable derivative.) And $D_2$ is an operator whose domain is differentiable functions of at least two reals (since it picks out the second element, and differentiates with respect to it). And so on. The symbol $\frac{\partial}{\partial x}$ *acquires* meaning when it is paired with an expression, such as $\frac{\partial e^y}{\partial x}$. The interpretation is now that we have some function of at least two reals, that picks one of them to associate each tuple of the domain with that one’s exponential in the codomain, and we are taking the partial derivative with respect to some *other* element. For example, $D_1 f$ where $f(x, y) = e^y$, or $D_2 g$ where $g(y, x) = e^y$.
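
SymPy mirrors this point mechanically (a loose analogy, not a formal claim): its `diff` is only defined on the *pair* of an expression and a chosen symbol. Handed the bare expression $e^y$, it still needs to be told which symbol to vary:

```python
import sympy as sp

x, y = sp.symbols('x y')
expr = sp.exp(y)          # an expression built from the dummy symbol y

print(sp.diff(expr, x))   # 0       -- exp(y) is constant as far as x is concerned
print(sp.diff(expr, y))   # exp(y)  -- the other choice of symbol
```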

In light of this revised, correct view of functions, what are total derivatives? **Total derivatives are not intrinsic properties of functions**. Total derivatives do, in fact, operate on expressions, unlike partial derivatives, which operate on functions. At the time of writing, we have the following from the Wikipedia article on total derivatives:

> The total derivative of $f(x, y)$ with respect to $x$ is
> $$\frac{\mathrm{d}f}{\mathrm{d}x} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\,\frac{\mathrm{d}y}{\mathrm{d}x},$$
> which takes the dependence of $y$ upon $x$ into account, if $y$ depends on $x$.

The part where it says, “if $y$ depends on $x$”, is crucial, because it shows that associating total derivatives with *functions* is self-contradictory. You simply cannot say that $f\colon \mathbb{R}^2 \to \mathbb{R}$ (which is implied by the notation $f(x, y)$) and then introduce a *restriction* that prevents the first and second elements of the ordered pairs in the domain from varying independently. Really, what you’re doing by introducing this dependency is creating a *new* function, $g(x) = f(x, y(x))$, and taking the ordinary derivative of *that*, instead.
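
That last move, building $g$ and differentiating it, can be sketched in SymPy, using $f(x, y) = x e^y$ and the purely illustrative dependence $y = x^2$. The ordinary derivative of the substituted expression agrees term for term with what the chain-rule formula computes:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x * sp.exp(y)                    # example function f(x, y) = x*e^y
dep = x**2                           # an illustrative dependence y = x^2

g = f.subs(y, dep)                   # the *new* function g(x) = f(x, y(x))
lhs = sp.diff(g, x)                  # ordinary derivative of g
rhs = (sp.diff(f, x) + sp.diff(f, y) * sp.diff(dep, x)).subs(y, dep)

print(sp.simplify(lhs - rhs))        # 0: the two computations agree
```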

It is more correct to think of the total derivative, not as an operator, but as a notation that represents the relationship between rates of change of two variables, out in the real world where there are no functions, just a bunch of variables that may or may not depend on each other. The notation $\frac{\mathrm{d}f}{\mathrm{d}x}$ does not, then, represent something you do to the function $f$. It represents the rate of change of the *variable* $f$ with respect to the *variable* $x$.

Furthermore, the chain rule, in the form I gave in the beginning of this post, is actually an abuse of notation. The thing on the left hand side is a total derivative, which is an *expression*, and not a function. The right hand side, though, contains partial derivatives, so it looks as though the right hand side must come out to be a function. But wait, the right hand side contains total derivatives too! Arrgh! The problem is that the right hand side contains notation of the form $\frac{\partial f}{\partial y}$. This does not make sense if we treat $f$ as a variable, but only if we treat $f$ as a function and $y$ as one of the bound (dummy) variables used in its definition. So this chain rule is an abuse of notation in that it treats $f$ as a variable on the left hand side and as a function on the right hand side. It just happens to be an extremely useful one because it helps you to calculate the total derivative.

With that in mind, let’s revisit an old joke.

A polynomial and $e^x$ are walking down the street, when all of a sudden they notice a differential operator heading toward them. The polynomial panics, and says, “Uh oh, a differential operator. If I run into it too many times, I’ll disappear.” $e^x$ says, “That’s okay, I’ll go talk to it. It can’t do anything to me, because I’m $e^x$.” So $e^x$ approaches the differential operator, and says, “Hi, I’m $e^x$.” The differential operator replies, “Nice to meet you, $e^x$. I’m $\frac{d}{dy}$.”

Some people might point out that $e^x$ will only be annihilated by the differential operator if $x$ is not a function of $y$. The truth is that this joke is just not precisely stated enough to hold up to this analysis (and that you should just laugh at it without reading that much into it). You see, when you simply say “$\frac{d}{dy}$”, it’s taken to mean the ordinary derivative $D_1$, rather than a total derivative. But then, since an ordinary derivative is a partial derivative with respect to the sole real-valued variable, it must operate on *functions*, whereas in the joke the protagonist $e^x$ and the deuteragonist, the polynomial, are merely *expressions*. The correct (but much less funny) joke then reads:

A polynomial and $e^x$ are walking down the street, when all of a sudden they notice a total derivative heading toward them. The polynomial panics, and says, “Uh oh, a total derivative. If I run into it too many times, I’ll disappear.” $e^x$ says, “That’s okay, I’ll go talk to it. It can’t do anything to me, because I’m $e^x$.” So $e^x$ approaches the total derivative, and says, “Hi, I’m $e^x$.” The total derivative replies, “Nice to meet you, $e^x$. I’m $\frac{\mathrm{d}}{\mathrm{d}y}$.” $e^x$ gulps and wishes that $x$ depended upon $y$.

Or, even more forced:

A polynomial function and the exponential function are walking down the street, when all of a sudden they notice a differential operator heading toward them. The former panics, and says, “Uh oh, a differential operator. If I run into it too many times, I’ll disappear.” The latter says, “That’s okay, I’ll go talk to it. It can’t do anything to me, because I’m the exponential function.” So the exponential function approaches the differential operator, and says, “Hi, I’m the exponential function.” The differential operator replies, “Nice to meet you, exponential function. I’m $D_1$.”

Sorry for killing the joke, but I hope at least now you understand the subtleties involved in the partial and total derivatives.

^{1} I used the example of thermodynamics because it has the annoying property that the variables it works with, such as pressure, volume, and temperature, are not independent, and you always have to carefully pay attention to which variables are being held constant and which ones are allowed to vary. It was the only math-based course I’ve ever had trouble with.

^{2} Whether “quantum mechanics” refers to a field or the people who study that field is left deliberately ambiguous.


So let’s suppose we have a function $f(x, y, z) = x^2 + y^2 + z^2$ and that $y = 2x$ and $z = \sin x$. The partial with respect to $x$ would then be $2x$. But if we substitute $y = 2x$, we would then have a function $g(x, z) = 5x^2 + z^2$. Here the partial with respect to $x$ would be $10x$. For both functions $f$ and $g$, the total derivative with respect to $x$ is $10x + 2(\sin x)(\cos x)$, so the total derivative is the same in both cases; however, very obviously, partial $f$ with respect to $x$ and partial $g$ with respect to $x$ are NOT the same thing.

This has always bothered me, as it seems to imply that varying $x$ a little bit will cause your “answer” to depend on how you write something down on a piece of paper, which is nonsense. From your post I understand this is the case because $f(x, y, z)$ and $g(x, z)$ are NOT the same function, even if they evaluate to the same “answer” from an elementary point of view. The partial derivative then is “attached” to the function itself, and one should be very careful when making substitutions of variables where partial derivatives are important. On the other hand, the total derivative really just evaluates how one variable changes with respect to others and doesn’t care about the choice of function used to describe the relationship between the variables, and thus is not “attached” to the function in the same way as the partial derivative is. Do I have this basically right? I am a chemist, not a mathematician, and the rule for partial derivatives has always bothered me because I did not understand how you could artificially hold one variable constant if it depended on the other (like $y$ and $z$ in the above example) and obtain anything meaningful.
If what I say above is a more correct way of thinking about it, then I believe I can better understand what is happening in these situations, although from the standpoint of using mathematics to describe physics, it seems that choosing when to take a partial derivative, and how it actually applies to “the real world”, should be handled with a great amount of care. Or perhaps a better way of saying it is that the choice of which function to pick and apply to a physical phenomenon (e.g., $f$ vs. $g$ above) depends on more than just writing down a form that, when evaluated at some points $x, y, \ldots$, gives you the “right” answer.
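
For what it’s worth, the numbers in this comment check out in SymPy: the two partials differ, while the total derivative agrees either way:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 + y**2 + z**2
g = f.subs(y, 2*x)                        # g(x, z) = 5*x**2 + z**2

print(sp.diff(f, x))                      # 2*x  -- partial of f w.r.t. x
print(sp.diff(g, x))                      # 10*x -- partial of g: different function, different partial

# Total derivative w.r.t. x, with y = 2x and z = sin(x): same either way.
total_f = sp.diff(f.subs({y: 2*x, z: sp.sin(x)}), x)
total_g = sp.diff(g.subs(z, sp.sin(x)), x)
print(sp.simplify(total_f - total_g))     # 0
```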
