Lecture 3: The Marginal Rate of Substitution and the Implicit Function Theorem

In the last lecture we introduced the notion of preferences, and showed that we could represent preferences over two goods using indifference curves. We then showed how we could use a multivariate function to represent preferences, and that the indifference curves were level sets of that multivariate function: that is, a set of combinations of $(x_1,x_2)$ that correspond to the same utility $U$.

In today’s lecture we are going to focus on what happens to utility as you change bundles. We will focus on two kinds of slopes in particular:

Marginal utility: we will look at the change in utility you get by adding more of a good. Mathematically, this is the partial derivative of the utility function; visually, it is the slope of the “hill” that represents the height of the utility function.
Marginal rate of substitution: we will look at the rate at which you are willing to trade one good for another. Visually, this is the slope of the indifference curve at a point. Mathematically, it is the ratio of the marginal utilities.

This lecture, therefore, will consist of four parts; we’ll then apply this to the economics.

Derivatives of univariate functions

A critical feature of any function is how the output changes with the changes to its inputs.

For a function of one variable $f(x)$, the derivative at some value $x$ may be written as $f^\prime(x)$, or equivalently $\frac{df}{dx}$, where: ${df \over dx} = \lim_{\Delta x \rightarrow 0} {f(x + \Delta x) - f(x) \over \Delta x}$ The reason I like to use this notation is that the $df$ refers to the vertical distance (rise) measured in the numerator, and $dx$ represents the horizontal distance (run) in the denominator.

Visually, this means a line connecting $(x, f(x))$ and $(x + \Delta x, f(x + \Delta x))$ converges to a line tangent to the function at $(x, f(x))$ as $\Delta x \rightarrow 0$. The following diagram illustrates this for a few functions. Use the slider to bring $\Delta x$ to zero; you can also change the value of $x$ to see how the derivative changes (or doesn’t) as $x$ changes.

See interactive graph online here.
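To see the limit numerically, here is a minimal Python sketch (not part of the original lecture) that computes the rise over run for shrinking values of $\Delta x$. The names `difference_quotient` and `square` are illustrative, and the choice of $f(x) = x^2$ at $x = 3$ is an arbitrary example.

```python
def difference_quotient(f, x, dx):
    """Slope of the line through (x, f(x)) and (x + dx, f(x + dx))."""
    return (f(x + dx) - f(x)) / dx


def square(x):
    return x ** 2


# Shrink dx toward zero and watch the secant slope settle down.
for dx in [1.0, 0.1, 0.01, 0.001]:
    print(dx, difference_quotient(square, 3.0, dx))
```

Because $(3 + \Delta x)^2 - 3^2 = 6\Delta x + \Delta x^2$, each printed slope equals $6 + \Delta x$ up to floating-point rounding, so the values approach 6 as $\Delta x$ shrinks.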

Calculating derivatives

There are a few important rules for calculating derivatives, and examples of specific derivatives, that you should memorize:

Constant rule: if $f(x) = c$, then $f^\prime(x) = 0$.
Power rule: if $f(x) = x^n$, then $f^\prime(x) = nx^{n-1}$.
Natural log rule: if $f(x) = \ln(x)$, then $f^\prime(x) = 1/x$.
Addition rule: if $f(x) = g(x) + h(x)$, then $f^\prime(x) = g^\prime(x) +h^\prime(x)$.
Multiplication rule: if $f(x) = g(x)h(x)$, then $f^\prime(x) = g(x)h^\prime(x) + g^\prime(x)h(x)$.
Chain rule: if $h(x) = f(g(x))$, then $h^\prime(x) = f^\prime(g(x)) \times g^\prime(x)$.

These six rules together form many other familiar rules. For example:

Linear functions: If $f(x) = cx$, then $f^\prime(x) = c$. This follows from the constant rule, the power rule, and the multiplication rule.
Polynomial functions: If $f(x) = c_0 + c_1x + c_2x^2 +c_3x^3 + \cdots$, where $c_0$, $c_1$, $c_2$, $c_3$, etc. are constants, then $f^\prime(x) = c_1 + 2c_2x + 3c_3x^2 + \cdots$. This follows from the constant rule, the power rule, the multiplication rule, and the addition rule.
Square roots: If $f(x) = \sqrt{x}$, then $f^\prime(x) = {1 \over 2\sqrt{x}}$. This is a straightforward application of the power rule for $n = \frac{1}{2}$.
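
The polynomial rule above is mechanical enough to write as code. The following sketch assumes a polynomial is stored as a Python list of coefficients `[c0, c1, c2, ...]`; the helper names `poly_derivative` and `poly_eval` are hypothetical, chosen only for illustration.

```python
def poly_derivative(coeffs):
    """Given [c0, c1, c2, ...], return the derivative's coefficients [c1, 2*c2, 3*c3, ...]."""
    # Power rule term by term: c_n * x^n becomes n * c_n * x^(n-1).
    # The n = 0 term (the constant) has derivative zero, so it is dropped.
    return [n * c for n, c in enumerate(coeffs)][1:]


def poly_eval(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... at x."""
    return sum(c * x ** n for n, c in enumerate(coeffs))


# Example: f(x) = 4 + 12x + 9x^2, so f'(x) = 12 + 18x.
coeffs = [4, 12, 9]
d_coeffs = poly_derivative(coeffs)
print(d_coeffs, poly_eval(d_coeffs, 2))
```

Storing the coefficients in ascending order of power means the list index doubles as the exponent, which is what lets the power rule and the addition rule collapse into a single list comprehension.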

Local linearization

Note that our definition of the derivative essentially said that if $y = f(x)$, then ${df \over dx} \approx {\Delta y \over \Delta x}$ where $\Delta y = f(x + \Delta x) - f(x)$. In other words, it’s what the “rise over run” approaches as the run gets infinitesimally small.

If we multiply both sides of that by $\Delta x$, though, we can say that $\Delta y \approx \Delta x \times {df \over dx}$ In other words, one way of using the definition of the derivative is to say that locally, the function is changing at a rate of $dy/dx$; so if you move $\Delta x$ units to the right, you move up by approximately $\Delta x \times dy/dx$. This is called “local linearization.”

We use this in economics all the time. For example, suppose we have a cost function $c(q) = 10q + q^2$ The associated marginal cost is just the derivative of that: $MC(q) = 10 + 2q$ So the marginal cost around $q = 30$ units is $10 + 2 \times 30 = 70$, or €70 per unit. We can therefore use local linearization to say that the cost of increasing output by 3 units, from 30 to 33, would be approximately $\Delta c \approx 3 \times MC(30) = 3 \times 70 = 210$ Now, if we look at the exact cost of 30 units and 33 units, we can see that it’s not quite that: $\begin{aligned} c(30) &= 10 \times 30 + 30^2 = 1{,}200\\ c(33) &= 10 \times 33 + 33^2 = 1{,}419 \end{aligned}$ That is, the actual cost increase is €1,419 - €1,200 = €219, not €210. But it’s pretty close; and the smaller $\Delta q$ is, the closer it will be.
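
The cost example can be reproduced in a few lines of Python. This is only a sketch; the function names `cost` and `marginal_cost` and the second step size of 0.3 are assumptions added for illustration.

```python
def cost(q):
    """Cost function c(q) = 10q + q^2 from the example."""
    return 10 * q + q ** 2


def marginal_cost(q):
    """Derivative of the cost function: MC(q) = 10 + 2q."""
    return 10 + 2 * q


q0 = 30
for dq in [3, 0.3]:
    approx_change = dq * marginal_cost(q0)   # local linearization
    exact_change = cost(q0 + dq) - cost(q0)  # true change in cost
    print(dq, approx_change, exact_change, exact_change - approx_change)
```

For this particular quadratic cost function the gap between the exact and approximate change works out algebraically to $(\Delta q)^2$: it is 9 when $\Delta q = 3$ and about 0.09 when $\Delta q = 0.3$, which is one way to see why the approximation improves so quickly as the step shrinks.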

Partial derivatives of multivariate functions

A partial derivative of a multivariable function is defined in much the same way as the derivative of a univariate function. For a function of two variables (say, $x$ and $y$) we can proceed in the same way as above, comparing the value of the function at $f(x,y)$ as we change the values of $x$ and $y$ by small amounts. The building blocks of our analysis are the partial derivatives of the function, which measure how the output of the function changes when one variable is increased while the other(s) are held constant.

In the case of a function of two variables, $x$ and $y$, we can define the partial derivative “with respect to $x$” as $\partial f/\partial x$, where ${\partial f \over \partial x} = \lim_{\Delta x \rightarrow 0} {f(x + \Delta x, y) - f(x, y) \over \Delta x}$ Visually, this has the same interpretation as above, except now the two points are points along the surface plot of the function $f(x,y)$; as $\Delta x \rightarrow 0$, the line connecting them becomes tangent to a surface, not just to a curve:

See interactive graph online here.

The partial derivative with respect to $y$ is defined similarly: holding $x$ constant, it measures the rate at which $f(x,y)$ changes when $y$ increases by a small amount $\Delta y$. In the limit as $\Delta y \rightarrow 0$, it may be represented as a line tangent to the surface plot of the function, pointing in the $y$ direction.

See interactive graph online here.

One helpful way of thinking about partial derivatives is as the derivative of the function implied by holding the other variables of a multivariable function constant. (In economics, this often means imposing a “ceteris paribus” assumption that all other variables are held constant.)

To see what this means visually in this case, we can plot the (two-dimensional) function $f(x | y = \overline y)$. For illustrative purposes, we can see side-by-side what this looks like in three dimensions and two dimensions:

See interactive graph online here.

Indeed, if you look carefully, you can see that the two-dimensional graph is exactly the same as the graph of the intersection of the surface with the plane at $y = \overline y$.

Calculating partial derivatives

When taking the partial derivative of a multivariate function, we’re varying one input while holding the others constant. Therefore, we just follow all the same rules as for univariate calculus, but just treat all variables except the one of interest as if they were constants.

For example, with the univariate function $f(x) = 12x^\frac{1}{2}$, by the power rule we have ${df \over dx} = \tfrac{1}{2}\times 12x^{\frac{1}{2} - 1} = 6x^{-\frac{1}{2}}$ For the multivariable function $f(x,y) = 4x^\frac{1}{2}y$, the $y$ is treated as a constant when taking the partial derivative: ${\partial f \over \partial x} = \tfrac{1}{2}\times 4x^{\frac{1}{2} - 1}y = 2x^{-\frac{1}{2}}y$ It’s easy to see that when $y = 3$, these two expressions are identical.
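
One way to check a partial derivative is to nudge a single input while holding the other fixed. The sketch below does this for $f(x,y) = 4x^\frac{1}{2}y$ using a central difference; the helper names, the step size `h`, and the evaluation point $(4, 3)$ are assumptions made for illustration.

```python
def f(x, y):
    """The example function f(x, y) = 4 * x^(1/2) * y."""
    return 4 * x ** 0.5 * y


def partial_x(func, x, y, h=1e-6):
    """Numerical partial derivative with respect to x, holding y constant."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)


def partial_y(func, x, y, h=1e-6):
    """Numerical partial derivative with respect to y, holding x constant."""
    return (func(x, y + h) - func(x, y - h)) / (2 * h)


def analytic_partial_x(x, y):
    """The formula derived by treating y as a constant: 2 * x^(-1/2) * y."""
    return 2 * x ** -0.5 * y


x0, y0 = 4.0, 3.0
print(partial_x(f, x0, y0), analytic_partial_x(x0, y0))
print(partial_y(f, x0, y0))
```

At $(4, 3)$ the formula gives $2 \times 4^{-\frac{1}{2}} \times 3 = 3$, and the numerical estimate should agree to several decimal places. The partial derivative with respect to $y$ is $4x^\frac{1}{2}$, which is 8 at this point.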

Partial derivatives of utility functions: “Marginal Utility”

Let’s now apply this to the multivariate functions we’re interested in: utility functions.

Recall that the utility function $u(x_1,x_2)$ assigns a “utility amount,” in some imaginary unit “utils,” to the bundle $(x_1,x_2)$. We will call the partial derivatives of this utility function the marginal utility of good 1 or good 2, denoted $MU_1$ and $MU_2$: $\begin{aligned} MU_1(x_1,x_2) &= \frac{\partial u(x_1,x_2)}{\partial x_1} = \lim_{\Delta x_1 \rightarrow 0}\frac{u(x_1 + \Delta x_1,x_2) - u(x_1,x_2)}{\Delta x_1}\\ MU_2(x_1,x_2) &= \frac{\partial u(x_1,x_2)}{\partial x_2} = \lim_{\Delta x_2 \rightarrow 0}\frac{u(x_1,x_2 + \Delta x_2) - u(x_1,x_2)}{\Delta x_2}\\ \end{aligned}$ Intuitively, what we’re saying here is that the marginal utility of good 1, $MU_1$, is the rate at which your utility increases per additional unit of good 1; likewise, the marginal utility of good 2, $MU_2$, is the rate at which your utility increases per additional unit of good 2.

Note that both of these are rates: $MU_1$ is measured in “utils per unit of good 1,” and $MU_2$ is measured in “utils per unit of good 2.” This is going to be very important later on in this lecture!

Of course, how much an additional unit of a good increases your utility depends on how much of each good you have. For example, consider the utility function $u(x_1,x_2) = \sqrt{x_1x_2}$, which we looked at in the last lecture. The marginal utilities of this are $\begin{aligned} MU_1(x_1,x_2) &= \frac{\partial u(x_1,x_2)}{\partial x_1} = \frac{1}{2}\sqrt{\frac{x_2}{x_1}}\\ MU_2(x_1,x_2) &= \frac{\partial u(x_1,x_2)}{\partial x_2} = \frac{1}{2}\sqrt{\frac{x_1}{x_2}}\\ \end{aligned}$ So with this utility function, your marginal utility depends on the amount of each good you’re starting out with. If you have lots of good 1 and not very much good 2, you’d get a lot of marginal utility from some more good 2 but not so much from more good 1; on the other hand, if you have lots of good 2 and not much good 1, the opposite would be true.

Note that we can also use local linearization here as well. For example, suppose you currently have 40 units of good 1 and 10 units of good 2, as we looked at in lecture 2. Then we’d have $\begin{aligned} MU_1(x_1,x_2) &= \frac{1}{2}\sqrt{\frac{10}{40}} = \frac{1}{4}\\ MU_2(x_1,x_2) &= \frac{1}{2}\sqrt{\frac{40}{10}} = 1\\ \end{aligned}$ So, for example, we might say that getting an additional $\Delta x_1 = 2$ units of good 1 would increase your utility by about $\Delta x_1 \times MU_1 = 2 \times \frac{1}{4} = 0.5$ utils, while getting an additional $\Delta x_2 = 2$ units of good 2 would increase your utility by about 2 utils.
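
As a rough check on these approximations, the following Python sketch compares the linearized utility gains with the exact ones for $u(x_1,x_2) = \sqrt{x_1x_2}$ at the bundle $(40, 10)$. The function and variable names are illustrative assumptions, not notation from the lecture.

```python
import math


def u(x1, x2):
    """Utility function u(x1, x2) = sqrt(x1 * x2)."""
    return math.sqrt(x1 * x2)


def mu1(x1, x2):
    """Marginal utility of good 1: (1/2) * sqrt(x2 / x1)."""
    return 0.5 * math.sqrt(x2 / x1)


def mu2(x1, x2):
    """Marginal utility of good 2: (1/2) * sqrt(x1 / x2)."""
    return 0.5 * math.sqrt(x1 / x2)


x1, x2 = 40, 10
step = 2  # additional units of one good, holding the other fixed

approx_gain_good1 = step * mu1(x1, x2)
exact_gain_good1 = u(x1 + step, x2) - u(x1, x2)

approx_gain_good2 = step * mu2(x1, x2)
exact_gain_good2 = u(x1, x2 + step) - u(x1, x2)

print(approx_gain_good1, exact_gain_good1)
print(approx_gain_good2, exact_gain_good2)
```

The exact gains come out slightly below the approximations (roughly 0.49 versus 0.5 for good 1, and roughly 1.91 versus 2 for good 2), because with this utility function each marginal utility falls as you acquire more of that good.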

Implicit differentiation and the implicit function theorem

We saw before that the level set $f(x,y) = z$ defines a “contour” that can be plotted in $x-y$ space. Let’s derive the slope of such a contour, $dy/dx$. This will turn out to have important economic implications in a wide range of applications.

To derive this, we’ll need to start with the chain rule of calculus.

The chain rule

The most important rule of derivatives for this course is the chain rule; and it’s especially important to understand the intuition behind what it means, in addition to the mechanics of how to use it.

The mathematical formulation of the chain rule is this: if $h(x) = f(g(x))$, then $\frac{dh}{dx} = \frac{df}{dg} \times \frac{dg}{dx}$ For example, suppose we have the function $h(x) = (3x + 2)^2$. We can rewrite this as $f(g(x))$ if $f(g) = g^2$ $g(x) = 3x + 2$ Therefore $\frac{df}{dg} = 2g$ $\frac{dg}{dx} = 3$ so we have $\frac{dh}{dx} = \frac{df}{dg} \times \frac{dg}{dx} = 2(3x+2) \times 3 = 18x + 12$ Note that if we had expanded $h(x)$ into $9x^2 + 12x + 4$, we would have gotten this same result by taking the simple derivative.

The chain rule for multivariable functions

This same principle of multiplied rates is also true if the functions in question are multivariate. For example, the function $h(x,y) = (3x + y)^2$ may be decomposed into $f(g) = g^2$ $g(x,y) = 3x + y$ so the partial derivatives of $h$ with respect to $x$ and $y$ would be $\begin{aligned} {\partial h \over \partial x} &= {df \over dg} \times {\partial g \over \partial x} = 2(3x + y)\times 3 = 18x + 6y\\ {\partial h \over \partial y} &= {df \over dg} \times {\partial g \over \partial y} = 2(3x + y)\times 1 = 6x + 2y \end{aligned}$ Note that if we had just expanded the expression $(3x+y)^2$, we would have gotten exactly the same thing: $\begin{aligned} h(x,y) = (3x + y)^2 &= 9x^2 + 6xy + y^2\\ {\partial h(x,y) \over \partial x} &= 18x + 6y\\ {\partial h(x,y) \over \partial y} &= 6x + 2y \end{aligned}$
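
The "multiplied rates" idea can be spelled out step by step in code. This sketch keeps the inner and outer functions separate, multiplies their rates of change, and compares the result with the expanded formulas; the function names and the test point $(1, 2)$ are assumptions chosen for illustration.

```python
def g(x, y):
    """Inner function g(x, y) = 3x + y."""
    return 3 * x + y


def dh_dx(x, y):
    """Chain rule: (df/dg) * (dg/dx), where f(g) = g^2."""
    df_dg = 2 * g(x, y)  # derivative of the outer function, evaluated at g
    dg_dx = 3            # partial derivative of the inner function w.r.t. x
    return df_dg * dg_dx


def dh_dy(x, y):
    """Chain rule: (df/dg) * (dg/dy), where f(g) = g^2."""
    df_dg = 2 * g(x, y)
    dg_dy = 1            # partial derivative of the inner function w.r.t. y
    return df_dg * dg_dy


x0, y0 = 1, 2
print(dh_dx(x0, y0), 18 * x0 + 6 * y0)  # chain rule vs. expanded formula
print(dh_dy(x0, y0), 6 * x0 + 2 * y0)
```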

For more on this, Harvey Mudd College has a great explanation of how the multivariable chain rule works, which is well worth checking out.

The total derivative along a path

One application of this rule is the analysis of how altitude varies along a path over the surface of a multivariable function. For example, let’s consider how the height of the function $f(x,y)=12x^\frac{1}{2}y$ changes along the path defined by $\textcolor{#31a354}{y(x) = 4 - 0.4x}$. We can draw that as a green line over the surface of a function, as shown below.

Think about moving by some amount $\Delta x$ along the path, from some point $\textcolor{#3182bd}{(x,y)}$ to a second point $\textcolor{#d62728}{(x + \Delta x, y + \Delta y)}$. We can decompose this overall change into two changes:

holding $y$ constant, from $\textcolor{#3182bd}{(x,y)}$ to $\textcolor{#e6550d}{(x+\Delta x, y)}$
holding $x$ constant, from $\textcolor{#e6550d}{(x+\Delta x, y)}$ to $\textcolor{#d62728}{(x + \Delta x, y + \Delta y)}$

See interactive graph online here.

If we think about this in terms of partial derivatives, we can approximate the change due to $\Delta x$ as the change in $f(x,y)$ per unit change in $x$ (i.e., the partial derivative with respect to $x$), multiplied by $\Delta x$: $\left.\Delta f(x,y)\right|_{\Delta x} \approx {\partial f \over \partial x} \times \Delta x$ Likewise, we can approximate the change due to $\Delta y$ as $\left.\Delta f(x,y)\right|_{\Delta y} \approx {\partial f \over \partial y} \times \Delta y$ Therefore the total change is approximately the sum of these two changes: $\Delta f(x,y) \approx {\partial f \over \partial x} \times \Delta x + {\partial f \over \partial y} \times \Delta y$ If we divide both sides by $\Delta x$, this becomes ${\Delta f(x,y) \over \Delta x} \approx {\partial f \over \partial x} + {\partial f \over \partial y} \times {\Delta y \over \Delta x}$ In the limit as $\Delta x \rightarrow 0$, $\Delta y/\Delta x$ approaches the derivative of $y$ with respect to $x$, giving us the total derivative of $f$ along the path: $\left.{df \over dx}\right|_{y = y(x)} = {\partial f \over \partial x} + {\partial f \over \partial y} \times {dy \over dx}$ This is really just the chain rule: if $y$ changes when $x$ changes, then the total change in $f$ when $x$ changes is the direct effect due to the change in $x$, plus the indirect effect of the change in $y$.
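
To make the direct and indirect effects concrete, here is a hedged Python sketch for the path example, $f(x,y) = 12x^\frac{1}{2}y$ along $y(x) = 4 - 0.4x$. It adds the two effects and compares the sum with a numerical slope measured by actually walking along the path. The function names, the step size `h`, and the evaluation point $x = 4$ are assumptions made for illustration.

```python
import math


def f(x, y):
    """Surface height f(x, y) = 12 * sqrt(x) * y."""
    return 12 * math.sqrt(x) * y


def path_y(x):
    """The path y(x) = 4 - 0.4x."""
    return 4 - 0.4 * x


def total_derivative(x):
    """Direct effect of x plus indirect effect through y(x)."""
    y = path_y(x)
    df_dx = 6 * y / math.sqrt(x)  # partial derivative w.r.t. x
    df_dy = 12 * math.sqrt(x)     # partial derivative w.r.t. y
    dy_dx = -0.4                  # slope of the path
    return df_dx + df_dy * dy_dx


def numeric_slope_along_path(x, h=1e-6):
    """Central-difference slope of the height as we move along the path."""
    return (f(x + h, path_y(x + h)) - f(x - h, path_y(x - h))) / (2 * h)


print(total_derivative(4.0), numeric_slope_along_path(4.0))
```

At $x = 4$ the path has $y = 2.4$, so the direct effect is $6 \times 2.4 / 2 = 7.2$ and the indirect effect is $12 \times 2 \times (-0.4) = -9.6$. The height is therefore falling at a rate of about 2.4 per unit of $x$ at that point, even though the direct effect of $x$ alone is positive.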

The slope along a level set

The above analysis holds for any path along the surface. We can look in particular at the same analysis for a level set, which is implicitly defined by the equation $f(x,y) = z$ where $z$ is some constant.

Taking the derivative of both sides of this equation with respect to $x$ gives us ${\partial f \over \partial x} + {\partial f \over \partial y} \times {dy \over dx} = 0$ The left-hand side of the equation comes from the analysis above; the right-hand side is zero because $z$ is a constant. Intuitively, along a level set, we know that the total change is zero: however much $f(x,y)$ increases as a result of $\Delta x$, it decreases by the same amount as a result of $\Delta y$:

See interactive graph online here.

As $\Delta x \rightarrow 0$, the blue line becomes a line tangent to the function at the point $(x, y, z)$. This allows us to solve for the slope along a level set at a point, by solving for $dy/dx$: $\left.{dy \over dx}\right|_{f(x,y) = z} = - {\partial f/\partial x \over \partial f/\partial y}$ If we plot level sets and contour maps, we can see that this formula gives the slope of the level set passing through any point $(x, y, f(x,y))$:

See interactive graph online here.

Importantly, note that every point $(x,y)$ defines a level set, and therefore the slope of a level set. No matter where you drag the blue point in this diagram, it defines a purple curve, and there is a line tangent to that curve at that point. Put another way, we can think of the level set itself and the slope of the level set as functions of the point $(x,y)$: $\begin{aligned} \text{Level set through }(\hat x, \hat y) &= \{(x,y) | f(x,y) = f(\hat x, \hat y)\}\\ \text{Slope of that level set at }(\hat x, \hat y) &= \left.{dy \over dx}\right|_{f(x,y) = f(\hat x, \hat y)} = -{\partial f(\hat x, \hat y)/\partial x \over \partial f(\hat x, \hat y)/\partial y} \end{aligned}$ These expressions will be incredibly important as we evaluate choices economic agents make, so be sure you are fluent in their applications.
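
The slope formula can be checked against a level set that is easy to solve by hand. The sketch below reuses $f(x,y) = 12x^\frac{1}{2}y$ as an assumed example: it computes the slope from the ratio of partial derivatives, then compares it with the slope obtained by solving $f(x,y) = z$ for $y$ explicitly and differencing. The names and the point $(4, 2.4)$ are illustrative.

```python
import math


def f(x, y):
    """Example function f(x, y) = 12 * sqrt(x) * y."""
    return 12 * math.sqrt(x) * y


def level_set_slope(x, y):
    """Slope of the level set through (x, y): -(df/dx) / (df/dy)."""
    df_dx = 6 * y / math.sqrt(x)
    df_dy = 12 * math.sqrt(x)
    return -df_dx / df_dy


def y_on_level_set(x, z):
    """Solve 12 * sqrt(x) * y = z for y."""
    return z / (12 * math.sqrt(x))


x_hat, y_hat = 4.0, 2.4
z = f(x_hat, y_hat)  # the level that passes through (x_hat, y_hat)

h = 1e-6
numeric_slope = (y_on_level_set(x_hat + h, z) - y_on_level_set(x_hat - h, z)) / (2 * h)

print(level_set_slope(x_hat, y_hat), numeric_slope)
```

For this function the ratio simplifies to $-y/(2x)$, which is $-0.3$ at $(4, 2.4)$. The advantage of the formula is that it needs only the partial derivatives at the point, so it still works when the level set cannot be solved for $y$ in closed form.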

The Marginal Rate of Substitution (MRS)

OK, this is the moment this whole week has been building up to: the marginal rate of substitution, or MRS. There is perhaps no more important concept in this course, because it gets at the heart of consumer theory: how people make tradeoffs.

We’ve just seen how to calculate the slope of a level set of any multivariate function. The slope of the indifference curve has a special economic meaning: it’s the rate at which a person is just willing to exchange good 2 for good 1 — what we call the marginal rate of substitution, or MRS.

For example, suppose “good 1” is apples and “good 2” is bananas, and further suppose the bundles $X = (10,24)$ and $Y = (12,20)$ lie along the same indifference curve for someone. This means that if they currently have bundle $X$, and someone offered them 2 apples in exchange for 4 of their bananas, they would be just willing to accept the offer. In other words, their MRS between goods 1 and 2 at this point is approximately 2 bananas per apple.

See interactive graph online here.

Note that in this textbook, unlike the IMVH or Varian, I’ll be treating the MRS as a positive number. Economists do it both ways; I’ll explain in lecture why I believe this is important, and you’ll see in the next few weeks why I make this choice.

Calculating the MRS

Mathematically, we could just use the implicit function theorem here. We know that the slope of the level set $f(x,y) = z$ is given by the ratio of the partial derivatives $\text{Slope of level set of }f(x,y) = -{\partial f(x,y)/\partial x \over \partial f(x,y)/\partial y}$ so in this case, we know that the slope of the indifference curve must be the ratio of the partial derivatives, or the marginal utilities: $\text{Slope of indifference curve} = -{MU_1 \over MU_2}$ To see the intuition behind this, imagine you’re at some point $A$ on an indifference curve, and exchange some goods to arrive at some other point $C$ on the same indifference curve. Specifically, let’s say you give up $\Delta x_2$ units of good 2 in exchange for $\Delta x_1$ units of good 1. This movement is shown in the following two graphs:

See interactive graph online here.

Losing the $\Delta x_2$ units of good 2 decreases your utility by $\textcolor{#d62728}{\text{Utility loss from A to B }= \Delta x_2 \times MU_2}$ However, gaining the $\Delta x_1$ units of good 1 increases your utility by $\textcolor{#31a354}{\text{Utility gain from B to C }= \Delta x_1 \times MU_1}$ Note, however, that you end up with the same amount of utility (since $C$ is on the same indifference curve as $A$). Therefore, you know that the utility loss from giving up the good 2 must exactly equal the utility gain from the additional units of good 1: $\Delta x_2 \times MU_2 = \Delta x_1 \times MU_1$ or, dividing both sides by $\Delta x_1 \times MU_2$, ${\Delta x_2 \over \Delta x_1} = {MU_1 \over MU_2}$ As $A$ and $C$ get closer and closer together, this approaches the instantaneous rate of change along the indifference curve, or $MRS = {MU_1 \over MU_2}$
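
The limiting argument can be illustrated numerically. This sketch assumes the utility function $u(x_1,x_2) = \sqrt{x_1x_2}$ and a starting bundle of $(40, 10)$; it measures how many units of good 2 are given up per unit of good 1 gained while staying on the same indifference curve, for smaller and smaller trades. The function names are hypothetical.

```python
import math


def mrs(x1, x2):
    """MRS = MU1 / MU2 for u(x1, x2) = sqrt(x1 * x2)."""
    mu1 = 0.5 * math.sqrt(x2 / x1)
    mu2 = 0.5 * math.sqrt(x1 / x2)
    return mu1 / mu2


def x2_on_indifference_curve(x1, utility):
    """Solve sqrt(x1 * x2) = utility for x2."""
    return utility ** 2 / x1


x1_a, x2_a = 40.0, 10.0              # starting bundle A
utility = math.sqrt(x1_a * x2_a)     # utility level at A


def trade_ratio(dx1):
    """Units of good 2 given up per unit of good 1 gained, staying on the curve."""
    x2_c = x2_on_indifference_curve(x1_a + dx1, utility)
    return (x2_a - x2_c) / dx1


for dx1 in [10.0, 1.0, 0.1, 0.01]:
    print(dx1, trade_ratio(dx1))
print("MRS at A:", mrs(x1_a, x2_a))
```

For a large trade of 10 units of good 1 the ratio is 0.2 units of good 2 per unit of good 1. As the trade shrinks, the ratio climbs toward the MRS at the starting bundle, which is 0.25.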

The importance of units, and the unimportance of utils

We’ll conclude today’s lecture by noting the importance of units in our analysis of the MRS.

Remember that a slope is rise over run; therefore, its units are measured in the units of the rise over the units of the run.

In the graph of apples versus bananas above, the units of the rise are bananas, and the units of the run are apples; so the MRS is measured in terms of bananas per apple. Indeed, this makes sense: as we saw in that example, the consumer was just willing to give up 4 bananas to get 2 apples, so their MRS was 2 bananas per apple. If we’d had bananas on the horizontal axis and apples on the vertical axis, the MRS at the point $(24,10)$ would have been 2 apples for 4 bananas, or half an apple per banana. In short, the “rate” in the “marginal rate of substitution” is measured in units of good 2 per unit of good 1.

Indeed, this comes out of the formula for the MRS itself. We have $MRS = {MU_1 \over MU_2}$ But $MU_1$ and $MU_2$ are themselves rates: $MU_1$ is measured in “utils per unit of good 1,” and $MU_2$ is measured in “utils per unit of good 2.” So if we add the units to the expression for the MRS, we can see that the utils cancel out, and we’re left with units of good 2 per unit of good 1: $MRS = \frac{MU_1\ \cancel{\text{utils}}\text{/unit of good 1}}{MU_2\ \cancel{\text{utils}}\text{/unit of good 2}} = \frac{MU_1}{MU_2} \frac{\text{ units of good 2}}{\text{ units of good 1}}$ We can also notice something else more important. If we have two different ways to describe the same preferences, they might have very different marginal utilities; but they will have exactly the same MRS. For example, we saw before that the utility function $u(x_1,x_2) = \sqrt{x_1x_2}$ had the marginal utilities $\begin{aligned} MU_1(x_1,x_2) &= \frac{\partial u(x_1,x_2)}{\partial x_1} = \frac{1}{2}\sqrt{\frac{x_2}{x_1}}\\ MU_2(x_1,x_2) &= \frac{\partial u(x_1,x_2)}{\partial x_2} = \frac{1}{2}\sqrt{\frac{x_1}{x_2}}\\ \end{aligned}$ Therefore its MRS is given by $MRS = {MU_1 \over MU_2} = {\frac{1}{2}\sqrt{\frac{x_2}{x_1}} \over \frac{1}{2}\sqrt{\frac{x_1}{x_2}}} = {x_2 \over x_1}$ Now suppose we raise that utility function to the fourth power; that’s a monotonic transformation (as long as we’re not dealing with negative values, which we aren’t), and gives us the utility function $u(x_1,x_2) = x_1^2x_2^2$. The marginal utilities of this function are $\begin{aligned} MU_1(x_1,x_2) &= \frac{\partial u(x_1,x_2)}{\partial x_1} = 2x_1x_2^2\\ MU_2(x_1,x_2) &= \frac{\partial u(x_1,x_2)}{\partial x_2} = 2x_1^2x_2\\ \end{aligned}$ These are wildly different from the previous marginal utilities, in all sorts of ways! But look what happens when we take the MRS: $MRS = {MU_1 \over MU_2} = {2x_1x_2^2 \over 2x_1^2x_2} = {x_2 \over x_1}$ It’s exactly the same.
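
A short Python sketch makes the same point with numbers. It evaluates the marginal utilities of both utility functions at a few bundles and takes their ratio. The bundles and function names are assumptions chosen for illustration.

```python
import math


def mu_sqrt(x1, x2):
    """Marginal utilities (MU1, MU2) of u = sqrt(x1 * x2)."""
    return 0.5 * math.sqrt(x2 / x1), 0.5 * math.sqrt(x1 / x2)


def mu_fourth_power(x1, x2):
    """Marginal utilities (MU1, MU2) of u = x1^2 * x2^2."""
    return 2 * x1 * x2 ** 2, 2 * x1 ** 2 * x2


def mrs(marginal_utilities):
    """MRS = MU1 / MU2, in units of good 2 per unit of good 1."""
    mu1, mu2 = marginal_utilities
    return mu1 / mu2


for bundle in [(40.0, 10.0), (10.0, 24.0), (5.0, 5.0)]:
    a = mu_sqrt(*bundle)
    b = mu_fourth_power(*bundle)
    print(bundle, a, b, mrs(a), mrs(b))
```

At the bundle $(40, 10)$, for instance, the marginal utilities are $(0.25, 1)$ under the first function and $(8000, 32000)$ under the second, yet both ratios equal $x_2/x_1 = 0.25$.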

In other words: we concluded the last lecture by saying that any two utility functions which had the same level sets described the same preferences. In this lecture, we conclude by saying that any two such functions will also necessarily have the same MRS at any given point. Thus, our notion of “marginal utility” may be suspect, because it is measured in utils per unit of a good; but our notion of the MRS is sound.

Summary and next steps

In this lecture we have

Reviewed the definition of a derivative
Defined the partial derivatives of a multivariate function
Defined the marginal utilities for a utility function, which are just its partial derivatives
Reviewed the chain rule and seen how it applies to multivariate functions
Derived the total derivative along a path, and used it to derive the implicit function theorem
Defined the slope of an indifference curve as the marginal rate of substitution, and used the implicit function theorem to calculate the MRS from a utility function.

We’ve defined a lot of mathematical and economic terms thus far, but we haven’t applied them to many real-life scenarios. On Monday we’ll conclude this module by talking about how to model different kinds of real-world preferences using different functional forms; and in particular, look at the role that the MRS and marginal utilities play in describing how people feel about different kinds of goods.

The homework for module 1 will be posted soon; you should be able to do the mechanics of most of it over the weekend, and I’d encourage you to do so. Then on Monday we can put some economic intuition around all the math you’ve been doing.

Reading Quiz

That's it for today! Click here to take the quiz on this reading.