Note: These explanations are in the process of being adapted from my textbook.
I'm trying to make them each a "standalone" treatment of a concept, but there may still
be references to the narrative flow of the book that I have yet to remove.
This work is under development and has not yet been professionally edited.
If you catch a typo or error, or just have a suggestion, please submit a note here. Thanks!

# Constrained Optimization

One of the core problems of economics is constrained optimization: that is, maximizing a function subject to some constraint.

We previously saw that the function $$y = f(x_1,x_2) = 8x_1 - 2x_1^2 + 8x_2 - x_2^2$$ has an unconstrained maximum at the point $(2,4)$. But what if we wanted to find the highest point along the path $2x_1 + x_2 = 5$? In this case, we would be looking for the highest point along the green curve below. In this case, the maximum occurs at the point $(1,3)$:

In a problem like this, we call the function we’re trying to maximize (or minimize) the objective function, and we call the equation describing the set of points along the path under consideration the constraint. We can then write this problem formally as \begin{aligned} \max_{x_1,x_2}\ & f(x_1,x_2) & \text{Objective function}\\ \text{s.t. } & g(x_1,x_2) = k & \text{Constraint} \end{aligned} where in this case \begin{aligned} f(x_1,x_2) &= 8x_1 - 2x_1^2 + 8x_2 - x_2^2\\ g(x_1,x_2) &= 2x_1 + x_2\\ k &= 5 \end{aligned} For a simple problem like this, we could just substitute the constraint into the objective function and solve. However, with multivariable functions of more than two variables, that won’t work. Instead, we’ll take a slightly different approach, and employ the method of Lagrange multipliers. This method effectively converts an constrained maximization problem into a constrained optimization problem, by creating a new functions that combines the objective function and the constraint. Specifically, we construct the “Lagrangian” as follows: $$\mathcal{L}(x_1,x_2,\lambda) = f(x_1,x_2) + \lambda (k - g(x_1,x_2))$$ Note that this second term “punishes” the function for exceeding the constraint: if $g(x_1,x_2) > k$, this function is lower than it would be if $g(x_1,x_2) = k$. However, as long as $g(x_1,x_2) = k$, the two functions are identical.

The plot below illustrates how the Lagrangian affects the height of the function. It shows a horizontal plane at the maximum of the Lagrangian, and shows the gradient at the point (1,3). It starts with $\lambda = 0$; use the slider to increase $\lambda$, and notice what happens:

What’s going on here? It’s a little easier to see if we look at the gradient of the Lagrangian: $$\nabla \mathcal{L}(x_1,x_2,\lambda) = \left[\begin{matrix} {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial x_1}\\ \\ {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial x_2}\\ \\ {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial \lambda} \end{matrix}\right] = \left[\begin{matrix} {\partial f(x_1,x_2) \over \partial x_1} - \lambda {\partial g(x_1,x_2) \over \partial x_1} \\ \\ {\partial f(x_1,x_2) \over \partial x_2} - \lambda {\partial g(x_1,x_2) \over \partial x_2} \\ \\ k - g(x_1,x_2) \end{matrix}\right]$$ If we set this gradient equal to a vector of zeroes, we get the first-order conditions (FOCs): \begin{aligned} {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial x_1} = 0 & \Rightarrow & {\partial f(x_1,x_2) \over \partial x_1} - \lambda {\partial g(x_1,x_2) \over \partial x_1} = 0\\ {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial x_2} = 0 & \Rightarrow & {\partial f(x_1,x_2) \over \partial x_2} - \lambda {\partial g(x_1,x_2) \over \partial x_2} = 0\\ {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial \lambda} = 0 & \Rightarrow & k - g(x_1,x_2) = 0 \end{aligned} When the first two conditions hold, it means we’re at the maximum of the Lagrangian function in $x_1 - x_2$ space. When the third condition holds, it means we’re also along the constraint.

The FOCs are three equations in three unknowns; so to find the optimal $(x_1,x_2)$, we solve this system of equations. For example, if we plug the Lagrangian for this case into the FOCs above, we get \begin{aligned} {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial x_1} = 0 & \Rightarrow & 8 - 4x_1 - 2\lambda = 0\\ \\ {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial x_2} = 0 & \Rightarrow & 8 - 2x_2 - \lambda = 0\\ \\ {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial \lambda} = 0 & \Rightarrow & 2x_1 + x_2 = 5 \end{aligned} To solve this, we can solve the first two conditions for $\lambda$ and set those two values of $\lambda$ equal to each other: $$\lambda = 4 - 2x_1 = 8 - 2x_2 \Rightarrow x_2 = 2 + x_1$$ We can then plug that into the third condition to get $$2x_1 + [1 + x_1] = 5 \Rightarrow x_1^\star = 1$$ therefore $$x_2 = 5 - 2x_1 = 3$$ and $$\lambda = 4 - 2x_1 = 2$$ Put another way, if we have $\lambda = 2$, the Lagrangian becomes $$\mathcal{L}(x_1,x_2) = 8x_1 - 2x_1^2 + 8x_2 - x_2^2 + 2(5 - 2x_1 - x_2) = 10 + 4x_1 - 2x_1^2 +6x_2 - x_2^2$$ which has the gradient $$\nabla \mathcal{L}(x_1,x_2) = \left[\begin{matrix}4 - 4x_1 \\ 6 - 2x_2 \end{matrix}\right]$$ which is flat (a vector of zeroes) at $(1,3)$.

## The Lagrangian, Level Sets, and the Tangency Condition

If we look at this problem in two dimensions, we can notice that the optimum occurs at a point of tangency between the constraint and the level sets of the objective function. The following graph shows the constraint, as well as a few level sets of the objective function. The level set for $f(x_1,x_2) = 21$ is just tangent to the constraint at $(1,3)$:

In fact, we can see that this is another way of looking at the first two FOCs: if we solve each of the first FOCs for $\lambda$, as we did above, we get \begin{aligned} {\partial f(x_1,x_2) \over \partial x_1} - \lambda {\partial g(x_1,x_2) \over \partial x_1} = 0 & \Rightarrow \lambda = {\partial f(x_1,x_2)/\partial x_1 \over \partial g(x_1,x_2)/\partial x_1}\\ \\ {\partial f(x_1,x_2) \over \partial x_2} - \lambda {\partial g(x_1,x_2) \over \partial x_2} = 0 & \Rightarrow \lambda = {\partial f(x_1,x_2)/\partial x_2 \over \partial g(x_1,x_2)/\partial x_2} \end{aligned} Setting these equal to one another and cross multiplying gives us $${\partial f(x_1,x_2)/\partial x_1 \over \partial f(x_1,x_2)/\partial x_2} = {\partial g(x_1,x_2)/\partial x_1 \over \partial g(x_1,x_2)/\partial x_2}$$ The left-hand side of this equation is the slope of the level set $f(x_1,x_2) = y$ for some $y$. The right-hand side of this equation is the slope of the level set $g(x_1,x_2) = k$, which is just the constraint. If this equation holds for some value $(x_1,x_2)$, therefore, the functions’ level sets are tangent at that point.

Summing up: for a constrained optimization problem with two choice variables, the method of Lagrange multipliers finds the point along the constraint where the level set of the objective function is tangent to the constraint.

## Example: The Fence Problem

Consider the following problem: you are given 40 linear feet of fence and need to make a rectangular enclosure. What is the length and width that maximize the area of the enclosure?

Here, our objective function is the area $A(L,W) = LW$ and the constraint is that the perimeter be 40 feet: $2L + 2W \le 40$. More formally, we can write this problem as $$\max_{(L,W)} A(L,W) = LW$$ $$\text{s.t. } 2L + 2W \le 40$$ The following graph shows the constraint (the green line) as well as several levels sets of the objective function. If you drag the point along the constraint, you can see that the largest area occurs at a point where the level set is tangent to the constraint:

The Lagrangian for this problem is $$\mathcal{L}(L,W,\lambda) = LW + \lambda(40 - 2L - 2W)$$ To find the optimal choice of $L$ and $W$, we take the partial derivatives with respect to the three arguments ($L$, $W$, and $\lambda$) and set them equal to zero to get our three first order conditions (FOCs): \begin{aligned} \frac{\partial \mathcal{L}(L,W,\lambda)}{\partial L} &= W - 2\lambda = 0 &\Rightarrow& W = 2\lambda \\ \frac{\partial \mathcal{L}(L,W,\lambda)}{\partial W} &= L - 2\lambda = 0 &\Rightarrow& L = 2\lambda \\ \frac{\partial \mathcal{L}(L,W,\lambda)}{\partial \lambda} &= 40 - 2L - 2W =0 &\Rightarrow& 2L + 2W = 40 \end{aligned} Solving the first two FOC’s for $\lambda$ and setting them equal to one another we get $W/2 = L/2$, or more simply $W = L$: that is, the optimal shape will be a square. Plugging $W = L$ into the constraint (i.e., the third FOC) and solving gives us $L = W = 10$, and $\lambda = W/2 = L/2 = 5$.

## Interpreting the Lagrange Multiplier

The Lagrange multiplier has an important intuitive meaning, beyond being a useful way to find a constrained optimum.

Let’s look at the Lagrangian for the fence problem again, but this time let’s assume that instead of 40 feet of fence, we have $F$ feet of fence. In this case the Lagrangian becomes $$\mathcal{L}(L,W,\lambda) = LW + \lambda(F - 2L - 2W)$$ and the first-order conditions become \begin{aligned} \frac{\partial \mathcal{L}(L,W,\lambda)}{\partial L} &= W - 2\lambda = 0 &\Rightarrow& W = 2\lambda \\ \frac{\partial \mathcal{L}(L,W,\lambda)}{\partial W} &= L - 2\lambda = 0 &\Rightarrow& L = 2\lambda \\ \frac{\partial \mathcal{L}(L,W,\lambda)}{\partial \lambda} &= F - 2L - 2W =0 &\Rightarrow& 2L + 2W = F \end{aligned} Solving this gives us our solutions \begin{aligned} L^\star &= {F \over 4}\\ W^\star &= {F \over 4}\\ \lambda &= {F \over 8} \end{aligned} What happens, though, if we take our optimal values of $L^\star$ and $W^\star$ and plug them back into the objective function $A(L,W) = LW$? In that case we can find the maximum possible area for any length of fence $F$: $$A^\star(F) = L^\star(F) \times W^\star(F) = {F \over 4} \times {F \over 4} = {F \over 16}$$ (Intuitively, all this is saying is that for any amount of perimeter $F$, the best we can do is to create a square with side lengths $F/4$, which has an area of $(F/4)^2 = F^2/16$.)

With this interpretation, note that $\lambda$ represents the derivative of this $A^\star(F)$ function: $${dA^\star(F) \over dF} = {d \over dF}\left({F^2 \over 16}\right) = {F \over 8}$$ Visually, we can see this as the slope of $A^\star(F)$:

In other words, $\lambda$ tells us the amount by which the objective function rises due to a one-unit relaxation of the constraint. (We can also see that if we take the derivative of the Lagrangian with respect to $F$, we get $\lambda$.)

In economics, this value of $\lambda$ is often called a “shadow price.” For example, in consumer theory, we’ll use the Lagrange multiplier method to maximize utility given a constraint defined by the amount of money, $m$, you have to spend; the value of $\lambda$ in that problem will yield the additional utiltiy you’d get from getting another dollar to spend. Likewise, in producer theory, we’ll use the Lagrange method to solve for the cost-minimizing combination of labor and capital required to produce some amount of output, $q$; the value of $\lambda$ in that problem will be the marginal cost of producing an additional unit of output.

In practice, we can often solve constrained optimization problems without directly invoking a Lagrange multiplier. However, it’s important to understand the critical role this multiplier plays behind the scenes.

Copyright (c) Christopher Makler / econgraphs.org