Appendix B / Optimization

B.3 Constrained Optimization and the Lagrange Method

One of the core problems of economics is constrained optimization: that is, maximizing a function subject to some constraint.

We previously saw that the function $y = f(x_1,x_2) = 8x_1 - 2x_1^2 + 8x_2 - x_2^2$ has an unconstrained maximum at the point $(2,4)$. But what if we wanted to find the highest point along the path $2x_1 + x_2 = 5$? In this case, we would be looking for the highest point along the green curve below. In this case, the maximum occurs at the point $(1,3)$:

[ See interactive graph online at https://www.econgraphs.org/graphs/math/optimization/multivariable_plot_constrained ]

In a problem like this, we call the function we’re trying to maximize (or minimize) the objective function, and we call the equation describing the set of points along the path under consideration the constraint. We can then write this problem formally as $\begin{aligned} \max_{x_1,x_2}\ & f(x_1,x_2) & \text{Objective function}\\ \text{s.t. } & g(x_1,x_2) = k & \text{Constraint} \end{aligned}$ where in this case $\begin{aligned} f(x_1,x_2) &= 8x_1 - 2x_1^2 + 8x_2 - x_2^2\\ g(x_1,x_2) &= 2x_1 + x_2\\ k &= 5 \end{aligned}$ For a simple problem like this, we could just substitute the constraint into the objective function and solve. However, with multivariable functions of more than two variables, that won’t work. Instead, we’ll take a slightly different approach, and employ the method of Lagrange multipliers. This method effectively converts a constrained maximization problem into an unconstrained optimization problem, by creating a new functions that combines the objective function and the constraint. Specifically, we construct the “Lagrangian” as follows: $\mathcal{L}(x_1,x_2,\lambda) = f(x_1,x_2) + \lambda (k - g(x_1,x_2))$ Note that this second term “punishes” the function for exceeding the constraint: if $g(x_1,x_2) > k$, this function is lower than it would be if $g(x_1,x_2) = k$. However, as long as $g(x_1,x_2) = k$, the two functions are identical.

The plot below illustrates how the Lagrangian affects the height of the function. It shows a horizontal plane at the maximum of the Lagrangian, and shows the gradient at the point (1,3). It starts with $\lambda = 0$; use the slider to increase $\lambda$, and notice what happens:

[ See interactive graph online at https://www.econgraphs.org/graphs/math/optimization/multivariable_plot_constrained_lagrange ]

What’s going on here? It’s a little easier to see if we look at the gradient of the Lagrangian: $\nabla \mathcal{L}(x_1,x_2,\lambda) = \left[\begin{matrix} {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial x_1}\\ \\ {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial x_2}\\ \\ {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial \lambda} \end{matrix}\right] = \left[\begin{matrix} {\partial f(x_1,x_2) \over \partial x_1} - \lambda {\partial g(x_1,x_2) \over \partial x_1} \\ \\ {\partial f(x_1,x_2) \over \partial x_2} - \lambda {\partial g(x_1,x_2) \over \partial x_2} \\ \\ k - g(x_1,x_2) \end{matrix}\right]$ If we set this gradient equal to a vector of zeroes, we get the first-order conditions (FOCs): $\begin{aligned} {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial x_1} = 0 & \Rightarrow & {\partial f(x_1,x_2) \over \partial x_1} - \lambda {\partial g(x_1,x_2) \over \partial x_1} = 0\\ {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial x_2} = 0 & \Rightarrow & {\partial f(x_1,x_2) \over \partial x_2} - \lambda {\partial g(x_1,x_2) \over \partial x_2} = 0\\ {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial \lambda} = 0 & \Rightarrow & k - g(x_1,x_2) = 0 \end{aligned}$ When the first two conditions hold, it means we’re at the maximum of the Lagrangian function in $x_1 - x_2$ space. When the third condition holds, it means we’re also along the constraint.

The FOCs are three equations in three unknowns; so to find the optimal $(x_1,x_2)$, we solve this system of equations. For example, if we plug the Lagrangian for this case into the FOCs above, we get $\begin{aligned} {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial x_1} = 0 & \Rightarrow & 8 - 4x_1 - 2\lambda = 0\\ \\ {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial x_2} = 0 & \Rightarrow & 8 - 2x_2 - \lambda = 0\\ \\ {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial \lambda} = 0 & \Rightarrow & 2x_1 + x_2 = 5 \end{aligned}$ To solve this, we can solve the first two conditions for $\lambda$ and set those two values of $\lambda$ equal to each other: $\lambda = 4 - 2x_1 = 8 - 2x_2 \Rightarrow x_2 = 2 + x_1$ We can then plug that into the third condition to get $2x_1 + [1 + x_1] = 5 \Rightarrow x_1^\star = 1$ therefore $x_2 = 5 - 2x_1 = 3$ and $\lambda = 4 - 2x_1 = 2$ Put another way, if we have $\lambda = 2$, the Lagrangian becomes $\mathcal{L}(x_1,x_2) = 8x_1 - 2x_1^2 + 8x_2 - x_2^2 + 2(5 - 2x_1 - x_2) = 10 + 4x_1 - 2x_1^2 +6x_2 - x_2^2$ which has the gradient $\nabla \mathcal{L}(x_1,x_2) = \left[\begin{matrix}4 - 4x_1 \\ 6 - 2x_2 \end{matrix}\right]$ which is flat (a vector of zeroes) at $(1,3)$.

The Lagrangian, Level Sets, and the Tangency Condition

If we look at this problem in two dimensions, we can notice that the optimum occurs at a point of tangency between the constraint and the level sets of the objective function. The following graph shows the constraint, as well as a few level sets of the objective function. The level set for $f(x_1,x_2) = 21$ is just tangent to the constraint at $(1,3)$:

[ See interactive graph online at https://www.econgraphs.org/graphs/math/optimization/constrained_optimization_level_sets ]

In fact, we can see that this is another way of looking at the first two FOCs: if we solve each of the first FOCs for $\lambda$, as we did above, we get $\begin{aligned} {\partial f(x_1,x_2) \over \partial x_1} - \lambda {\partial g(x_1,x_2) \over \partial x_1} = 0 & \Rightarrow \lambda = {\partial f(x_1,x_2)/\partial x_1 \over \partial g(x_1,x_2)/\partial x_1}\\ \\ {\partial f(x_1,x_2) \over \partial x_2} - \lambda {\partial g(x_1,x_2) \over \partial x_2} = 0 & \Rightarrow \lambda = {\partial f(x_1,x_2)/\partial x_2 \over \partial g(x_1,x_2)/\partial x_2} \end{aligned}$ Setting these equal to one another and cross multiplying gives us ${\partial f(x_1,x_2)/\partial x_1 \over \partial f(x_1,x_2)/\partial x_2} = {\partial g(x_1,x_2)/\partial x_1 \over \partial g(x_1,x_2)/\partial x_2}$ The left-hand side of this equation is the slope of the level set $f(x_1,x_2) = y$ for some $y$. The right-hand side of this equation is the slope of the level set $g(x_1,x_2) = k$, which is just the constraint. If this equation holds for some value $(x_1,x_2)$, therefore, the functions’ level sets are tangent at that point.

Summing up: for a constrained optimization problem with two choice variables, the method of Lagrange multipliers finds the point along the constraint where the level set of the objective function is tangent to the constraint.

Example: The Fence Problem

Consider the following problem: you are given 40 linear feet of fence and need to make a rectangular enclosure. What is the length and width that maximize the area of the enclosure?

[ See interactive graph online at https://www.econgraphs.org/graphs/math/optimization/fence_problem ]

Here, our objective function is the area $A(L,W) = LW$ and the constraint is that the perimeter be 40 feet: $2L + 2W \le 40$. More formally, we can write this problem as $\max_{(L,W)} A(L,W) = LW$ $\text{s.t. } 2L + 2W \le 40$ The following graph shows the constraint (the green line) as well as several levels sets of the objective function. If you drag the point along the constraint, you can see that the largest area occurs at a point where the level set is tangent to the constraint:

[ See interactive graph online at https://www.econgraphs.org/graphs/math/optimization/fence_problem_level_sets ]

The Lagrangian for this problem is $\mathcal{L}(L,W,\lambda) = LW + \lambda(40 - 2L - 2W)$ To find the optimal choice of $L$ and $W$, we take the partial derivatives with respect to the three arguments ($L$, $W$, and $\lambda$) and set them equal to zero to get our three first order conditions (FOCs): $\begin{aligned} \frac{\partial \mathcal{L}(L,W,\lambda)}{\partial L} &= W - 2\lambda = 0 &\Rightarrow& W = 2\lambda \\ \frac{\partial \mathcal{L}(L,W,\lambda)}{\partial W} &= L - 2\lambda = 0 &\Rightarrow& L = 2\lambda \\ \frac{\partial \mathcal{L}(L,W,\lambda)}{\partial \lambda} &= 40 - 2L - 2W =0 &\Rightarrow& 2L + 2W = 40 \end{aligned}$ Solving the first two FOC’s for $\lambda$ and setting them equal to one another we get $W/2 = L/2$, or more simply $W = L$: that is, the optimal shape will be a square. Plugging $W = L$ into the constraint (i.e., the third FOC) and solving gives us $L = W = 10$, and $\lambda = W/2 = L/2 = 5$.

Previous: Unconstrained Optimization

Next: Interpreting the Lagrange Multiplier