
Derivatives: definitions, notation, and rules

http://www.columbia.edu/itc/sipa/math/calc_rules_func_var.html#chain

A derivative is a function which measures the slope.  It depends upon x in some way, and is found by differentiating a function of the form y = f (x).   When x is substituted into the derivative, the result is the slope of the original function y = f (x).

There are many different ways to indicate the operation of differentiation, also known as finding or taking the derivative.  The choice of notation depends on the type of function being evaluated and upon personal preference.

Suppose you have a general function: y = f(x).  All of the following notations can be read as “the derivative of y with respect to x” or less formally, “the derivative of the function.” 

     f'(x)          f’          y’          df/dx          dy/dx          d/dx [f(x)].

[HINT: don’t read the last three terms as fractions; read them as an operation.

For example, read:   “dy/dx = 3x”

As:   “the function that gives the slope is equal to 3x”.]

Let’s try some examples.  Suppose we have the function:  y = 4x^3 + x^2 + 3.

After applying the rules of differentiation, we end up with the following result:

                 dy/dx = 12x^2 + 2x.

How do we interpret this?  First, decide what part of the original function (y = 4x^3 + x^2 + 3) you are interested in.  For example, suppose you would like to know the slope of y when the variable x takes on a value of 2.  Substitute x = 2 into the function of the slope and solve:

                 dy/dx = 12(2)^2 + 2(2) = 48 + 4 = 52.

Therefore, we have found that when x = 2, the function y has a slope of  + 52.
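The arithmetic above can be checked numerically. The sketch below is only an illustration and not part of the original tutorial: it evaluates the slope formula dy/dx = 12x^2 + 2x at x = 2 and compares it with a central-difference estimate of the slope of y = 4x^3 + x^2 + 3. The helper names (f, slope_formula, numerical_slope) and the step size h are assumptions made for this example.

```python
def f(x):
    """The example function y = 4x^3 + x^2 + 3."""
    return 4 * x**3 + x**2 + 3


def slope_formula(x):
    """The derivative stated in the text: dy/dx = 12x^2 + 2x."""
    return 12 * x**2 + 2 * x


def numerical_slope(func, x, h=1e-6):
    """Central-difference estimate of the slope of func at x.

    h is an assumed small step; it is not taken from the text.
    """
    return (func(x + h) - func(x - h)) / (2 * h)


exact = slope_formula(2)        # slope from the formula
approx = numerical_slope(f, 2)  # slope estimated from the function itself
print(exact, round(approx, 4))
```

The two values should agree to several decimal places, which is what "the derivative gives the slope" means in practice.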

Now for the practical part.  How do we actually determine the function of the slope?  Almost all functions you will see in economics can be differentiated using a fairly short list of rules or formulas, which will be presented in the next several sections.

How to apply the rules of differentiation

Once you understand that differentiation is the process of finding the function of the slope, the actual application of the rules is straightforward.

First, some overall strategy.  The rules are applied to each term within a function separately.  Then the results from the differentiation of each term are added together, being careful to preserve signs.  [For example, the sum of 3x and negative 2x^2 is 3x minus 2x^2.]

Don’t forget that a term such as “x” has a coefficient of positive one.  Coefficients and signs must be correctly carried through all operations, especially in differentiation.

The rules of differentiation are cumulative, in the sense that the more parts a function has, the more rules that have to be applied.  Let’s start here with some specific examples, and then the general rules will be presented in table form.

Take the simple function:  y = C, and let C be a constant, such as 15.  The derivative of any constant term is 0, according to our first rule.  This makes sense since slope is defined as the change in the y variable for a given change in the x variable.  Suppose x goes from 10 to 11; y is still equal to 15 in this function, and does not change, therefore the slope is 0.  Note that this function graphs as a horizontal line.

Now, add another term to form the linear function y = 2x + 15.  The next rule states that when the x is to the power of one, the slope is the coefficient on that x.  This continues to make sense, since a change in x is multiplied by 2 to determine the resulting change in y.  We add this to the derivative of the constant, which is 0 by our previous rule, and the slope of the total function is 2.

Now, suppose that the variable is carried to some higher power.  We can then form a typical nonlinear function such as y = 5x^3 + 10.  The power rule combined with the coefficient rule is used as follows: pull out the coefficient, multiply it by the power of x, then multiply that term by x, carried to the power of n – 1.  Therefore, the derivative of 5x^3 is equal to (5)(3)x^(3 – 1); simplify to get 15x^2.  Add to the derivative of the constant, which is 0, and the total derivative is 15x^2.

Note that we don’t yet know the slope, but rather the formula for the slope.  For a given x, such as x = 1, we can calculate the slope as 15.  In plainer terms, when x is equal to 1, the function (y = 5x^3 + 10) has a slope of 15.
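The constant, coefficient, and power rules can be applied mechanically, term by term. The following is a minimal sketch under an assumed representation: a polynomial is stored as a dictionary mapping each power of x to its coefficient. The function names differentiate and evaluate are hypothetical and chosen only for this illustration.

```python
def differentiate(poly):
    """Differentiate a polynomial stored as {power: coefficient}.

    Constant rule: a term with power 0 drops out (its derivative is 0).
    Power rule:    a*x^n becomes (a*n)*x^(n-1).
    """
    result = {}
    for power, coeff in poly.items():
        if power == 0:
            continue
        result[power - 1] = coeff * power
    return result


def evaluate(poly, x):
    """Evaluate a {power: coefficient} polynomial at x."""
    return sum(coeff * x**power for power, coeff in poly.items())


y = {3: 5, 0: 10}                # y = 5x^3 + 10
dy_dx = differentiate(y)         # the formula for the slope
slope_at_1 = evaluate(dy_dx, 1)  # the slope at x = 1
print(dy_dx, slope_at_1)
```

The same function handles the earlier examples: a constant such as {0: 15} differentiates to an empty polynomial (slope 0), and the linear function {1: 2, 0: 15} differentiates to the constant 2.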

These rules cover all polynomials, and now we add a few rules to deal with other types of nonlinear functions.   It is not as obvious why the application of the rest of the rules still results in finding a function for the slope, and in a regular calculus class you would prove this to yourself repeatedly.  Here, we want to focus on the economic application of calculus, so we’ll take Newton’s word for it that the rules work, memorize a few, and get on with the economics!  The most important step for the remainder of the rules is to properly identify the form, or how the terms are combined, and then the application of the rule is straightforward. 

For functions that are sums or differences of terms, we can formalize the strategy above as follows:

If y = f(x) + g(x), then dy/dx = f'(x) + g'(x).  Here’s a chance to practice reading the symbols.  Read this rule as: if y is equal to the sum of two terms or functions, both of which depend upon x, then the function of the slope is equal to the sum of the derivatives of the two terms.  If the total function is f minus g, then the derivative is the derivative of the f term minus the derivative of the g term.

The product rule is applied to functions that are the product of two terms, which both depend on x, for example, y = (x – 3)(2x^2 – 1).  The most straightforward approach would be to multiply out the two terms, then take the derivative of the resulting polynomial according to the above rules.  Or you have the option of applying the following rule.

Given y = f(x) g(x); dy/dx = f’g + g’f.  Read this as follows: the derivative of y with respect to x is the derivative of the f term multiplied by the g term, plus the derivative of the g term multiplied by the f term.  To apply it to the above problem, note that f(x) = (x – 3) and g(x) = (2x^2 – 1); f'(x) = 1 and g'(x) = 4x.  Then dy/dx = (1)(2x^2 – 1) + (4x)(x – 3).  Simplify, and dy/dx = 2x^2 – 1 + 4x^2 – 12x, or 6x^2 – 12x – 1.
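As a quick sanity check of the product rule, the sketch below evaluates f’g + g’f for f(x) = x – 3 and g(x) = 2x^2 – 1 and compares it with the simplified polynomial 6x^2 – 12x – 1 at a few integer points. The function names are assumptions made for this illustration.

```python
def f(x):
    return x - 3


def g(x):
    return 2 * x**2 - 1


def f_prime(x):
    return 1          # derivative of x - 3


def g_prime(x):
    return 4 * x      # derivative of 2x^2 - 1


def product_rule(x):
    """dy/dx = f'g + g'f, evaluated at x."""
    return f_prime(x) * g(x) + g_prime(x) * f(x)


def simplified(x):
    """The simplified result: 6x^2 - 12x - 1."""
    return 6 * x**2 - 12 * x - 1


for x in range(-3, 4):
    print(x, product_rule(x), simplified(x))
```

Both columns should match at every point, since the two expressions are the same polynomial.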

The quotient rule is similarly applied to functions where the f and g terms are a quotient.  Suppose you have the function y = (x + 3) / (–x^2).  Then follow this rule:

Given y = f(x)/g(x),  dy/dx = (f’g – g’f) / g^2.  Again, identify f = (x + 3) and g = –x^2; f'(x) = 1 and g'(x) = –2x; and g^2 = x^4.  Then substitute in: dy/dx = [(1)(–x^2) – (–2x)(x + 3)] / x^4.  Simplify to dy/dx = (–x^2 + 2x^2 + 6x) / x^4 = (x^2 + 6x) / x^4 = (x + 6) / x^3.
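The quotient rule can be checked with exact arithmetic. This sketch is an illustration only: it takes f = x + 3 and g = -x^2, so that f' = 1 and g' = -2x by the power rule, evaluates (f’g – g’f) / g^2 with Python's Fraction type, and compares the result with the fully simplified form (x + 6) / x^3. The function names are hypothetical.

```python
from fractions import Fraction


def quotient_rule(x):
    """(f'g - g'f) / g^2 for f = x + 3 and g = -x^2, at a nonzero integer x."""
    f, g = x + 3, -(x**2)
    f_prime, g_prime = 1, -2 * x
    return Fraction(f_prime * g - g_prime * f, g**2)


def simplified(x):
    """The simplified derivative (x + 6) / x^3."""
    return Fraction(x + 6, x**3)


for x in (-3, -2, -1, 1, 2, 3):
    print(x, quotient_rule(x), simplified(x))
```

Using fractions rather than floating-point numbers makes the comparison exact, so any algebra slip in the simplification would show up immediately.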

Now, let’s combine rules by type of function and their corresponding graphs.

| Type of function | Form of function | Graph | Rule | Interpretation |
|---|---|---|---|---|
| y = constant | y = C | Horizontal line | dy/dx = 0 | Slope = 0 |
| y = linear function | y = ax + b | Straight line | dy/dx = a | Slope = coefficient on x |
| y = polynomial of order 2 or higher | y = ax^n + b | Nonlinear, one or more turning points | dy/dx = anx^(n-1) | Derivative is a function; actual slope depends upon location (i.e., value of x) |
| y = sums or differences of 2 functions | y = f(x) + g(x) | Nonlinear | dy/dx = f'(x) + g'(x) | Take derivative of each term separately, then combine. |
| y = product of two functions | y = f(x) g(x) | Typically nonlinear | dy/dx = f'g + g'f | Start by identifying f, g, f', g' |
| y = quotient or ratio of two functions | y = f(x) / g(x) | Typically nonlinear | dy/dx = (f'g – g'f) / g^2 | Start by identifying f, g, f', g', and g^2 |

Not-so-basic rules of differentiation

There are two more rules that you are likely to encounter in your economics studies.  The hardest part of these rules is identifying to which parts of the functions the rules apply.  Actually applying the rule is a simple matter of substituting in and multiplying through.  Notice that the two rules of this section build upon the rules from the previous section, and provide you with ways to deal with increasingly complicated functions, while still using the same techniques.

The power function rule:

In the previous rules, we dealt with powers attached to a single variable, such as x^2 or x^5.  Suppose, however, that your equation carries more than just the single variable x to a power.  For example,

                 y = (2x + 3)^4

In this case, the entire term (2x + 3) is being raised to the fourth power.  To deal with cases like this, first identify and rename the inner term in the parentheses:  2x + 3 = g(x).  Then the problem becomes

                 y = [g(x)]^4

Now, note that your goal is still to take the derivative of y with respect to x.  However, x is being operated on by two functions; first by g (multiplies x by 2 and adds 3), and then that result is carried to the power of four.  Therefore, when we take the derivatives, we have to account for both operations on x.  First, use the power rule from the table above to get:

                 dy/dg = 4[g(x)]^(4-1).

Note that the rule was applied to g(x) as a whole.  Then take the derivative of g(x) = 2x + 3, using the appropriate rule from the table:

                 dg/dx = 2.

Note the change in notation.  “g” is used because we were finding the change in g, with respect to a change in x.  Now, both parts are multiplied to get the final result:

                 dy/dx = 4[g(x)]^(4-1) · 2

Recall that derivatives are defined as being a function of x.  Replace the g(x) in the above term with (2x + 3) in order to satisfy that requirement.  Then simplify by combining the coefficients 4 and 2, and changing the power (4-1) to 3:

                 dy/dx = 8(2x + 3)^3

Now, we can set up the general rule.  When a function takes the following form:

                 y = [g(x)]^n

Then the rule for taking the derivative is:

                 dy/dx = n[g(x)]^(n-1) · g'(x)

The chain rule:

The second rule in this section is actually just a generalization of the above power rule.  It is used when x is operated on more than once, but it isn’t limited only to cases involving powers.  Since you already understand the above problem, let’s redo it using the chain rule, so you can focus on the technique.

Given the same problem:

                 y = (2x + 3)^4

rename the parts of the problem as follows:

                 u = 2x + 3

and

                 y = u^4

Then the entire problem can be expressed as:

                 y = f(u), where u = g(x), or y = f(g(x))

This type of function is also known as a composite function.  The derivative of a composite function is equal to the derivative of y with respect to u, times the derivative of u with respect to x:

                 dy/dx = (dy/du) · (du/dx)

specifically in our problem:

                 dy/dx = 4u^3 · 2 = 8u^3

Recall that a derivative is defined as a function of x, not u.  Substitute in 2x + 3 for u:

                 dy/dx = 8(2x + 3)^3

and the problem is complete.  The formal chain rule is as follows.  When a function takes the following form:

                 y = f(u), where u = g(x)

Then the derivative of y with respect to x is defined as:

                 dy/dx = (dy/du) · (du/dx)
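To see the chain rule at work on y = (2x + 3)^4, the sketch below multiplies dy/du = 4u^3 by du/dx = 2 with u = 2x + 3, and compares the product with the simplified form 8(2x + 3)^3 and with a finite-difference estimate of the slope. It is an illustration only; the function names and the step size h are assumptions, not part of the original tutorial.

```python
def y(x):
    """The composite function y = (2x + 3)^4."""
    return (2 * x + 3) ** 4


def chain_rule(x):
    """dy/dx = (dy/du) * (du/dx) with u = 2x + 3 and y = u^4."""
    u = 2 * x + 3
    dy_du = 4 * u**3   # power rule applied to u^4
    du_dx = 2          # derivative of the inner linear function
    return dy_du * du_dx


def simplified(x):
    """The simplified derivative 8(2x + 3)^3."""
    return 8 * (2 * x + 3) ** 3


def numerical_slope(func, x, h=1e-6):
    """Central-difference estimate of the slope (h is an assumed step)."""
    return (func(x + h) - func(x - h)) / (2 * h)


print(chain_rule(1), simplified(1), round(numerical_slope(y, 1), 3))
```

All three numbers should agree at x = 1, up to the small error of the numerical estimate.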

Updated table of derivatives

Let’s add these two rules to our table of derivatives from the previous section:

| Type of function | Form of function | Graph | Rule | Interpretation |
|---|---|---|---|---|
| y = constant | y = C | Horizontal line | dy/dx = 0 | Slope = 0 |
| y = linear function | y = ax + b | Straight line | dy/dx = a | Slope = coefficient on x |
| y = polynomial of order 2 or higher | y = ax^n + b | Nonlinear, one or more turning points | dy/dx = anx^(n-1) | Derivative is a function; actual slope depends upon location (i.e., value of x) |
| y = sums or differences of 2 functions | y = f(x) + g(x) | Nonlinear | dy/dx = f'(x) + g'(x) | Take derivative of each term separately, then combine. |
| y = product of two functions | y = f(x) g(x) | Typically nonlinear | dy/dx = f'g + g'f | Start by identifying f, g, f', g' |
| y = quotient or ratio of two functions | y = f(x) / g(x) | Typically nonlinear | dy/dx = (f'g – g'f) / g^2 | Start by identifying f, g, f', g', and g^2 |
| y = generalized power function | y = [g(x)]^n | Nonlinear | dy/dx = n[g(x)]^(n-1) · g'(x) | Identify g(x) |
| y = composite function/chain rule | y = f(u), u = g(x) | Nonlinear | dy/dx = (dy/du) · (du/dx) | y is a function of u, and u is a function of x. |

Special cases

There are two special cases of derivative rules that apply to functions that are used frequently in economic analysis.  You may want to review the sections on natural logarithmic functions and graphs and exponential functions and graphs before starting this section.

Natural logarithmic functions

When a function takes the logarithmic form:

                 y = ln(x)

Then the derivative of the function follows the rule:

                 dy/dx = 1/x

If the function y is a natural log of a function of x, then you use the log rule and the chain rule.  For example, if the function is:

                 y = ln(5x^2)

Then we apply the chain rule, first by identifying the parts:

                 y = ln(u)   and   u = 5x^2

Now, take the derivative of each part:

                 dy/du = 1/u   and   du/dx = 10x

And finally, multiply according to the rule.

                 dy/dx = (1/u) · 10x

Now, replace the u with 5x^2, and simplify

                 dy/dx = 10x / (5x^2) = 2/x

Note that the generalized natural log rule is a special case of the chain rule:

                 y = ln[g(x)]

Then the derivative of y with respect to x is defined as:

                 dy/dx = g'(x) / g(x)
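The natural log rule can be checked on y = ln(5x^2). In the sketch below, which is an illustration under assumed helper names, the chain-rule product (1/u) · 10x with u = 5x^2 is compared with the simplified form 2/x and with a numerical estimate of the slope.

```python
import math


def y(x):
    """y = ln(5x^2), defined here for x > 0."""
    return math.log(5 * x**2)


def log_chain_rule(x):
    """dy/dx = (1/u) * du/dx with u = 5x^2."""
    u = 5 * x**2
    du_dx = 10 * x
    return (1 / u) * du_dx


def simplified(x):
    """The simplified derivative 2/x."""
    return 2 / x


def numerical_slope(func, x, h=1e-6):
    """Central-difference estimate of the slope (h is an assumed step)."""
    return (func(x + h) - func(x - h)) / (2 * h)


for x in (0.5, 1.0, 2.0, 4.0):
    print(x, log_chain_rule(x), simplified(x), round(numerical_slope(y, x), 6))
```

Note that the coefficient 5 drops out of the result: ln(5x^2) = ln(5) + 2 ln(x), and the constant ln(5) has a derivative of 0.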

Exponential functions

Taking the derivative of an exponential function is also a special case of the chain rule.  First, let’s start with a simple exponent and its derivative.  When a function takes the exponential form:

                 y = e^x

Then the derivative of the function follows the rule:

                 dy/dx = e^x.

No, it’s not a misprint!  The derivative of e^x is e^x.

If the power of e is a function of x, not just the variable x, then use the chain rule:

                 y = e^(g(x))

Then the derivative of y with respect to x is defined as:

                 dy/dx = g'(x) · e^(g(x))
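Both exponential rules can be verified numerically. The sketch below first checks that the slope of e^x equals e^x itself, then applies the chain rule to a hypothetical exponent g(x) = x^2. That exponent is an assumption chosen only for illustration and is not the example used in this tutorial; the helper names and step size are assumptions as well.

```python
import math


def numerical_slope(func, x, h=1e-6):
    """Central-difference estimate of the slope (h is an assumed step)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def y(x):
    """Hypothetical composite function y = e^(x^2) (assumed for illustration)."""
    return math.exp(x**2)


def exp_chain_rule(x):
    """dy/dx = g'(x) * e^(g(x)) with the assumed g(x) = x^2, so g'(x) = 2x."""
    g = x**2
    g_prime = 2 * x
    return g_prime * math.exp(g)


# Simple case: the slope of e^x at x = 1 should be e^1.
simple_slope = numerical_slope(math.exp, 1.0)
print(round(simple_slope, 6), round(math.e, 6))

# Chain-rule case at x = 0.5.
print(round(numerical_slope(y, 0.5), 6), round(exp_chain_rule(0.5), 6))
```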

For example, suppose you are taking the derivative of the following function:

Define the parts y and u, and take their respective derivatives:

Then the derivative of y with respect to x is:

Updated table of derivatives

Now we can add these two special cases to our table:

| Type of function | Form of function | Graph | Rule | Interpretation |
|---|---|---|---|---|
| y = constant | y = C | Horizontal line | dy/dx = 0 | Slope = 0 |
| y = linear function | y = ax + b | Straight line | dy/dx = a | Slope = coefficient on x |
| y = polynomial of order 2 or higher | y = ax^n + b | Nonlinear, one or more turning points | dy/dx = anx^(n-1) | Derivative is a function; actual slope depends upon location (i.e., value of x) |
| y = sums or differences of 2 functions | y = f(x) + g(x) | Nonlinear | dy/dx = f'(x) + g'(x) | Take derivative of each term separately, then combine. |
| y = product of two functions | y = f(x) g(x) | Typically nonlinear | dy/dx = f'g + g'f | Start by identifying f, g, f', g' |
| y = quotient or ratio of two functions | y = f(x) / g(x) | Typically nonlinear | dy/dx = (f'g – g'f) / g^2 | Start by identifying f, g, f', g', and g^2 |
| y = generalized power function | y = [g(x)]^n | Nonlinear | dy/dx = n[g(x)]^(n-1) · g'(x) | Identify g(x) |
| y = composite function/chain rule | y = f(u), u = g(x) | Nonlinear | dy/dx = (dy/du) · (du/dx) | y is a function of u, and u is a function of x. |
| y = natural log function | y = ln[g(x)] | Natural log | dy/dx = g'(x) / g(x) | Special case of chain rule |
| y = exponential function | y = e^(g(x)) | Exponential | dy/dx = g'(x) · e^(g(x)) | Special case of chain rule |

Higher order derivatives

Just as a first derivative gives the slope or rate of change of a function, a higher order derivative gives the rate of change of the previous derivative.  We’ll talk more about how this fits into economic analysis in a future section, but for now, we’ll just define the technique and then describe the behavior with a few simple examples.

To find a higher order derivative, simply reapply the rules of differentiation to the previous derivative.  For example, suppose you have the following function:

According to our rules, we can find the formula for the slope by taking the first derivative:

Take the second derivative by applying the rules again, this time to y’, NOT y:

If we need a third derivative, we differentiate the second derivative, and so on for each successive derivative.

Note that the notation for second derivative is created by adding a second prime.  Other notations are also based on the corresponding first derivative form.  Here are some examples of the most common notations for derivatives and higher order derivatives.

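| Function | First derivative | Second derivative | Third derivative |
|---|---|---|---|
| f(x) | f'(x) | f''(x) | f'''(x) |
| y | y' | y'' | y''' |
| y | dy/dx | d^2y/dx^2 | d^3y/dx^3 |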

Now for some examples of what a higher order derivative actually is.  Let’s start with a nonlinear function and take a first and second derivative.  Recall from previous sections that this equation will graph as a parabola that opens downward.

FunctionFirst derivativeSecond derivative

In order to understand the meaning of derivatives, let’s pick a couple of values of x, and calculate the value of the derivatives at those points.

Value of xValue of function at xfirst derivative at xsecond derivative at x
x=0
x=1
x=2

So, how do we interpret this information?  When x equals 0, we know that the slope of the function, or rate of change in y for a given change in x (from the first derivative) is 6.  Similarly, the second derivative tells us that the rate of change of the first derivative for a given change in x is -2.  In other words, when x changes, we expect the slope to change by -2, or to decrease by 2.  We can check this by changing x from 0 to 1, and noting that the slope did change from 6 to 4, therefore decreasing by 2. 

To sum up, the first derivative gives us the slope, and the second derivative gives the change in the slope.  In economics, the first two derivatives will be the most useful, so we’ll stop there for now.   
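Repeated differentiation is easy to sketch in code. The example below assumes a downward-opening parabola y = -x^2 + 6x, chosen because its slope is 6 at x = 0 and 4 at x = 1 and its second derivative is -2, matching the values discussed above; any constant term is left out as an assumption, since a constant does not affect the derivatives. Polynomials are stored as {power: coefficient} dictionaries, and the function names are hypothetical.

```python
def differentiate(poly):
    """Differentiate a {power: coefficient} polynomial term by term."""
    return {power - 1: coeff * power
            for power, coeff in poly.items() if power != 0}


def evaluate(poly, x):
    """Evaluate a {power: coefficient} polynomial at x."""
    return sum(coeff * x**power for power, coeff in poly.items())


y = {2: -1, 1: 6}               # assumed function: y = -x^2 + 6x
first = differentiate(y)        # y'  = -2x + 6
second = differentiate(first)   # y'' = -2 (differentiate y', not y)
third = differentiate(second)   # y''' = 0, stored as an empty polynomial

for x in (0, 1, 2):
    print(x, evaluate(y, x), evaluate(first, x), evaluate(second, x))
```

The printed rows show the slope falling by 2 for each unit increase in x, which is exactly what a second derivative of -2 says.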

Added variables, same techniques

In the real world, it is very difficult to explain behavior as a function of only one variable, and economics is no different.  More specific economic interpretations will be discussed in the next section, but for now, we’ll just concentrate on developing the techniques we’ll be using.

First, to define the functions themselves.  We want to describe behavior where a variable is dependent on two or more variables.  Every rule and notation described from now on is the same for two variables, three variables, four variables, and so on, so we’ll use the simplest case: a function of two independent variables.  Conventionally, z is the dependent variable (like y in univariate functions) and x and y are the independent variables (like x in univariate functions):

                 z = f(x, y)

For example, suppose that the following function describes some behavior:

Differentiating this function still means the same thing: we are still looking for functions that give us the slope, but now we have more than one variable, and more than one slope.

Visualize this by recalling from graphing what a function with two independent variables looks like.  Whereas a 2-dimensional picture can represent a univariate function, our z function above can be represented as a 3-dimensional shape.  Think of the x and y variables as being measured along the sides of a chessboard.  Then every combination of x and y would map onto a square somewhere on the chessboard.  For example, suppose x=1 and y=1.  Start at one of the corners of the chessboard.  Then move one square in on the x side for x=1, and one square up into the board to represent y=1.  Now, calculate the value of z.

The function z takes on a value of 4, which we graph as a height of 4 over the square that represents x=1 and y=1.  Map out the entire function this way, and the result will be a shape, usually looking like a mountain peak in typical economic analysis problems.

Now back to slope.  Imagine standing on the mountain shape, facing parallel to the x side of the chessboard.  If you allow x to increase, while holding y constant, then you would move forward in a straight line along the mountain shape.  We define the slope in this direction as the change in the z variable, or a change in the height of the shape, in response to a movement along the chessboard in one direction, or a change in the variable x, holding y constant.

Formally, the definition is:  the partial derivative of z with respect to x is the change in z for a given change in x, holding y constant.  Notation, like before, can vary.  Here are some common choices:

                 ∂z/∂x          ∂f/∂x          f_x          z_x

Now go back to the mountain shape, turn 90 degrees, and do the same experiment.  Now, we define a second slope as the change in the height of the z function in response to a movement forward on the chessboard (perpendicular to the movement measured by the first slope calculation), or a change in the y variable, holding the x variable constant.  Typical notation for this operation would be

                 ∂z/∂y          ∂f/∂y          f_y          z_y

Therefore, calculus of multivariate functions begins by taking partial derivatives, in other words, finding a separate formula for each of the slopes associated with changes in one of the independent variables, one at a time.  Before we discuss economic applications, let’s review the rules of partial differentiation.   
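The idea of "one slope per independent variable" can be illustrated numerically. The sketch below uses a hypothetical function z = x^2 + 2xy + y^3, which is an assumption made for illustration and not the function used in this tutorial. Each partial derivative is estimated by nudging one variable while holding the other fixed, and then compared with the formula obtained by treating the other variable as a constant.

```python
def z(x, y):
    """Hypothetical function z = x^2 + 2xy + y^3 (assumed for illustration)."""
    return x**2 + 2 * x * y + y**3


def partial_x(func, x, y, h=1e-6):
    """Estimate the slope in the x direction, holding y constant."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)


def partial_y(func, x, y, h=1e-6):
    """Estimate the slope in the y direction, holding x constant."""
    return (func(x, y + h) - func(x, y - h)) / (2 * h)


def dz_dx(x, y):
    """Treat y as a constant: the partial with respect to x is 2x + 2y."""
    return 2 * x + 2 * y


def dz_dy(x, y):
    """Treat x as a constant: the partial with respect to y is 2x + 3y^2."""
    return 2 * x + 3 * y**2


print(round(partial_x(z, 1, 2), 4), dz_dx(1, 2))
print(round(partial_y(z, 1, 2), 4), dz_dy(1, 2))
```

Even though each slope measures a change in only one variable, both formulas contain x and y, so the fixed value of the other variable still affects the numerical slope.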

Basic rules of partial differentiation

The rules of partial differentiation follow exactly the same logic as univariate differentiation.  The only difference is that we have to decide how to treat the other variable.  Recall that in the previous section, slope was defined as a change in z for a given change in x or y, holding the other variable constant.  There’s our clue as to how to treat the other variable.  If we hold it constant, that means that no matter what we call it or what variable name it has, we treat it as a constant.  Suppose, for example, we have the following equation:

If we are taking the partial derivative of z with respect to x, then y is treated as a constant.  Since it is multiplied by 2 and x and is constant, it is also defined as a coefficient of x.  Therefore,

Therefore, once all other variables are held constant, then the partial derivative rules for dealing with coefficients, simple powers of variables, constants, and sums/differences of functions remain the same, and are used to determine the function of the slope for each independent variable.  Let’s use the function from the previous section to illustrate.

First, differentiate with respect to x, holding y constant:

Note that there were no y variables in the first term, so differentiation was exactly like the univariate process; in the last term there were no x variables, therefore the derivative is zero, according to the constant rule, since y is treated as a constant.

Now, take the partial derivative with respect to y, holding x constant:

Again, note that the first term had no “variables” in it, since x is being treated as a constant, therefore the derivative of that term is 0.

To make sure you have a clear picture of more than one slope in a function, let’s evaluate the two partial derivatives at the point on the function where x = 1 and y = 2:

How do we interpret this information?  First, note that when x = 1 and y = 2, then the function z takes on a value of 3.  At this point on our “mountain” or 3-dimensional shape, we can evaluate the change in the function z in 2 different directions.  First, the change in z with respect to x is 10.  In other words, the slope in a direction parallel to the x-axis is 10.  Now turn 90 degrees.  The slope in a direction perpendicular to our previous slope is 6, therefore not quite as steep.  Also, note that although each slope depends on the change in only one variable, the position or fixed value of the other variable does matter, since you need both x and y to actually calculate the numerical values of slope.  We’ll come back to this in the next section, and look at the economic meaning behind this relatedness.  But first, back to the rules.

The product and quotient of functions rules follow exactly the same logic: hold all variables constant except for the one that is changing in order to determine the slope of the function with respect to that variable.  To illustrate the product rule, first let’s redefine the rule, using partial differentiation notation:

                 If z = f(x, y) · g(x, y), then

                 ∂z/∂x = (∂f/∂x) · g + (∂g/∂x) · f   and   ∂z/∂y = (∂f/∂y) · g + (∂g/∂y) · f

Now use the product rule to determine the partial derivatives of the following function:

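To illustrate the quotient rule, first redefine the rule using partial differentiation notation:

                 If z = f(x, y) / g(x, y), then

                 ∂z/∂x = [(∂f/∂x) · g – (∂g/∂x) · f] / g^2   and   ∂z/∂y = [(∂f/∂y) · g – (∂g/∂y) · f] / g^2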

Use the new quotient rule to take the partial derivatives of the following function:

Not-so-basic rules of partial differentiation

Just as in the previous univariate section, we have two specialized rules that we now can apply to our multivariate case. 

First, the generalized power function rule.  Again, we need to adjust the notation, and then the rule can be applied in exactly the same manner as before. 

When a multivariate function takes the following form:

                 z = [g(x, y)]^n

Then the rule for taking the derivative is:

                 ∂z/∂x = n[g(x, y)]^(n-1) · ∂g/∂x   and   ∂z/∂y = n[g(x, y)]^(n-1) · ∂g/∂y

Use the power rule on the following function to find the two partial derivatives:

The composite function chain rule notation can also be adjusted for the multivariate case:

                 z = f(u), where u = g(x, y)

Then the partial derivatives of z with respect to its two independent variables are defined as:

                 ∂z/∂x = (dz/du) · (∂u/∂x)   and   ∂z/∂y = (dz/du) · (∂u/∂y)

Let’s do the same example as above, this time using the composite function notation where functions within the z function are renamed.  Note that either rule could be used for this problem, so when is it necessary to go to the trouble of presenting the more formal composite function notation?  As problems become more complicated, renaming parts of a composite function is a better way to keep track of all parts of the problem.  It is slightly more time consuming, but mistakes within the problem are less likely.

The final step is the same, replace u with function g:

Special cases in multivariate functions

The last two special cases in multivariate differentiation also follow the same logic as their univariate counterparts.

The rule for differentiating multivariate natural logarithmic functions, with appropriate notation changes, is as follows:

                 z = ln[g(x, y)]

Then the partial derivatives of z with respect to its independent variables are defined as:

                 ∂z/∂x = (∂g/∂x) / g(x, y)   and   ∂z/∂y = (∂g/∂y) / g(x, y)

Let’s do an example.  Find the partial derivatives of the following function:

The rule for taking partials of exponential functions can be written as:

                 z = e^(g(x, y))

Then the partial derivatives of z with respect to its independent variables are defined as:

                 ∂z/∂x = (∂g/∂x) · e^(g(x, y))   and   ∂z/∂y = (∂g/∂y) · e^(g(x, y))

One last time, we look for partial derivatives of the following function using the exponential rule:

Higher order partial and cross partial derivatives

The story becomes more complicated when we take higher order derivatives of multivariate functions.  The interpretation of the first derivative remains the same, but there are now two second order derivatives to consider.

First, there is the direct second-order derivative.  In this case, the multivariate function is differentiated once, with respect to an independent variable, holding all other variables constant.  Then the result is differentiated a second time, again with respect to the same independent variable.  In a function such as the following:

                 z = f(x, y)

There are 2 direct second-order partial derivatives, as indicated by the following examples of notation:

                 f_xx = ∂^2z/∂x^2   and   f_yy = ∂^2z/∂y^2

These second derivatives can be interpreted as the rates of change of the two slopes of the function z.

Now the story gets a little more complicated.  The cross-partials, f_xy and f_yx, are defined in the following way.  First, take the partial derivative of z with respect to x.  Then take the derivative again, but this time, take it with respect to y, and hold the x constant.  Spatially, think of the cross partial as a measure of how the slope (change in z with respect to x) changes, when the y variable changes.  The following are examples of notation for cross-partials:

                 f_xy = ∂/∂y (∂z/∂x)   and   f_yx = ∂/∂x (∂z/∂y)

We’ll discuss economic meaning further in the next section, but for now, we’ll just show an example, and note that in a function where the cross-partials are continuous, they will be identical.  For the following function:

Take the first and second partial derivatives.

Now, starting with the first partials, find the cross partial derivatives:

Note that the cross partials are indeed identical, a fact that will be very useful to us in future optimization sections.
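The equality of cross partials can be checked numerically. The sketch below uses a hypothetical function z = x^2 · y^3, which is an assumption for illustration and not the function from this tutorial. Its first partials are 2x·y^3 (with respect to x) and 3x^2·y^2 (with respect to y). Differentiating the first with respect to y and the second with respect to x should both give 6x·y^2.

```python
def z_x(x, y):
    """First partial of the assumed z = x^2 * y^3 with respect to x."""
    return 2 * x * y**3


def z_y(x, y):
    """First partial of the assumed z = x^2 * y^3 with respect to y."""
    return 3 * x**2 * y**2


def d_dx(func, x, y, h=1e-6):
    """Numerical derivative with respect to x, holding y constant."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)


def d_dy(func, x, y, h=1e-6):
    """Numerical derivative with respect to y, holding x constant."""
    return (func(x, y + h) - func(x, y - h)) / (2 * h)


# Cross partials at the point x = 1, y = 2.
z_xy = d_dy(z_x, 1.0, 2.0)   # differentiate z_x with respect to y
z_yx = d_dx(z_y, 1.0, 2.0)   # differentiate z_y with respect to x
analytic = 6 * 1.0 * 2.0**2  # 6x * y^2 at (1, 2)

print(round(z_xy, 4), round(z_yx, 4), analytic)
```

Both orders of differentiation land on the same value, as expected for a function whose cross partials are continuous.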

Amir Masoud Sefidian
Data Scientist, Researcher, Software Developer
