
Econometrics is where economics meets mathematical statistics. It’s an area that many economics students struggle with, perhaps because of the mathematical notation that is used. Hence, there are a lot of old jokes, a little on the black side, about the topic: ‘Econometrics is economic-tricks’, ‘Econometrics? Wasn’t he a member of Asterix’s band?’
According to Peter Kennedy, who has written one of the most readable introductions to the subject (see below), the term econometrics first came into prominence with the formation in the early 1930s of the Econometric Society. So even back then we had what we now call ‘econometricians’.
But what do econometricians do? At its simplest, they draw lines through data points. They don’t do it with graph paper and a ruler, but with mathematical techniques that are implemented with computers.
Let’s look at an example of fitting a line through data points. Suppose we take 10 people off the street and ask each of them how much income they receive each week after tax. We also ask them how much they spend.
We can then plot the results for each person on a chart:
[Chart: Expenditure by income for 10 people]
It’s not hard to see that there is some sort of relationship between income and spending (consumption). Generally, the higher the income a person receives, the higher their consumption. This makes sense – we would expect income to have a strong impact on spending. To a large extent, spending is dependent on income. This is why we have plotted income on the x axis, since it is the independent variable, and spending on the y axis, since it is the dependent variable.
As the chart shows, we can draw a straight line through these points, choosing a line that seems to give the ‘best fit’ to the data. But in reality we can draw a large number of lines through these points. How then can we be sure that the line we finally choose is the one that gives the best fit?
This is where the maths comes in. We choose the line that minimises the sum of the squared errors. Let’s explain. Look at the second point from the left in the chart, which is for a person with an income of $187.50 per week. For this particular individual, their spending is higher than that shown by the line. Our line is therefore giving us an error, which is equal to the vertical distance between the point and the line. Note that we actually have an error for all points. We now take each error, square it, then sum all these squared errors. This is ‘the sum of squares’. We now choose the line that has the lowest value for its ‘sum of squares’.
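The calculation described above can be sketched in a few lines of Python. The figures below are hypothetical (they are not the data behind the chart), and the two candidate lines are arbitrary choices made only to show that different lines give different sums of squares.

```python
# Hypothetical weekly figures in dollars (NOT the data behind the chart).
incomes = [100.0, 200.0, 300.0, 400.0]
spending = [180.0, 240.0, 330.0, 410.0]


def sum_of_squares(m, c, xs, ys):
    """Sum of squared vertical distances between the points and the line y = m*x + c."""
    total = 0.0
    for x, y in zip(xs, ys):
        error = y - (m * x + c)  # vertical distance from the point to the line
        total += error ** 2
    return total


# Two arbitrary candidate lines; the smaller sum of squares is the better fit.
line_a = sum_of_squares(0.8, 90.0, incomes, spending)
line_b = sum_of_squares(0.5, 150.0, incomes, spending)
print(line_a, line_b)
```

For these made-up numbers the first line has a much smaller sum of squares than the second, so it is the better fit of the two. It is not necessarily the best of all possible lines.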
You may recall from doing co-ordinate geometry in high school that the equation of a straight line is y = mx + c. In this equation m, the coefficient on x, is the slope of the line and c, a constant, is the y intercept.
What we want to find are the values of m and c for the line with the lowest value for the sum of squares. There are mathematical techniques for doing this. They aren’t new techniques – the mathematician Gauss was using the method of least squares about 200 years ago – but it is a laborious exercise calculating, by hand, the values of m and c. What is relatively new is that computers, which are now on nearly every desktop, will calculate the values for us almost in an instant. For example, Excel will do it for us.
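For a single explanatory variable, the least squares values of m and c can be written down directly. The slope is the sum of the products of the deviations of x and y from their means, divided by the sum of the squared deviations of x, and the intercept then follows from the two means. Here is a minimal sketch of that standard formula. The function name fit_line and the data are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Return (m, c) for the least squares line y = m*x + c."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: co-movement of x and y divided by the spread of x.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    m = s_xy / s_xx
    # Intercept: the least squares line passes through the point of means.
    c = y_mean - m * x_mean
    return m, c


# Hypothetical weekly figures in dollars (NOT the data behind the chart).
incomes = [100.0, 200.0, 300.0, 400.0]
spending = [180.0, 240.0, 330.0, 410.0]

m, c = fit_line(incomes, spending)
print(round(m, 4), round(c, 2))
```

For these hypothetical figures the slope works out at 0.78 and the intercept at 95.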
When we have values for m and c, we have a model. (The line shown in the chart above is actually the least squares line with the value of m being 0.8048 and the value of c being 90.04.) We can now use this model to predict behaviour. If we know the income of a particular person we can plug this value into our equation, as the x variable, and get a value for the y. We can expect this person’s expenditure to be roughly equal to this value.
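As a small worked example, the sketch below plugs an income into the line quoted above (m = 0.8048, c = 90.04). The income of $500 per week is a hypothetical value chosen for illustration, and the function name is an assumption.

```python
M = 0.8048  # slope of the least squares line quoted in the text
C = 90.04   # y intercept quoted in the text


def predict_spending(income):
    """Predicted weekly spending for a given weekly after-tax income."""
    return M * income + C


# Hypothetical person with an income of $500 per week.
predicted = predict_spending(500.0)
print(round(predicted, 2))  # 492.44
```

So the model suggests that a person with an after-tax income of $500 per week would spend roughly $492 per week.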
This technique of putting lines through data points is called regression analysis. The approach outlined above can be extended so that the equation has more than one independent, or explanatory, variable. For example, we may find that a person’s weekly consumption depends not only on their income but also on the amount of money that they have in the bank. So now we have two variables, say x and z, on the right hand side of the equation, and we have to estimate the values of the coefficients for each of these, as well as the value of c, the constant term.
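The two-variable case can be sketched as follows, writing the equation as y = a*x + b*z + c. This is one possible approach, solving the so-called normal equations by elimination. Statistical packages generally use more robust numerical methods. The function name and the data are assumptions, and the data were constructed to fit y = 0.7x + 0.01z + 50 exactly so that the result is easy to check.

```python
def fit_two_variables(xs, zs, ys):
    """Least squares estimates (a, b, c) for y = a*x + b*z + c."""
    rows = [[x, z, 1.0] for x, z in zip(xs, zs)]
    # Build the 3x3 normal equations (X'X) beta = X'y as an augmented matrix.
    aug = []
    for i in range(3):
        row = [sum(r[i] * r[j] for r in rows) for j in range(3)]
        row.append(sum(r[i] * y for r, y in zip(rows, ys)))
        aug.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return tuple(aug[i][3] / aug[i][i] for i in range(3))


# Hypothetical data: weekly income (x), money in the bank (z), weekly spending (y).
incomes = [100.0, 200.0, 300.0, 400.0, 500.0]
savings = [1000.0, 500.0, 3000.0, 2000.0, 4000.0]
spending = [130.0, 195.0, 290.0, 350.0, 440.0]

a, b, c = fit_two_variables(incomes, savings, spending)
print(round(a, 4), round(b, 4), round(c, 2))
```

With real data the points would not fit the equation exactly, and the estimated coefficients would be those that minimise the sum of squares.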
Regression analysis is used in many disciplines besides economics. But one of the distinguishing features of its use in econometrics is its application to analysing time series. In the example used above, we had 10 observations, with each of these being related to a person. In econometrics, the observations are usually for particular time periods. For example, our observations may be for each year, or for each quarter, or each month.
Suppose we have two time series, one for the after tax income of all individuals, which is an aggregate or macroeconomic variable, and one for total private consumption. And for each variable we have quarterly data, going back 20 years. This gives us 80 observations in all. Using these, we can derive an equation linking private consumption at each point in time with after tax income at that time.
There is a complicating factor here though. Consumption in one time period may reflect income in the preceding time period, since there is a lag between people receiving their income and spending it. Econometricians therefore examine the data carefully, looking for the lagged effects of one variable on another.
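One simple way to allow for a lag is to pair consumption in each quarter with income in the previous quarter before fitting the line. The sketch below assumes a lag of exactly one quarter. The series are hypothetical, constructed so that consumption equals 0.8 times the previous quarter's income plus 20.

```python
def fit_line(xs, ys):
    """Return (m, c) for the least squares line y = m*x + c."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    m = s_xy / s_xx
    return m, y_mean - m * x_mean


# Hypothetical quarterly aggregates, oldest first (not real statistics).
income = [1000.0, 1040.0, 1020.0, 1080.0, 1100.0, 1150.0]
consumption = [810.0, 820.0, 852.0, 836.0, 884.0, 900.0]

lag = 1  # assumed lag of one quarter
lagged_income = income[:-lag]            # income in quarters 1 to 5
later_consumption = consumption[lag:]    # consumption in quarters 2 to 6

m, c = fit_line(lagged_income, later_consumption)
print(round(m, 4), round(c, 2))
```

Note that using a lag costs one observation: six quarters of data give only five pairs to fit.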
Econometric models (i.e. equations) are often used for forecasting. Suppose we can derive an equation that expresses a variable in terms of other variables, and that these other variables are easier to forecast than the original variable. We can now use the equation to forecast how the original variable will change. Econometric models are also used for other purposes, such as estimating how a change in price for a particular product will affect the demand for that product.
Recommended reading:
Kennedy, Peter (1998) A guide to econometrics, 4th edition, MIT Press.
Gujarati, Damodar N (1995) Basic econometrics, 3rd edition, McGraw Hill/Irwin.