Linear Regression in Time Series: Sources of Spurious Regression

A simulation of the results presented in the article by Granger and Newbold (1974)

Author

Jumbong Junior

Published

March 3, 2025

Introduction

Motivation

It is common, when analyzing the relationship between a dependent time series and several independent time series, to use a regression model. In their well-known paper, Granger and Newbold (1974) found several articles in the literature presenting regression models with an apparently high goodness of fit, as measured by the coefficient of determination \(R^2\), but with very low Durbin-Watson statistics.

What is particularly surprising is that almost every econometrics textbook warns about the danger of autocorrelated errors, yet the issue persists in many published papers. Granger and Newbold (1974) identified several examples. For instance, they found published equations with \(R^2 = 0.997\) and a Durbin-Watson statistic (d) equal to 0.53. The most extreme example they found is an equation with \(R^2 = 0.999\) and \(d = 0.093\).

These are clear examples of what is called spurious regression: results that look statistically impressive but are in fact misleading, falsely suggesting a strong relationship between the variables when no such relationship exists.

This honestly made me laugh because, during my internships, I saw many colleagues using regression models on time series data and evaluating performance purely on the basis of \(R^2\), especially when it was high (close to 1), along with metrics like the Mean Squared Error (MSE) or the Mean Absolute Error (MAE), without taking the autocorrelation of the residuals into account.

When I came across this article, I realized how common this issue is. That is why I wanted to show you how crucial it is to always check the autocorrelation of the residuals when performing a regression analysis with time series data.

Contribution

This post provides:

  • A detailed explanation of the results from Granger and Newbold (1974).
  • A Python simulation replicating the key results presented in their article.

Objectives

The classical regression model tests assume independent data. However, in the case of time series, observations are not independent; they are autocorrelated, and we often observe what is called serial correlation, that is, correlation between successive values of the series. For example, today's GDP is strongly correlated with the GDP of the previous quarter. This autocorrelation in the data can lead to correlation in the errors, which affects the results of the regression analysis.

There are three main consequences of autocorrelated errors in regression analysis:

  • Estimation of the coefficients of the model is inefficient.
  • Forecasts based on the regression equation are sub-optimal.
  • The hypothesis tests of the coefficients are invalid.

The first two points are well documented in the literature. However, Granger and Newbold (1974) focused specifically on the third point, showing that it’s possible to obtain very high \(R^2\) values, even though the models have no real economic meaning — what we call spurious regressions.

To support their analysis, the authors first present some key results from time series analysis. Then, they explain how nonsense regressions can occur, and finally, they run simulations to demonstrate that if the variables follow a random walk (or close to it), and if you include variables in the regression that shouldn’t be there, then it becomes the rule rather than the exception to obtain misleading results.

To walk you through this paper, the next section will introduce the random walk and the ARIMA(0,1,1) process. In section 3, we will explain how Granger and Newbold (1974) describe the emergence of nonsense regressions, with examples illustrated in section 4. Finally, we’ll show how to avoid spurious regressions when working with time series data.

Random Walk and ARIMA(0,1,1) Process

Random Walk

Let \(X_t\) be a time series. We say that \(X_t\) follows a random walk if its representation is given by:

\[ X_t = X_{t-1} + \epsilon_t \tag{1}\]

where \(\epsilon_t\) is white noise. It can be written as a sum of white noise terms:

\[ X_t = X_0 + \sum_{i=1}^{t} \epsilon_i \tag{2}\]

where \(X_0\) is the initial value of the series.

The random walk is a non-stationary time series. Indeed, if we take the variance of \(X_t\) (with \(X_0\) fixed and \(\sigma^2\) the variance of the white noise), we have:

\[ V(X_t) = t\sigma^2 \tag{3}\]

which is increasing with time.
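To make this concrete, here is a minimal sketch (my own illustration, not code from the article) that simulates many random-walk paths, assuming \(X_0 = 0\) and unit-variance Gaussian noise, and checks that the sample variance of \(X_t\) grows roughly like \(t\sigma^2\):

import numpy as np

np.random.seed(0)
n_paths, T = 10_000, 100
eps = np.random.normal(0, 1, size=(n_paths, T))   # white noise, sigma^2 = 1
X = np.cumsum(eps, axis=1)                        # X_t = eps_1 + ... + eps_t, with X_0 = 0

for t in [10, 50, 100]:
    print(t, X[:, t - 1].var())                   # each sample variance should be close to t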

This model has been found to represent certain price series well, particularly in speculative markets.

For many other series, the model

\[ X_t - X_{t-1} = \epsilon_t - \theta \epsilon_{t-1} \tag{4}\]

has been found to provide a good representation.

Such non-stationary series are often employed as benchmarks against which the forecasting performance of other models is judged.

ARIMA(0,1,1) Process

The ARIMA(0,1,1) process is given by:

\[ X_t = X_{t-1} + \epsilon_t - \theta \epsilon_{t-1} \tag{5}\]

where \(\epsilon_t\) is white noise. The ARIMA(0,1,1) process is non-stationary. It can be written as the sum of a random walk and an independent white noise:

\[ X_t = X_0 + \text{random walk} + \text{white noise} \tag{6}\]
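As a quick illustration of Equation 6, the following sketch (an illustration of this post, assuming unit-variance Gaussian noise) builds an ARIMA(0,1,1) path as a random walk plus an independent white noise and checks that its first difference behaves like an MA(1) process, with a lag-1 autocorrelation around -1/3 and roughly zero beyond:

import numpy as np
from statsmodels.tsa.stattools import acf

np.random.seed(0)
T = 5000
rw = np.cumsum(np.random.normal(0, 1, T))   # random walk component
wn = np.random.normal(0, 1, T)              # independent white noise
X = rw + wn                                 # ARIMA(0,1,1) in levels

# The first difference is e_t + w_t - w_{t-1}: autocorrelation about -1/3 at lag 1,
# roughly zero at higher lags, which is the signature of an MA(1) process.
print(acf(np.diff(X), nlags=3))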

In the next section, we will see how nonsense regression can arise when we regress a dependent series on independent series that follow a random walk.

Random Walks Can Lead to Nonsense Regression

First, let’s recall the linear regression model. The linear regression model is given by:

\[ Y = X\beta + \epsilon \tag{7}\]

where \(Y\) is a \(T \times 1\) vector of the dependent variable, \(\beta\) is a \(K \times 1\) vector of coefficients, and \(X\) is a \(T \times K\) matrix containing a column of ones plus \(T\) observations on each of the \(K-1\) independent variables, which are stochastic but distributed independently of the \(T \times 1\) error vector \(\epsilon\). It is generally assumed that: \[ E(\epsilon) = 0 \tag{8}\]

and \[ E(\epsilon\epsilon') = \sigma^2I \tag{9}\]

where \(I\) is the identity matrix.

The contribution of the independent variables to the explanation of the dependent variable is assessed with the F-test. The null hypothesis of the test is:

\[ H_0: \beta_1 = \beta_2 = \ldots = \beta_{K-1} = 0 \tag{10}\]

and the statistic of the test is given by:

\[ F = \frac{(R^2/(K-1))}{(1-R^2)/(T-K)} \tag{11}\]

where \(R^2\) is the coefficient of determination.
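As a quick sanity check of Equation 11 (a sketch on arbitrary simulated data, not taken from the article), the F statistic reconstructed from \(R^2\) matches the one reported by statsmodels:

import numpy as np
import statsmodels.api as sm

np.random.seed(0)
T, K = 50, 3                                     # T observations, K coefficients (incl. intercept)
X = sm.add_constant(np.random.normal(size=(T, K - 1)))
y = np.random.normal(size=T)
fit = sm.OLS(y, X).fit()

F_from_r2 = (fit.rsquared / (K - 1)) / ((1 - fit.rsquared) / (T - K))
print(F_from_r2, fit.fvalue)                     # the two values agree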

To see why the test breaks down, suppose the null hypothesis is true and one tries to fit a regression of the form (Equation 7) to the levels of economic time series. Suppose further that these series are non-stationary or highly autocorrelated. In such a situation, the test procedure is invalid, since \(F\) in (Equation 11) is not distributed as an F-distribution under the null hypothesis (Equation 10). In fact, under the null hypothesis, the residuals from (Equation 7),

\(\epsilon_t = Y_t - \beta_0\), \(t = 1, 2, \ldots, T\),

will have the same autocorrelation structure as the original series Y.
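The short sketch below (my own illustration, using the same kind of simulated data as later in the post) regresses one random walk on an independent one and compares the autocorrelation function of \(Y\) with that of the residuals; both decay slowly:

import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import acf

np.random.seed(0)
T = 200
Y = np.cumsum(np.random.normal(size=T))          # two independent random walks
X = np.cumsum(np.random.normal(size=T))
res = sm.OLS(Y, sm.add_constant(X)).fit().resid

print(acf(Y, nlags=5))                           # slowly decaying autocorrelations
print(acf(res, nlags=5))                         # residuals show a similarly slow decay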

Some idea of the distributional problem can be gained by considering the situation where:

\[ Y_t = \beta_0 + X_t\beta_1 + \epsilon_t \]

where \(Y_t\) and \(X_t\) follow independent first-order autoregressive processes:

\(Y_t = \rho Y_{t-1} + \eta_t\), and \(X_t = \rho^* X_{t-1} + \nu_t\)

where \(\eta_t\) and \(\nu_t\) are white noise.

We know that, in this case, \(R^2\) is simply the square of the correlation between \(Y_t\) and \(X_t\). Granger and Newbold use a result of Kendall, reported in Knowles (1954), which gives the variance of \(R\):

\[ Var(R) = \frac{1}{T}\frac{1+\rho \rho^*}{1-\rho \rho^*} \]

Since \(R\) is constrained to lie between -1 and 1, if its variance is greater than 1/3 the distribution of \(R\) cannot have a mode at 0. This happens as soon as \(\rho \rho^* > \frac{T-3}{T+3}\). Thus, for example, if \(T = 20\) and \(\rho = \rho^*\), a distribution that is not unimodal at 0 is obtained whenever \(\rho > 0.86\); and if \(\rho = 0.9\), \(\operatorname{Var}(R) = 0.47\), so \(E(R^2)\) will be close to 0.47.
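The following sketch (a numerical check of my own, not part of the article) simulates pairs of independent AR(1) series with \(\rho = \rho^* = 0.9\) and \(T = 20\), started from their stationary distribution, and compares the empirical variance of \(R\) with Kendall's expression. Since the expression is an approximation, only rough agreement should be expected:

import numpy as np

np.random.seed(0)
rho, T, n_sims = 0.9, 20, 20_000
R = np.empty(n_sims)
for i in range(n_sims):
    # start each AR(1) series from its stationary distribution
    y = np.empty(T)
    x = np.empty(T)
    y[0] = np.random.normal() / np.sqrt(1 - rho**2)
    x[0] = np.random.normal() / np.sqrt(1 - rho**2)
    for t in range(1, T):
        y[t] = rho * y[t - 1] + np.random.normal()
        x[t] = rho * x[t - 1] + np.random.normal()
    R[i] = np.corrcoef(y, x)[0, 1]

print(R.var())                                  # empirical variance of R
print((1 / T) * (1 + rho**2) / (1 - rho**2))    # Kendall's approximation, about 0.47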


It has been shown that when \(\rho\) is close to 1, \(R^2\) can be very high, suggesting a strong relationship between \(Y_t\) and \(X_t\). However, in reality, the two series are completely independent. When \(\rho\) is near 1, both series behave like random walks or near-random walks. On top of that, both series are highly autocorrelated, which causes the residuals from the regression to also be strongly autocorrelated. As a result, the Durbin-Watson statistic d will be very low. This is why a high \(R^2\) in this context should never be taken as evidence of a true relationship between the two series.

To explore the possibility of obtaining a spurious regression when regressing two independent random walks, a series of simulations proposed by Granger and Newbold (1974) will be conducted in the next section.

Simulation results using Python

In this section, we will show through simulation that regressing independent random walks on each other biases the estimation of the coefficients and invalidates the hypothesis tests on those coefficients.

A regression equation proposed by Granger and Newbold (1974) is given by: \[ Y_t = \beta_0 + X_t\beta_1 + \epsilon_t \]

where \(Y_t\) and \(X_t\) were, in fact, generated as independent random walks, each of length 50. The values \[ S = \frac{\lvert \hat{\beta}_1 \rvert}{\widehat{SE}(\hat{\beta}_1)}, \]

representing the usual statistic for testing the significance of \(\beta_1\), are reported for 100 simulations in the table below.

Let’s carry out the simulation using Python.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

np.random.seed(123)
M = 100   # number of simulated regressions
n = 50    # length of each series
S = np.zeros(M)

for i in range(M):
    #---------------------------------------------------------------
    # Generate two independent random walks of length n
    #---------------------------------------------------------------
    epsilon_y = np.random.normal(0, 1, n)
    epsilon_x = np.random.normal(0, 1, n)

    Y = np.cumsum(epsilon_y)
    X = np.cumsum(epsilon_x)

    #---------------------------------------------------------------
    # Fit the regression Y_t = beta_0 + beta_1 X_t + epsilon_t
    #---------------------------------------------------------------
    X = sm.add_constant(X)
    model = sm.OLS(Y, X).fit()

    #---------------------------------------------------------------
    # Compute the statistic S = |beta_1_hat| / SE(beta_1_hat)
    #---------------------------------------------------------------
    S[i] = np.abs(model.params[1]) / model.bse[1]
import numpy as np
import pandas as pd

#------------------------------------------------------
# Maximum value of S
#------------------------------------------------------
S_max = int(np.ceil(max(S)))

#------------------------------------------------------
# Create bins of width 1 from 0 to S_max + 1
#------------------------------------------------------
bins = np.arange(0, S_max + 2, 1)

#------------------------------------------------------
# Compute the histogram of S
#------------------------------------------------------
frequency, bin_edges = np.histogram(S, bins=bins)

#------------------------------------------------------
# Gather the results in a dataframe
#------------------------------------------------------
df = pd.DataFrame({
    "S Interval": [f"{int(bin_edges[i])}-{int(bin_edges[i+1])}" for i in range(len(bin_edges)-1)],
    "Frequency": frequency
})
print(df)
print(np.mean(S))
   S Interval  Frequency
0         0-1         13
1         1-2         16
2         2-3         12
3         3-4          8
4         4-5         14
5         5-6          8
6         6-7          7
7         7-8          5
8         8-9          7
9        9-10          0
10      10-11          3
11      11-12          4
12      12-13          0
13      13-14          1
14      14-15          1
15      15-16          1
16      16-17          0
4.58501785392309

The null hypothesis of no relationship between \(Y_t\) and \(X_t\) is rejected at the 5% level if \(S > 2\). The table shows that the null hypothesis (\(\beta_1 = 0\)) is wrongly rejected in about three-quarters of all cases (71 times out of 100). This is troubling because the two variables are independent random walks, so there is no actual relationship. Let’s break down why this happens.

If \(\hat{\beta}_1 / \widehat{SE}(\hat{\beta}_1)\) followed a \(N(0,1)\) distribution, the expected value of its absolute value \(S\) would be \(\sqrt{2/\pi} \approx 0.8\) (the mean of the absolute value of a standard normal variable). However, the simulation gives an average of 4.59, which means the standard error of \(\hat{\beta}_1\) is underestimated by a factor of roughly \(\frac{4.59}{0.8} \approx 5.7\).

In classical statistics, we usually use a t-test threshold of around 2 to check the significance of a coefficient. These results show that, in this setting, you would actually need a threshold of about 11.4 to test for significance at a comparable level:

\[ 2 \times \frac{4.59}{0.8} \approx 11.4. \]

Interpretation: We’ve just shown that including variables that don’t belong in the model — especially random walks — can lead to completely invalid significance tests for the coefficients.

To make their simulations even clearer, Granger and Newbold (1974) ran a series of regressions using variables that follow either a random walk or an ARIMA(0,1,1) process.

Here is how they set up their simulations:

They regressed a dependent series \(Y_t\) on \(m\) series \(X_{j,t}\) (with \(j = 1, 2, \ldots, m\)), varying \(m\) from 1 to 5. The dependent series \(Y_t\) and the independent series \(X_{j,t}\) follow the same type of process, and they tested four cases:

  • Case 1 (levels): \(Y_t\) and \(X_{j,t}\) follow random walks.
  • Case 2 (differences): they use the first differences of the random walks, which are stationary.
  • Case 3 (levels): \(Y_t\) and \(X_{j,t}\) follow ARIMA(0,1,1) processes.
  • Case 4 (differences): they use the first differences of the previous ARIMA(0,1,1) series, which are stationary.

Each series has a length of 50 observations, and they ran 100 simulations for each case.

All error terms are distributed as \(N(0,1)\), and the ARIMA(0,1,1) series are obtained as the sum of a random walk and an independent white noise. The simulation results, based on 100 replications with series of length 50, are summarized in the next table.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from tabulate import tabulate

np.random.seed(1)  # For reproducible results

#------------------------------------------------------
# Definition of functions
#------------------------------------------------------

def generate_random_walk(T):
    """
    Generate a series of length T following a random walk:
        Y_t = Y_{t-1} + e_t,
    where e_t ~ N(0,1).
    """
    e = np.random.normal(0, 1, size=T)
    return np.cumsum(e)

def generate_arima_0_1_1(T):
    """
    Generate an ARIMA(0,1,1) series following Granger & Newbold's method:
    the series is obtained by adding a random walk and an independent white noise.
    """
    rw = generate_random_walk(T)
    wn = np.random.normal(0, 1, size=T)
    return rw + wn

def difference(series):
    """
    Compute the first difference of a one-dimensional series.
    Returns a series of length T-1.
    """
    return np.diff(series)

#------------------------------------------------------
# Parameters
#------------------------------------------------------

T = 50           # length of each series
n_sims = 100     # number of Monte Carlo simulations
alpha = 0.05     # significance level

def run_simulation_case(case_name, m_values=[1, 2, 3, 4, 5]):
    """
    case_name : an identifier for the type of data generation:
        - 'rw-levels'    : random walks (levels)
        - 'rw-diffs'     : differences of random walks (white noise)
        - 'arima-levels' : ARIMA(0,1,1) in levels
        - 'arima-diffs'  : differences of an ARIMA(0,1,1) => MA(1)

    m_values : list of numbers of regressors.

    Returns a DataFrame with, for each m:
        - % of rejections of H0
        - mean Durbin-Watson statistic
        - mean adjusted R^2
        - % of adjusted R^2 > 0.7
    """
    results = []
    
    for m in m_values:
        count_reject = 0
        dw_list = []
        r2_adjusted_list = []
        
        for _ in range(n_sims):
            #--------------------------------------
            # 1) Generate Y_t and the X_{j,t} series independently of each other
            #--------------------------------------
            if case_name == 'rw-levels':
                Y = generate_random_walk(T)
                Xs = [generate_random_walk(T) for __ in range(m)]
            
            elif case_name == 'rw-diffs':
                # Y and X are the differences of random walks, i.e. ~ white noise
                Y_rw = generate_random_walk(T)
                Y = difference(Y_rw)
                Xs = []
                for __ in range(m):
                    X_rw = generate_random_walk(T)
                    Xs.append(difference(X_rw))
                # NB: Y and Xs now have length T-1, so the regression
                # below effectively uses T-1 observations
            
            elif case_name == 'arima-levels':
                Y = generate_arima_0_1_1(T)
                Xs = [generate_arima_0_1_1(T) for __ in range(m)]
            
            elif case_name == 'arima-diffs':
                # Differences of an ARIMA(0,1,1) => MA(1)
                Y_arima = generate_arima_0_1_1(T)
                Y = difference(Y_arima)
                Xs = []
                for __ in range(m):
                    X_arima = generate_arima_0_1_1(T)
                    Xs.append(difference(X_arima))
            
            # 2) Prepare the data for the regression
            #    (depending on the case, the series length is T or T-1)
            if case_name in ['rw-levels','arima-levels']:
                Y_reg = Y
                X_reg = np.column_stack(Xs) if m>0 else np.array([])
            else:
                # in the differenced cases, the length is T-1
                Y_reg = Y
                X_reg = np.column_stack(Xs) if m>0 else np.array([])
            
            # 3) OLS regression
            X_with_const = sm.add_constant(X_reg)  # add the intercept
            model = sm.OLS(Y_reg, X_with_const).fit()
            
            # 4) Global F-test of H0: all beta_j = 0
            #    Reject if the p-value is below alpha
            if model.f_pvalue is not None and model.f_pvalue < alpha:
                count_reject += 1
            
            # 5) R^2, Durbin-Watson
            r2_adjusted_list.append(model.rsquared_adj)
            
            
            dw_list.append(durbin_watson(model.resid))
        
        # Statistics over the n_sims replications
        reject_percent = 100 * count_reject / n_sims
        dw_mean = np.mean(dw_list)
        r2_mean = np.mean(r2_adjusted_list)
        r2_above_0_7_percent = 100 * np.mean(np.array(r2_adjusted_list) > 0.7)
        
        results.append({
            'm': m,
            'Reject %': reject_percent,
            'Mean DW': dw_mean,
            'Mean R^2': r2_mean,
            '% R^2_adj>0.7': r2_above_0_7_percent
        })
    
    return pd.DataFrame(results)
cases = ['rw-levels', 'rw-diffs', 'arima-levels', 'arima-diffs']
all_results = {}

for c in cases:
    df_res = run_simulation_case(c, m_values=[1,2,3,4,5])
    all_results[c] = df_res
for case, df_res in all_results.items():
    print(f"\n\n{case}")
    print(tabulate(df_res, headers='keys', tablefmt='fancy_grid'))


rw-levels
╒════╤═════╤════════════╤═══════════╤════════════╤═════════════════╕
│    │   m │   Reject % │   Mean DW │   Mean R^2 │   % R^2_adj>0.7 │
╞════╪═════╪════════════╪═══════════╪════════════╪═════════════════╡
│  0 │   1 │         74 │  0.33973  │   0.240534 │               4 │
├────┼─────┼────────────┼───────────┼────────────┼─────────────────┤
│  1 │   2 │         86 │  0.481779 │   0.416778 │              15 │
├────┼─────┼────────────┼───────────┼────────────┼─────────────────┤
│  2 │   3 │         93 │  0.561673 │   0.493484 │              23 │
├────┼─────┼────────────┼───────────┼────────────┼─────────────────┤
│  3 │   4 │         98 │  0.679036 │   0.570376 │              28 │
├────┼─────┼────────────┼───────────┼────────────┼─────────────────┤
│  4 │   5 │         96 │  0.800295 │   0.58997  │              35 │
╘════╧═════╧════════════╧═══════════╧════════════╧═════════════════╛


rw-diffs
╒════╤═════╤════════════╤═══════════╤═════════════╤═════════════════╕
│    │   m │   Reject % │   Mean DW │    Mean R^2 │   % R^2_adj>0.7 │
╞════╪═════╪════════════╪═══════════╪═════════════╪═════════════════╡
│  0 │   1 │          5 │   1.99294 │  0.00168242 │               0 │
├────┼─────┼────────────┼───────────┼─────────────┼─────────────────┤
│  1 │   2 │          9 │   1.97273 │  0.00602094 │               0 │
├────┼─────┼────────────┼───────────┼─────────────┼─────────────────┤
│  2 │   3 │          6 │   1.95948 │  0.00387684 │               0 │
├────┼─────┼────────────┼───────────┼─────────────┼─────────────────┤
│  3 │   4 │          9 │   1.97925 │  0.00627632 │               0 │
├────┼─────┼────────────┼───────────┼─────────────┼─────────────────┤
│  4 │   5 │          2 │   1.93948 │ -0.00971288 │               0 │
╘════╧═════╧════════════╧═══════════╧═════════════╧═════════════════╛


arima-levels
╒════╤═════╤════════════╤═══════════╤════════════╤═════════════════╕
│    │   m │   Reject % │   Mean DW │   Mean R^2 │   % R^2_adj>0.7 │
╞════╪═════╪════════════╪═══════════╪════════════╪═════════════════╡
│  0 │   1 │         64 │  0.698287 │   0.206495 │               2 │
├────┼─────┼────────────┼───────────┼────────────┼─────────────────┤
│  1 │   2 │         73 │  0.886599 │   0.299502 │               6 │
├────┼─────┼────────────┼───────────┼────────────┼─────────────────┤
│  2 │   3 │         87 │  1.03089  │   0.364599 │               6 │
├────┼─────┼────────────┼───────────┼────────────┼─────────────────┤
│  3 │   4 │         94 │  1.19611  │   0.447544 │              13 │
├────┼─────┼────────────┼───────────┼────────────┼─────────────────┤
│  4 │   5 │         89 │  1.26888  │   0.421604 │              19 │
╘════╧═════╧════════════╧═══════════╧════════════╧═════════════════╛


arima-diffs
╒════╤═════╤════════════╤═══════════╤════════════╤═════════════════╕
│    │   m │   Reject % │   Mean DW │   Mean R^2 │   % R^2_adj>0.7 │
╞════╪═════╪════════════╪═══════════╪════════════╪═════════════════╡
│  0 │   1 │         10 │   2.6299  │ 0.00465141 │               0 │
├────┼─────┼────────────┼───────────┼────────────┼─────────────────┤
│  1 │   2 │         16 │   2.55665 │ 0.0173058  │               0 │
├────┼─────┼────────────┼───────────┼────────────┼─────────────────┤
│  2 │   3 │          5 │   2.59995 │ 0.00913797 │               0 │
├────┼─────┼────────────┼───────────┼────────────┼─────────────────┤
│  3 │   4 │          6 │   2.5216  │ 0.0139963  │               0 │
├────┼─────┼────────────┼───────────┼────────────┼─────────────────┤
│  4 │   5 │          6 │   2.47298 │ 0.00864894 │               0 │
╘════╧═════╧════════════╧═══════════╧════════════╧═════════════════╛

Interpretation of the results:

  • It is seen that the probability of not rejecting the null hypothesis of no relationship between \(Y_t\) and \(X_{j,t}\) becomes very small for \(m \geq 3\) when the regressions are made with random-walk series (rw-levels). The mean \(R^2\) and the mean Durbin-Watson statistic both increase with \(m\). Similar results are obtained when the regressions are made with ARIMA(0,1,1) series (arima-levels).

  • When white noise series (rw-diffs) are used, classical regression analysis is valid, since the error series is white noise and least squares is efficient.

  • However, when the regressions are made with the differences of ARIMA(0,1,1) series (arima-diffs), which are first-order moving average MA(1) processes, the null hypothesis is rejected on average \(\frac{10+16+5+6+6}{5} = 8.6\%\) of the time, somewhat more than the nominal 5% level.

If your variables are random walks or close to them, and you include unnecessary variables in your regression, you will often get fallacious results. High \(R^2\) or \(R^2_{adjusted}\) and low Durbin-Watson values do not confirm a true relationship, but instead indicate a likely spurious one.

How to avoid spurious regression in time series

If one performs a regression analysis with time series data and finds that the residuals are strongly autocorrelated, there is a serious problem when it comes to interpreting the coefficients of the equation. To check for autocorrelation in the residuals, one can use the Durbin-Watson test or a portmanteau test such as the Ljung-Box test.
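As a minimal sketch (reusing the simulated setup from the sections above rather than real data), both diagnostics are readily available in statsmodels:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_ljungbox

np.random.seed(0)
T = 50
Y = np.cumsum(np.random.normal(size=T))          # two independent random walks
X = np.cumsum(np.random.normal(size=T))
resid = sm.OLS(Y, sm.add_constant(X)).fit().resid

print(durbin_watson(resid))                      # values far below 2 signal positive autocorrelation
print(acorr_ljungbox(resid, lags=[10]))          # a small p-value rejects "no autocorrelation"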

Based on the study above, we can conclude that if a regression analysis performed with economic variables produces strongly autocorrelated residuals, that is, a low Durbin-Watson statistic, then the results of the analysis are likely to be spurious, whatever the value of the coefficient of determination \(R^2\) observed.

In such cases, it is important to understand where the mis-specification comes from. According to the literature, mis-specification usually falls into three categories: (i) the omission of a relevant variable, (ii) the inclusion of an irrelevant variable, or (iii) autocorrelation of the errors. Most of the time, mis-specification comes from a mix of these three sources.

To avoid spurious regression in time series, several recommendations can be made:

  • The first recommendation is to select the right macroeconomic variables that are likely to explain the dependent variable. This can be done by reviewing the literature or consulting experts in the field.

  • The second recommendation is to make the series stationary by taking first differences (a short sketch follows this list). In most cases, the first differences of macroeconomic variables are stationary and still easy to interpret. For macroeconomic data, it is strongly recommended to difference the series once to reduce the autocorrelation of the residuals, especially when the sample size is small. These variables indeed often exhibit strong serial correlation, and a simple calculation shows that the first differences will almost always have much smaller serial correlations than the original series.

  • The third recommendation is to use the Box-Jenkins methodology to model each macroeconomic variable individually, and then search for relationships between the series by relating the residuals from each individual model. The idea here is that the Box-Jenkins process extracts the explained part of the series, leaving the residuals, which contain only what can’t be explained by the series’ own past behavior. This makes it easier to check whether these unexplained parts (residuals) are related across variables.
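Here is the sketch announced in the second recommendation: under the same simulated setup as before (two independent random walks of length 50, an assumption of this post rather than real macroeconomic data), the regression in levels shows the spurious pattern, while the regression in first differences does not:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

np.random.seed(0)
T = 50
Y = np.cumsum(np.random.normal(size=T))
X = np.cumsum(np.random.normal(size=T))

levels = sm.OLS(Y, sm.add_constant(X)).fit()
diffs = sm.OLS(np.diff(Y), sm.add_constant(np.diff(X))).fit()

print(levels.rsquared, durbin_watson(levels.resid))   # typically a sizeable R^2 and a very low DW
print(diffs.rsquared, durbin_watson(diffs.resid))     # near-zero R^2 and DW close to 2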

Conclusion

Many econometrics textbooks warn about specification errors in regression models, but the problem still shows up in many published papers. Granger and Newbold (1974) highlighted the risk of spurious regressions, where a high \(R^2\) is paired with a very low Durbin-Watson statistic.

Using Python simulations, we showed some of the main causes of these spurious regressions, especially including variables that don’t belong in the model and are highly autocorrelated. We also demonstrated how these issues can completely distort hypothesis tests on the coefficients.

Hopefully, this post will help reduce the risk of spurious regressions in future econometric analyses.


References

Granger, Clive WJ, and Paul Newbold. 1974. “Spurious Regressions in Econometrics.” Journal of Econometrics 2 (2): 111–20.
Knowles, E. A. G. 1954. “Exercises in Theoretical Statistics.” Oxford University Press.