DSPython

Time Series Forecasting

Learn to analyze and forecast data points indexed in time, like stock prices or weather.

Time Series · Advanced · 90 min

Topic 1: What is Time Series Data?

Time Series Data is a set of data points collected sequentially over a period of time. Unlike other datasets where the order of rows doesn't matter, in a time series, the **order is everything**.

This temporal dependence means that a value at one point in time is often related to the values that came before it. This property is what we try to model.


Key Examples:

  • Finance: Daily stock prices or exchange rates.
  • Weather: Hourly temperature or daily rainfall.
  • Business: Monthly sales or daily website traffic.
  • IoT: Sensor readings taken every second.

The goal is usually **forecasting**: predicting future values based on past patterns.
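In pandas, this temporal ordering is usually captured with a `DatetimeIndex`, which makes time-based slicing and resampling natural. A minimal sketch (the sales figures below are made up for illustration):

```python
import pandas as pd

# Hypothetical daily sales series indexed by date
dates = pd.date_range("2024-01-01", periods=7, freq="D")
sales = pd.Series([100, 102, 98, 105, 110, 108, 115],
                  index=dates, name="sales")

# The index carries the temporal order, so date-based slicing works directly
# (string slicing on a DatetimeIndex is inclusive on both ends)
first_half = sales["2024-01-01":"2024-01-04"]
print(first_half)
```

Because the index encodes time, pandas can also resample (e.g., daily to weekly) or shift the series without losing the ordering that forecasting depends on.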


Topic 2: Components of a Time Series

To understand a time series, we often "decompose" it into its core components. Any time series can be thought of as a combination of these three parts:

Time Series Decomposition Diagram

  1. Trend: The long-term, underlying upward or downward movement in the data (e.g., a company's sales slowly growing over several years).
  2. Seasonality: Predictable, fixed, repeating patterns at specific intervals (e.g., ice cream sales are high every summer, or website traffic is high every weekday morning).
  3. Noise (or Residual): The random, unpredictable, irregular component that is left over after removing the Trend and Seasonality.

A model's job is to learn the Trend and Seasonality so it can forecast them, while acknowledging the Noise as random error.


Topic 3: Stationarity

This is the most important concept in classical time series analysis. A time series is **stationary** if its statistical properties do not change over time.

Specifically, a stationary series has a constant mean, a constant variance, and an autocovariance that depends only on the lag between points, not on time itself. It has no trend or seasonality.

Why does this matter?

Most forecasting models (like ARIMA) are "dumb." They *assume* the data is stationary. If you feed them a non-stationary series (one with a clear upward trend), the model will make terrible forecasts.


Our job: We must first transform our data to *make it* stationary.

The Fix: The most common method is **Differencing**. To remove a trend, we subtract the previous value from the current value. This new series of "changes" is often stationary.

# Make the data stationary by first-order differencing
df['sales_diff'] = df['sales'].diff(1)
# Note: the first value becomes NaN after diff(); call .dropna() before fitting a model

Topic 4: The ARIMA Model

ARIMA is the most famous classical forecasting model. It's an acronym that combines three concepts:


  • AR (AutoRegressive): This part of the model assumes that the current value is dependent on its own past values. The parameter `p` defines how many "lag" values to use (e.g., `p=2` means $Y_t$ depends on $Y_{t-1}$ and $Y_{t-2}$).
  • I (Integrated): This is the differencing part. The parameter `d` tells the model how many times the data had to be differenced to become stationary. (e.g., `d=1` means we used first-order differencing).
  • MA (Moving Average): This part assumes the current value is dependent on the *errors* from past predictions. The parameter `q` defines how many past errors to use.

By combining these, we create an ARIMA(p, d, q) model. Finding the right values for p, d, and q is the key to building a good ARIMA model.

from statsmodels.tsa.arima.model import ARIMA

# Fit an ARIMA(1, 1, 1) model: p=1 lag term, d=1 differencing, q=1 error term
model = ARIMA(train_data['sales'], order=(1, 1, 1))
model_fit = model.fit()

# Forecast the next 3 periods beyond the training data
forecast = model_fit.forecast(steps=3)