DSPython

Time Series Forecasting

Learn to analyze and forecast data points indexed in time, like stock prices or weather.

Time Series · Advanced · 90 min

Topic 1: What is Time Series Data?

Time Series Data is a set of data points collected sequentially over a period of time. Unlike other datasets where the order of rows doesn't matter, in a time series, the **order is everything**.

This temporal dependence means that a value at one point in time is often related to the values that came before it. This property is what we try to model.


Key Examples:

  • Finance: Daily stock prices or exchange rates.
  • Weather: Hourly temperature or daily rainfall.
  • Business: Monthly sales or daily website traffic.
  • IoT: Sensor readings taken every second.

The goal is usually **forecasting**: predicting future values based on past patterns.
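In pandas, this temporal ordering is usually captured with a `DatetimeIndex`, which makes time-based slicing and resampling natural. A minimal sketch (the sales figures below are made up for illustration):

```python
import pandas as pd

# Hypothetical daily sales series indexed by date
dates = pd.date_range("2024-01-01", periods=7, freq="D")
sales = pd.Series([100, 102, 98, 105, 110, 108, 115],
                  index=dates, name="sales")

# The index carries the temporal order, so date-based slicing works directly
# (string slicing on a DatetimeIndex is inclusive on both ends)
first_half = sales["2024-01-01":"2024-01-04"]
print(first_half)
```

Because the index encodes time, pandas can also resample (e.g., daily to weekly) or shift the series without losing the ordering that forecasting depends on.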


Topic 2: Components of a Time Series

To understand a time series, we often "decompose" it into its core components. Any time series can be thought of as a combination of these three parts:

Time Series Decomposition Diagram

  1. Trend: The long-term, underlying upward or downward movement in the data (e.g., a company's sales slowly growing over several years).
  2. Seasonality: Predictable, fixed, repeating patterns at specific intervals (e.g., ice cream sales are high every summer, or website traffic is high every weekday morning).
  3. Noise (or Residual): The random, unpredictable, irregular component that is left over after removing the Trend and Seasonality.

A model's job is to learn the Trend and Seasonality so it can forecast them, while acknowledging the Noise as random error.


Topic 3: Stationarity

This is the most important concept in classical time series analysis. A time series is **stationary** if its statistical properties do not change over time.

Specifically, a stationary series has a constant mean, a constant variance, and an autocovariance that depends only on the lag between points, not on time itself. It has no trend or seasonality.

Why does this matter?

Most forecasting models (like ARIMA) are "dumb." They *assume* the data is stationary. If you feed them a non-stationary series (one with a clear upward trend), the model will make terrible forecasts.


Our job: We must first transform our data to *make it* stationary.

The Fix: The most common method is **Differencing**. To remove a trend, we subtract the previous value from the current value. This new series of "changes" is often stationary.

# Make the data stationary by first-order differencing
df['sales_diff'] = df['sales'].diff(1)
# Note: the first value becomes NaN after diff(); call .dropna() before fitting a model

Topic 4: The ARIMA Model

ARIMA is the most famous classical forecasting model. It's an acronym that combines three concepts:


  • AR (AutoRegressive): This part of the model assumes that the current value is dependent on its own past values. The parameter `p` defines how many "lag" values to use (e.g., `p=2` means $Y_t$ depends on $Y_{t-1}$ and $Y_{t-2}$).
  • I (Integrated): This is the differencing part. The parameter `d` tells the model how many times the data had to be differenced to become stationary. (e.g., `d=1` means we used first-order differencing).
  • MA (Moving Average): This part assumes the current value is dependent on the *errors* from past predictions. The parameter `q` defines how many past errors to use.

By combining these, we create an ARIMA(p, d, q) model. Finding the right values for p, d, and q is the key to building a good ARIMA model.

from statsmodels.tsa.arima.model import ARIMA

# Fit an ARIMA(1, 1, 1) model: p=1 lag term, d=1 differencing, q=1 error term
model = ARIMA(train_data['sales'], order=(1, 1, 1))
model_fit = model.fit()

# Forecast the next 3 periods beyond the training data
forecast = model_fit.forecast(steps=3)