Time Series Forecasting
Learn to analyze and forecast data points indexed in time, like stock prices or weather.
Topic 1: What is Time Series Data?
Time Series Data is a set of data points collected sequentially over a period of time. Unlike other datasets where the order of rows doesn't matter, in a time series, the **order is everything**.
This temporal dependence means that a value at one point in time is often related to the values that came before it. This property is what we try to model.

Key Examples:
- Finance: Daily stock prices or exchange rates.
- Weather: Hourly temperature or daily rainfall.
- Business: Monthly sales or daily website traffic.
- IoT: Sensor readings taken every second.
The goal is usually **forecasting**: predicting future values based on past patterns.
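In pandas, the natural container for such data is a `Series` with a `DatetimeIndex`, which preserves the temporal order. A minimal sketch (the dates and sales figures below are made up for illustration):

```python
import pandas as pd

# Hypothetical daily sales figures (illustrative values only)
dates = pd.date_range(start="2024-01-01", periods=5, freq="D")
sales = pd.Series([100, 102, 98, 105, 110], index=dates, name="sales")

# The DatetimeIndex makes time-aware lookups and resampling natural
print(sales.index.freqstr)        # D
print(sales.loc["2024-01-03"])    # 98
```

Because the index carries the timestamps, operations like shifting, differencing, and resampling all respect the ordering automatically.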
Topic 2: Components of a Time Series
To understand a time series, we often "decompose" it into its core components. Any time series can be thought of as a combination of these three parts:

- 1. Trend: The long-term, underlying upward or downward movement in the data. (e.g., a company's sales slowly growing over several years).
- 2. Seasonality: Predictable, fixed, and repeating patterns at specific intervals (e.g., ice cream sales are high every summer, or website traffic is high every weekday morning).
- 3. Noise (or Residual): The random, unpredictable, and irregular component of the data that is left over after removing the Trend and Seasonality.
A model's job is to learn the Trend and Seasonality so it can forecast them, while acknowledging the Noise as random error.
Topic 3: Stationarity
This is the most important concept in classical time series analysis. A time series is **stationary** if its statistical properties do not change over time.
Specifically, a stationary series has a constant mean, a constant variance, and an autocovariance that depends only on the lag between points, not on time itself. In practical terms: it has no trend and no seasonality.
Why does this matter?
Most forecasting models (like ARIMA) are "dumb." They *assume* the data is stationary. If you feed them a non-stationary series (one with a clear upward trend), the model will make terrible forecasts.
Our job: We must first transform our data to *make it* stationary.
The Fix: The most common method is **Differencing**. To remove a trend, we subtract the previous value from the current value. This new series of "changes" is often stationary.

```python
# Make the data stationary with first-order differencing:
# .diff(1) subtracts the previous value from each value
df['sales_diff'] = df['sales'].diff(1)

# The first entry is NaN (there is no previous value to subtract),
# so drop it before fitting a model
sales_diff = df['sales_diff'].dropna()
```
Topic 4: The ARIMA Model
ARIMA is the most famous classical forecasting model. It's an acronym that combines three concepts:
- AR (AutoRegressive): This part of the model assumes that the current value is dependent on its own past values. The parameter `p` defines how many "lag" values to use (e.g., `p=2` means $Y_t$ depends on $Y_{t-1}$ and $Y_{t-2}$).
- I (Integrated): This is the differencing part. The parameter `d` tells the model how many times the data had to be differenced to become stationary. (e.g., `d=1` means we used first-order differencing).
- MA (Moving Average): This part assumes the current value is dependent on the *errors* from past predictions. The parameter `q` defines how many past errors to use.
By combining these, we create an ARIMA(p, d, q) model. Finding the right values for p, d, and q is the key to building a good ARIMA model.
```python
from statsmodels.tsa.arima.model import ARIMA

# Fit an ARIMA(1, 1, 1): one AR lag, first-order differencing, one MA term
model = ARIMA(train_data['sales'], order=(1, 1, 1))
model_fit = model.fit()

# Get a forecast for the next 3 periods
forecast = model_fit.forecast(steps=3)
```