7 Pandas Tricks for Time-Series Feature Engineering

Image by Editor | ChatGPT

Introduction

Feature engineering is one of the most important steps in building effective machine learning models, and this is no less true when dealing with time-series data. By creating meaningful features from temporal data, you can unlock predictive power that raw timestamps alone cannot provide.

Fortunately for us all, Pandas offers a powerful and flexible set of operations for manipulating and creating time-series features.

This article explores 7 practical Pandas tricks that can help transform your time-series data, leading to better models and more accurate predictions. We'll use a simple, synthetic dataset to illustrate each technique, allowing you to quickly grasp the concepts and apply them to your own projects.

Setting Up Our Data

First, let's create a sample time-series DataFrame. This dataset will represent daily sales data over 30 days, and we'll use it for all subsequent examples.

import pandas as pd
import numpy as np

# Set a random seed for reproducibility
np.random.seed(42)

# Create a daily date range (July 1 to July 30, inclusive)
date_range = pd.date_range(start='2025-07-01', end='2025-07-30', freq='D')

# Create a sample DataFrame with one random sales value per day
df = pd.DataFrame(date_range, columns=['date'])
df['sales'] = np.random.randint(50, 100, size=len(date_range))
df = df.set_index('date')

# df.size counts cells (rows x columns); with a single column it equals the row count
print(f"Dataset size: {df.size}")
print(df.head())

Output:

Dataset size: 30
sales
date
2025-07-01 88
2025-07-02 78
2025-07-03 64
2025-07-04 92
2025-07-05 57

We have created a small dataset with one entry for each day from July 1 to July 30, 2025, each with a randomly assigned sales value. Your data should look the same as mine above if you use np.random.seed(42).

With our data ready, we can now explore several techniques for creating insightful features.

1. Extracting Datetime Components

One of the simplest yet most useful time-series feature engineering techniques is to break a datetime down into its constituent components. These components can capture seasonality and trends at different granularities (such as day of the week, month of the year, and so on). Pandas makes this very easy: because our dates form the DataFrame's index (a DatetimeIndex), we can read these attributes directly from df.index. For an ordinary datetime column, the same attributes are available through the .dt accessor.

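# The dates are the index (a DatetimeIndex), so attributes are read directly from df.index
df['day_of_week'] = df.index.dayofweek   # Monday=0 ... Sunday=6
df['day_of_year'] = df.index.dayofyear
df['month'] = df.index.month
df['quarter'] = df.index.quarter
df['week_of_year'] = df.index.isocalendar().week   # ISO week number

print(df.head())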

Output:

sales day_of_week day_of_year month quarter week_of_year
date
2025-07-01 88 1 182 7 3 27
2025-07-02 78 2 183 7 3 27
2025-07-03 64 3 184 7 3 27
2025-07-04 92 4 185 7 3 27
2025-07-05 57 5 186 7 3 27

We now have day-of-week, day-of-year, month, quarter, and week-of-year values for each entry. These new features can help a model learn patterns related to weekly cycles (such as higher sales on weekends) or annual seasonality. They make a good starting point.
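If your timestamps live in an ordinary column rather than in the index, the same attributes are reached through the .dt accessor. The sketch below assumes a standalone Series called dates (a hypothetical name chosen for illustration) covering the first three days of our sample period.

```python
import pandas as pd

# A regular column of datetimes (not the index), as you might get after reading a CSV
dates = pd.Series(pd.date_range(start="2025-07-01", periods=3, freq="D"), name="date")

# On a Series, datetime attributes live behind the .dt accessor
parts = pd.DataFrame({
    "day_of_week": dates.dt.dayofweek,            # Monday=0 ... Sunday=6
    "day_name": dates.dt.day_name(),              # e.g. "Tuesday"
    "month": dates.dt.month,
    "week_of_year": dates.dt.isocalendar().week,  # ISO week number
})
print(parts)
```

The attribute names match the DatetimeIndex versions used above, so code written for one form is easy to adapt to the other.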

2. Creating Lag Features

Lag features are values from previous time steps. They are essential in time-series forecasting because they represent the past state of the system, which is often highly predictive of the future. The shift() method is perfect for this: shift(n) moves each value n rows forward, so every row sees the value from n steps earlier.

# Create a lag feature for sales from the previous day
df['sales_lag_1'] = df['sales'].shift(1)

# Create a lag feature for sales from 3 days ago
df['sales_lag_3'] = df['sales'].shift(3)

# Show only the relevant columns (df also holds the features from earlier sections)
print(df[['sales', 'sales_lag_1', 'sales_lag_3']].head())

Output:

sales sales_lag_1 sales_lag_3
date
2025-07-01 88 NaN NaN
2025-07-02 78 88.0 NaN
2025-07-03 64 78.0 NaN
2025-07-04 92 64.0 88.0
2025-07-05 57 92.0 78.0

Note that shifting creates NaN values at the start of the series, since the first rows have no earlier values to look back on. You'll need to handle these before modeling, for example by dropping those rows or filling them in.
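A minimal sketch of the dropping option, rebuilding the sample data from the setup section. It uses dropna() restricted to the lag columns, so NaNs in any other column would not remove rows; which columns to pass to subset is a choice you should adapt to your own feature set.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the setup section
np.random.seed(42)
date_range = pd.date_range(start="2025-07-01", end="2025-07-30", freq="D")
df = pd.DataFrame({"sales": np.random.randint(50, 100, size=len(date_range))}, index=date_range)
df.index.name = "date"

# Lag features, as above
df["sales_lag_1"] = df["sales"].shift(1)
df["sales_lag_3"] = df["sales"].shift(3)

# Drop rows where any lag is missing; the largest lag (3) determines how many rows go
model_df = df.dropna(subset=["sales_lag_1", "sales_lag_3"])

print(len(df), len(model_df))  # 30 27
print(model_df.head(3))
```

In general, dropping loses as many leading rows as your largest lag, which matters with short series like this one.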

3. Calculating Rolling Window Statistics

Rolling window calculations (a rolling mean is also known as a moving average) are useful for smoothing out short-term fluctuations and highlighting longer-term trends. You can easily calculate statistics such as the mean, median, or standard deviation over a fixed-size window using the rolling() method.

# Calculate the 3-day rolling mean of sales
df['rolling_mean_3'] = df['sales'].rolling(window=3).mean()

# Calculate the 3-day rolling standard deviation (sample std, ddof=1 by default)
df['rolling_std_3'] = df['sales'].rolling(window=3).std()

print(df[['sales', 'rolling_mean_3', 'rolling_std_3']].head())

Output:

sales rolling_mean_3 rolling_std_3
date
2025-07-01 88 NaN NaN
2025-07-02 78 NaN NaN
2025-07-03 64 76.666667 12.055428
2025-07-04 92 78.000000 14.000000
2025-07-05 57 71.000000 18.520259

These new features can provide insight into the recent trend and volatility of the series. The first two rows are NaN because a full 3-day window is not yet available.
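Two variations are worth knowing. First, the window above includes the current day's value, so if you are predicting that same day's sales, the feature leaks the target; shifting by one before rolling restricts each window to past days only. Second, passing an offset string such as "3D" instead of an integer makes the window span calendar time rather than a fixed number of rows, which is handy when observations are unevenly spaced. The sketch below uses the first five sales values from the sample output.

```python
import pandas as pd

# First five days of the sample data
sales = pd.Series(
    [88, 78, 64, 92, 57],
    index=pd.date_range(start="2025-07-01", periods=5, freq="D"),
    name="sales",
)

# Shift first so each window covers only the 3 *previous* days (no same-day leakage)
past_mean_3 = sales.shift(1).rolling(window=3).mean()

# Offset-based window: all observations in the last 3 calendar days, (t - 3D, t]
# (offset windows default to min_periods=1, so early rows are not NaN)
time_mean_3 = sales.rolling("3D").mean()

print(pd.DataFrame({"sales": sales, "past_mean_3": past_mean_3, "time_mean_3": time_mean_3}))
```

With daily data and no gaps, the "3D" window matches window=3 once three observations exist; the two diverge only when dates are missing.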

4. Generating Expanding Window Statistics

In contrast to a rolling window, an expanding window includes all the data from the very start of the time series up to the current point in time. This can be useful for capturing statistics that accumulate over time, such as running totals and overall averages. This is done with the expanding() method.

# Calculate the expanding (cumulative) sum of sales
df['expanding_sum'] = df['sales'].expanding().sum()

# Calculate the expanding (running) average of sales
df['expanding_avg'] = df['sales'].expanding().mean()

print(df[['sales', 'expanding_sum', 'expanding_avg']].head())

Output:

sales expanding_sum expanding_avg
date
2025-07-01 88 88.0 88.000000
2025-07-02 78 166.0 83.000000
2025-07-03 64 230.0 76.666667
2025-07-04 92 322.0 80.500000
2025-07-05 57 379.0 75.800000

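To make the definition concrete: an expanding sum is a cumulative sum, and an expanding mean is that cumulative sum divided by the number of observations seen so far. The sketch below checks this on the first five sales values and also shows min_periods, which keeps the statistic as NaN until enough history has accumulated (3 is an arbitrary choice here).

```python
import numpy as np
import pandas as pd

sales = pd.Series(
    [88, 78, 64, 92, 57],
    index=pd.date_range(start="2025-07-01", periods=5, freq="D"),
    name="sales",
)

expanding_sum = sales.expanding().sum()
expanding_avg = sales.expanding().mean()

# Equivalent formulations using cumulative operations
cumulative_sum = sales.cumsum()
cumulative_avg = sales.cumsum() / np.arange(1, len(sales) + 1)

# Only report the running average once at least 3 observations exist
warm_avg = sales.expanding(min_periods=3).mean()

print(pd.DataFrame({"expanding_avg": expanding_avg, "cumulative_avg": cumulative_avg, "warm_avg": warm_avg}))
```

As with rolling windows, these statistics include the current row, so shift first if the feature must only use past values.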

5. Measuring Time Between Events

Sometimes the time elapsed since the last significant event, or between consecutive data points, can be a valuable feature. You can calculate the difference between consecutive timestamps by calling diff() on the index, after converting it to a Series with to_series().

# Our index is daily, so the difference is constant, but this shows the principle
df['time_since_last'] = df.index.to_series().diff().dt.days

print(df[['sales', 'time_since_last']].head())

Output:

sales time_since_last
date
2025-07-01 88 NaN
2025-07-02 78 1.0
2025-07-03 64 1.0
2025-07-04 92 1.0
2025-07-05 57 1.0

While not especially useful for our simple, regularly spaced series, this can become very powerful for irregular time-series data, where the time delta varies.
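A minimal sketch of both ideas. The purchases table is hypothetical (invented dates and amounts) to show an irregular index, and the "significant event" is assumed, purely for illustration, to be any day with sales above 85 in the first five days of our sample data. The second part forward-fills the date of the most recent event and subtracts it from each row's date.

```python
import pandas as pd

# Hypothetical irregular data: purchases recorded only on some days
purchases = pd.DataFrame(
    {"amount": [20, 35, 15, 50]},
    index=pd.to_datetime(["2025-07-01", "2025-07-04", "2025-07-05", "2025-07-12"]),
)
# Gap (in days) between each purchase and the previous one
purchases["days_since_last"] = purchases.index.to_series().diff().dt.days

# Days since the last "significant" event, assumed here to be a day with sales > 85
sales = pd.Series(
    [88, 78, 64, 92, 57],
    index=pd.date_range(start="2025-07-01", periods=5, freq="D"),
)
dates = sales.index.to_series()
last_event = dates.where(sales > 85).ffill()  # most recent event date (NaT before the first)
days_since_event = (dates - last_event).dt.days

print(purchases)
print(days_since_event)
```

Rows before the first event would get NaN, which you can handle the same way as the NaNs produced by lag features.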

6. Encoding Cyclical Features with Sine/Cosine

Cyclical features like day of the week or month of the year present a problem for machine learning models, because the end of each cycle wraps around to the beginning. For example, Sunday (day 6) and the following Monday (day 0) are consecutive days, yet numerically they are as far apart as possible, which can confuse a model. To better handle this, we can map each value onto two dimensions using sine and cosine transformations, which preserves the cyclical nature of the relationship.

# From our earlier section "Extracting Datetime Components"
df['day_of_week'] = df.index.dayofweek
df['month'] = df.index.month

# Day of week has a cycle of 7 days
df['day_of_week_sin'] = np.sin(2 * np.pi * df['day_of_week'] / 7)
df['day_of_week_cos'] = np.cos(2 * np.pi * df['day_of_week'] / 7)

# Month has a cycle of 12 months
df['month_sin'] = np.sin(2 * np.pi * df['month'] / 12)
df['month_cos'] = np.cos(2 * np.pi * df['month'] / 12)

print(df[['sales', 'day_of_week', 'month', 'day_of_week_sin', 'day_of_week_cos', 'month_sin', 'month_cos']].head())

Output:

sales day_of_week month day_of_week_sin day_of_week_cos month_sin month_cos
date
2025-07-01 88 1 7 0.781831 0.623490 -0.5 -0.866025
2025-07-02 78 2 7 0.974928 -0.222521 -0.5 -0.866025
2025-07-03 64 3 7 0.433884 -0.900969 -0.5 -0.866025
2025-07-04 92 4 7 -0.433884 -0.900969 -0.5 -0.866025
2025-07-05 57 5 7 -0.974928 -0.222521 -0.5 -0.866025

This transformation helps models understand that December (month 12) is just as close to January (month 1) as February (month 2) is.
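A quick numerical check of that claim: on the raw scale, December and January are 11 apart while January and February are 1 apart, but after the sine/cosine encoding both pairs sit the same Euclidean distance apart on the unit circle (the chord for one twelfth of a turn, 2·sin(π/12), about 0.518). The helper encode_month is a hypothetical name introduced for this sketch.

```python
import numpy as np

def encode_month(month):
    """Map a month number (1-12) onto the unit circle as (sin, cos)."""
    angle = 2 * np.pi * month / 12
    return np.array([np.sin(angle), np.cos(angle)])

# Euclidean distance between encoded months
dec_to_jan = np.linalg.norm(encode_month(12) - encode_month(1))
jan_to_feb = np.linalg.norm(encode_month(1) - encode_month(2))

print(f"Raw distance Dec->Jan: {abs(12 - 1)}, Jan->Feb: {abs(1 - 2)}")
print(f"Encoded distance Dec->Jan: {dec_to_jan:.4f}, Jan->Feb: {jan_to_feb:.4f}")
```

Both components are needed: sine alone maps two different months to the same value (for example, months 1 and 5), and cosine breaks that tie.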

7. Creating Interaction Features

Finally, let's look at how we can create interaction features by combining two or more existing features, which can help capture more complex relationships. For example, a model might benefit from knowing whether it is a "weekday morning" versus a "weekend morning." Here, we combine each day's sales with its 3-day rolling mean to measure how far that day deviates from the recent trend.

# From our earlier section "Calculating Rolling Window Statistics"
df['rolling_mean_3'] = df['sales'].rolling(window=3).mean()

# A feature for the difference between a day's sales and the 3-day rolling average
df['sales_vs_rolling_mean'] = df['sales'] - df['rolling_mean_3']

print(df[['sales', 'rolling_mean_3', 'sales_vs_rolling_mean']].head())

Output:

sales rolling_mean_3 sales_vs_rolling_mean
date
2025-07-01 88 NaN NaN
2025-07-02 78 NaN NaN
2025-07-03 64 76.666667 -12.666667
2025-07-04 92 78.000000 14.000000
2025-07-05 57 71.000000 -14.000000

The possibilities for such interaction features are endless. The greater your domain knowledge and creativity, the more insightful these features can become.
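Our data is daily, with no time of day, so the closest analogue to the "weekday versus weekend" idea is a weekend flag multiplied by another feature. The sketch below, using the first five sales values, builds a hypothetical weekend_x_lag_1 feature that carries the previous day's sales only on weekend days and is 0 otherwise; which features to combine is an illustrative choice, not a recommendation.

```python
import pandas as pd

# First five days of the sample data (July 5, 2025 is a Saturday)
sales = pd.Series(
    [88, 78, 64, 92, 57],
    index=pd.date_range(start="2025-07-01", periods=5, freq="D"),
    name="sales",
)
df = sales.to_frame()

# Binary flag: 1 on Saturday (5) or Sunday (6), 0 otherwise
df["is_weekend"] = (df.index.dayofweek >= 5).astype(int)

# Interaction: yesterday's sales, "switched on" only for weekend days
df["weekend_x_lag_1"] = df["is_weekend"] * df["sales"].shift(1)

print(df)
```

Tree-based models can often discover such combinations on their own, but explicit interactions tend to help linear models, which cannot represent them otherwise.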

Wrapping Up

Time-series feature engineering is equal parts art and science. Domain expertise is undeniably invaluable, but so is a strong command of tools like Pandas, which provide the foundation for creating features that can improve model performance and ultimately solve problems.

The seven tricks covered here, from extracting datetime components to creating complex interactions, are powerful building blocks for any time-series analysis or forecasting task. By taking advantage of Pandas and its powerful time-series capabilities, you can more effectively uncover the hidden patterns within your temporal data.
