7 Pandas Tricks for Time-Series Feature Engineering
Image by Editor | ChatGPT
Introduction
Feature engineering is one of the most important steps in building effective machine learning models, and it is no less important when dealing with time-series data. By creating meaningful features from temporal data, you can unlock predictive power that is unavailable from raw timestamps alone.
Fortunately, Pandas offers a powerful and flexible set of operations for manipulating time-series data and creating features from it.
This article explores 7 practical Pandas tricks that can help transform your time-series data, which can in turn lead to better models and more powerful predictions. We'll use a simple, synthetic dataset to illustrate each technique, allowing you to quickly grasp the concepts and apply them to your own projects.
Setting Up Our Data
First, let's create a sample time-series DataFrame. This dataset will represent daily sales data over a period of time, which we'll use for all subsequent examples.
import pandas as pd
import numpy as np
# Set a random seed for reproducibility
np.random.seed(42)
# Create a date range
date_range = pd.date_range(start='2025-07-01', end='2025-07-30', freq='D')
# Create a sample DataFrame
df = pd.DataFrame(date_range, columns=['date'])
df['sales'] = np.random.randint(50, 100, size=len(date_range))
df = df.set_index('date')
# df.size counts cells; with a single column it equals the number of rows
print(f"Dataset size: {df.size}")
print(df.head())
Output:
Dataset size: 30
            sales
date
2025-07-01     88
2025-07-02     78
2025-07-03     64
2025-07-04     92
2025-07-05     57
We have created a small dataset with an entry for each of the first 30 days of July 2025, each with a randomly assigned sales value. Note that your data will look the same as mine above if you use np.random.seed(42).
With our data ready, we can now explore several techniques for creating insightful features.
1. Extracting Datetime Components
One of the simplest yet most useful time-series feature engineering techniques is to break down the datetime object into its constituent components. These components can capture seasonality and trends at different granularities (such as day of the week, month of the year, and so on). Pandas makes this very easy: a datetime column exposes these attributes through the .dt accessor, and a DatetimeIndex, as in our DataFrame, exposes them directly.
df['day_of_week'] = df.index.dayofweek  # Monday=0, Sunday=6
df['day_of_year'] = df.index.dayofyear
df['month'] = df.index.month
df['quarter'] = df.index.quarter
df['week_of_year'] = df.index.isocalendar().week  # ISO 8601 week number
print(df.head())
Output:
            sales  day_of_week  day_of_year  month  quarter  week_of_year
date
2025-07-01     88            1          182      7        3            27
2025-07-02     78            2          183      7        3            27
2025-07-03     64            3          184      7        3            27
2025-07-04     92            4          185      7        3            27
2025-07-05     57            5          186      7        3            27
We now have day of week, day of year, month, quarter, and week of year values for each of our entries. These new features can help a model learn patterns related to weekly cycles (such as higher sales on weekends) or annual seasonality. A good place to start.
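When the dates live in an ordinary column rather than in the index, the same attributes are reached through the .dt accessor. The following is a minimal sketch under that assumption; the `events` frame and the `is_weekend` flag are illustrative names, not part of the dataset above.

```python
import pandas as pd

# Hypothetical frame: dates stored in a regular column, not the index
events = pd.DataFrame(
    {"date": pd.date_range(start="2025-07-01", periods=7, freq="D")}
)

# On a datetime column, the components are exposed through .dt
events["day_of_week"] = events["date"].dt.dayofweek  # Monday=0, Sunday=6
events["day_name"] = events["date"].dt.day_name()

# A simple derived flag: 1 for Saturday or Sunday, 0 otherwise
events["is_weekend"] = (events["day_of_week"] >= 5).astype(int)

print(events)
```

A binary weekend flag like this is often easier for a model to use than the raw day number when the only pattern that matters is weekday versus weekend.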
2. Creating Lag Features
Lag features are values from previous time steps. They're essential in time-series forecasting because they represent the state of the system in the past, which is often highly predictive of the future. The shift() method is perfect for this.
# Create a lag feature for sales from the previous day
df['sales_lag_1'] = df['sales'].shift(1)
# Create a lag feature for sales from 3 days ago
df['sales_lag_3'] = df['sales'].shift(3)
# Show only the columns relevant to this section
print(df[['sales', 'sales_lag_1', 'sales_lag_3']].head())
Output:
            sales  sales_lag_1  sales_lag_3
date
2025-07-01     88          NaN          NaN
2025-07-02     78         88.0          NaN
2025-07-03     64         78.0          NaN
2025-07-04     92         64.0         88.0
2025-07-05     57         92.0         78.0
Note that shifting has created several NaN values at the beginning of the series, since the first rows have no earlier observations to pull from. You'll need to handle these before modeling by either filling or dropping them.
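The two ways of handling those leading NaN values can be sketched as follows. This is only an illustration: the `sales` frame rebuilds the sample data so the snippet runs on its own, and filling with the column mean is an assumed choice, not a recommendation from the original text.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data so this snippet is self-contained
np.random.seed(42)
idx = pd.date_range(start="2025-07-01", end="2025-07-30", freq="D")
sales = pd.DataFrame(
    {"sales": np.random.randint(50, 100, size=len(idx))}, index=idx
)
sales["sales_lag_1"] = sales["sales"].shift(1)
sales["sales_lag_3"] = sales["sales"].shift(3)

# Option 1: drop every row that is missing at least one lag value
dropped = sales.dropna(subset=["sales_lag_1", "sales_lag_3"])

# Option 2: fill the gaps with a constant (here the overall mean, an assumption;
# in a real pipeline the fill value should come from training data only)
fill_value = sales["sales"].mean()
filled = sales.fillna({"sales_lag_1": fill_value, "sales_lag_3": fill_value})

print(len(sales), len(dropped), len(filled))
```

Dropping loses as many rows as the largest lag, which matters little on long series but can be costly on short ones such as this 30-day sample.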
3. Calculating Rolling Window Statistics
Rolling window calculations, of which the moving average is the best-known example, are helpful for smoothing out short-term fluctuations and highlighting longer-term trends. You can easily calculate various statistics like the mean, median, or standard deviation over a fixed-size window using the rolling() method.
# Calculate the 3-day rolling mean of sales
df['rolling_mean_3'] = df['sales'].rolling(window=3).mean()
# Calculate the 3-day rolling standard deviation
df['rolling_std_3'] = df['sales'].rolling(window=3).std()
# Show only the columns relevant to this section
print(df[['sales', 'rolling_mean_3', 'rolling_std_3']].head())
Output:
            sales  rolling_mean_3  rolling_std_3
date
2025-07-01     88             NaN            NaN
2025-07-02     78             NaN            NaN
2025-07-03     64       76.666667      12.055428
2025-07-04     92       78.000000      14.000000
2025-07-05     57       71.000000      18.520259
These new features can help provide insight into the recent trend and volatility of the series.
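Two variations on the rolling window are worth knowing, sketched here on the first five sales values from the sample data. The variable names are illustrative. Note that the default window includes the current row, so when the target is the current day's sales, one common approach (an assumption about your modeling setup) is to shift first so the window only covers earlier days.

```python
import pandas as pd

# First five sales values from the sample dataset
s = pd.Series(
    [88, 78, 64, 92, 57],
    index=pd.date_range(start="2025-07-01", periods=5, freq="D"),
    name="sales",
)

# Default behaviour: NaN until the window holds 3 observations
full_window = s.rolling(window=3).mean()

# min_periods=1 allows partial windows, so the first rows are not NaN
partial_window = s.rolling(window=3, min_periods=1).mean()

# Shift by one day first so each window contains only previous days
past_only = s.shift(1).rolling(window=3).mean()

print(pd.DataFrame({
    "sales": s,
    "full_window": full_window,
    "partial_window": partial_window,
    "past_only": past_only,
}))
```

The past-only version trades one extra NaN row for a feature that never sees the value it is meant to help predict.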
4. Generating Expanding Window Statistics
In contrast to a rolling window, an expanding window includes all the data from the very start of the time series up to the current point in time. This can be useful for capturing statistics that accumulate over time, such as running totals and overall averages. This is achieved with the expanding() method.
# Calculate the expanding sum of sales
df['expanding_sum'] = df['sales'].expanding().sum()
# Calculate the expanding average of sales
df['expanding_avg'] = df['sales'].expanding().mean()
# Show only the columns relevant to this section
print(df[['sales', 'expanding_sum', 'expanding_avg']].head())
Output:
            sales  expanding_sum  expanding_avg
date
2025-07-01     88           88.0      88.000000
2025-07-02     78          166.0      83.000000
2025-07-03     64          230.0      76.666667
2025-07-04     92          322.0      80.500000
2025-07-05     57          379.0      75.800000
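As a small sanity check on what an expanding window computes, the sketch below uses the first five sales values from the sample data. It shows that the expanding sum matches a cumulative sum, and that other aggregations such as a running maximum work the same way. The variable names are illustrative.

```python
import pandas as pd

# First five sales values from the sample dataset
s = pd.Series(
    [88, 78, 64, 92, 57],
    index=pd.date_range(start="2025-07-01", periods=5, freq="D"),
    name="sales",
)

# Running total via an expanding window
running_total = s.expanding().sum()

# The same running total via cumsum (integer dtype instead of float)
cumulative = s.cumsum()

# Highest sales value seen so far
running_max = s.expanding().max()

print(pd.DataFrame({
    "sales": s,
    "running_total": running_total,
    "cumulative": cumulative,
    "running_max": running_max,
}))
```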
5. Measuring Time Between Events
Often, the time elapsed since the last event of significance or between consecutive data points can be a valuable feature. You can calculate the difference between consecutive timestamps by converting the index to a Series and calling diff() on it.
# Our index is daily, so the difference is constant, but this shows the principle
df['time_since_last'] = df.index.to_series().diff().dt.days
# Show only the columns relevant to this section
print(df[['sales', 'time_since_last']].head())
Output:
            sales  time_since_last
date
2025-07-01     88              NaN
2025-07-02     78              1.0
2025-07-03     64              1.0
2025-07-04     92              1.0
2025-07-05     57              1.0
While not particularly useful for our simple regular series, this can become very powerful for irregular time-series data where the time delta varies.
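To see the feature vary, here is a minimal sketch on an irregular series. The timestamps and amounts are invented purely for illustration. It also shows that .dt.days keeps only whole days, so gaps shorter than a day collapse to 0, whereas .dt.total_seconds() preserves the full gap.

```python
import pandas as pd

# Hypothetical irregular event log (values invented for illustration)
events = pd.DataFrame(
    {"amount": [20, 35, 15, 50]},
    index=pd.to_datetime([
        "2025-07-01 09:00",
        "2025-07-01 15:00",
        "2025-07-03 09:00",
        "2025-07-08 21:00",
    ]),
)

# Gap between each event and the one before it
gap = events.index.to_series().diff()

# Whole days only: a 6-hour gap becomes 0
events["days_since_last"] = gap.dt.days

# Full resolution, expressed in hours
events["hours_since_last"] = gap.dt.total_seconds() / 3600

print(events)
```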
6. Encoding Cyclical Options with Sine/Cosine
Cyclical features like day of the week or month of the year present a problem for machine learning models. The two ends of the cycle are adjacent in time but far apart numerically: with Monday encoded as 0 and Sunday as 6, Sunday appears to be six steps away from the Monday that follows it, which can cause confusion. To better handle this, we can transform such features into two dimensions using sine and cosine transformations, which preserves the cyclical nature of the relationship.
# From our earlier section "Extracting Datetime Components"
df['day_of_week'] = df.index.dayofweek
df['month'] = df.index.month
# Day of week has a cycle of 7 days
df['day_of_week_sin'] = np.sin(2 * np.pi * df['day_of_week'] / 7)
df['day_of_week_cos'] = np.cos(2 * np.pi * df['day_of_week'] / 7)
# Month has a cycle of 12 months
df['month_sin'] = np.sin(2 * np.pi * df['month'] / 12)
df['month_cos'] = np.cos(2 * np.pi * df['month'] / 12)
# Show only the columns relevant to this section
cols = ['sales', 'day_of_week', 'month', 'day_of_week_sin', 'day_of_week_cos', 'month_sin', 'month_cos']
print(df[cols].head())
Output:
            sales  day_of_week  month  day_of_week_sin  day_of_week_cos  month_sin  month_cos
date
2025-07-01     88            1      7         0.781831         0.623490       -0.5  -0.866025
2025-07-02     78            2      7         0.974928        -0.222521       -0.5  -0.866025
2025-07-03     64            3      7         0.433884        -0.900969       -0.5  -0.866025
2025-07-04     92            4      7        -0.433884        -0.900969       -0.5  -0.866025
2025-07-05     57            5      7        -0.974928        -0.222521       -0.5  -0.866025
This transformation helps models understand that December (month 12) is just as close to January (month 1) as February (month 2) is.
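That claim about distances can be checked numerically. The sketch below is one way to do it, using a hypothetical helper named `encode_cyclical` (not a Pandas or NumPy function) and comparing Euclidean distances between encoded days of the week.

```python
import numpy as np

def encode_cyclical(values, period):
    """Hypothetical helper: map values on a cycle to (sin, cos) pairs."""
    angle = 2 * np.pi * np.asarray(values) / period
    return np.stack([np.sin(angle), np.cos(angle)], axis=-1)

# Encode Monday=0 through Sunday=6
days = encode_cyclical(np.arange(7), period=7)

# Distances in the encoded space
dist_sun_mon = np.linalg.norm(days[6] - days[0])  # wraps around the cycle
dist_mon_tue = np.linalg.norm(days[0] - days[1])  # ordinary neighbours
dist_mon_thu = np.linalg.norm(days[0] - days[3])  # three days apart

print(dist_sun_mon, dist_mon_tue, dist_mon_thu)
```

Sunday to Monday comes out at the same distance as Monday to Tuesday, while Monday to Thursday is larger, which is exactly the ordering the raw integers 0 to 6 fail to express.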
7. Creating Interaction Features
Finally, let's take a look at how we can create interaction features by combining two or more existing features, which can help capture more complex relationships. For example, a model might benefit from knowing whether it's a "weekday morning" versus a "weekend morning." Our daily data has no time of day, so the example here instead measures how far each day's sales sit from the recent average.
# From our earlier section "Calculating Rolling Window Statistics"
df['rolling_mean_3'] = df['sales'].rolling(window=3).mean()
# A feature for the difference between a day's sales and the 3-day rolling average
df['sales_vs_rolling_mean'] = df['sales'] - df['rolling_mean_3']
# Show only the columns relevant to this section
print(df[['sales', 'rolling_mean_3', 'sales_vs_rolling_mean']].head())
Output:
            sales  rolling_mean_3  sales_vs_rolling_mean
date
2025-07-01     88             NaN                    NaN
2025-07-02     78             NaN                    NaN
2025-07-03     64       76.666667             -12.666667
2025-07-04     92       78.000000              14.000000
2025-07-05     57       71.000000             -14.000000
The possibilities for such interaction features are limitless. The greater your domain knowledge and creativity, the more insightful these features can become.
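As a further illustration, here are two other interactions one might try on the same kind of data: a calendar flag multiplied by a lag, and a ratio instead of a difference. This is a sketch only; the frame `df_demo` and the column names are made up for the example, and whether such features help depends on the problem.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data so this snippet is self-contained
np.random.seed(42)
idx = pd.date_range(start="2025-07-01", end="2025-07-30", freq="D")
df_demo = pd.DataFrame(
    {"sales": np.random.randint(50, 100, size=len(idx))}, index=idx
)

# Building blocks: a weekend flag and yesterday's sales
df_demo["is_weekend"] = (df_demo.index.dayofweek >= 5).astype(int)
df_demo["sales_lag_1"] = df_demo["sales"].shift(1)

# Interaction 1: yesterday's sales, active only on weekends (0 on weekdays)
df_demo["weekend_x_lag_1"] = df_demo["is_weekend"] * df_demo["sales_lag_1"]

# Interaction 2: today's sales relative to the 3-day rolling mean
df_demo["sales_to_rolling_mean"] = (
    df_demo["sales"] / df_demo["sales"].rolling(window=3).mean()
)

print(df_demo.head(7))
```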
Wrapping Up
Time-series feature engineering is equal parts art and science. Domain expertise is undeniably invaluable, but so is a strong command of tools like Pandas, which provide the foundation for creating features that can improve model performance and ultimately solve problems.
The seven tricks covered here, from extracting datetime components to creating complex interactions, are powerful building blocks for any time-series analysis or forecasting task. By taking advantage of Pandas and its powerful time-series capabilities, you can more effectively uncover the hidden patterns within your temporal data.


