Pandas time series

Pandas Time Processing Basics

Pandas provides four types of objects for working with dates and times: datetime, Timedelta, Period, and DateOffset.

  • datetime: a specific date and time, with time zone support. It is similar to datetime.datetime in the Python standard library, for example December 12, 2020, 16:07:25.

  • Timedelta: an absolute duration, used to shift a point in time by a fixed amount, such as adding 3 days or subtracting 4 hours.

  • Period: a span of time defined by a point in time and its associated frequency, such as one of the four quarters of a year.

  • DateOffset: a relative duration that follows calendar rules, such as a number of days, weeks, months, quarters, or years.
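As a rough illustration of the four object types listed above, the sketch below creates one of each with pandas. The variable names and the sample values are arbitrary choices made for this example.

```python
import pandas as pd

# A specific point in time (pandas' counterpart of datetime.datetime)
ts = pd.Timestamp("2020-12-12 16:07:25")

# An absolute duration: exactly 3 days
delta = pd.Timedelta(days=3)

# A span of time: the fourth quarter of 2020
period = pd.Period("2020Q4", freq="Q")

# A calendar-aware offset: three months later on the calendar
offset = pd.DateOffset(months=3)

print(ts + delta)    # 2020-12-15 16:07:25
print(ts + offset)   # 2021-03-12 16:07:25
print(period.start_time, period.end_time)
```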


datetime

Get the current time

Getting the current time means reading the date and time at this moment. Besides the full value (year, month, day, hour, minute, second), you can also read individual parts such as the year, month, week, or day.

In Python, the now() function returns the current date and time.

from datetime import datetime
datetime.now()

# datetime.time(datetime.now())   # optional

Things you can do with the datetime object

Return the year, month and day

Returning the year with Python

datetime.now().year

Return the current month in Python

datetime.now().month

Return the current day

datetime.now().day

Returning the week day

Two week-related values are available for the current moment: the day of the week, and the number of the week within the year.

The weekday() function returns the current day of the week.

datetime.now().weekday() + 1

Note

weekday() numbers the days from 0 (Monday) to 6 (Sunday), so 1 is added to get 1 for Monday through 7 for Sunday.

The isocalendar() function returns the ISO year, the ISO week number of the year, and the ISO weekday.

datetime.now().isocalendar()
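Because now() changes on every run, the week-related calls are easier to check against a fixed date. The sketch below uses 15 December 2020 (a Tuesday), chosen here only for illustration, and also shows isoweekday(), which counts from 1 so no manual +1 is needed.

```python
from datetime import datetime

# A fixed date (a Tuesday) so the results are reproducible
d = datetime(2020, 12, 15)

print(d.weekday())      # 1  (Monday is 0)
print(d.isoweekday())   # 2  (Monday is 1)

# isocalendar() can be unpacked into its three parts
iso_year, iso_week, iso_day = d.isocalendar()
print(iso_year, iso_week, iso_day)   # 2020 51 2
```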

Timestamp creates a datetime object

The Pandas library provides a similar datetime object, pd.Timestamp.

import pandas as pd
pd.Timestamp("2020-12-15")
pd.Timestamp(year=2020,month=12,day=15,hour=15,minute=57,second=22) 
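A Timestamp can be used much like a standard datetime object. The following sketch, using the same sample date as above, reads a few of its attributes.

```python
from datetime import datetime
import pandas as pd

ts = pd.Timestamp(year=2020, month=12, day=15, hour=15, minute=57, second=22)

print(isinstance(ts, datetime))    # True: Timestamp subclasses datetime.datetime
print(ts.year, ts.month, ts.day)   # 2020 12 15
print(ts.dayofweek)                # 1 (Monday is 0)
print(ts.day_name())               # Tuesday
```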

Specifying the date and time format

To keep only the date, use the date() function.

datetime.now().date()
# returns, for example: datetime.date(2020, 12, 10)

To keep only the time, use the time() function.

datetime.now().time()

The date and time format can be customized with the strftime() function, which converts a datetime into a string according to a format you specify. Some of the available format codes:

Symbol   Meaning
%Y       Year with century (e.g. 2020)
%m       Month as a zero-padded number (01 to 12)
%d       Day of the month as a zero-padded number (01 to 31)
%H       Hour (24 hour clock)
%I       Hour (12 hour clock)
%M       Minute
%S       Second
%w       Weekday as a decimal number (0 is Sunday, 6 is Saturday)
%U       Week number of the year (Sunday as first day of the week)
%W       Week number of the year (Monday as first day of the week)
%F       Date as %Y-%m-%d (support depends on the platform)
%D       Date as %m/%d/%y (support depends on the platform)

Example use:

datetime.now().strftime("%F %H:%M:%S")

This returns a string such as 2020-12-15 16:00:00.
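Since shortcut codes such as %F and %D are handled by the platform's C library, a more portable approach is to spell the format out in full. The sketch below does this for a fixed date chosen for illustration.

```python
from datetime import datetime

d = datetime(2020, 12, 15, 16, 0, 0)

stamp = d.strftime("%Y-%m-%d %H:%M:%S")   # spelled-out equivalent of "%F %H:%M:%S"
us_date = d.strftime("%m/%d/%y")          # spelled-out equivalent of "%D"
weekday = d.strftime("%w")                # Sunday is 0, so Tuesday is "2"
week_no = d.strftime("%W")                # Monday-based week of the year

print(stamp)     # 2020-12-15 16:00:00
print(us_date)   # 12/15/20
print(weekday)   # 2
print(week_no)   # 50
```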

Time Series

Time series data can be stored in a Series or DataFrame whose index holds the points in time, so that each data element is recorded together with its time point.

Creating a Series from specified time points

Both pd.Timestamp and pd.DatetimeIndex can be used to create a Series indexed by specified time points.

You can build the index from individual pd.Timestamp objects like this:

import pandas as pd
import numpy as np

# define series containing timestamps
t = pd.Series(np.arange(4),index=[pd.Timestamp("2020-12-1"),pd.Timestamp("2020-12-2"),pd.Timestamp("2020-12-3"),pd.Timestamp("2020-12-4")])

# output
print(t)
print("="*16)

This outputs the specified dates:

~$ python3 test.py
2020-12-01    0
2020-12-02    1
2020-12-03    2
2020-12-04    3
dtype: int32
================

You can also build the index with pd.DatetimeIndex, which is a bit easier to read:

import pandas as pd
import numpy as np

# time series
t = pd.Series(np.arange(4),index=pd.DatetimeIndex(["2020-12-1","2020-12-2","2020-12-3","2020-12-4"]))

# output
print(t)
print("="*16)

Warning

You can define every date manually using DatetimeIndex, but it's easy to miss a date when adding a lot of them. I recommend using date_range() instead.
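One possible way to check a hand-typed index for forgotten dates is to compare it with the full daily range between its first and last entries. This is only a sketch; the sample index below deliberately leaves out one day.

```python
import pandas as pd

# Hand-typed index where 2020-12-03 was forgotten
idx = pd.DatetimeIndex(["2020-12-01", "2020-12-02", "2020-12-04"])

# Every calendar day between the first and last entry
full = pd.date_range(start=idx.min(), end=idx.max(), freq="D")

# Dates that are in the full range but not in the hand-typed index
missing = full.difference(idx)
print(missing)   # contains only 2020-12-03
```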

Creating a time series with a time range function

The pd.date_range() function generates a continuous sequence of dates:

import pandas as pd
import numpy as np

# time series
t = pd.Series(np.arange(4),index=pd.date_range(start="2020-12-01",end="2020-12-04"))

# output
print(t)
print("="*16)

This outputs the same series, with the daily frequency (Freq: D) now shown:

2020-12-01    0
2020-12-02    1
2020-12-03    2
2020-12-04    3
Freq: D, dtype: int32
================
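Once a Series has dates in its index, values can be looked up by date. The sketch below rebuilds the same four-day series and selects from it with date strings.

```python
import numpy as np
import pandas as pd

t = pd.Series(np.arange(4), index=pd.date_range(start="2020-12-01", end="2020-12-04"))

# Look up a single day by its date string
one_day = t.loc["2020-12-02"]
print(one_day)   # 1

# Slice a range of days; both endpoints are included
two_days = t.loc["2020-12-02":"2020-12-03"]
print(two_days)
```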

The function pd.date_range returns a DatetimeIndex. You can instantly get all dates between two dates:

print( pd.date_range(start="2021-01-01",end="2021-11-11") )

This returns:

DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
               '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08',
               '2021-01-09', '2021-01-10',
               ...
               '2021-11-02', '2021-11-03', '2021-11-04', '2021-11-05',
               '2021-11-06', '2021-11-07', '2021-11-08', '2021-11-09',
               '2021-11-10', '2021-11-11'],
              dtype='datetime64[ns]', length=315, freq='D')
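date_range() does not need an end date. As a small sketch, the periods and freq parameters can describe the range instead; the dates below are arbitrary examples.

```python
import pandas as pd

# Four consecutive days, given a start and a count instead of an end date
days = pd.date_range(start="2020-12-01", periods=4, freq="D")

# Every second day
every_other = pd.date_range(start="2020-12-01", periods=3, freq="2D")

# Business days only: Friday the 4th, then Monday the 7th and Tuesday the 8th
bdays = pd.date_range(start="2020-12-04", end="2020-12-08", freq="B")

print(days)
print(every_other)
print(bdays)
```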

The parse() function

The parse() function from the dateutil library converts a string into a datetime object.

from dateutil.parser import parse
time ="2020-12-15"
print(type(time))
print("="*16)
print(parse(time))
print("="*16)
print(type(parse(time)))

In the output you see the datatype has changed:

<class 'str'>
================
2020-12-15 00:00:00
================
<class 'datetime.datetime'>
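Within pandas itself, pd.to_datetime() plays a similar role and also accepts lists of strings. The sketch below is an alternative to parse(), not a replacement for it; the day-first sample strings are made up for the example.

```python
import pandas as pd

# A single string becomes a Timestamp
single = pd.to_datetime("2020-12-15")
print(type(single))

# An explicit format avoids ambiguity for day-first strings
many = pd.to_datetime(["15/12/2020", "16/12/2020"], format="%d/%m/%Y")
print(many)
```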

The str() function

The str() function converts a datetime object into a string.

from datetime import datetime

now = str(datetime.now())
type(now)   # <class 'str'>
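The string produced by str() can be turned back into a datetime. A minimal sketch of the round trip, using the standard library's datetime.fromisoformat() and a fixed sample date:

```python
from datetime import datetime

d = datetime(2020, 12, 15, 16, 0, 0)

text = str(d)                            # datetime -> string
restored = datetime.fromisoformat(text)  # string -> datetime

print(text)            # 2020-12-15 16:00:00
print(restored == d)   # True
```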

Timedelta

Timedelta is a subclass of datetime.timedelta and is used for time increment calculations.

String form

A string can be used to increase or decrease the date or the time.

import pandas as pd
import datetime as dt
import numpy as np
today = dt.datetime.now()
print(today)
print("="*16)
today.date() + pd.Timedelta("3 day") 

You can also shift by hours, in either direction:

today + pd.Timedelta("3 hours") 
today + pd.Timedelta("-3 hours") 

Value and unit form

You can also pass a number together with a unit:

today + pd.Timedelta(2, unit="hours")
today + pd.Timedelta(2, unit="W")

Keyword argument form

You can pass weeks directly as a keyword argument, or convert a numpy.timedelta64 value:

today + pd.Timedelta(weeks=2)
today + pd.Timedelta(np.timedelta64(2,"W"))
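A Timedelta is also what you get back when two points in time are subtracted. The sketch below uses two arbitrary sample timestamps to show this.

```python
import pandas as pd

start = pd.Timestamp("2020-12-15 08:00")
end = pd.Timestamp("2020-12-18 20:30")

# Subtracting two Timestamps yields a Timedelta
elapsed = end - start

print(elapsed)                   # 3 days 12:30:00
print(elapsed.days)              # 3
print(elapsed.total_seconds())   # 304200.0
```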

Time Span

Period represents a range of time, such as a day, a month, a quarter, a year, etc.

A time period is represented by the pd.Period object in pandas. pd.period_range() generates a continuous sequence of periods as a PeriodIndex.

Time period creation

Creating a period gives flexible control over spans of time such as a year or a month.

Adding an integer moves the period forward by that many units of its frequency:

>>> import pandas as pd
>>> N = pd.Period("2021-01",freq="M")
>>> N
Period('2021-01', 'M')
>>> N+2
Period('2021-03', 'M')
>>> N+3
Period('2021-04', 'M')
>>> N+4
Period('2021-05', 'M')
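Because a Period covers a span rather than a single instant, it has a start and an end. The following sketch inspects the same January 2021 period.

```python
import pandas as pd

p = pd.Period("2021-01", freq="M")

print(p.start_time)        # 2021-01-01 00:00:00
print(p.end_time.date())   # 2021-01-31
print(p.days_in_month)     # 31

# Convert the month to the quarter that contains it
quarter = p.asfreq("Q")
print(quarter)             # 2021Q1
```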

Time Period Sequence

A time sequence can be generated with the pd.period_range() function

N = pd.period_range("2021-01-01","2021-12-12",freq="M")

This outputs every month in the range, because we used freq="M":

PeriodIndex(['2021-01', '2021-02', '2021-03', '2021-04', '2021-05', '2021-06',
             '2021-07', '2021-08', '2021-09', '2021-10', '2021-11', '2021-12'],
            dtype='period[M]', freq='M')

If you want quarters instead, change it to freq="Q":

N = pd.period_range("2021-01-01","2021-12-12",freq="Q")

It will output the quarters of the year:

PeriodIndex(['2021Q1', '2021Q2', '2021Q3', '2021Q4'], dtype='period[Q-DEC]', freq='Q-DEC')

DateOffset

A DateOffset is the standard kind of date increment used for date ranges.

Note

Date offsets follow calendar rules more closely than Timedelta. When adding days, DateOffset always lands on the same time of day on the target date, ignoring differences caused by daylight saving time, while Timedelta adds exactly 24 hours per day.

>>> import pandas as pd
>>> t1 = pd.Timestamp("2020-12-15")
>>> t1 = t1 + pd.DateOffset(n=2, months=4)  # n=2 applies the 4-month offset twice: 8 months
>>> t1
Timestamp('2021-08-15 00:00:00')
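The difference described in the note shows up around a daylight saving change. The sketch below assumes the America/New_York time zone, where clocks moved forward on 14 March 2021; the zone and dates are chosen only for illustration.

```python
import pandas as pd

# Noon on the day before the clocks moved forward
ts = pd.Timestamp("2021-03-13 12:00", tz="America/New_York")

by_timedelta = ts + pd.Timedelta(days=1)    # exactly 24 hours later
by_offset = ts + pd.DateOffset(days=1)      # same wall-clock time, next day

print(by_timedelta)   # 2021-03-14 13:00:00-04:00
print(by_offset)      # 2021-03-14 12:00:00-04:00

# Month arithmetic clips to the last valid day of the target month
month_later = pd.Timestamp("2021-01-31") + pd.DateOffset(months=1)
print(month_later)    # 2021-02-28 00:00:00
```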