Pandas time series
Pandas Time Processing Basics
Python Pandas provides four kinds of objects for working with dates and times:
- DateTime (pd.Timestamp): a specific date and time with time zone support, similar to datetime.datetime in the Python standard library. For example, December 12, 2020, 16:07:25.
- Timedelta: an absolute duration, used to shift a point in time by a fixed amount, such as adding 3 days or subtracting 4 hours. Calendar-based steps such as months belong to DateOffset.
- Period: a span of time defined by a point in time and its associated frequency, such as a month or a quarter. A sequence of periods can represent, for example, the four quarters of a year.
- DateOffset: a relative duration that follows calendar rules, such as one day, week, month, quarter, or year.
datetime
Get the current time
Getting the current time means retrieving the date and time at this moment. Besides the full value (year, month, day, hour, minute, second), you can also look at individual fields such as the year, month, week, and day.
In Python, the current date and time is returned by datetime.now().
from datetime import datetime
datetime.now()
# datetime.now().time()  # optional: only the time of day
Things you can do with the datetime object
Return the year, month and day
Returning the year with Python
datetime.now().year
Return the current month in Python
datetime.now().month
Return the current day
datetime.now().day
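Because now() returns a different value on every run, here is a small sketch that uses a fixed datetime instead (the December 12, 2020, 16:07:25 example from the introduction), so the values are predictable. The attributes are the same ones used on datetime.now() above.

```python
from datetime import datetime

# A fixed moment instead of datetime.now(), so the output is reproducible
moment = datetime(2020, 12, 12, 16, 7, 25)

print(moment.year)    # 2020
print(moment.month)   # 12
print(moment.day)     # 12
print(moment.hour, moment.minute, moment.second)  # 16 7 25
```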
Return the day of the week
Two week-related values are available for the current moment: the day of the week, and the number of the week within the year.
The day of the week is returned by the weekday() method.
datetime.now().weekday() + 1
Note
weekday() counts from 0 (Monday = 0, Sunday = 6), so add 1 to get Monday = 1 through Sunday = 7.
The week number of the year is returned by isocalendar(), which gives the ISO year, the ISO week number, and the ISO weekday.
datetime.now().isocalendar()
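To see the different week numberings side by side, this sketch uses a fixed date, December 15, 2020 (a Tuesday). It also shows isoweekday(), a standard-library method that already counts Monday as 1, so no "+ 1" is needed.

```python
from datetime import date

d = date(2020, 12, 15)  # a Tuesday

print(d.weekday())      # 1  (Monday = 0, Sunday = 6)
print(d.weekday() + 1)  # 2  (shifted so Monday = 1)
print(d.isoweekday())   # 2  (ISO numbering: Monday = 1, Sunday = 7)

iso = d.isocalendar()   # (ISO year, ISO week number, ISO weekday)
print(iso[1])           # 51
```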
Timestamp creates a datetime object
The pandas library provides a similar datetime object, pd.Timestamp. It can be created from a string or from keyword arguments:
import pandas as pd
pd.Timestamp("2020-12-15")
pd.Timestamp(year=2020,month=12,day=15,hour=15,minute=57,second=22)
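pd.Timestamp is a subclass of datetime.datetime, so the attributes shown earlier work on it as well, and it adds pandas helpers such as day_name(). A short sketch comparing the two construction styles above:

```python
from datetime import datetime
import pandas as pd

ts = pd.Timestamp(year=2020, month=12, day=15, hour=15, minute=57, second=22)

# Timestamp subclasses datetime.datetime, so the same attributes work
print(isinstance(ts, datetime))   # True
print(ts.year, ts.month, ts.day)  # 2020 12 15
print(ts.day_name())              # Tuesday

# The keyword form and the string form describe the same moment
print(ts == pd.Timestamp("2020-12-15 15:57:22"))  # True
```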
Specifying the date and time format
Use the date() method to keep only the date part:
datetime.now().date()
# e.g. datetime.date(2020, 12, 10)
Use the time() method to keep only the time part:
datetime.now().time()
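Here is a sketch with a fixed datetime so the results are predictable. It also uses datetime.combine(), which goes the other way and joins a date and a time back into one datetime.

```python
from datetime import datetime

moment = datetime(2020, 12, 10, 16, 7, 25)

d = moment.date()  # datetime.date(2020, 12, 10)
t = moment.time()  # datetime.time(16, 7, 25)
print(d, t)        # 2020-12-10 16:07:25

# combine() rebuilds the full datetime from the two parts
print(datetime.combine(d, t) == moment)  # True
```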
The strftime() method converts a date and time into a string in a custom format. Common format codes include:
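Symbol | Meaning |
---|---|
%Y | Year with century (e.g. 2020) |
%m | Month as a zero-padded number (01 to 12) |
%d | Day of the month as a zero-padded number (01 to 31) |
%H | Hour (24-hour clock) |
%I | Hour (12-hour clock) |
%M | Minute |
%S | Second |
%w | Weekday as a decimal number (0 = Sunday, 6 = Saturday) |
%U | Week number of the year (Sunday as the first day of the week) |
%W | Week number of the year (Monday as the first day of the week) |
%F | Date as %Y-%m-%d (platform-dependent) |
%D | Date as %m/%d/%y (platform-dependent) |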
Example use:
datetime.now().strftime("%F %H:%M:%S")
# portable equivalent: datetime.now().strftime("%Y-%m-%d %H:%M:%S")
This returns a string such as 2020-12-15 16:00:00.
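Because %F and %D come from the platform's C library, a safer sketch spells the date out with codes from the 1989 C standard, which Python documents as working on all platforms. The moment below is fixed (December 15, 2020, a Tuesday) so the results can be checked.

```python
from datetime import datetime

moment = datetime(2020, 12, 15, 16, 0, 0)  # a Tuesday

print(moment.strftime("%Y-%m-%d %H:%M:%S"))  # 2020-12-15 16:00:00
print(moment.strftime("%I:%M"))              # 04:00 (12-hour clock)
print(moment.strftime("%w"))                 # 2 (Sunday = 0)
print(moment.strftime("%d/%m/%y"))           # 15/12/20
```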
Time Series
In a time series, the index of a Series or DataFrame records the time points, and the values hold the data recorded at each of those time points.
Creating a time series from specified time points
Use pd.Timestamp or pd.DatetimeIndex to create a Series indexed by specific time points.
You can pass a list of pd.Timestamp objects as the index of a pd.Series, like this:
import pandas as pd
import numpy as np
# define series containing timestamps
t = pd.Series(np.arange(4),index=[pd.Timestamp("2020-12-1"),pd.Timestamp("2020-12-2"),pd.Timestamp("2020-12-3"),pd.Timestamp("2020-12-4")])
# output
print(t)
print("="*16)
This outputs the specified dates:
~$ python3 test.py
2020-12-01 0
2020-12-02 1
2020-12-03 2
2020-12-04 3
dtype: int32
================
You can also define it with a DatetimeIndex, which is a bit easier to read:
import pandas as pd
import numpy as np
# time series
t = pd.Series(np.arange(4),index=pd.DatetimeIndex(["2020-12-1","2020-12-2","2020-12-3","2020-12-4"]))
# output
print(t)
print("="*16)
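Once the index holds timestamps, values can be selected by date. A minimal sketch using the same four-day series: a single Timestamp returns one value, and a slice of date strings returns a range (label slices include both endpoints).

```python
import numpy as np
import pandas as pd

t = pd.Series(np.arange(4), index=pd.DatetimeIndex(["2020-12-01", "2020-12-02", "2020-12-03", "2020-12-04"]))

# Look up a single value by its timestamp
print(t[pd.Timestamp("2020-12-02")])  # 1

# Slice with date strings; both endpoints are included for label slices
print(t["2020-12-02":"2020-12-03"].tolist())  # [1, 2]
```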
Warning
You can define every date manually using DatetimeIndex, but it's easy to miss a date when adding a lot of them. Instead, I recommend using date_range().
Time range function to create a time series
The pd.date_range() function generates a continuous sequence of dates (daily by default):
import pandas as pd
import numpy as np
# time series
t = pd.Series(np.arange(4),index=pd.date_range(start="2020-12-01",end="2020-12-04"))
# output
print(t)
print("="*16)
This outputs the same values; the index now also records its frequency (D, daily):
2020-12-01 0
2020-12-02 1
2020-12-03 2
2020-12-04 3
Freq: D, dtype: int32
================
The function pd.date_range returns a DatetimeIndex. You can instantly get all dates between two dates:
print( pd.date_range(start="2021-01-01",end="2021-11-11") )
This returns:
DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
'2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08',
'2021-01-09', '2021-01-10',
...
'2021-11-02', '2021-11-03', '2021-11-04', '2021-11-05',
'2021-11-06', '2021-11-07', '2021-11-08', '2021-11-09',
'2021-11-10', '2021-11-11'],
dtype='datetime64[ns]', length=315, freq='D')
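Instead of an end date, date_range() can take periods= (how many dates to generate), and freq= changes the step. A small sketch; note that the weekly alias "W" is anchored on Sundays (it is short for "W-SUN"), so the first date is the first Sunday on or after the start.

```python
import pandas as pd

# periods= instead of end=: four consecutive days starting on 2020-12-01
days = pd.date_range(start="2020-12-01", periods=4)
print(days.equals(pd.date_range(start="2020-12-01", end="2020-12-04")))  # True

# freq="W" steps weekly, anchored on Sundays
weeks = pd.date_range(start="2021-01-01", periods=3, freq="W")
print([d.strftime("%Y-%m-%d") for d in weeks])  # ['2021-01-03', '2021-01-10', '2021-01-17']
```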
The parse() function
The parse() function from the dateutil package (installed as a pandas dependency) converts a string to a datetime object:
from dateutil.parser import parse
time ="2020-12-15"
print(type(time))
print("="*16)
print(parse(time))
print("="*16)
print(type(parse(time)))
The output shows that the type has changed from str to datetime.datetime:
<class 'str'>
================
2020-12-15 00:00:00
================
<class 'datetime.datetime'>
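parse() also accepts less rigid strings, for example with month names and a time of day. In pandas code, pd.to_datetime() does the same kind of conversion and returns pandas types: a Timestamp for one string and a DatetimeIndex for a list. A short sketch:

```python
from datetime import datetime
from dateutil.parser import parse
import pandas as pd

# parse() understands month names and a time of day
print(parse("15 Dec 2020 16:00"))  # 2020-12-15 16:00:00

# pd.to_datetime() on a single string gives a Timestamp
print(pd.to_datetime("2020-12-15"))  # 2020-12-15 00:00:00

# a list of strings gives a DatetimeIndex
idx = pd.to_datetime(["2020-12-15", "2020-12-16"])
print(type(idx).__name__)  # DatetimeIndex
```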
The str() function converts a time to a string
Use the str() function to convert a datetime object to a string:
now = str(datetime.now())
type(now)
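For a datetime, str() gives the same result as isoformat() with a space as the separator; isoformat() on its own uses a "T" between date and time. A quick sketch with a fixed moment:

```python
from datetime import datetime

moment = datetime(2020, 12, 15, 16, 0, 0)

print(str(moment))         # 2020-12-15 16:00:00
print(moment.isoformat())  # 2020-12-15T16:00:00
print(str(moment) == moment.isoformat(sep=" "))  # True
```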
Timedelta
pd.Timedelta is a subclass of datetime.timedelta and represents a duration; adding it to or subtracting it from a date or time shifts that point by the given amount.
String form
A Timedelta can be created from a string to increase or decrease a date or time:
import pandas as pd
import datetime as dt
import numpy as np
today = dt.datetime.now()
print(today)
print("="*16)
today.date() + pd.Timedelta("3 day")
Hours work the same way, and a negative value subtracts:
today + pd.Timedelta("3 hours")
today + pd.Timedelta("-3 hours")
Numeric form
You can also pass a number together with a unit:
today + pd.Timedelta(2, unit="hours")
today + pd.Timedelta(2, unit="W")
You can also specify weeks as a keyword argument, or convert a NumPy timedelta64:
today + pd.Timedelta(weeks=2)
today + pd.Timedelta(np.timedelta64(2,"W"))
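All of these forms build the same kind of object, so they can be compared directly. This sketch checks that the string, unit, keyword, and NumPy forms describe the same two-week duration, and shows that subtracting two Timestamps produces a Timedelta.

```python
from datetime import timedelta
import numpy as np
import pandas as pd

# Different spellings of the same two-week duration
forms = [
    pd.Timedelta("14 days"),
    pd.Timedelta(2, unit="W"),
    pd.Timedelta(weeks=2),
    pd.Timedelta(np.timedelta64(2, "W")),
]
print(all(f == pd.Timedelta(days=14) for f in forms))  # True

# pd.Timedelta is a datetime.timedelta subclass
print(isinstance(pd.Timedelta(days=14), timedelta))  # True

# Subtracting two Timestamps yields a Timedelta
gap = pd.Timestamp("2020-12-15") - pd.Timestamp("2020-12-01")
print(gap)  # 14 days 00:00:00
```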
Time Span
Period represents a range of time, such as a day, a month, a quarter, a year, etc.
A time period is represented by a pd.Period object in pandas, and pd.period_range() generates a sequence of consecutive periods as a PeriodIndex.
Time period creation
A Period is created from a date string and a frequency, such as "M" for monthly periods. Adding an integer moves the period forward by that many units of its frequency:
>>> import pandas as pd
>>> N = pd.Period("2021-01",freq="M")
>>> N
Period('2021-01', 'M')
>>> N+2
Period('2021-03', 'M')
>>> N+3
Period('2021-04', 'M')
>>> N+4
Period('2021-05', 'M')
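Unlike a Timestamp, a Period covers a whole span of time. A sketch using its start_time and end_time attributes, and showing that any date inside the span maps to the same period:

```python
import pandas as pd

p = pd.Period("2021-01", freq="M")

# A Period covers a whole span: here, all of January 2021
print(p.start_time)  # 2021-01-01 00:00:00
print(p.end_time)    # 2021-01-31 23:59:59.999999999

# Any date inside the span maps to the same monthly period
print(pd.Period("2021-01-20", freq="M") == p)  # True

# The same date at quarterly frequency falls in the first quarter
print(pd.Period("2021-01-20", freq="Q"))  # 2021Q1
```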
Time Period Sequence
A sequence of periods can be generated with the pd.period_range() function:
N = pd.period_range("2021-01-01","2021-12-12",freq="M")
This creates a PeriodIndex containing every month in the range, because we used freq="M":
PeriodIndex(['2021-01', '2021-02', '2021-03', '2021-04', '2021-05', '2021-06',
'2021-07', '2021-08', '2021-09', '2021-10', '2021-11', '2021-12'],
dtype='period[M]', freq='M')
If you want quarters instead, change it to freq="Q":
N = pd.period_range("2021-01-01","2021-12-12",freq="Q")
It will output the quarters of the year:
PeriodIndex(['2021Q1', '2021Q2', '2021Q3', '2021Q4'], dtype='period[Q-DEC]', freq='Q-DEC')
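A PeriodIndex can serve as the index of a Series, for example one value per quarter, and to_timestamp() converts each period to its start time. In this sketch the values in sales are made-up illustration data, not from any real source.

```python
import pandas as pd

quarters = pd.period_range(start="2021-01", periods=4, freq="Q")

# Hypothetical data: one value per quarter
sales = pd.Series([10, 12, 9, 15], index=quarters)
print(sales)

# to_timestamp() converts each period to its start time
print(quarters.to_timestamp())
```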
DateOffset
DateOffset is the standard kind of date increment used for a date range.
Note
Date offsets follow calendar rules: adding DateOffset(days=1) always moves to the same wall-clock time on the next day, even across a daylight saving time change, while adding Timedelta(days=1) always adds exactly 24 hours.
>>> import pandas as pd
>>> t1 = pd.Timestamp("2020-12-15")
>>> t1 = t1 + pd.DateOffset(n=2, months=4)  # n=2 applies the 4-month offset twice: +8 months
>>> t1
Timestamp('2021-08-15 00:00:00')
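The difference described in the note above shows up around a daylight saving time change. In New York, DST began at 2:00 on March 14, 2021, so that day had only 23 hours. The sketch below adds one day to noon on March 13 in two ways. It also shows that month offsets clip to the last valid day of the target month.

```python
import pandas as pd

# US daylight saving time began at 2:00 on 2021-03-14 in New York
noon = pd.Timestamp("2021-03-13 12:00", tz="America/New_York")

print(noon + pd.DateOffset(days=1))  # 2021-03-14 12:00:00-04:00 (same wall-clock time)
print(noon + pd.Timedelta(days=1))   # 2021-03-14 13:00:00-04:00 (exactly 24 hours later)

# Month offsets clip to the last valid day of the target month
print(pd.Timestamp("2021-01-31") + pd.DateOffset(months=1))  # 2021-02-28 00:00:00
```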