
Understanding Dates, Times, Periods, and Time Zones in Pandas

Introduction 

Time-series data is common in fields such as finance, geography, seismology, and healthcare. Interpreting and handling it properly requires knowing how to generate correctly formatted datetime columns. Pandas provides tools to convert data to a proper datetime format, generate ranges of datetimes, and perform many other datetime manipulations. This tutorial explores how to handle datetime data properly.

1. Date Ranges 

Pandas provides a very helpful function, date_range(), that generates a range of dates at a fixed frequency. It takes arguments such as start, end, periods, and freq, though not all of them are required. The examples below generate date ranges with various frequencies.

import pandas as pd
import numpy as np

We can create a date range by setting either the start, periods, and freq parameters or the start, end, and freq parameters. If we don’t provide freq, it defaults to D, which means 1 day. The function returns the dates as a DatetimeIndex. We’ll create date ranges from strings in various date formats to check which formats work with the pandas date_range() function.

pd.date_range(start="2020 Jan 1", periods=5)
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05'],
              dtype='datetime64[ns]', freq='D')

pd.date_range(start="2020 January 1", periods=5)
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05'],
              dtype='datetime64[ns]', freq='D')

pd.date_range(start="1 Jan 2020", periods=5)
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05'],
              dtype='datetime64[ns]', freq='D')

pd.date_range(start="Jan 1, 2020", periods=5)
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05'],
              dtype='datetime64[ns]', freq='D')

pd.date_range(start="2020-7-1", periods=5)
DatetimeIndex(['2020-07-01', '2020-07-02', '2020-07-03', '2020-07-04',
               '2020-07-05'],
              dtype='datetime64[ns]', freq='D')

pd.date_range(start="2020/7/1", periods=5)
DatetimeIndex(['2020-07-01', '2020-07-02', '2020-07-03', '2020-07-04',
               '2020-07-05'],
              dtype='datetime64[ns]', freq='D')

All of the above examples generate 5 days starting from the given start date, which shows that pandas can handle a variety of date formats.

pd.date_range(start="1-7-2020", periods=5)
DatetimeIndex(['2020-01-07', '2020-01-08', '2020-01-09', '2020-01-10',
               '2020-01-11'],
              dtype='datetime64[ns]', freq='D')

pd.date_range(start="7-1-2020", periods=5)
DatetimeIndex(['2020-07-01', '2020-07-02', '2020-07-03', '2020-07-04',
               '2020-07-05'],
              dtype='datetime64[ns]', freq='D')

The first of the two examples above may not produce the result we expect: "1-7-2020" was read as 7 January rather than 1 July.

Note: If the year comes last in the date string, pandas assumes that the first value is the month and the second is the day, because it follows the US date style. Keep an eye on this when generating date ranges.
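One way to sidestep the month-first reading is to write dates in year-month-day order, or to parse a day-first string with an explicit format before building the range. The snippet below is a minimal sketch of both options; the variable names are only illustrative.

```python
import pandas as pd

# Ambiguous: with the year last, pandas reads this month-first (7 January 2020).
month_first = pd.date_range(start="1-7-2020", periods=3)

# Year-month-day order leaves no room for interpretation (1 July 2020).
iso = pd.date_range(start="2020-07-01", periods=3)

# For a day-first string, parse it with an explicit format first.
start = pd.to_datetime("1-7-2020", format="%d-%m-%Y")
explicit = pd.date_range(start=start, periods=3)

print(month_first[0].date(), iso[0].date(), explicit[0].date())
```

Here the first range starts in January, while the other two start on 1 July.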

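Below are a few more examples that generate date ranges by setting the start, end, and freq parameters. Pandas uses D for a day, H for an hour, S for seconds, T/min for minutes, B for business days, M for month-end, MS for month start, and ms/L for milliseconds. Note that recent pandas releases (2.2 and later) deprecate several of these spellings in favor of h, min, s, ms, and ME (month-end), so the examples below may emit deprecation warnings on those versions.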

pd.date_range(start="1-1-2020", end="1-5-2020", freq="D")
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05'],
              dtype='datetime64[ns]', freq='D')

pd.date_range(start="1-1-2020", end="1-10-2020", freq="B")
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-06',
               '2020-01-07', '2020-01-08', '2020-01-09', '2020-01-10'],
              dtype='datetime64[ns]', freq='B')

pd.date_range(start="1-1-2020 00:00", end="1-1-2020 5:00", freq="H")
DatetimeIndex(['2020-01-01 00:00:00', '2020-01-01 01:00:00',
               '2020-01-01 02:00:00', '2020-01-01 03:00:00',
               '2020-01-01 04:00:00', '2020-01-01 05:00:00'],
              dtype='datetime64[ns]', freq='H')

pd.date_range(start="1-1-2020 00:00", end="1-1-2020 00:05", freq="30S")
DatetimeIndex(['2020-01-01 00:00:00', '2020-01-01 00:00:30',
               '2020-01-01 00:01:00', '2020-01-01 00:01:30',
               '2020-01-01 00:02:00', '2020-01-01 00:02:30',
               '2020-01-01 00:03:00', '2020-01-01 00:03:30',
               '2020-01-01 00:04:00', '2020-01-01 00:04:30',
               '2020-01-01 00:05:00'],
              dtype='datetime64[ns]', freq='30S')

print(pd.date_range(start="1-1-2020 00:00", end="1-1-2020 00:05", freq="T"))
print(pd.date_range(start="1-1-2020 00:00", end="1-1-2020 00:05", freq="2min"))
DatetimeIndex(['2020-01-01 00:00:00', '2020-01-01 00:01:00',
               '2020-01-01 00:02:00', '2020-01-01 00:03:00',
               '2020-01-01 00:04:00', '2020-01-01 00:05:00'],
              dtype='datetime64[ns]', freq='T')
DatetimeIndex(['2020-01-01 00:00:00', '2020-01-01 00:02:00',
               '2020-01-01 00:04:00'],
              dtype='datetime64[ns]', freq='2T')

pd.date_range(start="1-1-2020", periods=5, freq="M")
DatetimeIndex(['2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30',
               '2020-05-31'],
              dtype='datetime64[ns]', freq='M')

pd.date_range(start="1-1-2020", periods=5, freq="MS")
DatetimeIndex(['2020-01-01', '2020-02-01', '2020-03-01', '2020-04-01',
               '2020-05-01'],
              dtype='datetime64[ns]', freq='MS')

print(pd.date_range(start="1-1-2020", periods=5, freq="100ms"))
print(pd.date_range(start="1-1-2020", end="1-1-2020 00:00:00.500000", freq="100L"))
print(pd.date_range(start="1-1-2020", end="1-1-2020 00:00:01", freq="100L"))
DatetimeIndex([       '2020-01-01 00:00:00', '2020-01-01 00:00:00.100000',
               '2020-01-01 00:00:00.200000', '2020-01-01 00:00:00.300000',
               '2020-01-01 00:00:00.400000'],
              dtype='datetime64[ns]', freq='100L')
DatetimeIndex([       '2020-01-01 00:00:00', '2020-01-01 00:00:00.100000',
               '2020-01-01 00:00:00.200000', '2020-01-01 00:00:00.300000',
               '2020-01-01 00:00:00.400000', '2020-01-01 00:00:00.500000'],
              dtype='datetime64[ns]', freq='100L')
DatetimeIndex([       '2020-01-01 00:00:00', '2020-01-01 00:00:00.100000',
               '2020-01-01 00:00:00.200000', '2020-01-01 00:00:00.300000',
               '2020-01-01 00:00:00.400000', '2020-01-01 00:00:00.500000',
               '2020-01-01 00:00:00.600000', '2020-01-01 00:00:00.700000',
               '2020-01-01 00:00:00.800000', '2020-01-01 00:00:00.900000',
                      '2020-01-01 00:00:01'],
              dtype='datetime64[ns]', freq='100L')
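The same kinds of ranges can also be written with the lowercase spellings h, min, s, and ms. The snippet below is a small sketch, assuming a pandas version that accepts these spellings (recent releases prefer them over H, T, S, and L); it checks the spacing through differences rather than through the printed freq string, which varies between versions.

```python
import pandas as pd

hourly = pd.date_range(start="2020-01-01", periods=3, freq="h")
minutely = pd.date_range(start="2020-01-01", periods=3, freq="min")
half_minute = pd.date_range(start="2020-01-01", periods=3, freq="30s")
millis = pd.date_range(start="2020-01-01", periods=3, freq="100ms")

# The gap between consecutive entries reveals the step of each range.
for name, rng in [("h", hourly), ("min", minutely),
                  ("30s", half_minute), ("100ms", millis)]:
    print(name, rng[1] - rng[0])
```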

Note: We can also combine more than one frequency unit to build compound frequencies, as the examples below show.

pd.date_range(start="1-1-2020", end="1-10-2020", freq="1D4H")
DatetimeIndex(['2020-01-01 00:00:00', '2020-01-02 04:00:00',
               '2020-01-03 08:00:00', '2020-01-04 12:00:00',
               '2020-01-05 16:00:00', '2020-01-06 20:00:00',
               '2020-01-08 00:00:00', '2020-01-09 04:00:00'],
              dtype='datetime64[ns]', freq='28H')

pd.date_range(start="1-1-2020", end="1-10-2020", freq="1D4H30S")
DatetimeIndex(['2020-01-01 00:00:00', '2020-01-02 04:00:30',
               '2020-01-03 08:01:00', '2020-01-04 12:01:30',
               '2020-01-05 16:02:00', '2020-01-06 20:02:30',
               '2020-01-08 00:03:00', '2020-01-09 04:03:30'],
              dtype='datetime64[ns]', freq='100830S')
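A compound frequency is simply a fixed step, which is why the outputs above report it as 28H or 100830S. As a sketch of that idea, the step can also be passed as a pd.Timedelta instead of a string; the variable names here are illustrative.

```python
import pandas as pd

# One day plus four hours, the same spacing as the "1D4H" string.
step = pd.Timedelta(days=1, hours=4)
rng = pd.date_range(start="2020-01-01", end="2020-01-10", freq=step)

# Differences between neighbouring entries should all equal the step.
gaps = rng[1:] - rng[:-1]

print(len(rng))
print(gaps.unique())
```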

1.1 Date Range Filtering 

We can also subscript a date-indexed series with slices. Pandas accepts date strings in various formats, as well as datetime objects, as slice bounds. The examples below show this.

rng = pd.date_range(start="1-1-2020", periods=6, freq="D")
ser = pd.Series(data=range(6), index=rng)
ser
2020-01-01    0
2020-01-02    1
2020-01-03    2
2020-01-04    3
2020-01-05    4
2020-01-06    5
Freq: D, dtype: int64

ser["1/1/2020":"1/4/2020"]
2020-01-01    0
2020-01-02    1
2020-01-03    2
2020-01-04    3
Freq: D, dtype: int64

ser["1/5/2020":]
2020-01-05    4
2020-01-06    5
Freq: D, dtype: int64

from datetime import datetime

ser[datetime(2020,1,3):]
2020-01-03    2
2020-01-04    3
2020-01-05    4
2020-01-06    5
Freq: D, dtype: int64

We can also use partial date strings, such as only the year and month or only the year, and pandas selects all values matching them. The examples below show this.

ser["2020-1":]
2020-01-01    0
2020-01-02    1
2020-01-03    2
2020-01-04    3
2020-01-05    4
2020-01-06    5
Freq: D, dtype: int64

ser["2020":]
2020-01-01    0
2020-01-02    1
2020-01-03    2
2020-01-04    3
2020-01-05    4
2020-01-06    5
Freq: D, dtype: int64
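Because the series above lies entirely within January 2020, the partial strings return every row. The effect is easier to see on a series that crosses a month boundary. The snippet below is a minimal sketch with a hypothetical five-day series and uses .loc for the lookup.

```python
import pandas as pd

# Five days straddling the end of January.
rng = pd.date_range(start="2020-01-30", periods=5, freq="D")
ser = pd.Series(data=range(5), index=rng)

jan = ser.loc["2020-01"]   # rows dated in January 2020
feb = ser.loc["2020-02"]   # rows dated in February 2020

print(jan)
print(feb)
```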

2. Timestamp 

The Timestamp class lets us create an object representing a particular point in time. We’ll need it to represent values that change from one moment to the next. We can create a timestamp from strings in various date formats, as the examples below show, with precision down to nanoseconds.

pd.Timestamp("Jan 2020")
Timestamp('2020-01-01 00:00:00')

pd.Timestamp("12 Jan 2020")
Timestamp('2020-01-12 00:00:00')

pd.Timestamp("12 Jan 2020 20:20")
Timestamp('2020-01-12 20:20:00')

pd.Timestamp("12 Jan 2020 20:20:20.100")
Timestamp('2020-01-12 20:20:20.100000')

pd.Timestamp("12 Jan 2020 20:20:20.200000")
Timestamp('2020-01-12 20:20:20.200000')

pd.Timestamp("12 Jan 2020 20:20:20.000200000")
Timestamp('2020-01-12 20:20:20.000200')

We can also create timestamps by passing the year, month, day, hour, minute, second, microsecond, and nanosecond separately. The examples below show this.

pd.Timestamp(year=2020, month=1, day=1, hour=10, minute=10, second=30, microsecond=100)
Timestamp('2020-01-01 10:10:30.000100')

pd.Timestamp(year=2020, month=1, day=1, hour=10, minute=10, second=30, microsecond=100, nanosecond=100)
Timestamp('2020-01-01 10:10:30.000100100')

We can subtract one timestamp from another to find how much time separates them. The result is a time delta, which the next section covers; it is negative when a later timestamp is subtracted from an earlier one. Adding two timestamps together is not supported. The examples below make the concept clear.

t1 = pd.Timestamp("12 Jan 2020 12:12:45")
t2 = pd.Timestamp("12 Jan 2020 13:14:20")

(t2 - t1), (t1 - t2)
(Timedelta('0 days 01:01:35'), Timedelta('-1 days +22:58:25'))

t1 = pd.Timestamp("12 Jan 2020")
t2 = pd.Timestamp("13 Jan 2020")

(t2 - t1),
(Timedelta('1 days 00:00:00'),)
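The snippet below is a small sketch that reuses the two timestamps from the first example. It converts the difference to seconds and checks that adding one timestamp to another raises a TypeError; the flag name addition_supported is only illustrative.

```python
import pandas as pd

t1 = pd.Timestamp("12 Jan 2020 12:12:45")
t2 = pd.Timestamp("12 Jan 2020 13:14:20")

# Subtraction yields a Timedelta, which can be expressed in seconds.
gap = t2 - t1
print(gap.total_seconds())   # 3695.0

# Adding two timestamps has no meaning, so pandas raises a TypeError.
try:
    t1 + t2
    addition_supported = True
except TypeError:
    addition_supported = False

print(addition_supported)    # False
```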

3. Timedelta 

The Timedelta class represents a duration, such as the difference between two timestamps. We might need it to analyze how far apart two date/time values are. The examples below create time deltas with pandas. A time delta can consist of days alone or of days combined with hours:minutes:seconds and fractions of a second down to nanoseconds. Any part we don’t provide defaults to zero.

pd.Timedelta("1 days")
Timedelta('1 days 00:00:00')

pd.Timedelta("1 days 10:00:00")
Timedelta('1 days 10:00:00')

pd.Timedelta("1 days 10:10:00")
Timedelta('1 days 10:10:00')

pd.Timedelta("1 days 10:10:10")
Timedelta('1 days 10:10:10')

pd.Timedelta("1 days 10:10:10.100000")
Timedelta('1 days 10:10:10.100000')
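Besides strings, pd.Timedelta also accepts keyword arguments such as days, hours, minutes, and seconds. The sketch below builds the same duration both ways and inspects its parts; the variable names are illustrative.

```python
import pandas as pd

from_string = pd.Timedelta("1 days 10:10:10")
from_keywords = pd.Timedelta(days=1, hours=10, minutes=10, seconds=10)

# Both spellings describe the same duration.
print(from_string == from_keywords)

# components breaks the duration into days, hours, minutes, and so on.
print(from_string.components)

# total_seconds() expresses the whole duration as a single number.
print(from_string.total_seconds())
```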

We can add and subtract time deltas to get their combined duration and their difference, respectively. The examples below show both.

t1 = pd.Timedelta("1 days 10:10:10.100000")
t2 = pd.Timedelta("2 days 10:10:10.100000")

t2 - t1
Timedelta('1 days 00:00:00')

t1 + t2
Timedelta('3 days 20:20:20.200000')

Time deltas are very useful when you want to shift timestamps by a particular amount of time. We can add time deltas to, or subtract them from, a timestamp to get a shifted date. The examples below show this.

pd.Timestamp("12 Jan 2020") + pd.Timedelta("1 days")
Timestamp('2020-01-13 00:00:00')

pd.Timestamp("Jan 2020") + pd.Timedelta("1 days")
Timestamp('2020-01-02 00:00:00')

pd.Timestamp("12 Jan 2020") - pd.Timedelta("1 days")
Timestamp('2020-01-11 00:00:00')

pd.Timestamp("12 Jan 2020") + pd.Timedelta("4H")
Timestamp('2020-01-12 04:00:00')

pd.Timestamp("12 Jan 2020") + pd.Timedelta("30min")
Timestamp('2020-01-12 00:30:00')

pd.Timestamp("12 Jan 2020") + pd.Timedelta("30 seconds")
Timestamp('2020-01-12 00:00:30')

We can add time deltas to, and subtract them from, date ranges as well, which shifts every value in the range by that amount. The examples below show this.

pd.date_range(start="1-1-2020", end="1-5-2020", freq="D") + pd.Timedelta("1 days")
DatetimeIndex(['2020-01-02', '2020-01-03', '2020-01-04', '2020-01-05',
               '2020-01-06'],
              dtype='datetime64[ns]', freq='D')

pd.date_range(start="1-1-2020", end="1-5-2020", freq="D") - pd.Timedelta("1 days")
DatetimeIndex(['2019-12-31', '2020-01-01', '2020-01-02', '2020-01-03',
               '2020-01-04'],
              dtype='datetime64[ns]', freq='D')

pd.date_range(start="1-1-2020", end="1-5-2020", freq="D") + pd.Timedelta("4H30T30S")
DatetimeIndex(['2020-01-01 04:30:30', '2020-01-02 04:30:30',
               '2020-01-03 04:30:30', '2020-01-04 04:30:30',
               '2020-01-05 04:30:30'],
              dtype='datetime64[ns]', freq='D')

4. Period (TimeSpan) 

Pandas provides a Period class to represent a time span. We’ll need periods when we want to represent values that hold for a whole span of time rather than for a single instant. Period accepts a freq argument, and if we don’t pass it, the frequency is inferred from the date string. The examples below create various periods.

pd.Period(value="1-1-2020")
Period('2020-01-01', 'D')

pd.Period("1-2020")
Period('2020-01', 'M')

pd.Period("2020")
Period('2020', 'A-DEC')

pd.Period("1-1-2020 10:00:00")
Period('2020-01-01 10:00:00', 'S')

pd.Period("1-1-2020 10:00")
Period('2020-01-01 10:00', 'T')

pd.Period("1-1-2020 10")
Period('2020-01-01 10:00', 'H')

pd.Period("1-1-2020 10:00:00.100000")
Period('2020-01-01 10:00:00.100', 'L')
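Unlike a timestamp, a period covers a span with a beginning and an end. The snippet below is a minimal sketch using a monthly period; start_time and end_time give the boundaries of the span, and adding an integer moves the period by that many units of its frequency.

```python
import pandas as pd

month = pd.Period("2020-01", freq="M")

# Boundaries of the span covered by the period.
print(month.start_time)
print(month.end_time)

# Integer arithmetic steps in units of the period's frequency.
next_month = month + 1
print(next_month)
```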

We can add time deltas to, and subtract them from, periods as well, provided the time delta is a whole multiple of the period’s frequency. The examples below show this.

pd.Period("1-1-2020 10:10") + pd.Timedelta("1 days")
Period('2020-01-02 10:10', 'T')

pd.Period("1-1-2020 10:10:10") - pd.Timedelta("1 days")
Period('2019-12-31 10:10:10', 'S')

pd.Period("1-1-2020 10") + pd.Timedelta("10H")
Period('2020-01-01 20:00', 'H')

pd.Period("1-1-2020 10") - pd.Timedelta("10H")
Period('2020-01-01 00:00', 'H')

5. Period Ranges 

Just like date_range(), the period_range() function generates a sequence of periods. Its parameters are almost the same as those of date_range(). The examples below show various ways to create a period range.

pd.period_range(start="1-1-2020", periods=5)

PeriodIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
             '2020-01-05'],
            dtype='period[D]', freq='D')

pd.period_range(start="1-1-2020", periods=5, freq="4H")

PeriodIndex(['2020-01-01 00:00', '2020-01-01 04:00', '2020-01-01 08:00',
             '2020-01-01 12:00', '2020-01-01 16:00'],
            dtype='period[4H]', freq='4H')

pd.period_range(start="1-1-2020", periods=5, freq="M")

PeriodIndex(['2020-01', '2020-02', '2020-03', '2020-04', '2020-05'], dtype='period[M]', freq='M')

pd.period_range(start="1-1-2020", periods=5, freq="T")

PeriodIndex(['2020-01-01 00:00', '2020-01-01 00:01', '2020-01-01 00:02',
             '2020-01-01 00:03', '2020-01-01 00:04'],
            dtype='period[T]', freq='T')

pd.period_range(start="1-1-2020", periods=6, freq="12min")

PeriodIndex(['2020-01-01 00:00', '2020-01-01 00:12', '2020-01-01 00:24',
             '2020-01-01 00:36', '2020-01-01 00:48', '2020-01-01 01:00'],
            dtype='period[12T]', freq='12T')

pd.period_range(start="1-1-2020", periods=4, freq="5S")

PeriodIndex(['2020-01-01 00:00:00', '2020-01-01 00:00:05',
             '2020-01-01 00:00:10', '2020-01-01 00:00:15'],
            dtype='period[5S]', freq='5S')
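A period range and a date range are closely related: each period can be mapped to the timestamp at which it starts. The sketch below assumes monthly periods and uses the to_timestamp() and to_period() methods to go back and forth; the variable names are illustrative.

```python
import pandas as pd

periods = pd.period_range(start="2020-01", periods=3, freq="M")

# Map each period to the first instant it covers.
starts = periods.to_timestamp()
print(starts)

# Convert the timestamps back into monthly periods.
back = starts.to_period("M")
print(back)
```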

We can add a time delta to, or subtract it from, a period range the same way we did with date ranges, which shifts every period by that amount. The examples below show this.

pd.period_range(start="1-1-2020", periods=4, freq="5S") + pd.Timedelta("1 days")

PeriodIndex(['2020-01-02 00:00:00', '2020-01-02 00:00:05',
             '2020-01-02 00:00:10', '2020-01-02 00:00:15'],
            dtype='period[5S]', freq='5S')

pd.period_range(start="1-1-2020", periods=4, freq="5S") + pd.Timedelta("5 seconds")

PeriodIndex(['2020-01-01 00:00:05', '2020-01-01 00:00:10',
             '2020-01-01 00:00:15', '2020-01-01 00:00:20'],
            dtype='period[5S]', freq='5S')

pd.period_range(start="1-1-2020", periods=4, freq="5S") + pd.Timedelta("1min")
PeriodIndex(['2020-01-01 00:01:00', '2020-01-01 00:01:05',
             '2020-01-01 00:01:10', '2020-01-01 00:01:15'],
            dtype='period[5S]', freq='5S')

5.1 Period Range Filtering 

We can also subscript a period-indexed series with slices. Pandas accepts date strings in various formats, as well as datetime objects, as slice bounds. The examples below show this.

rng = pd.period_range(start="1-1-2020", periods=6, freq="D")
ser = pd.Series(data=range(6), index=rng)
ser

2020-01-01    0
2020-01-02    1
2020-01-03    2
2020-01-04    3
2020-01-05    4
2020-01-06    5
Freq: D, dtype: int64

ser["1/1/2020":"1/4/2020"]
2020-01-01    0
2020-01-02    1
2020-01-03    2
2020-01-04    3
Freq: D, dtype: int64

ser["1/3/2020":]

2020-01-03    2
2020-01-04    3
2020-01-05    4
2020-01-06    5
Freq: D, dtype: int64

from datetime import datetime

ser[datetime(2020,1,3):]
2020-01-03    2
2020-01-04    3
2020-01-05    4
2020-01-06    5
Freq: D, dtype: int64

We can also use partial date strings, such as only the year and month or only the year, and pandas selects all values matching them. The examples below show this.

ser["2020-1"]

2020-01-01    0
2020-01-02    1
2020-01-03    2
2020-01-04    3
2020-01-05    4
2020-01-06    5
Freq: D, dtype: int64

ser["2020"]

2020-01-01    0
2020-01-02    1
2020-01-03    2
2020-01-04    3
2020-01-05    4
2020-01-06    5
Freq: D, dtype: int64

6. TimeDelta Ranges 

Pandas provides a function named timedelta_range(), analogous to date_range() and period_range(), to create a range of time deltas. It follows almost the same format as those two functions. Below are a few examples of timedelta_range() usage.

pd.timedelta_range(start="1 day", periods=10)
TimedeltaIndex([ '1 days',  '2 days',  '3 days',  '4 days',  '5 days',
                 '6 days',  '7 days',  '8 days',  '9 days', '10 days'],
               dtype='timedelta64[ns]', freq='D')

pd.timedelta_range(start="1 day", periods=10, freq="30D")
TimedeltaIndex([  '1 days',  '31 days',  '61 days',  '91 days', '121 days',
                '151 days', '181 days', '211 days', '241 days', '271 days'],
               dtype='timedelta64[ns]', freq='30D')

pd.timedelta_range(start="1 day", periods=10, freq="10H")
TimedeltaIndex(['1 days 00:00:00', '1 days 10:00:00', '1 days 20:00:00',
                '2 days 06:00:00', '2 days 16:00:00', '3 days 02:00:00',
                '3 days 12:00:00', '3 days 22:00:00', '4 days 08:00:00',
                '4 days 18:00:00'],
               dtype='timedelta64[ns]', freq='10H')

pd.timedelta_range(start="1 day", end="2 day", freq="4H")
TimedeltaIndex(['1 days 00:00:00', '1 days 04:00:00', '1 days 08:00:00',
                '1 days 12:00:00', '1 days 16:00:00', '1 days 20:00:00',
                '2 days 00:00:00'],
               dtype='timedelta64[ns]', freq='4H')

pd.timedelta_range(start="1 hour", end="2 hour", freq="10min")
TimedeltaIndex(['01:00:00', '01:10:00', '01:20:00', '01:30:00', '01:40:00',
                '01:50:00', '02:00:00'],
               dtype='timedelta64[ns]', freq='10T')

pd.timedelta_range(start="1 min", end="5 min", freq="T")
TimedeltaIndex(['00:01:00', '00:02:00', '00:03:00', '00:04:00', '00:05:00'], dtype='timedelta64[ns]', freq='T')

We can shift time delta ranges by adding a time delta to them or subtracting one from them. The examples below show this.

pd.timedelta_range(start="1 day", periods=10) + pd.Timedelta("1 days")
TimedeltaIndex([ '2 days',  '3 days',  '4 days',  '5 days',  '6 days',
                 '7 days',  '8 days',  '9 days', '10 days', '11 days'],
               dtype='timedelta64[ns]', freq='D')

pd.timedelta_range(start="1 day", periods=10) - pd.Timedelta("1 days")
TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days', '5 days',
                '6 days', '7 days', '8 days', '9 days'],
               dtype='timedelta64[ns]', freq='D')

pd.timedelta_range(start="1 day", periods=10) + pd.Timedelta("2 days")
TimedeltaIndex([ '3 days',  '4 days',  '5 days',  '6 days',  '7 days',
                 '8 days',  '9 days', '10 days', '11 days', '12 days'],
               dtype='timedelta64[ns]', freq='D')

pd.timedelta_range(start="1 day", periods=10) + pd.Timedelta("2D5H")
TimedeltaIndex([ '3 days 05:00:00',  '4 days 05:00:00',  '5 days 05:00:00',
                 '6 days 05:00:00',  '7 days 05:00:00',  '8 days 05:00:00',
                 '9 days 05:00:00', '10 days 05:00:00', '11 days 05:00:00',
                '12 days 05:00:00'],
               dtype='timedelta64[ns]', freq='D')

pd.timedelta_range(start="1 day", periods=10) + pd.Timedelta("2D5H30min")
TimedeltaIndex([ '3 days 05:30:00',  '4 days 05:30:00',  '5 days 05:30:00',
                 '6 days 05:30:00',  '7 days 05:30:00',  '8 days 05:30:00',
                 '9 days 05:30:00', '10 days 05:30:00', '11 days 05:30:00',
                '12 days 05:30:00'],
               dtype='timedelta64[ns]', freq='D')
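A time delta range can also be added to a single timestamp, which turns a list of offsets into a list of dates. The snippet below is a small sketch of that idea; the anchor date is chosen only for illustration.

```python
import pandas as pd

# Offsets of 0, 1, 2, and 3 days.
offsets = pd.timedelta_range(start="0 days", periods=4, freq="D")

# Adding them to one timestamp yields one date per offset.
anchor = pd.Timestamp("2020-01-01")
dates = anchor + offsets

print(dates)
```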

7. TimeZone 

Pandas lets us specify a time zone when creating date ranges, timestamps, etc. Until now, we have created all date ranges and timestamps without any time zone set. We’ll now explore ways to set time zones and convert from one time zone to another. We can specify a time zone by passing a string to the tz argument of date_range() and Timestamp(). The Python library pytz maintains a list of all available time zone names.

from pytz import common_timezones, all_timezones

print("Number of Common Timezones : ", len(common_timezones))
print("Number of All Timezones : ", len(all_timezones))
print("Difference between all timezones and common timezones : ", list(set(all_timezones) - set(common_timezones)))

Number of Common Timezones :  440
Number of All Timezones :  592
Difference between all timezones and common timezones :  ['Etc/UCT', 'Asia/Rangoon', 'Atlantic/Faeroe', 'Asia/Ujung_Pandang', 'Etc/GMT-8', 'WET', 'GB', 'Navajo', 'Etc/UTC', 'Asia/Katmandu', 'Canada/Saskatchewan', 'EET', 'Mexico/General', 'America/Montreal', 'UCT', 'Australia/Canberra', 'GMT+0', 'Etc/GMT-3', 'Etc/GMT+8', 'HST', 'Pacific/Ponape', 'Australia/ACT', 'Pacific/Truk', 'Chile/Continental', 'Portugal', 'America/Knox_IN', 'Etc/GMT-13', 'Etc/GMT-10', 'Atlantic/Jan_Mayen', 'US/Samoa', 'Pacific/Yap', 'Brazil/East', 'Kwajalein', 'Asia/Thimbu', 'Asia/Tel_Aviv', 'America/Shiprock', 'Etc/GMT-5', 'Etc/GMT-12', 'America/Jujuy', 'Etc/GMT+10', 'Europe/Belfast', 'Australia/South', 'Etc/GMT-9', 'Australia/Yancowinna', 'Eire', 'CET', 'Mexico/BajaNorte', 'Etc/GMT+0', 'Universal', 'GMT0', 'Hongkong', 'Etc/GMT0', 'EST5EDT', 'Turkey', 'EST', 'America/Argentina/ComodRivadavia', 'Etc/GMT-1', 'Etc/Zulu', 'Jamaica', 'Asia/Saigon', 'Etc/GMT-6', 'Australia/West', 'Etc/GMT+9', 'Australia/Queensland', 'PRC', 'NZ-CHAT', 'America/Louisville', 'US/Aleutian', 'MET', 'Japan', 'NZ', 'America/Santa_Isabel', 'Etc/GMT+1', 'Etc/GMT+4', 'Poland', 'Israel', 'Etc/GMT+5', 'America/Ensenada', 'Australia/NSW', 'Antarctica/South_Pole', 'America/Porto_Acre', 'Egypt', 'Brazil/DeNoronha', 'Chile/EasterIsland', 'Etc/GMT+6', 'Africa/Asmera', 'MST', 'Etc/GMT-2', 'America/Indianapolis', 'CST6CDT', 'Asia/Chongqing', 'GMT-0', 'Cuba', 'Etc/GMT+7', 'Singapore', 'US/Michigan', 'Brazil/Acre', 'Asia/Kashgar', 'Europe/Nicosia', 'W-SU', 'Etc/GMT-14', 'Iran', 'Etc/GMT+3', 'Etc/GMT', 'Etc/GMT+11', 'Pacific/Samoa', 'Australia/North', 'Africa/Timbuktu', 'Zulu', 'Asia/Macao', 'Asia/Chungking', 'Asia/Ulan_Bator', 'MST7MDT', 'US/Indiana-Starke', 'America/Catamarca', 'America/Buenos_Aires', 'Etc/GMT-7', 'Brazil/West', 'Europe/Tiraspol', 'Mexico/BajaSur', 'Etc/GMT-4', 'Canada/Yukon', 'US/East-Indiana', 'ROC', 'America/Rosario', 'Pacific/Johnston', 'America/Fort_Wayne', 'Asia/Calcutta', 'Etc/GMT+2', 'America/Coral_Harbour', 
'Australia/Victoria', 'Libya', 'Asia/Istanbul', 'Asia/Harbin', 'Asia/Ashkhabad', 'GB-Eire', 'Australia/LHI', 'Greenwich', 'Etc/GMT+12', 'Australia/Tasmania', 'America/Virgin', 'PST8PDT', 'America/Mendoza', 'ROK', 'Etc/GMT-0', 'America/Atka', 'Iceland', 'Asia/Dacca', 'Etc/GMT-11', 'America/Cordoba', 'Etc/Universal', 'Etc/Greenwich']

rng = pd.date_range(start="1-1-2020", periods=5, freq="M")
rng.tz

rng = pd.date_range(start="1-1-2020", periods=5, freq="M", tz="US/Eastern")
rng.tz

<DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>

ts = pd.Timestamp("1-1-2020")
print(ts)
ts.tz

2020-01-01 00:00:00

ts = pd.Timestamp("1-1-2020", tz="Asia/Calcutta")
print(ts)
ts.tz

2020-01-01 00:00:00+05:30

<DstTzInfo 'Asia/Calcutta' IST+5:30:00 STD>
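A quick way to tell a naive timestamp from a time zone aware one is to look at its tz attribute and its UTC offset. The sketch below uses "Asia/Kolkata", the current name for the same zone as "Asia/Calcutta"; that substitution is an assumption of this example, not part of the examples above.

```python
import pandas as pd

naive = pd.Timestamp("2020-01-01")
aware = pd.Timestamp("2020-01-01", tz="Asia/Kolkata")

# A naive timestamp carries no time zone information.
print(naive.tz)

# An aware timestamp knows its offset from UTC.
print(aware.utcoffset())
```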

7.1 tz_localize() 

Pandas provides the tz_localize() method to set a time zone on a date range or timestamp that does not already have one. It returns a new date range or timestamp with the time zone passed to tz_localize() attached.

rng = pd.date_range(start="1-1-2020", periods=5, freq="M")
print(rng)
print("Timezone : ", rng.tz)

DatetimeIndex(['2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30',
               '2020-05-31'],
              dtype='datetime64[ns]', freq='M')
Timezone :  None

rng = rng.tz_localize("US/Eastern")
print(rng)
print("Timezone : ", rng.tz)

DatetimeIndex(['2020-01-31 00:00:00-05:00', '2020-02-29 00:00:00-05:00',
               '2020-03-31 00:00:00-04:00', '2020-04-30 00:00:00-04:00',
               '2020-05-31 00:00:00-04:00'],
              dtype='datetime64[ns, US/Eastern]', freq='M')
Timezone :  US/Eastern

ts = pd.Timestamp("1-1-2020")
print(ts)
print("Timezone : ", ts.tz)

2020-01-01 00:00:00
Timezone :  None

ts = ts.tz_localize("US/Central")
print(ts)
print("Timezone : ", ts.tz)

2020-01-01 00:00:00-06:00
Timezone :  US/Central
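Localizing keeps the clock reading and only attaches a zone to it. It is meant for naive values: calling it again on a value that already has a time zone raises a TypeError, while passing None strips the zone. The snippet below is a minimal sketch; the zone names "America/Chicago" and "America/New_York" and the flag relocalized are choices made for this example.

```python
import pandas as pd

naive = pd.Timestamp("2020-01-01 09:00")

# The clock reading stays 09:00; only the zone is attached.
chicago = naive.tz_localize("America/Chicago")
print(chicago)

# Localizing an already aware timestamp is rejected.
try:
    chicago.tz_localize("America/New_York")
    relocalized = True
except TypeError:
    relocalized = False
print(relocalized)

# Passing None removes the zone and keeps the clock reading.
stripped = chicago.tz_localize(None)
print(stripped)
```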

7.2 tz_convert() 

We can convert a date range or timestamp from one time zone to another using the tz_convert() method. We pass the new time zone to tz_convert(), and it returns a new date range or timestamp with the time adjusted to that zone. The examples below show its usage.

ts = pd.Timestamp("1-1-2020", tz="US/Central")
print(ts)
print("Timezone : ", ts.tz)

2020-01-01 00:00:00-06:00
Timezone :  US/Central

ts = ts.tz_convert("US/Eastern")
print(ts)
print("Timezone : ", ts.tz)

2020-01-01 01:00:00-05:00
Timezone :  US/Eastern

rng = pd.date_range(start="1-1-2020", periods=5, freq="D", tz="US/Eastern")
print(rng)
print("Timezone : ", rng.tz)

DatetimeIndex(['2020-01-01 00:00:00-05:00', '2020-01-02 00:00:00-05:00',
               '2020-01-03 00:00:00-05:00', '2020-01-04 00:00:00-05:00',
               '2020-01-05 00:00:00-05:00'],
              dtype='datetime64[ns, US/Eastern]', freq='D')
Timezone :  US/Eastern

rng = rng.tz_convert("US/Central")
print(rng)
print("Timezone : ", rng.tz)

DatetimeIndex(['2019-12-31 23:00:00-06:00', '2020-01-01 23:00:00-06:00',
               '2020-01-02 23:00:00-06:00', '2020-01-03 23:00:00-06:00',
               '2020-01-04 23:00:00-06:00'],
              dtype='datetime64[ns, US/Central]', freq='D')
Timezone :  US/Central

rng = rng.tz_convert("Asia/Calcutta")
print(rng)
print("Timezone : ", rng.tz)

DatetimeIndex(['2020-01-01 10:30:00+05:30', '2020-01-02 10:30:00+05:30',
               '2020-01-03 10:30:00+05:30', '2020-01-04 10:30:00+05:30',
               '2020-01-05 10:30:00+05:30'],
              dtype='datetime64[ns, Asia/Calcutta]', freq='D')
Timezone :  Asia/Calcutta

rng = rng.tz_convert("Asia/Istanbul")
print(rng)
print("Timezone : ", rng.tz)

DatetimeIndex(['2020-01-01 08:00:00+03:00', '2020-01-02 08:00:00+03:00',
               '2020-01-03 08:00:00+03:00', '2020-01-04 08:00:00+03:00',
               '2020-01-05 08:00:00+03:00'],
              dtype='datetime64[ns, Asia/Istanbul]', freq='D')
Timezone :  Asia/Istanbul
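Conversion changes only how an instant is displayed, and it follows daylight saving time rules while doing so. The sketch below uses dates around the US switch to daylight saving time on 8 March 2020; the dates and zone names are chosen for illustration and do not come from the examples above.

```python
import pandas as pd

# Noon UTC on three consecutive days around the US clock change.
utc_noon = pd.date_range(start="2020-03-07 12:00", periods=3, freq="D", tz="UTC")
new_york = utc_noon.tz_convert("America/New_York")

print(new_york)

# Local hour shifts from 7 to 8 once daylight saving time begins.
print(list(new_york.hour))
```

Each converted value still refers to the same instant as its UTC counterpart; only the local clock reading and offset differ.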

Notice above that the displayed time shifts when converting from one time zone to another. The conversion takes care of daylight saving time as well.

Pandas also provides functions such as to_datetime(), which converts a list of strings to a DatetimeIndex. It accepts a format argument that specifies the date-time format when pandas fails to recognize the format by itself.

pd.to_datetime(["1-1-2020","1-2-2020", "1-3-2020"])
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03'], dtype='datetime64[ns]', freq=None)

pd.to_datetime(["1 Jan 2020","2 Jan 2020", "3 Jan 2020"])
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03'], dtype='datetime64[ns]', freq=None)

pd.to_datetime(["2020/1/1","2020/1/2", "2020/1/3"])
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03'], dtype='datetime64[ns]', freq=None)

The examples below show when to use the format argument of to_datetime(): when the default parsing does not produce the intended dates.

pd.to_datetime(["2020/1/1","2020/2/1", "2020/3/1"])

DatetimeIndex(['2020-01-01', '2020-02-01', '2020-03-01'], dtype='datetime64[ns]', freq=None)

pd.to_datetime(["2020/1/1","2020/2/1", "2020/3/1"], format="%Y/%d/%m")

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03'], dtype='datetime64[ns]', freq=None)
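When some strings do not match the given format, to_datetime() raises an error by default. The snippet below is a small sketch, with a made-up input list, that uses the errors="coerce" option to turn unparseable entries into NaT instead.

```python
import pandas as pd

# The last entry is deliberately not a date.
raw = ["2020/1/1", "2020/2/1", "not a date"]

# Year/day/month format, with bad entries coerced to NaT.
parsed = pd.to_datetime(raw, format="%Y/%d/%m", errors="coerce")

print(parsed)
```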

This ends the tutorial on the date, time, and period functionality available in pandas.

Source:

https://coderzcolumn.com/tutorials/data-science/dates-times-and-time-zone-handling-in-python-using-pandas

Amir Masoud Sefidian
Machine Learning Engineer