Working with Pandas Date and Time
Date and time values are among the most common and important features in Data Science and Machine Learning problems. We often come across time-series data, or problems such as stock market prediction, where we need to work with dates and times. Pandas provides a wide variety of features for handling this kind of data. In this article, we will learn how to manipulate dates and times with Pandas.
Convert timestamp to Pandas DateTime
Pandas can parse date strings written in many different formats and coming from various sources. The to_datetime() function is used for this purpose.
Example-1:
import pandas as pd

# dayfirst=True states explicitly that the string is day/month/year
dt = pd.to_datetime("13/10/2020", dayfirst=True)
print(dt)
Output:
2020-10-13 00:00:00
Example-2:
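# the compact time 143045 (HHMMSS) is described with an explicit format string
dt = pd.to_datetime("13/10/2020 143045", format="%d/%m/%Y %H%M%S")
print(dt)
Output:
2020-10-13 14:30:45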
Example-3:
dt = pd.to_datetime("13th of October, 2020")
print(dt)
Output:
2020-10-13 00:00:00
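When the input format is known in advance, it can be passed explicitly so that nothing is left to inference. The following is a minimal sketch rather than part of the original examples: the sample strings and variable names are made up for illustration, and it relies on the format and errors parameters of to_datetime().

```python
import pandas as pd

# Hypothetical sample data: day/month/year strings, one of them invalid.
raw_dates = ["13/10/2020", "14/10/2020", "not a date"]

# An explicit format removes any day/month ambiguity;
# errors="coerce" turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(raw_dates, format="%d/%m/%Y", errors="coerce")

print(parsed)
```

Passing a list returns a DatetimeIndex, so the same call can be applied to a whole DataFrame column at once.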
Date Ranges
We can create a sequence of dates and times at a fixed frequency using the Pandas date_range() function.
For example, the following generates a sequence of days from 13th October 2020 to 20th October 2020.
# ISO-formatted dates (year-month-day) avoid any day/month ambiguity
dt = pd.date_range(start="2020-10-13", end="2020-10-20", freq="D")
print(dt)
The result is:
DatetimeIndex(['2020-10-13', '2020-10-14', '2020-10-15', '2020-10-16',
               '2020-10-17', '2020-10-18', '2020-10-19', '2020-10-20'],
              dtype='datetime64[ns]', freq='D')
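Here, freq='D' means that consecutive values are one day apart. Other frequency aliases can be used as well, such as 'M' for month end or 'H' for hourly. (Recent pandas releases, 2.2 and later, prefer 'ME' and 'h' for these two and warn about the older spellings.) Instead of an end date, we can also request a fixed number of values counted from the start date using the 'periods' parameter. For example: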
# 'M' means month end; newer pandas versions spell this alias 'ME'
dt = pd.date_range(start="2020-03-31", periods=5, freq="M")
print(dt)
Output:
DatetimeIndex(['2020-03-31', '2020-04-30', '2020-05-31', '2020-06-30',
               '2020-07-31'],
              dtype='datetime64[ns]', freq='M')
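The freq argument also accepts multiples and other aliases. As a small illustrative sketch (the dates and variable names are chosen here, not taken from the examples above), '2D' steps two days at a time and 'B' keeps only business days, Monday to Friday:

```python
import pandas as pd

# Every second day, starting on 13 October 2020 (4 values).
every_other_day = pd.date_range(start="2020-10-13", periods=4, freq="2D")

# Business days only between Friday 16 and Tuesday 20 October 2020.
business_days = pd.date_range(start="2020-10-16", end="2020-10-20", freq="B")

print(every_other_day)
print(business_days)
```

The second range skips the weekend of 17 and 18 October, so it contains three dates rather than five.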
Time zone information can be attached to these values using the Pandas tz_localize() method. For example:
# 'H' means hourly; newer pandas versions spell this alias 'h'
dt = pd.date_range(start="2020-03-31", periods=5, freq="H")
print(dt.tz_localize("UTC"))
Output:
DatetimeIndex(['2020-03-31 00:00:00+00:00', '2020-03-31 01:00:00+00:00',
               '2020-03-31 02:00:00+00:00', '2020-03-31 03:00:00+00:00',
               '2020-03-31 04:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='H')
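Once an index carries a time zone, it can be displayed in another zone with tz_convert(). The sketch below assumes the "Europe/Berlin" zone purely for illustration and passes the hourly frequency as an offset object (pd.offsets.Hour()) so that it does not depend on the spelling of the alias:

```python
import pandas as pd

# Three hourly timestamps without any time zone attached.
naive = pd.date_range(start="2020-03-31", periods=3, freq=pd.offsets.Hour())

# tz_localize attaches a time zone without changing the clock time.
utc = naive.tz_localize("UTC")

# tz_convert keeps the same instants but shows them in another zone.
berlin = utc.tz_convert("Europe/Berlin")

print(utc)
print(berlin)
```

Both indexes describe the same moments in time; only the wall-clock representation differs (Berlin was two hours ahead of UTC on that date).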
We can also break a date range down into separate DateTime features such as day, month and year. Look at the code below:
import pandas as pd

# creating a DataFrame
df = pd.DataFrame()

# creating features ('M' means month end; newer pandas versions spell it 'ME')
df['Date'] = pd.date_range('2020/10/31', periods=5, freq='M')
df['Day'] = df['Date'].dt.day
df['Month'] = df['Date'].dt.month
df['Year'] = df['Date'].dt.year
print(df)
Output:
        Date  Day  Month  Year
0 2020-10-31   31     10  2020
1 2020-11-30   30     11  2020
2 2020-12-31   31     12  2020
3 2021-01-31   31      1  2021
4 2021-02-28   28      2  2021
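The same .dt accessor exposes further attributes beyond day, month and year. As an illustrative sketch (the column names Weekday, Quarter and IsMonthEnd are chosen here, not part of the example above):

```python
import pandas as pd

# A small frame with three month-end dates.
df = pd.DataFrame(
    {"Date": pd.to_datetime(["2020-10-31", "2020-11-30", "2020-12-31"])}
)

df["Weekday"] = df["Date"].dt.day_name()       # name of the weekday
df["Quarter"] = df["Date"].dt.quarter          # calendar quarter, 1 to 4
df["IsMonthEnd"] = df["Date"].dt.is_month_end  # True on the last day of a month

print(df)
```

Features like these are often useful as inputs to a model, since weekday or quarter effects are not visible in the raw date.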
Timestamp and Period
Timestamp and Period are the two basic Pandas structures for time data: a Timestamp marks a single point in time, while a Period represents a span of time.
Timestamped data is the most basic form of time series data, where each value is tied to a point in time. Pandas supports date and time arithmetic on Timestamp objects.
Example:
dt = pd.Timestamp(2020, 10, 31)
print(dt)
Output:
2020-10-31 00:00:00
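The arithmetic mentioned above can be sketched with pd.Timedelta. The dates and variable names below are picked only for illustration:

```python
import pandas as pd

start = pd.Timestamp(2020, 10, 31)

# Adding a Timedelta shifts the timestamp forward.
next_day = start + pd.Timedelta(days=1)

# Subtracting two timestamps yields a Timedelta.
elapsed = pd.Timestamp(2020, 12, 25) - start

print(next_day)      # 2020-11-01 00:00:00
print(elapsed.days)  # 55
```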
A Period is useful when we need to represent a span of time, such as a whole month, rather than an exact point in time.
Example:
dt = pd.Period("2020-10")
print(dt)
Output:
2020-10
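To see that a Period really covers a span, we can inspect its boundaries and shift it by whole periods. This is a minimal sketch; the variable name october is chosen here for illustration:

```python
import pandas as pd

# Monthly frequency is inferred from the "year-month" string.
october = pd.Period("2020-10")

print(october.start_time)     # first instant covered by the period
print(october.end_time)       # last instant covered by the period
print(october.days_in_month)  # number of days in the month
print(october + 1)            # the following month
```

Any Timestamp within October 2020 lies between start_time and end_time, which is what distinguishes a Period from a single point in time.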
We can use Timestamp to get the present time.
dt = pd.Timestamp.now()
print(dt)
This prints the current local time, for example:
2020-10-12 23:24:18.762762
Summary
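In this article, we parsed date strings with to_datetime(), generated date sequences with date_range(), attached time zones with tz_localize(), and looked at Timestamp and Period objects. In the upcoming article, we will look at String columns in Pandas.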