Pandas map() and reduce() Operations
In this article, we focus on the map() and reduce() operations and how they are used for data manipulation in Pandas.
map()
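The Pandas map() operation substitutes each value of a Series according to a mapping, which can be another Series, a dictionary, or a function. The map() described here is a Series method. To use it with a DataFrame, select a single column first; pandas 2.1 and later also provide a separate element-wise DataFrame.map().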
Syntax:
Series.map(arg, na_action=None)
The parameters are:
- arg : (Series, dict, or function) mapping correspondence
- na_action : (None, 'ignore') If 'ignore', propagate NaN values without passing them to the mapping correspondence (default: None)
Let us look at a few examples of the map() operation on the following Series:
import numpy as np
import pandas as pd

country = ['Germany', 'Canada', np.nan, 'Japan', 'Australia']
series = pd.Series(country)
print(series)
This gives the following Series:
0      Germany
1       Canada
2          NaN
3        Japan
4    Australia
dtype: object
Now apply map() to this Series with a dictionary as the argument. Values that have no matching key in the dictionary, such as 'Germany', become NaN:
series.map({'Canada': 'Ottawa', 'Japan': 'Tokyo', 'Australia': 'Canberra'})
Output:
0         NaN
1      Ottawa
2         NaN
3       Tokyo
4    Canberra
dtype: object
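The arg parameter also accepts a Series. As a minimal sketch, assuming a hypothetical lookup Series named capitals whose index holds the country names, the same mapping can be written as follows. The names countries and capitals are illustrative only.

```python
import numpy as np
import pandas as pd

countries = pd.Series(["Germany", "Canada", np.nan, "Japan", "Australia"])

# Hypothetical lookup Series: the index holds the keys to match,
# the values hold what each key is replaced with.
capitals = pd.Series(
    ["Ottawa", "Tokyo", "Canberra"],
    index=["Canada", "Japan", "Australia"],
)

# Each value in countries is looked up in the index of capitals;
# values with no matching index label become NaN.
mapped = countries.map(capitals)
print(mapped)
```

The result should match the dictionary version: entries without a match in the index of capitals come back as NaN.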
You can also pass a function as the argument, for example:
print(series.map('He is from {}'.format, na_action='ignore'))
Output:
0      He is from Germany
1       He is from Canada
2                     NaN
3        He is from Japan
4    He is from Australia
dtype: object
Without na_action='ignore', the NaN at index 2 would be passed to the function, and the value at that index would become "He is from nan".
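To see the effect of na_action side by side, the sketch below runs the same format function with and without na_action='ignore'. The Series is rebuilt here so the snippet is self-contained, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

series = pd.Series(["Germany", "Canada", np.nan, "Japan", "Australia"])

# Default (na_action=None): NaN is passed to the function like any other value.
without_ignore = series.map("He is from {}".format)

# na_action='ignore': NaN is skipped and stays NaN in the result.
with_ignore = series.map("He is from {}".format, na_action="ignore")

print(without_ignore[2])
print(with_ignore[2])
```

The choice matters mostly when the function cannot handle NaN sensibly, as with string formatting here.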
reduce()
reduce() is not a Pandas method; it is defined in the functools module of Python. Because a Series is iterable, it can be passed to reduce(), which applies a two-argument function cumulatively to the elements and collapses the Series to a single value.
reduce() first calls the function with the first two elements of the Series. It then calls the function again with that result and the next element in the Series. This repeats until no items remain in the sequence, and the last result is returned.
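The steps above can be sketched as a plain loop. manual_reduce below is a hypothetical helper written only for illustration, not the actual functools implementation, and it assumes no initial value is supplied.

```python
import functools


def manual_reduce(function, iterable):
    """Illustrative stand-in for functools.reduce without an initial value."""
    iterator = iter(iterable)
    try:
        # Start with the first element as the running result.
        result = next(iterator)
    except StopIteration:
        raise TypeError("manual_reduce() of empty iterable with no initial value") from None
    # Combine the running result with each remaining element in turn.
    for item in iterator:
        result = function(result, item)
    return result


values = [11, 6, 7, 3, 28, 1]
total = manual_reduce(lambda x, y: x + y, values)
print(total)  # 56
```

A plain list is used to keep the sketch minimal; any iterable, including a Series, can be passed the same way.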
For example, consider the following series:
import pandas as pd

data = [11, 6, 7, 3, 28, 1]
series = pd.Series(data)
print(series)
The series is:
0    11
1     6
2     7
3     3
4    28
5     1
dtype: int64
Now, let's use reduce() to find the product of all elements in the Series:
# import functools module
import functools

# using reduce operation to apply function on the series
product = functools.reduce(lambda x, y: x * y, series)
print("Product: ", product, sep="")
Output:
Product: 38808
Here is another example, which uses reduce() to find the minimum element of the Series:
# import functools module
import functools

# using reduce operation to apply function on the series
minimum = functools.reduce(lambda x, y: x if x < y else y, series)
print("Minimum value: ", minimum, sep="")
Output:
Minimum value: 1
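For these two particular computations, Pandas also provides built-in reductions, Series.prod() and Series.min(). The sketch below, using the same data, checks that they agree with the reduce() versions. reduce() is mainly worth reaching for when no built-in reduction fits.

```python
import functools

import pandas as pd

series = pd.Series([11, 6, 7, 3, 28, 1])

# The reduce() versions from the examples above.
product_reduce = functools.reduce(lambda x, y: x * y, series)
minimum_reduce = functools.reduce(lambda x, y: x if x < y else y, series)

# The equivalent built-in Series reductions.
product_builtin = series.prod()
minimum_builtin = series.min()

print("Product: ", product_builtin, sep="")
print("Minimum value: ", minimum_builtin, sep="")
```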
Summary
In this article, we looked at map() and reduce() functions. In the next one, we will look at ways to handle missing values in Pandas.