apply() in Pandas
apply() in Pandas applies a function (for example, a lambda function) to a DataFrame or Series. It is useful in Machine Learning and Data Analysis projects where we need to transform data, or separate it based on certain conditions, using built-in or custom functions.
DataFrame – apply()
Using Pandas apply(), we can apply a function along an axis of a DataFrame. With axis=0 (the default), the function is called once per column and receives each column as a Series; with axis=1, it is called once per row.
The syntax is:
    DataFrame.apply(func, axis, raw, result_type, args, **kwds)
The parameters are:
- func : the function to be applied
- axis : axis along which the function is applied; 0 or 'index' applies the function to each column; 1 or 'columns' applies it to each row (default: 0)
- raw : (bool) if False, each row or column is passed to the function as a Series; if True, it is passed as an ndarray (default: False)
- result_type : ('expand', 'reduce', or 'broadcast') only takes effect when axis=1 (columns); 'expand' turns list-like results into columns; 'reduce' returns a Series if possible instead of expanding list-like results; 'broadcast' broadcasts results to the original shape, retaining the original index and columns (default: None)
- args : (tuple) positional arguments to pass to the function.
- **kwds : additional keyword arguments that can be passed to the function
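As a minimal sketch of how args, **kwds, and raw reach the function, consider the example below. The helper scale_and_shift and its parameters factor and offset are hypothetical names chosen for illustration; they are not part of Pandas.

```python
import numpy as np
import pandas as pd

# Small DataFrame for illustration
df = pd.DataFrame([[5, 14], [8, 12], [2, 13]],
                  columns=['col1', 'col2'],
                  index=['row1', 'row2', 'row3'])

# Hypothetical helper: multiply a column by factor, then add offset
def scale_and_shift(column, factor, offset=0):
    return column * factor + offset

# args supplies the positional argument (factor);
# offset is forwarded to the function through **kwds
scaled = df.apply(scale_and_shift, args=(2,), offset=-3)

# With raw=True each column arrives as a NumPy ndarray instead of a Series
got_ndarray = df.apply(lambda col: isinstance(col, np.ndarray), raw=True)

print(scaled)
print(got_ndarray)
```

Here each column is passed as the first argument, followed by the contents of args and then any extra keyword arguments.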
Let’s look at various examples of the result of applying Pandas apply() on DataFrame.
Consider the following DataFrame:
    import numpy as np
    import pandas as pd

    data = [[5, 14], [8, 12], [2, 13]]
    df = pd.DataFrame(data, columns=['col1', 'col2'], index=['row1', 'row2', 'row3'])
    print(df)
Our DataFrame looks like:
          col1  col2
    row1     5    14
    row2     8    12
    row3     2    13
Let us apply a custom function to the DataFrame values. The function is:
    # user function which takes an argument n
    def myfunc(n):
        return (2 * n) - 3
Now apply this function to the DataFrame df:
    df.apply(myfunc)

apply() passes each column to the function as a Series. Because the arithmetic in myfunc is vectorized, every value of the DataFrame is transformed:

          col1  col2
    row1     7    25
    row2    13    21
    row3     1    23
Now let’s apply some built-in NumPy functions:
    df.apply(np.sqrt)
Output:
              col1      col2
    row1  2.236068  3.741657
    row2  2.828427  3.464102
    row3  1.414214  3.605551
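We can also choose explicitly whether the function is applied to each column or to each row. With axis='index' (equivalent to axis=0), the function receives each column:

    df.apply(np.sum, axis='index')

The result of summing the entries of each column is:

    col1    15
    col2    39
    dtype: int64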
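To make the difference between the two axes concrete, the following sketch applies the same function both ways to the same values used above. The variable names col_totals and row_totals are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[5, 14], [8, 12], [2, 13]],
                  columns=['col1', 'col2'],
                  index=['row1', 'row2', 'row3'])

# axis=0 (or 'index'): the function receives each column, one result per column
col_totals = df.apply(np.sum, axis=0)

# axis=1 (or 'columns'): the function receives each row, one result per row
row_totals = df.apply(np.sum, axis=1)

print(col_totals)
print(row_totals)
```

The first result is indexed by the column labels (col1, col2), and the second by the row labels (row1, row2, row3).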
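We can also return list-like values from a lambda function. With result_type='broadcast', the returned values are broadcast back to the original shape, keeping the original index and columns. Without result_type, recent versions of Pandas (0.23 and later) return a Series of lists instead.

    df.apply(lambda x: [3, 15], axis=1, result_type='broadcast')

This results in:

          col1  col2
    row1     3    15
    row2     3    15
    row3     3    15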
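How a list-like return value is packaged depends on result_type. The sketch below compares the default with 'expand' and 'broadcast'; the default behaviour shown assumes Pandas 0.23 or later.

```python
import pandas as pd

df = pd.DataFrame([[5, 14], [8, 12], [2, 13]],
                  columns=['col1', 'col2'],
                  index=['row1', 'row2', 'row3'])

# Default (result_type=None): each row's list is kept as one element of a Series
as_lists = df.apply(lambda row: [3, 15], axis=1)

# 'expand': list elements become new columns, labelled 0 and 1
expanded = df.apply(lambda row: [3, 15], axis=1, result_type='expand')

# 'broadcast': results keep the original index and column labels
broadcast = df.apply(lambda row: [3, 15], axis=1, result_type='broadcast')

print(as_lists)
print(expanded)
print(broadcast)
```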
Series – apply()
We can also apply a function on the values of a Series using the apply() function. The syntax for apply() on series is:
    Series.apply(func, convert_dtype, args, **kwds)
The parameters are:
- func : the Python function to be applied
- convert_dtype : (bool) if True, it tries to find a better dtype for elementwise function results (default: True); this parameter is deprecated since Pandas 2.1
- args : (tuple) positional arguments to pass to the function.
- **kwds : additional keyword arguments that can be passed to the function
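A short sketch of passing extra arguments through Series.apply() follows. The helper scaled_power and its parameters exponent and offset are hypothetical names used only for this illustration.

```python
import pandas as pd

series = pd.Series([5, 8, 12, 2], index=['s1', 's2', 's3', 's4'])

# Hypothetical helper: raise a value to a power, then add an offset
def scaled_power(value, exponent, offset=0):
    return value ** exponent + offset

# Each element is passed as value; args supplies exponent,
# and offset is forwarded as a keyword argument
result = series.apply(scaled_power, args=(2,), offset=1)

print(result)
```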
Consider the following Series:
    import numpy as np
    import pandas as pd

    data = [5, 8, 12, 2]
    series = pd.Series(data, index=['s1', 's2', 's3', 's4'])
    print(series)
This gives the following Series:
    s1     5
    s2     8
    s3    12
    s4     2
    dtype: int64
Applying the function to this Series:
    series.apply(np.sqrt)
This gives:
    s1    2.236068
    s2    2.828427
    s3    3.464102
    s4    1.414214
    dtype: float64
Another example:
    series.apply(lambda x: x ** 2)
Output:
    s1     25
    s2     64
    s3    144
    s4      4
    dtype: int64
Similarly, we can apply various types of built-in and user-defined functions to the Series.
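As one possible illustration of a user-defined function, the sketch below labels each value and then uses the labels to separate the data by a condition. The function size_label and the threshold of 6 are arbitrary choices made for this example.

```python
import pandas as pd

series = pd.Series([5, 8, 12, 2], index=['s1', 's2', 's3', 's4'])

# Hypothetical rule: values above 6 count as 'large', the rest as 'small'
def size_label(value):
    return 'large' if value > 6 else 'small'

# apply() calls size_label once per element and returns a Series of labels
labels = series.apply(size_label)

# Use the labels as a boolean mask to keep only the 'large' values
large_only = series[labels == 'large']

print(labels)
print(large_only)
```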
Summary
In this article, we looked at the apply() function of Pandas. The next article will focus on map() and reduce() operations.