
apply() in Pandas

apply() in Pandas applies a function (for example, a lambda function) to a DataFrame or Series. This is useful in machine learning and data analysis projects, where we often need to separate data based on certain conditions or transform values with a custom function.

DataFrame – apply()

Using Pandas apply(), we can apply a function along an axis of a DataFrame. With axis=0 (the default), the function is applied to each column; with axis=1, it is applied to each row.
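To make the two axis settings concrete, here is a minimal sketch. The small DataFrame and the variable names col_max and row_max are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[5, 14], [8, 12], [2, 13]],
    columns=["col1", "col2"],
    index=["row1", "row2", "row3"],
)

# axis=0 (the default): the function receives each column as a Series,
# so the result has one entry per column.
col_max = df.apply(lambda col: col.max(), axis=0)

# axis=1: the function receives each row as a Series,
# so the result has one entry per row.
row_max = df.apply(lambda row: row.max(), axis=1)

print(col_max)
print(row_max)
```

The first result is indexed by col1 and col2, the second by row1, row2 and row3.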

The syntax is:

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)

The parameters are:

  • func : the function to be applied
  • axis : axis along which the function is applied; 0 or 'index' applies the function to each column; 1 or 'columns' applies it to each row (default: 0)
  • raw : (bool) if False, each row or column is passed to the function as a Series; if True, it is passed as an ndarray (default: False)
  • result_type : ('expand', 'reduce', 'broadcast', or None) only takes effect when axis=1; 'expand' turns list-like results into columns; 'reduce' returns a Series where possible instead of expanding list-like results; 'broadcast' broadcasts results to the original shape of the DataFrame, retaining the original index and columns (default: None)
  • args : (tuple) positional arguments to pass to the function.
  • **kwds : additional keyword arguments that can be passed to the function
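The args, **kwds and raw parameters are easier to follow with a short sketch. The helpers scale_and_shift and value_range below are hypothetical and written only for this illustration; they are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    [[5, 14], [8, 12], [2, 13]],
    columns=["col1", "col2"],
    index=["row1", "row2", "row3"],
)

# Hypothetical helper: multiply a column by a factor and add an offset.
def scale_and_shift(col, factor, offset=0):
    return col * factor + offset

# args supplies the extra positional argument (factor);
# offset is forwarded to the function as a keyword argument.
scaled = df.apply(scale_and_shift, args=(2,), offset=-3)

seen_types = []  # records what kind of object the function receives

# Hypothetical helper: difference between the largest and smallest value.
def value_range(values):
    seen_types.append(type(values).__name__)
    return values.max() - values.min()

# raw=True passes each column as a NumPy ndarray instead of a Series.
ranges = df.apply(value_range, raw=True)

print(scaled)
print(ranges)
print(set(seen_types))
```

Here scaled holds 2*n - 3 for every value, and seen_types contains only 'ndarray' because of raw=True.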

Let’s look at some examples of using apply() on a DataFrame.

Consider the following DataFrame:

import numpy as np
import pandas as pd

data = [[5,14],[8,12],[2,13]]

df = pd.DataFrame(data, columns=['col1', 'col2'], index=['row1', 'row2', 'row3'])
print(df)

Our DataFrame looks like:

      col1  col2
row1     5    14
row2     8    12
row3     2    13

Let us apply a custom function to the DataFrame values. The function is:

# user function which takes an argument n
def myfunc(n):
    return (2 * n) - 3

Now apply this function to the DataFrame df:

df.apply(myfunc)

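apply() passes each column to the function as a Series. Because the arithmetic in myfunc works on a whole Series at once, the result is the function applied to every value of the DataFrame: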

      col1  col2
row1     7    25
row2    13    21
row3     1    23
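To see what the function actually receives, the sketch below records the type of each argument. The received list is added only for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[5, 14], [8, 12], [2, 13]],
    columns=["col1", "col2"],
    index=["row1", "row2", "row3"],
)

received = []  # records the type of each argument myfunc is called with

def myfunc(n):
    received.append(type(n).__name__)
    return (2 * n) - 3

result = df.apply(myfunc)

print(set(received))
print(result)
```

Every call receives a whole column as a Series rather than a single number, which is why the function must work on Series input.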

Now let’s apply some built-in NumPy functions:

df.apply(np.sqrt)

Output:

          col1      col2
row1  2.236068  3.741657
row2  2.828427  3.464102
row3  1.414214  3.605551

We can also apply a function to each row or each column as:

df.apply(np.sum, axis='index')

With axis='index', np.sum receives each column in turn, so the result is the sum of each column:

col1    15
col2    39
dtype: int64
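Row-wise application is most useful when the function needs several columns of the same row. The sketch below uses the same data; the name diff is chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[5, 14], [8, 12], [2, 13]],
    columns=["col1", "col2"],
    index=["row1", "row2", "row3"],
)

# axis=1 passes each row as a Series indexed by the column labels,
# so individual columns can be looked up by name inside the function.
diff = df.apply(lambda row: row["col2"] - row["col1"], axis=1)

print(diff)
```

The result is a Series indexed by row1, row2 and row3 with the values 9, 4 and 11.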

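We can also return list-like values from a lambda function. By default, each returned list is stored as a single entry of a Series; passing result_type='broadcast' spreads the values across the original columns instead.

df.apply(lambda x: [3, 15], axis=1, result_type='broadcast')

This results in:

      col1  col2
row1     3    15
row2     3    15
row3     3    15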
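The effect of result_type on list-like results can be compared side by side. This is a minimal sketch, and the variable names are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[5, 14], [8, 12], [2, 13]],
    columns=["col1", "col2"],
    index=["row1", "row2", "row3"],
)

# Default (result_type=None): each list becomes one entry of a Series.
as_series = df.apply(lambda row: [3, 15], axis=1)

# 'expand': the list items become columns labelled 0 and 1.
expanded = df.apply(lambda row: [3, 15], axis=1, result_type="expand")

# 'broadcast': the values fill the original shape and keep the
# original column labels.
broadcast = df.apply(lambda row: [3, 15], axis=1, result_type="broadcast")

print(as_series)
print(expanded)
print(broadcast)
```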

Series – apply()

We can also apply a function to the values of a Series using apply(). The syntax for apply() on a Series is:

Series.apply(func, convert_dtype=True, args=(), **kwds)

The parameters are:

  • func : the Python function to be applied
  • convert_dtype : (bool) if True, it tries to find a better dtype for elementwise function results; if False, the result is left as dtype=object (default: True; deprecated since pandas 2.1)
  • args : (tuple) positional arguments to pass to the function.
  • **kwds : additional keyword arguments that can be passed to the function
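As a sketch of how args and **kwds reach the function, consider the hypothetical helper clamp below. It is written only for this illustration and is not part of pandas.

```python
import pandas as pd

series = pd.Series([5, 8, 12, 2], index=["s1", "s2", "s3", "s4"])

# Hypothetical helper: force a single value into the range [low, high].
def clamp(value, low, high=10):
    return max(low, min(value, high))

# Series.apply calls clamp once per value; args supplies low,
# and high is forwarded as a keyword argument.
clamped = series.apply(clamp, args=(4,), high=10)

print(clamped)
```

Values below 4 are raised to 4 and values above 10 are lowered to 10, so the Series becomes 5, 8, 10, 4.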

Consider the following Series:

import numpy as np
import pandas as pd

data = [5,8,12,2]
series = pd.Series(data, index=['s1', 's2', 's3', 's4'])

This gives the following Series:

s1     5
s2     8
s3    12
s4     2
dtype: int64

Applying NumPy's square root function to this Series:

series.apply(np.sqrt)

This gives:

s1    2.236068
s2    2.828427
s3    3.464102
s4    1.414214
dtype: float64

Another example:

series.apply(lambda x: x ** 2)

Output:

s1     25
s2     64
s3    144
s4      4
dtype: int64

Similarly, we can apply various types of built-in and user-defined functions to the Series.
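For instance, a user-defined function can label each value by a condition, and the labels can then be used to separate the data. This is a minimal sketch; the helper size_label and the threshold of 6 are arbitrary choices made for this illustration.

```python
import pandas as pd

series = pd.Series([5, 8, 12, 2], index=["s1", "s2", "s3", "s4"])

# Hypothetical helper: classify a single value against a threshold.
def size_label(value):
    return "large" if value > 6 else "small"

# apply() calls size_label once per value and returns a Series of labels.
labels = series.apply(size_label)

# The labels can then be used as a boolean mask to select matching values.
large_only = series[labels == "large"]

print(labels)
print(large_only)
```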

Summary

In this article, we looked at the apply() function of Pandas. The next article will focus on map() and reduce() operations.
