Working with Strings in Pandas

October 16, 2020 Pallavi Pandey

In this article, we will work with strings in Pandas DataFrames and Series. The Pandas library provides built-in string functions for manipulating text data. They are accessed through the .str accessor of a Series or Index.

Let’s create a Pandas Series with String values.

import pandas as pd
import numpy as np

series = pd.Series(['car', 'DOG', np.nan, 'Python Pandas', 'Ask11', '27', 'np@3'])
print(series)

Output:

0 car
1 DOG
2 NaN
3 Python Pandas
4 Ask11
5 27
6 np@3
dtype: object

We can see that the dtype of this Series is ‘object’. We can convert a Series or DataFrame to the ‘string’ dtype with astype().

print(series.astype('string'))

Alternatively, set the dtype when creating the Series:

series = pd.Series(['car', 'DOG', np.nan, 'Python Pandas', 'Ask11', '27', 'np@3'], dtype='string')
print(series)

Both snippets produce the same output:

0 car
1 DOG
2 <NA>
3 Python Pandas
4 Ask11
5 27
6 np@3
dtype: string

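Note: The ‘string’ dtype was introduced in pandas 1.0, so the two conversions above require pandas 1.0 or later. They do not work on older pandas releases, including those that ran on Python 2.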
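One practical difference between the two dtypes is how a missing value is represented. The following is a minimal sketch, assuming pandas 1.0 or later; the variable names as_object and as_string are only illustrative.

```python
import numpy as np
import pandas as pd

values = ['car', 'DOG', np.nan]

# Request each dtype explicitly so the result does not depend on pandas defaults.
as_object = pd.Series(values, dtype='object')
as_string = pd.Series(values, dtype='string')

print(as_object.dtype)
print(as_string.dtype)

# Both dtypes report the third element as missing.
print(as_object.isna().tolist())
print(as_string.isna().tolist())

# The .str methods work the same way on either dtype.
print(as_string.str.upper())
```

With the ‘string’ dtype, the missing entry is shown as <NA> instead of NaN, as in the output above.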

String Operations

lower()

Converts all uppercase characters in each string to lowercase and returns the resulting Series.

series = pd.Series(['car', 'DOG', np.nan, 'Python Pandas', 'Ask11', '27', 'np@3'])
print(series.str.lower())

Output:

0 car
1 dog
2 NaN
3 python pandas
4 ask11
5 27
6 np@3
dtype: object

upper()

Converts all lowercase characters in each string to uppercase and returns the resulting Series.

series = pd.Series(['car', 'DOG', np.nan, 'Python Pandas', 'Ask11', '27', 'np@3'])
print(series.str.upper())

Output:

0 CAR
1 DOG
2 NaN
3 PYTHON PANDAS
4 ASK11
5 27
6 NP@3
dtype: object

split()

Splits each string in the Series or Index on the given separator and returns, for each element, a list of the pieces.

series = pd.Series(['car', '11 12 13', np.nan, 'Python Pandas', 'Ask11 np@3'])
print(series.str.split(' '))

Output:

0 [car]
1 [11, 12, 13]
2 NaN
3 [Python, Pandas]
4 [Ask11, np@3]
dtype: object
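When the pieces are needed as separate columns rather than lists, split() also accepts expand=True, and n limits the number of splits. A short sketch, with illustrative variable names:

```python
import numpy as np
import pandas as pd

series = pd.Series(['car', '11 12 13', np.nan, 'Python Pandas'])

# expand=True spreads the pieces across DataFrame columns instead of returning lists.
parts = series.str.split(' ', expand=True)
print(parts)

# n=1 performs at most one split, counted from the left.
first_rest = series.str.split(' ', n=1, expand=True)
print(first_rest)
```

Rows with fewer pieces than the widest row are padded with missing values.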

strip()

Removes leading and trailing whitespace from each string.

series = pd.Series(['car ', ' 11 ', np.nan, 'Python Pandas', 'Ask11 np@3'])
print(series.str.strip())

Output:

0 car
1 11
2 NaN
3 Python Pandas
4 Ask11 np@3
dtype: object
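strip() has one-sided variants, lstrip() and rstrip(), and all three accept a set of characters to remove instead of whitespace. A small sketch with made-up sample values:

```python
import numpy as np
import pandas as pd

series = pd.Series(['  car  ', '--11--', np.nan])

# Remove whitespace from one side only.
left = series.str.lstrip()
right = series.str.rstrip()

# Pass the characters to strip; here only dashes are removed.
dashes = series.str.strip('-')

print(left.tolist())
print(right.tolist())
print(dashes.tolist())
```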

cat()

Concatenates all strings in the Series or Index into a single string, joined by the specified separator, and returns that string. Missing values are skipped by default.

series = pd.Series(['car', '11', np.nan, 'Python Pandas', 'Ask11', 'np@3'])
print(series.str.cat(sep=' '))

Output:

car 11 Python Pandas Ask11 np@3
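Notice that the NaN entry is dropped from the result. To keep a placeholder for it, cat() takes an na_rep argument; passing a second Series instead joins the two element by element. A brief sketch, where other is a hypothetical second Series:

```python
import numpy as np
import pandas as pd

series = pd.Series(['car', '11', np.nan, 'Python Pandas'])

# na_rep substitutes a placeholder for missing values instead of skipping them.
joined = series.str.cat(sep=' ', na_rep='?')
print(joined)  # car 11 ? Python Pandas

# With another Series, concatenation is element-wise and returns a Series.
other = pd.Series(['A', 'B', 'C', 'D'])
pairs = series.str.cat(other, sep='-')
print(pairs.tolist())
```

In the element-wise form, a missing value on either side gives a missing result unless na_rep is set.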

len()

Returns the length of each string in the Series or Index. The result below is float64 because the missing value is kept as NaN.

series = pd.Series(['car', '11', np.nan, 'Python Pandas', 'Ask11', 'np@3'])
print(series.str.len())

Output:

0 3.0
1 2.0
2 NaN
3 13.0
4 5.0
5 4.0
dtype: float64

islower()

Returns True for each string in the Series or Index whose alphabetical characters are all lowercase. A string with no alphabetical characters, such as ‘11’, gives False.

series = pd.Series(['car', '11', np.nan, 'Python Pandas', 'Ask11', 'np@3'])
print(series.str.islower())

Output:

0 True
1 False
2 NaN
3 False
4 False
5 True
dtype: object

isupper()

Returns True for each string in the Series or Index whose alphabetical characters are all uppercase.

series = pd.Series(['cAr', 'TOM', np.nan, 'Python Pandas', 'ASK11', 'np@3'])
print(series.str.isupper())

Output:

0 False
1 True
2 NaN
3 False
4 True
5 False
dtype: object

isnumeric()

Returns True for each string in the Series or Index whose characters are all numeric. Spaces, letters, and the decimal point are not numeric characters, so ‘21 63’, ‘ASK11’, and ‘56.3’ all give False.

series = pd.Series(['cAr', '11', np.nan, '21 63', 'ASK11', '56.3'])
print(series.str.isnumeric())

Output:

0 False
1 True
2 NaN
3 False
4 False
5 False
dtype: object
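Because isnumeric() rejects ‘56.3’, it is a check on the characters rather than a test of whether a string can be parsed as a number. The sketch below contrasts it with pd.to_numeric(), used here only for comparison; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

series = pd.Series(['cAr', '11', np.nan, '21 63', '56.3'])

# eq(True) turns the missing result into False so it can be used as a mask.
mask = series.str.isnumeric().eq(True)
numbers = series[mask].astype(int)
print(numbers.tolist())  # [11]

# to_numeric parses anything that looks like a number; failures become NaN.
parsed = pd.to_numeric(series, errors='coerce')
print(parsed.tolist())
```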

startswith()

Returns True for each string in the Series or Index that starts with the given prefix. The match is case-sensitive.

series = pd.Series(['cAr', 'ATM', np.nan, 'Python Pandas', 'ASK11', 'np@3'])
print(series.str.startswith('A'))

Output:

0 False
1 True
2 NaN
3 False
4 True
5 False
dtype: object

endswith()

Returns True for each string in the Series or Index that ends with the given suffix.

series = pd.Series(['car', 'very far', np.nan, 'Python Pandas', 'ASK11r', 'np@3'])
print(series.str.endswith('ar'))

Output:

0 True
1 True
2 NaN
3 False
4 False
5 False
dtype: object
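The NaN in these results gets in the way when the output is used to filter rows. Both startswith() and endswith() take an na argument that sets the value returned for missing entries. A minimal sketch:

```python
import numpy as np
import pandas as pd

series = pd.Series(['cAr', 'ATM', np.nan, 'ASK11'])

# na=False treats missing values as non-matches, giving a clean boolean mask.
mask = series.str.startswith('A', na=False)
print(mask.tolist())          # [False, True, False, True]

# Use the mask to keep only the matching rows.
print(series[mask].tolist())  # ['ATM', 'ASK11']
```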

get_dummies()

This function returns the one-hot encoded values as a DataFrame. Each distinct string becomes a column, and each row has 1 in the column that matches its value and 0 elsewhere. A missing value gives a row of zeros.

series = pd.Series(['car', '11', np.nan, 'Python Pandas', 'ASK11r', 'np@3'])
print(series.str.get_dummies())

Output:

   11  ASK11r  Python Pandas  car  np@3
0   0       0              0    1     0
1   1       0              0    0     0
2   0       0              0    0     0
3   0       0              1    0     0
4   0       1              0    0     0
5   0       0              0    0     1
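get_dummies() can also encode several labels stored in one string, splitting on a separator that defaults to ‘|’. A sketch with a made-up tags Series:

```python
import numpy as np
import pandas as pd

tags = pd.Series(['red|small', 'blue', np.nan, 'red|large'])

# Each label between separators becomes its own indicator column.
dummies = tags.str.get_dummies(sep='|')
print(dummies)
```

Here the first row has 1 in both the red and small columns.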

replace()

Replaces each occurrence of the first argument with the second argument in every string.

series = pd.Series(['car', '11', np.nan, 'Python Pandas', 'ASK11r', 'np@3'])
print(series.str.replace('11', '*123'))

Output:

0 car
1 *123
2 NaN
3 Python Pandas
4 ASK*123r
5 np@3
dtype: object
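Whether the first argument is read as a literal string or as a regular expression is controlled by the regex argument. Its default has differed between pandas versions, so passing it explicitly is the safer choice. A short sketch:

```python
import numpy as np
import pandas as pd

series = pd.Series(['car', '11', np.nan, 'ASK11r', 'np@3'])

# regex=False: the search text is matched literally.
literal = series.str.replace('11', '*123', regex=False)
print(literal.tolist())

# regex=True: the search text is a pattern; here every run of digits becomes '#'.
masked = series.str.replace(r'\d+', '#', regex=True)
print(masked.tolist())
```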

repeat()

Repeats each string the given number of times.

series = pd.Series(['car', '11 ', np.nan, 'Py', 'ASK 11r', 'np@3'])
print(series.str.repeat(3))

Output:

0 carcarcar
1 11 11 11
2 NaN
3 PyPyPy
4 ASK 11rASK 11rASK 11r
5 np@3np@3np@3
dtype: object

count()

Returns the number of occurrences of the given pattern in each element of the Series or Index. The match is case-sensitive, so the ‘A’ in ‘ASK 11r’ is not counted.

series = pd.Series(['car', '11 ', np.nan, 'aap', 'ASK 11r', 'npa@3'])
print(series.str.count('a'))

Output:

0 1.0
1 0.0
2 NaN
3 2.0
4 0.0
5 1.0
dtype: float64
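The pattern passed to count() is a regular expression, and flags from the standard re module can be supplied through the flags argument. The sketch below counts ‘a’ with and without case sensitivity:

```python
import re

import pandas as pd

series = pd.Series(['car', 'aap', 'ASK 11r'])

# Default: case-sensitive, so the uppercase 'A' is not counted.
exact = series.str.count('a')
print(exact.tolist())     # [1, 2, 0]

# re.IGNORECASE counts both 'a' and 'A'.
any_case = series.str.count('a', flags=re.IGNORECASE)
print(any_case.tolist())  # [1, 2, 1]
```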

find()

Returns the position where the specified substring first occurs in each string, or -1 if it does not occur.

series = pd.Series(['car', '11 ', np.nan, 'aap', 'ASK 11r', 'npa@3'])
print(series.str.find('a'))

Output:

0 1.0
1 -1.0
2 NaN
3 0.0
4 -1.0
5 2.0
dtype: float64

findall()

Returns, for each string, a list of all occurrences of the specified pattern. The list is empty when there is no match.

series = pd.Series(['car', '11 ', np.nan, 'aap', 'ASK 11r', 'npa@3'])
print(series.str.findall('a'))

Output:

0 [a]
1 []
2 NaN
3 [a, a]
4 []
5 [a]
dtype: object
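The pattern given to findall() is a regular expression, so it can match a class of characters instead of one fixed letter. A minimal sketch that collects the digits in each string:

```python
import pandas as pd

series = pd.Series(['car', 'ASK 11r', 'np@3'])

# \d matches any single digit; each match is returned as a list element.
digits = series.str.findall(r'\d')
print(digits.tolist())  # [[], ['1', '1'], ['3']]
```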

swapcase()

Converts uppercase characters to lowercase and lowercase characters to uppercase.

series = pd.Series(['car', '11 ', np.nan, 'PyPy', 'ASK 11r', 'Npa@3'])
print(series.str.swapcase())

Output:

0 CAR
1 11
2 NaN
3 pYpY
4 ask 11R
5 nPA@3
dtype: object

Summary

In this article, we worked with strings in Pandas. The next article will focus on Pandas data visualization.
