In this article, we will work with strings in Pandas DataFrames and Series. The Pandas library provides a set of built-in string methods, available on a Series (or Index) through the .str accessor, for manipulating text data.
Let’s create a Pandas Series with String values.
import pandas as pd
import numpy as np

series = pd.Series(['car', 'DOG', np.nan, 'Python Pandas', 'Ask11', '27', 'np@3'])
print(series)
Output:
0              car
1              DOG
2              NaN
3    Python Pandas
4            Ask11
5               27
6             np@3
dtype: object
The dtype of this Series is 'object'. We can convert a Series or DataFrame to the dedicated 'string' dtype:
print(series.astype('string'))
Alternatively, specify the dtype when creating the Series:
series = pd.Series(['car', 'DOG', np.nan, 'Python Pandas', 'Ask11', '27', 'np@3'], dtype='string')
print(series)
Both snippets print the same output:
0              car
1              DOG
2             <NA>
3    Python Pandas
4            Ask11
5               27
6             np@3
dtype: string
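Note: The 'string' dtype was added in pandas 1.0 (which requires Python 3), so both conversions fail on older pandas versions. Missing values in this dtype are displayed as <NA> rather than NaN.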
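One practical difference between the two dtypes is how missing values are stored. The following is a minimal check, assuming pandas 1.0 or later and reusing a few of the values above:

```python
import numpy as np
import pandas as pd

values = ["car", "DOG", np.nan, "Python Pandas"]

# Explicit "string" dtype (pandas 1.0 or later).
series = pd.Series(values, dtype="string")

print(series.dtype)            # the dedicated string dtype
print(series[2] is pd.NA)      # np.nan was converted to pd.NA
print(series.isna().tolist())  # missing-value detection still works
```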
The str.lower() method converts all uppercase characters in each string to lowercase and returns a new Series.
series = pd.Series(['car', 'DOG', np.nan, 'Python Pandas', 'Ask11', '27', 'np@3'])
print(series.str.lower())
Output:
0              car
1              dog
2              NaN
3    python pandas
4            ask11
5               27
6             np@3
dtype: object
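The .str accessor belongs to a Series, so with a DataFrame you apply it to one column at a time. The sketch below uses a hypothetical DataFrame, df, built only for illustration:

```python
import pandas as pd

# Hypothetical DataFrame used only for illustration.
df = pd.DataFrame({"pet": ["Cat", "DOG", "bird"], "count": [1, 2, 3]})

# .str is a Series accessor, so select the column first.
df["pet_lower"] = df["pet"].str.lower()
print(df)
```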
The str.upper() method converts all lowercase characters in each string to uppercase and returns a new Series.
series = pd.Series(['car', 'DOG', np.nan, 'Python Pandas', 'Ask11', '27', 'np@3'])
print(series.str.upper())
Output:
0              CAR
1              DOG
2              NaN
3    PYTHON PANDAS
4            ASK11
5               27
6             NP@3
dtype: object
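The example stores '27' as a string for a reason. In an object-dtype Series, elements that are not strings (such as the integer 27) are not converted and come back as NaN. Here is a small sketch with a hypothetical mixed Series:

```python
import pandas as pd

# Object-dtype Series mixing a string, an integer and a float.
mixed = pd.Series(["car", 27, 3.5])

# Only the string element is uppercased; the numbers become NaN.
upper = mixed.str.upper()
print(upper)
```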
The str.split() method splits each string at the given separator and returns, for each element, a list of the resulting pieces.
series = pd.Series(['car', '11 12 13', np.nan, 'Python Pandas', 'Ask11 np@3'])
print(series.str.split(' '))
Output:
0               [car]
1        [11, 12, 13]
2                 NaN
3    [Python, Pandas]
4       [Ask11, np@3]
dtype: object
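split() also accepts n, which limits the number of splits, and expand, which returns a DataFrame with one column per piece instead of a Series of lists. A minimal sketch with illustrative values:

```python
import pandas as pd

s = pd.Series(["Python Pandas", "Ask11 np@3 extra"])

# n=1 splits only at the first space; expand=True returns a DataFrame
# whose columns (0 and 1) hold the pieces.
parts = s.str.split(" ", n=1, expand=True)
print(parts)
```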
The str.strip() method removes leading and trailing whitespace from each string.
series = pd.Series(['car ', ' 11 ', np.nan, 'Python Pandas', 'Ask11 np@3'])
print(series.str.strip())
Output:
0              car
1               11
2              NaN
3    Python Pandas
4       Ask11 np@3
dtype: object
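The related methods lstrip() and rstrip() trim only one side, and all three accept a set of characters to remove instead of whitespace. The following sketch uses illustrative values:

```python
import pandas as pd

s = pd.Series(["  car  ", "--11--", "Python Pandas "])

left = s.str.lstrip()      # leading whitespace only
right = s.str.rstrip()     # trailing whitespace only
dashes = s.str.strip("-")  # strip "-" characters instead of whitespace

print(left.tolist())
print(right.tolist())
print(dashes.tolist())
```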
The str.cat() method concatenates all strings in the Series (or Index) into a single string, joined by the given separator. Missing values are ignored by default.
series = pd.Series(['car', '11', np.nan, 'Python Pandas', 'Ask11', 'np@3'])
print(series.str.cat(sep=' '))
Output:
car 11 Python Pandas Ask11 np@3
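cat() has two more options worth knowing. na_rep writes a placeholder in place of each missing value, and passing another Series concatenates element-wise instead of collapsing everything into one string. Here is a sketch with illustrative values, where the "?" placeholder is an arbitrary choice:

```python
import numpy as np
import pandas as pd

s = pd.Series(["car", "11", np.nan, "np@3"])

# na_rep substitutes a placeholder instead of dropping the missing value.
joined = s.str.cat(sep=" ", na_rep="?")
print(joined)

# With another Series, rows are paired up; a missing value on either
# side gives a missing result when na_rep is not set.
pairs = s.str.cat(pd.Series(["A", "B", "C", "D"]), sep="-")
print(pairs.tolist())
```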
The str.len() method returns the length of each string. Because the Series contains a missing value, the lengths come back as float64.
series = pd.Series(['car', '11', np.nan, 'Python Pandas', 'Ask11', 'np@3'])
print(series.str.len())
Output:
0     3.0
1     2.0
2     NaN
3    13.0
4     5.0
5     4.0
dtype: float64
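If the float64 result is inconvenient, the 'string' dtype helps. On that dtype, len() returns pandas' nullable Int64 type, so lengths stay integers and the missing entry is <NA>. This is a minimal sketch, assuming pandas 1.0 or later:

```python
import numpy as np
import pandas as pd

values = ["car", "11", np.nan, "Python Pandas", "Ask11", "np@3"]

# On the "string" dtype, str.len() returns the nullable Int64 dtype.
lengths = pd.Series(values, dtype="string").str.len()
print(lengths)
```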
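The str.islower() method returns True if all cased characters in a string are lowercase and the string contains at least one cased character. Digits and symbols are ignored, which is why 'np@3' is True while '11' is False.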
series = pd.Series(['car', '11', np.nan, 'Python Pandas', 'Ask11', 'np@3'])
print(series.str.islower())
Output:
0     True
1    False
2      NaN
3    False
4    False
5     True
dtype: object
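Element by element, the result matches Python's built-in str.islower(), which gives a quick way to predict edge cases. Here is a small check using illustrative values:

```python
import pandas as pd

values = ["car", "11", "np@3", "Ask11"]
s = pd.Series(values)

# Series.str.islower() applies the built-in str.islower() to each element.
result = s.str.islower()
print(result.tolist())
```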
The str.isupper() method returns True if all cased characters in a string are uppercase and the string contains at least one cased character, so 'ASK11' is True.
series = pd.Series(['cAr', 'TOM', np.nan, 'Python Pandas', 'ASK11', 'np@3'])
print(series.str.isupper())
Output:
0    False
1     True
2      NaN
3    False
4     True
5    False
dtype: object
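Combining isupper() with islower() is one way to find strings that mix both cases, assuming the Series has no missing values so the masks are plain booleans. One caveat: a string with no letters at all, such as '11', would also pass this filter, because both methods return False for it.

```python
import pandas as pd

s = pd.Series(["cAr", "TOM", "Python Pandas", "ASK11", "np@3"])

# Neither all-uppercase nor all-lowercase: the string mixes both cases
# (or contains no cased letters at all).
mixed_case = s[~s.str.isupper() & ~s.str.islower()]
print(mixed_case.tolist())
```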
The str.isnumeric() method returns True if every character in a string is numeric. A space or a decimal point is not a numeric character, so '21 63' and '56.3' both return False.
series = pd.Series(['cAr', '11', np.nan, '21 63', 'ASK11', '56.3'])
print(series.str.isnumeric())
Output:
0    False
1     True
2      NaN
3    False
4    False
5    False
dtype: object
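isnumeric() tests characters. It does not parse numbers. If the actual goal is to turn text such as '56.3' into numbers, pd.to_numeric with errors="coerce" is one option: anything it cannot parse becomes NaN. Here is a sketch with illustrative values:

```python
import pandas as pd

s = pd.Series(["cAr", "11", "21 63", "56.3"])

# Character test: "56.3" fails because of the dot.
print(s.str.isnumeric().tolist())

# Parsing: unparseable strings are coerced to NaN.
numbers = pd.to_numeric(s, errors="coerce")
print(numbers.tolist())
```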
The str.startswith() method returns True if a string starts with the given prefix. The comparison is case-sensitive, so 'cAr' does not match 'A'.
series = pd.Series(['cAr', 'ATM', np.nan, 'Python Pandas', 'ASK11', 'np@3'])
print(series.str.startswith('A'))
Output:
0    False
1     True
2      NaN
3    False
4     True
5    False
dtype: object
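Because of the NaN entry, the result above cannot be used directly as a filter. The na parameter sets the value returned for missing entries, so na=False gives a clean boolean mask. Here is a minimal sketch on the same values:

```python
import numpy as np
import pandas as pd

s = pd.Series(["cAr", "ATM", np.nan, "Python Pandas", "ASK11", "np@3"])

# Missing values count as "does not start with A".
starts_with_a = s[s.str.startswith("A", na=False)]
print(starts_with_a.tolist())
```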
The str.endswith() method returns True if a string ends with the given suffix.
series = pd.Series(['car', 'very far', np.nan, 'Python Pandas', 'ASK11r', 'np@3'])
print(series.str.endswith('ar'))
Output:
0     True
1     True
2      NaN
3    False
4    False
5    False
dtype: object
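endswith() is also case-sensitive. A simple way to get a case-insensitive check is to lowercase first and chain the two calls. The values below are illustrative:

```python
import pandas as pd

s = pd.Series(["CAR", "very far", "Python Pandas"])

exact = s.str.endswith("ar")                     # case-sensitive
ignore_case = s.str.lower().str.endswith("ar")   # lowercase first

print(exact.tolist())
print(ignore_case.tolist())
```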
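The str.get_dummies() method returns one-hot encoded values as a DataFrame. Each distinct string becomes a column, and each row holds 1 in the column that matches its value and 0 elsewhere. A missing value produces a row of zeros.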
series = pd.Series(['car', '11', np.nan, 'Python Pandas', 'ASK11r', 'np@3'])
print(series.str.get_dummies())
Output:
   11  ASK11r  Python Pandas  car  np@3
0   0       0              0    1     0
1   1       0              0    0     0
2   0       0              0    0     0
3   0       0              1    0     0
4   0       1              0    0     0
5   0       0              0    0     1
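get_dummies() also splits each string on a separator, which is '|' by default. A cell can therefore carry several labels at once. The sketch below uses a hypothetical tags Series:

```python
import pandas as pd

# Hypothetical tag strings; each cell may hold several labels joined by "|".
tags = pd.Series(["a|b", "b", "a|c"])

# sep="|" is the default; it is passed explicitly here for clarity.
dummies = tags.str.get_dummies(sep="|")
print(dummies)
```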
The str.replace() method replaces every occurrence of the pattern given as the first argument with the string given as the second argument.
series = pd.Series(['car', '11', np.nan, 'Python Pandas', 'ASK11r', 'np@3'])
print(series.str.replace('11', '*123'))
Output:
0              car
1             *123
2              NaN
3    Python Pandas
4         ASK*123r
5             np@3
dtype: object
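Whether the pattern is treated as a regular expression depends on the regex argument, and its default changed in pandas 2.0 (from True to False). Passing it explicitly avoids surprises with characters such as '.', as this sketch with illustrative values shows:

```python
import pandas as pd

s = pd.Series(["5.3", "5x3"])

# regex=False: "." matches only a literal dot.
literal = s.str.replace("5.3", "_", regex=False)
# regex=True: "." matches any single character.
pattern = s.str.replace("5.3", "_", regex=True)

print(literal.tolist())
print(pattern.tolist())
```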
The str.repeat() method repeats each string the given number of times.
series = pd.Series(['car', '11 ', np.nan, 'Py', 'ASK 11r', 'np@3'])
print(series.str.repeat(3))
Output:
0                carcarcar
1                11 11 11 
2                      NaN
3                   PyPyPy
4    ASK 11rASK 11rASK 11r
5             np@3np@3np@3
dtype: object
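repeat() also accepts a sequence of integers, one per element, so each string can be repeated a different number of times. Here is a minimal sketch with illustrative values:

```python
import pandas as pd

s = pd.Series(["ab", "c", "xyz"])

# Element i is repeated repeats[i] times.
repeated = s.str.repeat([1, 3, 2])
print(repeated.tolist())
```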
The str.count() method returns the number of occurrences of the given pattern in each string. Matching is case-sensitive, so the uppercase 'A' in 'ASK 11r' is not counted.
series = pd.Series(['car', '11 ', np.nan, 'aap', 'ASK 11r', 'npa@3'])
print(series.str.count('a'))
Output:
0    1.0
1    0.0
2    NaN
3    2.0
4    0.0
5    1.0
dtype: float64
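The pattern passed to count() is a regular expression. It can therefore be a character class, and a flags argument from Python's re module can make matching case-insensitive. Here is a sketch using illustrative values without missing entries, so the counts stay integers:

```python
import re

import pandas as pd

s = pd.Series(["car", "aap", "ASK 11r", "npa@3"])

# re.IGNORECASE makes the uppercase "A" count too.
counts = s.str.count("a", flags=re.IGNORECASE)
print(counts.tolist())

# \d counts digit characters.
digits = s.str.count(r"\d")
print(digits.tolist())
```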
The str.find() method returns the lowest index at which the given substring occurs in each string, or -1 if it is not found.
series = pd.Series(['car', '11 ', np.nan, 'aap', 'ASK 11r', 'npa@3'])
print(series.str.find('a'))
Output:
0     1.0
1    -1.0
2     NaN
3     0.0
4    -1.0
5     2.0
dtype: float64
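find() mirrors Python's str.find(). It accepts optional start and end positions, and the companion rfind() returns the highest index instead of the lowest. Here is a small sketch with illustrative values:

```python
import pandas as pd

s = pd.Series(["car", "aap", "npa@3", "11 "])

first = s.str.find("a")             # lowest index, -1 if absent
last = s.str.rfind("a")             # highest index, -1 if absent
after_1 = s.str.find("a", start=1)  # search begins at position 1

print(first.tolist(), last.tolist(), after_1.tolist())
```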
The str.findall() method returns a list of all occurrences of the given pattern in each string.
series = pd.Series(['car', '11 ', np.nan, 'aap', 'ASK 11r', 'npa@3'])
print(series.str.findall('a'))
Output:
0       [a]
1        []
2       NaN
3    [a, a]
4        []
5       [a]
dtype: object
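The pattern passed to findall() is a regular expression, so it can extract more than single characters. For example, \d+ pulls out every run of digits. The values below are illustrative:

```python
import pandas as pd

s = pd.Series(["car", "11 12", "ASK 11r", "npa@3"])

# \d+ matches one or more consecutive digits.
numbers = s.str.findall(r"\d+")
print(numbers.tolist())
```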
The str.swapcase() method converts uppercase characters to lowercase and vice versa.
series = pd.Series(['car', '11 ', np.nan, 'PyPy', 'ASK 11r', 'Npa@3'])
print(series.str.swapcase())
Output:
0        CAR
1        11 
2        NaN
3       pYpY
4    ask 11R
5      nPA@3
dtype: object
In this article, we worked with strings in Pandas. The next article will focus on Pandas data visualization.