# Working with Strings in Pandas

In this article, we will work with strings in Pandas DataFrames and Series. The Pandas library provides built-in string functions for manipulating text data, which are available through the .str accessor.

Let’s create a Pandas Series with String values.

```python
import pandas as pd
import numpy as np

series = pd.Series(['car', 'DOG', np.nan, 'Python Pandas', 'Ask11', '27', '[email protected]'])
print(series)
```

Output:

```
0                  car
1                  DOG
2                  NaN
3        Python Pandas
4                Ask11
5                   27
6    [email protected]
dtype: object
```

We can see that the dtype of this Series is 'object'. We can convert a Series or DataFrame to the dedicated 'string' dtype.

```python
print(series.astype('string'))
```

Alternatively, set the dtype when creating the Series:

```python
series = pd.Series(['car', 'DOG', np.nan, 'Python Pandas', 'Ask11', '27', '[email protected]'], dtype='string')
print(series)
```

Both snippets produce the same output:

```
0                  car
1                  DOG
2                 <NA>
3        Python Pandas
4                Ask11
5                   27
6    [email protected]
dtype: string
```

Note: The 'string' dtype was introduced in pandas 1.0, so the two conversions above require pandas 1.0 or later.
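The practical difference between the two dtypes shows up in the results of string methods. The following is a minimal sketch, assuming pandas 1.0 or later; the variable names are illustrative and not part of the original examples.

```python
import numpy as np
import pandas as pd

# Same values stored with the generic object dtype and with the string dtype
obj = pd.Series(['car', 'DOG', np.nan], dtype=object)
txt = pd.Series(['car', 'DOG', np.nan], dtype='string')

# Integer-valued results: float64 with NaN versus nullable Int64 with <NA>
obj_len = obj.str.len()
txt_len = txt.str.len()
print(obj_len.dtype, txt_len.dtype)

# Boolean-valued results on the string dtype use the nullable boolean dtype
txt_upper = txt.str.isupper()
print(txt_upper.dtype)
```

With the 'string' dtype, missing entries stay as `<NA>` and the lengths remain integers instead of being converted to floats.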

**String Operations**

**lower()**

Converts all uppercase characters in each string to lowercase and returns the resulting Series.

```python
series = pd.Series(['car', 'DOG', np.nan, 'Python Pandas', 'Ask11', '27', '[email protected]'])
print(series.str.lower())
```

Output:

```
0                  car
1                  dog
2                  NaN
3        python pandas
4                ask11
5                   27
6    [email protected]
dtype: object
```

**upper()**

Converts all lowercase characters in each string to uppercase and returns the resulting Series.

```python
series = pd.Series(['car', 'DOG', np.nan, 'Python Pandas', 'Ask11', '27', '[email protected]'])
print(series.str.upper())
```

Output:

```
0                  CAR
1                  DOG
2                  NaN
3        PYTHON PANDAS
4                ASK11
5                   27
6    [email protected]
dtype: object
```

**split()**

Splits each string in the Series or Index on the given pattern and returns, for each string, a list of the pieces that the pattern separated.

```python
series = pd.Series(['car', '11 12 13', np.nan, 'Python Pandas', 'Ask11 [email protected]'])
print(series.str.split(' '))
```

Output:

```
0                         [car]
1                  [11, 12, 13]
2                           NaN
3              [Python, Pandas]
4    [Ask11, [email protected]]
dtype: object
```
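When the pieces are needed as separate columns rather than lists, split() also accepts the `expand` and `n` parameters. The sketch below uses a small illustrative Series (not from the examples above) to show both.

```python
import pandas as pd

names = pd.Series(['Python Pandas', 'car', '11 12 13'])

# expand=True returns a DataFrame with one column per piece;
# rows with fewer pieces are padded with missing values.
parts = names.str.split(' ', expand=True)
print(parts)

# n=1 limits the number of splits to one, counting from the left.
first_rest = names.str.split(' ', n=1, expand=True)
print(first_rest)
```

Limiting the number of splits is handy when only the first token should be separated from the remainder of the string.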

**strip()**

Removes leading and trailing whitespace from each string.

```python
series = pd.Series(['car ', ' 11 ', np.nan, 'Python Pandas', 'Ask11 [email protected]'])
print(series.str.strip())
```

Output:

```
0                        car
1                         11
2                        NaN
3              Python Pandas
4    Ask11 [email protected]
dtype: object
```
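Related methods strip only one side, and strip() can also remove characters other than whitespace. A short sketch with made-up sample values:

```python
import pandas as pd

raw = pd.Series(['  car  ', 'xx11xx', '\tPython\n'])

left = raw.str.lstrip()      # remove leading whitespace only
right = raw.str.rstrip()     # remove trailing whitespace only
no_x = raw.str.strip('x')    # remove the given characters from both ends
both = raw.str.strip()       # tabs and newlines count as whitespace too

print(left.tolist())
print(right.tolist())
print(no_x.tolist())
print(both.tolist())
```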

**cat()**

Concatenates the strings in the Series or Index into a single string, joined by the specified separator. Missing values are skipped by default.

```python
series = pd.Series(['car', '11', np.nan, 'Python Pandas', 'Ask11', '[email protected]'])
print(series.str.cat(sep=' '))
```

Output:

```
car 11 Python Pandas Ask11 [email protected]
```
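cat() can also keep a placeholder for missing values or join two Series element by element. The following sketch uses illustrative data; `words` and `suffix` are hypothetical names.

```python
import numpy as np
import pandas as pd

words = pd.Series(['car', np.nan, 'Pandas'])

# Missing values are skipped unless na_rep supplies a placeholder
joined = words.str.cat(sep=' ')
joined_na = words.str.cat(sep=' ', na_rep='?')
print(joined)
print(joined_na)

# Passing another Series concatenates element-wise; a missing value
# on either side gives a missing result for that row.
suffix = pd.Series(['_1', '_2', '_3'])
pairwise = words.str.cat(suffix)
print(pairwise)
```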

**len()**

Returns the length of each string in the Series or Index.

```python
series = pd.Series(['car', '11', np.nan, 'Python Pandas', 'Ask11', '[email protected]'])
print(series.str.len())
```

Output:

```
0     3.0
1     2.0
2     NaN
3    13.0
4     5.0
5     4.0
dtype: float64
```

**islower()**

Returns True for each string in the Series or Index in which all alphabetical characters are lowercase. A string without any alphabetical characters, such as '11', gives False.

```python
series = pd.Series(['car', '11', np.nan, 'Python Pandas', 'Ask11', '[email protected]'])
print(series.str.islower())
```

Output:

```
0     True
1    False
2      NaN
3    False
4    False
5     True
dtype: object
```

**isupper()**

Returns True for each string in the Series or Index in which all alphabetical characters are uppercase.

```python
series = pd.Series(['cAr', 'TOM', np.nan, 'Python Pandas', 'ASK11', '[email protected]'])
print(series.str.isupper())
```

Output:

```
0    False
1     True
2      NaN
3    False
4     True
5    False
dtype: object
```

**isnumeric()**

Returns True for each string in the Series or Index in which all characters are numeric. Spaces and decimal points are not numeric characters, so '21 63' and '56.3' give False.

```python
series = pd.Series(['cAr', '11', np.nan, '21 63', 'ASK11', '56.3'])
print(series.str.isnumeric())
```

Output:

```
0    False
1     True
2      NaN
3    False
4    False
5    False
dtype: object
```
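Because isnumeric() checks characters rather than parsing numbers, it rejects decimals and negative values. If the goal is to find strings that can be read as numbers, one possible approach (an assumption about the use case, not something the original example covers) is `pd.to_numeric` with `errors='coerce'`:

```python
import pandas as pd

values = pd.Series(['11', '56.3', '-5', '21 63'])

# Character-level check: only '11' consists solely of numeric characters
only_digits = values.str.isnumeric()
print(only_digits)

# Parsing check: unparseable strings become NaN instead of raising
as_numbers = pd.to_numeric(values, errors='coerce')
print(as_numbers)
```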

**startswith()**

Returns True for each string in the Series or Index that starts with the given pattern.

```python
series = pd.Series(['cAr', 'ATM', np.nan, 'Python Pandas', 'ASK11', '[email protected]'])
print(series.str.startswith('A'))
```

Output:

```
0    False
1     True
2      NaN
3    False
4     True
5    False
dtype: object
```

**endswith()**

Returns True for each string in the Series or Index that ends with the given pattern.

```python
series = pd.Series(['car', 'very far', np.nan, 'Python Pandas', 'ASK11r', '[email protected]'])
print(series.str.endswith('ar'))
```

Output:

```
0     True
1     True
2      NaN
3    False
4    False
5    False
dtype: object
```
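The NaN in these results prevents the output from being used directly as a boolean mask. Both startswith() and endswith() accept an `na` parameter that sets the value used for missing entries. A minimal sketch of filtering with it:

```python
import numpy as np
import pandas as pd

series = pd.Series(['car', 'very far', np.nan, 'Python Pandas'])

# na=False treats missing values as non-matching, giving a clean boolean mask
mask = series.str.endswith('ar', na=False)
matches = series[mask]
print(matches)
```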

**get_dummies()**

Returns a DataFrame of one-hot encoded values with one column per distinct string. A cell is 1 if the row's string matches the column and 0 otherwise, so a row with a missing value contains only zeros.

```python
series = pd.Series(['car', '11', np.nan, 'Python Pandas', 'ASK11r', '[email protected]'])
print(series.str.get_dummies())
```

Output:

```
   11  ASK11r  Python Pandas  car  [email protected]
0   0       0              0    1                  0
1   1       0              0    0                  0
2   0       0              0    0                  0
3   0       0              1    0                  0
4   0       1              0    0                  0
5   0       0              0    0                  1
```
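get_dummies() also splits each string on a separator before encoding, which is useful when one cell holds several labels. The default separator is '|'. The sketch below uses hypothetical tag data to illustrate this:

```python
import pandas as pd

tags = pd.Series(['red|small', 'blue', 'red|large'])

# Each distinct label between separators becomes its own indicator column
dummies = tags.str.get_dummies(sep='|')
print(dummies)
```

Here the first row is marked in both the 'red' and 'small' columns.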

**replace()**

Replaces each occurrence of the pattern given as the first argument with the string given as the second argument.

```python
series = pd.Series(['car', '11', np.nan, 'Python Pandas', 'ASK11r', '[email protected]'])
print(series.str.replace('11', '*123'))
```

Output:

```
0                  car
1                 *123
2                  NaN
3        Python Pandas
4             ASK*123r
5    [email protected]
dtype: object
```
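The `regex` parameter controls whether the first argument is treated as a literal string or as a regular expression. Since the default has changed between pandas versions, passing it explicitly is a safe habit. A short sketch with illustrative values:

```python
import pandas as pd

codes = pd.Series(['ASK11r', 'a.b', 'room 101'])

# regex=False: '.' is a literal dot, not "any character"
literal = codes.str.replace('.', '-', regex=False)
print(literal)

# regex=True: replace every run of digits with a single '#'
digits = codes.str.replace(r'\d+', '#', regex=True)
print(digits)
```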

**repeat()**

Repeats each string the given number of times.

```python
series = pd.Series(['car', '11 ', np.nan, 'Py', 'ASK 11r', '[email protected]'])
print(series.str.repeat(3))
```

Output:

```
0                              carcarcar
1                              11 11 11 
2                                    NaN
3                                 PyPyPy
4                  ASK 11rASK 11rASK 11r
5    [email protected]@[email protected]
dtype: object
```
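repeat() also accepts a sequence of integers, one per element, when each string needs a different number of repetitions. A minimal sketch with sample values chosen for illustration:

```python
import pandas as pd

series = pd.Series(['car', 'Py', 'ab'])

# One repeat count per element, matched by position
varied = series.str.repeat([1, 2, 3])
print(varied)
```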

**count()**

Returns the number of occurrences of the given pattern in each string of the Series or Index. The match is case-sensitive.

```python
series = pd.Series(['car', '11 ', np.nan, 'aap', 'ASK 11r', '[email protected]'])
print(series.str.count('a'))
```

Output:

```
0    1.0
1    0.0
2    NaN
3    2.0
4    0.0
5    1.0
dtype: float64
```
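The pattern passed to count() is a regular expression, so flags from Python's `re` module apply and special characters need escaping. The following sketch, using illustrative values, counts 'a' regardless of case and counts a literal '+':

```python
import re

import pandas as pd

series = pd.Series(['car', 'aap', 'ASK 11r', 'a+a'])

# Case-insensitive count: the 'A' in 'ASK 11r' is now included
ci = series.str.count('a', flags=re.IGNORECASE)
print(ci)

# '+' is a regex metacharacter, so escape it to count it literally
plus = series.str.count(re.escape('+'))
print(plus)
```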

**find()**

Returns the lowest index at which the specified substring occurs in each string, or -1 if it does not occur.

```python
series = pd.Series(['car', '11 ', np.nan, 'aap', 'ASK 11r', '[email protected]'])
print(series.str.find('a'))
```

Output:

```
0    1.0
1   -1.0
2    NaN
3    0.0
4   -1.0
5    2.0
dtype: float64
```

**findall()**

Returns, for each string, a list of all occurrences of the specified pattern. The list is empty if there is no match.

```python
series = pd.Series(['car', '11 ', np.nan, 'aap', 'ASK 11r', '[email protected]'])
print(series.str.findall('a'))
```

Output:

```
0       [a]
1        []
2       NaN
3    [a, a]
4        []
5       [a]
dtype: object
```
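find() looks for a plain substring, while findall() accepts a regular expression. A brief sketch contrasting the two on illustrative values:

```python
import pandas as pd

series = pd.Series(['ASK 11r', 'room 101, floor 3', 'car'])

# findall with a regex: collect every run of digits in each string
numbers = series.str.findall(r'\d+')
print(numbers)

# find with a substring: index of the first 'r' in each string
first_pos = series.str.find('r')
print(first_pos)
```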

**swapcase()**

Converts uppercase characters to lowercase and vice versa.

```python
series = pd.Series(['car', '11 ', np.nan, 'PyPy', 'ASK 11r', '[email protected]'])
print(series.str.swapcase())
```

Output:

```
0                  CAR
1                  11 
2                  NaN
3                 pYpY
4              ask 11R
5    [email protected]
dtype: object
```

**Summary**

In this article, we worked with strings in Pandas. The next article will focus on Pandas data visualization.