string — Common string operations (Part 1: methods)

Python's built-in string type (str) provides a number of methods for manipulating strings, while the string module adds constants and formatting helpers. The methods work on all strings in Python, whether they are created using quotation marks ('single', "double", '''triple-single''', or """triple-double"""-quoted text) or by applying the built-in str() function to another object. In Python, strings are immutable sequences, meaning that an existing string cannot be altered in place. Instead, every operation that appears to change a string returns a new string.
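
To make the immutability point concrete, here is a minimal sketch (the variable names are only for illustration): assigning to a character position fails, so a "change" always means building a new string.

```python
city = 'Helsinki'

# Item assignment is not allowed on a str: it raises TypeError.
try:
    city[0] = 'h'
    assignment_worked = True
except TypeError:
    assignment_worked = False

# "Changing" a string really means building a new one from the old one.
lowered = city[0].lower() + city[1:]

print(assignment_worked)  # False
print(city, lowered)      # Helsinki helsinki
```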

In this first part, we’ll look at string methods. The second part takes a brief look at constants, and the third covers custom string formatting.

Useful Functions

While there are a number of methods, I will focus on the most interesting and useful.

`str.lower()`, `str.upper()`, `str.title()`

Basic string manipulation might require lowercasing a string (e.g., creating a website ‘slug’) or uppercasing it (e.g., formatting in a report or other document). str.title() will capitalize the first letter of every word, while the rest will be lowercase.

'Helsinki'.lower()  # 'helsinki'
'Helsinki'.upper()  # 'HELSINKI'

# these do not modify the string, but create a new string
s = 'Helsinki'
s2 = s.lower()
s3 = s.upper()

print(s, s2, s3)
#> Helsinki helsinki HELSINKI
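
str.title() is described above but not shown, so here is a short sketch with made-up sample strings. One thing to watch for: word boundaries are decided by non-letter characters, so an apostrophe starts a new "word".

```python
book = 'a tale of two cities'
shouted = 'HELSINKI'
contraction = "they're here"

print(book.title())         # A Tale Of Two Cities
print(shouted.title())      # Helsinki
print(contraction.title())  # They'Re Here
```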

`str.casefold()`

While this is similar to str.lower(), it is designed specifically to ‘fold’ both ‘cases’ (i.e., upper or lower case letters/words) into a single case for, e.g., comparisons. When wanting to compare two strings regardless of case, this is probably the most desirable function.

One example is the German ß, which is treated as equivalent to ss (and uppercases to SS). We can compare the differences:

'ß'.lower()  # 'ß'
'ß'.upper()  # 'SS'
'ß'.casefold()  # 'ss'

# Thus, this comparison is equal:
'ß'.casefold() == 'ss'.casefold() == 'SS'.casefold()  # True
# ...whereas this one is not:
'ß'.lower() == 'ss'.lower() == 'SS'.lower()  # False
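
In practice I would probably wrap the comparison in a small helper. This is only a sketch, and same_text is a hypothetical name rather than anything built in.

```python
def same_text(a: str, b: str) -> bool:
    """Compare two strings while ignoring case (hypothetical helper)."""
    return a.casefold() == b.casefold()


print(same_text('Straße', 'STRASSE'))         # True
print('Straße'.lower() == 'STRASSE'.lower())  # False
```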

`str.swapcase()`

Change all uppercase letters to lowercase, and vice versa. Note that not all of the mappings are reversible, so swapping twice does not always return the original string. Turkish and German provide examples where two different lowercase forms map to the same uppercase form.

'Helsinki'.swapcase()  # 'hELSINKI'
'Helsinki'.swapcase().swapcase()  # 'Helsinki' (original string)

'ı'.swapcase()  # 'I'
'ı'.swapcase().swapcase()  # 'i' (not lowercase 'ı')

'ß'.swapcase()  # 'SS'
'ß'.swapcase().swapcase()  # 'ss' (not 'ß')

`str.strip()`

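Strip off unwanted characters from either end of a string, leaving the part you want. By default, this removes all whitespace at the edges of the string, but you can pass an argument containing the set of characters to remove; strip then removes as many of those characters as it can from each end, stopping at the first character that is not in the set.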

>>> ' A Tale of Two Cities\n'.strip()
'A Tale of Two Cities'
>>> ' A Tale of Two Cities\n'.rstrip()  # strip only from the right
' A Tale of Two Cities'
>>> ' A Tale of Two Cities\n'.lstrip()  # strip only from the left
'A Tale of Two Cities\n'
>>> '............___Dickens, Charles___..........'.strip('._')
'Dickens, Charles'
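
A common surprise is that the argument is a set of characters rather than a literal prefix or suffix. The sketch below uses made-up sample strings to show this.

```python
# Every character in 'cmowz.' is stripped from both ends, in any order.
domain = 'www.example.com'.strip('cmowz.')
print(domain)  # example

# Stripping stops at the first character that is not in the set,
# so the underscores in the middle are left alone.
module = '__init__.py'.strip('_')
print(module)  # init__.py
```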

`str.startswith(prefix)` and `str.endswith(suffix)`

Checking the start or end of a string is often useful (typically after strip). For example, endswith might look for a file extension while startswith might check to see if you have a test case or not.

The argument in either case can be a str or a tuple of str; with a tuple, the method checks for the presence of any of the strings at the start/end.

There are similar methods str.removeprefix and str.removesuffix (available since Python 3.9), although neither of those will take a tuple (I’ve never used either method anyway).

>>> 'test.csv'.endswith(('.csv', '.txt'))
True
>>> 'test.csv'.removesuffix('.csv')
'test'
>>> 'test.csv'.removesuffix('.txt')
'test.csv'
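
Putting these together, here is a rough sketch of the file-extension and test-case checks mentioned above. The filenames are invented for the example, and removeprefix/removesuffix assume Python 3.9 or later.

```python
filenames = ['test_parser.py', 'parser.py', 'notes.txt', 'test_data.csv']

# A tuple matches any of the given suffixes.
data_files = [name for name in filenames if name.endswith(('.csv', '.txt'))]

# Test modules start with 'test_' and end with '.py'.
test_modules = [
    name for name in filenames
    if name.startswith('test_') and name.endswith('.py')
]

# Strip the known prefix and suffix to recover the module under test.
module_names = [
    name.removeprefix('test_').removesuffix('.py') for name in test_modules
]

print(data_files)    # ['notes.txt', 'test_data.csv']
print(test_modules)  # ['test_parser.py']
print(module_names)  # ['parser']
```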

`in`

Want to know if a substring is contained in your string? Use the in operator. This is the equivalent of Ctrl + F to find a substring somewhere on the web or in a document. Remember that you may want to normalise the case (via str.casefold) before checking.

>>> long_text_from_website = 'It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness...'
>>> 'it was the best of times' in long_text_from_website  # the text starts with a capital 'It'
False
>>> 'it was the best of times' in long_text_from_website.casefold()
True
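
If the search term may also contain capitals, both sides need folding. A minimal sketch, where contains_ignoring_case is a hypothetical helper name:

```python
def contains_ignoring_case(needle: str, haystack: str) -> bool:
    """Case-insensitive substring check (hypothetical helper)."""
    return needle.casefold() in haystack.casefold()


opening = 'It was the best of times, it was the worst of times'

print('IT WAS' in opening)                        # False
print(contains_ignoring_case('IT WAS', opening))  # True
print('' in opening)                              # True (the empty string is always found)
```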

`str.is___()` identities

What is your string?

str.isalnum(): all characters are alphabetic or numeric (i.e., no punctuation or whitespace)
str.isalpha(): all characters are alphabetic
str.isascii(): all of the characters are ASCII (code points from U+0000 to U+007F)
str.isdigit(): all characters are digits ('9.0' returns False; '90' returns True)
str.isspace(): all characters are whitespace characters
str.isupper(), str.islower(), str.istitle(): all cased characters are uppercase, lowercase, or title-cased, respectively

All of these return False for an empty string, except str.isascii(), which returns True.
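
To see several of these at once, here is a small sketch that reports which checks pass for a given string. The describe helper is hypothetical, and the list[str] annotation assumes Python 3.9 or later.

```python
def describe(text: str) -> list[str]:
    """Return the names of the is___() checks that pass (hypothetical helper)."""
    checks = ['isalnum', 'isalpha', 'isascii', 'isdigit', 'isspace',
              'isupper', 'islower', 'istitle']
    return [name for name in checks if getattr(text, name)()]


print(describe('90'))        # ['isalnum', 'isascii', 'isdigit']
print(describe('9.0'))       # ['isascii']
print(describe('Helsinki'))  # ['isalnum', 'isalpha', 'isascii', 'istitle']
print(describe(' \t\n'))     # ['isascii', 'isspace']
```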

`str.join()` and `str.split()`

When working with text a lot, it is very useful to be able to split and join text around some separator. These are roughly inverse operations. A very frequent use case is removing duplicated whitespace characters: ' '.join(value.split()). Thus 'where   is' will turn into 'where is' (single-space).
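
The whitespace clean-up can be sketched as follows, with a made-up messy string. It works because split() with no argument splits on any run of whitespace and drops empty pieces, whereas an explicit separator does not.

```python
messy = '  where   is \t the\n\nlibrary?  '

# split() with no argument collapses runs of whitespace and trims the ends.
clean = ' '.join(messy.split())
print(clean)  # where is the library?

# With an explicit separator, consecutive separators yield empty strings.
print('a  b'.split(' '))  # ['a', '', 'b']
print('a  b'.split())     # ['a', 'b']
```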

Another good use case is reading some text that has a delimiter (e.g., a Python version string). While these can sometimes be handled by the csv module, using str.split is usually much easier for simple cases.

>>> version = '3.11.1'
>>> version.split('.')
['3', '11', '1']
>>> major, minor, patch = version.split('.')
>>> major
'3'
>>> minor
'11'
>>> patch
'1'
>>> version == '.'.join([major, minor, patch])
True
>>> version == '.'.join(version.split('.'))
True

It is worth mentioning that str.partition will split at the first occurrence of the separator only, returning a 3-tuple of the text before it, the separator itself, and the text after it. I have never used this, though it probably makes sense for parsing, e.g., property files.

>>> '1+1=2'.partition('=')
('1+1', '=', '2')
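
As a rough sketch of the property-file idea (parse_property and the sample lines are hypothetical), partition is convenient because it always returns three items, even when the separator is missing or appears more than once.

```python
def parse_property(line: str) -> tuple[str, str]:
    """Split 'key = value' at the first '=' only (hypothetical helper)."""
    key, _separator, value = line.partition('=')
    return key.strip(), value.strip()


# Later '=' characters stay in the value.
print(parse_property('url = https://example.com/?q=1'))
# ('url', 'https://example.com/?q=1')

# No separator: the value is simply empty.
print(parse_property('verbose'))
# ('verbose', '')
```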

`str.translate(table)` from `str.maketrans()`

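This method replaces or deletes individual characters in a string according to a translation table, which is most easily built with str.maketrans(). The resulting call will look something like s.translate(str.maketrans(x, y, z)), where each character in x is replaced by the character at the same position in y, and every character in z is deleted. I will show the most useful case I’ve found, which is removing all punctuation from a string, along with some more trivial examples.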

>>> 'Hello, World.'.translate(str.maketrans({'.': '!', '!': '?'}))
'Hello, World!'
>>> 'Hello, World!'.translate(str.maketrans({'.': '!', '!': '?'}))
'Hello, World?'
>>> 'Hello, World!'.translate(str.maketrans('.!,', '!.-'))
'Hello- World.'

# remove punctuation
>>> import string
>>> 'Hello, World!'.translate(str.maketrans('', '', string.punctuation))
'Hello World'
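
For completeness, a small sketch of the other table forms, using arbitrary sample strings: the three-argument call combines replacement and deletion, and a dict can map a character to None (delete) or to a longer string.

```python
# Replace 'l' -> '0' and 'o' -> '1', and delete ',' and '!'.
table = str.maketrans('lo', '01', ',!')
leet = 'Hello, World!'.translate(table)
print(leet)  # He001 W1r0d

# In the dict form, None deletes a character...
no_punct = 'Hello, World!'.translate(str.maketrans({',': None, '!': None}))
print(no_punct)  # Hello World

# ...and a value may be longer than one character.
expanded = 'R&D'.translate(str.maketrans({'&': ' and '}))
print(expanded)  # R and D
```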

`str.zfill(width)`

If your string is a number (actually, it can be letters, punctuation, whitespace, etc., as well) and you want it nicely formatted (e.g., with leading zeroes), zfill is handy, though format strings (discussed in part 3) will be more broadly generalizable. The number passed into zfill is the minimum number of characters in the resulting string: a leading + or - sign stays in front of the padding, and a string that is already long enough is returned unchanged.

>>> '21'.zfill(5)
'00021'
>>> '-21'.zfill(8)
'-0000021'
# non-numeric strings follow the same rules
>>> 't'.zfill(5)
'0000t'
>>> '+t'.zfill(8)
'+000000t'
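
One place this comes in handy is zero-padding numbers in file names so that they sort correctly as text. A minimal sketch with invented file names:

```python
# zfill never truncates: a string longer than the width is returned as-is.
unchanged = '12345'.zfill(3)
print(unchanged)  # 12345

# Without padding, '10' sorts before '2' because strings compare character by character.
unpadded = sorted(['frame_2.png', 'frame_10.png'])
print(unpadded)  # ['frame_10.png', 'frame_2.png']

# With padding, text order matches numeric order.
padded = sorted(f'frame_{str(n).zfill(3)}.png' for n in (10, 2, 100))
print(padded)  # ['frame_002.png', 'frame_010.png', 'frame_100.png']
```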

Parting Thoughts

The methods for str types are useful for string manipulation, particularly for NLP and for building reports.
