The str type and the string module provide a number of methods and constants for manipulating strings. These work on all strings in Python, which are created using quotation marks: 'single', "double", '''triple-single''', and """triple-double"""-quoted text. Strings can also be created by applying the built-in str() function to any other datatype. In Python, strings are immutable sequences, meaning that they cannot be altered in place. Instead, every operation that appears to change a string returns a new string.
In this first part, we’ll look at string methods. In the second part, we’ll take a brief look at constants. And, in the third, custom string formatting.
Useful Functions
While there are a number of methods, I will focus on the most interesting and useful.
str.lower(), str.upper(), str.title()
Basic string manipulation might require converting a string to lowercase (e.g., creating a website ‘slug’) or uppercase (e.g., formatting in a report or other document). str.title() will capitalize the first letter of every word, while the rest will be lowercase.
    'Helsinki'.lower()  # 'helsinki'
    'Helsinki'.upper()  # 'HELSINKI'

    # these do not modify the string, but create a new string
    s = 'Helsinki'
    s2 = s.lower()
    s3 = s.upper()
    print(s, s2, s3)
    #> Helsinki helsinki HELSINKI
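As a small illustration of the slug use case mentioned above, here is a minimal sketch. The function name make_slug is hypothetical, and "lowercase, then join the words with hyphens" is only one simple assumption about what a slug should look like. The second half shows a quirk of str.title() worth knowing: an apostrophe counts as a word boundary.

```python
def make_slug(title):
    """Hypothetical helper: lowercase a title and join its words with hyphens."""
    return '-'.join(title.lower().split())


slug = make_slug('A Tale of Two Cities')
print(slug)
#> a-tale-of-two-cities

# str.title() starts a new "word" after any non-letter, including apostrophes
titled = "they're bill's friends".title()
print(titled)
#> They'Re Bill'S Friends
```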
str.casefold()
While this is similar to str.lower(), it is designed specifically to ‘fold’ both ‘cases’ (i.e., upper or lower case letters/words) into a single case for, e.g., comparisons. When wanting to compare two strings regardless of case, this is probably the most desirable function.
One example is the German ß, which alternates with ss. We can compare the differences:
    'ß'.lower()  # 'ß'
    'ß'.upper()  # 'SS'
    'ß'.casefold()  # 'ss'

    # Thus, this comparison is equal:
    'ß'.casefold() == 'ss'.casefold() == 'SS'.casefold()  # True
    # ...where this one is not:
    'ß'.lower() == 'ss'.lower() == 'SS'.lower()  # False
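The comparison above can be wrapped up in a small function. This is only a sketch, and the name equals_ignoring_case is made up for illustration:

```python
def equals_ignoring_case(a, b):
    """Hypothetical helper: compare two strings after case folding."""
    return a.casefold() == b.casefold()


folded_match = equals_ignoring_case('Straße', 'STRASSE')
lowered_match = 'Straße'.lower() == 'STRASSE'.lower()
print(folded_match, lowered_match)
#> True False
```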
str.swapcase()
Change all uppercase letters to lowercase, and vice versa. Note that not all of the mappings are reversible, so swapping the case twice does not always return the original string. The Turkish dotless ı and the German ß show where two lowercase forms map to the same uppercase form.
    'Helsinki'.swapcase()  # 'hELSINKI'
    'Helsinki'.swapcase().swapcase()  # 'Helsinki' (original string)
    'ı'.swapcase()  # 'I'
    'ı'.swapcase().swapcase()  # 'i' (not lowercase 'ı')
    'ß'.swapcase()  # 'SS'
    'ß'.swapcase().swapcase()  # 'ss' (not 'ß')
str.strip()
Strip unwanted characters from both ends of a string. By default, this removes all whitespace at the edges of the string, but you can pass a string of characters as an argument, and strip will remove as many of those characters as it can from each end, in any order.
    >>> ' A Tale of Two Cities\n'.strip()
    'A Tale of Two Cities'
    >>> ' A Tale of Two Cities\n'.rstrip()  # strip only from the right
    ' A Tale of Two Cities'
    >>> ' A Tale of Two Cities\n'.lstrip()  # strip only from the left
    'A Tale of Two Cities\n'
    >>> '............___Dickens, Charles___..........'.strip('._')
    'Dickens, Charles'
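One thing that trips people up is that the argument to strip is treated as a set of characters, not as a literal prefix or suffix. A short sketch (the example strings are my own):

```python
# Every leading/trailing character found in 'cmowz.' is removed, in any order
stripped = 'www.example.com'.strip('cmowz.')
print(stripped)
#> example

# Stripping stops at the first character not in the set,
# so matching characters inside the string are left alone
inner = '__init__.py__'.strip('_')
print(inner)
#> init__.py
```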
str.startswith(prefix) and str.endswith(suffix)
Checking the start or end of a string is often useful (typically after strip). For example, endswith might look for a file extension while startswith might check to see if you have a test case or not.
The prefix or suffix in either case can be a tuple or a str; a tuple will check for the presence of any of its strings at the start/end.
There are similar methods str.removeprefix and str.removesuffix (added in Python 3.9), although neither of those will take a tuple (I’ve never used either method anyway).
    >>> 'test.csv'.endswith(('.csv', '.txt'))
    True
    >>> 'test.csv'.removesuffix('.csv')
    'test'
    >>> 'test.csv'.removesuffix('.txt')
    'test.csv'
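Since removesuffix will not take a tuple, one possible workaround is to combine it with endswith in a loop. This is only a sketch; remove_any_suffix is a hypothetical name, and it needs Python 3.9 or later for removesuffix.

```python
def remove_any_suffix(text, suffixes):
    """Hypothetical helper: drop the first suffix in `suffixes` that matches."""
    for suffix in suffixes:
        if text.endswith(suffix):
            return text.removesuffix(suffix)  # requires Python 3.9+
    return text  # nothing matched, so return the string unchanged


csv_stem = remove_any_suffix('test.csv', ('.txt', '.csv'))
json_name = remove_any_suffix('test.json', ('.txt', '.csv'))
print(csv_stem, json_name)
#> test test.json
```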
in
Want to know if a substring is contained in your string? This is the equivalent of Ctrl + F to find a substring somewhere on the web or in a document. Remember that you may want to normalise the case (via str.casefold) before checking.
    >>> long_text_from_website = 'It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness...'
    >>> 'doctor manette' in long_text_from_website
    False
    # character name is always capitalised
    >>> 'doctor manette' in long_text_from_website.casefold()
    True
str.is___() identities
What is your string?
str.isalnum(): all characters are alphabetic or numeric (i.e., no punctuation or whitespace)
str.isalpha(): all characters are alphabetic
str.isascii(): all of the characters are ASCII (code points from U+0000 to U+007F)
str.isdigit(): all characters are digits ('9.0' returns False; '90' returns True)
str.isspace(): all characters are whitespace characters
str.isupper(), str.islower(), str.istitle(): all cased characters are uppercase, lowercase, or title-cased, respectively
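To get a feel for how these checks behave, here is a small sketch that reports which of them a given string passes. The helper name describe is hypothetical, and str.isascii needs Python 3.7 or later.

```python
def describe(s):
    """Hypothetical helper: list which of a few is___() checks a string passes."""
    checks = ('isalnum', 'isalpha', 'isascii', 'isdigit', 'isspace')
    return [name for name in checks if getattr(s, name)()]


print(describe('90'))
#> ['isalnum', 'isascii', 'isdigit']
print(describe('9.0'))
#> ['isascii']
print(describe('Helsinki'))
#> ['isalnum', 'isalpha', 'isascii']
print(describe(' \n'))
#> ['isascii', 'isspace']
```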
str.join() and str.split()
When working with text a lot, it is very useful to be able to split and join text around some separator. These are inverse operations. A very frequent use case is removing duplicated whitespace characters: ' '.join(value.split()). Thus 'where    is' will turn into 'where is' (single-space).
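The whitespace idiom works because split() with no argument splits on any run of whitespace and drops empty strings, whereas split(' ') splits on every single space. A minimal sketch (squeeze_whitespace is a made-up name):

```python
def squeeze_whitespace(value):
    """Hypothetical helper: collapse every run of whitespace to one space."""
    return ' '.join(value.split())


messy = '  where \t is\n\nthe   library? '
clean = squeeze_whitespace(messy)
print(clean)
#> where is the library?

# With an explicit separator, empty strings are kept between repeated spaces
parts = 'where  is'.split(' ')
print(parts)
#> ['where', '', 'is']
```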
Another good use case is reading some text that has a delimiter (e.g., a Python version). While these can sometimes be handled by the csv module, using str.split is usually much easier.
    >>> version = '3.11.1'
    >>> version.split('.')
    ['3', '11', '1']
    >>> major, minor, patch = version.split('.')
    >>> major
    '3'
    >>> minor
    '11'
    >>> patch
    '1'
    >>> version == '.'.join([major, minor, patch])
    True
    >>> version == '.'.join(version.split('.'))
    True
It is worth mentioning that str.partition splits at only the first occurrence of the separator and returns a 3-tuple that keeps the separator itself. I have never used this, though it probably makes sense for parsing, e.g., property files.
    >>> '1+1=2'.partition('=')
    ('1+1', '=', '2')
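To make the property-file idea concrete, here is a rough sketch. The parse_property name and the 'key = value' line format are assumptions for illustration, not a full parser for any real format. Because partition only splits at the first '=', any later '=' stays in the value.

```python
def parse_property(line):
    """Hypothetical helper: split a 'key = value' line at the first '='."""
    key, sep, value = line.partition('=')
    if not sep:
        # partition returns an empty separator when '=' is not found
        return None
    return key.strip(), value.strip()


pair = parse_property('url = https://example.com/?q=1')
print(pair)
#> ('url', 'https://example.com/?q=1')

missing = parse_property('# just a comment')
print(missing)
#> None
```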
str.translate(table) from str.maketrans()
This method replaces or deletes individual characters in a string according to a translation table. The resulting call will look something like s.translate(str.maketrans(x, y, z)), where each character in x is replaced by the character at the same position in y, and every character in z is deleted. I will show the most useful case I’ve found, which is removing all punctuation from a string, along with some more trivial examples.
    >>> 'Hello, World.'.translate(str.maketrans({'.': '!', '!': '?'}))
    'Hello, World!'
    >>> 'Hello, World!'.translate(str.maketrans({'.': '!', '!': '?'}))
    'Hello, World?'
    >>> 'Hello, World!'.translate(str.maketrans('.!,', '!.-'))
    'Hello- World.'
    # remove punctuation
    >>> import string
    >>> 'Hello, World!'.translate(str.maketrans('', '', string.punctuation))
    'Hello World'
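When maketrans is given a dict, the values do not have to be single characters: they can be longer strings, or None to delete the character. A small sketch using German transliteration as an example (the mapping is my own and is not meant to be complete):

```python
# Map single characters to longer replacements; None deletes the character
table = str.maketrans({'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'ß': 'ss', '!': None})

ascii_text = 'Grüße aus Köln!'.translate(table)
print(ascii_text)
#> Gruesse aus Koeln
```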
str.zfill(width)
If your string is a number (actually, it can be letters, punctuation, whitespace, etc., as well) and you want it nicely formatted (e.g., with leading zeroes), zfill is handy…though format strings (discussed in part 3) will be more broadly generalizable. The number passed into zfill is the minimum number of characters in the resulting string; a string that is already that long or longer is returned unchanged.
    >>> '21'.zfill(5)
    '00021'
    >>> '-21'.zfill(8)
    '-0000021'
    # also, non-numeric following same rules
    >>> 't'.zfill(5)
    '0000t'
    >>> '+t'.zfill(8)
    '+000000t'
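One place where zero-padding pays off is sorting: strings sort character by character, so unpadded numbers come out in the wrong order. A brief sketch with made-up values:

```python
numbers = ['10', '2', '1']

# Plain string sorting compares character by character
unpadded = sorted(numbers)
print(unpadded)
#> ['1', '10', '2']

# Padding to a common width makes string order match numeric order
padded = sorted(n.zfill(3) for n in numbers)
print(padded)
#> ['001', '002', '010']

# zfill never truncates: a string longer than the width comes back as-is
too_long = '12345'.zfill(3)
print(too_long)
#> 12345
```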
Parting Thoughts
The methods for str types are very useful in string manipulation, particularly for NLP and for building reports.