A string can be thought of as a sequence of characters. Strings are a very common and feature-rich type.
We have already touched upon strings in the section Basic Syntax - Variables and Basic Types.
Computers store data in binary using electronic components, so characters must be mapped to binary values. This mapping is called encoding.
For example, the English letter A corresponds to the binary value 01000001 (decimal 65) in ASCII encoding.
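The mapping above can be checked interactively. The following is a minimal sketch using the built-in functions ord, chr, and format; the variable names are only illustrative. Note that ord returns a Unicode code point, which coincides with the ASCII value for characters such as A.

```python
# Look up the numeric value of a character and show it as 8 binary digits.
code = ord('A')
bits = format(code, '08b')
print(code, bits)    # 65 01000001

# chr() goes the other way: from a numeric value back to a character.
letter = chr(65)
print(letter)        # A
```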
The smallest addressable unit of storage in a computer is typically a byte, which has 8 bits.
In Python, there is a type called bytes that stores a sequence of bytes.
A bytes literal looks similar to a string but is prefixed with a b, such as b'hello world'.
Converting a string to bytes is called encoding, and converting bytes back to a string is called decoding:
data: bytes = b'hello world'
print(data)
text: str = data.decode() # decode
print(text)
print(text.encode()) # encode
Bytes may look like strings with a b prefix, but there are key differences:
text: str = '你好世界'
print(text)
data: bytes = text.encode() # encode
print(data)
print(len(text), len(data)) # lengths differ
# text[1] is the whole Chinese character '好', while data[1] is an int:
# the second byte of the UTF-8 encoding of '你'
print(text[1], data[1])
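The difference between characters and bytes becomes clearer when the encoding is named explicitly. The sketch below assumes the utf-8 and gbk codecs, both of which ship with standard CPython; the variable names are illustrative only.

```python
text = '你好世界'

# The same text has different byte lengths under different encodings.
utf8_data = text.encode('utf-8')   # 3 bytes per character for this text
gbk_data = text.encode('gbk')      # 2 bytes per character for this text
print(len(text), len(utf8_data), len(gbk_data))  # 4 12 8

# Indexing bytes yields an int; slicing yields bytes.
first_byte = utf8_data[0]
first_char_bytes = utf8_data[:3]
print(first_byte, first_char_bytes.decode('utf-8'))  # 228 你

# Decoding a sequence that was cut in the middle of a character fails.
try:
    utf8_data[:2].decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)  # True
```

Calling encode() and decode() without arguments uses UTF-8, which is why the earlier examples work without naming a codec.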
In programming, you often need to create strings based on variables.
You can use the % operator for formatting, with the syntax:
"format string" % (value1, value2, ...) # values are in a tuple
If there is only one value, you can omit the tuple:
"format string" % value
For example:
print("Pork price is %d yuan per jin" % 15)
print("%d jin of pork costs %d yuan" % (3, 3*15))
Here %d is a decimal integer placeholder that will be replaced by the corresponding value in decimal form. Common placeholders include:
%%: literal percent sign
%d: decimal integer
%o: octal integer
%x: hexadecimal integer (lowercase)
%X: hexadecimal integer (uppercase)
%f: floating-point number
%s: string
This style is less common nowadays; see more details at printf-style String Formatting.
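As a quick illustration of the placeholders listed above, here is a small sketch. The precision form %.2f and the sample values are additions for illustration and are not taken from the text above.

```python
value = 255

# The same integer in decimal, octal, and hexadecimal (lower and upper case).
ints = "%d %o %x %X" % (value, value, value, value)
print(ints)            # 255 377 ff FF

# %f prints six decimal places by default; a precision such as .2 limits them.
default_float = "%f" % 3.14159
two_places = "%.2f" % 3.14159
print(default_float, two_places)   # 3.141590 3.14

# %s inserts a string, and %% produces a literal percent sign.
percent = "%s scored %d%%" % ("Tom", 88)
print(percent)         # Tom scored 88%

# When the single value is itself a tuple, wrap it in a one-element tuple.
wrapped = "%s" % ((1, 2),)
print(wrapped)         # (1, 2)
```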
The format method is more flexible than %. It uses curly braces {} as placeholders and supports formatting within the braces, for example:
print("Name: {}, Age: {}".format("Jerry", 18)) # positional replacement
print("Name: {1}, Age: {0}".format(19, "Tom")) # positional index
print("Name: {name}, Age: {age}".format(name="Tuffy", age=8))  # named replacement
You can specify width:
# Print multiplication table
for x in range(1, 10):
    for y in range(1, 10):
        print(' {:2} '.format(x * y), end='')  # min width 2 chars
    print('')
Width can be a variable:
print("'{:{width}}'".format('txt', width=7))
You can specify alignment:
print("'{:<5}'".format('txt')) # left-align with width 5
print("'{:>5}'".format('txt')) # right-align with width 5
print("'{:^5}'".format('txt')) # center with width 5
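Width and alignment are often combined to line up columns. The following sketch uses made-up rows and also shows a fill character placed before the alignment symbol, which is part of the same format specification; the names rows, lines, and padded are illustrative only.

```python
rows = [('Tom', 88), ('Jerry', 99), ('Spike', 66)]

# Left-align names in 8 columns, right-align scores in 5 columns.
lines = []
for name, score in rows:
    lines.append("{:<8}|{:>5}".format(name, score))
print('\n'.join(lines))

# A fill character may precede the alignment symbol.
padded = "{:*^9}".format('txt')
print(padded)  # ***txt***

# Without an explicit alignment, strings go left and numbers go right.
default_text = "{:5}".format('txt')
default_number = "{:5}".format(42)
print(repr(default_text), repr(default_number))
```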
You can use indexing for dicts:
score_list: dict[str, int] = {
    'Tom': 88,
    'Jerry': 99,
    'Spike': 66,
}
print("Scores: Tom:{0[Tom]} Jerry:{0[Jerry]} Spike:{0[Spike]}".format(score_list))
To keep n decimal places, you can use the built-in format function:
print("Approximate value of pi is {}".format(format(3.1415926, '.2f'))) # two decimal places
See Format String Syntax.
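The same precision can also be written directly inside the braces, without the extra call to the format function. This is a minimal sketch; the variable names are illustrative, and the nested {n} field follows the same pattern as the variable width shown earlier.

```python
pi = 3.1415926

# Precision inside the placeholder: two decimal places.
two_places = "Approximate value of pi is {:.2f}".format(pi)
print(two_places)     # Approximate value of pi is 3.14

# The number of decimal places can itself come from an argument.
four_places = "{:.{n}f}".format(pi, n=4)
print(four_places)    # 3.1416

# Width and precision together: 8 columns wide, three decimal places.
wide = format(pi, '8.3f')
print(repr(wide))
```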
Formatted string literals (f-strings) use the syntax f'xxxx' or f"xxxx". Expressions inside {} are evaluated and their results are inserted into the string:
score_list: dict[str, int] = {
    'Tom': 88,
    'Jerry': 99,
    'Spike': 66,
}
print(f"Scores: Tom:{score_list['Tom']} Jerry:{score_list['Jerry']} Spike:{score_list['Spike']}")
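Formatted string literals accept the same width, alignment, and precision options as the format method, written after a colon inside the braces. The sketch below uses illustrative values; the % presentation type, which multiplies by 100 and appends a percent sign, is an addition not covered in the text above.

```python
name = 'Tom'
score = 88
total = 120

# Any expression can appear inside the braces.
product = f"{3} jin of pork costs {3 * 15} yuan"
print(product)   # 3 jin of pork costs 45 yuan

# Format specifications follow a colon, as with str.format().
msg = f"{name:<6}{score:>4} {score / total:.1%}"
print(msg)
```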
Raw string literals are written as r'xxxx' or r"xxxx". Escape sequences in them are not processed, so \n is treated as two characters, \ and n:
print(r'hello \n world')
Raw strings are useful for regular expressions or other scenarios requiring many backslashes.
Regular expressions will be covered later.
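To see what skipping escape processing means in practice, the lengths of a normal and a raw literal can be compared. This is a small sketch; the Windows-style path is a hypothetical example, not a real file.

```python
normal = 'hello \n world'   # \n is one newline character
raw = r'hello \n world'     # \n is a backslash followed by the letter n
print(len(normal), len(raw))  # 13 14

# A raw string equals a normal string with the backslash doubled.
same = raw == 'hello \\n world'
print(same)  # True

# Hypothetical path: without the r prefix, \n and \t would become
# a newline and a tab.
path = r'C:\new\table.txt'
print(path)
```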
Multiline strings use triple quotes (''' or """), for example:
print('''
## Multiline strings
Multiline strings use triple quotes (`'''` or `"""`).
''')
Multiline strings are also commonly used as multiline comments:
'''
Not assigned to a variable, so its value is discarded
and the string acts as a comment.
'''
print("hello world")
Multiline strings can also be combined with the prefixes b, f, or r for bytes, formatted strings, or raw strings.
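A short sketch of each prefix combined with triple quotes follows. The variable names and sample text are illustrative only.

```python
name = 'Tom'

# f prefix: expressions are evaluated, and line breaks are kept.
formatted = f'''Name: {name}
Score: {88 + 1}'''
print(formatted)

# r prefix: the backslash stays literal, but the real line break is kept.
raw = r'''first \n
second'''
print(raw)

# b prefix: the result is a bytes object spanning two lines.
data = b'''line one
line two'''
print(data)
```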
Created on 5/15/2025
Updated on 5/21/2025