Data Types in Python#

data type	description	composite	mutable
int	integer numbers	no	no
float	floating-point numbers	no	no
str	text (sequence of characters)	no	no
bool	`True` or `False`	no	no
list	sequence of items	yes	yes
tuple	immutable sequence	yes	no
dict	lookup table	yes	yes
set	collection of unique items	yes	yes
NoneType	just nothing	no	no

Basic and composite data types#

Basic means that a data type does not contain any other types. Composite means that a data type contains other types.

Immutable and mutable data types#

Python has basic and composite data types. Values of basic data types cannot be changed; they are immutable. Most composite data types are mutable; the tuple is the exception.

The immutable data types in Python are:

Boolean (True / False)
Integer (0, 1, -3)
Float (1.0, -0.3, 1.2345)
Strings ('apple', "banana") - both single and double quotes are valid
None (represents the absence of a value)
Tuples (multiple values in parentheses, e.g. ('Jack', 'Smith', 1990))

The mutable data types are:

List [1, 2, 2, 3]
Dictionary {'name': 'John Smith', 'year': 1990}
Set {1, 2, 3}
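
The difference between mutable and immutable types can be sketched in a few lines. This is a minimal illustration; the variable names are made up for the example.

```python
# A list is mutable: it can be changed in place.
numbers = [1, 2, 2, 3]
numbers[0] = 99
print(numbers)          # [99, 2, 2, 3]

# A tuple is immutable: item assignment raises a TypeError.
person = ('Jack', 'Smith', 1990)
try:
    person[2] = 1991
    tuple_changed = True
except TypeError:
    tuple_changed = False
print(tuple_changed)    # False

# Strings are immutable too: upper() returns a new string
# and leaves the original untouched.
fruit = 'apple'
shouted = fruit.upper()
print(fruit, shouted)   # apple APPLE
```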

Numbers#

Integer numbers#

Numerical values without decimal places are called integers or ints. In Python, integers are a predefined data type.

  a = 42
  a

Floating-point numbers#

Numbers with decimal places are called floating-point numbers or floats.

b = 42.0
b

42.0

pi = 3.14159
pi

3.14159

Arithmetical Operators#

Symbols such as + - * / that connect two numbers are called operators. In addition to the standard arithmetic operators, a few others are available:

a = 7
b = 4

c = a - b      # subtraction, 3
c

d = a * b      # multiplication, 28
d

e = a / b      # division, 1.75
e

1.75

f = a % b      # modulo, 3
f

g = a ** 2     # 49  
g

h = 7.0 // b   # floor division, 1.0
h

1.0

If you perform arithmetic with integers, the result is also an integer. If one of the numbers is a float, the result will also be a float. The exception is division with /, which always returns a floating-point number, even for two integers. Floor division with // returns an integer when both operands are integers.
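
These rules can be checked with type(). The snippet below is a small sketch that reuses the values a = 7 and b = 4 from above.

```python
a = 7
b = 4

print(type(a + b))    # <class 'int'>: int with int gives int
print(type(a * 1.0))  # <class 'float'>: one float makes the result a float
print(8 / 4)          # 2.0: / returns a float even for a whole result
print(a // b)         # 1: floor division of two ints stays an int
print(7.0 // b)       # 1.0: with a float operand the result is a float
```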

Rounding and binary representation#

Occasionally, seemingly simple floating-point calculations will result in strange results, e.g. instead of 0.3 you might see:

    
    0.1 + 0.2

0.30000000000000004

This is related to the underlying binary representation of floating-point numbers. Python floats carry roughly 15 to 17 significant decimal digits, which is enough for most applications (but possibly not if you are doing astrophysics or other high-precision calculations).

(This happens in every programming language that uses binary floating-point numbers with limited precision, although some round the displayed value automatically.)
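
One practical consequence is that floats should not be compared with ==. A minimal sketch of two common workarounds, using math.isclose() from the standard library and the built-in round():

```python
import math

total = 0.1 + 0.2

print(total == 0.3)              # False: the stored values differ slightly
print(math.isclose(total, 0.3))  # True: compares within a small tolerance
print(round(total, 2))           # 0.3: rounding for display or comparison
```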

Strings#

Text values are called strings. In Python, strings are defined by single quotes, double quotes, triple-single or triple-double-quotes:

    first = 'Emily'
    first

'Emily'

    first = "Emily"
    first

'Emily'

    first = '''Emily'''
    first

'Emily'

    first = """Emily"""
    first

'Emily'

last = 'Lastname String'

Special characters#

Some characters in Python require special attention:

character	meaning
`\n`	newline
`\t`	tab
`\\`	a single literal backslash

Additionally, Python 3 strings are Unicode by default, so characters such as German umlauts or Chinese and Arabic script can be used directly. However, they may not be displayed in the same way in every environment, so take some care when using them.
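
The escape sequences from the table can be tried out as follows. This is a small sketch; the example strings are invented for illustration.

```python
# \t inserts a tab, \n starts a new line
text = 'name\tage\nEmily\t23'
print(text)

# Each escape sequence counts as one character
print(len('\n'))   # 1
print(len('\\'))   # 1

# A doubled backslash produces one backslash in the string
path = 'C:\\temp'
print(path)        # C:\temp
```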

String concatenation#

The + operator also works for strings, where it concatenates them. It does not matter whether the strings are given as variables or as literal values.

With

first = 'Emily'

and

last = 'Smith'

the following three statements have the same result:

    name = first + last
    name

'EmilySmith'

    
    name = first + "Smith"
    name

'EmilySmith'

    name = "Emily" + "Smith"
    name

'EmilySmith'

Accessing single characters#

Using square brackets, any character of a string can be accessed. This is called indexing. The first character has the index [0], the second [1] and the fourth has the index [3].

    name[0]

'E'

    name[3]

'l'

With negative numbers, you can access single characters from the end, the index [-1] being the last, [-2] the second last character and so on:

    
    name[-1]

'h'

    name[-2]

't'

Note that none of these modify the contents of the string variable.

Creating substrings#

Substrings can be formed with square brackets containing two indices separated by a colon (a slice). The character at the second index is not included in the substring.

    
    name = 'Emily Smith'

    name[0:5]

'Emily'

    name[1:4]

'mil'

    name[6:11]

'Smith'

    name[:3]

'Emi'

    name[-4:]

'mith'

String methods#

Every string in Python comes with a set of functions for working with it. Because these functions belong to the string object, they are called methods. You call one by appending a dot and the method name to the string variable.

Below are a few of the available methods:

Changing case#

    name = 'Manipulating Strings \n'

    name.upper()

'MANIPULATING STRINGS \n'

    name.lower()

'manipulating strings \n'

Removing whitespace at both ends#

    name.strip()

'Manipulating Strings'

Cutting a string into columns#

    
    name.split(' ')

['Manipulating', 'Strings', '\n']

Searching for substrings#

    name.find('ing')

9

The method returns the start index of the match. The result -1 means that no match has been found.
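
A short sketch of both cases. Note that -1 counts as true in an if statement, so the result should be compared with -1 explicitly. If you only need to know whether a substring occurs, the in operator is simpler.

```python
name = 'Manipulating Strings \n'

position = name.find('ing')   # first match starts at index 9
missing = name.find('xyz')    # no match

print(position)   # 9
print(missing)    # -1

if missing == -1:
    print('not found')

# Membership test without an index
print('ing' in name)   # True
```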

Replacing substrings#

    name.replace('Strings','text')
    

'Manipulating text \n'

Checking beginning and end of a string#

Both of the following methods return a boolean:

    name.startswith('Man')

True

    name.endswith('ings')

False

Tuples#

A tuple is a sequence of elements that cannot be modified. Tuples are useful for grouping elements of different types.

        person = ('Emily', 'Smith', 23)

In contrast to lists, tuples can also be used as keys in dictionaries.
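
As a sketch of a tuple used as a dictionary key, the hypothetical table below maps a (first name, last name) pair to an age. The names and values are invented for the example.

```python
# Hypothetical lookup table keyed by (first, last) tuples
ages = {
    ('Emily', 'Smith'): 23,
    ('John', 'Miller'): 31,
}

print(ages[('Emily', 'Smith')])     # 23

# The key can also be built from separate variables
first, last = 'John', 'Miller'
print(ages[(first, last)])          # 31
```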

Indexing tuples#

Elements of tuples can be indexed in the same way as lists:

    person[0]

'Emily'

    person[-2]

'Smith'

    person[1:]

('Smith', 23)

Iterating over tuples#

You can run a for loop over a tuple:

for elem in person:
    print(elem)

Emily
Smith
23

Packing and unpacking tuples#

Listing multiple values separated by commas implicitly creates a tuple:

        person = 'Emily', 'Smith', 23

Tuples can be unpacked to multiple variables:

    first, last, age = person

person

('Emily', 'Smith', 23)

It is even possible to swap the value of variables that way:

    first, last = last, first

person

('Emily', 'Smith', 23)
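
The effect of unpacking and swapping is easier to see by printing the individual variables. A minimal sketch:

```python
person = ('Emily', 'Smith', 23)

# Unpacking: one variable per tuple element
first, last, age = person
print(first, last, age)   # Emily Smith 23

# Swapping: the right-hand side is packed into a tuple first,
# then unpacked into the variables on the left
first, last = last, first
print(first, last)        # Smith Emily

# The original tuple is not affected
print(person)             # ('Emily', 'Smith', 23)
```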

Lists#

A list is a Python data type representing a sequence of elements. You can have lists of strings:

    names = ['Hannah', 'Emily', 'Madison', 'Ashley', 'Sarah']
    names

['Hannah', 'Emily', 'Madison', 'Ashley', 'Sarah']

and also lists of numbers:

    numbers = [25952, 23073, 19967, 17994, 17687]
    numbers

[25952, 23073, 19967, 17994, 17687]

Accessing elements of lists#

Using square brackets, any element of a list or tuple can be accessed. The first element has the index 0.

print(names[0])    
print(numbers[3])

Hannah
17994

Negative indices count from the last element.

   print(names[-1])

Sarah

Creating lists from other lists:#

Lists can be sliced by applying square brackets in the same way as strings.

names = ['Hannah', 'Emily', 'Sarah', 'Maria', 'Maikel']
names[1:3]

['Emily', 'Sarah']

names[0:2]      

['Hannah', 'Emily']

names[:3]

['Hannah', 'Emily', 'Sarah']

names[-2:]

['Maria', 'Maikel']

Copying a list#

You can use slicing to create a copy:

    girls = names[:]
    girls

['Hannah', 'Emily', 'Sarah', 'Maria', 'Maikel']
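
A slice copy is independent of the original list, whereas a plain assignment only creates a second name for the same list. The sketch below illustrates the difference; the names alias, 'Zoe' and 'Anna' are made up for the example.

```python
names = ['Hannah', 'Emily', 'Sarah', 'Maria', 'Maikel']

alias = names      # second name for the same list object
girls = names[:]   # new list with the same elements

girls.append('Zoe')    # changes only the copy
alias.append('Anna')   # changes the original as well

print(names)   # ['Hannah', 'Emily', 'Sarah', 'Maria', 'Maikel', 'Anna']
print(girls)   # ['Hannah', 'Emily', 'Sarah', 'Maria', 'Maikel', 'Zoe']
```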

Adding elements to a list#

Add a new element to the end of the list:

    names.append('Marilyn')
    names

['Hannah', 'Emily', 'Sarah', 'Maria', 'Maikel', 'Marilyn']

Removing elements from a list#

Remove the first occurrence of a given value:

names.remove('Sarah')

names

['Hannah', 'Emily', 'Maria', 'Maikel', 'Marilyn']

Remove the last element:

   names.pop()

'Marilyn'
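
For comparison, here is a short sketch of removing by value and by index. pop() accepts an optional index, and the del statement removes an element without returning it.

```python
names = ['Hannah', 'Emily', 'Sarah', 'Maria', 'Maikel']

names.remove('Sarah')    # by value: removes the first 'Sarah'
second = names.pop(1)    # by index: removes and returns 'Emily'
del names[0]             # by index: removes 'Hannah', returns nothing

print(second)   # Emily
print(names)    # ['Maria', 'Maikel']
```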

Replacing elements of a list#

You can replace individual elements of a list by using an index in an assignment operation:

names = ['Hannah', 'Emily', 'Sarah', 'Maria', 'Maikel']
print(names)

['Hannah', 'Emily', 'Sarah', 'Maria', 'Maikel']

names[4] = 'Fiona'
print(names)

['Hannah', 'Emily', 'Sarah', 'Maria', 'Fiona']

Sorting a list#

    names.sort()

The itemgetter function from the operator module allows you to sort lists by a specific column. For example, to sort the names by their third character:

from operator import itemgetter
names.sort(key=itemgetter(2))
print(names)

['Emily', 'Hannah', 'Fiona', 'Maria', 'Sarah']
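
Sorting by column is most useful for lists of tuples. The sketch below uses an invented list of (first name, last name, age) records and the built-in sorted(), which returns a new list instead of sorting in place.

```python
from operator import itemgetter

# Hypothetical records: (first name, last name, age)
people = [
    ('Emily', 'Smith', 23),
    ('Adam', 'Jones', 31),
    ('Zoe', 'Brown', 19),
]

by_age = sorted(people, key=itemgetter(2))    # sort by third column
by_last = sorted(people, key=itemgetter(1))   # sort by second column

print(by_age)    # [('Zoe', 'Brown', 19), ('Emily', 'Smith', 23), ('Adam', 'Jones', 31)]
print(by_last)   # [('Zoe', 'Brown', 19), ('Adam', 'Jones', 31), ('Emily', 'Smith', 23)]
```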

Counting elements#

    names = ['Hannah', 'Emily', 'Sarah', 'Emily', 'Maria']
    names.count('Emily')

2

List comprehension#

A powerful construct in modern languages, including Python, is list comprehension. It is a way to succinctly build lists from other lists. It can be very useful, but should be applied sensibly as it can sometimes be difficult to read. There is an optional extension exercise at the end of this notebook that uses list comprehension.

Say we have a list of numbers and we wish to create a new list that squares each number in the original list and adds 5. Using list comprehension:

x = [4, 6, 10, 11]
y = [a*a + 5 for a in x]

print(x)
print(y)

[4, 6, 10, 11]
[21, 41, 105, 126]

To understand the meaning, read the statement left-to-right.
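
It may help to compare the comprehension with an equivalent for loop. This is a sketch; the name y_loop is introduced only for the comparison.

```python
x = [4, 6, 10, 11]

# Explicit loop: start with an empty list and append each result
y_loop = []
for a in x:
    y_loop.append(a*a + 5)

# The same computation as a list comprehension
y = [a*a + 5 for a in x]

print(y_loop)        # [21, 41, 105, 126]
print(y_loop == y)   # True
```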

As another example, say we have a list of names and we want to build a new list that contains only the names with more than 5 characters, with a full stop added to the end of each.

Using list comprehension:

lab_group1 = ["Roger", "Rachel", "Amer", "Caroline", "Colin"]
print(lab_group1)

group = [name + "." for name in lab_group1 if len(name) > 5]
print(group)

['Roger', 'Rachel', 'Amer', 'Caroline', 'Colin']
['Rachel.', 'Caroline.']

Dictionaries#

Dictionaries are associative arrays that consist of key/value pairs. Since Python 3.7 they preserve the order in which keys were inserted. They are very versatile data structures, but more difficult to use than lists if you are new to Python. As the name implies, dictionaries are good for looking things up, or for searching in general.

Creating dictionaries#

A Python dictionary (dict) is created using curly braces. On the left side of each entry is the key, on the right side the value:

    ratios = {
        'Alice': 0.75,
        'Bob': 0.55,
        'Charlie': 0.80
        }

room_allocation = {"Adrian": None, "Laura": 32, "John": 31, "Penelope": 28, "Fraser": 28, "Gaurav": 19}
print(room_allocation)
print(type(room_allocation))

{'Adrian': None, 'Laura': 32, 'John': 31, 'Penelope': 28, 'Fraser': 28, 'Gaurav': 19}
<class 'dict'>

Entries are separated by commas. Each entry consists of a key, followed by a colon, and then the value. Note that for Adrian we have used None as the value, which is Python's built-in constant for 'nothing' or 'empty'.

Now if we want to know which room Fraser has been allocated, we can query the dictionary by key:

frasers_room = room_allocation["Fraser"]
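print(frasers_room)

28

The dictionary can also be inverted so that room numbers map to names. Because keys are unique, a room shared by several people keeps only the last name inserted (here room 28 ends up with Fraser, and Penelope is lost):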

# Create empty dictionary
room_allocation_inverse = {}

# Build inverse dictionary to map 'room number' -> name 
for name, room_number in room_allocation.items():
    # Insert entry into dictionary
    room_allocation_inverse[room_number] = name

print(room_allocation_inverse)

{None: 'Adrian', 32: 'Laura', 31: 'John', 28: 'Fraser', 19: 'Gaurav'}
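
In the output above, room 28 lists only Fraser although Penelope has the same room, because a key can hold just one value. One possible way to keep everyone, sketched here with setdefault(), is to store a list of names per room. The name rooms is an assumption made for this example.

```python
room_allocation = {"Adrian": None, "Laura": 32, "John": 31,
                   "Penelope": 28, "Fraser": 28, "Gaurav": 19}

# Map 'room number' -> list of names
rooms = {}
for name, room_number in room_allocation.items():
    # Create an empty list the first time a room is seen, then append
    rooms.setdefault(room_number, []).append(name)

print(rooms[28])   # ['Penelope', 'Fraser']
print(rooms)
```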

Accessing elements in dictionaries#

By using square brackets and a key, you can retrieve a value from a dictionary, provided the key is present. A missing key raises a KeyError:

ratios['Alice']    # 0.75
ratios['Ewing']    # KeyError!

---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)

Cell In[144], line 2
      1 ratios['Alice']    # 0.75
----> 2 ratios['Ewing']    # KeyError!


KeyError: 'Ewing'

Retrieving values in a fail-safe way:#

With the get() method you can supply a fallback value that is returned if the key is not found. Without a fallback, get() returns None for a missing key.

ratios.get('Alice')
ratios.get('Ewing', 'sorry not found')

'sorry not found'
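
A typical use of get() is counting. The sketch below counts how often each name occurs, using 0 as the fallback for names not seen before. The list is the one from the section on counting elements.

```python
names = ['Hannah', 'Emily', 'Sarah', 'Emily', 'Maria']

counts = {}
for name in names:
    # get() returns 0 the first time a name appears
    counts[name] = counts.get(name, 0) + 1

print(counts)                # {'Hannah': 1, 'Emily': 2, 'Sarah': 1, 'Maria': 1}
print(counts.get('Ewing'))   # None: no fallback given
```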

Changing values in a dictionary#

The contents of a dictionary can be modified. For instance if you start with an empty dictionary:

    persons = {}

Now you can add values one key/value pair at a time:

    persons['Emily'] = 1977

Setting values only if they don't exist yet:#

persons.setdefault('Alice', 1980)
persons.setdefault('Emily', 1898)
# for 'Emily', nothing happens
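
setdefault() also returns the value that is stored under the key afterwards, which makes the behaviour easy to check. A minimal sketch (the names alice_year and emily_year are illustrative):

```python
persons = {'Emily': 1977}

alice_year = persons.setdefault('Alice', 1980)   # key is new: 1980 is stored
emily_year = persons.setdefault('Emily', 1898)   # key exists: 1977 is kept

print(alice_year)   # 1980
print(emily_year)   # 1977
print(persons)      # {'Emily': 1977, 'Alice': 1980}
```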

Getting all keys or values:#

    ratios.keys()
    ratios.values()
    ratios.items()

dict_items([('Alice', 0.75), ('Bob', 0.55), ('Charlie', 0.8)])

Checking whether a key exists#

The in operator checks whether a key exists in the dictionary.

    if 'Bob' in ratios:
        print('found it')

found it

Note that in works the same way with a list. For large collections, the dictionary lookup is much faster, because a list has to be searched element by element.

Loops over a dictionary#

You can access the keys of a dictionary in a for loop.

for name in ratios:
    print(name)

Alice
Bob
Charlie

Keys are visited in insertion order (guaranteed since Python 3.7). To iterate in alphabetical order, sort the keys explicitly:

    for name in sorted(ratios):
        print(name)    

Alice
Bob
Charlie
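
In Python 3.7 and later, a dictionary keeps its keys in insertion order, so the effect of sorted() only becomes visible when the keys were not inserted alphabetically. The sketch below uses a hypothetical scores dictionary and also shows sorting the keys by their values.

```python
scores = {}
scores['Charlie'] = 0.80
scores['Alice'] = 0.75
scores['Bob'] = 0.55

print(list(scores))     # ['Charlie', 'Alice', 'Bob'] (insertion order)
print(sorted(scores))   # ['Alice', 'Bob', 'Charlie'] (alphabetical)

# Keys ordered by their value, highest first
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)          # ['Charlie', 'Alice', 'Bob']
```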

Create dictionaries from lists#

k = ['Toy', 'Game', 'Tiger']
v = ['Toys', 'Games', 'Tigers']

# create a dict from a list of keys and a list of values
dc = dict(zip(k, v))
print(dc)
# {'Toy': 'Toys', 'Game': 'Games', 'Tiger': 'Tigers'}

# create a dict with the list positions as keys
d = dict(enumerate(v))
print(d)
# {0: 'Toys', 1: 'Games', 2: 'Tigers'}

{'Toy': 'Toys', 'Game': 'Games', 'Tiger': 'Tigers'}
{0: 'Toys', 1: 'Games', 2: 'Tigers'}

What data can I use as keys?#

Valid types for keys are:

integers
floats
strings
tuples
booleans

You may mix keys of different types in one dictionary. However, mutable data types such as lists and other dictionaries are not allowed as keys.

The reason is that dictionaries use a hash function to locate keys internally, and the hash of a key must not change while it is stored. The hash function is what allows values to be looked up very quickly.
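
The built-in hash() function shows which objects qualify as keys. The following is a small sketch; the variable names and example values are made up.

```python
# Immutable objects have a hash value (an integer)
print(isinstance(hash(('Emily', 23)), int))   # True

# Lists are mutable and cannot be hashed
try:
    hash(['Emily', 23])
    list_hashable = True
except TypeError:
    list_hashable = False
print(list_hashable)   # False

# Keys of different types in one dictionary
mixed = {42: 'int', 3.14: 'float', 'pi': 'str', (1, 2): 'tuple', True: 'bool'}
print(len(mixed))      # 5
print(mixed[(1, 2)])   # tuple
```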