Python is a scripting language that runs on an interpreter. Under the hood, Python scripts are compiled into byte code and run by the Python Virtual Machine (PVM) - the runtime engine that loops through byte code instructions individually.
Python supports procedural, OOP and functional programming paradigms with classes, lambdas, generators, decorators, first-class objects and comprehension expressions.
Everything in Python is an object, and every object can be classified as mutable or immutable. Numbers, strings and tuples are immutable, while lists, dictionaries and sets are not. Because variables and containers hold references to objects rather than the objects themselves (similar to pointers in C), changing a mutable object in place is visible through every reference to it, including references held inside other objects.
Different mutable types have different built-in ways to make copies. For example, slicing a list yields a new copy, and dictionaries have a copy() method. All the copies made in this way are shallow, meaning that only the top-level references are copied. To make a deep copy of a nested object recursively, use deepcopy() from the copy module.
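The difference between a shared reference, a shallow copy and a deep copy can be sketched as follows. The variable names are made up for illustration.

```python
import copy

original = [[1, 2], [3, 4]]

alias = original                 # same object, no copy is made
shallow = original[:]            # new outer list, inner lists are shared
deep = copy.deepcopy(original)   # fully independent copy

original[0].append(99)           # in-place change to a nested list
original.append([5, 6])          # in-place change to the outer list

print(alias)    # [[1, 2, 99], [3, 4], [5, 6]]
print(shallow)  # [[1, 2, 99], [3, 4]]
print(deep)     # [[1, 2], [3, 4]]
```

The shallow copy is protected from changes to the outer list but still sees changes to the nested lists, because it only copied the top-level references.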
Python is dynamically and strongly typed. Python does not require type declarations (type hints are optional and are not enforced at runtime). Instead, the syntax of code expressions determines the types of objects. For example, square brackets represent lists, and curly braces define dictionaries. A variable is created when you assign it a value, and a variable must be assigned a value before you can use it.
Dynamic Typing in Python
In Python, the type information is associated with objects, not variables. Each Python object contains a header field that tags the object with a type (implemented as a pointer to the type object). For example, the expression a = 'Python' creates a ‘typeless’ variable a that points to an object. The object has a type tag pointing to the Python object str, which is the type of the object.
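A small sketch of this idea: the same name can be rebound to objects of different types, and the type always comes from the object.

```python
a = 'Python'
first_type = type(a)     # the type tag lives on the string object
a = 42                   # rebinding the name to a different object
second_type = type(a)    # now the name refers to an int object

print(first_type.__name__, second_type.__name__)  # str int
```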
Garbage Collection & References
Certain frequently used immutable objects, such as the number 1 or the string 'a', may not be reclaimed by the garbage collector immediately after they lose all the references from user-defined variables (that is, when the reference count would otherwise drop to 0). The Python runtime may cache them, keep the memory around, and reuse the objects later. In addition, the Python runtime itself often references certain objects and keeps them in memory. For example, the number 1 will usually have a larger reference count than one might expect.
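The effect can be observed with sys.getrefcount. This is a sketch that assumes CPython: the exact numbers are implementation details and vary between versions, so no specific output is shown. The Marker class is a hypothetical throwaway used only for comparison.

```python
import sys


class Marker:
    """Hypothetical throwaway class used only for this illustration."""


m = Marker()

# A freshly created object is referenced by very little code.
own_count = sys.getrefcount(m)

# The integer 1 is shared by the runtime itself, so its count is much higher.
shared_count = sys.getrefcount(1)

print(own_count, shared_count)
```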
Built-in Data Types
In Python, types in the same category share the same set of operations. For example, all sequence types, such as strings, lists and tuples, support indexing, slicing and concatenation operations. Strings are also immutable and, like other immutable types (frozen sets and tuples), do not support in-place changes.
The three main data type categories are:
- Numbers (integer, floating point, decimal, fraction…) support addition and multiplication
- Sequences (strings, lists and tuples) support iteration, indexing, slicing, and concatenation
  - Slice assignment (available on mutable sequences such as lists) can be considered a combination of two steps: deletion and then insertion of an entire section of the original object. The deleted and inserted sections do not have to be the same length; hence, it can be used to replace, expand or shrink the object at hand.
- Mappings (dictionaries) support indexing by key
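The slice assignment described above for sequences can be sketched on a list like this (the values are arbitrary):

```python
items = [1, 2, 3, 4, 5]

items[1:3] = ['a', 'b']        # replace: same number of items
print(items)                   # [1, 'a', 'b', 4, 5]

items[1:3] = ['x', 'y', 'z']   # expand: two items replaced by three
print(items)                   # [1, 'x', 'y', 'z', 4, 5]

items[1:4] = []                # shrink: three items replaced by none
print(items)                   # [1, 4, 5]
```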
Boolean
Although written with the dedicated notation True and False and printed as the words True and False, the Boolean type is internally a subtype of int, with the values 1 and 0 respectively. You can even write semantically illogical code like True + 4, which evaluates to 5.
All objects in Python have an intrinsic true or false characteristic. For example, 0 is false, but other numbers are true; the empty string '' is false, but the string 'Python' is true. This is similar to truthy and falsy values in JavaScript.
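Both points, Booleans behaving as integers and every object having a truth value, can be checked directly. The sample values are chosen only for illustration.

```python
print(isinstance(True, int))   # True
print(True + 4)                # 5
print(True == 1, False == 0)   # True True

# bool() reveals the truth value of any object.
samples = [0, 7, '', 'Python', [], [0], None]
truthiness = [bool(value) for value in samples]
print(truthiness)  # [False, True, False, True, False, True, False]
```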
Numbers
Python supports a comprehensive range of number types, including integers, floating-point numbers, complex numbers, decimals (the Decimal type) and rationals (the Fraction type). Sets are sometimes discussed alongside numbers because they support mathematical set operations, although they are collections rather than numbers. The Python ecosystem also offers a wealth of libraries and packages for advanced mathematical and scientific computation, such as matrix and vector processing and sophisticated graphics and plotting.
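A brief sketch of how these number types differ in practice, using only the standard library:

```python
from decimal import Decimal
from fractions import Fraction

# Binary floating point cannot represent 0.1 exactly.
print(0.1 + 0.2 == 0.3)                                    # False

# Decimal keeps exact decimal digits when built from strings.
print(Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))   # True

# Fraction keeps exact rational values.
print(Fraction(1, 3) + Fraction(1, 6))                     # 1/2

# Complex numbers and arbitrarily large integers are built in.
print((1 + 2j) * (3 - 1j))                                 # (5+5j)
print(2 ** 100)                                            # 1267650600228229401496703205376
```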
String
The String type in Python can be seen as a sequence of one-character strings. Like all other sequence types, it supports positional ordering operations such as len() and the indexing expression s[i]. The String type has a rich set of type-specific methods such as splitlines, encode, endswith and isalpha. The full list can be viewed with dir(str). The String type also supports the + (concatenation) and * (repeat) operations through operator overloading (a form of polymorphism, meaning the same operators behave differently depending on the objects being operated on).
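The operations and methods mentioned above can be tried on a short string:

```python
s = 'Python'

print(len(s), s[0], s[-1])     # 6 P n
print(s + '3')                 # Python3
print(s * 2)                   # PythonPython
print(s.endswith('on'))        # True
print(s.isalpha())             # True
print('a\nb'.splitlines())     # ['a', 'b']
print(s.encode('utf-8'))       # b'Python'
```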
List
Python Lists are positionally ordered collections of objects of arbitrary types.
Lists are a sequence type in Python. Unlike strings, which are also sequences, lists are mutable, providing a flexible data structure for any arbitrary collection, such as files in a directory or to-do items in a task list.
Lists are analogous to the array type in other programming languages, with the difference that Python lists can hold content of arbitrary types. Similar to multidimensional arrays, Python Lists allow arbitrary nesting.
Many built-in list methods modify the list in place: they can grow or shrink it (pop, insert, extend and so on) or reorder it entirely (sort). However, assigning an item to an index that is out of bounds is not permitted: Python performs bounds checking and raises an IndexError, rather than silently growing the list or, as can happen in C, writing past the end of the array.
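A short sketch of in-place list changes and of the bounds check, using a made-up task list:

```python
tasks = ['write', 'test']

tasks.insert(0, 'plan')               # grow at a given position
tasks.extend(['review', 'deploy'])    # grow at the end
last = tasks.pop()                    # shrink, returning the removed item
tasks.sort()                          # reorder the whole list in place

print(tasks, last)   # ['plan', 'review', 'test', 'write'] deploy

try:
    tasks[10] = 'out of bounds'       # index 10 does not exist
except IndexError as exc:
    error_message = str(exc)
    print(error_message)              # list assignment index out of range
```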
Lists support list comprehensions, which build new lists by iterating over iterable objects without altering the source objects. For example:
[row[1] for row in M if row[1] % 2 == 0]
to filter out the odd items in column 2 of a matrix, or
[M[i][i] for i in [0, 1, 2]]
to collect a diagonal from a 3x3 matrix.
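To make the two comprehensions runnable, assume M is a 3x3 matrix stored as a list of rows (the values are arbitrary):

```python
M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# Keep only the even items from column 2 (index 1).
even_in_col2 = [row[1] for row in M if row[1] % 2 == 0]

# Collect the main diagonal.
diagonal = [M[i][i] for i in [0, 1, 2]]

print(even_in_col2)  # [2, 8]
print(diagonal)      # [1, 5, 9]
```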
Tuple
Tuple is a sequence type in Python. It is the immutable version of the List. Tuples are ordered collections of arbitrary objects. Tuple literals are coded in parentheses (as opposed to square brackets for Lists). Tuple supports arbitrary types, arbitrary nesting and the usual sequence operations.
As with all immutable collections in Python, tuples store the references to objects and not the objects themselves, which means the referenced objects can still be altered.
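A minimal illustration of this point: the tuple cannot be rebound item by item, but a mutable object it refers to can still change.

```python
record = ('config', [1, 2])

record[1].append(3)          # the referenced list is mutable
print(record)                # ('config', [1, 2, 3])

try:
    record[0] = 'other'      # the tuple itself is immutable
except TypeError as exc:
    rebind_error = type(exc).__name__
    print(rebind_error)      # TypeError
```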
Dictionary
The Dictionary type is the only built-in mapping type in Python. Dictionaries are collections of arbitrary objects stored and retrieved by key instead of by positional offset like the sequence types. Since Python 3.7 dictionaries preserve insertion order, but items are still accessed by key, not by position. The literal syntax for dictionaries is curly brackets {key1: value1, key2: value2, ...}. Like lists, dictionaries are mutable and arbitrarily nestable.
Python dictionaries are similar to associative arrays or hashes in other programming languages. Internally, Python dictionaries are implemented as hash tables optimised for speed and efficiency. That means any hashable object can be used as a dictionary key.
Setting a value for a non-existing key does not raise an out-of-range error; it simply creates the entry. Reading a missing key with the dict.get() method returns a default value, whereas plain indexing raises a KeyError. This makes dictionaries ideal for sparse data structures. A dictionary can even be used to simulate a ‘flexible list’ when integers are used as keys. Another common use case is to use tuples as keys to represent sparse matrices.
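A sketch of a sparse matrix stored in a dictionary keyed by (row, column) tuples; the coordinates and values are invented for illustration:

```python
sparse = {}

# Only the non-empty cells are stored; no pre-allocation is needed.
sparse[(2, 3)] = 7.5
sparse[(100, 4)] = 1.0

print(sparse.get((0, 0), 0.0))   # 0.0  (missing key, default returned)
print(sparse[(2, 3)])            # 7.5

try:
    sparse[(9, 9)]               # plain indexing of a missing key
except KeyError as exc:
    missing_key = exc.args[0]
    print(missing_key)           # (9, 9)
```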
Set
A set in Python is an unordered collection of unique and immutable objects that support mathematical set theory operations. Since unordered sets do not map values to keys, sets are neither sequence nor mapping types [@lutzLearningPython2003]. Sets can only contain hashable (immutable) objects; mutable objects such as lists or dictionaries cannot be embedded in sets. To store compound values, use Tuples.
Sets themselves are mutable, and therefore unhashable, so they cannot be nested in other sets directly. To store sets inside other sets, you can use the frozenset built-in to create an immutable set.
Similar to the List type, Set supports comprehension expressions. The only difference is that a set comprehension uses curly brackets ({...}) instead of square brackets ([...]). Note that an empty pair of curly brackets {} creates an empty dictionary, not a set; use set() for an empty set.
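The following sketch shows a set comprehension, a set operation, and nesting through frozenset. Comparisons are used instead of printing the sets, since set ordering is not guaranteed.

```python
# Set comprehension: duplicates collapse automatically.
squares = {n * n for n in [-2, -1, 0, 1, 2]}
print(squares == {0, 1, 4})                      # True

# {} is an empty dict; set() is an empty set.
print(type({}).__name__, type(set()).__name__)   # dict set

# Union, one of the mathematical set operations.
print({1, 2} | {2, 3} == {1, 2, 3})              # True

# A mutable set cannot be an element of another set.
try:
    nested = {{1, 2}}
except TypeError:
    nested = {frozenset({1, 2}), frozenset({3})}

print(frozenset({2, 1}) in nested)               # True
```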
File
Unlike many other programming languages, Python treats the file type as a built-in core type; you obtain a file object by calling the built-in open() function. In Python 3, files opened in the default text mode are read and written as Unicode strings, whereas passing the b (binary) flag makes the read and write operations work with raw bytes.
Python has many built-in utilities for handling files. For example, the pickle module serialises objects, the struct module packs and unpacks binary data, the json module converts Python objects to and from JSON syntax, and the shelve module provides keyed storage and access to Python objects.
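A small sketch of text mode versus binary mode, combined with the json module. It writes to a temporary directory, and the file name demo.json is only an example.

```python
import json
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.json')   # hypothetical file name

    # Text mode: strings are encoded on write.
    with open(path, 'w', encoding='utf-8') as handle:
        json.dump({'name': 'Python', 'tags': ['a', 'b']}, handle)

    # Text mode again: the content comes back as str.
    with open(path, encoding='utf-8') as handle:
        text = handle.read()

    # Binary mode: the same content comes back as bytes.
    with open(path, 'rb') as handle:
        raw = handle.read()

data = json.loads(text)
print(type(text).__name__, type(raw).__name__)  # str bytes
print(data['name'])                             # Python
```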
Comparison and Equality Tests for Built-in Types
In Python, the equality test (==) performs value equivalence - recursively for nested objects. The is operator conducts a reference (identity) test. Because the interpreter may cache and reuse some immutable objects internally, two separately written short strings, a = 'Python' and b = 'Python', may pass the identity test a is b. However, users should not rely on this implementation detail and should always perform the equality test that matches the intended semantics.
Different types have different definitions of equality and relative magnitude. For example, sets are equal if they contain the same items, dictionaries are equal if they contain the same key-value pairs, and lists and tuples are compared component by component from left to right.
Unlike JavaScript, Python does not implicitly convert types for comparison, apart from numbers, which are converted to the wider numeric type. In Python 3, ordering comparisons (<, >, <=, >=) between incompatible types, such as a number and a string, raise a TypeError, while equality tests between them simply return False.
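These comparison rules can be sketched as follows, assuming Python 3 and arbitrary sample values:

```python
first = [1, ('a', 'b'), {'k': [2, 3]}]
second = [1, ('a', 'b'), {'k': [2, 3]}]

print(first == second)       # True  (same value, compared recursively)
print(first is second)       # False (two distinct objects)

print(1 == 1.0)              # True  (numbers are converted before comparing)
print(1 == '1')              # False (no implicit string conversion)
print((1, 2, 3) < (1, 3))    # True  (left to right: 2 < 3 decides)

try:
    result = 1 < '1'         # ordering across unrelated types
except TypeError as exc:
    ordering_error = type(exc).__name__
    print(ordering_error)    # TypeError
```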
Python Virtual Environment
An environment in Python is an isolated context in which a Python program can be run. It consists of the Python interpreter and other required packages for the program. Using an environment is similar to the idea of Docker, which allows you to run an application in an isolated environment without polluting the host machines. For Python, this is especially important for operating systems that use multiple package managers for Python utilities, where the package managers may install conflicting packages and interfere with each other.
The venv module can be used to create a lightweight virtual environment on top of an existing Python installation.
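A virtual environment can be activated by running the activate script inside it, for example source venv/bin/activate in a Unix-like shell (assuming the environment was created in a directory named venv). Activation prepends the environment’s bin directory to your PATH, so running regular Python commands, including pip and python, will invoke the virtual environment’s interpreter without being explicitly told so.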
The Python Interpreter
The Python interpreter comes with a collection of __builtins__ - a set of built-in functions, exceptions and other objects. Use help(__builtins__) to learn more about it.
Statement vs. Expressions
In Python, an expression is a programming construct that can be reduced to a value, or a collection of symbols that jointly express a quantity. For example: 3 + 5 or [a.x for a in some_iterable]. A simple way to classify a piece of code is whether you can feed it to eval() - a valid expression can always be evaluated by eval(). Expressions can only contain identifiers, literals, and operators - including the call operator (), the subscription operator [], arithmetic, and boolean operators.
Statements are everything else that can make up a line. They perform actions, that is, they do something. Statements are the smallest standalone element in an imperative programming language. The distinction between an expression and a statement is an important one in avoiding common coding mistakes.
Every valid expression can be used as a statement (called an expression statement). The reverse is not true: a statement such as an assignment cannot be used where an expression is expected.
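The eval() test can be sketched like this. Only fixed, trusted strings are evaluated here; eval() and exec() should not be used on untrusted input.

```python
# An expression reduces to a value, so eval() accepts it.
print(eval('3 + 5'))            # 8

# An assignment is a statement, so eval() rejects it.
try:
    eval('x = 5')
except SyntaxError:
    rejected = True
    print('not an expression')  # not an expression

# exec() runs statements; the result lands in the given namespace.
namespace = {}
exec('x = 5', namespace)
print(namespace['x'])           # 5
```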
Variable Naming Conventions
In Python, variable names that begin with a single underscore are not imported by a from module import * statement (unless the module explicitly lists them in __all__).
Useful Tools
PyDoc provides multiple ways of displaying documentation for built-in and imported modules and application scripts. It can start an HTTP server locally and provide a nice web UI with search functionality and auto-generated links, allowing you to click your way through the relevant modules in your application.
Functions
Unlike in compiled languages such as C++, Python def is a regular statement that assigns a function object to a name at runtime. Functions are first-class objects in Python (first-class object model) and can be passed around and stored in lists or dictionaries like any other Python object. A def statement is only executed when control reaches it, so the function does not exist until then. Furthermore, the names used inside a function do not have to be defined before the application starts, as the code inside def is only evaluated when the function is called later. The syntax below is valid in Python (but not in traditional compiled programming languages):
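A minimal sketch of this behaviour, with hypothetical function names: greet refers to a function that does not exist yet when greet is defined, and that function is chosen at runtime by an ordinary if statement.

```python
def greet():
    # format_name is looked up only when greet() is called,
    # so it does not need to exist when this def statement runs.
    return 'Hello, ' + format_name('python')


use_title_case = True  # hypothetical runtime switch

# def is an executable statement, so it can be nested inside if/else.
if use_title_case:
    def format_name(name):
        return name.title()
else:
    def format_name(name):
        return name.upper()

print(greet())  # Hello, Python
```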
Similar to JavaScript, you can attach arbitrary information to functions in Python for later use:
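For instance, a function can carry its own call counter and label as attributes. The names handler, calls and label are invented for this sketch.

```python
def handler(event):
    # Attributes are stored on the function object itself.
    handler.calls += 1
    return f'{handler.label}: {event}'


# Attach arbitrary state to the function after it is defined.
handler.calls = 0
handler.label = 'demo'

print(handler('start'))   # demo: start
print(handler('stop'))    # demo: stop
print(handler.calls)      # 2
```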
OOP in Python
Class is the main OOP tool in Python. However, similar to the def statement, the class keyword in Python is an executable statement that assigns a special object to a name, not a declaration like in traditional OOP languages such as C++. This concept is similar to the class model in JavaScript - classes are factories that use constructors to create new instances.
Another distinctive characteristic is that classes are objects created when the application runs, typically when the module containing them is imported and executed by the interpreter. Like ordinary objects, classes in Python can be modified in place at runtime, and their instances will reflect any changes to the class definition. Like functions, class objects can have data attributes attached to them, which are shared by all instances of the class, such as a counter, a flag, or other shared state.
Python classes are mostly namespace objects. The way their attributes are created is similar to Python modules and functions. When a class statement runs, Python executes all the statements nested inside the class body, from top to bottom. Assignments that happen during this process create names in the class’s local scope and assign values to them, creating the namespace object.
The namespace concept in Python is an important one. The inheritance search goes up the chain (references) but assignments only affect the instances themselves, leading to seemingly strange behaviour. For example:
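One way to see this, using a hypothetical Account class: reading an attribute searches the instance and then the class, but assigning through an instance creates a new attribute on that instance only.

```python
class Account:
    interest = 0.01   # class attribute, shared by all instances


a = Account()
b = Account()

a.interest = 0.05     # creates an instance attribute on a only
print(a.interest, b.interest, Account.interest)   # 0.05 0.01 0.01

Account.interest = 0.02   # changing the class affects b, but not a
print(a.interest, b.interest)                     # 0.05 0.02

# a now has its own entry; b still relies on the class namespace.
print(a.__dict__, b.__dict__)                     # {'interest': 0.05} {}
```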
Polymorphism in Python
Unlike statically typed languages, Python has a different philosophy of Polymorphism. Let’s look at an example:
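The following is an illustrative sketch with made-up names: double performs no type checks, and Money is a hypothetical class that happens to support the * operator.

```python
def double(x):
    # No type check: anything that supports "* 2" is accepted.
    return x * 2


class Money:
    """Hypothetical class that implements the multiplication protocol."""

    def __init__(self, cents):
        self.cents = cents

    def __mul__(self, factor):
        return Money(self.cents * factor)


print(double(4))                  # 8
print(double('ab'))               # abab
print(double([1, 2]))             # [1, 2, 1, 2]
print(double(Money(150)).cents)   # 300
```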
This behaviour may surprise developers from a traditional statically typed programming background. However, it is intentional and a feature of Python. When programming in Python, one should not be concerned about the data types on which a specific piece of code operates. As long as the objects conform to the protocols the code expects, they should be allowed to benefit from the existing utilities (hence the increased extensibility of the language). This is commonly known as duck typing. The very nature of Python not declaring the types of variables implies that code should not rely on type checks to function correctly (apart from special requirements).
References
- Lutz, M. & Ascher, D. (2003). Learning Python. O’Reilly Media, Inc.
- Wikipedia Contributors (2019). Python (programming language). [online] Wikipedia. Available at: https://en.wikipedia.org/wiki/Python_(programming_language).