This document is an index of online reading materials for learning Python and backend development/engineering concepts from scratch, up to a level of mastery sufficient for Senior/Principal Backend Engineers and Data Engineers.
If you have a question or proposal, or want to discuss this index, do not hesitate to contact me on LinkedIn or open an issue on GitHub.
- Environment setup and installations
- General tutorials and guides
- Important Standard Library Modules
- Topics Index
- Contribution
- Contributors
Start by setting up a Python development environment: either interactive mode with Jupyter, or an IDE where you write code and then execute it once done.
Tip: if you work on a personal PC that is not meant solely for development, you can use a VM to install whatever Python development requires in an isolated environment. To run a VM on a Windows PC we recommend Hyper-V (requires Windows Pro on your PC); it works best for running a Windows VM. For running a Linux VM we recommend Oracle VirtualBox, but Hyper-V works as well. For personal development it's a matter of preference; you can read more here.
- Python Installation
- [Windows] Python installation - for Windows make sure to have both Python and its scripts in your PATH environment variables, they'll be in your Python installation directory, example:
- [Linux] If your Ubuntu release ships Python 3.10 in its default repositories (Ubuntu 22.04 does), you can install it with the following commands:
$ sudo apt-get update
$ sudo apt-get install python3.10
If you're using another version of Ubuntu or want a different Python version, we recommend using the deadsnakes PPA to install Python 3.10:
$ sudo apt-get install software-properties-common
$ sudo add-apt-repository ppa:deadsnakes/ppa
$ sudo apt-get update
$ sudo apt-get install python3.10
You can specify the exact Python version if needed. To verify which version of Python 3 you have installed, open a terminal and run:
$ python3 --version
- Understand virtual environments: Real Python reference & Official docs reference
- JupyterLab and Jupyter Notebook are notebook style editors for Python where you can write code, save data, write free text explanations, etc.
- PyCharm IDE - PyCharm is a very popular IDE for Python from JetBrains, the same company that delivers IntelliJ IDEA.
- Python in VS Code - VS Code is an extremely popular general-purpose editor built almost entirely on extensions & plugins: VS Code reference & Real Python reference.
- Google Colab - online Python notebook style editor by Google. Provides access to free GPU.
- DataSpell - a relatively new IDE from JetBrains which supports running notebook cells (.ipynb files). Has the features of a traditional IDE, as opposed to Jupyter Notebook and JupyterLab.
With a development environment ready you can now learn how to write Python code.
- W3Schools - beginners
- Real Python - intermediate, in depth, articles referencing useful open source packages
- Python official docs tutorial - exhaustive and the most in-depth tutorial covering the must-know and should-know built-in Python capabilities
- Automate the boring stuff - beginner-intermediate. Task oriented online book.
- GeeksforGeeks - beginner-intermediate, website with short tutorials on many subjects. Contains many examples and a "try it out" widget.
Tip: start by going over W3Schools and what it has to offer; it's basic. Then, for every topic you wish to learn, start with an exercise from Automate the boring stuff, read a Real Python article if one exists, and read the official Python docs if needed; Real Python is more accessible.
Python ships with a standard library of modules; some are useful and even essential for day-to-day backend development.
Tip: to learn a module, Google "python <module name>" and you'll usually find a good Real Python article and many additional useful references. Read the official Python docs last.
Python official module docs: https://docs.python.org/3/py-modindex.html
- builtins module - primary module auto-imported to every scope
- enum module - support for enums
- abc module - implementing Abstract Classes
- random module - enable randomized algorithms
- datetime module - working with date, times, and time differences
- decimal module - working with precise decimal floating point arithmetic
- uuid module - Python's built-in uuid/guid type
- re module - working with regular expressions
- math module - math constants (e.g. PI), functions, etc.
- itertools module - efficient looping with iterators
- cycle - infinite cyclic iterations
- repeat - infinite/bounded repetitions of a value
- chain / chain.from_iterable - concatenate any number of iterables
- compress - filter an iterable by a matching iterable of truthy/falsy selectors
- filterfalse - keep the items for which a predicate is false * can also use comprehensions to achieve the same
- groupby - groups consecutive items of an iterable by a key function (sort by the same key first to group all matching items)
- islice - lazy slicing of any iterable, similar to sequence slicing syntax (negative indices aren't supported)
- zip_longest - like zip, only it pads the shorter iterable to match the longer one
- product - cartesian product of any number of iterables (returns iterable of tuples) * can also use nested comprehensions to achieve the same
- combinations_with_replacement - with r=2, returns an iterable with every pair (left, right) of items from the input iterable such that the position of left <= the position of right
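The itertools functions listed above can be sketched in a few lines each. This is a minimal illustration using only the standard library; the sample data (letters, fruit names) is made up for the example.

```python
from itertools import (
    chain, combinations_with_replacement, compress, cycle, filterfalse,
    groupby, islice, product, repeat, zip_longest,
)

# cycle is infinite, so islice bounds it to the first five items.
first_five = list(islice(cycle("ab"), 5))

# repeat with a count yields the value that many times.
zeros = list(repeat(0, 3))

# chain.from_iterable lazily flattens one level of nesting.
flat = list(chain.from_iterable([[1, 2], [3], [4, 5]]))

# compress keeps the items whose matching selector is truthy.
selected = list(compress("abcd", [1, 0, 1, 0]))

# filterfalse keeps the items for which the predicate is false.
odds = list(filterfalse(lambda n: n % 2 == 0, range(6)))

# groupby only groups consecutive items, so sort by the same key first.
words = sorted(["banana", "apple", "cherry", "avocado"], key=lambda w: w[0])
by_letter = {k: list(g) for k, g in groupby(words, key=lambda w: w[0])}

# zip_longest pads the shorter iterable with fillvalue.
padded = list(zip_longest([1, 2, 3], "ab", fillvalue="-"))

# product yields the cartesian product as tuples.
pairs = list(product([0, 1], "xy"))

# With r=2 every pair is yielded once, including an item paired with itself.
combos = list(combinations_with_replacement("abc", 2))
```

Everything here is lazy until wrapped in list(), which is why cycle can be used safely as long as something like islice bounds it.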
- functools module - higher order functions
- partial / partialmethod - fixes some arguments of a function and returns the resulting function
- reduce - folds an iterable's items into a single value by repeatedly applying a two-argument function
- ** singledispatch / singledispatchmethod - generic method with implementations per type
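As a rough sketch of the functools entries above: the power and describe functions are hypothetical names invented for this example, not part of any library.

```python
from functools import partial, reduce, singledispatch
from operator import mul


def power(base: float, exponent: float) -> float:
    return base ** exponent


# partial fixes exponent=2 and returns a new callable that only needs base.
square = partial(power, exponent=2)

# reduce folds the list into one value: ((1 * 2) * 3) * 4, starting from 1.
factorial_4 = reduce(mul, [1, 2, 3, 4], 1)


@singledispatch
def describe(value: object) -> str:
    """Fallback used when no implementation is registered for the type."""
    return f"object: {value!r}"


@describe.register
def _(value: int) -> str:
    # Selected when the first argument is an int.
    return f"int: {value}"


@describe.register
def _(value: list) -> str:
    # Selected when the first argument is a list.
    return f"list of {len(value)} items"
```

singledispatch picks the implementation from the runtime type of the first argument only, which is one reason the index marks it as having better third-party alternatives.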
- operator module - Python operators/keywords as functions for functional programming
- array module - efficient arrays of numeric values
- collections.abc module - abstract classes for implementing custom collections * Note to use Generic versions with type-hints
- collections module - specialized built-in collections
- deque - double-ended queue, usable as a queue or a stack
- ChainMap - efficient view of multiple dictionaries as single mapping
- Counter - histograms of keys
- defaultdict - dictionary with default value factory
- UserDict, UserList, UserString - simpler subclassing of dict, list, string * Note to use Generic versions with type-hints
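A short sketch of the collections types listed above may help when choosing between them. The data (settings, names) is made up for illustration.

```python
from collections import ChainMap, Counter, defaultdict, deque

# deque supports O(1) appends and pops at both ends.
queue = deque([1, 2, 3])
queue.append(4)          # enqueue on the right
first = queue.popleft()  # dequeue from the left (FIFO behavior)
last = queue.pop()       # pop from the right (LIFO behavior)

# ChainMap looks keys up in each mapping in order, without copying them.
defaults = {"host": "localhost", "port": 8000}
overrides = {"port": 9000}
settings = ChainMap(overrides, defaults)

# Counter builds a histogram of hashable items.
letter_counts = Counter("mississippi")

# defaultdict calls the factory to create a missing value on first access.
groups = defaultdict(list)
for name in ["ann", "bob", "alice"]:
    groups[name[0]].append(name)
```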
- typing module - for type hints support, specialized strongly typed classes and using advanced typing features
- TypedDict - a dictionary type with a fixed set of string keys, each mapped to a typed value
- contextlib module - for implementing context managers which can be bound to "with" statements, stack them, suppress exceptions, etc.
- ** dataclasses module - for creating idiomatic data classes
- asyncio module - support for async IO, futures, coroutines and tasks
- contextvars module - support for async flows local state represented as contexts
- ** json module - working with JSON representations
- ** pickle module - binary serialization
- logging module - logging capabilities built-in, used by most third-party libs to emit logs
- ** unittest module - unit testing framework built-in
- unittest.mock module - mocking capabilities for tests, designed for unit tests
- secrets module - generating cryptographically strong random values for managing secrets, e.g. tokens and passwords
- pathlib module - utilities for file system paths, particularly useful Path type
- ipaddress module - provides types representing IP addresses, e.g. IPv4Address
** better third-party module below
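As one illustration from the list above, the contextlib entry (implementing context managers, stacking them, suppressing exceptions) could look roughly like this. The tracked helper and the events list are hypothetical names used only to make the enter/exit order visible.

```python
from contextlib import ExitStack, contextmanager, suppress

events = []  # Records the order in which things happen.


@contextmanager
def tracked(name: str):
    """Hypothetical resource that records when it is entered and exited."""
    events.append(f"enter {name}")
    try:
        yield name
    finally:
        # Runs even if the body of the with statement raises.
        events.append(f"exit {name}")


# A single context manager bound to a with statement.
with tracked("a") as resource:
    events.append(f"use {resource}")

# ExitStack enters a dynamic number of managers and exits them in reverse order.
with ExitStack() as stack:
    for name in ("b", "c"):
        stack.enter_context(tracked(name))

# suppress swallows only the listed exception types.
with suppress(KeyError):
    {}["missing"]
```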
Learning materials can also be indexed by topics common to backend development in any programming language/platform. The index doesn't include specific technologies, e.g. SQL databases, MongoDB, GraphQL, Google APIs, etc.; it includes topics all backend software engineers can find useful.
DISCLAIMER: All below async references were added taking into consideration only the asyncio built-in module. I'm currently looking into trio and AnyIO as alternatives.
- Packages & Modules - split Python code into multiple files & packaging reusable Python code into a single package: Real Python reference & Official docs reference
- Example: the asyncio module source code, look at __init__.py
- Packaging modules for distribution benefits from Python Wheels
- Descriptors - reusable logic wrapping attributes access, generalizing properties which are defined using getter/setter/deleter methods: Real Python reference & Official docs reference
- Decorators - wrap function/method invocations with custom logic, or decorate classes for customization: Real Python reference
- Magic (Dunder) Methods - special built-in methods in Python, identified by two leading and two trailing underscores in the method name, e.g. the __str__ magic method: Real Python reference & Official docs reference
- Static typing - Python type hints: Real Python reference & Official docs reference
- Union (replaceable in Python 3.10 with the '|' syntax, sometimes explicit Union is needed)
- Optional (replaceable in Python 3.10 with the '| None' syntax, sometimes explicit Optional is needed)
- Generic (generic constraints, covariance, contravariance, etc.)
- Protocol (essentially defines interfaces for static type check purposes, can support runtime isinstance and issubclass checks using a decorator)
- MyPy is a very popular tool for static type checks, and its docs are very useful for learning how to use type hints. It integrates with PyCharm as well as VS Code, and can run in CI.
- Pyright is another popular tool for static type checks, further ahead than MyPy in supporting the newest Python typing features. It comes built into the VS Code Python extension and can run in CI; however, at the time of writing it is not supported by PyCharm.
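A minimal sketch of the Protocol, Union and Optional items above. Closeable, Connection, shutdown and parse_port are hypothetical names made up for this example; runtime_checkable is the decorator that enables the isinstance checks mentioned in the Protocol item.

```python
from typing import Optional, Protocol, Union, runtime_checkable


@runtime_checkable
class Closeable(Protocol):
    """Structural interface: any object with a close() method matches."""

    def close(self) -> None: ...


class Connection:
    # Note: Connection does not inherit from Closeable.
    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def shutdown(resource: Closeable) -> None:
    # A static type checker accepts any argument that has a matching close().
    resource.close()


def parse_port(raw: Union[str, int], default: Optional[int] = None) -> Optional[int]:
    """On Python 3.10+ the hints could be written as str | int and int | None."""
    if isinstance(raw, int):
        return raw
    return int(raw) if raw.isdigit() else default
```

Note that the runtime isinstance check only verifies that the method exists, not its signature; the signature is checked statically by tools such as MyPy or Pyright.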
- Exceptions - Official docs built-in exceptions reference
- Tip: Derive from BaseException instead of Exception in order to implement an exception type that won't be caught by general purpose except Exception: blocks. This technique is used for cancellation exceptions raised by async/await libraries; general purpose exception handling shouldn't handle cancellations.
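The tip above can be sketched as follows. OperationCancelled and run_step are hypothetical names for illustration; asyncio.CancelledError is a real example of the same idea, since it derives from BaseException as of Python 3.8.

```python
import asyncio


class OperationCancelled(BaseException):
    """Hypothetical cancellation signal; deliberately not an Exception subclass."""


def run_step(cancel: bool) -> str:
    try:
        if cancel:
            raise OperationCancelled("stop requested")
        raise ValueError("ordinary failure")
    except Exception as error:
        # Catches ValueError, but never OperationCancelled.
        return f"handled {type(error).__name__}"


# asyncio's own cancellation exception follows the same pattern (Python 3.8+).
cancelled_error_is_exception = issubclass(asyncio.CancelledError, Exception)
```

Calling run_step(False) returns normally because the ValueError is handled, while run_step(True) lets OperationCancelled propagate to the caller.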
- Weak references - reference an object such that the reference doesn't keep it alive: Official docs reference
- Concurrency & Multithreading - using a thread pool, locking, producer-consumer patterns, thread locals, async IO, async generators & comprehensions, futures, async context variables, async synchronization primitives: Real Python Concurrency reference & Real Python asyncio reference & Official docs asyncio reference
- JSON - fast library for working with JSON, supports dataclasses serialization - orjson
- Data Models - represent system entities as typed data models designed for static type checks - pydantic:
- Data model validation support, including built-in Python types as well as the additional useful pydantic types mentioned in the next two items, plus simple support for custom data validations for many scenarios
- Provides several QoL value objects to use as fields of data models, most useful IMO: HttpUrl, EmailStr, Json[T], SecretStr (a string hidden from logs)
- Provides constrained types such that values follow some restrictions like strings/lists of certain length, most useful IMO: constr, conint, PositiveInt, conlist, conset
- Supports reusable models configurations via inheritance from custom BaseModel and reusable validations using validator helper
- Data models JSON support - serialization and deserialization, including support for 3rd party JSON libraries e.g. orjson
- Support for converting custom classes (e.g. ORM objects) to data models
- Support for Python's built-in dataclasses module, essentially extending it
- Has a MyPy plugin to cover static type check scenarios for creating models using the pydantic syntax
- Integrates with the hypothesis library for theory testing of data models
- Code generation based on JSON schema, JSON data, YAML data, OpenAPI 3 * There's an alternative 3rd party library called attrs; comparisons can be found online
- Data Manipulation - some libraries are used to manipulate/transform data
- App Settings / Configuration - representation and access to application settings & configurations, e.g. connection strings, services URLs, anything that shouldn't be hard coded and should be accessible to system code in a configurable manner.
- Extensive support using dynaconf supporting multi-environment, many formats, external config stores (e.g. Redis), unit tests and more
- Basic support from pydantic library above, can be extended to support multi-environment and custom loaders can make it leverage dynaconf
- File System
- file system access using async IO - aiofiles
- Path object: Real Python reference
- Http Client - sending HTTP requests using async IO, popular alternatives:
- httpx
- Supports: HTTP/2, client certificate, full request & response hooks, env variables config, OAuth2 extension
- Doesn't support: websockets
- aiohttp
- Supports: web sockets, client certificate, partial request & response hooks (allowed modifications: requests headers, enables Authentication flows), OAuth2 (via another package)
- Doesn't support: HTTP/2, env variables config
- Recommendation: httpx for its richer feature set; for a web sockets client use websockets
- SQL ORM - object relational mapper for working with SQL databases - SQLAlchemy
- Integrates with MyPy for type checking SQLAlchemy models
- Integrates with pydantic to map SQLAlchemy models to/from pydantic models
- Integrates with FastAPI to expose CRUD API on top of SQL databases
- Integrates with Flask to expose CRUD API on top of SQL databases
- Django has a built-in ORM, so no SQLAlchemy integration
- Fault tolerance - I/O can fail, e.g. services can return HTTP error responses, SQL queries/commands can fail. There are known ways to handle failures:
- Retry policies - retrying failed API requests, SQL queries/commands, etc. based on some retry policy - tenacity (type hints support is incomplete). There's also aioretry, which requires implementing the retry policy yourself but supports type hints.
- Circuit Breaker - block execution of logic if it failed too many times recently, e.g. if SQL queries started failing due to overload on the SQL database, don't submit new queries for a while and let the database recover - pybreaker
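To make the two ideas above concrete, here is a hand-rolled sketch using only the standard library. It is not the API of tenacity, aioretry or pybreaker; retry, CircuitBreaker, CircuitOpenError and all their parameters are hypothetical names chosen for this illustration.

```python
import time
from functools import wraps


def retry(attempts: int = 3, base_delay: float = 0.01, retry_on=(ConnectionError,)):
    """Hypothetical helper: retry with exponential backoff between attempts."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == attempts:
                        raise  # Policy exhausted: surface the last error.
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


class CircuitOpenError(RuntimeError):
    """Raised while the breaker is refusing calls."""


class CircuitBreaker:
    """Hypothetical breaker: opens after max_failures consecutive failures."""

    def __init__(self, max_failures: int = 2, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit is open, call skipped")
            self.opened_at = None  # Cool-down elapsed: allow a trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # Any success closes the circuit again.
        return result
```

The two compose naturally: a retry policy handles short transient failures, while the breaker stops retries from piling more load onto a dependency that is already struggling. The libraries named above add features this sketch omits, such as jitter, async support and thread safety.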
- Binary Serialization - MessagePack is a very efficient general purpose format - msgpack
- Logging - logging capabilities for Python: Real Python reference
- DI Container - enable DI design principle with auto-wiring of dependencies - lagom
- Built-in integration with FastAPI & Flask, including per-request injectables
- There's also rodi, which is inspired by the .NET built-in DI container; it has fewer features and less GitHub activity (commits/contributors/etc.) but is simpler to use
- CLI - create applications with command-line interface - typer
- For an async main/command, see the "Sync to Async decorator" tip below
- Web frameworks - build web services/applications that either provide HTML pages/components via Server Side Rendering (SSR) and templating, or RESTful HTTP APIs, or both. There are some popular alternatives, FastAPI is the recommended one:
- FastAPI - modern, specialized for type hints, supports explicit async IO and auto-generates Swagger UI (API spec)
- Flask - exists since 2010, no explicit async IO support
- Django - a very extensive framework with many features, essentially an ecosystem, well documented, limited explicit async IO support * Comparisons exist online, FastAPI is mostly preferred due to explicit async IO * Useful web frameworks reference on how to evaluate them
- Background tasks scheduling - run background workers such that work can be scheduled (one-time or periodic):
- apscheduler is a scheduler for a single worker with flexible jobs store choices
- arq is a lightweight distributed task queue built on top of Redis
- See Distributed programming frameworks section for more complete and heavyweight solutions.
- gRPC Client & Server - communication using the gRPC protocol, which is better suited than RESTful HTTP for communication between services that are part of the same system: performance is better, and describing contracts as RPC (remote procedure call) is conceptually simpler than RESTful HTTP contracts that describe resources with URLs and verbs as actions - Real Python reference & Official Google docs reference
- GraphQL - data query language for web services - graphene
- Integrates with pydantic to query over pydantic models
- Integrates with FastAPI to expose GraphQL API, usually over pydantic models
- Integrates with Flask to expose GraphQL API over whatever data you choose
- Integrates with Django to expose GraphQL API over Django models
- Event Sourcing - represent persistent entities as logs of changes, and incorporate pub-sub notifications of entity changes to update representations of the entities in additional data stores, update search indexes, notify other systems, etc. - eventsourcing
- Reactive Extensions - building asynchronous event-based programs using observable collections as a concept for working with streams of asynchronous data, useful approach for implementing custom data pipelines in your Python service - RxPY
- Docker - Docker containers are amazing for microservices and have become the de-facto standard for building & deploying them: Docker official docs reference
When there are multiple Python versions / multiple virtual environments / many 3rd party packages, maintenance becomes complex, and there are tools to simplify it.
- Package managers:
For day-to-day development there are some Quality of Life libraries that can speed up development, make us more productive, spare us bugs, etc.
Comprehensive useful Python libraries & frameworks index: https://github.com/vinta/awesome-python
- nameof() operator for names refactoring support - python-varname
- URL type - yarl * pydantic also ships a URL type, however it's not as useful for constructing URLs as yarl
- Async IO enhancements:
- fast implementation for the asyncio module event loop - uvloop
- async versions of built-in functions - asyncstdlib; aclosing is especially important for properly cleaning up after async generators, see this discussion for details.
- async general purpose utilities - aiomisc
- Decorator support for the asyncio.wait_for function
- Wait for any awaitable using select
- Turn a sync func into an async func using awaitable
- Async timer using PeriodicCallback / Cron scheduling
- async caching decorator, prevents dog-piling, flexible yet simple - py-memoize
- Collections:
- Immutable dictionary - immutables
- Bidirectional dictionary - bidict
- Multivalue dictionary - multidict
- Sorted collections (set, list, dict) - sortedcontainers
- Functional Programming enhancements:
- Higher order functions & iterables - toolz & more-itertools
- LINQ for Python, inspired by C# LINQ - py-linq
- Piping syntax: pipe, useful reference
- Pattern matching capabilities - pampy is simple and very popular
- Typeclasses polymorphism (better alternative for singledispatch) - classes
- Type-safe alternative to throwing exceptions by returning Result objects - result
- Augment static type checks with very efficient randomized runtime checks that support type annotations with custom validations - for cases where static type checks aren't sufficient, e.g. you want to assert type annotations not statically checked, or you are developing a Python library (its users might not perform static type checks when calling your library's classes/functions/etc.) - beartype
- Object-object mapper for converting between similar classes - object-mapper or odin
Automated Testing is about verifying your system code works as expected with test code that will execute your system code and assert expectations on its behavior. Unit Testing in particular is about automatic testing for small atomic code entities that have some API encapsulating implementation details, usually a class but not necessarily.
- VS Code extension to run & debug tests in VS Code - Python Test Explorer
- Recommended test framework is pytest: Real Python reference & Library docs reference
- Write async tests to test async system code - pytest-asyncio
- Mocking - unittest.mock & pytest-mock: Useful article & Real Python reference & Library docs reference * Issue: no type hints for the mocked class/function.
- Fake data generator - faker
- If you use pydantic, there's pydantic-factories to build fake models
- Improved assertions syntax (fluent syntax) - assertpy
- Test coverage reports - pytest-cov
- BDD testing - pytest-bdd
- Theory testing - hypothesis library
- Parameterized tests using fixtures as parameters values' providers - pytest-lazy-fixture
- Setup mock return values (or raised exceptions) with fluent syntax by matching arguments - nextmock
- Mock datetime - freezegun
- Mocking HTTP client
- Mocking aiohttp - aioresponses
- Mocking httpx - respx
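As a minimal sketch of the mocking item in the list above, using only the standard library's unittest.mock: fetch_title and seconds_since are hypothetical stand-ins for system code, and the test functions follow pytest's test_ naming so pytest would collect them.

```python
import time
from unittest.mock import Mock, patch


def fetch_title(client, url: str) -> str:
    """Hypothetical system code: depends on a client object with a get() method."""
    response = client.get(url)
    return response["title"].upper()


def seconds_since(start: float) -> float:
    """Hypothetical system code that reads the wall clock."""
    return time.time() - start


def test_fetch_title_uses_client():
    # Mock stands in for the real client and records how it was called.
    client = Mock()
    client.get.return_value = {"title": "hello"}
    assert fetch_title(client, "https://example.com") == "HELLO"
    client.get.assert_called_once_with("https://example.com")


def test_seconds_since_with_patched_clock():
    # patch replaces time.time only inside the with block.
    with patch("time.time", return_value=110.0):
        assert seconds_since(100.0) == 10.0
```

For date and time specifically, freezegun from the list above is usually more convenient than patching time.time by hand.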
To maintain high quality code it's useful to maintain coding standards and there are tools that can help with and even handle maintaining those standards.
- Static type checker - MyPy
- Style guide enforcement - Flake8. Useful extensions:
- flake8-builtins - checks for accidental use of builtin functions as names
- flake8-comprehensions - checks for misuse or lack of use of comprehensions
- flake8-logging-format - ensures logs use extra arguments and exception()
- flake8-mutable - checks for mutable default parameter values, a common Python pitfall
- pep8-naming - checks that names follow Python standards defined in PEP8
- flake8-pytest-style - checks that pytest unit tests are written according to style
- flake8-simplify - checks for general Python best practices for simpler code
- Code formatter - Black
- Import statements sorting - isort
- VS Code Python Linting - so VS Code will run MyPy, Flake8 (and others)
- VS Code Black and isort
- GitHub Action for Python Code Quality and Linting
Some useful tips & tricks that can't be classified as some stand-alone topic.
- Delete a virtual environment with 3 commands executed from a cmd on the active environment, followed by deleting its folder:
- command: pip freeze > requirements.txt
- command: pip uninstall -r requirements.txt -y
- command: deactivate
- action: delete the environment's folder
- Centralize imports in a reusable imports file
- Mark parameters as positional-only so their names remain available for use as keyword arguments (e.g. in **kwargs): Official docs reference
- Efficient string concat - conserve memory that would be used in a loop of string concat ops
- Sync to Async decorator - say a function is expected to be invoked as a regular (non-async) function, but we want to implement it as async; a general decorator can wrap it so that invoking it as a sync function blocks until completion. Code here. * There's an issue with type-hints to express the decorator receiving a sync callable and returning an async callable, solved in Python 3.10: Parameter Specification Variables.
- Get a function's caller info: code of "findCaller" method in Python's logging source code
- Translate (some) LINQ to Python instead of using py-linq library above
- Correctly executing sync/async generators so a finally clause runs explicitly
- Multi-core async IO (multiple event loops)
- Fork-join pattern with pool executors, can read about Fork-join concurrency pattern here.
- Limiting concurrency for large number of async IO tasks
- Limiting concurrency for outgoing HTTP requests sent with aiohttp
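One common way to approach the concurrency-limiting tips above is an asyncio.Semaphore. The sketch below is an assumption about how such a limit might be built, not the code behind the linked references; run_limited and demo are hypothetical names, and asyncio.sleep stands in for real I/O such as an HTTP request.

```python
import asyncio


async def run_limited(coroutine_factories, limit: int):
    """Run all jobs, with at most `limit` of them in flight at any moment."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(factory):
        async with semaphore:
            # The coroutine is only created once a slot is available.
            return await factory()

    return await asyncio.gather(*(guarded(f) for f in coroutine_factories))


async def demo(job_count: int = 10, limit: int = 3):
    in_flight = 0
    peak = 0

    async def job(index: int) -> int:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # Stand-in for real I/O.
        in_flight -= 1
        return index * 2

    # Default argument i=i binds the current loop value to each factory.
    factories = [lambda i=i: job(i) for i in range(job_count)]
    results = await run_limited(factories, limit)
    return results, peak
```

Running asyncio.run(demo()) returns the results in submission order together with the highest number of jobs observed running at once, which should not exceed the limit.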
Documenting modules/classes/functions/etc. enables other team members (and you in the future) to understand what some API in the code does without having to inspect how it's implemented. When developing open-source the documentation is a must and needs to be available as professional online docs and not just in-code docs.
- Python Documentation guide: Real Python reference
- NumPy Docstrings guidelines - a recommended standard docstring style, though not the only one
- Enforce docstrings style - pydocstyle
- Professional documentation with reStructuredText (to publish PDF/HTML/etc.)
- Markdown files are supported as is in VS Code, e.g. Git README.md files
- Useful VS Code extension - Markdown All in One
- GitHub markdown quick tutorial
In many cases the backend architecture needs to include highly scalable distributed services where clusters of service instances can efficiently communicate between each other, persist data, recover from crashes, load balance work, etc. Good frameworks can make it simpler to implement them by providing useful abstractions for distributed programming/computing and persisting durable state.
The frameworks were chosen for their wide adoption, great features, accessible documentation and support for modern Python features such as async-await, type hints, etc. * Most frameworks build on some data store that's referenced without explanations. * General purpose container orchestrators are great alternatives, e.g. Kubernetes
- Celery - a distributed task queue & scheduling framework which provides workers that execute RPC jobs consumed from a flexible choice of message brokers (usually RabbitMQ or Redis) with job results persisted to a flexible choice of stores.
Features:
- Periodic tasks
- Workflows
- Custom objects serialization (e.g. can use pydantic models as messages)
- Interception for many events using signals APIs
- Instrumentation: Monitoring, logging & visualization capabilities
- Admin CLI, REST API & Web UI
- No async-await syntax support, can use gevent for implicit non-blocking I/O
- No official Windows support, there are workarounds * Useful production ready deployment reference
- Faust - a microservices framework which provides abstractions for stream processing with stateful agents that process infinite streams of messages built on top of Kafka for pub-sub and queue based messaging (extensible), RocksDB as local persistent tables store and (optionally) Redis as distributed cache. Co-founded by Celery founder.
Features:
- Queue channels - single consumer per message
- Client mode - can send messages to a cluster
- Startup tasks
- Periodic tasks
- Cluster scope synchronization via leader election
- Custom objects serialization (e.g. can use pydantic models as messages)
- Interception for many events using sensor APIs
- Instrumentation: Monitoring, logging & visualization capabilities
- Admin CLI & Web UI
- Integration testing support (runs locally with everything in-memory)
- E2E testing support in staging/production deployments
- No process recovery for workers - external process supervisors are required
- Airflow - A workflow framework for orchestrating distributed scheduled/triggered workflows (a.k.a. DAGs) described using Python scripts such that workers can execute tasks (a.k.a. operators) in workflows. Built on top of SQL databases for workflow state persistence and can execute on a cluster directly using Celery or Dask, on a Kubernetes cluster, or managed in AWS/GCP. Comes with several core operators and there's a great deal of independent ones and a registry of built-in & community provided ones.
Features:
- Operators: Python code, Branching, Run DAG, HTTP request, SQL checks, etc.
- Sensors (operators waiting for events): Python code, SQL query, Time Wait, etc.
- Cross DAG dependencies (task in DAG X waits for task in DAG Y)
- DAG templates with Jinja and injecting values from env variables/JSON files/etc.
- Flexible DAG runs triggers: schedule based, re-runs, tasks results, ad-hoc, etc.
- Communications between executing DAGs/tasks
- Resources management: tasks priorities, processes pools
- Interception for DAG & operator specs and for tasks before execution
- Create Python packages with utilities, operators, etc. to reuse in DAG scripts
- DAGs validation (useful reference):
- Static DAG analysis with unit tests
- Data integrity tests in the DAG itself, e.g. using Great Expectations
- Instrumentation: Monitoring, logging & visualization capabilities
- Reusable connections to external services/APIs
- Plugins for the Web UI, custom reusable connections and DAG template macros
- Admin CLI, REST API & Web UI
- No Windows support, however the Linux subsystem can be used
- Ray - A workers & actors framework for implementing distributed, fault tolerant and scalable applications specialized for data science but usable for any distributed computing. Workers are stateless, actors are stateful, both are Python processes and therefore heavyweight and coarse grained. Ray has an academic whitepaper.
Features:
- Client mode - can send RPC requests to a cluster
- Auto-scaling (integrates to cluster managers) based on scale related settings
- Java programming language support (not just Python)
- GPU scheduling (for machine learning purposes)
- Custom objects serialization
- Performance profiling: metrics, visualization, dumps, etc.
- Instrumentation: Monitoring, logging & visualization capabilities
- Admin CLI & Web UI
- Integration testing support (runs locally with multi process cluster)
- Kubernetes deployment
- Integrates with Airflow and other useful technologies
- Missing support for interception for workers and actors
- Alpha quality Windows support, can try to run on WSL if there are issues
- Ray actors are heavyweight processes, unlike Orleans (.NET) and Akka (Java/Scala).
- Python has a lightweight actors framework Thespian, much less popular than above.
Your contributions and proposals are always welcome! Please refer to the contribution guidelines first.
I may keep some pull requests open until I look into the references myself; you could vote for them by adding 👍 to them.
- Raphael
- Liron Soffer
- Yael Davidov