pip user research start here

Bernard Tyers edited this page Feb 10, 2020 · 1 revision

Python

Useful Python resources:

Python is a general-purpose programming language, designed to be used in many ways.

Python’s flexibility is why the first step in every Python project must be to think about the project’s audience and the corresponding environment where the project will run. This helps to avoid future headaches.

Package management

The Python Packaging user guide is a good place to start.

An overview of packaging for Python explains the most important parts.

The Manifest podcast is a useful podcast that explains what software package management is. A package manager maintainer is interviewed every two weeks.

Packaging is all about target environment and deployment experience. There are many answers to the questions of audience and environment, and each combination of circumstances has its own solutions. With this information, the following overview will guide you to the packaging technologies best suited to your project.

The way Python is packaged depends on who the users of the project will be.

A talk about Python packaging

Packaging for "Python technical development" usage

These are Python-native distribution tools. They are mostly built for distributing reusable code, called libraries, between developers. Libraries are building blocks, not complete applications.

Packaging for "achieving the user's goal" usage

Here, general usage means usage focused on using the Python language to achieve the user's goal, as opposed to developing Python software.

These usages seem to focus on web service and mobile application development, scientific work, running (Python) applications, and managing computer infrastructure.

Key Python installation and packaging projects

The most relevant Python installation and packaging projects are listed on the Python packaging wiki, with full explanations and links to repositories, IRC channels, and mailing lists.

bandersnatch

A PyPI mirroring client designed to efficiently create a complete mirror of the contents of PyPI.

The Python packaging user guide has a full explanation with mailing lists, repos and IRC channels.

distlib

A library which implements low-level functions that relate to packaging and distribution of Python software.

The Python packaging user guide has a full explanation with mailing lists, repos and IRC channels.

packaging

Core utilities for Python packaging used by pip and setuptools.

Most Python users rely on this library without needing to call it explicitly; developers of other Python packaging tools use its functionality to parse, discover, and otherwise handle dependency attributes.

The Python packaging user guide has a full explanation with mailing lists, repos and IRC channels.

pip

The most popular tool for installing Python packages, included with modern versions of Python. It provides the essential core features for finding, downloading, and installing packages from PyPI and other Python package indexes.

The Python packaging user guide has a full explanation with mailing lists, repos and IRC channels.

pipenv

Aims to bring the best of all packaging worlds to the Python world. It harnesses Pipfile, pip, and virtualenv into one single toolchain.

It aims to help users manage environments, dependencies, and imported packages on the command line.

The Python packaging user guide has a full explanation with mailing lists, repos and IRC channels.

pipfile

Pipfile and its sister Pipfile.lock are a higher-level, application-centric alternative to pip’s lower-level requirements.txt file.

Python packaging user guide

The Python packaging user guide

readme_renderer

A library that package developers use to render their user documentation (README) files into HTML from markup languages.

Developers call it on its own or via twine, as part of their release management process, to check that their package descriptions will properly display on PyPI.

The Python packaging user guide has a full explanation with mailing lists, repos and IRC channels.

setuptools

A collection of enhancements to the Python distutils that allow you to more easily build and distribute Python distributions, especially ones that have dependencies on other packages.

The Python packaging user guide has a full explanation with mailing lists, repos and IRC channels.

twine

The primary tool developers use to upload packages to the Python Package Index or other Python package indexes. It is a command-line program that passes program files and metadata to a web API.

The Python packaging user guide has a full explanation with mailing lists, repos and IRC channels.

virtualenv

A tool which uses the command-line path environment variable to create isolated Python Virtual Environments, much as venv does.
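As a rough illustration of what an isolated environment looks like on disk, the sketch below uses the standard library's venv module (the built-in counterpart mentioned above, not virtualenv itself). The helper name `create_env` and the directory name `demo-env` are placeholders invented for this example, and `with_pip=False` is chosen only to keep the sketch quick.

```python
import tempfile
import venv
from pathlib import Path


def create_env(target: Path) -> Path:
    """Create an isolated environment at `target` and return its config file path."""
    # with_pip=False skips bootstrapping pip into the new environment.
    builder = venv.EnvBuilder(with_pip=False)
    builder.create(target)
    # Environments created by venv contain a pyvenv.cfg marker file.
    return target / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(Path(tmp) / "demo-env")
        print(cfg.read_text())
```

The printed pyvenv.cfg records which base interpreter the environment was created from.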

The Python packaging user guide has a full explanation with mailing lists, repos and IRC channels.

Warehouse

The current codebase powering the Python Package Index (PyPI). It is hosted at pypi.org and is the default source for pip downloads.

The Python packaging user guide has a full explanation with mailing lists, repos and IRC channels.

wheel

Offers the bdist_wheel setuptools extension for creating wheel distributions. It also offers its own command-line utility for creating and installing wheels.
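Wheel files follow a fixed filename convention defined in PEP 427: distribution, version, an optional build tag, then Python, ABI and platform tags. The sketch below splits such a filename into its parts. `parse_wheel_filename` and `WheelName` are names made up for this illustration and are not part of the wheel project's API; the filename in the usage line is only an example.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into the components described in PEP 427."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # No build tag present.
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(distribution, version, build, python_tag, abi_tag, platform_tag)


if __name__ == "__main__":
    print(parse_wheel_filename("requests-2.22.0-py2.py3-none-any.whl"))
```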

The Python packaging user guide has a full explanation with mailing lists, repos and IRC channels.

What is pip?

pip is the package installer for Python. You can use pip to install packages from the Python Package Index (PyPI) and other indexes.

$ man pip
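To make "PyPI and other indexes" concrete, here is a minimal sketch that builds a pip install command line, optionally pointing at an alternative index with pip's `--index-url` option. The helper `pip_install_command` and the index URL are hypothetical; the sketch only constructs the command and does not run it.

```python
import sys
from typing import List, Optional


def pip_install_command(package: str, index_url: Optional[str] = None) -> List[str]:
    """Build (but do not run) a pip install command for the current interpreter."""
    # "python -m pip" runs the pip that belongs to this interpreter.
    command = [sys.executable, "-m", "pip", "install"]
    if index_url is not None:
        # Use another package index instead of the default (PyPI).
        command += ["--index-url", index_url]
    command.append(package)
    return command


if __name__ == "__main__":
    print(pip_install_command("requests"))
    # The URL below is a placeholder, not a real index.
    print(pip_install_command("requests", index_url="https://example.invalid/simple"))
```

Passing the returned list to `subprocess.run(command, check=True)` would perform the actual install.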

Different parts of pip

Dependencies

FOSDEM 2020 talk on managing dependencies in Python

Resolver

Panel on Package managers: resolve differences

Prior Python research

Warehouse

The majority of prior research has focused on Warehouse, the next-generation Python package repository designed to replace the legacy system that powers pypi.org.

Goals:

  • visual identity
  • how to make the UI accessible (a11y)
  • what devices and browsers to support
  • how to evaluate and validate designs
  • make packages more discoverable
  • help users and package maintainers

Approaches

  • make no assumptions about users' prior knowledge; instead link to relevant help
  • provide popovers for contextual help
  • visual design that is large and accessible

Evaluation

A site designed for beginners can be used by everyone; a site designed for experts may alienate and confuse beginners.

Evaluation was done by carrying out usability testing sessions with users of all experience levels.

Pypi - Warehouse

Podcast about the new pypi.org

Nicole - joined Python in 2015; rebuilding the PyPI design, HTML and CSS.

Ernest - physics background. Became a BA and started using Python to stop using Excel. Contributes to infra - web, mail, servers. PyPI Warehouse is one of the biggest projects within the PSF.

Dustin - comp. sci., C/C++ background. Joined PyPI 2 years ago. Contributed to the PyPI website.

Warehouse - python package index

The podcast was about the design and launch of the new pypi.org website. PyPI is a package warehouse. They discussed how much Python is used as a teaching language, so pypi.org might be someone's first experience.

It was important that it looked friendly and reflected the community, in both design and a11y. There are so many users that some percentage will be using assistive technology (AT).

Domain names were changed a few times for simplicity and infrastructure reasons, and new domain names were bought.

Tech stack

It's built on Kubernetes. A new Kubernetes platform project came out of the move, where Warehouse admins can set and configure environment variables and pull down new versions. It allows continuous deployment.
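Configuring a deployment through environment variables usually means the application reads its settings from the environment at startup. The following is a minimal sketch of that pattern only; the variable names `EXAMPLE_DATABASE_URL` and `EXAMPLE_DEBUG`, the `Settings` class and the defaults are all invented for illustration and are not Warehouse's actual configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    debug: bool


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from environment variables, falling back to defaults."""
    return Settings(
        # Hypothetical variable names; a real deployment defines its own.
        database_url=environ.get("EXAMPLE_DATABASE_URL", "postgresql://localhost/example"),
        debug=environ.get("EXAMPLE_DEBUG", "false").lower() in {"1", "true", "yes"},
    )


if __name__ == "__main__":
    print(load_settings())
```

Because the values live outside the code, an admin can change them and roll out a new version without editing the application.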

The new infra (using Docker and Docker Compose) made it easier for people to contribute to PyPI.

PyPI is running on AWS in the US, using Postgres, Amazon RDS, ElastiCache and Redis.

Redesign of pypi.org

Nicole had free rein to redesign. She ignored the old codebase. The aim was a fresh start that brought the site into the modern world: it had to be responsive, work on all devices, and be accessible (a11y).

The HTML uses BEM, custom reusable blocks, and SCSS organised with ITCSS. The codebase is logical to modify and easy to read.

Traffic to pypi.org

Rough figures: the old system did ~10B requests per month. The old service did 6.5B requests in the last month, with 1.5PB of data and 150ms of latency, and not that many errors. Fastly is the CDN that offered to front PyPI.

At the backend the PyPI website does 25-30 requests per second. Pyramid is the web app framework that they use.

The rollout of the new pypi.org

They switched over the traffic from the old site to the new site bit-by-bit.
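One common way to move traffic bit by bit is to send a configurable percentage of requests to the new system and raise that percentage over time. The sketch below shows that idea with a stable hash so the same key always lands on the same side. It is an assumption about the general technique, not a description of how the PyPI team actually did it; `bucket_for`, `choose_backend` and the request key are hypothetical.

```python
import hashlib


def bucket_for(request_key: str) -> int:
    """Map a request key to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def choose_backend(request_key: str, new_site_percent: int) -> str:
    """Send roughly `new_site_percent` percent of keys to the new site."""
    return "new" if bucket_for(request_key) < new_site_percent else "legacy"


if __name__ == "__main__":
    # Raising the percentage step by step shifts more traffic to the new site.
    for percent in (0, 10, 50, 100):
        print(percent, choose_backend("203.0.113.7", percent))
```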

The main traffic sources are

  • pip install (by a BIG %)
  • xml-rpc
  • web traffic

They did traffic replay - take traffic examples, and then replay traffic that looked similar.

It worked fine for the first 15 minutes. Packages had been hosted on the same domain. For security, they switched to a separate server for the new content.

This caused redirect loops involving files.python.org, because redirects had been cached.

status.python.org

Making the change triggered an unexpected CDN/hosting issue. Overall it was a big success, apart from the files usage (which is a big part of pypi.org).

We have to be strategic about what to add to Warehouse. That is the biggest long-term benefit.

Warehouse roadmap

What are the new features of warehouse?

Not much difference in features: the move to a new system is for modernisation and infrastructure.

The PyPI Warehouse roadmap is available on the wiki.

Users can write markdown descriptions of packages on pypi.org.

The old back end and front end were highly customised, which made it harder for people to contribute new features. Warehouse makes it possible to add new APIs and to deprecate packages.

Search

The warehouse search is much better than legacy - full text search can be done across descriptions, package names, authors.

The way search works now is -

  • upload a package
  • the search index runs every 3 hours

They can't update the index incrementally at the moment, but want to add that. There is autocomplete on searches. They want to make a search API in future.
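The behaviour described above (full-text search over names, authors and descriptions, with the whole index rebuilt on a schedule rather than updated per upload) can be sketched with a tiny in-memory inverted index. This is an illustration of the idea only, not Warehouse's search implementation; `build_index`, `search` and the sample packages are made up.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set

SEARCHED_FIELDS = ("name", "author", "description")


def tokenize(text: str) -> list:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(packages: Iterable[Mapping[str, str]]) -> Dict[str, Set[str]]:
    """Rebuild the whole index from scratch, as a periodic job would."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for package in packages:
        text = " ".join(package.get(field, "") for field in SEARCHED_FIELDS)
        for token in tokenize(text):
            index[token].add(package["name"])
    return dict(index)


def search(index: Mapping[str, Set[str]], query: str) -> Set[str]:
    """Return the names of packages that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    return set.intersection(*(index.get(token, set()) for token in tokens))


if __name__ == "__main__":
    packages = [
        {"name": "alpha", "author": "Ada", "description": "HTTP client"},
        {"name": "beta", "author": "Bob", "description": "HTTP server"},
    ]
    index = build_index(packages)
    print(search(index, "http client"))
```

With a full rebuild like this, a package uploaded just after a run is not searchable until the next run, which is why incremental updates are attractive.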

pip search runs using the xml-rpc API which is being deprecated. Future thinking.
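For readers unfamiliar with XML-RPC, the sketch below shows what such a call looks like on the wire, using the standard library's xmlrpc.client to encode a request without sending anything. The method name `search` and the `(spec, operator)` argument shape are assumptions about the legacy API made for illustration; `encode_search_call` is a hypothetical helper.

```python
import xmlrpc.client


def encode_search_call(name_query: str) -> str:
    """Encode an XML-RPC request body for a name search (nothing is sent)."""
    # Assumed argument shape: a dict of fields to match, plus an operator.
    params = ({"name": name_query}, "or")
    return xmlrpc.client.dumps(params, methodname="search")


if __name__ == "__main__":
    payload = encode_search_call("requests")
    print(payload)
    # Decoding the payload recovers the parameters and the method name.
    print(xmlrpc.client.loads(payload))
```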

Shutting down pypi legacy

It's being kept up for a while. The old domain will eventually redirect to the new Warehouse.

New features

Things they want to add:

  • audit
  • security
  • accessibility
  • GitHub sign-on
  • mobile application (can we do a mobile UI for pypi?)

A mobile application would need an API to interact with pypi. Less than 10% of users use it on a mobile device.

  • security notification for python packages

Typosquatting where a malicious user uploaded a

Ideas for handling this include marking packages as insecure, deprecating them, and showing warnings. This is something that package managers need to handle in the modern world.
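Typosquatting relies on a package name that looks nearly identical to a popular one. One possible way to surface candidates for a warning is to compare a name against a list of well-known names, sketched here with the standard library's difflib. This is an illustrative heuristic, not a feature of pip or Warehouse; `similar_popular_names`, the cutoff value and the short name list are assumptions.

```python
from difflib import get_close_matches
from typing import List, Sequence


def similar_popular_names(
    candidate: str, popular_names: Sequence[str], cutoff: float = 0.85
) -> List[str]:
    """Return popular names that the candidate closely resembles but does not equal."""
    normalized = candidate.lower().replace("_", "-")
    if normalized in popular_names:
        # An exact match is the real package, not a lookalike.
        return []
    return get_close_matches(normalized, popular_names, n=3, cutoff=cutoff)


if __name__ == "__main__":
    popular = ["requests", "numpy", "django"]  # illustrative list only
    print(similar_popular_names("reqeusts", popular))
    print(similar_popular_names("requests", popular))
```

A non-empty result would only be a hint for a human or a warning message; similar names are often legitimate.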

"It'd be nice if I could do pip security checkup these X things have security warnings, these Y have updates".

Security issues aren't something that happens often in Python packages.

Groovy Python packages - requests, standard-lib-module-names, pretend, factor-boy. Also import-pypi isn't on pypi.

Release of pip 10

It had been a long time since the previous release. It's a foundational refactoring of the internals.

It makes a lot of the internal tooling available for things built around and on top of pip. You might not need to use private APIs.