Binary Dependencies: Identifying the Hidden Packages We All Depend On
- Track: Tooling, Packaging, Developer Productivity
- Type: Talk
- Level: advanced
- Duration: 30 minutes
Abstract
Package manifests like pyproject.toml record source-level dependencies: pandas depends on numpy's code. The story is different for binary dependencies, which exist whenever compiled code, like C code, is called from Python. numpy depends on OpenBLAS's binaries, but this dependency relationship is not recorded anywhere. This makes OpenBLAS a phantom binary dependency.
Phantom dependencies are therefore hidden from programmers and researchers, which is bad for at least two reasons.
First, security. If one of your binary dependencies has a vulnerability, your project is probably vulnerable too, but you won't reliably find out, because the dependency is invisible.
Second, sustainability. If we can't keep track of our binary dependencies, we can't keep track of their maintainers either, which means we can't credit and financially support them. This can lead to maintainer burnout, which has already created serious supply chain issues.
Python is not only tremendously popular, but also valued for its ability to easily interface with compiled libraries. According to my research, around 20% of Python packages have binary dependencies.
This means that the problem of phantom binary dependencies is widespread and puts the public at risk of harm, e.g. if critical infrastructure such as hospitals or transportation is compromised by exploiting these weaknesses.
I aim to describe how the problem of phantom binary dependencies can be fixed within the Python ecosystem, and demo some of my preliminary work.
First, binary dependencies must be identified. Tools like auditwheel and elfdeps are able to identify a project's required dynamic libraries. If we create better APIs for these tools, and integrate them with package managers such as pip and uv, we can give developers and researchers visibility into binary dependencies, dispelling the phantom.
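To make "required dynamic libraries" concrete, here is a minimal sketch that extracts the DT_NEEDED entries from the text printed by binutils' `readelf -d`. It is not how auditwheel or elfdeps are implemented; the function names and the sample output (including the library names) are assumptions made for illustration.

```python
import re
import subprocess

# Matches readelf lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libopenblas.so.0]
_NEEDED = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")


def needed_libraries(readelf_output: str) -> list[str]:
    """Return the DT_NEEDED library names found in `readelf -d` output."""
    return _NEEDED.findall(readelf_output)


def needed_for_file(path: str) -> list[str]:
    """Run `readelf -d` on an ELF file (requires binutils on PATH)."""
    result = subprocess.run(
        ["readelf", "-d", path], capture_output=True, text=True, check=True
    )
    return needed_libraries(result.stdout)


# Illustrative sample only; these entries are made up for the example.
SAMPLE = """
Dynamic section at offset 0x2d0e0 contains 27 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libopenblas.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000e (SONAME)             Library soname: [_example_ext.so]
"""

if __name__ == "__main__":
    print(needed_libraries(SAMPLE))
```

A list like this only names the libraries an extension module links against; mapping each name back to the project and maintainers behind it is the part that still needs tooling and metadata.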
Beyond this, standards like PEP 725, PEP 770 and PEP 804 specify how we might record binary dependency relationships in an easily accessible way. I'll explain how we can build on these standards to create tools that will allow users and researchers to explore binary dependencies and identify security issues by default.
Lastly, I want to talk about the road towards the ultimate aim of having binary dependencies be managed not by Python package managers, but by system package managers, as they should be. This will require interoperation between package managers, and I'll explain how this might work.