Shipping C++ code/libraries with Python package #1063
burgholzer asked this question in Q&A
Hey 👋🏼
I am seeking some advice on how to best package/distribute a collection of tools we develop as part of our research (specifically, the Munich Quantum Toolkit; https://mqt.readthedocs.io/en/latest/).
Some background and the main question
All of these tools are developed in C++17, exposed to Python using pybind11, and built into a Python package using scikit-build-core.
The tools are developed to work on all major operating systems and Python versions.
Structurally, there is a core library (called `mqt-core`), which many of the other libraries build on (e.g., `mqt-qcec` or `mqt-qmap`). Historically, only the top-level tools were made available as Python packages on PyPI, and `mqt-core` was consumed as a submodule (using `FetchContent`). Recently, however, we also created Python bindings for the core library and started publishing them as a Python package.
This creates a situation where the C++ code in the top-level project depends on the submodule code of `mqt-core`, while the Python part of the top-level project depends on the `mqt.core` Python package. The general question for this discussion is: how to best handle such a situation?
The desired solution and what we tried so far
Ideally, we would want to ship the core library's C++ code (more specifically, probably, the compiled libraries) with the Python package itself, so that the respective targets can be found in the top-level CMake project, given that the `cmake` directory is appropriately added to the search locations and the package itself provides a `<...>-config.cmake` (it does). I have seen header-only projects (such as pybind11) do that, which is simple because all that needs to be shipped is the header files.
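As a concrete illustration, a top-level CMake project could locate the CMake package files shipped inside the installed wheel roughly like this (a sketch under assumptions: the exact install layout of `mqt.core` is hypothetical here):

```cmake
# Sketch (paths/names are assumptions): ask the Python interpreter where the
# installed mqt.core package lives, then let find_package() discover the
# <...>-config.cmake that the wheel ships.
find_package(Python REQUIRED COMPONENTS Interpreter)
execute_process(
  COMMAND "${Python_EXECUTABLE}" -c "import mqt.core; print(mqt.core.__path__[0])"
  OUTPUT_VARIABLE MQT_CORE_PKG_DIR
  OUTPUT_STRIP_TRAILING_WHITESPACE
)
# Add the package directory to the search locations...
list(APPEND CMAKE_PREFIX_PATH "${MQT_CORE_PKG_DIR}")
# ...so that the shipped config file can be found.
find_package(mqt-core CONFIG REQUIRED)
```

pybind11 follows the same pattern by exposing `pybind11.get_cmake_dir()` (also available via the `pybind11 --cmakedir` console script), so the consuming build only ever needs to query the installed Python package for the search path.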
Nanobind (which is not header-only) simply ships the complete C++ sources as part of the package install and lets the consuming project build `nanobind` as part of its build. However, it is important that the code used to compile the core library Python package (`mqt.core`) and the top-level compiled extensions (e.g., `mqt.qcec`) is the same; otherwise, `pybind11` won't be able to recognize that the `mqt.core` package provides bindings for a C++ type that is being used in the interface of a method within the `mqt.qcec` package.
By now, the `mqt-core` project provides installation instructions so that it can install all the necessary libraries as part of the Python package build. These also get properly added to `site-packages`. By default, the project builds static libraries (the CMake default). While this setup seemed to work at first (building the `mqt.core` package locally and letting it be discovered by `FetchContent` in the top-level project), once we actually released a new version of `mqt.core` with wheels built by cibuildwheel, we noticed all kinds of problems when trying to use the resulting package in the top-level project: `pybind11` was not able to identify a type that has bindings in `mqt.core` and is being used in `mqt.qcec`.
The next logical step would be to try to build shared libraries instead of static ones and distribute those with the project, e.g., by setting
option(BUILD_SHARED_LIBS "Build using shared libraries" ON)
in the main CMake file. However, this creates another issue: when compiling a binary extension for a pybind module, the default symbol visibility is set to `hidden`. Since none of the functions currently export their symbols explicitly, the correspondingly built shared libraries cannot be linked when built alongside the pybind extension.
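For reference, the explicit-export route with `GenerateExportHeader` would look roughly like the following sketch (target and file names are made up for illustration, not the actual MQT setup):

```cmake
include(GenerateExportHeader)

add_library(mqt-core SHARED circuit.cpp)
# Use hidden visibility by default, matching the pybind11 extension...
set_target_properties(mqt-core PROPERTIES
  CXX_VISIBILITY_PRESET hidden
  VISIBILITY_INLINES_HIDDEN ON
)
# ...and generate mqt_core_export.h, which defines the MQT_CORE_EXPORT macro.
generate_export_header(mqt-core BASE_NAME mqt_core)
target_include_directories(mqt-core PUBLIC "${CMAKE_CURRENT_BINARY_DIR}")

# Every public symbol then needs an explicit annotation in the headers, e.g.:
#   #include "mqt_core_export.h"
#   class MQT_CORE_EXPORT Circuit { /* ... */ };
```

On Windows the macro expands to `__declspec(dllexport)` while building the library and `__declspec(dllimport)` while consuming it; on GCC/Clang it expands to `__attribute__((visibility("default")))`. That asymmetry is precisely why an annotation that satisfies one toolchain can trip up another.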
So I went on and read up on symbol visibility and how to set it explicitly (with the help of the `GenerateExportHeader` CMake module). However, this seemed really hard, if not impossible, to get working cross-platform: as soon as I fixed a visibility-related issue for one compiler or one OS, one of the other OSes or compilers would complain that this is not allowed, and I had to revert.
If necessary, more details are available in the following draft PRs:
Now, with a little more context, back to the general question: what is the best/recommended way to resolve this? Am I on the right track with building shared libraries and explicitly exporting symbols, and do I just need to find whatever combination of visibility macros works; or is there a better way to accomplish the overall goal?
A sketch of a minimal example
The following is a huge over-simplification, but it aims to provide at least a simpler basis to reason about.
Core Project
Has a C++ class (say, `Circuit`) that is compiled as part of a CMake library target.
Additionally, the class is bound in a pybind11 module that links against the above library and exposes the class to Python.
Top-level project
Depends on the CMake library target of the core project and provides a method that uses the `Circuit` class as part of its own CMake library target.
Additionally, the top-level project also exposes its methods to Python via a pybind11 extension that links against the top-level project library.
(Assume that multiple top-level projects might co-exist, all of which depend on the core library on the C++ as well as the Python side.)
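In CMake terms, the sketch above could be boiled down to something like this (all target and file names are invented for illustration):

```cmake
# --- Core project (sketch) ---
add_library(core-lib circuit.cpp)            # library target containing Circuit
target_include_directories(core-lib PUBLIC include)

find_package(pybind11 CONFIG REQUIRED)
pybind11_add_module(_core core_bindings.cpp) # exposes Circuit to Python
target_link_libraries(_core PRIVATE core-lib)

# --- Top-level project (sketch) ---
# Obtains core-lib somehow (FetchContent today; ideally find_package on the
# installed mqt.core package) and uses Circuit in its own interface:
add_library(toplevel-lib verify.cpp)
target_link_libraries(toplevel-lib PUBLIC core-lib)

pybind11_add_module(_qcec toplevel_bindings.cpp) # method signatures use Circuit
target_link_libraries(_qcec PRIVATE toplevel-lib)
```

The crux is that `core-lib` appears in both builds: the extension `_core` links it to bind `Circuit`, and `toplevel-lib` links it to use `Circuit` in its public interface, so both sides must agree on exactly one definition of the type.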
How would this best be packaged and distributed?
Please let me know if you need any further information!
Many thanks 🍀