Hello, Mesoners,

I'm creating a Python package for a collection of software created by a colleague and his student, comprising a few standalone C/CUDA programs (compiled with simple …)

First, how would you recommend handling the standalone programs? Can I build them via …

Second, how would you recommend handling the large data files (~100 MB)? My current plan is to mimic how some AstroPy packages handle such cases:

The data are not included in the project's repo, but rather stored in a permanent, citable location (e.g., Dataverse or Zenodo). The package provides a function that fetches the data and stores it in a user-designated location (by default, a hidden directory in the home directory). A user-specific configuration file identifies the data location for subsequent runs.

A good example is the …

Any advice would be appreciated.

-Tom
Hi @tloredo, thanks for your question and interest.

To answer this one first:

This seems very reasonable. I'll note that scikit-learn, scikit-image and SciPy all have data loaders that work along similar lines. I believe scikit-learn has custom code for data set downloading, while scikit-image and SciPy both use https://github.com/fatiando/pooch as an optional dependency.

If the data is optional, I'd not add the option to do the data retrieval in the package build files. Rather, just let the user do

import mypk…

If the data is non-optional, then I'd use Git LFS or similar and not provide a download function. So either way: not as part of the Meson (or setup.py) build.

If you do want that after all, you'd expose it as a build option (i.e., a CLI flag, as in https://mesonbuild.com/Build-options.html). In your …
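For the optional-data route, here is a rough sketch of the plan described in the question: fetch on demand at runtime, store under a hidden directory in the home directory by default, and record the location in a user config file. It uses only the standard library. Every name in it (`mypkg`, `fetch_data`, `data_dir_from_config`, the config file name and its `[data]` section) is a placeholder I made up for illustration, not something from the project, and a library such as pooch would likely replace most of this in practice.

```python
import configparser
import hashlib
import shutil
import urllib.request
from pathlib import Path

# Hypothetical names: adjust to the real package.
PKG_NAME = "mypkg"
DEFAULT_DATA_DIR = Path.home() / f".{PKG_NAME}"   # hidden directory in $HOME
CONFIG_FILE = Path.home() / f".{PKG_NAME}.cfg"    # user-specific config file


def sha256sum(path):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_data(url, fname, sha256, data_dir=None, config_file=CONFIG_FILE):
    """Download `url` to `data_dir/fname` unless a verified copy already exists.

    The directory that was used is written to `config_file` so that later
    runs can find the data without being told where it is.
    """
    data_dir = Path(data_dir) if data_dir is not None else DEFAULT_DATA_DIR
    data_dir.mkdir(parents=True, exist_ok=True)
    target = data_dir / fname

    if not (target.exists() and sha256sum(target) == sha256):
        # Download to a temporary name so a failed transfer never
        # masquerades as a complete file.
        tmp = target.with_name(target.name + ".part")
        with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
            shutil.copyfileobj(resp, out)
        if sha256sum(tmp) != sha256:
            tmp.unlink()
            raise ValueError(f"checksum mismatch for {url}")
        tmp.replace(target)

    # Remember the data location for subsequent runs.
    config = configparser.ConfigParser(interpolation=None)
    config.read(config_file)
    if not config.has_section("data"):
        config.add_section("data")
    config["data"]["dir"] = str(data_dir)
    with open(config_file, "w") as fh:
        config.write(fh)
    return target


def data_dir_from_config(config_file=CONFIG_FILE):
    """Return the recorded data directory, or the default if none is recorded."""
    config = configparser.ConfigParser(interpolation=None)
    config.read(config_file)
    return Path(config.get("data", "dir", fallback=str(DEFAULT_DATA_DIR)))
```

A user would call `fetch_data(...)` once after installing, with the URL and checksum of the archived file (for example a Zenodo or Dataverse download link), and the package would call `data_dir_from_config()` afterwards. None of this touches the Meson or setup.py build.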