CONDA support in Guix is work-in-progress. Watch this place.
CONDA has come out on top in some ways and is becoming a popular choice for scientific software deployment. In intention it is similar to language packaging systems, such as Python pip with virtualenv and Rubygems with bundler, but its scope is wider because it includes a build environment.
CONDA itself builds on top of the underlying software distribution using C compilers and such. In addition it manages a subset of low level libraries (including readline, ncurses, libxml2). It rather is a distribution in a distribution that can act fully in user land and provides source and binary deployment.
GNU Guix, as you can read elsewhere in my notes (e.g. RUBY.org), also addresses all of CONDA features with security and reproducibility thrown in - i.e., unlike Guix CONDA can never be fully reproducible and still suffers from dependency hell.
For bioinformatics, a lot of work is going into CONDA. GNU Guix is not going to compete with that in the short term. For example in the GALAXY project they are introducing CONDA by default. To leverage CONDA we can get CONDA support into Guix - i.e., if the base distribution is reproducible, at least the CONDA packages will be predictable (on Linux) and somewhat reproducible, similar to the way we are supporting RUBY.org in Guix. It is a win-win. We can use CONDA packages if we have no alternative, and the CONDA users can use Guix to get a predictable system with less deployment hassles
To install CONDA start with miniconda which pulls in a number of python tar balls and installs them in a PATH of your preference (say $HOME/opt/miniconda). When grepping through the files this PATH is nicely propagated. Once conda is in the path you can add channels for bioconda:
conda config --add channels conda-forge
conda config --add channels defaults
conda config --add channels r
conda config --add channels bioconda
conda search sambamba
Fetching package metadata .............
sambamba 0.5.9 0 bioconda
0.5.9 1 bioconda
0.6.1 0 bioconda
0.6.2 0 bioconda
0.6.3 0 bioconda
0.6.5 0 bioconda
conda install sambamba
The following NEW packages will be INSTALLED:
bcftools: 1.3.1-1 bioconda
curl: 7.45.0-2 bioconda
libgcc: 5.2.0-0
ncurses: 5.9-8
sambamba: 0.6.5-0 bioconda
samtools: 1.3.1-4 bioconda
installs sambamba in ~/opt/miniconda2/bin. Likewise if you install Ruby it will install openssl in ~/opt/miniconda2. An ldd shows that it picks up a mixture of local libs and libs installed inside miniconda. During the installation phase these binaries get patched to match the new CONDA PATH:
ldd lib/ruby/2.2.0/x86_64-linux/openssl.so
linux-vdso.so.1 (0x00007ffd9fa91000)
libruby.so.2.2 => /home/wrk/opt/miniconda2/lib/ruby/2.2.0/x86_64-linux/../../../libruby.so.2.2 (0x00007f2f5ce95000)
libssl.so.1.0.0 => /home/wrk/opt/miniconda2/lib/ruby/2.2.0/x86_64-linux/../../../libssl.so.1.0.0 (0x00007f2f5cc1e000)
libcrypto.so.1.0.0 => /home/wrk/opt/miniconda2/lib/ruby/2.2.0/x86_64-linux/../../../libcrypto.so.1.0.0 (0x00007f2f5c7e7000)
libpthread.so.0 => /gnu/store/qv7bk62c22ms9i11dhfl71hnivyc82k2-glibc-2.22/lib/libpthread.so.0 (0x00007f2f5c5ca000)
librt.so.1 => /gnu/store/qv7bk62c22ms9i11dhfl71hnivyc82k2-glibc-2.22/lib/librt.so.1 (0x00007f2f5c3c2000)
libjemalloc.so.1 => /home/wrk/opt/miniconda2/lib/ruby/2.2.0/x86_64-linux/../../../libjemalloc.so.1 (0x00007f2f5c180000)
libgmp.so.10 => /home/wrk/opt/miniconda2/lib/ruby/2.2.0/x86_64-linux/../../../libgmp.so.10 (0x00007f2f5bf0d000)
libdl.so.2 => /gnu/store/qv7bk62c22ms9i11dhfl71hnivyc82k2-glibc-2.22/lib/libdl.so.2 (0x00007f2f5bd09000)
libcrypt.so.1 => /gnu/store/qv7bk62c22ms9i11dhfl71hnivyc82k2-glibc-2.22/lib/libcrypt.so.1 (0x00007f2f5bad2000)
libm.so.6 => /gnu/store/qv7bk62c22ms9i11dhfl71hnivyc82k2-glibc-2.22/lib/libm.so.6 (0x00007f2f5b7d3000)
libc.so.6 => /gnu/store/qv7bk62c22ms9i11dhfl71hnivyc82k2-glibc-2.22/lib/libc.so.6 (0x00007f2f5b42e000)
/gnu/store/qv7bk62c22ms9i11dhfl71hnivyc82k2-glibc-2.22/lib/ld-linux-x86-64.so.2 (0x00007f2f5d5f0000)
Path rewriting is documented in the CONDA documentation as:
Files that should have the placeholder prefix (/opt/anaconda1anaconda2anaconda3) replaced with the install prefix at installation. Note that conda build does this automatically for the build prefix.
:
Binary files that should have their build prefix replaced with the install prefix at installation time. Due to the way this works, the install prefix cannot be longer than the build prefix. It is recommended to build against a very long prefix. The easiest way to do this is to install miniconda into a very long path. Future versions of conda build may do this automatically.
Apparently there is work going on to package the basic dependencies in CONDA too (incl. glibc) in the conda-forge project. At appears to me that CONDA would benefit from a Guix build platform instead - making reproducible environment feasible.
After discussing bioconda+Guix with Brad Chapman we are going to pursue the following steps:
- Create a miniconda/bioconda base environment in Guix, essentially a bootstrap that has the features of Guix
- Look into path rewriting as is doen in CONDA
- See if we can distribute a Guix tarball and rewrite it into CONDA
For #2 we may use sed (using a HEX translation) or Nix’ patchelf. One advantage of patchelf is that paths can be larger than the original.
I am playing with Eelco’s patchelf.
https://github.com/NixOS/patchelf. One interesting aspect is that it can grow the path. Also shell script paths we can grow in place (the script will just be larger). So, as long as we don’t hit the bash limitation we should be fine rewriting lib paths!
This makes it possible to relocate binaries.
ldd $HOME/.guix-profile/bin/hello
linux-vdso.so.1 (0x00007ffc71374000)
libgcc_s.so.1 => /gnu/store/zy233badri3sffqi2s2kq8md6qz65iiz-gcc-4.9.3-lib/lib/libgcc_s.so.1 (0x00007f13692d5000)
libc.so.6 => /gnu/store/qv7bk62c22ms9i11dhfl71hnivyc82k2-glibc-2.22/lib/libc.so.6 (0x00007f1368f30000)
/gnu/store/qv7bk62c22ms9i11dhfl71hnivyc82k2-glibc-2.22/lib/ld-linux-x86-64.so.2 (0x00007f13694eb000)
cd ~/opt/bin
cat ~/.guix-profile/bin/hello > hellop
chmod a+x hellop
./hellop
Hello, world!
./patchelf --print-rpath hellop
/gnu/store/qv7bk62c22ms9i11dhfl71hnivyc82k2-glibc-2.22/lib:/gnu/store/zy233badri3sffqi2s2kq8md6qz65iiz-gcc-4.9.3-lib/lib:/gnu/store/zy233badri3sffqi2s2kq8md6qz65iiz-gcc-4.9.3-lib/lib/gcc/x86_64-unknown-linux-gnu/4.9.3/../../..
./patchelf --set-rpath $HOME/opt/lib hellop
./patchelf --print-rpath hellop
$HOME/opt/lib
cat /gnu/store/zy233badri3sffqi2s2kq8md6qz65iiz-gcc-4.9.3-lib/lib/libgcc_s.so.1 > ~/opt/lib/libgcc_s.so.1
strings -t d hellop | grep /gnu/store
512 /gnu/store/qv7bk62c22ms9i11dhfl71hnivyc82k2-glibc-2.22/lib/ld-linux-x86-64.so.2
14920 /gnu/store/a49zfzc6xr6g20azlpjs8sikj6v5lnkp-hello-2.10/share/locale
17024 /gnu/store/a49zfzc6xr6g20azlpjs8sikj6v5lnkp-hello-2.10/lib
We are still using the Guix elf loader ld-linux. Let’s patch it with dd:
printf "/lib64/ld-linux-x86-64.so.2\x00" > newpath
dd if=newpath of=hellop obs=1 seek=512 conv=notrunc
0+1 records in
28+0 records out
28 bytes (28 B) copied, 0.000291519 s, 96.0 kB/s
ldd hellop
linux-vdso.so.1 (0x00007ffebc7b3000)
libgcc_s.so.1 => $HOME/opt/lib/libgcc_s.so.1 (0x00007fa20ef94000)
libc.so.6 => $HOME/opt/lib/libc.so.6 (0x00007fa20ebef000)
/lib64/ld-linux-x86-64.so.2 (0xi00007fa20f1aa000)
and it still works. To patch the interpreter ld-xxx.so.2 also patchelf can be used.
To relocate stuff, I’ll want to retain the hash value. But this is just a first try. I think I have the pieces now to relocate Guix built binaries into other dirs.
Other tools of interest for dealing with elf binaries are readelf, dumpelf, eresi and objdump.