-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathINSTALL.txt
318 lines (227 loc) · 13.2 KB
/
INSTALL.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
################################################################################
#################### Installation of the DADA2 nifH pipeline ###################
################################################################################
Copyright (C) 2023 Jonathan D. Magasin
The DADA2 nifH pipeline uses several external tools (e.g. R, DADA2, cutadapt).
To get them we will use miniconda3, a software package and environment manager.
This document describes how to get miniconda3 and then use it to create an
"environment" that you will use whenever running the DADA2 pipeline.
These instructions presume a unix/linux environment. They have been exercised
on Linux (workstations and a Chromebook) and macOS laptops.
Overview:
0. Get the pipeline and all documentation from the GitHub repository.
1. Install Miniconda.
2. Create the DADA2_nifH environment. This step downloads external tools and
includes them in a new environment for running the pipeline.
3. Add the pipeline scripts to your path.
-- Installation completed! --
4. Try the example
Appendix: Using R to install the DADA2 and several other R package
Below '%' is the unix shell prompt. Your's might differ. Although these
instructions should work for any unix shell, we prefer bash. You can check your
current shell with:
% echo $0
If it is not bash, you can start a bash shell (nested within your current shell):
% bash
--------------------------------------------------------------------------------
# 0. Get the DADA2 nifH pipeline, documents, etc.
--------------------------------------------------------------------------------
To download the latest version of the pipeline:
% cd ~
% git clone "https://github.com/jdmagasin/nifH_amplicons_DADA2.git"
Within your home directory, the pipeline will be in nifH_amplicons_DADA2.
Please do **not** put your analyses in nifH_amplicons_DADA2.
Verify the download by running the main pipeline command with no arguments:
% cd nifH_amplicons_DADA2
% run_DADA2_pipeline.sh
You should see an overview of the pipeline.
--------------------------------------------------------------------------------
# 1. Install Miniconda
--------------------------------------------------------------------------------
We use the Miniconda software environment manager to install external tools
required by the pipeline. (Any "conda" tool should be fine. You might like
Mamba for its speed. If you use mamba, substitute "mamba" in all "conda"
commands below.) The Miniconda installation page is here:
https://conda.io/projects/conda/en/latest/user-guide/install/index.html
Although installing Miniconda is generally straightforward, we offer two
suggestions in case you experience problems:
1. Install from a bash shell.
2. If you get errors about outdated libraries:
(a) Ask your system administrator to install newer libraries. A
system-wide upgrade impacts everyone, so your sysadmin will assess
the risk of breaking software that depend on the older libraries.
(b) You could install an older version of conda that works with your
system's older libraries, and then do "conda update conda" to get the
latest Miniconda. This will automatically get the newer libraries as
local copies. (No impact to other users.)
I used "b" to resolve an outdated glibc on an old CentOS 7 workstation.
If you successfully installed miniconda (or mamba) then your shell prompt should
change to show that "base" is the name of the active conda environment:
(base) %
If not, try starting a new terminal/shell. The installation should have added
some code to the end of your shell startup script** that will launch conda and
activate the "base" environment whenever you start a new shell.
** For bash shells the code added to the .bashrc startup script works. I have
run into problems for tcsh shells and had to update the .tcshrc startup
script manually.
Now tell conda to use the channel "conda-forge", which has many useful packages,
and to search conda-forge strictly before the "defaults" channel. Use the third
command to verify that conda-forge will be the first channel searched:
(base) % conda config --add channels conda-forge
(base) % conda config --set channel_priority strict
(base) % conda info
--------------------------------------------------------------------------------
# 2. Create the DADA2_nifH environment
--------------------------------------------------------------------------------
Create a conda environment called "DADA2_nifH" that includes everything needed
to run the DADA2 nifH pipeline as well as the ancillary scripts for pre- and
post-pipeline quality filtering and annotation.
(base) % cd ~/DADA2_nifH_pipeline/Installation
(base) % conda env create --file environment_DADA2_nifH_with_ancillary.yml
If you do not want to use the ancillary scripts, then you can substitute
environment_DADA2_nifH_minimal.yml to create a DADA2_nifH environment that
excludes packages required by the ancillary scripts. This will save some disk
space but is not recommended. The ancillary scripts are essential for running
the nifH amplicon workflow described in Morando, Magasin et al. 2024.
Let's verify the conda DADA2_nifH environment. First activate it and note that
the prompt changes:
(base) % conda activate DADA2_nifH
(DADA2_nifH) %
Now use the following commands to verify that several required tools are
available from the DADA2_nifH environment.
(DADA2_nifH) % which R
/home/you/miniconda3/envs/DADA2_nifH/bin/R
(DADA2_nifH) % which cutadapt
/home/you/miniconda3/envs/DADA2_nifH/bin/cutadapt
(DADA2_nifH) % which hmmalign
/home/you/miniconda3/envs/DADA2_nifH/bin/hmmalign
(DADA2_nifH) % which FragGeneScan
/home/you/miniconda3/envs/DADA2_nifH/bin/FragGeneScan
Please also run this short R script which additionally checks for missing R
packages:
(DADA2_nifH) % check_installation.R
Later, you can deactivate the environment with:
(DADA2_nifH) % conda deactivate
After deactivation, the 'which' commands above will either report errors or will
find default versions of the tools on your system. (E.g. R will probably be
found in /usr/bin/R.)
** Whenever you want to use the DADA2 nifH pipeline, you must first "activate"
the DADA2_nifH environment as described above. **
--------------------------------------------------------------------------------
# 3. Add the pipeline scripts to your path
--------------------------------------------------------------------------------
Short version (if you are very comfortable with unix shells):
Modify your path to include: ~/DADA2_nifH_pipeline/bin
The bin directory includes symbolic links to all of the pipeline scripts (in
'scripts') as well as ancillary scripts (hierarchy in 'scripts.ancillary').
Long version:
When you login remotely to your server or launch a terminal on your laptop, your
shell will be initialized by running a few scripts that set up various
preferences, including your 'path'. The path helps the shell locate programs
that run any commands you type into the shell, so that you do not have to type
in the full path to the command. For example, you can simply run "R" rather
than "/usr/bin/R" (if you can remember where R is!). When you activate the
DADA2_nifH environment, miniconda3 temporarily modifies your path to find the
needed tools, such as R v4.1 mentioned above. (The old path is restored once
you deactivate DADA2_nifH.)
We must modify the path so that the shell can find the scripts in the DADA2 nifH
pipeline. (The pipeline is not a conda package.) For convenience we will
modify the shell initialization script so that every time you login the DADA2
pipeline scripts will be added to your path.
Note that:
1. This section assumes you installed the pipeline in ~/DADA2_nifH_pipeline as
described above in step #0. If you installed elsewhere, just adjust path
components accordingly.
2. The commands below do not *use* the pipeline. So it does not matter if
DADA2_nifH is active or not.
First, determine what your shell is. Different shells use slightly different
syntaxes in their scripts. Below I explain for bash and tcsh only.
(DADA2_nifH) % echo $0
If your shell is bash, then do this. Be *very* careful to use the *double*
angle brackets >> so that the path-setting command is *appended* to .bashrc.
[First we will make a backup of .bashrc just in case.] Also be careful that you
type everything exactly as shown in this ASCII text document. Cut-and-paste
from an email or Word document could change the quotes. The 'tail' command just
shows that the path setting command was appended to .bashrc.
(DADA2_nifH) % cp ~/.bashrc ~/.bashrc.OLD
(DADA2_nifH) % echo 'export PATH="$PATH:~/DADA2_nifH_pipeline/bin"' >> ~/.bashrc
(DADA2_nifH) % tail ~/.bashrc
If your shell is tcsh, then do this. The cautions mentioned above apply.
(DADA2_nifH) % cp ~/.tcshrc ~/.tcshrc.OLD
(DADA2_nifH) % echo 'set path=($path $HOME/DADA2_nifH_pipeline/bin)' >> ~/.tcshrc
(DADA2_nifH) % tail ~/.tcshrc
The path modifications will only become active when you next login. Log off
(deactivate DADA2_nifH if necessary, then 'exit' or 'logout'), then ssh back
into your server or open a new terminal on your laptop.
Now verify that the shell can find these two main pipeline scripts:
% which run_DADA2_pipeline.sh
/home/you/DADA2_nifH_pipeline/run_DADA2_pipeline.sh
% run_DADA2_pipeline.sh --help
% which organizeFastqs.R
/home/you/DADA2_nifH_pipeline/scripts/organizeFastqs.R
Although the shell can find the scripts, you must first activate DADA2_nifH for
them to work.
% conda activate DADA2_nifH
That's it! You should now be able to run the pipeline on a small example.
--------------------------------------------------------------------------------
# 4. Try the mini Arctic 2017 example
--------------------------------------------------------------------------------
Please see the example:
% cd ~/DADA2_nifH_pipeline/Example
You can probably begin at step 2.
--------------------------------------------------------------------------------
# Appendix: Using R to install the DADA2 R package and a few others
--------------------------------------------------------------------------------
We strongly recommend using conda to install dada2 and other required packages.
However, the latest version of DADA2 can also be installed from within R as
described in Benjamin Callahan's DADA2 installation page:
https://benjjneb.github.io/dada2/dada-installation.html
You might wish to do this if the dada2 version in bioconda is too old. However,
installing from R requires the compilation of many R packages needed by dada2
and will likely take >30 min. More importantly, errors sometimes occur during
compilation and can be laborious to solve. If you decide to install from R, at
the very least your system must have development tools available. (If on a
Linux machine, you probably already have GNU tools. If using macOS, you might
have to install Xcode.)
*** DADA2 R package install ***
Use Bioconductor to install DADA2 as described at:
https://bioconductor.org/packages/release/bioc/html/dada2.html
In the shell with DADA2_nifH active, start R:
(DADA2_nifH) % R
Then in the R console do the following to install Bioconductor and DADA2:
## Install Bioconductor if necessary. It takes just a few seconds. When
## asked which CRAN mirror to use, "0-Cloud [https]" is good.
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
## Now use Bioconductor to install the latest dada2. This takes a long time.
## Check in periodically because at some point you will be asked if you want
## to update all/some/none of several mysteriously named packages. Choose
## all.
BiocManager::install("dada2")
The install takes around 30 min on our workstation because DADA2 uses many
packages that must be downloaded and compiled. At the end of a successful
installation, you should see a few messages about tests that worked. You can
verify that dada2 was installed by loading it (and then you can exit R).
library(dada2) # Better work!
quit() # Hit 'n' when it asks if you want to save.
Notes:
1. The dada2 R package will only be available when DADA2_nifH is active.
miniconda3 manages (and other tools) that were installed to DADA2_nifH.
However, R itself must manage packages that were installed using R. For
example, the dada2 R package you will be stored in a directory with all
other R packages:
~/miniconda3/envs/DADA2_nifH/lib/R/library/dada2
This makes it possible to have different conda environments with different
versions of R and R packages.
2. The dada2 web page (link above) will mention the specific version of R
that is required. It might differ than v4.1.3 mentioned in
environment_DADA2_nifH.yml.
*** Install vegan and digest ***
While DADA2_nifH is the active environment, launch R. Then do a CRAN install of
vegan and digest (assuming you did not install them using conda):
install.packages('vegan') # You can choose the "0-Cloud" mirror again.
install.packages('digest')
quit() # Hit 'n' when it asks if you want to save.
You should now have all the required external software for running the DADA2
nifH pipeline.
-- END --