-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathimage_analysis_tutorial_swc.tex
342 lines (302 loc) · 19.8 KB
/
image_analysis_tutorial_swc.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
\documentclass[a4paper]{report}
\usepackage{amsmath,mathrsfs,amsfonts}
\usepackage{mathtools}
\usepackage{tabularx,booktabs}
\usepackage{graphicx}
\usepackage{subcaption}
\usepackage{mdframed}
\usepackage{algorithm}
\usepackage{algorithmicx}
\usepackage{algpseudocode}
\usepackage[a4paper,margin=2.7cm,tmargin=2.5cm,bmargin=2.5cm]{geometry}
\usepackage{tikz}
\usepackage{listings}
\usepackage{hyperref}
\usepackage{color}
\definecolor{codegreen}{rgb}{0,0.6,0}
\definecolor{codegray}{rgb}{0.5,0.5,0.5}
\definecolor{codepurple}{rgb}{0.58,0,0.82}
\definecolor{backcolour}{rgb}{0.95,0.95,0.95}
\lstdefinestyle{matlab}{
commentstyle=\color{codegreen},
keywordstyle=\color{blue},
numberstyle=\tiny\color{codegray},
stringstyle=\color{codepurple},
basicstyle=\footnotesize,
numbers=left,
breakatwhitespace=false,
xleftmargin=.15in,
xrightmargin=.15in,
breaklines=true,
captionpos=b,
keepspaces=true,
numbersep=5pt,
showspaces=false,
showstringspaces=false,
showtabs=false,
tabsize=2,
firstnumber=last
}
\lstset{style=matlab}
\usetikzlibrary{bayesnet}
\addtocounter{chapter}{1}
\makeatletter
\renewcommand{\thesection}{\@arabic\c@section}
\renewcommand{\thefigure}{\@arabic\c@figure}
\makeatother
\newcommand{\nexercise}[0]{\arabic{exercises}\addtocounter{exercises}{1}}
\hypersetup{
colorlinks=true,
linkcolor=blue,
filecolor=magenta,
urlcolor=blue,
}
\begin{document}
\newcounter{exercises}
\addtocounter{exercises}{1}
\newmdenv[linewidth=2pt,
frametitlerule=true,
roundcorner=10pt,
linecolor=red,
leftline=false,
rightline=false,
skipbelow=12pt,
skipabove=12pt,
nobreak=true,
innerbottommargin=7pt,
]{exercisebox}
\begin{center}
\textbf{\Large{Image analysis}}
\end{center}
\section{Overview}
In this tutorial we will go through the steps of required to transform images acquired on the microscope into activity traces, which can be used for downstream analyses.
You will need to clone or copy several GitHub repositories to proceed with this tutorial:
\begin{itemize}
\item \href{https://github.com/tenss/image-analysis}{https://github.com/tenss/image-analysis} the image analysis pipeline described in this document
\item \href{https://github.com/DylanMuir/TIFFStack}{https://github.com/DylanMuir/TIFFStack} for lazy TIFF loading (Dylan Muir)
\item \href{https://github.com/BaselLaserMouse/ast\_model}{https://github.com/BaselLaserMouse/ast\_model} for neuropil correction (Maxime Rio)
\end{itemize}
You can also find all the code used in this session and an example dataset at on winstor at \href{smb://winstor.ad.swc.ucl.ac.uk/winstor/collaboration/openserialsection/SWC\_Imaging\_Course/image-analysis}{//winstor.ad.swc.ucl.ac.uk/winstor/collaboration/openserialsection/SWC\_Imaging\_Course/image-analysis}.
To begin with make sure that all the above repos are added to your MATLAB path.
\section{Loading data}
Most functional imaging data sets you will work with in practice will be too large to load into RAM in their entirety.
To overcome this problem you can implement your own batch-loading subroutines or use packages such as \texttt{TIFFStack}, which allows you to memory map image files and read them from disk on demand.
To use it, we'll create a \texttt{TIFFStack} object by calling the constructor:
\lstset{emph={TIFFStack}, emphstyle=\color{red}}
\begin{lstlisting}[language=Octave]
imStack = TIFFStack(stackpath);
\end{lstlisting}
We can then retrieve data by treating the \texttt{TIFFStack} as a regular MATLAB tensor.
If we want to concatenate several stacks, we can do this with \texttt{TensorStack} (\href{https://github.com/DylanMuir/TensorStack}{https://github.com/DylanMuir/TensorStack}), which will allow us to access the data transparently:
\lstset{emph={TensorStack}, emphstyle=\color{red}}
\begin{lstlisting}[language=Octave]
% create a TIFFStack for each file
imgs = cellfun(@(p) TIFFStack(p, false), stackpaths, 'un', false);
% concatenate TIFFStacks into a single TensorStack
imStack = TensorStack(3, imgs{:});
\end{lstlisting}
\texttt{TensorStack}'s can also be \texttt{reshape}'d just like MATLAB tensors, which is useful when working with multiple channel or z-plane datasets.
\begin{exercisebox}[frametitle={Exercise \nexercise: Using \texttt{TIFFStack} to memory map image data}]
Use \texttt{TIFFStack} to load an image stack. Using the \texttt{whos} command check memory usage of the resulting object.
Then use \texttt{size} to check the its dimensions (\texttt{TIFFStack} class implements several this and several other functions that can typically use regular matrices as arguments).
\end{exercisebox}
\section{Offset calculation}
As you recall from the discussion on imaging sensors, the acquisition hardware will typically introduce an offset to the signal.
As the result the image values will $\neq 0$ even when no light is reaching the sensor.
Estimating this offset correctly is important for measuring fluorescence responses accurately.
\begin{exercisebox}[frametitle={Exercise \nexercise: Pixel value distribution}]
To get an intuition for raw image data, examine a single imaging frame and the distribution of its pixel values.
\end{exercisebox}
One approach to estimate the offset would be to take the minimum fluorescence value in the image.
However, in the absence of light image values will vary around the offset value due to read-out noise and the minimum will underestimate the offset.
Therefore, it would be more appropriate to use the mode of the pixel value distribution.
To estimate it, we can rely on the fact that in images acquired on a resonant scanning 2p microscope, the plurality of pixels contains zero photons.
We will fit a Gaussian mixture model (GMM) to the pixel value distribution and use the component with the smallest mean as an estimate of the offset.
\lstset{emph={fitgmdist}, emphstyle=\color{red}}
\begin{lstlisting}[language=Octave]
% fit GMM with 5 Gaussians
options = statset('MaxIter',1000);
gmmodel = fitgmdist(double(frame(:)), 5, 'Options', options);
\end{lstlisting}
\begin{exercisebox}[frametitle={Exercise \nexercise: GMM for offset estimation}]
Fit a GMM to the pixel value distribution and examine the fit. Is the estimate of the offset sensitive to the number of GMM components used (try 3, 5, or 7)?
\end{exercisebox}
\section{Motion correction}
If you scroll through the imaging stack, movement artefacts will be readily apparent.
While we cannot correct for z-drift offline (unless we acquire fine z-stacks), we can and should identify and correct for translational motion.
To do this, we first need to define an image that we are going to use as the reference for registration.
Typically, we can simply use the mean of the stack as a reference.
However, if motion is excessive we might have to select a subset of frames where it is managable.
We can use the \texttt{mean} function directly with a \texttt{TIFFStack} or \texttt{TensorStack} object:
\begin{lstlisting}[language=Octave]
avgStack = mean(imStack, 3);
\end{lstlisting}
To estimate X/Y shifts, we will compute the cross-correlation (or more accurately, phase correlation) of individual frames with the reference image.
It is more computationally efficient to do this in Fourier domain and therefore we will first compute FFTs of the reference image and individual frames, before passing them to the accessory function \texttt{dftregister} to compute the shifts.
\lstset{emph={dftregister}, emphstyle=\color{red}}
\begin{lstlisting}[language=Octave]
% cropping the images to avoid the visual stimulation artefacts on the edge
trimPix = 40;
fft_template = fft2(avgStack(:, trimPix+1:end-trimPix));
xyshifts = zeros(2, nt);
for ind = 1:nt
fft_frame = fft2(double(imStack(:, trimPix+1:end-trimPix, ind)));
xyshifts(:, ind) = dftregister(fft_template, fft_frame, []);
end
% correct frame shifts
regStack = zeros(size(imStack), 'uint16');
for ind = 1:nt
regStack(:,:,ind) = shiftframe(imStack(:,:,ind), ...
xyshifts(1,ind), xyshifts(2,ind));
end
regAvg = mean(regStack,3);
\end{lstlisting}
Calling \texttt{imStack(:,:,ind)} will force \texttt{TIFFStack} to load image data from file and the resulting \texttt{regStack} will be a regular MATLAB tensor.
If we wanted to avoid allocating memory for the whole stack, we could instead apply frame shifts just before extracting ROI activity traces.
\begin{exercisebox}[frametitle={Exercise \nexercise: Motion correction}]
Compute the reference image and use the phase correlation algorithm to estimate frame shifts for all frames in the stack.
Plot the resulting shifts to check for registration artefacts.
Finally, compute the registered stack and compare the mean image before and after registration.
\end{exercisebox}
\section{Segmentation}
To proceed, we need to identify regions of interest (ROIs) in the image stack, whose fluorescence we are interested in.
In a 2p imaging experiment, these may be neuronal cell bodies, dendritic spines, or axonal boutons.
In widefield imaging, these may correspond to olfactory glomeruli or entire cortical areas.
Cell bodies and dendrites can de detected using automatic segmentation algorithms although they typically require manual curation.
Today, we will select ROIs manually using a simple GUI implemented by the \texttt{RoiMaker} class.
We'll start \texttt{RoiMaker} by providing it the mean image after registration and the range of gray values over which to scale the data:
\lstset{emph={RoiMaker}, emphstyle=\color{red}}
\begin{lstlisting}[language=Octave]
% select some ROIs using a simple GUI
RoiMaker(regAvg, [offset offset+300]);
\end{lstlisting}
\begin{exercisebox}[frametitle={Exercise \nexercise: Defining ROIs}]
Using \texttt{RoiMaker}, label 30-50 neuronal somata.
\begin{enumerate}
\item Click "Add ROI" to draw an ellipse.
\item Modify ellipse shape and size if desired.
\item Double-click to confirm the ROI.
\item Repeat steps 1--3 to add more ROIs.
\item Hit "Export and exit" to quit - ROIs will be saved to the base workspace.
\end{enumerate}
\end{exercisebox}
\section{Activity extraction}
Once we have identified the ROIs, we need to extract their activity traces.
To efficiently compute mean fluorescence within each ROI, we reshape the stack to flatten X and Y dimensions and use a binary mask to select pixels within each region.
\begin{lstlisting}[language=Octave]
% reshape the stack for easy indexing
regStack = reshape(regStack, nx*ny, nt);
% extract activity traces for each ROI
for ind = 1:numel(rois)
mask = reshape(rois(ind).footprint, nx*ny, 1);
rois(ind).activity = mean(regStack(mask,:)) - offset;
end
\end{lstlisting}
If you examine the extracted activity traces, you may notice that the baseline changes on slow time scales.
These fluctuations can arise from a number of sources, including z-drift, bleaching, or loss of objective immersion.
We can estimate these fluctuations by low-pass filtering the trace or computing a percentile of the fluorescence trace in a sliding window:
\begin{lstlisting}[language=Octave]
% estimate 40th percentile in a running window of 500 frames
rois(ind).drift = running_percentile(rois(ind).activity, 500, 40);
rois(ind).drift = rois(ind).drift' - median(rois(ind).drift);
\end{lstlisting}
In addition to slow fluctuations from sources mentioned above, the measured fluorescence could be influenced by contaminating neuropil fluorescence from the surrounding tissue on faster timescales.
To identify and remove neuropil contamination, we will create donut-shaped masks around each ROI, excluding any annotated regions:
\lstset{emph={makedonuts}, emphstyle=\color{red}}
\begin{lstlisting}[language=Octave]
% make neuropil masks around each cell excluding other labeled cells
rois = makedonuts(rois, 30, 30);
% extract neuropil traces for each ROI
for ind = 1:numel(rois)
mask = reshape(rois(ind).donut, nx*ny, 1);
rois(ind).neuropil = mean(regStack(mask,:)) - offset;
end
\end{lstlisting}
\begin{exercisebox}[frametitle={Exercise \nexercise: Neuropil contamination}]
First let us examine the extracted activity traces by plotting the matrix of fluorescence responses as a colormap.
Next, examine the relationship between fluorescence within an ROI and in the surrounding neuropil. What is the origin of the correlation between them?
\end{exercisebox}
There are several approaches for dealing with neuropil contamination, the most common involve subtracting the surrounding neuropil fluorescence from each ROI with a scale factor determined by linear regression or hand-picked manually. I will describe the method we have adopted in the Mrsic-Flogel lab, developed by Maxime Rio (Figure \ref{fig:neuropil}).
We fit both ROI and surround fluorescence to asymmetric Student-t (ASt) distributions, whose mean was determined by a common neuropil signal contributing to both ROI and surrounding fluorescence:
\begin{align}
f_r(t) &\sim \mathrm{ASt}(\alpha z(t) + \mu_r, \sigma^2) \\
f_n(t) &\sim \mathrm{ASt}(z(t) + \mu_n, \sigma^2 / N) \label{eq:ast_n}\\
z(t) &\sim \mathcal{N}(0, s^2)
\end{align}
Here, $z(t)$ is the time-varying neuropil trace, $\alpha$ is the contamination coefficient (constrained between 0 and 1 for the ROI and fixed to 1 for the surround), and $\sigma^2$ determines the scale of the two distributions.
The factor $N$ was set to 40, the typical ratio of the areas of surround and ROI masks.
The ASt distribution has different degrees of freedom $\nu_1$ and $\nu_2$ for its left and right tails.
We set $\nu_1=30$ and $\nu_2=1$, such that the left tail was approximately Gaussian, while the right tail resembled the Cauchy distribution.
Thus the model allows for large positive but not negative deviations, consistent with the nature of calcium fluorescence signals.
The advantage of this approach over widely used methods, lies in the use of the ASt distribution to model deviations in both ROI and surround signals.
The long right tail of the ASt distribution helps prevent over-estimating the neuropil component for densely active cells.
At the same time, the use of the ASt distribution for the surround signal helps account for transient increases in fluorescence arising from unannotated neurites or cell bodies, which could otherwise result in false negative transients in the corrected trace.
The challenge of fitting this model is that the posterior distributions over model parameters, including the neuropil trace $z(t)$, cannot be computed exactly.
Instead, we will use variational Bayesian methods to approximate them. The neuropil corrected fluorescence trace will then be estimated as the ``noise'' of the ASt model:
\begin{equation}
f(t) = f_r(t)- \alpha z(t)
\end{equation}
\begin{figure}[b]
\centering
\tikz{ %
\node[latent] (sigma_z) {$s$} ; %
\node[latent, right=of sigma_z] (z) {$z(t)$} ; %
\node[obs, above right=of z] (f_r) {$f_r(t)$} ; %
\node[obs, below right=of z] (f_n) {$f_n(t)$} ; %
\node[latent, above right=of f_r] (alpha) {$\alpha$} ; %
\node[latent, below right=of f_r] (sigma_f) {$\sigma$} ; %
\node[latent, right=of f_r] (mu_r) {$\mu_r$} ; %
\node[latent, right=of f_n] (mu_n) {$\mu_n$} ; %
\plate[inner sep=0.25cm, xshift=-0.04cm, yshift=0.12cm] {plate1} {(z) (f_n) (f_r)} {$N_{frames}$}; %
\edge {sigma_z} {z} ; %
\edge {z, sigma_f, mu_n} {f_n} ; %
\edge {z, sigma_f, mu_r} {f_r} ; %
\edge {alpha, z} {f_r} ; %
}
\caption{Probabilistic graphical model for neuropil contamination. By convention, gray nodes indicate observed random variables, while white nodes are unobserved and their posterior distribution has to be inferred from the data.}
\label{fig:neuropil}
\end{figure}
Optimization and correction is carried out using the function \texttt{fit\_ast\_model}, available on \href{https://github.com/BaselLaserMouse/ast\_model}{GitHub}, providing the neuropil and drift-corrected activity traces:
\lstset{emph={fit_ast_model}, emphstyle=\color{red}}
\begin{lstlisting}[language=Octave]
rois(ind).activity_corrected = fit_ast_model( ...
[rois(ind).activity - rois(ind).drift; rois(ind).neuropil],
[1 40]);
\end{lstlisting}
The second argument specifies the relative size of the ROI and donut masks and is used to set the relative scale two ASt distributions ($N$ in Eq. \ref{eq:ast_n}).
In practice, the results are robust to a wide range of settings for these values.
The ASt model is not limited to finding common contaminating signals in two traces.
In principle, we could split the neuropil donut into multiple sectors and provide a trace for each one of them.
\texttt{fit\_ast\_model} can also correct for baseline drift as a part of the neuropil estimation procedure. However, we wont be using this functionality here.
\begin{exercisebox}[frametitle={Exercise \nexercise: Test ASt model on synthetic data}]
Using the code provided in the pipeline script, test the ASt model on synthetic data and compare it to the standard approach of subtracting a scaled version of the donut trace.
What assumptions does the ASt model rely on? Try to modify the generative model used to create synthetic data to make it fail.
\end{exercisebox}
Finally, we compute $\Delta$F/F$_0$ by estimate F$_0$ using the GMM approach we used for the image offset above. This approach assumes that cells are inactive most of the time, care should be taken when interpreting $\Delta$F/F$_0$ values to ensure that the conclusions are robust with respect to estimates of F$_0$:
\begin{lstlisting}[language=Octave]
gmmodel = fitgmdist(rois(ind).activity_corrected', 2, 'Options', options);
rois(ind).f0 = min(gmmodel.mu);
rois(ind).dfof_corrected = (rois(ind).activity_corrected - rois(ind).f0) ...
/ rois(ind).f0;
\end{lstlisting}
\begin{exercisebox}[frametitle={Exercise \nexercise: Neuropil correction}]
Fit the ASt model to ROIs you have selected.
Compare the neuropil trace to the shared component estimated by the model.
Plotting activity across ROIs as a colormap, compare responses before and after neuropil correction.
\end{exercisebox}
\section{End-to-end pipelines}
The approach described above performs segmentation and activity extraction separately.
However, we can take advantage of temporal fluctuations in fluorescence to help segment images.
Intuitively, pixels that consistently go on and off together are likely reflecting activity of the same neuron.
To accomplish this, we recast the problem of segmentation and activity extraction as a fitting latent variable model, where fluorescence in the imaging stack arises as the sum of number of spatially localized sources with different fluorescence time courses.
This idea is at the core of methods such as constrained nonnegative matrix factorization (CNMF).
CNMF expresses the image stack $\mathbf{Y} \in \mathbb { R } ^ { D \times T }$ of $D$ pixels and $T$ frames in the following form:
\begin{equation}
\mathbf{Y} = \mathbf{AC} + \mathbf{BF} + \mathbf{E}.
\label{eq:cnmf}
\end{equation}
Here $\mathbf{A} \in \mathbb { R } ^ { D \times N }$ is a matrix of $N$ cell footprints $\mathbf{A} = [ \mathbf { a } _ { 1 } , \mathbf { a } _ { 2 } , \ldots , \mathbf { a } _ { N } ]$ and $\mathbf{C} \in \mathbb { R } ^ { N \times T }$ is a matrix of activity traces $\mathbf{C} = [ \mathbf { c } _ { 1 } , \mathbf { c } _ { 2 } , \ldots , \mathbf { c } _ { N } ] ^ { \top }$.
$\mathbf{B}$ and $\mathbf{F}$ are matrices of footprints and traces for neuropil components and $\mathbf{E}$ is noise not captured by the model.
To solve Eq. \ref{eq:cnmf} we must apply some constraints: (1) both $\mathbf{A}$ and $\mathbf{C}$ are non-negative; (2) footprints $\mathbf{A}$ are spatially sparse; (3) activity traces $\mathbf{C}$ obey calcium indicator dynamics. We can estimate the unknowns by alternating between solving for spatial components $\mathbf{A}$ and $\mathbf{B}$ and temporal component $\mathbf{C}$ and $\mathbf{F}$, subject to the constraints.
Suite2p takes a conceptually similar approach to define ROI masks, but then extracts activity by averaging the pixels within each mask as we had done above.
\end{document}