-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathLeabraAppendix_2022_short.tex
302 lines (245 loc) · 13.7 KB
/
LeabraAppendix_2022_short.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
% this is from https://github.com/ccnlab/hip-edl/papers/hip-edl-2021/hip_edl_2021_sub2.tex
\subsection{Implementational Details}
The model was implemented using the Leabra framework, which is described in detail in these sources: \url{https://github.com/emer/leabra} \citet{OReillyMunakataFrankEtAl12}, \citet{OReillyMunakata00}, and summarized here. These same equations and default parameters have been used to simulate over 40 different models in \citet{OReillyMunakataFrankEtAl12} and \citet{OReillyMunakata00}, and a number of other research models. Thus, the model can be viewed as an instantiation of a systematic modeling framework using standardized mechanisms, instead of constructing new mechanisms for each model.
The basic activation dynamics are based on standard electrophysiological principles of real neurons, as captured by the AdEx (adapting exponential) model of Gerstner and colleagues \citep{BretteGerstner05}, using a rate code approximation that produces a graded activation signal matching the actual instantaneous rate of spiking across a population of AdEx neurons. We generally conceive of a single rate-code neuron as representing a microcolumn of roughly 100 spiking pyramidal neurons in the neocortex.
The excitatory synaptic input conductances (i.e., net input) is computed as an average, not a sum, over connections, based on normalized, sigmoidaly transformed weight values, which are subject to scaling on a projection level to alter relative contributions. Automatic scaling is performed to compensate for differences in expected activity level in the different projections.
Inhibition is computed using a feed-forward (FF) and feed-back (FB) inhibition function (FFFB) that closely approximates the behavior of inhibitory interneurons in the neocortex. FF is based on a multiplicative factor applied to the average net input coming into a layer, and FB is based on a multiplicative factor applied to the average activation within the layer. These simple linear functions do an excellent job of controlling the overall activation levels in bidirectionally connected networks, producing behavior very similar to the more abstract computational implementation of kWTA dynamics implemented in previous versions.
There is a single learning equation, derived from a detailed model of spike timing dependent plasticity (STDP) by \citet{UrakuboHondaFroemkeEtAl08}, that produces a combination of Hebbian associative and error-driven learning. For historical reasons, we call this the XCAL equation (eXtended Contrastive Attractor Learning), and it is functionally very similar to the BCM learning rule developed by \citet{BienenstockCooperMunro82}. The essential learning dynamic involves a Hebbian co-product of sending neuron activation times receiving neuron activation, which biologically reflects the amount of calcium entering through NMDA channels, and this co-product is then compared against a floating threshold value. To produce the Hebbian learning dynamic, this floating threshold is based on a long-term running average of the receiving neuron activation. This is the key idea for the BCM algorithm. To produce error-driven learning, the floating threshold is based on a much faster running average of activation co-products, which reflects an expectation or prediction, against which the instantaneous, later outcome is compared.
Weights are subject to a contrast enhancement function, which compensates for the soft (exponential) weight bounding that keeps weights within the normalized 0-1 range. Contrast enhancement is important for enhancing the selectivity of self-organizing learning, and generally results in faster learning with better overall results. Learning operates on the underlying internal linear weight value. Biologically, we associate the underlying linear weight value with internal synaptic factors such as actin scaffolding, CaMKII phosphorlation level, etc, while the contrast enhancement operates at the level of AMPA receptor expression.
The following shows the main equations used to simulate neural activity and learning (see \url{https://github.com/emer/leabra} in the README.md for complete details and discussion).
\subsubsection{Activation Equations}
\begin{itemize}
\item \texttt{GeRaw\ +=\ Sum\_(recv)\ Prjn.GScale\ *\ Send.Act\ *\ Wt}
\begin{itemize}
\tightlist
\item
\texttt{Prjn.GScale} is the Input Scaling factor that
includes 1/N to compute an average, and the \texttt{WtScaleParams}
\texttt{Abs} absolute scaling and \texttt{Rel} relative scaling,
which allow one to easily modulate the overall strength of
different input projections.
\end{itemize}
\item \texttt{Ge\ +=\ (1/\ DtParams.GTau)\ *\ (GeRaw\ -\ Ge)}
\begin{itemize}
\tightlist
\item
This does a time integration of excitatory conductance,
\texttt{GTau\ =\ 1.4} default for 1 msec default cycle.
\end{itemize}
\item \texttt{ffi\ =\ FFFBParams.FF\ *\ MAX(avgGe\ -\ FFBParams.FF0,\ 0)}
\begin{itemize}
\tightlist
\item
feedforward component of inhibition with FF multiplier (1 by
default) -\/- has FF0 offset and can't be negative (that's what
the MAX(.. ,0) part does).
\item
\texttt{avgGe} is average of Ge variable across relevant Pool of
neurons, depending on what level this is being computed at, and
\texttt{maxGe} is max of Ge across Pool
\end{itemize}
\item \texttt{fbi\ +=\ (1\ /\ FFFBParams.FBTau)\ *\ (FFFBParams.FB\ *\ avgAct\ -\ fbi)}
\begin{itemize}
\tightlist
\item
feedback component of inhibition with FB multiplier (1 by default)
-\/- requires time integration to dampen oscillations that
otherwise occur -\/- FBTau = 1.4 default.
\end{itemize}
\item \texttt{Gi\ =\ FFFBParams.Gi\ *\ (ffi\ +\ fbi)}
\begin{itemize}
\tightlist
\item
total inhibitory conductance, with global Gi multiplier -\/-
default of 1.8 typically produces good sparse distributed
representations in reasonably large layers (25 units or more).
\end{itemize}
\item \texttt{geThr\ =\ (Gi\ *\ (Erev.I\ -\ Thr)\ +\ Gbar.L\ *\ (Erev.L\ -\ Thr)\ /\ (Thr\ -\ Erev.E)}
\item \texttt{nwAct\ =\ NoisyXX1(Ge\ *\ Gbar.E\ -\ geThr)}
\begin{itemize}
\tightlist
\item
geThr = amount of excitatory conductance required to put the
neuron exactly at the firing threshold, \texttt{XX1Params.Thr} =
.5 default, and NoisyXX1 is the x / (x+1) function convolved with
gaussian noise kernel, where x = \texttt{XX1Parms.Gain * (Ge - geThr)} and Gain is 100 by default
\end{itemize}
\item \texttt{Act\ +=\ (1\ /\ DTParams.VmTau)\ *\ (nwAct\ -\ Act)}
\begin{itemize}
\tightlist
\item
time-integration of the activation, using same time constant as Vm
integration (VmTau = 3.3 default)
\end{itemize}
\item \texttt{Vm\ +=\ (1\ /\ DTParams.VmTau)\ *\ Inet}
\item \texttt{Inet\ =\ Ge\ *\ (Erev.E\ -\ Vm)\ +\ Gbar.L\ *\ (Erev.L\ -\ Vm)\ +\ Gi\ *\ (Erev.I\ -\ Vm)\ +\ Noise}
\begin{itemize}
\tightlist
\item
Membrane potential computed from net current via standard RC model
of membrane potential integration. In practice we use normalized
Erev reversal potentials and Gbar max conductances, derived from
biophysical values: Erev.E = 1, .L = 0.3, .I = 0.25, Gbar's are
all 1 except Gbar.L = .2 default.
\end{itemize}
\end{itemize}
\subsubsection{Learning Equations}
\begin{itemize}
\item \texttt{AvgSS\ +=\ (1\ /\ SSTau)\ *\ (Act\ -\ AvgSS)}
\begin{itemize}
\tightlist
\item
super-short time scale running average, SSTau = 2 default, which is first pass of sequence of running-average integrations of activity that drive temporal-difference learning.
\end{itemize}
\item \texttt{AvgS\ +=\ (1\ /\ STau)\ *\ (AvgSS\ -\ AvgS)}
\begin{itemize}
\tightlist
\item
short time scale running average, STau = 2 default -\/- this
represents the \emph{plus phase} or actual outcome signal in
comparison to AvgM
\end{itemize}
\item \texttt{AvgM\ +=\ (1\ /\ MTau)\ *\ (AvgS\ -\ AvgM)}
\begin{itemize}
\tightlist
\item
medium time-scale running average, MTau = 10 -\/- this represents
the \emph{minus phase} or expectation signal in comparison to AvgS
\end{itemize}
\item \texttt{AvgL\ +=\ (1\ /\ Tau)\ *\ (Gain\ *\ AvgM\ -\ AvgL);\ AvgL\ =\ MAX(AvgL,\ Min)}
\begin{itemize}
\tightlist
\item
long-term running average -\/- this is computed just once per
learning trial, \emph{not every cycle} like the ones above -\/-
params on \texttt{AvgLParams}: Tau = 10, Gain = 2.5 (this is a key
param -\/- best value can be lower or higher) Min = .2
\end{itemize}
\item \texttt{AvgLLrn\ =\ ((Max\ -\ Min)\ /\ (Gain\ -\ Min))\ *\ (AvgL\ -\ Min)}
\begin{itemize}
\tightlist
\item
learning strength factor for how much to learn based on AvgL
floating threshold -\/- this is dynamically modulated by strength
of AvgL itself, and this turns out to be critical -\/- the amount
of this learning increases as units are more consistently active
all the time (i.e., "hog" units). Params on \texttt{AvgLParams},
Min = 0.0001, Max = 0.5. Note that this depends on having a clear
max to AvgL, which is an advantage of the exponential
running-average form above.
\end{itemize}
\item \texttt{AvgLLrn\ *=\ MAX(1\ -\ layCosDiffAvg,\ ModMin)}
\begin{itemize}
\tightlist
\item
also modulate by time-averaged cosine (normalized dot product)
between minus and plus phase activation states in given receiving
layer (layCosDiffAvg), (time constant 100) -\/- if error signals
are small in a given layer, then Hebbian learning should also be
relatively weak so that it doesn't overpower it -\/- and
conversely, layers with higher levels of error signals can handle
(and benefit from) more Hebbian learning. The MAX(ModMin) (ModMin
= .01) factor ensures that there is a minimum level of .01 Hebbian
(multiplying the previously-computed factor above). The .01 * .05
factors give an upper-level value of .0005 to use for a fixed
constant AvgLLrn value -\/- just slightly less than this (.0004)
seems to work best if not using these adaptive factors.
\end{itemize}
\item \texttt{AvgSLrn\ =\ (1-LrnM)\ *\ AvgS\ +\ LrnM\ *\ AvgM}
\begin{itemize}
\tightlist
\item
mix in some of the medium-term factor into the short-term factor
-\/- this is important for ensuring that when neuron turns off in
the plus phase (short term), that enough trace of earlier
minus-phase activation remains to drive it into the LTD weight
decrease region -\/- LrnM = .1 default.
\end{itemize}
\item \texttt{srs\ =\ Send.AvgSLrn\ *\ Recv.AvgSLrn}
\item \texttt{srm\ =\ Send.AvgM\ *\ Recv.AvgM}
\item \texttt{dwt\ =\ XCAL(srs,\ srm)\ +\ Recv.AvgLLrn\ *\ XCAL(srs,\ Recv.AvgL)}
\begin{itemize}
\tightlist
\item
weight change is sum of two factors: error-driven based on
medium-term threshold (srm), and BCM Hebbian based on long-term
threshold of the recv unit (Recv.AvgL)
\end{itemize}
\item
XCAL is the "check mark" linearized BCM-style learning function that was derived from the Urakubo Et Al (2008) STDP model, as described in more detail in the CCN Textbook: \url{https://CompCogNeuro.org}
\begin{itemize}
\tightlist
\item
\texttt{XCAL(x,\ th)\ =\ (x\ \textless{}\ DThr)\ ?\ 0\ :\ (x\ \textgreater{}\ th\ *\ DRev)\ ?\ (x\ -\ th)\ :\ (-x\ *\ ((1-DRev)/DRev))}
\item
DThr = 0.0001, DRev = 0.1 defaults, and x ? y : z terminology is C
syntax for: if x is true, then y, else z
\end{itemize}
\item \texttt{DWt\ *=\ (DWt\ \textgreater{}\ 0)\ ?\ Wb.Inc\ *\ (1-LWt)\ :\ Wb.Dec\ *\ LWt}
\begin{itemize}
\tightlist
\item \texttt{LWt} is the linear, non-contrast enhanced version
of the weight value, and \texttt{Wt} is the sigmoidal
contrast-enhanced version, which is used for sending netinput to
other neurons. One can compute LWt from Wt and vice-versa, but
numerical errors can accumulate in going back-and forth more than
necessary, and it is generally faster to just store these two weight
values.
\item
soft weight bounding -\/- weight increases exponentially
decelerate toward upper bound of 1, and decreases toward lower
bound of 0, based on linear, non-contrast enhanced LWt weights.
The \texttt{Wb} factors are how the weight balance term shift the
overall magnitude of weight increases and decreases.
\end{itemize}
\item \texttt{LWt\ +=\ DWt}
\begin{itemize}
\tightlist
\item
increment the linear weights with the bounded DWt term
\end{itemize}
\item\texttt{Wt\ =\ SIG(LWt)}
\begin{itemize}
\tightlist
\item
new weight value is sigmoidal contrast enhanced version of linear
weight
\item
\texttt{SIG(w)\ =\ 1\ /\ (1\ +\ (Off\ *\ (1-w)/w)\^{}Gain)}
\end{itemize}
\item \texttt{DWt\ =\ 0}
\begin{itemize}
\tightlist
\item
reset weight changes now that they have been applied.
\end{itemize}
\end{itemize}
\begin{table}[hbt!]
\begin{tabular}{|l|l|r|r|}
\hline
Area & Param & Value & Default \\
\hline \hline
ECin & Inhib.Pool.Gi & 2 & 1.8 \\
\hline
ECout & Inhib.Pool.Gi & 2 & 1.8 \\
\hline
ECout & CA1ToECout.WtScale.Abs & 4 & 1 \\
\hline
CA1 & Inhib.Pool.Gi & 2.4 & 1.8 \\
\hline
CA1 & CA3ToCA1.CHL.Hebb & 0.01 & 0.001 \\
\hline
DG & Inhib.Layer.Gi & 3.8 & 1.8 \\
\hline
DG & ECinToDG.CHL.Hebb & 0.2 & 0.001 \\
\hline
DG & ECinToDG.CHL.SAvgCor & 0.1 & 0.4 \\
\hline
CA3 & Inhib.Layer.Gi & 2.8 & 1.8 \\
\hline
CA3 & DGToCA3.CHL.Hebb & 0.01 & 0.001 \\
\hline
\end{tabular}
\caption{Non-default parameters used in the model, with default shown. All other params are at default values.}
\label{tab.ndefparams}
\end{table}