\documentclass[a4paper, 11pt]{book}
%!TeX TS-program = Lualatex
%!TeX encoding = UTF-8 Unicode
%!TeX spellcheck = en-US
% !BIB TS-program = bibtex
%\documentclass[draft,english]{book}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\AuthorA}{Laurent U.~Perrinet}%
\newcommand{\AddressA}{Institut de Neurosciences de la Timone, CNRS / Aix-Marseille Universit\'e}% - Marseille, France
\newcommand{\LongAddressA}{
Institut de Neurosciences de la Timone (UMR 7289)\\
Aix Marseille Universit\'e, CNRS \\
%Facult\'e de M\'edecine - B\^atiment Neurosciences,
27, Bd Jean Moulin \\
13385 Marseille Cedex 05,
France}
\newcommand{\PhoneA}{+33.491 324 044}%
\newcommand{\Website}{https://laurentperrinet.github.io}%
\newcommand{\Email}{Laurent.Perrinet@univ-amu.fr}%
\newcommand{\Title}{Sparse models for Computer Vision}%
\newcommand{\Abstract}{
The representation of images in the brain is known to be sparse.
That is, as neural activity is recorded in a visual area
---for instance the primary visual cortex of primates---
only a few neurons are active at a given time with respect to the whole population.
It is believed that such a property reflects the efficient match of the representation
with the statistics of natural scenes.
Applying such a paradigm to computer vision
therefore seems a promising approach towards more biomimetic algorithms.
Herein, we will describe a biologically-inspired approach to this problem.
First, we will describe an unsupervised learning paradigm
which is particularly adapted to the efficient coding of image patches.
Then, we will outline a complete multi-scale framework ---SparseLets---
implementing a biologically inspired sparse representation of natural images.
Finally, we will propose novel methods for integrating
prior information into these algorithms and provide some preliminary experimental results.
We will conclude by giving some perspective on applying such algorithms to computer vision.
More specifically, we will propose that bio-inspired approaches
may be applied to computer vision using predictive coding schemes,
sparse models being one simple and efficient instance of such schemes.
}%
\newcommand{\Keywords}{sparse coding; sparse hebbian learning; natural scene statistics; sparselets; lateral connections; association field}%
\newcommand{\Acknowledgments}{%
The author was supported by EC IP project FP7-269921, ``BrainScaleS''.
Correspondence and requests for materials should be addressed to the author\footnote{\Email }.
Code and supplementary material available at \url{\Website/Publications/Perrinet15bicv}. %
}%
\newcommand{\SignificanceStatement}{%
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\usepackage{siunitx}%
\newcommand{\Hz}{\si{\hertz}}%
\newcommand{\ms}{\si{\milli\second}}%
\newcommand{\meter}{\si{\meter}}%
\newcommand{\s}{\si{\second}}%
%\usepackage[utf8]{inputenc}
\usepackage{graphicx}
\graphicspath{{./figures/}}%
\DeclareGraphicsExtensions{.pdf}
\usepackage{geometry}
\geometry{hscale=0.8,vscale=0.8,centering}
\usepackage[round]{natbib}
%\usepackage{chapterbib}
\bibliographystyle%
%{wivchnum}%for numerical citation and numerically listed entries in the bibliography
%{wivchauy}%for author--year citation and alphabetical order in the bibliography
{unsrtnat}%
%\usepackage{wileyvch}
\usepackage[breaklinks]{hyperref}
\author{Perrinet}\title{\Title}
\usepackage{fancyhdr}
\pagestyle{fancy}
\rhead{\href{https://laurentperrinet.github.io/publication/perrinet-15-bicv}{doi:10.1002/9783527680863.ch14}}
\lhead{Perrinet (2015) \emph{Sparse models}}
%\includeonly{}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% OPTIONAL MACRO FILES
\usepackage{tikz}%
\usepackage{tkz-euclide} \usetkzobj{all} % loading all objects
%\usetikzlibrary{positioning} \usetikzlibrary{calc}
%\usetikzlibrary{external}
%\tikzexternalize % activate!
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\usepackage[pdftex,colorlinks=true,linkcolor=black]{hyperref}
%\usepackage{sfmath}
%% The AMS math files are commonly used to gain access to useful features
%% like extended math fonts and math commands.
%\usepackage{amssymb,amsfonts,amsmath}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%============ hyperref ===================
%\usepackage[unicode,linkcolor=blue,citecolor=blue,filecolor=black,urlcolor=blue,pdfborder={0 0 0}]{hyperref}%
% symbols used in article
%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\Ss}{\mathcal{A}} % the space vector of coefficients
\newcommand{\Ps}{\mathcal{I}} % the space of natural images
\newcommand{\image}{\mathbf{I}} % the image
\newcommand{\dico}{\Phi} % the dictionary
\newcommand{\SE}{\mathtt{SE}}
\newcommand{\coef}{\mathbf{a}} % image's hidden param
\newcommand{\nb}{\mathbf{n}} % noise
\newcommand{\sh}{\mathbf{\hat{s}}}
\newcommand{\sProj}{\mbox{Proj}}%}\textbf{
\newcommand{\Rr}{{\protect\mathbb R}}
\newcommand\ra{\rightarrow} %
\newcommand\la{\leftarrow} %
\newcommand{\sparsenet}{{\sc SparseNet}}%
\newcommand{\eqdef}{\stackrel{\rm def}{=}}%
\newcommand{\seeFig}[1]{Figure~\ref{fig:#1}}%
\newcommand{\seeSec}[1]{Section~\ref{sec:#1}}%
\newcommand{\seeEq}[1]{Eq.~\ref{eq:#1}}%
\newcommand{\seeAnnex}[1]{Appendix~\ref{ann:#1}}%
%\usepackage{showkeys}
%\bibliographystyle{wivchauy}%
\begin{document}
%\include{foreword}
%\tableofcontents
%\include{preface}
\frontmatter
\mainmatter
\addtocounter{chapter}{13} % set them to some other numbers than 0
%\part{}
%\chapterauthor{\AuthorA}
\chapter{\Title}
%=========================================%
%______________%
%\section*{Abstract}
%\Abstract
%__
\paragraph{BibTex entry}~~\\
This chapter appeared as~\citep{Perrinet15sparse}:
\begin{verbatim}
@inbook{Perrinet15sparse,
Author = {Perrinet, Laurent U.},
Chapter = {Sparse models for Computer Vision},
Citeulike-Article-Id = {13514904},
Editor = {Crist{\'{o}}bal, Gabriel and Keil, Matthias S. and Perrinet, Laurent U.},
month = nov,
chapter = {14},
Priority = {0},
isbn = {9783527680863},
title = {Sparse Models for Computer Vision},
DOI = {10.1002/9783527680863.ch14},
url={https://laurentperrinet.github.io/publication/perrinet-15-bicv},
publisher = {Wiley-VCH Verlag GmbH {\&} Co. KGaA},
booktitle = {Biologically inspired computer vision},
Year = {2015}
}
\end{verbatim}
\tableofcontents
\section{Motivation}
%----------------------------------------------------%
\subsection{Efficiency and sparseness in biological representations of natural images}
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~%
%%------------------------------%
%%: See Figure~\ref{fig:inspiration}
%\begin{figure}%[ht!]%[p!]
%\centering{
%\begin{tikzpicture}%
%\draw (0,0) rectangle (.618\textwidth, .618\textwidth);
%\draw (.618\textwidth, .236\textwidth) rectangle (.382\textwidth, .382\textwidth);
%\draw (.618\textwidth, 0) rectangle (.382\textwidth, .236\textwidth);
%\end{tikzpicture}
%%\includegraphics[width=1.\columnwidth]{inspiration}%
%}
%\caption{ {\bf Sparseness in natural image and in neural activity} (A) TODO : Include an image that is interpretable using sparse principle (occam razor) as a gestalt rule (B) TODO : any image showing sparse activity in V1? Baudot13 (C) sparseness for a single patch
%\label{fig:inspiration}}%
%\end{figure}%
%%------------------------------%
% \showthe\columnwidth
% bio: fundamental question = why sparseness in V1 ?
The central nervous system is a dynamical, adaptive organ
which constantly evolves to provide optimal decisions\footnote{Decisions are defined in their broader sense of elementary choices operated in the system at associative or motor levels~\citep{Friston12}.} for interacting with the environment.
The early visual pathways provide a powerful system for probing and modeling these mechanisms.
For instance, the primary visual cortex of primates (V1) is central to most visual tasks.
There, it is observed that some neurons from the input layer of V1 present a selectivity for localized,
edge-like features ---as represented by their ``receptive fields''~\citep{Hubel68}.
Crucially, there is experimental evidence for \emph{sparse} firing in the neocortex~\citep{Barth12,Willmore11}
and in particular in V1.
A representation is sparse when each input signal is associated
with a relatively small sub-set of simultaneously activated neurons within a whole population.
For instance, orientation selectivity of simple cells is sharper
than the selectivity that would be predicted by linear filtering.
Such a procedure produces a rough ``sketch'' of the image
on the surface of V1 that is believed to serve as a ``blackboard''
for higher-level cortical areas~\citep{Marr83}.
However, it is still largely unknown how neural computations act in V1 to represent the image.
More specifically, what is the role of sparseness ---as a generic neural signature---
in the global function of neural computations?
% efficient: information theory, redundancy / lateral interactions / natural
A popular view is that such a population of neurons operates
such that relevant sensory information from the retino-thalamic pathway
is transformed (or ``coded'') efficiently.
Such an efficient representation allows decisions
to be taken optimally in higher-level areas.
In this framework, optimality is defined in terms of information theory~\citep{Attneave54,Atick92,Wolfe10}.
For instance, the representation produced by the neural activity in V1 is sparse:
It is believed that this reduces redundancies
and allows edges in the image to be better segregated~\citep{Field94,Froudarakis14}.
% Population code in mouse V1 facilitates readout of natural scenes through increased sparseness Froudarakis14
This optimization is operated given biological constraints,
such as the limited bandwidth of information transfer to higher processing stages
or the limited amount of metabolic resources (energy or wiring length).
More generally, it increases the storage capacity of associative memories
before memory patterns start to interfere with each other~\citep{Palm13}.
Moreover, it is now widely accepted that this redundancy reduction
is achieved in a neural population through lateral interactions.
Indeed, a link between anatomical data and the functional connectivity
between neighboring representations of edges has been found~\citep{Bosking97},
though these conclusions were more recently refined to show that this process may be more complex~\citep{Hunt11}.
Linking neighboring neurons that represent similar features thus allows a more efficient representation in V1.
As computer vision systems are subject to similar constraints,
applying such a paradigm therefore seems a promising approach towards more biomimetic algorithms.
% meaningful => sparse / more generally predictive coding - link with Gestalt Wertheimer23 => / CV
It is believed that such a property reflects the efficient match of the representation
with the statistics of natural scenes, that is, to behaviorally relevant sensory inputs.
Indeed, sparse representations are prominently observed for cortical responses to natural stimuli~\citep{Field87,Vinje00,Deweese03,Baudot13}.
As the function of neural systems mostly emerges from unsupervised learning,
it follows that these representations are adapted to the inputs that are behaviorally the most common and important.
More generally, this adaptation to natural scenes shows that
sparseness is a neural signature of an underlying optimization process.
In fact, one goal of neural computation in low-level sensory areas such as V1
is to provide relevant predictions~\citep{Rao99,Spratling11}.
This is crucial for living beings as they are often confronted with
noise (internal to the brain or external, such as in low-light conditions)
and with ambiguities (such as inferring a three-dimensional slant from a two-dimensional retinal image).
Also, the system has to compensate for inevitable delays, such as the delay from light stimulation to activation in V1, which is estimated at about 50~\ms\ in humans.
For instance, a tennis ball moving at 20~\si{\meter\per\second} at one meter in the frontal plane
elicits an input activation in V1 corresponding to around 45\si{\degree} of visual angle
behind its physical position~\citep{PerrinetAdamsFriston14}.
Thus, to be able to translate such knowledge to the computer vision community,
it is crucial to better understand \emph{why} the neural processes that produce sparse coding are efficient.
%
\subsection{Sparseness induces neural organization}
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~%
% breakthrough = learning
A breakthrough in the modeling of the representation in V1 was
the discovery that sparseness is sufficient to induce
the emergence of receptive fields similar to V1 simple cells~\citep{Olshausen96}.
This reflects the fact that, at the learning time scale,
coding is optimized relative to the statistics of natural scenes
such that independent components of the input are represented~\citep{Olshausen97,Bell97}.
The emergence of edge-like simple cell receptive fields in the input layer of area V1 of primates
may thus be considered as a coupled coding and learning optimization problem:
At the coding time scale, the sparseness of the representation is optimized for any given input
while at the learning time scale, synaptic weights are tuned to achieve on average
an optimal representation efficiency over natural scenes.
%%%
This theory has made it possible to connect these different fields by providing a link between
information theory models, neuromimetic models and physiological observations. %
% existing models: sparse hebbian learning
In practice, most sparse unsupervised learning models aim at optimizing
a cost defined on prior assumptions on the sparseness of the representation.
These sparse learning algorithms have been applied both for images~\citep{Fyfe95,Olshausen96,Zibulevsky01,Perrinet03ieee,Rehn07,Doi07,Perrinet10shl}
and sounds~\citep{Lewicki00,Smith06}.
Sparse coding may also be relevant to the amount of energy the brain needs to sustain its function.
The total neural activity generated in a brain area is
inversely related to the sparseness of the code; the total energy consumption therefore decreases with increasing sparseness.
As a matter of fact, the probability distribution functions of neural activity observed experimentally can be approximated by so-called exponential distributions, % (in bits/spike)
which have the property of maximizing information transmission for a given mean level of activity~\citep{Baddeley97}.
To satisfy such constraints, some models thus directly compute a sparseness cost based on the distribution of the representation.
For instance, the kurtosis corresponds to the fourth statistical moment (the first three moments being, in order, the mean, the variance and the skewness)
and measures how the statistics deviate from a Gaussian:
A positive (excess) kurtosis indicates that the distribution has a ``heavier tail'' than a Gaussian of similar variance and thus corresponds to a sparser distribution. Based on such observations, other similar statistical measures of sparseness have been derived in the neuroscience literature~\citep{Vinje00}.
A more general approach is to derive a representation cost.
For instance, learning is accomplished in the \sparsenet\ algorithmic framework~\citep{Olshausen97}
on image patches taken from natural images as a sequence of coding and learning steps.
First, sparse coding is achieved using a gradient descent over a convex cost.
We will see later in this chapter how this cost is derived from a prior on the probability distribution function of the coefficients and how it favors the sparseness of the representation.
At this step, the coding is performed using the current state of the ``dictionary'' of receptive fields.
Then, knowing this sparse solution,
learning is defined as slowly changing the dictionary using Hebbian learning~\citep{Hebb49}.
As we will see later, the parameterization of the prior
has a major impact on the results of the sparse coding and
thus on the emergence of edge-like receptive fields and requires proper tuning.
Yet, this class of models provides a simple solution to the problem of sparse representation in V1.
% neural signatures
However, these models are quite abstract and assume that neural computations may estimate some rather complex measures such as gradients, a problem that may also be faced by neuromorphic systems.
Efficient, realistic implementations have been proposed which show that imposing sparseness
may indeed guide neural organization in neural network models, see for instance~\citep{Zylberberg11,Hunt13}.
Additionally, it has also been shown that in a neuromorphic model,
an efficient coding hypothesis links sparsity and selectivity of neural responses~\citep{Blattler11}.
More generally, such neural signatures are reminiscent of the shaping of neural activity to account for contextual influences.
For instance, it is observed that ---depending on the context outside the receptive field of a neuron in area V1--- the tuning curve may demonstrate a modulation of its orientation selectivity.
This was accounted for instance as a way to optimize the coding efficiency of a population of neighboring neurons~\citep{Series04}.
As such, sparseness is a relevant neural signature for a large class of neural computations implementing efficient coding.
%
\subsection{Outline: Sparse models for Computer Vision}
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~%
% sparse models: other approaches
As a consequence, sparse models provide a fruitful approach for computer vision.
It should be noted that other popular approaches for taking advantage of sparse representations exist.
The most popular is compressed sensing~\citep{Ganguli12},
for which it has been proven that, assuming sparseness in the input,
it is possible to reconstruct the input from a sparse set
of linear coefficients computed from randomly drawn basis functions.
Note also that some studies focus on temporal sparseness.
Indeed, by computing, for a given neuron,
the relative number of active events within a given time window,
one obtains the so-called lifetime sparseness (see for instance~\citep{Willmore11}).
We will see below that this measure may be related to population sparseness.
For a review of sparse modeling approaches, we refer to~\citep{Elad10}.
Herein, we will focus on the particular sub-set of such models based on their biological relevance.
% intelligent machines? Neuroscience is beginning to inspire a new generation of seeing machines.
Indeed, we will rather focus on biomimetic sparse models as tools to shape future computer vision algorithms~\citep{Benoit10,Serre10}.
In particular, we will not review models which mimic neural activity,
but rather focus on algorithms which mimic their efficiency,
bearing in mind the constraints that are linked to neural systems (no central clock, internal noise, parallel processing, metabolic cost, wiring length).
For that purpose, we will complement some previous studies~\citep{Perrinet03ieee,Fischer07,Perrinet08spie,Perrinet10shl} (for a review see~\citep{Perrinet07})
by putting these results in the light of the most recent theoretical and physiological findings.
% outline for a CV audience
This chapter is organized as follows.
First, in Section~\ref{sec:sparsenet} we will outline how we may implement the unsupervised learning algorithm at a local scale for image patches.
Then we will extend in Section~\ref{sec:sparselet} such an approach to full scale natural images by defining the \emph{SparseLets} framework.
Such formalism will then be extended in Section~\ref{sec:sparseedges} to include context modulation, for instance from higher-order areas.
These different algorithms (from the local scale of image patches to more global scales)
will each be accompanied by a supporting implementation (with the source code)
for which we will show example usage and results.
We will in particular highlight novel results and then draw some conclusions on the perspective of sparse models for computer vision.
More specifically, we will propose that bio-inspired approaches
may be applied to computer vision using predictive coding schemes,
sparse models being one simple and efficient instance of such schemes.
%
\section{What is sparseness? Application to image patches}
%--------------------------------------------------------%
\label{sec:sparsenet}
\subsection{Definitions of sparseness}
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~%
%from a measure of efficiency...
In low-level sensory areas, the goal of neural computations is to generate efficient intermediate \emph{representations} as we have seen that this allows more efficient decision making.
Classically, a representation is defined as the inversion of an internal generative model of the sensory world,
that is, by inferring the sources that generated the input signal.
Formally, as in~\citep{Olshausen97}, we define a Generative Linear Model (GLM)
for describing natural, static, grayscale image patches $\image$ (represented by column vectors of dimension $L$ pixels),
by setting a ``dictionary'' of $M$ images (also called ``atoms'' or ``filters'') as the $L \times M$ matrix $\dico=\{ \dico_i\}_{1\leq i \leq M}$.
Knowing the associated ``sources'' as a vector of coefficients $\coef=\{ a_i \}_{1\leq i \leq M}$, the image is defined using matrix notation as a sum of weighted atoms:%
\begin{equation}%
\image = \dico\coef + \mathbf{n}%
\label{eq:lgm}%
\end{equation} %
where $\mathbf{n}$ is a Gaussian additive noise image. This noise, as in~\citep{Olshausen97}, is scaled to a variance of $\sigma_n^2$ to achieve decorrelation by applying Principal Component Analysis to the raw input images, without loss of generality since this preprocessing is invertible. Generally, the dictionary $\dico$ may be much larger than the dimension of the input space (that is, when $M \gg L$) and it is then said to be \emph{over-complete}. However, given an over-complete dictionary, the inversion of the GLM leads to a combinatorial search and there may typically exist many coding solutions $\coef$ to \seeEq{lgm} for one given input $\image$. The goal of efficient coding is to find, given the dictionary $\dico$ and for any observed signal $\image$, the ``best'' representation vector, that is, the one as close as possible to the sources that generated the signal. Assuming, for simplicity, that each individual coefficient is represented in the neural activity of a single neuron, this would justify the fact that this activity is sparse. It is therefore necessary to define an efficiency criterion in order to choose between these different solutions. % (see Figure~\ref{fig:inspiration}-B)
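To make the notation concrete, the following minimal \verb+NumPy+ sketch draws one signal from such a generative linear model; the random dictionary and the numerical values (patch size, number of active sources, noise level) are arbitrary choices for illustration and do not correspond to learned filters:
\begin{verbatim}
import numpy as np

L, M = 12 * 12, 324     # dimension of image patches and size of the dictionary
sigma_n = 0.1           # standard deviation of the additive noise

np.random.seed(0)
Phi = np.random.randn(L, M)                  # dictionary of M atoms (columns)
Phi /= np.sqrt(np.sum(Phi ** 2, axis=0))     # normalize each atom

a = np.zeros(M)                              # hidden sparse sources
active = np.random.permutation(M)[:5]        # only a few active coefficients
a[active] = np.random.randn(5)

I = np.dot(Phi, a) + sigma_n * np.random.randn(L)   # I = Phi a + n
\end{verbatim}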
% ... to sparseness
Using the GLM, we will infer the ``best'' coding vector as the most probable. In particular, from the physics of the synthesis of natural images, we know \emph{a priori} that image representations are sparse: They are most likely generated by a small number of features relatively to the dimension $M$ of the representation space. % (see Figure~\ref{fig:inspiration}-C).
Similarly to \citet{Lewicki00}, this can be formalized in the probabilistic framework defined by the GLM (see~\seeEq{lgm}), by assuming that knowing the prior distribution of the coefficients $a_i$ for natural images, the representation cost of $\coef$ for one given natural image is:%
\begin{eqnarray}%
\mathcal{C}( \coef | \image , \dico) &\eqdef& -\log P( \coef | \image , \dico)
= \log P( \image ) -\log P( \image | \coef , \dico) - \log P(\coef | \dico) \nonumber \\
&=& \log P( \image ) + \frac{1}{2\sigma_n^2} \| \image - \dico \coef \|^2 - \sum_i \log P(a_i | \dico)%
\label{eq:efficiency_cost}%
\end{eqnarray}%
where $P( \image )$ is the partition function which is independent of the coding (and that we thus ignore in the following) and $ \| \cdot \|$ is the L$_2$-norm in image space. This efficiency cost is measured in bits if the logarithm is of base 2, as we will assume without loss of generality thereafter. For any representation $\coef$, the cost value corresponds to the description length~\citep{Rissanen78}: On the right hand side of~\seeEq{efficiency_cost}, the second term corresponds to the information from the image which is not coded by the representation (reconstruction cost) and thus to the information that can be at best encoded using entropic coding pixel by pixel (that is, the negative log-likelihood $-\log P( \image | \coef , \dico)$ in Bayesian terminology, see chapter \verb+009_series+ for Bayesian models applied to computer vision). The third term $S( \coef | \dico) = - \sum_i \log P(a_i | \dico)$ is the representation or sparseness cost: It quantifies representation efficiency as the coding length of each coefficient of $\coef$ which would be achieved by entropic coding knowing the prior and assuming that they are independent.
The rightmost penalty term (see Equation~\ref{eq:efficiency_cost}) thus gives a definition of sparseness $S( \coef | \dico)$ as the sum of the log prior of the coefficients.
In practice, the sparseness of coefficients for natural images is often defined by an \emph{ad hoc} parameterization of the shape of the prior. For instance, the parameterization in~\citet{Olshausen97} yields the coding cost:%
\begin{equation}
\mathcal{C}_1( \coef | \image , \dico) =\frac{1}{2\sigma_n^2} \| \image - \dico \coef \|^2 + \beta \sum_i \log ( 1 + \frac{a_i^2}{\sigma^2} )%
\label{eq:sparse_cost}%
\end{equation}
where $\beta$ corresponds to the steepness of the prior and $\sigma$ to its scaling (see Figure 13.2 from~\citep{Olshausen02}). This choice is often favored because it results in a convex cost for which known numerical optimization methods such as conjugate gradient may be used. In particular, these terms may be compared to regularization terms that are used in computer vision. For instance, an L$_2$-norm penalty term corresponds to Tikhonov regularization~\citep{Tikhonov77}, while an L$_1$-norm term corresponds to the Lasso method.
See chapter \verb+003_holly_gerhard+ for a review of possible parametrization of this norm,
for instance by using nested $L_p$ norms.
Classical implementations of sparse coding therefore rely on a parametric measure of sparseness.
% $P(s_i)\approx cste$ and $P(s_i=0)\approx 1$ leads to L0
Let's now derive another measure of sparseness.
Indeed, a non-parametric form of sparseness cost may be defined by considering that neurons representing the vector $\coef$ are either active or inactive.
In fact, the spiking nature of neural information demonstrates that the transition from an inactive to an active state is far more significant at the coding time scale than smooth changes of the firing rate. This is for instance perfectly illustrated by the binary nature of the neural code in the auditory cortex of rats~\citep{Deweese03}. Binary codes also emerge as optimal neural codes for rapid signal transmission~\citep{Bethge03,Nikitin09}. This is also relevant for neuromorphic systems which transmit discrete events (such as a network packet). With a binary event-based code, the cost is only incremented when a new neuron gets active, regardless of the analog value. Assuming that an active neuron carries a bounded amount of information of $\lambda$ bits, an upper bound for the representation cost of neural activity on the receiver end is proportional to the count of active neurons, that is, to the $\ell_0$ pseudo-norm $\| \coef \|_0$:%
\begin{equation}%
\mathcal{C}_0( \coef | \image , \dico) = \frac{1}{2\sigma_n^2} \| \image - \dico \coef \|^2 + \lambda \| \coef \|_0%
\label{eq:L0_cost}%
\end{equation}%
This cost is similar to information criteria such as the Akaike Information Criterion~\citep{Akaike74} or the distortion rate~\citep[p.~571]{Mallat98}. This simple non-parametric cost has the advantage of being dynamic: The number of active cells for one given signal grows in time with the number of spikes reaching the target population. But \seeEq{L0_cost} defines a harder cost to optimize (in comparison to Equation~\ref{eq:sparse_cost} for instance) since the hard $\ell_0$ pseudo-norm sparseness leads to a non-convex optimization problem which is \emph{NP-complete} with respect to the dimension $M$ of the dictionary~\citep[p.~418]{Mallat98}.%
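Both costs can be written down in a few lines. The sketch below follows the conventions of the generative-model sketch given earlier in this section; the values of $\beta$, $\sigma$, $\sigma_n$ and $\lambda$ are left as free parameters and are not those used in the experiments reported here:
\begin{verbatim}
import numpy as np

def cost_parametric(I, Phi, a, sigma_n=0.1, beta=1., sigma=1.):
    """Parametric sparseness cost: reconstruction error plus a
    smooth log prior on each coefficient."""
    residual = I - np.dot(Phi, a)
    return (np.sum(residual ** 2) / (2 * sigma_n ** 2)
            + beta * np.sum(np.log(1. + a ** 2 / sigma ** 2)))

def cost_l0(I, Phi, a, sigma_n=0.1, lambda_=1.):
    """Non-parametric cost: reconstruction error plus a fixed charge
    of lambda_ bits per active (non-zero) coefficient."""
    residual = I - np.dot(Phi, a)
    return (np.sum(residual ** 2) / (2 * sigma_n ** 2)
            + lambda_ * np.count_nonzero(a))
\end{verbatim}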
\subsection{Learning to be sparse: the SparseNet algorithm}
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~%
We have seen above that we may define different models for measuring sparseness depending on our prior assumption on the distribution of coefficients.
Note first that, assuming the statistics are stationary (more generally, ergodic), these measures of sparseness across a population should necessarily imply a lifetime sparseness for any neuron.
Such a property is essential to extend results from electro-physiology.
Indeed, it is easier to record a restricted number of cells
than a full population (see for instance~\citep{Willmore11}).
However, the main property in terms of efficiency is that the representation should be sparse at any given time,
that is, in our setting, at the presentation of each novel image.
Now that we have defined sparseness, how could we use it to induce neural organization?
Indeed, given a sparse coding strategy that optimizes any representation efficiency cost as defined above, we may derive an unsupervised learning model by optimizing the dictionary $\dico$ over natural scenes. On the one hand, the flexibility in the definition of the sparseness cost leads to a wide variety of proposed \emph{sparse coding} solutions (for a review, see~\citep{Pece02}) such as numerical optimization~\citep{Olshausen97}, non-negative matrix factorization~\citep{Lee99,Ranzato07} or Matching Pursuit~\citep{Perrinet03ieee,Smith06,Rehn07,Perrinet10shl}. They are all derived from correlation-based inhibition since this is necessary to remove redundancies from the linear representation. This is consistent with the observation that lateral interactions are necessary for the formation of elongated receptive fields~\citep{Bolz89,Wolfe10}.
On the other hand, these methods share the same GLM model (see~\seeEq{lgm}) and once the sparse coding algorithm is chosen, the learning scheme is similar.
As a consequence, after every coding sweep, we increase the efficiency of the dictionary $\dico$ with respect to \seeEq{efficiency_cost}. This is achieved using an online gradient descent approach given the current sparse solution, $\forall i$:
\begin{equation}%
\dico_{i} \la \dico_{i} + \eta \cdot a_{i} \cdot (\image - \dico\coef)%
\label{eq:learn}%
\end{equation}%
where $\eta$ is the learning rate. %
Similarly to Eq.~17 in~\citep{Olshausen97} or to Eq.~2 in~\citep{Smith06}, the relation is a linear ``Hebbian'' rule~\citep{Hebb49} since it enhances the weight of neurons proportionally to the correlation between pre- and post-synaptic neurons. Note that there is no learning for non-activated coefficients. The novelty of this formulation compared to other linear Hebbian learning rules such as~\citep{Oja82} is to take advantage of the sparse representation, hence the name Sparse Hebbian Learning (SHL).%
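A single learning step may be sketched as follows in \verb+NumPy+; the final renormalization of the atoms is only a crude stand-in for the homeostasis mechanism discussed in the next paragraph and is not part of the rule itself:
\begin{verbatim}
import numpy as np

def hebbian_update(Phi, I, a, eta=0.01):
    """One Sparse Hebbian Learning step: each atom Phi_i moves toward
    the residual (I - Phi a), in proportion to its coefficient a_i."""
    residual = I - np.dot(Phi, a)
    Phi = Phi + eta * np.outer(residual, a)
    # crude renormalization (stand-in for the homeostasis discussed below)
    Phi /= np.sqrt(np.sum(Phi ** 2, axis=0))
    return Phi
\end{verbatim}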
The class of SHL algorithms is unstable without homeostasis, that is, without a process that maintains the system in a certain equilibrium. In fact, starting with a random dictionary, the first filters to be learned are more likely to correspond to salient features~\citep{Perrinet03ieee} and are therefore more likely to be selected again in subsequent learning steps. In {\sc SparseNet}, the homeostatic gain control is implemented by adaptively tuning the norm of the filters. This method equalizes the variance of coefficients across neurons using a geometric stochastic learning rule. The underlying heuristic is that this introduces a bias in the choice of the active coefficients. In fact, if a neuron is not selected often, the geometric homeostasis will decrease the norm of the corresponding filter, and therefore ---from~\seeEq{lgm} and the conjugate gradient optimization--- this will increase the value of the associated scalar. Finally, since the prior functions defined in~\seeEq{sparse_cost} are identical for all neurons, this will increase the relative probability that the neuron is selected with a higher relative value. The parameters of this homeostatic rule are of great importance for the convergence of the global algorithm. In~\citep{Perrinet10shl}, we have proposed a more general homeostasis mechanism derived from the optimization of the representation efficiency through histogram equalization, which we will describe later (see Section~\ref{sec:laughlin}).
%------------------------------%
%: See Figure~\ref{fig:sparsenet}
\begin{figure}%[ht!]%[p!]
\centering{%
\begin{tikzpicture}%[scale=1, font=\sffamily]%,every node/.style={minimum size=1cm},on grid]
\draw [anchor=north west] (.33\textwidth, 0) node {\includegraphics[width=.33\textwidth]{sparsenet_map_ssc}};
\draw [anchor=north west] (.66\textwidth, 0) node {\includegraphics[width=.33\textwidth]{sparsenet_efficiency-b}};
\draw [anchor=north west] (.99\textwidth, 0) node {\includegraphics[width=.3275\textwidth]{sparsenet_efficiency-c}};
\draw [anchor=north west] (.33\textwidth, .03\textwidth) node {$\mathsf{(A)}$};
\draw [anchor=north west] (.66\textwidth, .03\textwidth) node {$\mathsf{(B)}$};
\draw [anchor=north west] (.99\textwidth, .03\textwidth) node {$\mathsf{(C)}$};
\end{tikzpicture}%
}
\caption{ {\bf Learning a sparse code using Sparse Hebbian Learning.} \textsf{(A)}~We show the results at convergence (20000 learning steps) of a sparse model with an unsupervised learning algorithm which progressively optimizes the relative number of active (non-zero) coefficients ($\ell_0$ pseudo-norm)~\citep{Perrinet10shl}. Filters of the same size as the image patches are presented in a matrix (separated by a black border). Note that their position in the matrix is arbitrary as in ICA. These results show that sparseness induces the emergence of edge-like receptive fields similar to those observed in the primary visual area of primates. \textsf{(B)}~We show the probability distribution function of sparse coefficients obtained by our method compared to~\citep{Olshausen96} with first, random dictionaries (respectively 'ssc-init' and 'cg-init') and second, with the dictionaries obtained after convergence of the respective learning schemes (respectively 'ssc' and 'cg'). At convergence, sparse coefficients are more sparsely distributed than initially, with more kurtotic probability distribution functions for {\sf 'ssc'} in both cases, as can be seen in the ``longer tails'' of the distribution. \textsf{(C)}~We evaluate the coding efficiency of both methods by plotting the average residual error (L$_2$ norm) as a function of the $\ell_0$ pseudo-norm. This provides a measure of the coding efficiency for each dictionary over the set of image patches (error bars represent one standard deviation). Best results are those providing a lower error for a given sparsity (better compression) or a lower sparseness for the same error. % (similarly to Occam's razor). %
\label{fig:sparsenet}}%
\end{figure}%
%------------------------------%
\subsection{Results: efficiency of different learning strategies}
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~%
The different SHL algorithms differ only in their coding step.
This implies that they differ, first, in how sparseness is defined at a functional level and, second, in how the inverse problem corresponding to the coding step is solved at the algorithmic level.
Most of the schemes cited above use a less strict, parametric definition of sparseness (like the convex L$_1$-norm), for which a mathematical formulation of the optimization problem exists.
Few studies such as~\citep{Liu14,Peharz12} use the stricter $\ell_0$ pseudo-norm, as the coding problem then gets more difficult.
A thorough comparison of these different strategies was recently presented in~\citep{Charles12}.
See also~\citep{Aharon06} for properties of the coding solutions to the $\ell_0$ pseudo-norm.
Similarly, in~\citep{Perrinet10shl}, we preferred to retrieve an approximate solution to the coding problem to have a better match with the measure of efficiency~\seeEq{L0_cost}. % (see Section~\ref{sec:matchingpursuit} for a description of the algorithm)
Such an algorithmic framework is implemented in the SHL-scripts package\footnote{These scripts are available at~\url{https://github.com/bicv/SHL_scripts} and documented at~\url{https://pythonhosted.org/SHL_scripts}.}.
These scripts allow the retrieval of the database of natural images and the replication of the results of~\citep{Perrinet10shl} reported in this section. With a correct tuning of parameters, we observed that different coding schemes show a qualitatively similar emergence of edge-like filters. The specific coding algorithm used to obtain this sparseness appears to be of secondary importance as long as it is adapted to the data and yields sufficiently efficient sparse representation vectors. However, the resulting dictionaries vary qualitatively among these schemes and it was unclear which algorithm was the most efficient and what the individual roles of the different mechanisms that constitute SHL schemes were. At the learning level, we have shown that the homeostasis mechanism had a great influence on the qualitative distribution of learned filters~\citep{Perrinet10shl}.
Results are shown in Figure~\ref{fig:sparsenet}.
This figure represents the qualitative results of the formation of edge-like filters (receptive fields).
More importantly, it shows the quantitative results as the average decrease of the squared error as a function of the sparseness.
This gives a direct access to the cost as computed in Equation~\ref{eq:L0_cost}.
These results are comparable to those of the \sparsenet\ algorithm.
Moreover, by giving direct access to the atoms (filters) that are chosen,
this solution provides a more direct tool to manipulate sparse components.
A further advantage is that this unsupervised learning model is non-parametric (compare with~\seeEq{sparse_cost})
and thus does not need to be parametrically tuned.
Results show the role of homeostasis in the unsupervised algorithm.
In particular, by comparing coding and decoding efficiency with and without this specific homeostasis,
we have shown that cooperative homeostasis optimizes the overall representation efficiency (see also Section~\ref{sec:laughlin}).
It is at this point important to note that in this algorithm,
we achieve an exponential convergence of the squared error~\citep[p.~422]{Mallat98},
but also that this curve can be directly derived from the coefficients' values.
Indeed, for $N$ coefficients (that is, $\| \coef \|_0 = N$), we have the squared error equal to:
\begin{equation}%
E_N \eqdef \| \image - \dico\coef \| ^2 / \| \image \| ^2 = 1 - \sum_{1\leq k\leq N} a_{k}^2 / \| \image \| ^2%
\label{eq:error}%
\end{equation}%
As a consequence, the sparser the distribution of coefficients, the quicker the decrease of the residual energy.
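This relation translates directly into code; the minimal sketch below assumes only that the coefficients are given in the order in which they were selected:
\begin{verbatim}
import numpy as np

def residual_energy(a_selected, I):
    """Relative squared error E_N as a function of the number N of
    coefficients, computed from the coefficients alone."""
    a_selected = np.asarray(a_selected, dtype=float)
    return 1. - np.cumsum(a_selected ** 2) / np.sum(np.asarray(I) ** 2)
\end{verbatim}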
%Note that the observed distribution of coefficients follow a power-law. This was already observed in~\citep{Perrinet03ieee}. This power-law (``scale-free'') distribution is defined by
%\begin{equation}%
%\log p(a) \propto - \gamma \log a_{k}%
%\label{eq:powerlaw}%
%\end{equation}%
%The value of $\gamma$ quantifies therefore the strength of the sparseness in the signal.
% TODO: discuss this? or say it will be discussed below
In the following section, we will describe different variations of this algorithm. To compare their respective efficiency,
we will plot the decrease of the coefficients along with the decrease of the residual's energy. % as a measure of sparseness.
Using such tools, we will now explore whether such a property extends to full-scale images and not only to image patches,
an important condition for using sparse models in computer vision algorithms.
% and we will explore this aspect in the following
%
\section{SparseLets: a multi scale, sparse, biologically inspired representation of natural images}
%-------------------------------------------------------------------------------------------------%
\label{sec:sparselet}
\subsection{Motivation: architecture of the primary visual cortex}
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~%
% sparse models for CV / sparse models for image patches / extend it to full images = more sparseness / but complex : let's focus on coding - V1 inspired
Our goal here is to build practical algorithms of sparse coding for computer vision.
We have seen above that it is possible to build an adaptive model of sparse coding
that we applied to $12 \times 12$ image patches.
Invariably, this has shown that the independent components of image patches are edge-like filters,
such as those found in simple cells of V1.
This model has shown that randomly chosen image patches
may be described by a sparse vector of coefficients.
Extending this result to full-field natural images,
we can expect that this sparseness would increase by an order of magnitude.
In fact, except in a densely cluttered image such as a close-up of a texture,
natural images tend to have wide areas which are void (such as the sky, walls or uniformly filled areas).
However, applying the \sparsenet\ algorithm directly to full-field images is impossible in practice,
as its computer simulation would require too much memory to store the over-complete set of filters.
Nevertheless, it is still possible to define these filters \emph{a priori} and herein,
we will focus on a full-field sparse coding method
whose filters are inspired by the architecture of the primary visual cortex.
% see http://www.csse.uwa.edu.au/~pk/research/matlabfns/PhaseCongruency/Docs/convexpl.html
The first step of our method involves defining the dictionary of templates (or
filters) for detecting edges.
We use a log-Gabor representation, which is well suited to represent
a wide range of natural images~\citep{Fischer07}.
This representation gives a generic model of edges parameterized by their shape,
orientation, and scale. We set the range of these parameters to match with what has been reported
for simple-cell responses in macaque primary visual cortex (V1).
Indeed log-Gabor filters are similar to standard Gabors and both are well fitted to V1 simple cells~\citep{Daugman80}.
Log-Gabors are known to produce a sparse set of linear coefficients~\citep{Field99}.
Like Gabors, these filters are defined by Gaussians in Fourier space,
but their specificity is that log-Gabors have Gaussian envelopes in log-polar frequency space.
This is consistent with physiological measurements which indicate that V1 cell responses are symmetric on the log frequency scale.
They have multiple advantages over Gabors:
In particular, they have no DC component,
and more generally, their envelopes more broadly cover the frequency space~\citep{Fischer07cv}.
In this chapter, we set the bandwidth of the Fourier representation of the filters
to $0.4$ and $\pi/8$ respectively in the log-frequency and polar coordinates
to get a family of relatively elongated (and thus selective) filters
(see~\citet{Fischer07cv} and Figure~\ref{fig:loggabor}-A for examples of such edges).
Prior to the analysis of each image, we used the spectral whitening filter
described by~\citet{Olshausen97} to provide
a good balance of the energy of output coefficients~\citep{Perrinet03ieee,Fischer07}.
Such a representation is implemented in the \verb+LogGabor+ package
\footnote{Python scripts are available at \url{https://github.com/bicv/LogGabor}
and documented at \url{https://pythonhosted.org/LogGabor}.}.
%see also hansen and hess 06 for the use of such filters
This transform is linear and can be performed by a simple convolution
repeated for every edge type.
Following~\citet{Fischer07cv}, convolutions were performed in the Fourier (frequency) domain
for computational efficiency.
The Fourier transform allows for a convenient definition of the edge filter characteristics,
and convolution in the spatial domain is equivalent to a simple multiplication in the frequency domain.
By multiplying the envelope of the filter and the Fourier transform of the image,
one may obtain a filtered spectral image that may be converted to
a filtered spatial image using the inverse Fourier transform.
We exploited the fact that by omitting the symmetrical lobe of the envelope of the filter
in the frequency domain,
we obtain quadrature filters.
Indeed, the output of this procedure gives a complex number whose
real part corresponds to the response to the symmetrical part of the edge,
while the imaginary part corresponds to the asymmetrical part of the edge
(see~\citet{Fischer07cv} for more details).
More generally, the modulus of this complex number gives the energy response
to the edge ---as can be compared to the response of complex cells in area V1---
while its argument gives the exact phase of the filter (from symmetric to non-symmetric).
This property further expands the richness of the representation.
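For illustration, a single filtering step of this kind may be sketched with \verb+NumPy+ FFTs as below. The parameterization is only indicative (the \verb+LogGabor+ package cited above implements the exact filters used in this chapter); the sketch retains the two ingredients discussed here, namely a Gaussian envelope on the log-frequency axis and the omission of the symmetric lobe, so that the output is complex-valued:
\begin{verbatim}
import numpy as np

def loggabor_envelope(N, f0, theta0, B_f=0.4, B_theta=np.pi / 8):
    """Single-sided log-Gabor envelope in the Fourier domain
    (indicative parameterization, for a square N x N image)."""
    fx, fy = np.meshgrid(np.fft.fftfreq(N), np.fft.fftfreq(N), indexing='ij')
    f = np.sqrt(fx ** 2 + fy ** 2)
    theta = np.arctan2(fy, fx)
    env = np.zeros((N, N))
    nonzero = f > 0
    # Gaussian on the log-frequency axis (hence no DC component)
    env[nonzero] = np.exp(-.5 * (np.log(f[nonzero] / f0) / B_f) ** 2)
    # keep a single angular lobe around theta0: quadrature (complex) output
    dtheta = np.angle(np.exp(1j * (theta - theta0)))
    return env * np.exp(-.5 * (dtheta / B_theta) ** 2)

def loggabor_response(image, f0, theta0):
    """Complex response: the modulus gives the energy, the argument the phase."""
    F = np.fft.fft2(image)
    return np.fft.ifft2(F * loggabor_envelope(image.shape[0], f0, theta0))
\end{verbatim}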
Given a filter at a given orientation and scale,
a linear convolution model provides a translation-invariant representation.
Such invariance can be extended to rotations and scalings
by choosing to multiplex these sets of filters at different orientations and spatial scales.
Ideally, the parameters of edges would vary in a continuous fashion,
leading to full relative translation, rotation, and scale invariance.
However, this is difficult to achieve in practice and some compromise has to be found.
Indeed, though orthogonal representations are popular in computer vision
due to their computational tractability,
it is desirable in our context that we have a relatively high over-completeness
in the representation to achieve this invariance.
For a given set of $256\times 256$ images, we first chose to have $8$ dyadic levels (that is, by doubling the scale at each level)
with $24$ different orientations.
Orientations are measured as a non-oriented angle in radians, by convention
in the range from $-\frac{\pi}{2}$ to $\frac{\pi}{2}$ (but not including $-\frac{\pi}{2}$) with respect to the $x$-axis.
Finally, each image is transformed into a pyramid of coefficients.
This pyramid consists of approximately $4/3\times256^{2}\approx8.7\times10^4$ pixels
multiplexed on $8$ scales and $24$ orientations, that is,
approximately $16.7\times10^6$ coefficients, an over-completeness factor of about $256$.
This linear transformation is represented by a pyramid of coefficients (see Figure~\ref{fig:loggabor}-B).
%
%------------------------------%
%: See Figure~\ref{fig:loggabor}
\begin{figure}
\centering{%
\begin{tikzpicture}%[scale=1, font=\sffamily]%,every node/.style={minimum size=1cm},on grid]
\node [anchor=north west] (b) at (0, .382\textwidth) {\includegraphics[width=.382\textwidth]{fig_log_gabor_filters_A.jpg}};
\node [anchor=north west] (gp) at (.382\textwidth, .38\textwidth) {\includegraphics[width=.605\textwidth]{fig_log_gabor_filters_B.jpg}};
\draw [anchor=south west] (0, .382\textwidth) node {$\mathsf{(A)}$};
\draw [anchor=south west] (.382\textwidth, .382\textwidth) node {$\mathsf{(B)}$};
\end{tikzpicture}
}
\caption{ {\bf The log-Gabor pyramid}
\textsf{(A)}~A set of log-Gabor filters showing first in different rows different orientations and second in different columns different phases.
Here we have only shown one scale.
Note the similarity with Gabor filters.
\textsf{(B)}~Using this set of filters, one can define a linear representation that is rotation-, scaling- and translation-invariant.
Here we show a tiling of the different scales according to a Golden Pyramid~\citep{Perrinet08spie}.
The hue gives the orientation while the value gives the absolute value (white denotes a low coefficient).
Note the redundancy of the linear representation, for instance at different scales.
\label{fig:loggabor}}%
\end{figure}%
%------------------------------%
\subsection{The SparseLets framework}
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~%
\label{sec:matchingpursuit}
% over-completeness / combinatorial / bio-inspired = matching pursuit
The resulting dictionary of edge filters is over-complete.
The linear representation would thus give a dense, relatively inefficient representation of the distribution of edges (see Figure~\ref{fig:loggabor}-B).
Therefore, starting from this linear representation,
we searched instead for the most sparse representation.
As we saw above in Section~\ref{sec:sparsenet}, minimizing the $\ell_0$ pseudo-norm (the number of non-zero coefficients) leads
to an expensive combinatorial search with regard to the dimension of the dictionary (it is NP-hard).
As proposed by~\citet{Perrinet03ieee}, we may approximate a
solution to this problem using a greedy approach.
Such an approach is based on the physiology of V1.
Indeed, it has been shown that inhibitory interneurons decorrelate excitatory cells to drive sparse code formation~\citep{Bolz89,King13}.
We use this local architecture to iteratively modify the linear representation~\citep{Fischer07}.
In general, a greedy approach is applied when the optimal combination is
difficult to solve globally, but can be solved progressively,
one element at a time.
Applied to our problem, the greedy approach corresponds to first choosing
the single filter $\Phi_i$ that best fits the image along with a suitable coefficient $a_i$,
such that the single source $a_i\Phi_i$ is a good match to the image.
Examining every filter $\Phi_j$, we find the filter $\Phi_i$
with the maximal correlation coefficient (``Matching'' step), where:
\begin{equation}
i = \mbox{argmax}_j \left( \left\langle \frac{\mathbf{I}}{\| \mathbf{I} \|} , \frac{
\Phi_j}{\| \Phi_j\|} \right\rangle \right),
\label{eq:coco}
\end{equation}
$\langle \cdot,\cdot \rangle$ represents the inner product, and $\| \cdot \|$
represents the $L_2$ (Euclidean) norm.
The index (``address'') $i$ gives the position ($x$ and $y$), scale and orientation of the edge.
We saw above that since filters at a given scale and orientation are generated by a translation,
this operation can be efficiently computed using a convolution,
but we keep this notation for its generality.
The associated coefficient is the scalar projection:
\begin{equation}
a_{i} = \left\langle \mathbf{I} , \frac{ \Phi_{i}}{\| \Phi_{i}\|^2} \right\rangle
\label{eq:proj}
\end{equation}
Second, knowing this choice, the image can be
decomposed as
\begin{equation}
\mathbf{I} = a_{i} \Phi_{i} + \bf{R}
\label{eq:mp0} \end{equation}
where $\bf{R}$ is the residual image (``Pursuit'' step).
We then repeat this 2-step process on the residual (that is, with $\mathbf{I} \leftarrow \bf{R}$)
until some stopping criterion is met.
Note also that the norm of the filters has no influence in this algorithm
on the matching step or on the reconstruction error.
For simplicity and without loss of generality,
we will thereafter set the norm of the filters to $1$: $\forall j, \| \Phi_j \| =1$ (that is, the spectral energy sums to 1).
Globally, this procedure gives us a sequential algorithm for reconstructing the signal
using the list of sources (filters with coefficients), which greedily optimizes the $\ell_0$ pseudo-norm
(i.e., achieves a relatively sparse representation given the stopping criterion).
The procedure is known as the Matching Pursuit (MP) algorithm~\citep{Mallat93},
which has been shown to generate good approximations for natural images~\citep{Perrinet03ieee,Perrinet10shl}.
We have included two minor improvements over this method:
First, we took advantage of the response of the filters as complex numbers.
As stated above, the modulus gives a response independent of the phase of the filter,
and this value was used to estimate the best match of the residual image
with the possible dictionary of filters (Matching step).
Then, the phase was extracted as the argument of the corresponding coefficient
and used to feed back onto the image in the Pursuit step.
This modification allows for a phase-independent detection of edges,
and therefore for a richer set of configurations,
while preserving the precision of the representation.
%alpha MP (COMP not needed we are at the end of learning scale invariance)
Second, we used a ``smooth'' Pursuit step.
In the original form of the Matching Pursuit algorithm,
the projection of the Matching coefficient is fully removed from the image,
which allows for the optimal decrease of the energy of the residual
and allows for the quickest convergence of the algorithm
with respect to the $\ell_0$ pseudo-norm
(i.e., it rapidly achieves a sparse reconstruction with low error).
However, this efficiency comes at a cost,
because the algorithm may result in non-optimal representations
due to choosing edges sequentially and not globally.
This is often a problem when edges are aligned (e.g. on a smooth contour),
as the different parts will be removed independently, potentially leading
to a residual with gaps in the line.
Our goal here is not necessarily to get the fastest decrease of energy,
but rather to provide the best representation of edges along contours.
We therefore used a more conservative approach,
removing only a fraction (denoted by $\alpha$)
of the energy at each pursuit step (for MP, $\alpha=1$).
Note that in that case, Equation~\ref{eq:error} has to be modified to account for the $\alpha$ parameter:
\begin{equation}%
E_N = 1 - \alpha\cdot(2-\alpha)\cdot \sum_{1\leq k\leq N} a_{k}^2 / \| \image \| ^2%
\label{eq:error_alpha}%
\end{equation}%
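To see where the factor $\alpha\cdot(2-\alpha)$ comes from, one may expand the energy of the residual after a single smooth Pursuit step, using the facts that the filters have unit norm and that the coefficient $a_k$ is measured on the current residual $\mathbf{R}_{k-1}$:
\[
\| \mathbf{R}_{k} \|^2 = \| \mathbf{R}_{k-1} - \alpha \cdot a_{k} \cdot \Phi_{i_k} \|^2 = \| \mathbf{R}_{k-1} \|^2 - 2 \cdot \alpha \cdot a_{k}^2 + \alpha^2 \cdot a_{k}^2 = \| \mathbf{R}_{k-1} \|^2 - \alpha\cdot(2-\alpha)\cdot a_{k}^2
\]
where $i_k$ denotes the filter selected at step $k$; summing this decrement over the first $N$ steps and normalizing by $\| \image \|^2$ gives Equation~\ref{eq:error_alpha}.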
We found that $\alpha=0.8$ was a good compromise between rapidity and smoothness.
One consequence of using $\alpha<1$ is that, when removing energy along contours,
edges can overlap; even so, the correlation is invariably reduced.
Higher and lower values of $\alpha$ were also tested,
and gave representation results similar to those presented here.
In summary, the whole coding algorithm is given by the following nested loops
in pseudo-code (a minimal \verb+NumPy+ sketch follows the list):
\begin{enumerate}
\item draw a signal $\mathbf{I}$ from the database; its energy is $E = \| \mathbf{I} \|^2$,
\item initialize the sparse vector $\mathbf{s}$ to zero and the linear coefficients $\forall j, a_j = \langle \mathbf{I} , \Phi_j \rangle$,
\item while the residual energy $E = \| \mathbf{I} \|^2$ is above a given threshold do:
\begin{enumerate}
\item select the best match: $i = \mbox{argmax}_{j} | a_j |$, where $| \cdot |$ denotes the modulus,
\item increment the sparse coefficient: $s_{i} = s_{i} + \alpha \cdot {a}_{i}$,
\item update residual image: $ \mathbf{I} \leftarrow \mathbf{I} - \alpha \cdot a_{i} \cdot \Phi_{i} $,
\item update the residual coefficients: $\forall j, a_j \leftarrow a_j - \alpha \cdot a_{i} \cdot \langle \Phi_{i} , \Phi_j \rangle$,
\end{enumerate}
\item the final set of non-zero values of the sparse representation vector
$\mathbf{s}$ constitutes the list of edges representing the
image as a list of pairs $\pi_i = (i, s_{i})$, where $i$ indexes an edge occurrence
(its position, orientation and scale) and $s_i$ is the complex-valued sparse coefficient.
\end{enumerate}
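The sketch below is one way to implement this loop. It assumes, for the sake of exposition, that the dictionary is stored as an explicit matrix \verb+Phi+ of unit-norm (possibly complex, quadrature-pair) filters and that the stopping criterion is a fixed number of edges, whereas the actual implementation computes the correlations and the coefficient updates with Fourier-domain convolutions as described above; all names are illustrative.
\begin{verbatim}
import numpy as np

def sparse_code(I, Phi, alpha=0.8, n_edges=2048):
    """Smooth Matching Pursuit, following the pseudo-code above.

    I     : image as a flat real vector of shape (P,)
    Phi   : dictionary of M unit-norm filters, shape (M, P); rows may be complex
    alpha : fraction of energy removed at each Pursuit step (alpha=1: plain MP)
    Returns the sparse (possibly complex) vector s of shape (M,).
    """
    a = np.dot(Phi.conj(), I)         # linear coefficients a_j = <I, Phi_j>
    gram = np.dot(Phi, Phi.conj().T)  # cross-correlations <Phi_i, Phi_j>
    s = np.zeros(Phi.shape[0], dtype=a.dtype)
    for _ in range(n_edges):          # or: while residual energy > threshold
        i = np.argmax(np.abs(a))      # Matching: best match by its modulus
        s[i] += alpha * a[i]          # increment the sparse coefficient
        a -= alpha * a[i] * gram[i]   # Pursuit: update all coefficients at once
    return s
\end{verbatim}
Note that since $\langle \Phi_{i} , \Phi_{i} \rangle = 1$, each Pursuit step multiplies the selected coefficient by $(1 - \alpha)$, which is the ``smooth'' removal of energy discussed above.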
%: TODO say we have lots of controls: %more N, more \theta, smooth vs hard
This class of algorithms gives a generic and efficient representation of edges,
as illustrated by the example in Figure~\ref{fig:SparseLets}-A.
We also verified that the dictionary used here is better adapted
to the extraction of edges than Gabors~\citep{Fischer07}.
The performance of the algorithm can be measured quantitatively
by reconstructing the image from the list of extracted edges.
All simulations were performed using \verb+Python+ (version 2.7.8)
with packages \verb+NumPy+ (version 1.8.1) and \verb+SciPy+ (version 0.14.0)~\citep{Oliphant07}
on a cluster of Linux computing nodes.
Visualization was performed using \verb+Matplotlib+ (version 1.3.1)~\citep{Hunter07}.
\footnote{These Python scripts are available at \url{https://github.com/bicv/SparseEdges} and documented at \url{https://pythonhosted.org/SparseEdges}.}
%Python 2.7.8 (default, Jul 2 2014, 10:14:46) [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]
%IPython 2.1.0
%OS posix [darwin]
%numpy 1.8.1
%scipy 0.14.0
%matplotlib 1.3.1
\subsection{Efficiency of the SparseLets framework}
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~%
%------------------------------%
%: See Figure~\ref{fig:SparseLets}
\begin{figure}[ht!]%[p!]
\centering{
\begin{tikzpicture}
\draw [anchor=north west] (0, 0.) node {\includegraphics[width=.382\textwidth]{SparseLets_A.pdf}};
%\draw[black, dashed, thick] (.404\textwidth/2, -.404\textwidth/2) circle (.382\textwidth/2);
\draw [anchor=north west] (.382\textwidth, 0.) node {\includegraphics[height=.382\textwidth]{SparseLets_B}};
\draw (0, 0.) node [above right=0mm] {$\mathsf{(A)}$};
\draw (.382\textwidth, 0.) node [above right=0mm] {$\mathsf{(B)}$};
\end{tikzpicture}}
\caption{ {\bf SparseLets framework.} \textsf{(A)}~ An example reconstructed image with the list of extracted edges overlaid.
As in~\citet{Geisler01}, edges outside the circle are discarded to avoid artifacts.
Parameters for each edge are its position, orientation, scale (shown by the length of the bar), scalar amplitude (transparency) and phase (hue).
We controlled the quality of the reconstruction from the edge information such
that the residual energy is less than $3\%$ over the whole set of images,
a criterion met on average when identifying $2048$ edges per image for images of size $256\times 256$ (that is, a relative sparseness of $\approx 0.01\%$ of activated coefficients).
\textsf{(B)}~Efficiency for different image sizes as measured by the decrease of the residual's energy as a function of the coding cost
%ratio of active coefficients
(relative $\ell_0$ pseudo-norm).
\textsf{(B,~inset)}~%Similarly to Figure~\ref{fig:sparsenet}, the decrease of coefficients follows a power-law.
%We compared the strength of this power-law on a image-per-image basis (as shown by the error bars) as a function of the image size.
This shows that as the size of images increases, sparseness increases,
validating quantitatively our intuition on the sparse positioning of objects in natural images.
Note that the improvement is not significant for sizes above $128$.
The SparseLets framework thus shows that sparse models can be extended to full-scale natural images, and that increasing the size improves sparse models by an order of magnitude (compare a size of $16$ with that of $256$).
\label{fig:SparseLets}}%
\end{figure}%
%------------------------------%
% vanilla version RMSE always achieves 5% / power-law
Figure~\ref{fig:SparseLets}-A shows the list of edges extracted from a sample image; it fits qualitatively well with a rough sketch of the image.
To evaluate the algorithm quantitatively,
we measured the ratio of extracted energy in the images as a function of the number of edges
on a database of 600 natural images of size $256\times 256$\footnote{This database is publicly available at \url{http://cbcl.mit.edu/software-datasets/serre/SerreOlivaPoggioPNAS07}.}, see Figure~\ref{fig:SparseLets}-B.
We found that $N=2048$ edges were
enough to extract an average of approximately $97\%$ of the energy of images in the database. %
%A number $N=1024$ of edges were
%enough to extract an average of $95\%$ of the energy of $256\times 256$
%images on the set of images. %
To compare the strength of the sparseness in these full-scale images
to that in the image patches discussed above (see Section~\ref{sec:sparsenet}),
we measured the sparseness values obtained for images of different sizes.
To make these measurements comparable, we quantified the efficiency with respect to the relative $\ell_0$ pseudo-norm in bits per unit of image surface (pixel):
this is defined as the number of active coefficients times the number of bits required to code the address of each one (that is, $\log_2(M)$, where $M$ is the total number of coefficients in the representation), divided by the number of pixels in the image.
Whatever the image and framework sizes, the lower this ratio, the higher the sparseness.
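For illustration, this coding cost can be computed as follows; this is a minimal sketch, and the parameter names as well as the example values of $24$ orientations and $7$ scales are ours:
\begin{verbatim}
import numpy as np

def coding_cost_bits_per_pixel(n_edges, n_x, n_y, n_theta, n_sf):
    """Relative l0 coding cost in bits per pixel: the number of active
    coefficients times the log2(M) bits needed to address each one in
    the pyramid, divided by the image surface."""
    M = n_x * n_y * n_theta * n_sf   # total number of coefficients
    return n_edges * np.log2(M) / (n_x * n_y)

# for instance, 2048 edges in a 256x256 image, with (illustrative values)
# 24 orientations and 7 spatial frequencies:
print(coding_cost_bits_per_pixel(2048, 256, 256, 24, 7))
\end{verbatim}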
As shown in Figure~\ref{fig:SparseLets}-B,
we indeed see that sparseness increases relative to an increase in image size.
This reflects the fact that sparseness is not only local (few edges coexist at one place)
but is also spatial (edges are clustered, and most regions are empty).
Such a behavior is also observed in V1 of monkeys
as the size of the stimulation is increased from covering only the classical receptive field
to about 4 times its size~\citep{Vinje00}.
Note that by definition, our representation of edges
is invariant to translations, scalings, and rotations in the plane of the image.
We also performed the same edge extraction where images from the database
were perturbed by adding independent Gaussian noise to each pixel
such that the signal-to-noise ratio was halved.
Results are degraded but remain qualitatively similar.
In particular, edge extraction in the presence of noise may result in false positives.
Quantitatively, one observes that the representation is slightly less sparse.
This confirms our intuition that sparseness is causally linked to the efficient extraction of edges in the image.
To examine the robustness of the framework and of sparse models in general,
we examined how results changed when changing parameters for the algorithm.
In particular, we investigated the effect of filter parameters $B_{f}$ and $B_{\theta}$.
We also investigated how the over-completeness factor
could influence the result. We manipulated the number of discretization steps along the spatial frequency axis $N_f$ (that is, the number of layers in the pyramid) and orientation axis $N_\theta$.
Results are summarized in Figure~\ref{fig:efficiency}
and show that an optimal efficiency is achieved for certain values of these parameters.
These optimal values are on the order of the range of selectivities observed in V1.
Note that these values may change across categories.
Further experiments should provide
an adaptation mechanism
that finds the best parameters in an unsupervised manner. %/ compromise
%\showthe\columnwidth
%------------------------------%
%: See Figure~\ref{fig:efficiency}
\begin{figure}%[ht!]%[p!]
\centering{%
\includegraphics[width=.618\textwidth]{efficiency}
%\begin{tikzpicture}
%%\draw [anchor=north west] (0, .618\textwidth) node {\includegraphics[width=.5\textwidth,height=.309\textwidth]{efficiency_A}};
%%\draw [anchor=north west] (.5\textwidth, .618\textwidth) node {\includegraphics[width=.5\textwidth,height=.309\textwidth]{efficiency_B}};
%%\draw [anchor=north west] (0, .309\textwidth) node {\includegraphics[width=.5\textwidth,height=.309\textwidth]{efficiency_C}};
%%\draw [anchor=north west] (.5\textwidth, .309\textwidth) node {\includegraphics[width=.5\textwidth,height=.309\textwidth]{efficiency_D}};
%\draw [anchor=south west] (-.2, 0) node {\includegraphics{efficiency}};
%
%\draw (0, \textwidth) node [above right=0mm] {$\mathsf{(A)}$};
%\draw (.5\textwidth, \textwidth) node [above right=0mm] {$\mathsf{(B)}$};
%\draw (0, .5\textwidth) node [above right=0mm] {$\mathsf{(C)}$};
%\draw (.5\textwidth, .5\textwidth) node [above right=0mm] {$\mathsf{(D)}$};
%\end{tikzpicture}
}
\caption{ {\bf Effect of filters' parameters on the efficiency of the {SparseLets} framework.} As we tested different parameters for the filters, we measured the gain in efficiency for the algorithm as the ratio of the code length needed to achieve 85\% of energy extraction relative to that for the default parameters (white bar). The average is computed on the same database of natural images and error bars denote the standard deviation of the gain over the database. First, we studied the effect of the bandwidth of the filters in the \textsf{(A)}~spatial frequency and \textsf{(B)}~orientation spaces, respectively. The minimum is reached for the default parameters: this shows that the default parameters provide an optimal compromise between the precision of the filters in the frequency and position domains for this database. We may also compare pyramids with different numbers of filters. Indeed, from Equation~\ref{eq:L0_cost}, efficiency (in bits) is equal to the number of selected filters times the coding cost for the address of each edge in the pyramid.
We plot here the average gain in efficiency, which shows an optimal compromise for \textsf{(C)}~the number of orientations and \textsf{(D)}~the number of spatial frequencies (scales), respectively. Note first that with more than 12 directions, the gain remains stable. Note also that a dyadic scale ratio (that is, a ratio of 2) is efficient, but other solutions, such as using the golden section $\phi$, prove to be significantly more efficient, though the average gain is relatively small (below 5\%).
\label{fig:efficiency}}%
\end{figure}%
%------------------------------%
These particular results illustrate the potential of sparse models in computer vision.
Indeed, one main advantage of these methods is to explicitly represent edges.
A direct application of sparse models is the ability to reconstruct images from the representation, which can therefore be used for compression~\citep{Perrinet03ieee}.
Other possible applications are image filtering or edge manipulation for texture synthesis or denoising~\citep{Portilla00}.
Recent advances have shown that such representations could be used for the classification of natural images (see chapter \verb+013_theriault+ or for instance~\citep{PerrinetBednar15}) or of medical images of emphysema~\citep{Nava13}.
Sparse coding analysis has also been used for classification,
for instance to quantify artistic style
in the drawings of Pieter Bruegel the Elder~\citep{Hughes10}.
These examples illustrate the different applications of sparse representations and in the following we will illustrate some potential perspectives to further improve their representation efficiency.
%
\section{SparseEdges: introducing prior information}
%-------------------------------------------------%
\label{sec:sparseedges}
\subsection{Using the prior in first-order statistics of edges}
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~%
\label{sec:laughlin}
% neural code and equalization: observations
In natural images, it has been observed that edges follow some statistical regularities that may be used by the visual system.
We will first focus on the most obvious regularity which consists in the anisotropic distribution of orientations in natural images (see chapter \verb+003_holly_gerhard+ for another qualitative characterization of this anisotropy).
Indeed, it has been observed that orientations corresponding to cardinals (that is, to verticals and horizontals) are more likely than other orientations~\citep{Ganguli10,Girshick11}.
This is due to the fact that our point of view is most likely pointing toward the horizon while we stand upright. In addition, gravity shaped our surrounding world around horizontals (mainly the ground) and verticals (such as trees or buildings). Psychophysically, this prior knowledge gives rise to the oblique effect~\citep{Keil00}.
This is even more striking in images of human scenes (such as a street or the inside of a building), as humans mainly build their environment (houses, furniture) around these cardinal axes. However, we assumed in the cost defined above (see~\seeEq{efficiency_cost}) that each coefficient is independently distributed.
It is believed that a homeostasis mechanism allows one to optimize this cost knowing this prior information~\citep{Laughlin81,Perrinet10shl}.
Basically, the solution is to put more filters where there are more orientations~\citep{Ganguli10} such that coefficients are uniformly distributed.
In fact, since neural activity in the assembly actually represents the sparse coefficients, we may understand the role of homeostasis as maximizing the average representation cost $\mathcal{C}( \coef | \dico)$. This is equivalent to saying that homeostasis should act such that at any time, and invariantly to the selectivity of features in the dictionary, the probability of selecting one feature is uniform across the dictionary.
This optimal uniformity may be achieved in all generality by using an equalization of the histogram~\citep{Atick92}. This method may be easily derived if we know the probability distribution function $dP_{i}$ of variable $a_i$ (see~\seeFig{laughlin}-A) by choosing a non-linearity as the cumulative distribution function (see~\seeFig{laughlin}-B) transforming any observed variable $\bar{a}_i$ into:
\begin{equation}
z_i(\bar{a}_i)= P_{i}(a_i \leq \bar{a}_i) = \int_{-\infty}^{\bar{a}_i} dP_{i}(a_i)%
\label{eq:laughlin}%
\end{equation}
This is equivalent to the change of variables which transforms the sparse vector $\coef$ to a variable with uniform probability distribution function in $[0, 1]^M$ (see~\seeFig{laughlin}-C).
This equalization process has been observed in the neural activity of a variety of species and is, for instance, perfectly illustrated by the response of neurons in the fly's compound eye to different levels of contrast~\citep{Laughlin81}. It may evolve dynamically to slowly adapt to varying conditions, for instance of luminance or contrast, such as when the light diminishes at twilight.
Then, we use these point non-linearities $z_i$ to sample orientation space in an optimal fashion (see~\seeFig{laughlin}-D). %
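In practice, this amounts to reading the desired orientations off the empirical cumulative distribution of edge orientations. A minimal \verb+NumPy+ sketch follows; the function and variable names are ours:
\begin{verbatim}
import numpy as np

def equalized_orientations(theta_edges, n_theta=24):
    """Pick n_theta orientations that are a priori selected with equal
    probability: evenly spaced quantiles of the empirical distribution of
    edge orientations (assumed wrapped to (-pi/2, pi/2]; the circular
    boundary is handled by this wrapping convention)."""
    scores = (np.arange(n_theta) + .5) / n_theta   # uniform scores in (0, 1)
    return np.percentile(theta_edges, 100. * scores)
\end{verbatim}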
%------------------------------%
%: See Figure~\ref{fig:laughlin}
\begin{figure}[ht!]%[p!]
\centering{%
\begin{tikzpicture}
\draw [anchor=north west] (0, .618\textwidth) node {\includegraphics[width=1.\columnwidth]{laughlin}};
\draw (.0\textwidth, .55\textwidth) node [above right=0mm] {$\mathsf{(B)}$};
\draw (.49\textwidth, .55\textwidth) node [above right=0mm] {$\mathsf{(C)}$};
\draw (.0\textwidth, .3\textwidth) node [above right=0mm] {$\mathsf{(A)}$};
\draw (.49\textwidth, .3\textwidth) node [above right=0mm] {$\mathsf{(D)}$};
\end{tikzpicture}}
\caption{ {\bf Histogram equalization.} From the edges extracted in the images from the natural scenes database, we computed sequentially (clockwise, from the bottom left): \textsf{(A)}~the histogram and \textsf{(B)}~the cumulative histogram of edge orientations. This shows that, as was reported previously (see for instance~\citep{Girshick11}), cardinal axes are over-represented. This represents a relative inefficiency, as the SparseLets framework a priori represents orientations in a uniform manner. A neuromorphic solution is to use histogram equalization, as was first shown in the fly's compound eye by~\citet{Laughlin81}. \textsf{(C)}~We draw a uniform set of scores on the y-axis of the cumulative function (black horizontal lines), for which we select the corresponding orientations (red vertical lines). Note that by convention these are wrapped to fit in the $( -\pi/2, \pi/2]$ range. \textsf{(D)}~This new set of orientations is defined such that they are \emph{a priori} selected uniformly. Such a transformation was found to describe a range of psychological observations well~\citep{Ganguli10} and we will now apply it to our framework.
\label{fig:laughlin}}%
\end{figure}%
%------------------------------%
%------------------------------%
%: See Figure~\ref{fig:firstorder}
\begin{figure}%[ht!]%[p!]
\centering{
\begin{tikzpicture}
\node [anchor=south west] (a) at (0, 0.) {\includegraphics[width=.333\textwidth]{firstorder_A}};
\node [anchor=south west] (a) at (.333\textwidth, 0.) {\includegraphics[width=.333\textwidth]{firstorder_B}};
\node [anchor=south west] (b) at (.666\textwidth, 0.) {\includegraphics[width=.333\textwidth]{firstorder_C}};
\draw [anchor=south west] (0, .333\textwidth) node {$\mathsf{(A)}$};
\draw [anchor=south west] (.333\textwidth, .333\textwidth) node {$\mathsf{(B)}$};
\draw [anchor=south west] (.666\textwidth, .333\textwidth) node {$\mathsf{(C)}$};
\end{tikzpicture}}
\caption{ {\bf Including prior knowledge on the orientation of edges in natural images.} \textsf{(A)}~Statistics of orientations in the database of natural images as shown by a polar bar histogram (the surface of wedges is proportional to occurrences). As for Figure~\ref{fig:laughlin}-A, this shows that orientations around cardinals are more likely than others (the dotted red line shows the average occurrence rate). \textsf{(B)}~ We show the histogram with the new set of orientations that has been used. Each bin is selected approximately uniformly. Note the variable width of bins.
\textsf{(C)}~We compare the efficiency of the modified algorithm, in which the sampling of orientations is optimized by the histogram equalization described in Figure~\ref{fig:laughlin}, by plotting the average residual energy with respect to the number of edges. % (in the inset, we similarly plot sorted coefficients).
This shows that introducing prior information on the distribution of orientations into the algorithm yields only a slight, non-significant improvement in sparseness. % (of the order of $5\%$) and therefore in the efficiency of the representation.
\label{fig:firstorder}}%
\end{figure}%
%------------------------------%
This non-parametric homeostatic method is applicable to the SparseLets algorithm by simply using the transformed sampling of the orientation space.
It is important to note that the MP algorithm is non-linear and the choice of one element at any step may influence the rest of the choices.
%Results show that by using this prior knowledge, one achieves a better coding of images.
%Indeed, by looking at the qualitative results, we could see that edges that are more common are more finely detected.
In particular, while orientations around cardinals are more prevalent in natural images (see Figure~\ref{fig:firstorder}-A), the output histogram of detected edges is uniform (see Figure~\ref{fig:firstorder}-B). %and these were detected with a better precision.
To quantify the gain in efficiency, we measured the residual energy in the SparseLets framework with or without including this prior knowledge. Results show that for a similar number of extracted edges, residual energy is not significantly changed (see Figure~\ref{fig:firstorder}-C).
This is again due to the exponential convergence of the squared error~\citep[p.~422]{Mallat98} on the space spanned by the representation basis.
As the tiling of the Fourier space by the set of filters is complete,
one is assured of the convergence of the representation in both cases.
%This indicates that in general, representation is more sparse and therefore that it better reflects the physical representation of the synthesis of images.
%Moreover, this measures only accounts for the residual energy as a function of the number of coefficients.
However, thanks to the use of first-order statistics, the orientations of edges are distributed so as to maximize the entropy, further improving the efficiency of the representation.
This novel improvement to the SparseLets algorithm illustrates the flexibility of the Matching Pursuit framework. %, and one would expect that the residuals that are extracted are of lower energy and therefore that the decrease is slower.
This shows that by introducing a prior on first-order statistics, one improves the efficiency of the model for this class of natural images.
Of course, this gain is only valid for natural images and would disappear for images where cardinals do not dominate, such as close-ups (for instance in microscopy) or images where gravity is not prevalent, such as aerial views.
Moreover, this is obviously just a first step as there is more information from natural images that could be taken into account.
% a note on frequentist probabilities computed from natural images and bayesian statistics computed to make inferences on beliefs
% todo check biblio of david field :-
% - 87 stats
% 93 path paradigm 94 what is the goal
% 99 log gabor
%Field, D. J., Hayes, A. & Hess, R. F. 1993 Contour Integration by the Human Visualsystem: Evidence for a Local "Association Field". Vision Research 33, 173-193. Field, D. J., Hayes, A. & Hess, R. F. 1997 The role of phase and contrast polarity in contour integration. Investigative Ophthalmology & Visual Science 38, 4643-4643.
\subsection{Using the prior statistics of edge co-occurences}
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~%
%observations -Geisler01 Sigman01- quantifying assofield
A natural extension of the previous result is to study the co-occurrences of edges in natural images.
Indeed, images are not simply built from independent edges at arbitrary positions and orientations but tend to be organized along smooth contours that follow for instance the shape of objects.
In particular, it has been shown that contours are more likely to be organized along co-circular shapes~\citep{Sigman01}.
This reflects the fact that in nature, round objects are more likely to appear than random shapes.
Such a statistical property of images seems to be used by the visual system, as it is observed that edge information is integrated over a local ``association field'' favoring collinear or co-circular edges (see chapter \verb+013_theriault+ section 5 for more details and a mathematical description).
In V1 for instance, neurons coding for neighboring positions are organized in a similar fashion. We have previously seen that statistically, neurons coding for collinear edges seem to be anatomically connected~\citep{Bosking97,Hunt11} while rare events (such as perpendicular occurrences) are functionally inhibited~\citep{Hunt11}.
%------------------------------%
%: See Figure~\ref{fig:secondorder}
\begin{figure}%[ht!]%[p!]
\centering{
\begin{tikzpicture}[font=\sffamily]
% (C) edge schematics
\begin{scope}[scale=.75, yshift=2.cm, xshift=-1.cm]
%\fill[white,fill opacity=.9] (0, 0) rectangle (9, 9);
% \tkzInit[xmax=12, ymax=6]
%----------------------------------------------------------
% Defining coordinates
%----------------------------------------------------------
\tkzDefPoint(2.95, 2.25){A}
\tkzDefPoint(8, 5.25){B}
\tkzLabelPoints[above left](B,A)
\tkzDefPoint(7.5, 2.25){C}
% draw red dots at the center of edges
\tkzDrawPoints[size=10, color=red, fill=red](A,B)
%----------------------------------------------------------
% Drawing the lines and segments
%----------------------------------------------------------
\tkzDrawLine[color=red,line width=2pt, add=-1.15 and -.15 ](C,A)
\tkzDrawLine[color=red,line width=2pt, add=-1.3 and -.3 ](C,B)
\tkzDrawLine(A,B)
\tkzDrawLines[dashed](A,C B,C)
% drawing arcs for angles
\tkzMarkAngle[size=2.5,mkpos=.2](C,A,B)
\tkzLabelAngle[pos=3.5,circle](C,A,B){$\mathsf{\phi}$}
\tkzDefPointWith[linear,K=1.5](A,C)
\tkzGetPoint{D}
\tkzDefPointWith[linear,K=.75](B,C)
\tkzGetPoint{E}
\tkzMarkAngle[size=1,mkpos=.2](D,C,E)
\tkzLabelAngle[pos=1.75,circle](D,C,E){$\mathsf{\theta}$}
%----------------------------------------------------------
% Drawing normals
%----------------------------------------------------------
\tkzDefLine[perpendicular=through A, K=-.75](C,A)
\tkzGetPoint{a1}
\tkzDefLine[perpendicular=through B, K=1.5](C,B)
\tkzGetPoint{b1}
\tkzInterLL(a1,A)(b1,B) \tkzGetPoint{H}
\tkzMarkRightAngle[size=.5](H,A,C)
\tkzMarkRightAngle[size=.5](H,B,C)
\tkzDrawLines[dashed,dash phase=1.5pt](a1,A)
\tkzDrawLines[dashed,dash phase=0.5pt](b1,B)
%----------------------------------------------------------
% Drawing mediator and psi line