This part is a perfect illustration of the two kinds of studies found
on Wikipedia. There are, in fact, few studies that look at contributors
as closely as that of \citeauthor{Sundin11} (2011), which describes the
day-to-day life of Swedish Wikipedia editors and underlines the importance
of the tool (MediaWiki) and of the basic rules structuring the tasks
(vandal fighting, verification of sources, improvement of sourcing...).
At the other end of the spectrum, \citet{AnthonySmithWilliamson09}
proposed a rather macroscopic, but also more comprehensive point of
view: they separated the contributors into two groups (registered
and non-registered), and analyzed their contributions for the whole
English Wikipedia. Although they could not infer much about the number of
people in each group, they stressed the importance of anonymous contributors,
who account, for instance, for 20\% of the contributions that persist
in the Spanish Wikipedia \citep{DruckMiklauMcCallum08}. In other
words, although most of the highest-quality contributions are made
by registered users, and by a small subset of all contributors,
a significant number of anonymous users also provide quality content
\citep{Javanmardietal09}.

% Activity and roles
As explained before, we first look at the studies that rely on
the data provided by the project and thus give a global and comprehensive
view of the participants, or at least of the \textquotedblleft authors\textquotedblright,
defined as the people who are registered in Wikipedia and have made a contribution,
because their registration makes it possible to follow their activity
(thanks to the data stored in the MediaWiki table \textit{user\_groups}).
Regarding the activity of each of these registered authors, it has
been shown that the number of articles per author follows a power
law \citep{Voss05}, as in open source and in scientific publication
(ibid., and \citealp{Maillartetal08,ArafatRiehle09} regarding open
source), a regularity known as Lotka's law (ibid.). The same holds for the number
of contributions per person, in all the main language projects \citep{Kitturetal07b,OrtegaGonzalez-BarahonaRobles08,Ortega09,Javanmardietal09,Zhangetal10}.
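As a reminder, and in generic notation (the symbols $f$, $C$ and $\alpha$
are ours and are not taken from the cited studies), such a power law
can be sketched as
\[
f(n) \approx \frac{C}{n^{\alpha}}, \qquad n = 1, 2, \dots
\]
where $f(n)$ is the number of authors with $n$ articles (or contributions),
$C$ is a constant and $\alpha > 0$ is an exponent to be estimated from
the data. In Lotka's original formulation for scientific publication,
$\alpha$ is close to 2, so that about a quarter as many authors make
two contributions as make one.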
However, it seems that the share of contributions coming from
users with privileges (the administrators of Wikipedia), who are
the biggest contributors, decreases as the project ages
\citep{Kitturetal07b,Ortega07,Ortegaetal09}. On the other hand, their
contributions dominate what people see when visiting Wikipedia \citep{Priedhorskyetal07}:
``The top 10\% of editors by number of edits contributed 86\% of
the PWVs {[}persistent word views{]}, and top 0.1\% contributed 44\%
\textendash{} nearly half! The domination of these very top contributors
is increasing over time.'' (p. 5). \citet{LaniadoTasso11} complemented
this point, using dump data from the English Wikipedia, and found evidence
of ``the presence of a nucleus of very active contributors, who seem
to spread over the whole wiki, and to interact preferentially with
inexperienced users''.

This apparent paradox is easy to understand: as Wikipedia, and especially
the English-language project, grew, editing tasks
increased in complexity (see \citealp{FongBiuk-Aghai10} for a proposed
classification of the various types of edits by semantic complexity),
and so did the proportion of non-editing
tasks. In other words, the types of activity of participants have multiplied.
Behind the writing, which can be seen as the visible part of the iceberg,
but which is also the most important part for an encyclopedia, lie the actions
leading to the writing (coordination tasks, discussions on the topic
of the project, etc.).

Regarding the edits, \citet{Adleretal08a,DruckMiklauMcCallum08} may
be the ones who proposed the most elaborate evaluation of authors' editing
contributions, based not only on the volume of additions, but also on
their persistence (what they call longevity). The interest of
this statistical method, which uses dump data, is that it can be
applied to the whole set of authors in a project. It made it possible
to identify bots and vandals \citep{Adleretal08a}, and shed light
on the argument of \citeauthor{AnthonySmithWilliamson09} (2009) that
anonymous contributions are important.

Another part of the literature looks at these other activities, not
only at the contribution to article writing, but also at contributions to discussion
and project pages and to user talk pages, which leads to a typology of participants'
behavior, or ``social roles''\footnote{For a study of social roles in online communities, in addition to
\citet{Welseretal11}, which relies on Wikipedia, see \citet{Gleaveetal09}.}. This can be seen as a move away from broad quantitative coverage (exploitation
of the data) toward more qualitative data, in order to
deepen the qualitative understanding of practices (exploration).
We organize the presentation of the papers in this way in the rest
of this part.

\citet{UngDalle10} identified a ``project leader'' role, based on
project page editing activity (a project leader is someone who makes
more than 5\% of the edits on a project page). They found a positive
correlation between these leaders' coordination tasks (editing activity in
the talk pages) and their contributions to the articles.
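
As an illustration of this threshold, in notation that is ours and
not taken from \citet{UngDalle10}, let $e_{u,p}$ be the number of edits
made by user $u$ on project page $p$. User $u$ would then count as a
project leader for $p$ when
\[
\frac{e_{u,p}}{\sum_{v} e_{v,p}} > 0.05 ,
\]
where the sum runs over all users $v$ who edited $p$. On a hypothetical
project page with 200 edits, for instance, a user would need at least
11 edits to qualify.
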
\citet{Ibaetal10} looked at a very small set of articles and people,
but examined in depth the interactions between those people, first in the contributions
(edits) and then in the talk pages. They used social network analysis,
in which the nodes are the persons and the weight of an edge is the number of times
author B contributes to the same article as author A. Looking at the
talk page activity of four editors who are very active in starting
and building quality articles (``coolfarmers'' in their terminology),
they found two types of patterns: ``the mediators, trying to reconcile
the different viewpoints of editors, and the zealots, who are adding
fuel to heated discussions on controversial topics''. They also identified
``egoboosters'', i.e. people who mainly use Wikipedia to present
themselves, which, when done by adding entries to the encyclopedia,
is against the rules.
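
One possible way to write down such a weighted co-editing network is
sketched here, in our own notation and under simplifying assumptions
(the exact construction used by \citeauthor{Ibaetal10} may differ).
Let $n_{B,a}$ be the number of edits made by author $B$ to article $a$,
and let $\delta_{A,a}$ equal 1 if author $A$ has edited article $a$
and 0 otherwise. The weight of the edge from $A$ to $B$ could then be
written as
\[
w_{AB} = \sum_{a} \delta_{A,a}\, n_{B,a} ,
\]
where the sum runs over all articles $a$. For instance, if $A$ and $B$
have two articles in common, to which $B$ made three edits and one edit
respectively, then $w_{AB} = 3 + 1 = 4$.
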
As has been shown for other open source communities such as Python \citep{BarcelliniDetienneBurkhardt08},
\citet{Harreretal08,Halatchliyskietal10}, both investigating sub-projects
(domains) of the German version of Wikipedia, showed the importance
of ``boundary spanners'' for the construction and structuring of
knowledge in Wikipedia. Boundary spanners, in the sense given by the Sociology
of Translation \citep{CallonLawRip86,AkrichCallonLatour06}, are
the people who stand at the intersection of several domains of knowledge
and who, because they have a broader view, ``are not only responsible for
the integration of knowledge from a different background, but also
for the composition of the single-knowledge domains. Predominantly
they write articles which are integrative and central in the context
of such domains.''

\citet{Huvila10}, using a grounded theory approach based on an online
survey of contributors with open-ended questions, proposed a classification
of contributors into five types, according to their activities and to the
way they find their information (table \ref{tab:Groups-of-Wikipedia}).
\begin{table}
\caption{\label{tab:Groups-of-Wikipedia}Groups of Wikipedia contributors according
to a qualitative analysis of the research data, from \citet{Huvila10},
table 1.}
\begin{tabular}{|c|>{\centering}p{14cm}|}
\hline
Group & Description\tabularnewline
\hline
\hline
Investigators & Contributions relate to personal interest or hobby related area (of
expertise) based mostly on news sources, popular scientific or fact
literature and/or visiting the local library {[}...{]} They represent
the hard core of contributors who start articles and make considerable
contributions to existing ones.

\textit{Members of the group were mostly graduates, professionals
working on topics other than those to which they are contributing.} \tabularnewline
\hline
Surfers & Contributions are based on easily findable sources available on the
net. Surfers spend their time on using search engines and finding
fitting material for articles. Their personal interest on the topics
they are editing is similar to the group of investigators, but they
do not investigate the same sources of information.

\textit{Surfers are primarily secondary school educated, undergraduates
and professionals.}\tabularnewline
\hline
Worldly-wise & These contributors tend to focus on topics relating to their own sphere
of experience and knowledge. They do not tend to seek information
explicitly for their Wikipedia contributions and tend to rely on serendipitous
information seeking and information discovery.

\textit{Background and the level of experience vary.}\tabularnewline
\hline
Scholars & Contributions on an academic or professional area of expertise.

\textit{The archetypal contributor in this small, but quite distinct,
group is a PhD student or a relatively young researcher who is contributing
on the topics related to their research.}\tabularnewline
\hline
Editors & Some of the editors focus on administrative tasks, grammatical corrections,
correction of inconsistencies between articles, and another group
on translations from other language versions of Wikipedia. They do
not generally seek information for their Wikipedia edits.

\textit{The group was very small and rather heterogeneous in the present
study, but they shared, broadly speaking, a professional background
and a college level education. }\tabularnewline
\hline
\end{tabular}
\end{table}
\citet{Welseretal11} directly referred to the social role literature
and provided, in addition to a synthesis of \citet{Harreretal08,Halatchliyskietal10,Ibaetal10},
a perspective complementary to Huvila's classification, integrating
the social interactions (the discussion activities). They looked for
``structural signatures social attributes of actors'', i.e. the
actions taken, but also the network of interaction, and the social
interaction, especially in the talk pages, in the user pages, and
in the user talk namespaces. It is a rather exploratory study, which
relies on qualitative analysis to identify roles and on dump data to
study how these roles differ in action, network and social interaction.
It does not add much information about the first four
types of contributors, apart from pointing out that some
of them, whom the authors named ``substantive experts'', ``invest
time in fact checking and article talk to discuss details of articles''
(p. 4). But their work seems to indicate that Huvila's ``editors''
can be split into three sub-groups: ``technical editors'', who ``make
numerous small changes to content pages, frequently specializing in
a particular type of problem'' and have little presence in the talk
pages (p. 7); ``counter vandalism editors'' (ibid.), who correct vandalized
pages and post warnings on vandals' user pages; and ``social
networking editors'' (ibid.), who invest little in editing, but a
lot in social interactions and community building.

%Here also is the introduction of the time, the evolution of the people
Of course, as motivations vary, the level or type of contribution
may also vary over time. However, \citet{PancieraHalfakerTerveen09},
using time series built from internal data of the English Wikipedia, and
\citet{DejeanJullien15}, surveying French Wikipedia contributors,
reached the same conclusion: the level of participation strongly depends
on the first contributions to the project. \citet{AntinCheshireNov12}
went a bit further, showing that not only the level of activity, but
also the type of tasks can be statistically predicted from the first
contributions. But this does not mean that everybody follows the same
path. For instance, \citet{OkoliOh07}, looking at English Wikipedia
contributors, showed that people who participate in many different
articles (which they liken to ``weak links'', in the framework of
\citet{Granovetter85}) are more likely to become administrators (to have administrative
rights) than those who focus on a subset of articles and talk
with a small subset of people (thereby developing strong(er) links).
In addition, it seems that administrators are not among
the most active contributors to the articles, and that their share
of total contributions is decreasing over time, at least for the
English Wikipedia \citep{Ortega07}. This led \citet{Zhuetal11},
relying on the study by \citet{Bryantetal05}, to propose two main careers
for contributors, consistent with Okoli and Oh's findings: from non-administrator
to administrator, and from non-member to Wikiproject regular member
to Wikiproject core member (figure 1, page 3433). In this respect,
\citet{AntinCheshireNov12} confirmed that people who are involved from the
beginning in more diverse revision activities are more likely to take on
administrative responsibilities.

These findings reinforce the perception that there is an à la Becker
career for contributors, with different paths of participation and
a learning process (future contributors are first readers ``dipping
their toes in to passively participate while learning more about a
complex system'', according to \citet{AntinCheshire10}, who surveyed
a population of US college students). As is the case for involvement
in other communities of practice such as open source (see, for instance,
\citet{FangNeufeld09} and \citet{SchillingLaumerWeitzel12}), there
is a period of apprenticeship, via legitimate peripheral participation
\citep{LaveWenger91}, as shown by \citet{Bryantetal05}.

These last paragraphs raise the question of whether there is an ``efficient''
structure of interaction for producing the articles and an ``efficient''
process of inclusion. But before looking at these interactions, we
need to better understand what is produced: the pieces of knowledge
that are the articles, and the way they are articulated with one another.