Asking About Data:
Exploring different realities of data with a Social Data Flow Network
Introduction
In Information Systems and Technology studies (IST), I have noticed that practitioners use and understand the term “data” differently than the people they are helping. The purpose of this research is to explore the different conceptions of data that may exist beyond the domain of IST and demonstrate a methodology that allows practitioners to access the conceptions of data present in their workplace.
Exploring a conception of data is fundamentally a philosophical problem. A person’s conception of data stems from the affordances they attach to it, their belief in its underlying qualities, and their differentiation between data and non-data. However, this philosophical problem cannot be solved through intuition alone: a methodology is necessary to extract a person’s conception of data.
These individual conceptions can then be formalised as “philosophies of data.” By ‘philosophies’ we mean answers to questions like ‘What is data?’, ‘What is data for?’, ‘How do I know the data is reliable?’, and ‘What are the properties of data?’ While individuals may not “have philosophies,” understanding that individuals engage philosophically with their conceptions of data allows the creation of a tool to probe those philosophical conceptions of data in a workplace. By probing conceptions, the IST practitioner effectively uncovers de facto philosophies of data in individuals.
This research, however, does not propose to uncover fundamental philosophies of data, only some common conceptions of data that may exist in workplaces. These different conceptions of data can produce frustration, error, and miscommunication if people with different conceptions interact unknowingly. Conceptions of data include context, reliability, constraints as to its nature (can it be a description, must it be a number), the means of collection, and the means of manipulation.
I have created a methodology called the Social Data Flow Network (SDFN). This interview technique has elicited people’s conceptions of data (their de facto philosophical approaches towards knowing that something is or is not data), demonstrating three different conceptions within a particular industrial research workplace. A survey developed from the SDFN technique hints that there may be different conceptions of data present in the intelligence analysis community and the IST practitioner community.
It is my hope that IST practitioners can use the SDFN I have developed to make better interfaces and databases: through the understanding of a client’s expectations of data, the system can provide natural interaction methods that conform to the client’s expectations of what data is and is not. The SDFN might also be used within an organization to reduce miscommunication and error: the explicit definition of one particular conception of data for a workplace.
Methodological summary
The primary result of this thesis is the methodology of the Social Data Flow Network. The SDFN uses repeated categorization to explore how individuals group informational or communicative flows into categories. By eliciting categories that focus on data, information, and knowledge, the participants use the categorization to operationalize their epistemological understanding of data: they indicate what is and is not data and how it becomes information and knowledge. This elicitation helps both the interviewer and the participant to discover their own situational conceptualization of data.
The repeated categorization allows participants to generate and resolve cognitive dissonance situated around the differences between their theoretical definitions of data and their practical uses and categorizations of data. In interviews, participants demonstrated a refined understanding of their own conceptions of data at the end of the interview, catalyzed through their participation in the SDFN.
The SDFN involves the articulation of roles as entities, descriptions of content flows between those entities, and the categorization of those flows as data, information, knowledge, or other. Participants iterate over a task domain defined at the start of the interview, discussing all the entities and flows between those entities involved in the task. The interview concludes with an opportunity for the participant to self-reflect on their “philosophy” of data, discussing what they categorize data as and how it becomes information and/or knowledge.
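To make the shape of an interview record concrete, the sketch below renders an SDFN in Python. The class names, fields, and example flows are illustrative assumptions of mine, not a formal part of the methodology.
\starttyping
# A minimal sketch of how an SDFN interview might be recorded.
# Class and field names are illustrative assumptions, not part of
# the methodology itself.
from dataclasses import dataclass, field

CATEGORIES = {"data", "information", "knowledge", "other"}

@dataclass
class Flow:
    source: str    # entity the content flows from
    target: str    # entity the content flows to
    content: str   # participant's description of what flows
    category: str  # the participant's own categorization

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")

@dataclass
class SDFN:
    task_domain: str  # defined at the start of the interview
    entities: set = field(default_factory=set)
    flows: list = field(default_factory=list)

    def add_flow(self, source, target, content, category):
        self.entities.update((source, target))
        self.flows.append(Flow(source, target, content, category))

# A fragment of a hypothetical interview:
net = SDFN(task_domain="running a field experiment")
net.add_flow("sensor", "analyst", "raw voltage readings", "data")
net.add_flow("analyst", "team lead", "weekly summary", "information")
\stoptyping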
A scenario-based survey, inspired by the SDFN, was also trialled with less satisfactory results. While the survey did demonstrate that intelligence officers, IST professionals, and other industrial research employees did have different conceptions of data, it did not do so with any statistical rigor nor with the depth of discussion that the interviews provided.
The \SDFN\ combines two concepts for a novel purpose. It is a graph\footnote{A graph, strictly speaking, is any diagram that contains edges and nodes. A node is the component of a graph that is a point. The point can be labeled or unlabeled. The node is the element of the graph that is a representation of a thing. Sometimes the thing being represented is a computer or a person, or a place, but in any event the node represents a noun. Edges on the other hand are the relationships or connections between nodes. An edge represents a \quotation{flow} of action or stuff between nodes. Edges traditionally have served as network links, roads, phone lines, and simple representations of adjacency. A graph is a non-topological method of representing the relationships between entities through edges and nodes.
Edges can be directed: they show a flow or relationship from one node to another. The direction on the edge indicates the direction of the relationship. For example, consider Alice and Bob. To represent Alice sending a letter to Bob, we would make both of them nodes and draw a directed edge from Alice to Bob indicating the one-way flow of the letter. By adding the concept of directionality to edges, a causal element is introduced to the representation: specifically, the originating node causes a relationship to the recipient node. This addition of causality then precipitates the idea of connectedness.
A node may or may not be reachable from other nodes. A graph or subgraph where every node can be reached from every other node is called a strongly connected graph. A graph where this does not hold is weakly connected. When we apply the idea of strongly connected graphs to social networks, we can identify small groups by identifying strongly connected subgraphs within a larger, weakly connected graph.} that combines the idea of the social network with that of the data flow diagram. In social network analysis, it is possible to represent interactions between people, a social network, through graphs. Each node on a graph represents a person and each edge represents some sort of connection between people, as a function of the interactions of interest to the researcher\cite{Scott1988}.
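As a concrete rendering of this footnote's vocabulary, the toy Python sketch below (my own, not drawn from any cited work) represents directed edges as (source, target) pairs and tests strong connectivity by checking mutual reachability from an arbitrary start node.
\starttyping
# Toy illustration: directed edges as (source, target) pairs,
# plus a strong-connectivity test.
from collections import defaultdict

def reachable(edges, start):
    """Return the set of nodes reachable from start via directed edges."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def strongly_connected(nodes, edges):
    """True if every node can reach every other node."""
    start = next(iter(nodes))
    forward = reachable(edges, start)
    backward = reachable([(d, s) for s, d in edges], start)
    return forward == set(nodes) and backward == set(nodes)

# Alice sends a letter to Bob: one directed edge, so the graph is
# not strongly connected until Bob writes back.
print(strongly_connected({"Alice", "Bob"}, [("Alice", "Bob")]))  # False
print(strongly_connected({"Alice", "Bob"},
                         [("Alice", "Bob"), ("Bob", "Alice")]))  # True
\stoptyping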
The Data Flow Diagram\cite{Larsen, Sun2006} contributes its diagrams to the \SDFN. The \DFD\ was originally designed for structured programming. The document produced by the \DFD\ would combine the delineation of a universe of discourse via the context diagram with the highly precise definition of flows into and out of that diagram. A \infull{UoD} (\UoD)\cite{Wiener1950} is the term used for defining the topic under consideration. Everything within the \UoD\ is relevant and must be modeled. Everything outside the \UoD\ is irrelevant. Interestingly, as the \DFD\ was repurposed for business modeling, the \UoD\ remained the same: it is still asking, \quotation{What bit of reality do we care about right now?}
The \DFD\ would then be refined through a process of \quotation{zooming in} on that context diagram to expose the transformations required to produce the outputs from the inputs. Each additional level would seek to conserve inputs and outputs, and thereby produce a diagram that could be mapped to the functions and variables necessary for a structured program.
The \DFD\ contributes several key ideas to the \SDFN. It contributes the idea that data {\em is} something that can be modeled. The conception of data embodied by the \DFD\ is that the modeler can translate reality into data-as-bits and that data can be described through text. All actions in the data flow diagram are considered either flows or transformations. Data flows from sources, through transformations, and out into sinks. The sources and sinks are entities outside the scope of the diagram. By decomposing these transformations into ever simpler and more detailed sets of sub-transformations, modelers could design an entire software system intended to process and transform data. The modeler acts as translator: taking the reality described by the client and forcing it into a computerized mold. Repurposing the methodology of the \DFD\ by subtracting the modeler's translation suggests that it might be possible to use my method to probe and document a client's subjective reality.
The \DFD\ also contributes an iterative structure for the definition of reality. The iterative techniques explore the \UoD\ in order of increasing specificity, from the vague context diagram describing the universe of discourse to the highly detailed sub-sub-sub (etc.) transformations required deep in the diagram. By starting with broad generalizations, the \DFD\ ensured that the client was thinking about the whole task and did not immediately become fixated on any one aspect. With the \DFD\ iterating across each declared \quotation{transformation} and decomposing it, the details of each transformation were both evoked and then situated in the scaffolding of the broader context. The requirement to conserve inputs and outputs eliminated any question of missing aspects of the diagram or other design-based blind alleys. The idea of iterative exploration and definition is extremely valuable to the \SDFN.
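The conservation of inputs and outputs can be made concrete with a small sketch. The Python fragment below is a hypothetical rendering of my own; it checks that a decomposed transformation's children, once internal flows cancel out, consume exactly the parent's declared inputs and produce exactly its declared outputs.
\starttyping
# A hypothetical rendering of the DFD balancing rule. All names
# here are my own inventions for illustration.
from dataclasses import dataclass, field

@dataclass
class Transformation:
    name: str
    inputs: frozenset
    outputs: frozenset
    children: list = field(default_factory=list)

    def is_balanced(self):
        if not self.children:
            return True
        consumed = frozenset().union(*(c.inputs for c in self.children))
        produced = frozenset().union(*(c.outputs for c in self.children))
        # Flows produced and consumed internally cancel out; what
        # remains must equal the parent's declared inputs and outputs.
        return (consumed - produced == self.inputs
                and produced - consumed == self.outputs
                and all(c.is_balanced() for c in self.children))

payroll = Transformation(
    "process payroll",
    inputs=frozenset({"timesheets"}), outputs=frozenset({"paychecks"}),
    children=[
        Transformation("compute hours", frozenset({"timesheets"}),
                       frozenset({"hours"})),
        Transformation("compute pay", frozenset({"hours"}),
                       frozenset({"paychecks"})),
    ],
)
assert payroll.is_balanced()
\stoptyping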
The Social Network Graph provides the concept of a social network\footnote{A social network graph is a mapping of a person's relationships with other people into non-topological graph format. Each relationship is a directed edge; each person, a node. The social network graph is used in many different fields: communications, social media, and sociology are some of them. In many ways, the idea of the social network graph is strongly related to the ideas of actor-network theory\cite{Nejdl2006}.} to the \SDFN. The Social Network Graph also contributes a novel idea about the {\em scope} of edges. Edges in the \DFD\ were simple {\em flows} of data, representing the movement of trivial signs. In the social network graph, edges can be individual communications, orders, relationships, and objects. The huge diversity of edge types suggested by a social network graph, when combined with the \DFD, ruins the \DFD\ for its original purpose: the modeling of software systems. However, this diversity also suggests different possible models that can be applied to the \DFD\ format.
\placefigure[]
[fig:SNA]{Social network graph of \#sla2009 tweet replies to June 19, 2009 \quotation{The thicker the line, the more times you sent an @reply to that person. The more lines you have, the more @replies to different people you sent. If you don't appear on the graph, but know that you sent out @replies, it's because the person you sent your @reply to never sent out an @reply and so that person won't appear on the graph and unfortunately, you can't either! Interestingly, a few people only sent replies to themselves, so they do appear on the graph as a line that goes back to themselves.} -Image used with permission, created by: Daniel P. Lee, MLIS.}
{\externalfigure[Chapter3/SNA.jpg][factor=fit,frame=on]}
In communicative analysis, social network graphs are used for linguistic analysis\footnote{\in{figure}[fig:SNA] provides a trivial example of linguistic analysis as applied to a set of twitter replies during a conference. The different line weights are used to denote the quantity of communications along a radially distributed set of nodes. Other approaches can be far more complex, looking at patterns beyond simple frequency\cite{Barnes1983, Reffay2002}.}. It is possible to explore the control structures of a group by noting, with an edge, who is talking to whom. By exploring the frequency and directionality of those edges, analysts gain insights into the power and influence roles of social networks. As such, the \quotation{thought leaders} of the small group can be identified.
Moreover, by graphing flows of communication, it is possible to identify small groups within larger groups, as these small groups will communicate heavily among themselves and only sparsely with nodes outside. In other circles, this behavior is known as siloing\cite{Jones}. One design intent of the \SDFN\ is to confer the ability to identify siloing. By rendering flows between members of an organization, it should be possible to identify strongly connected sub-graphs, which suggest communicative silos within that organization.
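One plausible way to automate that identification is sketched below: enumerate the strongly connected components of a communication graph with Kosaraju's algorithm. The algorithm choice and the example names are mine; the \SDFN\ itself prescribes no particular algorithm.
\starttyping
# Sketch of silo detection via strongly connected components
# (Kosaraju's algorithm).
from collections import defaultdict

def strongly_connected_components(nodes, edges):
    adj, radj = defaultdict(list), defaultdict(list)
    for s, d in edges:
        adj[s].append(d)
        radj[d].append(s)

    order, seen = [], set()
    def visit(n):              # pass 1: record DFS finish order
        seen.add(n)
        for m in adj[n]:
            if m not in seen:
                visit(m)
        order.append(n)
    for n in nodes:
        if n not in seen:
            visit(n)

    components, assigned = [], set()
    def collect(n, comp):      # pass 2: walk the reversed graph
        assigned.add(n)
        comp.append(n)
        for m in radj[n]:
            if m not in assigned:
                collect(m, comp)
    for n in reversed(order):
        if n not in assigned:
            comp = []
            collect(n, comp)
            components.append(comp)
    return components

# Two teams that talk mostly among themselves, with one weak link:
edges = [("ann", "bob"), ("bob", "ann"),    # potential silo 1
         ("carl", "dee"), ("dee", "carl"),  # potential silo 2
         ("bob", "carl")]                   # one-way link between them
print(strongly_connected_components({"ann", "bob", "carl", "dee"}, edges))
# e.g. [['bob', 'ann'], ['carl', 'dee']]: two strongly connected silos
\stoptyping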
The social network graph contribution alters the diagramming rules of the \DFD. Social network entities can be any actor that participates in a communication. The \SDFN\ is a diagram exploring flows of data between actors, instead of flows between transformations. By creating a web of affiliation\cite{Pondy1967} between these entities, it should be possible to describe the communicative realities that an individual perceives. It should therefore be possible to explore how they understand the nature of data by exploring how they describe its movement from entity to entity in the \SDFN.
Despite the terminology of actors, and the use of a social network, my research does not yet incorporate actor-network theory\cite{Latour2005}. While Latour's work offers many useful ideas for understanding the world, it still imposes a framework from which biases may be imparted. Therefore, while I do not use actor-network theory here, it may be useful in later research exploring the implications of held philosophies on Latour's work.
The \SDFN\ does not try to be explanatory, comprehensive, or objective. The point of the \SDFN\ is to reveal part of how the participant understands a concept, not to build upon that understanding nor transform it into a model for a computer system. Consequently, no design provisions in the methodology allow two or more peoples' categories to be reconciled. More work will be necessary before the \SDFN\ can be used directly as a design methodology.
\stopcomponent
Analysis and Results
These questions of interest are posed for the reader to keep in mind in the results section. My personal analysis, presented after the “raw data,” uses these questions of interest as framing devices for my reflections on the individual interviews.
\section{Questions of Interest and the Methodology of Analysis}
My \quotation{hypotheses} are described as questions of interest to reflect the rapid iterative nature of abductive explorations. They provide research directions that act as broad guides to the formation of a universe of discourse for future research rather than predictive statements about reality.
The intent of the questions is to frame the analysis and guide it towards useful and interesting areas. We need to consider how the evidence relates to these questions of interest.
Each interview, after transcription, was subjected to recursive analysis for my personal reflections on the interviews. I summarized every six to ten lines of each interview in a one-line summary, and then summarized every three to six of those summaries in turn, filtering for statements about the user’s conception of data. Although self-transcription introduces personal bias, two significant factors prevented a traditional double-blind study. First, an untested methodology is no place for the mass utilization of volunteer interviewers; the limited scope allowed me to retain control of the interview process and to provide the best possible interview for each participant while retaining the basics of the \SDFN. Second, because I conducted each interview, the bias had already been introduced; providing for pious-sounding human coding would have lent false reliability to something inherently subjective.
My personal reflections are very simple. I have tried to extract each participant's intuitions about data from the recursive analysis.
\subsection{Question of interest 1: Do people have different philosophies of data?}
If this research produces nothing else, it must investigate whether people have different conceptions of data. This idea was the central intuition that prompted this research, and its testing will demonstrate whether or not there is anything to my intuition.
As the organizing factor of my analysis, this question of interest will focus my activities. It will justify further research on the philosophy of data from my experimental results, or else its demonstrable failure will justify not doing so.
The question \quotation{Do people have different philosophies of data?} defines an overly large universe of discourse, one impossible to study at a useful level of granularity in one research project. The very breadth of the question precludes the determination of any useful and specific facts about the world besides simple exploration of the assertion that people have different understandings of data. The intent of this research and of this question is to generate interest in the research of the philosophy of data and to demonstrate that there are questions to research.
I want to see if, beyond my intuitive insight, people actually have different conceptions of data or if my perception of different philosophies is an artifact of the requirements-gathering process of designing a database. It is therefore not sufficient to state that people have different understandings of data depending on whether they are dealing with it in a technical or scientific context. We must look for evidence.
This question of interest, in its reach, is not ambitious. It suggests no predictions about peoples' conceptions of data, how they act with different realities of data, or any other fact about the world. Instead, it simply directs us to see if there is anything of interest for further explorations.
\subsection{Question of interest 2: Can my methodology probe people's philosophies of data?}
My methodology has a simple job: to assess what people mean when they use the term \quotation{data.} This question of interest is designed as a sanity check. I am investigating a new idea with an untested methodology. It is vital to consider that the success or failure of Question of interest 1 is directly modulated by the success or failure of Question of interest 2. Therefore, the methodology itself deserves distinct analysis.
The methodology should be of use to more people than just those investigating peoples’ conceptions of data. If the methodology is useful and judged to add value to Question of interest 1, analysis of the methodology should indicate whether other people could use it to investigate matters of interest to them.
Question of interest 2 is asking: do these results make sense? Sense-making is a matter of internal and external consistency. This question should force me to explore whether the \SDFN\ correlates with interview results and whether the types of results make sense relative to the survey.
Beyond consistency, I must also ask: Is it possible to get these results from this methodology? In this case, I need to make sure that I am not reading imaginary meaning in the tea leaves of the results. Because this kind of external self-reflection is difficult, the question must be simplified to: Do the results surprise me? If they do not have elements of surprise, then the probability that I am projecting meaning into them must be strongly considered.
All of these are very self-critical questions, as they must be to explore the impact of an untested methodology. I am trying to consider whether my methodology can present a persuasive story, and if it can, does it?
\section{Interview Analysis}
My interview analysis discovered three different conceptions of data. It would be hard to deny that interviews I and II treat data as communication, that III and IV treat data as subjective observation (with IX hinting at it), and that the rest treat data as objective fact. With these broad differences evident, I feel question of interest 1 has been satisfied.
The observation constructions differ strikingly from the numeric constructions, possibly differing on a fundamental perception of reality. Where one interview tries to render the relationships between matter in the world as numbers (objectivist), another suggests that everything emits data and that we must filter it. The conflict is records versus measurements versus signs: does data measure objective reality, record subjective reality, or merely transmit signs? Numbers are usually seen as the product of precise measurement, whereas observations build their way towards knowledge.
\subsection{Data as communications}
Data, in the communicative sense, merely requires signs and things to communicate with those signs. The data can be rendered as bits or marks on paper, but it is seen as a factor of semiotic import rather than as something to be discovered or filtered.
This construction is substantively different from the other two inasmuch as it does not hold data to be an aspect of reality. Instead, data is produced as a function of human intent. Because this understanding does not concern itself with interactions of the real, there is a far greater difference between this and the other two than between the subjective-objective philosophies. However, the passivity of this construction allows it to accept facts produced from either source as something to be encoded, stored, and transmitted. Significant research needs to be done to explore how this construction of data relates to the other two.
\subsection{Data as subjective observations}
Data, in the subjective observations, requires contextualization and filtering. Everything emits data as sense impressions\footnote{Like the ancient Aristotelian idea of species (particles of sensation). Light was the medium that visual species traveled within.
While this ancient philosophy of image is not hugely useful to us, the same intuitions that led to it could have some parallels with data as subjective observations. This research area could make an interesting bridge between intuitive and experimental philosophies.
} that can be captured by us. Thus, to perform sense-making activities, we must filter and contextualize the interesting data so that it can become information.
Subjective data lends itself more to cyclic hierarchies, where data begets the information and knowledge used to collect more data, reflecting an interestingly constructivist view of knowledge. There is quite a lot here available to future research, and I do not feel sufficiently confident in my sample size to make any assertions as to relationships between data and the various philosophies of knowledge or science, though the subjective nature of observations may tend slightly more towards Latour or Feyerabend.
Of more interest is that this inherently subjective data is constructed from the mind's impressions of the surroundings, rather than revealed through measurement of the surroundings. The understanding of the embodiment of data is a significant difference between the two understandings of data.
\subsection{Data as measured facts}
Objective data comes with its own context \quotation{baked in.} It is, in many ways, rare: it requires positive effort to generate, and higher quality data requires a commensurate increase in effort. Data requires analysis to uncover the extant patterns of reality, and with enough data, knowledge about the singular real can be generated.
Objective data requires that data be a fact, usually a numerical, reproducible representation of reality that conveys an understanding of measurement quality and units. Objective data is not filtered, because it is collected with prior intent and all elements of the \quotation{data set} may produce interesting patterns.
Both humans and sensors can reveal objective data, which is embodied in the things being measured. There seems to be no significant link with any of the major philosophies of science. Although my investigations did not explore confirmation, falsifiability, or paradigms, there seems to be a common understanding that data-as-fact accurately represents the universe within the constraints of measurement. This may be because the participants believed data to be a building block upon which their hypotheses or understanding of the universe could be built.
Literature Review
The ultimate aim of this thesis is to facilitate better workplace communication, user interfaces, and database design and management. To do that, I borrow heavily from concept elicitation methodologies to produce personal constructs of data. These personal constructs of data, rendered in a concept map, allow for explicit exposition of the concept of data in a workplace and thereby reduce miscommunication through self-aware modification of the available mental maps of the purpose and role of data.
Concept elicitation methodologies are a subset of knowledge elicitation methods, a tool used in many disciplines to “obtain the information needed to solve problems” (Burge). Knowledge elicitation, in the main, is focused on direct problem solving: exploring requirements and understanding the meanings of those requirements. However, by turning the techniques of knowledge elicitation onto epistemological questions of category, we can discover not the direct meaning behind requirements, but some of a person’s semiotic models of the constructions behind those requirements.
My research looks to investigate a person’s personal construction of data. I borrow from data flow diagrams with a similar intent to the RepGrid methodology, though the end product differs significantly. The idea of personal constructs was discussed by Kelly (1955) and has been reformulated under many names: “Terms that have been used to describe these personal perspectives include “schemas” (Cossette and Audet, 1992; Jelinek and Litterer, 1994), “cognitive maps” (Eden, 1992; Weick and Bougon, 2001), “technological frames” (Orlikowski and Gash, 1994), and “mental models” (Daniels et al., 1995).” (Tan) I, like Tan, will use personal constructions as the operative term.
Kelly (1955, p74) describes a personal construction as a combination of philosophy and psychology. A construct, being subjective, is a personal epistemological tool of categorization and differentiation: “A construct is a way in which some things are construed as being alike and yet different from others.” His thesis denotes constructs as framing devices where we can situate objects-as-signs in our way of knowing. He continues, “We have departed from conventional logic by assuming that the construct is just as pertinent to some of the things which are seen as different as it is to the things which are seen as alike.” Here, the fact that an object is not categorized as something can be an important factor in a person’s personal construction of reality. Constructs are bipolar, admitting knowledge of the sign/concept and its opposite rather than simple negation. The SDFN extends this bipolar methodology of construct construction by asking people to categorize elements as data, information, or knowledge. By articulating a tripolar construction, we not only can articulate the positive categorizations of data, but can more closely examine data as it transforms into specifically delineated categories.
Much would be lost if participants were asked to categorize “data or not data,” as the “not data” construction comprises everything that is not data and is therefore not particularly interesting as a means of indicating the ontological and epistemological affordances of data. By requiring positive categorization, relationships between data and other concepts can be elicited more easily than simple negation would warrant. However, I also recognize that a given categorization may simply be irrelevant with respect to data (relevancy is a far more useful and pragmatic benchmark than negation). Kelly notes that personal constructions are bounded (p76), and are not necessarily “convenient” methods of categorization. In that light, the interview methodology will allow participants to articulate other categories that do not belong to the trinary construct of data-information-knowledge.
The RepGrid (Tan 2002) is a similar concept elicitation method. Tan describes the IST uses of the technique as: “a set of procedures for uncovering the personal constructs individuals use to structure and interpret events relating to the development, implementation, use, and management of IST in organizations.” While it is more overtly focused on organizational modelling and the interpretation of events, it is a study of cognitive processes in an organizational setting to more effectively articulate information system requirements. The RepGrid relies on participants sorting a pre-established schema of entities or objects, defined as a common set of “nouns or verbs,” into constructs, the framing understandings around those concepts. Tan describes RepGrid concepts as: “Constructs represent the research participant’s interpretations of the elements. Further understanding of these interpretations may be gained by eliciting contrasts resulting in bi-polar labels. Using the same example, research participants may come up with bi-polar constructs such as “high user involvement – low user involvement” to differentiate the elements (i.e., IS projects).” The creation of framing dichotomies echoes the construct framework of Kelly and then allows users to sort elements within those constructs with a variety of different methods.
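For readers unfamiliar with the instrument, the sketch below shows the shape of repertory-grid data in Python, with a frequency count of the kind Tan mentions. The elements, constructs, and ratings are invented for illustration, not drawn from Tan’s study.
\starttyping
# The shape of repertory-grid data, with invented elements,
# constructs, and ratings. Elements are rated against bipolar
# constructs on a 1-5 scale.
elements = ["project A", "project B", "project C"]
constructs = [("high user involvement", "low user involvement"),
              ("on schedule", "behind schedule")]

# ratings[construct] lists one score per element:
# 1 = near the left pole, 5 = near the right pole.
ratings = {
    ("high user involvement", "low user involvement"): [1, 4, 5],
    ("on schedule", "behind schedule"):                [2, 4, 5],
}

# Frequency-style content analysis: count how often each element
# sits near the left pole of a construct.
for i, element in enumerate(elements):
    near_left = sum(1 for scores in ratings.values() if scores[i] <= 2)
    print(f"{element}: near the left pole on {near_left} of "
          f"{len(constructs)} constructs")
\stoptyping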
However, the RepGrid is not the best tool for understanding constructions of data: while it does articulate a dichotomy, it fails to expose the manipulations attached to data. Elicitation of the affordances and transformations of data is crucial to understanding a person’s construction of data in sufficient detail to provide useful tools designed for them. Furthermore, while the statistical reliability of the RepGrid is appreciated, especially as it can be subject to content analysis through simple frequency counting, the lack of an explicit period for participants to articulate their self-schemata robs interviewers of the potential insights of an articulated schema.
A representation grid draws on the personal construct framework for its own purposes of organizational knowledge modelling. In many ways, a “RepGrid” is a means of evaluating a social construction of reality, as discussed by Berger and Luckmann. The social construction of reality echoes the idea of personal constructions (though it never explicitly calls out the term) by evoking the different realities of objects: “Different objects present themselves to consciousness as constituents of different spheres of reality. I recognize the fellowmen I must deal with in the course of everyday life as pertaining to a reality quite different from the disembodied figures that appear in my dreams. The two sets of objects introduce quite different tensions into my consciousness and I am attentive to them in quite different ways.” This evocation of personal constructions framing the affordances of interaction was one of the other inspirations behind this project. While Berger & Luckmann articulate the primacy of our shared reality, this investigation explores one area where that shared understanding may break down.
Shared understandings of reality are encoded as self-schemata and expressed as understandings of terms. While this inquiry could just as easily be expressed as a linguistic pursuit, the aim of this investigation is to uncover elements of that primal construction of reality, not differences in the linguistic expression of that construction. I have found that the best way to explore an individual’s construction of reality is to ask them to express that reality in database design. The act of rendering the real-in-mind into diagrams causes an awareness of the self-schemata to coalesce, simply by bringing them into the forefront of consciousness. Through introspection into cognitive activity, self-schemata are formed: “attempts to organize, summarize, or explain one's own behavior in a particular domain will result in the formation of cognitive structures about the self or what might be called self-schemata. Self-schemata are cognitive generalizations about the self, derived from past experience, that organize and guide the processing of self-related information contained in the individual's social experiences.” (Markus 1977) It is this very process which the creation of the data flow diagram occasions in regards to an individual’s data manipulation activities. Furthermore, it is this act of schemata creation and subsequent discussion that I aim to elicit with the \SDFN.
The idea of schemata {\em qua} personal constructions of reality influencing human-computer interfaces and system design is not novel, though in the HCI field the term “mental model” is used. Wilson and Rutherford were exploring this very topic in 1989. Specifically, while they identify a significant variation in the definitions of the term “mental model,” they generalize the term to: “a representation formed by a user of a system and/or task, based on previous experience as well as current observation, which provides most (if not all) of their subsequent system understanding and consequently dictates the level of task performance.” (1989) The definitions they synthesize this from extend back into the seventies, and there is no fundamental disagreement that the practice of human-computer interaction is, in some way, the practice of presenting an interface to these mental models.
It is important to note that there are philosophical distinctions between the terms mental model, personal construction, and self-schema. A personal construction is, in many ways, the philosophical reality of a term. The construction provides for understanding of when and how to use the term for all use cases, as well as its personal and cultural semiotic identifications. A self-schema is the articulated and explicit epistemological conception of the term: it is an individual’s developed understanding of how they categorize and use a term. A mental model, on the other hand, is the situated understanding in procedural memory. These mental models are, themselves, socially constructed through routines in organizations (Cohen and Bacdayan 1994). The mental model is the procedural manifestation of the personal construction in the recognized semiotic affordances of the concept of data.
Extending the mental model to expected manipulations of data, rather than expected interactions with a system, is the province of the \DFD, though the \DFD\ holds to an objective reality which synthesizes many mental models. The \SDFN, therefore, is a way to inspect the subjective mental models of humans as they relate to the expected interactions and transformations that their world applies to the thing they call data. As the term is never formally taught, we must evolve our models by experience with the world. Rasmussen asserts that mental models evolve with world-experience: “A mental model of a physical environment is a causal model structured in terms of objects with familiar functional properties. The objects interact in events, i.e., by state changes that propagate through the system.” Similarly, Kelly (1955) argues that individuals use their own personal constructs to understand and interpret events that occur around them and that these constructs are tempered by the individual’s experiences.
As our experience with the world differs, so too must our models diverge to make individual predictions about the systems we encounter in our subjective, constructed reality. Through articulated schema creation, we can expose a person’s mental map in a sufficiently valid framework for database designers and philosophers to puzzle over.
\subsection{Hierarchies of Data}
This work is not the first to ponder the nature of data. There exist two significant and pre-established relationships of data to information and knowledge: Ackoff's and Tuomi's. My findings mostly tend to echo the realities of data described by Ackoff or Tuomi. While not every interview or survey articulates a hierarchical relationship between data, information, and knowledge, it is clear that Ackoff's work has entered the \quotation{common knowledge}. A number of interviewees discussed a hierarchy of data first promulgated by Ackoff. While they never cite his influence, their descriptions of relationships between data, information, and knowledge match his quite precisely.
Interestingly, this hierarchy has a management basis, rather than one grounded in philosophy or practical understanding of how we appreciate and use data. There are many signs of cognitive dissonance between what is perceived as the traditional hierarchy and how data is used in practice. Ackoff has updated his hierarchy many times, including in the book \quotation{Management F-Laws}\cite{Ackoff2007}, wherein he not only states that \quotation{to managers, a pound of wisdom is worth an ounce of understanding} but also belabors the useless metaphor with the extension that an ounce of wisdom is worth \quotation{65,536 ounces of data.}
Tuomi presents a more philosophically rigorous conception of data in his cyclic hierarchy, with data feeding into information, which in turn feeds knowledge, which provides the ability to collect more data. In the analyses, I will refer to this as a cyclic hierarchy.
\subsubsection[Ackoff]{Ackoff's Traditional Hierarchy}
The traditional hierarchy linking data, information, knowledge, and wisdom in a strict hierarchy of dominance and importance was created by Ackoff\cite{Ackoff} in 1989. In a summary of his work, Bernstein notes\cite{Bernstein2009}:
\startextract
Ackoff was a management consultant and former professor of management science at the Wharton School specializing in operations research and organizational theory. His article formulating what is now commonly called the Data-Information-Knowledge-Wisdom hierarchy (or DIKW for short) was first given in 1988 as a presidential address to the International Society for General Systems Research. This background may help explain his approach. Data in his terms are the product of observations, and are of no value until they are processed into a usable form to become information. Information is contained in answers to questions. Knowledge, the next layer, further refines information by making \quotation{possible the transformation of information into instructions. It makes control of a system possible} (Ackoff, 1989, 4), and that enables one to make it work efficiently. A managerial rather than scholarly perspective runs through Ackoff's entire hierarchy, so that \quotation{understanding} for him connotes an ability to assess and correct for errors, while \quotation{wisdom} means an ability to see the long-term consequences of any act and evaluate them relative to the ideal of total control (omnicompetence). While a scholarly perspective on this hierarchy might prioritize the processes of inquiry and discovery, Ackoff does not account for them. But his concept of omnicompetence, which refers to \quotation{the ability to satisfy any and every desire} (Ackoff, 1989, 8), does encompass the satisfaction of user-defined needs.
\stopextract
In this ontology, data are subjective observations. Curiously, despite data being subjective observations, Ackoff does not suggest any need for filtering (a common theme in subjective/observation philosophies of data).
\subsubsection[Tuomi]{Tuomi's Cyclic Hierarchy}
Tuomi's ontology is simple and counterintuitive: knowledge is a framework of the world from which we build information. Information provides a local framework from which to extract data from the world. Thus, the apex of the hierarchy is data, which then filters downwards to modify knowledge and information. This represents an abductive approach towards the philosophy of knowledge and the philosophy of data, as the hypotheses being tested are information, generated from knowledge-of-world, rather than induced from data points\cite{Tuomi}:
\startextract
The generally accepted view sees data as simple facts that become information as data is combined into meaningful structures, which subsequently become knowledge as meaningful information is put into a context and when it can be used to make predictions. This view sees data as a prerequisite for information, and information as a prerequisite for knowledge. ... [Exploring] the conceptual hierarchy of data, information and knowledge, showing that data emerges only after we have information, and that information emerges only after we already have knowledge.
\stopextract
Tuomi's conception of a reverse hierarchy is useful to my research in two significant ways. First, it gives me a prior conception against which to compare the analyzed philosophies of data; although Tuomi's research is not explicitly about users' conceptions of data, his analysis of hierarchies is an acceptable complement to it. Second, by presenting a novel relationship hierarchy, Tuomi challenges the \quotation{everyone knows} mentality of much of knowledge management.
Tuomi's research into the fundamental questions of knowledge underpins my own, for he demonstrates that it is possible to have a different understanding of data from an intuitive-philosophical standpoint. This demonstration of difference allows a questioning of the nature of data and acts as an external source of validation for my analysis.
This ontology of data is supported by a study of \quotation{intelligence} published in {\em Nature} by McNab and Klingberg\cite{McNab2008}:
\startextract
Thus, high-capacity individuals (who can remember more information at once and who tend to do better on aptitude tests) might simply be better at keeping irrelevant information \quotation{out of mind,} whereas low-capacity individuals may allow more irrelevant information to clutter up the mental in-box. The difference may just be a matter of having better spam filters.
Some of our own recent work on differences in controlling access to working memory has provided evidence favoring this mental spam-filtering idea. In one experiment, measuring electrical signals emitted by the brain enabled us to show that high-capacity people were excellent at controlling what information was represented in working memory: they let in information about relevant objects but completely filtered out that about irrelevant objects. Low-capacity individuals, in contrast, had much weaker control over what information entered the mental in-box; they let in information about both relevant and irrelevant objects roughly equally. Surprisingly, these results mean that we found that low-capacity people were actually holding more total information in mind than high-capacity individuals were, but much of the information they held was irrelevant to the task.
\stopextract
The idea of consciousness as filter discussed by all of these researchers is not particularly novel, although the localization of filtering activities by fMRI to those physical regions of the brain suggests that this ontology has a closer connection to our biological minds than does Ackoff's.
\subsection{Links with the literature}
There are two authors whose research into the various relationships of data, information, and knowledge presented useful assistance in my analysis. The first is Dr. Ilkka Tuomi, who articulates a reverse hierarchy of data, information, and knowledge, a hierarchy to which at least one of my participants subscribes. The second is Dr. Chaim Zins, whose surveys in many ways validated Ackoff's \quotation{standard} hierarchy, despite showing that there are many definitions of data, information, and knowledge.
\subsubsection[Zins]{Zins' concepts}
Zins performed a \quotation{collective knowledge mapping} of a number of researchers using a critical Delphi methodology over three rounds of research\cite{Zins2007,Zins2007a,Zins2007b,Zins2007c}. Although his approach looked at definitions directly, he found two different concepts of data and a similar ontological split over subjectivity, objectivity, and communication.
While I have utilized his works in justifying my literature review, I did not closely examine his conclusions, to avoid biasing my analyses. Zins, in 2003, identified areas of difference similar to those I identified\cite{Zins2007b}:
\startextract
Six distinctive concepts. Having established the distinction between the subjective and the universal domains, we are in a position to define the three key concepts data, information, and knowledge. In fact, we have six concepts to define, divided into two distinctive sets of three. One set relates to the subjective domain, and the other-to the universal domain.
[Data-Information-Knowledge] in the subjective domain. In the subjective domain, data are the sensory stimuli, which we perceive through our senses. Information is the meaning of these sensory stimuli (i.e., the empirical perception). For example, the noises that I hear are data. The meaning of these noises (e.g., a running car engine) is information. Still, there is another alternative as to how to define these two concepts-which seems even better. Data are sense stimuli, or their meaning (i.e., the empirical perception). Accordingly, in the example above, the loud noises, as well as the perception of a running car engine, are data. Information is empirical knowledge. Accordingly, in the example above, the knowledge that the engine is now on and the car is leaving is information, since it is empirically based. Information is a type of knowledge, rather than an intermediate stage between data and knowledge. Knowledge is a thought in the individual's mind, which is characterized by the individual's justifiable belief that it is true. It can be empirical and non-empirical, as in the case of logical and mathematical knowledge (e.g., \quotation{every triangle has three sides}), religious knowledge (e.g., \quotation{God exists}), philosophical knowledge (e.g., \quotation{Cogito ergo sum}), and the like. Note that knowledge is the content of a thought in the individual's mind, which is characterized by the individual's justifiable belief that it is true, while \quotation{knowing} is a state of mind which is characterized by the three conditions: (1) the individual believe[s] that it is true, (2) S/he can justify it, and (3) It is true, or it is appear to be true.
[Data-Information-Knowledge] in the universal domain. In the universal domain, data, information, and knowledge are human artifacts. They are represented by empirical signs (i.e., signs that one can sense through his/her senses). They can take on diversified forms such as engraved signs, painted forms, printed words, digital signals, light beams, sound waves, and the like. Universal data, universal information, and universal knowledge mirror their cognitive counterparts. Meaning, in the objective domain data are sets of signs that represent empirical stimuli or perceptions, information is a set of signs, which represent empirical knowledge, and knowledge is a set of signs that represent the meaning (or the content) of thoughts that the individual justifiably believes that they are true.
Signs Versus Meaning. Defining the Data-I-Knowledge phenomena as sets of signs needs to be refined. There is a fundamental distinction between documented (i.e., written, spoken, or physically expressed) propositions and meanings. \quotation{E = MC2},\quotation{E = MC2}, and \quotation{E = MC2} are not three different types of knowledge. These are three different sets of signs that represent the same meaning. In other words, they are three different utterances of the same knowledge. Knowledge, in the collective domain, is the meaning that is represented by written and spoken statements (i.e., sets of symbols). However, because we cannot perceive with our senses the meaning itself, which is an abstract entity, we can relate only to the sets of signs (i.e., written, spoken, or physically expressed propositions), which represent it. Apparently, it is more useful to relate to the data, information, and knowledge as sets of signs rather than as meaning and its building blocks.
\stopextract
My work profoundly agrees with the discoveries he made, though my research focuses far more on data and differentiates three different orders of data from his two.
\subsubsection[Galison]{Trading Zone}
Also of interest is the way that the interviewees demonstrated the idea of a trading zone. As Galison defines it\cite{Galison1997b},
\startextract
These considerations so exacerbated the problem [of physicists communicating] that it seemed as if any two cultures (groups with very different systems of symbols and procedures for their manipulation) would be condemned to pass one another without any possibility of significant interactions. Here we can learn from the anthropologists who regularly study unlike cultures that do interact, most notably by trade. Two groups can agree on rules of exchange even if they ascribe utterly different significance to the objects being exchanged; they may even disagree on the meaning of the exchange process itself. Nonetheless, the trading partners can hammer out a local coordination despite vast global differences. In an even more sophisticated way, cultures in interaction frequently establish contact languages, systems of discourse that can vary from the most function-specific jargons, through semi-specific pidgins, to full-fledged creoles rich enough to support activities as complex as poetry and metalinguistic reflection. The anthropological picture is relevant here. For in focusing on local coordination, rather than on global meaning, one can understand the way engineers, experimenters, and theorists interact. At last, I come to the connection between place, exchange, and knowledge production. Instead of looking at laboratories simply as the places at which experimental information and strategies are generated, my concern is with the site -- partly symbolic and partly spatial -- at which the local coordination between beliefs and action takes place. It is a domain I call the trading zone.
\stopextract
The requirement of locally true definitions applies across the original trading zones between cultures and, more interestingly, across the various cultures of physics. I noticed some evidence for trading zones in the interview material. Specifically, when the various participants referred to the terms \quotation{raw data} and \quotation{derived data,} they seemed to be using a local definition of data that did not correspond with their own philosophy, strictly speaking. Instead, they were referring to various sensor products that were, indeed, \quotation{raw data} to every member of the team.
The extension of the Galisonian trading zone concept is not new to this research. In fact, business researchers have used the idea of trading zones and some sophisticated ideas of boundary demarcation for quite some time, going so far as to use graphs and knowledge maps (as opposed to my \SDFN) to identify different groups. Wilson and Herndl use this methodology when they describe their understanding of knowledge maps and trading zones\cite{Wilson2007}:
\startextract
The knowledge maps we created and shared with project participants encouraged cooperation and mutual understanding rather than the slash-and-burn rhetoric of demarcation events. When technical experts discuss the parts and subfunctions they have made, they get to describe their local practice, explain their knowledge, and open up their community-specific lexicon within the ecological relations of the boundary object. As they trace the lines connecting the boxes on the knowledge map, participants articulate communities of practice: each distinct but also connected through the boundary object. Because it is plastic and robust, the knowledge map balances the demands of identification and division in Burke's terms. As boundary objects, the knowledge maps help to create a rhetorical space that is best understood through Galison's notion of the trading zone.
\stopextract
This methodological description of their work focuses on the anthropological task of identifying sub-cultures rather than language differences. Despite this, my analysis produced many of the same results as theirs (methodologically speaking, if not with respect to content), because both projects tried to understand different philosophies/cultures through the metaphor of trading zones.
Just as they extend the concept of the trading zone to consultants on the Washington beltway creating local definitions for a program already approved by the Pentagon, I extend the idea to how different philosophies of data interact. Their use of Graphviz-generated graphs as talking points for determining cultural ramifications matches my experience of using such graphs to generate philosophical insight:
\startextract
In the case we have been exploring, the knowledge map is crucial to the emergence of something like Galison's (1997) trading zone. Participants develop Galison's \quotation{possibility of communication and joint action} (p. 803) through the map as it emerges. The map continually structures how the team understands and explains the project.
\stopextract
The differences between their study and mine are pronounced. Nevertheless, just as they observed different groups creating temporary trading zones through the use of knowledge maps, I observed something similar with the pidgin concepts of \quotation{derived data} and \quotation{raw data.} While waiting for interviews, I saw researchers passing around sheets of paper with pictures of phenomena on them and referring to them explicitly as raw data. This practice almost certainly serves to inform the local definitions and create the trading zone necessary for successful research practice.
\subsubsection[Accent]{Evaluative accents}
An evaluative accent is the set of interpretive filters a recipient applies to incoming communication, changing its meaning according to the recipient's biases, shared understandings, and cultural mores. The concept was originally used to explore the effects of Marxist propaganda, but it is also an interesting way to explore how trading zones operate effectively.
V. N. Voloshinov suggests an idea of an evaluative accent\cite{Volosinov1994}:
\startextract
Any word used in actual speech possesses not only theme and meaning in the referential, or content, sense of these words, but also value judgment: i.e., all referential contents produced in living speech are said or written in conjunction with a specific evaluative accent. There is no such thing as word without evaluative accent.
What is the nature of this accent, and how does it relate to the referential side of meaning?
The most obvious, but at the same time, the most superficial aspect of social value judgment incorporated in the word is that which is conveyed with the help of expressive intonation. In most cases, intonation is determined by the immediate situation and often by its most ephemeral circumstances. To be sure, intonation of a more substantial kind is also possible. ...
All six [uses of a single word in a removed quote] by the artisans are different, despite the fact that they all consisted of one and the same word. That word, in this instance was essentially only a vehicle for intonation. The conversation was conducted in intonations expressing the value judgments of the speakers.
\stopextract
Beyond verbal intonation lie context and use. In a more modern sense, people of different generations use what amounts to steganographic\footnote{Steganography is the act of hiding messages within other messages, such that only someone who knows the pattern or encoding scheme can identify the hidden message.
Social steganography is the use of shared context to give one's social communications polymorphic (meaning-changing) interpretations, depending on the recipient and other available social cues.
} encryption in their status messages, relying on context and the source of quoted material to produce different meaning for different people.
The following example shows social steganography used to pass different meanings through a single Facebook post to the subject's friends and to her mother:
\startextract
When Carmen broke up with her boyfriend, she \quotation{wasn't in the happiest state.} The breakup happened while she was on a school trip and her mother was already nervous. Initially, Carmen was going to mark the breakup with lyrics from a song that she had been listening to, but then she realized that the lyrics were quite depressing and worried that if her mom read them, she'd \quotation{have a heart attack and think that something is wrong.} She decided not to post the lyrics. Instead, she posted lyrics from Monty Python's \quotation{Always Look on the Bright Side of Life.} This strategy was effective. Her mother wrote her a note saying that she seemed happy which made her laugh. But her closest friends knew that this song appears in the movie when the characters are about to be killed. They reached out to her immediately to see how she was really feeling.
\stopextract
The use of \quotation{Always Look on the Bright Side of Life,} as Boyd discusses, is an example of a successful steganographic encoding of a message. Her friends could decrypt the hidden message because they shared a private context of culture with Carmen, a shared evaluative accent\cite{Boyd2010}.
Although that process is called social steganography, its unintentional practice opens gulfs in evaluative accent. Failed obscure jokes are an example of incorrectly parsed communication: the obscure joke relies on a shared commonality to be correctly anticipated by the recipient, and this mode of receptive listening is informed by the evaluative accent. In more common use, the language in a business memo may be so full of \quotation{business-speak} that someone new to the company may misunderstand the references it provides. This misunderstanding is especially deadly when the statement still makes sense within the reader's incorrectly applied accent. Because the statement can be parsed by the listener, only the mismatch of reality models in later conversations can hint at the source of the problem: the misinterpreted statement.
Voloshinov believes that the evaluative accent partially belongs with the speaker, but also that there exist \quotation{side bands} of communication, such as intonation and body language, that are specifically interpreted by the recipient of any communication, in the recipient's context.
Literal intonation has very little to do with specific constructions of data, yet the term \quotation{data} is used in everyday, technical, engineering, and scientific speech. The full weight of the evaluative accent, as seen in the interviews, falls on the use of context and role. Although the evaluative accent was originally a way of framing ideologies, almost a post-hermeneutic way of explaining some failures of Marxism\cite{Voloshinov1929}, the idea can be combined profitably with the philosophy of the trading zone.
Two people misunderstanding the same word can hardly be the result of a simple linguistic error. When a data-as-subjective-observation person says \quotation{data} to a data-as-objective-hard-numbers person, both of them are using a \quotation{functionally correct} definition, a definition shared by many people. They are encountering a trading zone. As Galison states, \quotation{In the trading zone, where two webs meet, there are knots, local and dense sets of quasi-rigid connections that can be identified with partially autonomous clusters of actions and beliefs.}\cite{Galison1997b} My identification of different philosophies of data certainly corresponds with these diverse beliefs. And those beliefs inform the evaluative accents that people apply when they use and receive the term \quotation{data.}
While different uses of data may be boundary objects for deeper cultures,\cite{Chrisman1999, Collins2007, Gorman2002, Star1989} this minimal investigation can scarcely provide an anthropological look into the various research cultures in existence. What I can present is research into the intentional creation of a local language. The use of \quotation{raw} and \quotation{derived} as semiotic prefixes gives recipients who are {\em aware} of the indicator a linguistic cue to switch evaluative accents. The practice of forming local trading zones, by repeatedly presenting symbols to the other party in an environment where people are aware that bridging must occur\cite{Galison1997b}, is the non-ideological practice of causing sufficient cognitive dissonance in the recipients for them to \quotation{bud off} a new evaluative accent for interpreting the incoming sign-set.
The evaluative accent of the local definition of a word can be understood in the context of Peircian semiotics. As Atkin notes\cite{sep-peirce-semiotics}:
\startextract
In Peirce's theory the sign relation is a triadic relation that is a special species of the genus: the representing relation. Whenever the representing relation has an instance, we find one thing (the \quote{object}) being represented by (or: in) another thing (the \quote{representamen}) and being represented to (or: in) a third thing (the \quote{interpretant}).
\stopextract
The interpretant serves as the developed representation of meaning. Thus, we can understand local trading zones to be the product of evaluative accents present in the interpretant\cite{Chandler2001}.
Aims and Justification
\IST\ both suffers and benefits from its multidisciplinarity. One of the key tools taught in \IST-training programs is merely the ability to understand the jargon of the other sub-disciplines of \IST. As a database professional, I can still speak the terminologies of networking, web design, and enterprise systems. The process of communicating in these various jargons, however, necessitates different mental models of the world\footnote{Exploring computer problems and their solutions is an exercise in quickly changing levels of abstraction. Small, vital, and technical details fight tooth and claw against the broad vision of the designer\cite{Medin1989}.}. In addition, each of those individual levels will have its own uses for the term \quotation{data.} The field is not amenable to a single probe, and even if it were, each sub-discipline understands reality in its own way, as it must to solve problems according to the constructed protocols of that profession.
One thing underlying all of \IST, however, is the use of the term \quotation{data.} Every aspect of \IST\ uses data, but understandings of what constitutes that data differ significantly. Moreover, in this computerized age, everyone interacts daily with data to some degree. The difficulty lies in the question: what is data?
There is a need to understand how people understand data, because conflicting definitions of \quotation{data} inform communications. People's inherent conceptions of data inform how they interact with the constructed data of the world\footnote{From a pragmatic point of view, many linguistic elements are socially constructed, and our understanding of them is shaped by our linguistic interactions with other people. Data, being something categorized by humans, is a prime example of a linguistic construction\cite{Berger1967}.}. Some people consider data to be objective facts, others consider it to be subjective observations, and still others consider it to be electronically stored signs\footnote{For more details, see the results section, page \at[Summary].}.
When people discuss data, information, and knowledge, their understanding of data informs their understanding of information and knowledge -- whether they treat the terms as synonymous, follow Ackoff's 1989 hierarchy, or adopt any of the other hierarchies suggested by the literature\footnote{For more details, see page \at[Ackoff].}. When these different understandings collide, the best case is that the people involved recognize that they have different understandings and create a local trading zone with words that have functionally identical meanings to both people. In the worst case, both people use the term in the way to which they are accustomed, and errors go uncaught until large mistakes are made.
\stopcomponent
\startcomponent c_2_intent
\product prd_Chapter2_Justification
\project project_thesis
Data is defined by its use. It is a socially constructed term\footnote{While the term data, as language, is socially constructed, a large number of people feel that the content of data, as measurements of reality, cannot be so constructed\cite{Bruffee1986}.} rather than a reflection of some property of the universe. Therefore, data is subjective relative to the person using the term. I have identified a need to probe other people's understandings of data. It is easy to mistake professional training for a single, true definition of data. The problem with intuitive definitions is that their elegance may never be used or tested in reality. For research to be useful to practitioners, it must deal with the philosophical problems that they face, not add another definition onto the large heap. This research aims only to provide a tool and a reason for practitioners to use that tool.
\startcomponent c_2_1_goals
\product prd_Chapter2_Justification
\project project_thesis
I want to help improve communications, and I believe that a means for understanding different constructions of data could be one way to do so, a way that has not been thoroughly explored. It may offer a theory for explaining some errors in intergroup and intragroup communication. Furthermore, it may offer some direction for exploring the philosophical basis of error by offering another take on the system maps transmitted through communications\footnote{A system map is simply a person's internal mental model of how a thing operates and of how to get it to transition between different states. These maps may be communicated through instruction or alluded to\cite{Roy2008}.}.
While exploring our ability to define and communicate data to the people around us, I will lay a foundation for exploring the reality created by our use of data in computer systems. Our systems use data at multiple levels, from hardware and simulated hardware, through software, and into fantastic constructions and games that embody data and then produce data of their own. This study will not explore the various sub-constructions of data present on the Internet, in games, or in virtual worlds. Nevertheless, I hope that the methodology I create and validate can be applied to all sorts of computerized data: from traditional bits down a wire to a simulation of a physics experiment inside Second Life\cite{Brown2008}. As humans use and create all of these tools, our constructions of reality inform them. To serve that end, this research creates and tests a methodology that can probe people's understanding of data.
In database design, the hardest task is understanding the client's reality. Modeling an organization's current memory structures, its files and paperwork, and the relations among them as they exist in the minds of practitioners is extremely difficult. This methodology is a tool to facilitate that understanding: it may allow designers to learn what their clients think data actually {\em is}.
By understanding the type of data being modeled, database designers make two significant gains. First, their data models can correspond with how their clients think about reality, creating intuitive relationships and mapping the computerized model to the client's mental model more capably. Second, and in some ways more critically, they can then explain the database design {\em to} their clients in the clients' language, potentially shortening design times by reducing miscommunications.
In the same way, the proposed methodology should help extend normal modeling practice: simply making designers more aware of the different types of data constructions may make more responsive designs possible. The demonstration of different conceptions of data is important to designers because it offers another meta-aspect of reality to be captured and incorporated.
I also want to create a method that can help extend \HCI\ design practice. This methodology should be applicable to all sorts of design, as it is a tool for rendering clients' realities rather than a specific kind of technical reality. Discovering the practical meanings of terms, ideas, and affordances\cite{Norman1999} of data gives \HCI\ designers another tool for understanding how to render data presented in an interface. A tool that can make elements of private jargon explicit, and that is focused on that task (rather than treating it as a happy byproduct), can significantly contribute to the \HCI\ design cycle.
This research investigates individual constructions of data, because there is no clear consensus on the exact nature of data, much less on the exact nature of data in technical design. However, as there is no recognized domain of the philosophy of data, this research, as a more practical matter, must lay the simplest foundations for that multidisciplinary field.
My basic discoveries, both methodological and philosophical, should have pragmatic results. I hope to create a methodology that improves communication and database design. I explore how we socially construct and use the term \quotation{data}. From this investigation, I can offer potential insights into how we create trading zones between different cultures of data use. While a true understanding of the nature of data may be outside the scope of this present research, the construction of a foundation is not. Any methodology created must be robust enough to provide useful observations and a compelling story.
\stopcomponent
Methodology
\startcomponent c_3_
\product prd_prd_Chapter3_Methodology
\project project_thesis
The field of requirements generation is heavily overpopulated with methodologies. These methodologies, on the whole, presume that the participants attach similar meanings to the terms they use, especially when the terms are seemingly uncomplicated ones like \quotation{data,} \quotation{information,} and \quotation{knowledge.} The Social Data Flow Network is an elicitation methodology that can be used prior to normal requirements generation. It helps to map the shared and unshared components of a group's social construction of reality as it relates to data flows.
The \SDFN\ has its methodological roots in the data flow diagram (\DFD). Information technologists currently use the \DFD\ as a tool for probing current data flows within an organization. I have designed the \SDFN\ as a complement, allowing a practitioner to uncover an individual's subjective constructions of data, unburdened by the methodological constraints of the \DFD. The \SDFN\ can be used as an artifact for sparking discussion around practical definitions without the investigator having to enter the interview and ask participants about their personal constructions of data directly. Once data has been characterized by the participant, other requirements generation methods can be employed to extract a formal understanding of what is needed, paying special attention to where different individuals understand the components of the same process differently.
By allowing people to probe their own constructions of data, the \SDFN\ helps them to express their own understanding in their own language without worrying about being judged incorrect. By creating a sense of cognitive dissonance\footnote{While Festinger's original work is important here, I believe that this model represents the satisfaction of constraints imposed by the categorization of terms as per Schultz\cite{Festinger1957, Schultz1996}.} between the participants' application of categories and their theoretical definitions, the methodology discussed in this section serves as a way to illuminate how people understand the nature of data. It seems quite feasible to extend this methodology to other research endeavors.
\stopcomponent
\startcomponent c_3_SDFN
\product prd_prd_Chapter3_Methodology
\project project_thesis
This chapter documents the methodology I used in conducting the interviews with the company. It is organized into definitions, then an exploration of the \SDFN\ as a concept, and finally a practical discussion of running an interview centered on the \SDFN.
\stopcomponent
\startcomponent c_3_terms
\product prd_prd_Chapter3_Methodology
\project project_thesis
This section will introduce the major terms of the \SDFN\ and how those terms are used. The introduction of a new methodology, especially one borrowing from many different fields, is fraught with definitional dangers.
An entity is a noun: a role that can manipulate data. A flow is a noun, representing the {\em flow} of communication or symbols between entities. An entity dictionary is a way of brainstorming entities to make the participant feel more at ease.
\stopcomponent
\startcomponent c_3_entity
\product prd_prd_Chapter3_Methodology
\project project_thesis
An entity is something that plays a role in receiving, manipulating, or transmitting data. In the \SDFN, this act of input or output is represented by a noun described in a few words, which then have an oval bubble drawn around them. This bubble is a node in graph theory, with all of the corresponding attributes. The nature of the role is not restricted to a person or physical entity. It is anything that can be made-a-thing-of such that it performs an independent manipulation of data.
Roles are anything that can be conceptualized as an independent manipulator of data. What differentiates an interesting role from one worth skipping over is whether the role somehow transforms data passing through it. There is no restriction on the number of roles that can belong to one person or thing. Just as one person can do multiple jobs, one can also have multiple roles. One hypothesis for future testing is that the role determines the perceived affordances of data. Every role has its own unique activities and therefore uses data in its own way. As the framing of the role changes, the definition of data may change along with it.
Participants should never describe themselves as a singular entity, because such a description is ambiguous. Describing an entity as \quotation{self} is ambiguous to other people reading the chart, who do not understand the tacit assumptions of role and interaction from the same perspective as the participant. Instead, participants should articulate the potentially many roles that they play in an organization as separate entities. While having participants self-articulate roles adds a certain artificiality to the interview, the self-identification of roles also allows participants to adopt some of the framing of those roles. The increased precision gained from artificial role distinctions is worth the contrived nature of the process.
Every role should be unique. However, there is no requirement for a one-to-one mapping from person to role. Because people and things are adaptable and can serve many roles in an organization, artificially forcing the participant to select one and only one definition of self would be contrary to the intent of this exploration. This flexibility allows and encourages people to represent passing information to themselves in the guise of the different roles they play.
The avoidance of ambiguity is crucial. It makes the \SDFN\ easier to interpret by other people and it forces the individual creating the diagram to define the nature of the entity explicitly. It is far too easy to use the self as a catchall to avoid the cognitive dissonance of thinking about thinking. It is important to document discrete and unambiguous roles, even though they may map to the same person, because it is the role that understands data, not the person. Furthermore, these different roles-as-self can pass data to one another. I used the following example in the interview: An entity as lecture designer (myself) would pass requirements to the database lab developer (myself) who would pass data to the lecturer (myself). Each of those roles has different requirements for the nature of data. Crucial insights would be lost if they were all collapsed into one entity with self-pointing flows of data.
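To make the roles-as-self example concrete, here is a minimal sketch of how those three roles might be digitized. It is written in Python against the open-source graphviz bindings; the binding, the entity names, the flow labels, and the category assignments are illustrative assumptions, not material drawn from the interviews.
\starttyping
# Minimal sketch: one person rendered as three separate role-entities.
# The graphviz Python package and all names here are illustrative.
from graphviz import Digraph

sdfn = Digraph("roles_as_self")
sdfn.attr("node", shape="oval")  # entities are oval bubbles

# One person, three distinct roles, each its own entity.
sdfn.node("designer", "Lecture designer (self)")
sdfn.node("developer", "Database lab developer (self)")
sdfn.node("lecturer", "Lecturer (self)")

# Flows between the roles; the categories are assumed for illustration.
sdfn.edge("designer", "developer", label="requirements (information)")
sdfn.edge("developer", "lecturer", label="lab data (data)")

sdfn.render("roles_as_self", format="pdf", cleanup=True)
\stoptyping
Collapsing all three bubbles into one \quotation{self} node with self-pointing edges would hide exactly the role distinctions the diagram is meant to surface.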
Entities, however, should not be ready-to-hand\footnote{Ready-to-hand roughly means tools that form an unconscious extension of the self. However, I will avoid a discussion of Heideggerian Dasein and other terminology. To learn more: Dreyfus's discussion of Heidegger is not too painful (p. 230 for ready-to-hand), and Marshall uses the idea in interface research\cite{Dreyfus2004, Marshall_2003}.}. Devices that take on independent roles are fundamentally different from those that function as parts of another entity. The keyboard used for typing these letters into this document should not be considered an entity in the \SDFN\ sense because, except when engaging in this self-reflexive behavior, the entity \quotation{author of dissertation} does not explicitly pass data to the keyboard; rather, the \quotation{author of dissertation} passes data to the computer for processing. The keyboard is part of a larger entity and does not manipulate data in my own construction of data. Instead, as an input device, the keyboard is an extension of the computer and represents an interface for the electronic recording of symbols.
At the same time, entity creation rules should not be strictly enforced, as each person may have his or her own conception of what an entity could be. A role can be a person, machine, place, or group. An entity is any noun that the interviewee regards as meaningfully sending or receiving data. Participants must define their own entities, as their own conceptualization of roles is one of the strongest sources of insight into their construction of data.
\stopcomponent
\startcomponent c_3_flow
\product prd_Chapter3_Methodology
\project project_thesis
A diagram consisting solely of entities, known as an entity dictionary, is not particularly useful. To represent relations among these nouns, however, we need flows. A {\em flow} indicates a transfer of something between one entity and another. We are concerned with the nature of the transfer rather than the act of the transfer; a verb describing how the transfer is accomplished is not particularly useful during categorization.
This expressed relationship will usually be a self-categorized flow of data, information, or knowledge. These flows are edges, represented by arcing lines between entities; most flows link one entity to another, singular entity. There is little objectivity in these indications of relationship. A flow represents a documented relationship, instantiated from the participant's understanding of reality, not necessarily a true thing in the shared reality of all participants. As the \SDFN\ is intended as a tool for exploring constructions of data, there is little need to find a design that corresponds to the real world and the stakeholders' needs. On the contrary, the subjective expressions of reality can be compared against each other to identify where areas of miscommunication arise.
Practically speaking, flows must be represented as arcing lines between one or more entities. The arc allows readers to differentiate the labels of each flow, with a clear distinction between the over and the under component. Recursive flows, which link an entity to itself, are discouraged, as they tend to represent ambiguous and broad entities. Participants selecting recursive flows should be encouraged instead to delineate the starting and ending roles as entities more clearly.
Each flow has a label and a category. The label describes the content of the flow. The category relates the flow to other flows and ideas in the diagram. Each flow, above the arc, should be labeled with the {\em contents} of the flow. The label is a one- to three-word description of the \quotation{stuff} being transmitted. This description must be unambiguous and unique to the contents of the flow. If two entities are transmitting the same content, care should be taken to ensure that the exact same thing is being transmitted. Minor content variations should be indicated by adding adjectives or other modifiers to the name: \quotation{Results} becomes \quotation{Summary of Results} and \quotation{Formal Results.} Each label still indicates a result being transmitted, but the differing nature of each thing changes how it is understood. Reducing ambiguity is the job of the interviewer and is one of the hardest parts of conducting a \SDFN\ session.
For practice in clarifying the nature of results and in the type of thinking needed to conduct an \SDFN\ session, I recommend the game called \quotation{Zendo} by Looney Labs\footnote{The rules of Zendo can be found here: http://www.koryheath.com/games/zendo/
The essence of the game is that players, through the use of transparent colored pyramids, must use inductive logic to find a \quotation{secret rule.}
An example of a secret rule is \quotation{A [set of pieces] [is true] if it has at least one green piece.} By rating the players' constructions as true or false, the leader of the game describes a universe of discourse with the secret rule as the governing element.
The critical element of the game, for purposes of this research, is that the leader of the game must, by the rules, remove ambiguity from any guesses the players may make. \quotation{Clarify the Guess: If the Master does not fully understand your guess, or if it is ambiguous in some way, the Master will ask clarifying questions until the uncertainty has been resolved. Your guess is not considered to be official until both you and the Master agree that it is official. At any time before that, you may retract your guess and take back your stone, or you may change your guess. If any koan on the table contradicts your guess, the Master should point this out, and you may take back your stone or change your guess. It is the Master's responsibility to make certain that a guess is unambiguous and is not contradicted by an existing koan; all Students are encouraged to participate in this process.} The process of clarifying guesses to eliminate ambiguity is directly analogous to clarifying entities and the labels of flows. Besides being a fun game, it is crucial practice for getting a feel for the level of precision required in the \SDFN.}. Specifically, if one can run multiple sessions of the game successfully, the same skills in clarifying statements and assessing the nature of things will be used in this methodology.
The core of the \SDFN\ is the process of categorization. The \SDFN\ encourages participants to discriminate and categorize flows. By relating different flows through the use of category, it is then possible to induce the definition of the category through its flows. The category should be written under the arc. In computerized renderings, the over/under distinction is less important, so long as the label and category of the flow are clear.
When creating the flow, the participant should first be prompted to label and then to categorize the flow according to a pre-formulated short list of categories. This list of categories should contain the most common expected categories of participants. By prompting the participant with a list, the interviewer focuses the categories on the topic of the participant's choice. However, participants should always be able to add their own categories to this list. For example, in the interviews, I always prompted participants with \quotation{Data, Information, Knowledge, or Other.}
There should always be the option for Other. But the Other category should never remain as Other; the participant should name it. Some participants used categories such as Emotion, Wisdom, or Request. In no case was a flow allowed to remain Other. These new categories were created on the fly and used as part of that participant's diagram from then on.
At the same time, people should not classify their own domains without any initial guidance. All but the most self-reflective will be paralyzed by the many choices available and not entirely clear on the distinctions the interviewer wants them to draw. Thus, my question took the form of \quotation{Data, Information, Knowledge, or Other} rather than \quotation{How would you categorize this flow: data or not-data?} Denoting sample categories creates a negotiable universe of discourse for the categories.
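As a minimal sketch of this prompting rule, the following Python fragment mechanizes the category question, including the requirement that \quotation{Other} never remain \quotation{Other.} The helper function and session structure are hypothetical illustrations; the study itself used verbal prompting, not software.
\starttyping
# Minimal sketch of the categorization prompt. The helper name and the
# session structure are hypothetical; only the rules follow the text above.
def categorize_flow(label, categories):
    """Ask for a category; "Other" must be renamed by the participant,
    and any new category joins the list for the rest of the session."""
    options = ", ".join(categories + ["Other"])
    choice = input(f"Categorize the flow '{label}' ({options}): ").strip().title()
    if choice == "Other":
        choice = input("Please name the new category: ").strip().title()
    if choice not in categories:
        categories.append(choice)  # e.g. Emotion, Wisdom, or Request
    return choice

# The interviewer's pre-formulated starting list:
session_categories = ["Data", "Information", "Knowledge"]
# Example call: categorize_flow("Summary of Results", session_categories)
\stoptyping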
\placefigure[]
[fig:flow]
{A trivial SDFN used to illustrate the idea of \quotation{flows} and wormholes. Crossed lines become unbelievably messy, and so the \quotation{wormholes} are a far better method for routing lines across other lines.}
{\externalfigure[Chapter3/SDFNWormhole.pdf][factor=fit,frame=on]}
If a diagram becomes too crowded, it is quite acceptable to make \quotation{wormholes} on the paper design during the interview. A wormhole is some symbol (usually an *) and accompanying identifier\footnote{A character, number, or unique symbol all serve well.} placed more than once on the paper. Each symbol sharing an identifier should be considered connected, which may allow for easier routing. In extreme cases, a wormhole may have one arrow leading from it to represent the inbound connections of all of the flows connected to the other wormhole. The only real restriction is that the creation of wormholes should be unambiguous and clear both at the time of creation and afterwards. Sometimes, in the case of major changes, it is better to redraw the design than to use too many wormholes. This action is sometimes quite desirable, due to the edits that the participant may introduce in the entities, flows, or topology on the second draft.
\stopcomponent
\startcomponent c_3_ED
\product prd_prd_Chapter3_Methodology
\project project_thesis
Entities and flows are the core parts of any \SDFN\ diagram. However, not all participants find it easy to understand the nature of entities. For that reason, and as a precursor to group-based \SDFN\ creation, I engaged some participants in the creation of an entity dictionary, a simple list of entities that may be involved in the \SDFN.
An entity dictionary is a simple, non-authoritative brainstorming device for cases where the participant is unsure about where to start. Instead of starting the \SDFN\ with two entities and a flow connecting them, I encourage the participant to imagine all the different entities with which they engage on a daily basis, to name them, and to describe their roles. The immediate feedback, both positive and negative, on each described entity teaches participants to think in terms of roles. Once they have filled a page, most will have internalized the meaning of entity.
Through the creation of this entity dictionary, a number of interesting themes will appear, based on participant enthusiasm or repetition. I was especially careful to pay attention to offhand comments about entities or the participant's work during the creation of the dictionary, as these comments most likely indicate interesting topics for the interview. The dictionary should be started by encouraging each participant to name an entity that represents them in some role; the scope should then be gradually broadened to the things and people they work with.
To those familiar with the \DFD\ methodology, the idea of the entity dictionary is almost completely opposite to that of the \quotation{data dictionary} of the \DFD. While the Data Dictionary is a device for the accurate specification of data in the data flows, compiled during and after the creation of the diagram, the entity dictionary is a piece of scaffolding designed to help participants think the right way about entities.
Unlike a data flow diagram, the entity dictionary is not authoritative\footnote{Authoritative: a canonical listing and extremely precise description of the structure and components of variables.}. In a \DFD, all flows must be decomposed\footnote{Decomposed: simplified by breaking the components of a flow (or transformation) apart into separate components. An example of a decomposition may be an \quotation{Address} flow that is subsequently decomposed into four flows: \quotation{street address + city + state + postal code.} In the same way, a transformation can be decomposed: \quotation{Mail a letter} can be decomposed into \quotation{Look up address -> Find Zip Code -> Assess Postage -> Attach letter.}} to their atomic definitions\footnote{For example, a \quotation{string} is defined as a \quotation{series of characters from a to z and A to Z, as well as numbers, spaces, and punctuation.} This level of excruciating detail is necessary for accurate implementation in a computer.}, which correspond with database or programming structures. This requirement exists because the \DFD\ has its roots in program design, and therefore must be able to explicitly define the data structures of a program. Because the \SDFN\ probes a non-computerized theoretical area, the requirement of precision is unnecessary and counter-productive, as it distracts the participant from the task. The object of the \SDFN\ is to probe functional definitions, not to have all participants arrive at the same constructed definition of the \UoD.
\stopcomponent
\startcomponent c_3_create
\product prd_Chapter3_Methodology
\project project_thesis
This section will describe the process of creating a \SDFN\ in full, documenting the methodology used in the study presented in the next chapter. In brief, the \SDFN\ begins with an explanation of terms, a summary of the ideas expressed above. If participants do not understand the nature of entities, an entity dictionary should be created. When participants understand entity and flow, a topic is chosen and the diagram is created. After the creation of the \SDFN, it is used as an artifact in a short open-ended interview for self-reflection.
\stopcomponent
\startcomponent c_3_starting
\product prd_Chapter3_Methodology
\project project_thesis
The terms and nature of the \SDFN\ should be gently explained to the participant. If they seem unsure, engage in the brainstorming tactic of creating an entity dictionary before proceeding. From conversation during the introduction, the participant and interviewer should agree upon the topic, referred to as the universe of discourse\footnote{The universe of discourse is the bounded realm under investigation.}. The topic should be drawn from the participant's common work experience, to give them sufficient memories to draw upon. The pre-defined topic sets the limits of exploration and acts as a \UoD. When those limits have been reached, the \SDFN\ is completed. During the interview, avoid using terms like \quotation{universe of discourse} or even \quotation{ready-to-hand,} because the jargon only distracts from the topic.
Instead, in the small chat at the start of the interview, ask them to speak about their job. Then ask open questions about \quotation{interesting} aspects of their job, ones that involve data in some way. This casual conversation is crucial for reassuring the participant and steering the direction of the discussion. If the small talk is not enough, move to the creation of a heavily scaffolded entity dictionary to see which entities they are most interested in, and thereby define the universe of discourse.
Participants begin by describing or selecting two entities within the universe of discourse. One entity should be a role associated with the participant, for ease of imagination; the other can be anything with which the participant interacts. Quite a lot of prompting will generally be necessary during this first interaction. Prompting should take the form of open-ended questions, guiding the participants to first establish their own roles as entities. Once they have described themselves, they should identify the role they interact with as another entity, and then be guided into describing a flow between those entities. Through the use of guiding open-ended questions, and with the interviewer serving as scribe, each participant should create the entities and flows, label each flow, and then categorize it. The interviewer should never label or categorize anything.
This process of identifying flow and entity should continue in an iterative loop until the interviewee starts tiring. Generally, the topic will be sufficiently broad for a 30- to 40-minute diagramming session. If you end earlier, repeat with a different topic in a new diagram. After identifying the first pair of entities, however, the order changes. The subject should be encouraged to identify a flow first, and then to add entities as necessary. Participants should only create one new entity at a time, and then try to relate other entities to that one.
The graph, for purposes of clarity and ease of expression, should remain at least weakly connected\footnote{Roughly speaking, a weakly connected graph is one in which every entity is attached, ignoring the direction of the flows, to every other entity present in the \SDFN. While \quotation{subgraphs} (groups of entities not connected to the rest of the graph at all) are possible, they tend to increase confusion and should be dealt with separately.
\quotation{A directed graph is called weakly connected if replacing all of its directed edges with undirected edges produces a connected (undirected) graph. It is connected if it contains a directed path from u to v or a directed path from v to u for every pair of vertices u, v. It is strongly connected or strong if it contains a directed path from u to v and a directed path from v to u for every pair of vertices u, v. The strong components are the maximal strongly connected subgraphs.} Wikipedia -- http://en.wikipedia.org/wiki/Connectivity_(graph_theory)
}. Isolated subgraphs should be moved to their own papers and explored as completely different graphs. Separating the graphs can create distance between the topics. This distance allows one topic to be completed and then another role to be assumed when talking about the other topic.
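When a diagram is later digitized, weak connectivity can be checked mechanically. The sketch below uses the networkx library, which is my own choice for illustration rather than part of the original workflow; the entity and flow names are likewise illustrative.
\starttyping
# Minimal sketch: checking that a digitized SDFN stays weakly connected.
# networkx is an assumed tool; entity and flow names are illustrative.
import networkx as nx

G = nx.DiGraph()
G.add_edge("Researcher", "Analyst", label="Summary of Results")
G.add_edge("Analyst", "Archive", label="Formal Results")

if not nx.is_weakly_connected(G):
    # Isolated subgraphs should be moved to their own sheets and
    # explored as completely separate diagrams.
    for component in nx.weakly_connected_components(G):
        print("Separate subgraph:", sorted(component))
\stoptyping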
The interviewer has a number of tasks during this process. He or she should provide enough scaffolding\footnote{Scaffolding: structured guidance to the participant to reduce choice paralysis and help direct them to the correct actions in the circumstance. Different people need different amounts of scaffolding, and it can be progressively removed as the participant learns what they need to do. The metaphor is well discussed by Stone in relation to children's learning, but can also be applied to interface design\cite{Stone1998}.} so that the interviewees feel comfortable in suggesting their own flows and entities. Progressively less scaffolding will be required, as the first few entities will provide both positive and negative feedback. It is vital to gently clarify incorrect entities and flow descriptions the moment they are suggested. The interviewer must ensure each flow added is unique, understandable, and directed. Although correcting flows after the fact is encouraged as the participant refines his or her terms and understanding of the diagram, ambiguity must be caught immediately, before it can sabotage the \SDFN.
The process of clarification can be seen in this transcript:
\startextract
Interviewer: What other flows are there?
Participant: Well, it just sends back results.
Interviewer: Same results or are these results different from these results?
Participant: They are different. But not in nature. Just in ... obviously, I'm not going to take every result I take from the code and send it on. Because that would be ridiculous.
\stopextract
When a participant uses an entity word in a different way, it is important to catch the usage and ask questions about it. Clarifications also serve as negative results, as \quotation{what do you mean by that?} changes their mental term for an entity as the term is refined. In contrast, simple and low-key responses like \quotation{Cool, so would you classify that as data, information, knowledge, or other?} are positive feedback, indicating a mild approval and acceptance of the concept. In the beginning, it is better to be more detailed about the nature of entities so that, by the end of the interview, the labels on flows and entities are just flowing naturally.
This iterative approach is also useful because it avoids lengthy theoretical explanations at the start of the interview, which may bore the participant, make them hostile (as some do not like being lectured to), or simply be wasted because they are not listening anyway.
The objective of the \SDFN\ is a page or two of bubbles connected by arcing flows\footnote{See figure \in[fig:flow].}. This paper graph can be trivially digitized in Graphviz. Graphviz is a graph layout program that accepts a text description of the desired diagram and then renders it graphically. The application of Graphviz to the problem saves significant post-processing time in labeling and diagramming flows. Although this research was rendered in Graphviz on Linux workstations, any program that can render graphs can be used for post-processing.
Post-processing involves roughly three steps. First, in one file, describe the list of entities and the relationships among them\footnote{See appendix B page (\at[AppendixGraphviz]) for code.}. Entities should be defined first with distinct labels. The distinct labels are very useful because they provide a way to ensure the quality of the subsequent graph. Graphviz is quite permissive with entities. Typos in entity names in either the entity or relationship section will be happily accepted as valid input by the program. Identifying unusual entities that are not expected on the final output is a great way of checking for node validity.
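This node-validity check is easy to automate. The following minimal sketch, with assumed entity names and a deliberate typo, flags any edge endpoint that was never declared in the entity section:
\starttyping
# Minimal sketch of the node-validity check: an edge endpoint never
# declared as an entity is probably a typo Graphviz accepted silently.
declared_entities = {"researcher", "analyst", "archive"}    # entity section
edges = [("researcher", "analyst"), ("anlyst", "archive")]  # note the typo

for tail, head in edges:
    for endpoint in (tail, head):
        if endpoint not in declared_entities:
            print(f"Undeclared entity (possible typo): {endpoint!r}")
\stoptyping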
Edge validity can be checked by counting the total number of edges of each entity\footnote{Starting from the top of an entity, make a tiny mark at the edge chosen, then circle clockwise around it, counting each edge.
The count should be the same for the entity on paper and the entity rendered in the computer-based visualization. This practice is more effective than counting every edge in the diagram at once because, when the count is off, it is easier to figure out which edge is missing; it is also faster than comparing edge by edge.
There should be a 1:1 relationship between the connectedness of the paper original and the digitized diagram. By counting the number of edges around each node and comparing that total to the original graph, one can trace errors in the diagramming to specific entities and then fix those errors.
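The same per-entity tally can be computed from the digitized graph and compared against the hand count. Here is a minimal sketch, again using networkx as an assumed tool with illustrative names:
\starttyping
# Minimal sketch of the edge-validity check: compare each entity's total
# edge count (in plus out) in the digitized graph to the paper hand count.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([("researcher", "analyst"),
                  ("analyst", "archive"),
                  ("researcher", "archive")])

paper_counts = {"researcher": 2, "analyst": 2, "archive": 2}  # hand-tallied

for entity, count in paper_counts.items():
    digitized = G.degree(entity)  # DiGraph degree = in-degree + out-degree
    if digitized != count:
        print(f"Mismatch at {entity}: paper {count}, digitized {digitized}")
\stoptyping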
\subsection{Running an Interview}
This section will discuss the necessary items and methodology for running the interview. There are only two physical requirements: good paper and two good recorders. A backup recorder is essential because these interviews are impossible to duplicate: as people resolve their internal cognitive dissonance throughout the interview, their answers change. It is therefore impossible to re-run an interview, though follow-up interviews tend to be quite fruitful; it is important to prepare for all the ways in which an interview might fail. A repeated interview covering the same ground should instead focus on a discussion of categorization choices on the interviewee's already completed \SDFN. The \SDFN\ should have clarified their internal thinking as to what their personal construction of data was, so all that remains is to re-record their ideas.
During my interviews, I used a mini-recorder and my laptop. The laptop, despite being large and distracting, served as an excellent recording device because it recorded directly in the audio post-processing program Audacity. Audacity is highly recommended both as an interview-recording program and as a sound post-processing program. It is important to process the recordings before transcription due to inevitable background, A/C, and RF noise. Phones should be turned off during the interview as they generate inordinate amounts of RF noise that can severely corrupt the recording.
Ease of access to the recordings limits the utility of many mini-recorders. Extracting recordings from some recorders involves considerable effort and requires proprietary software and cords. It is important to test the full extraction process on all of the candidate recording devices before engaging in an interview. If it is not easily feasible to extract common file formats from the device, select another device. Optimally, the device will produce an mp3 audio file, as that is the {\em de facto} compressed audio standard. Voice, being easily compressible, is a perfect fit for mp3, and many hours of recording can be stored with ease. The older uncompressed format (wav) is also suitable, being compatible with any modern computer. The wav file sizes are, however, much larger. Before the interview, make sure there is sufficient space on the devices for twice the estimated interview length.
Paper selection is significantly easier. A large pad of paper is sufficient, though higher-quality pads are desirable, as they tear less easily and absorb the ink from the pens. Fast-drying pens are preferred, though any tip will work. Each sheet of the pad should be labeled as it is used with the number of the interview, the page count, and the date. In case the pages are arranged out of order, this information is sufficient to reconstruct the drawing order and interview.
\stopcomponent
\startcomponent c_3_timing
\product prd_Chapter3_Methodology
\project project_thesis
The interviewer should allocate around 15 minutes on both sides of the interview for equipment preparation. During the interview, another 10 to 15 minutes should be spent on breaking the ice and making the interviewee comfortable. Creating the \SDFN\ will take half an hour to an hour, depending on how complex a diagram the participant desires. Although it is theoretically possible to compile the answers for a \SDFN\ diagram very quickly, try to encourage the participant until either a page is filled or he or she is clearly horribly lost.
A subsequent discussion, once the \SDFN\ has been completed, is completely optional. Some of the people interviewed want to reflect, whereas others, uncomfortable with the procedure, do not. Given this variation, there is no standard duration for the discussion: it can go as long as the participant would like, though it is unusual for it to go more than half an hour. If the participant is still interested after half an hour, attempt to schedule a second, follow-up interview. The \SDFN\ creation tends to be quite draining, and new insights may appear after a few days off for internal self-reflection.
Preparation is fairly trivial with enough pre-interview logistics work. It is important to have liquids and treats for both parties available, as there will be a significant amount of oral discussion. In the meeting area, try to position the discussion around a corner of a table. Having large separation between the interviewer and participant is contraindicated on both theoretical and practical levels. Theoretically, it is a bad idea to introduce any sense of distance or remoteness, as it will just increase the difficulties of icebreaking. Practically, the sheet will change hands many times, and a short distance will allow both parties to read edits and additions as they happen. Normally, the interviewer will serve as scribe to render the participant's descriptions in a common and consistent format. The participant should nevertheless see what is being scribed in real time, to offer feedback and corrections of his or her own thoughts.
The final element of preparation is to ensure the operation of both recorders. The recorders should be positioned out of the direct line between interviewer and participant. If possible, they should be positioned to pick up the participant clearly and isolated from the table to reduce the thumps and scratches transmitted by the table. Recording devices will pick up hand movements and emphatic gestures that hit the table depressingly well.
Have some sort of subtle timing device to ensure the interview is proceeding according to schedule. Make sure that it is possible to look at the timer without disrupting the concentration of the participant. A watch is a poor choice in this regard, as looking at a watch is a social cue for many people. Cell phones pose a similar problem (and in any case, they and other radio devices should be off during the interview to prevent transmission interference). Try to record at least 30 seconds of silence before the interview begins, and disable any voice-activation option that skips silence on the recorders, because both measures will help with post-processing operations.
\stopcomponent
\startcomponent c_3_conducting
\product prd_Chapter3_Methodology
\project project_thesis
The interview necessary to explore a participant's individual construction of data has three phases. Initially, the interviewer should collect demographics through the introduction. The introduction is primarily a means to defuse anxiety and to gain the critical basis for comparison between parties. The second phase is constructing the \SDFN. The \SDFN\ exposes the participant's practical understandings of data through repeated categorization. The final component of the interview, the denouement, tends to be a discussion of the participant's understanding of data as uncovered by the \SDFN. As the process can cause a construction to change as it is articulated, this self-reflection period is an excellent opportunity for the participant to air their thoughts and describe their new or revealed understanding.
\stopcomponent
\startcomponent c_3_conducting_intro
\product prd_Chapter3_Methodology
\project project_thesis
The introduction serves multiple purposes. Primarily, it defuses anxiety, establishes the background of the subject, and creates a scaffold for the intuitive prompting of the \SDFN. In these interviews, people display many different sources of anxiety.
The most common is a sort of performance anxiety, wherein participants do not believe their opinions are sufficiently privileged to describe their \quotation{understanding of data.} Another common difficulty is job anxiety. Participants may feel that they are revealing secrets of their job to an outsider who, either as a spy for management or for some other reason, would steal the secrets to the participant's detriment. It is vital, in this stage, to reassure the participants of the intent of the interview and to make them feel in control, as they in fact are.
The other goal of the introduction is to provide the interviewer with an understanding of the background of the participant. This background provides demographics and hints at the topic of the \SDFN\ diagram. By investigating participants' work and educational experience, it is possible to gather data regarding any possible links between work, education, and their understanding of data. Understanding educational background is also important because education shapes the nature of the jargon a person uses and is an explicit way of changing vocabulary.
As the participant discusses their work experience, especially in relation to their understanding of data, incidents that are important to them will arise. By drawing out these incidents for any significance to data flows, one can choose a topic for the \SDFN\ that is both engaging to the participant and fruitful for examination during the \SDFN. If repurposing this methodology for other tasks, the task-specific goals should be emphasized at this point, because by choosing a topic for discussion, the participant is implicitly assuming a role and engaging a particular mindset.
After the participant engages in the discussion, it is important to explain the nature of the \SDFN. Lightly explain flows and entities, the purpose of the diagram, and the nature of categorization. This explanation should be far less philosophical than even the descriptions presented above. A flow, to participants, is \quotation{any flow of data, information, or knowledge between one entity and another}; an entity is \quotation{a person, place, or thing that can interact with the flows.} This is a significant point of divergence for participants. Some people will understand the nature of entities quite clearly, as shown by their body language, and others will not. If it looks as though the participant does not understand, correct that problem by building an entity dictionary. The discussion of categorization should explain that, \quotation{The content of the flow will go above the flow. Content is roughly what is flowing between the two entities. Then I'll ask you to categorize the nature of the flow, whether it's data, information, knowledge, or other.}
If the participant is confused about entities, help them to create an entity dictionary. Ask them to describe typical entities from their workday and to describe themselves in various roles. Then ask them to describe other roles and things with which they work. The building of the entity dictionary provides the maximum scaffolding for teaching them about the nature of entities.
\stopcomponent
\startcomponent c_3_sdfn_build
\product prd_Chapter3_Methodology
\project project_thesis
After the participant was comfortable with the topic, and an entity dictionary had been built (if appropriate), the \SDFN\ began. I avoided asking for definitions of data, so as not to contaminate participants' categories with half-remembered definitions from their student days.
The \SDFN\ is designed to encourage participants to intuitively define their understanding of data. Classification is a way of probing operational (rather than theoretical) understanding. Repeatedly confronting people with their \quotation{gut reactions} creates a cognitive dissonance\footnote{See Schultz for a theory of cognitive dissonance\cite{Schultz1996}.} between the theory and practice that the participant will articulate during the process.
It was important to engage the participant as a subject-matter expert. The \SDFN\ should explore a safe topic within the subject's expertise. A project, a process, or everyday interactions are excellent topical areas, as long as the participants have a strong familiarity with the domain. The choice of topic is important because it empowers the participants. Their experience in the domain reduces their uncertainty and fears of being wrong. Explaining what you do every day and are good at to someone who is interested and willing to listen also tends to be pleasant for most people, because of the validation\footnote{Validation is a confirmatory statement that increases a person's self-worth\cite{Leary2005}.} inherent in the discussion.
\stopcomponent
\startcomponent c_3_sdfn_method
\product prd_Chapter3_Methodology
\project project_thesis
\placefigure[]
[fig:flowchart]
{This flowchart shows the complete \SDFN\ interview process; different portions of this chapter refer to different elements of it.}{\externalfigure[Chapter3/SDFNFlowchart.pdf][factor=fit,frame=on]}
When constructing the \SDFN, I acted as primary scribe. While the participant should have access to a pen so that he or she can scribble corrections, the interviewer will do most of the drawing. Because the activity of the \SDFN\ is to iteratively construct flows of \quotation{data, information, and knowledge} between entities described by the participant as the subject-matter expert, this section will discuss the structure I provided to participants.
Describing a flow always began with entity declaration: the participant declared which two entities the flow ran between, and then declared the flow itself. My prompting for categorization changed throughout the interview. Initially, the questions were quite explicit: \quotation{Who starts the flow? What do they do?} In this high-scaffolding variant, I explicitly identified the source of the flow and then guided the participant to identify the destination and then the nature of the flow. By reducing each question to the smallest possible part, I helped the participant avoid the confusion of juggling too many unfamiliar things at once. Early in the interview, breaking the questions into tiny sub-questions allowed for prompt feedback. As participants learned through positive and negative reinforcement, their growing awareness of expectations reduced the need for such sub-questions.
I mentally examined each entity before committing it to the diagram. When validating the source entity, my decision tree was simple: Does the entity already exist on the diagram? If not, is the role the participant described short and unambiguous? Entities must have short names, as names take up valuable space: if an entity bubble grows beyond about an inch in diameter, it crowds the page and is likely to force a redrawing of the diagram.
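This validation step can be rendered, purely for illustration, as a small routine. The following is a minimal sketch of the decision tree described above, assuming a hypothetical \quotation{short name} threshold; the real judgment was made mentally during the interview, and none of these names correspond to actual tooling.
\starttyping
# Minimal sketch of the entity-validation decision tree (illustrative only).

MAX_NAME_WORDS = 3  # assumption: "short" taken as roughly three words

def validate_entity(name, diagram_entities):
    """Decide what to do with an entity the participant has just named."""
    if name in diagram_entities:
        return "reuse the existing entity bubble"
    if len(name.split()) > MAX_NAME_WORDS:
        # Long names crowd the bubble and may force a redraw of the diagram.
        return "ask the participant for a shorter, unambiguous role name"
    diagram_entities.add(name)
    return "draw a new entity bubble"

entities = {"Dissertation Author", "Examiner"}
print(validate_entity("Casual Reader", entities))  # draw a new entity bubble
print(validate_entity("Examiner", entities))       # reuse the existing entity bubble
\stoptyping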
I encouraged the participant to generalize to the point that the entity could reasonably cover all objects in its class. \quotation{Person} is too vague; it gives the reader no sense of what role the person occupies. \quotation{Brian} is both too vague and too specific: it identifies a specific person, but does not suggest his role (thereby requiring more explanation) and does not allow for similar types of person. Good entities would be \quotation{Dissertation Author,} \quotation{Casual Reader,} or \quotation{Examiner.} Each generalizes to one and only one role; each is simple (few words); and each allows anyone who fits the role to be classified without unnecessary recourse to edge cases. By drawing examples from my personal experience, I could present the same scenario consistently across multiple interviews without appearing to read from a script.
With the source constructed, the participant defined the destination using the same methodology. Later in the interview, the scaffolding could mostly be withdrawn and the entire process summarized with an \quotation{And then?} once the participant understood what was being asked. The transition was gradual rather than abrupt, and was predicated on the participant's error rate: if errors increased, I increased the scaffolding to compensate.
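The adjustment can be pictured as a simple feedback loop. The sketch below is an analogy only, with invented thresholds; in the interviews the adjustment was an interviewer's judgment call, not a computed quantity.
\starttyping
# Illustrative feedback loop for scaffolding level; all thresholds are invented.
# Level 0 is the bare "And then?"; level 2 is fully decomposed sub-questions.

def adjust_scaffolding(recent_errors, current_level):
    """Raise scaffolding when the recent error rate climbs; lower it with fluency."""
    error_rate = sum(recent_errors) / max(len(recent_errors), 1)
    if error_rate > 0.3:               # assumption: >30% errors signals confusion
        return min(current_level + 1, 2)
    if error_rate == 0.0:              # an error-free stretch: withdraw support
        return max(current_level - 1, 0)
    return current_level

# Two errors in the last five prompts push the scaffolding back up:
print(adjust_scaffolding([True, False, True, False, False], current_level=1))  # 2
\stoptyping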
Once the entities were identified and written, the participant described a flow. An arcing path with a clear \quotation{above and below} connected the two entities. These arcing lines, representing flows, are oriented to the entities, not the page; I rotated the page whenever that allowed for cleaner arcs and more space above and below for description.
With the role description in mind, the participant was asked to note what the flow contained, with a theme and variations on \quotation{This is a flow of...} I made sure, when asking these questions, that they were as open-ended as possible. Flows must also be unambiguous. Ambiguity usually enters through other, similarly named flows, so when a flow was described, I checked all extant flows for identical and similar names. In the case of an identical name, I inquired whether the same contents were flowing; if the names were merely similar, I made sure there were enough adjectives around each flow to distinguish the two. I edited the prior flow if it made more sense to do so.
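The ambiguity check amounts to a small lookup over the flows drawn so far. Here is a minimal sketch, with a deliberately crude word-overlap heuristic standing in for the interviewer's sense of \quotation{similar}; the function and names are hypothetical.
\starttyping
# Sketch of the flow-name ambiguity check; the similarity heuristic is invented.

def check_flow_name(new_name, existing_flows):
    """Classify a proposed flow label against flows already on the diagram."""
    new_words = set(new_name.lower().split())
    for name in existing_flows:
        if name.lower() == new_name.lower():
            return "identical: ask whether the same contents are flowing"
        if new_words & set(name.lower().split()):
            # Shared words suggest similarity; add distinguishing adjectives.
            return "similar to '%s': add adjectives to distinguish" % name
    return "unambiguous: write it above the arc"

flows = ["status update", "weekly report"]
print(check_flow_name("status update", flows))  # the identical case
print(check_flow_name("server status", flows))  # the similar case
\stoptyping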
When the flow has been described, and the noun (with or without adjectives) has been written above the arc, the participant is prompted to categorize the arc as data, information, or knowledge: \quotation{And is this flow of x data, information, knowledge, or other?} Unusually, this element of scaffolding is never completely removed, although it may be shortened once the participant has already categorized similar flows on the diagram.
\stopcomponent
\startcomponent c_3_sdfn_theory
\product prd_Chapter3_Methodology
\project project_thesis
Having spent at least a productive half-hour building the \SDFN, participants were then encouraged to conclude the session with a theoretical discussion of their own revealed constructions of data. Not all participants will want this discussion, and beyond a vague prompt of \quotation{And would you like to talk about your thoughts on the philosophy of data?}, it is not worth the effort to force reevaluations here. This discussion generally explores the ontological and epistemological questions, and the novel categorizations, that arose during the \SDFN\ creation.
Participants, if not uncomfortable from the unusual thinking demands of the \SDFN, generally engaged in a self-reflective discussion. It was vital to ask open questions that built a scaffold for the participant's self-discovery.
Of particular interest in the discussion is how the participant transitioned from one category to another. The \SDFN\ allows for a solid investigation of interesting questions of categorization. In this case, I was exploring how the participant categorized things as data, information, or knowledge, where the category boundaries lay, and how transitions between them occurred.
The main opportunity of the discussion is to study how the participant thought about the interaction between categories. In many interviews, the participants would construct a hierarchy in the theoretical discussion and discuss how data became information. The self-generation of a theoretical ontology of data from the practical categorization experience is the main purpose of this methodology.
Other items to query include any departures from the normal flows. The participant, when creating the \SDFN, will prefer certain categories to others, depending on topic and personal preference. When given the opportunity, I tried to ask about the unusual categories: instances in which the participant either made up a category, combined or concatenated categories, or used a rare classification. For example, when certain participants strongly favored data, only a few flows were classified as knowledge, mainly because they fell outside the scope of the discussion. Subsequent conversation then focused on those flows to try to get a balanced understanding of their nature.
When discussing combined or concatenated flows, it is important to understand the type of metaphor the participant is using\cite{Lakoff2003}. While this matters mostly for analysis, my participants normally used low-key but figurative metaphors. During this process, I tried to ensure that they elaborated on each metaphor and clearly distinguished the type they were using: a container metaphor (\quotation{data holds information}) is different from a concatenative metaphor (\quotation{data alongside information}), and both are different from a combinatory metaphor (\quotation{data and information}).
A brief discussion of metaphor is in order. Participants gave me clues to their personal construction of data through their use of metaphor, especially their use of verbal affordances. Any time a participant said \quotation{filter data,} it was a strong clue that they were very interested in subjective masses of data that needed to be winnowed down. Discussions of precision and accuracy, or of any kind of implicit metadata, pointed to objective elements.
An example of a container metaphor in action:
\startextract
Interviewer: ... To you, what is Information?
Participant: It's something that's not physical, basically. That means it could be a communication, it could be a conversation or a story. Something verbal.
Interviewer: So Information is any non-...
Participant: Data to me is physical, basically. It's an entity that [fuzzy word] an entity of some sort.
Interviewer: You would say that letter is Data?
Participant: I would say that letter is Data. But what is on that Data is Information. Just to be a bit confusing.
Interviewer: Tell me more.
Participant: Coming from the field we're in, in [removed]. You've got different areas. It would be, as I've said, Data or record. That's just the physical entity.
\stopextract
I noticed the phrase \quotation{what is on that Data is Information}, because \quotation{on} is a \quotation{container} word.
\startextract
Interviewer: Data is a container...
Participant: Yes, of the Information.
Interviewer: And Information is content of what type? Is there something common to all Information?
Participant: it's not easy. I'm struggling. It's a struggle. Information, what is Information? It's just... No, I'm drawing blank.
Interviewer: We have right now Data
Participant: And then of course you get Knowledge.
Interviewer: We have Data, Data is a container for Information.
Participant: I'm quite happy with that.
Interviewer: You say physical at some point?
Participant: Yes, physical in [unintelligible]. It doesn't actually have to be physical in a piece of paper, but it can be physical as in an e-mail message.
\stopextract
Whereas, in a different interview, we have the combinative \quotation{and}:
\startextract
Interviewer: Analysis, explanation. Analysis is?
Participant: Something like: model outputs or calculations. Explanation is what that actually means in context of your [work environment].
Interviewer: Class of analysis is Data, Information, Knowledge?
Participant: It's probably more on the Information. Well, it's Data and Information. And the explanation is Knowledge, it had better be.
Interviewer: When you say that, what do you mean?
Participant: You hope that when someone gives you their explanation, you know more than before they told you. Not always true. They can tell you stuff, and you can go \quotation{Well, I understand even less than when I started.} Because if it's completely contradictory to your understanding, you are now really confused. And in the cross cultural... you're often doing these meetings with interpreters in the room. Am I not asking it right? You don't necessarily get an interpreter that speaks Technical [other language]. Often they're here for some other meeting, and they just bring someone from one of the marketers who will be bilingual. Now, they all are bilingual, but particularly senior people will choose not to speak English because it's embarrassing when they speak badly. That said, we can't speak [other language], so who are we to criticize?
\stopextract
The keyword \quotation{and} in \quotation{Analysis is data and information} meant that the two are combinable and can sit in one flow. The concatenative idea is rather more difficult. Although in the following example I allude to a container, the interaction between data and information is not the strict container of the first example, but rather has more \quotation{alongside} affordances\footnote{Metaphor provides the affordances of the thing being related in the metaphor. A container metaphor therefore affords \quotation{putting into.} It is by analyzing which set of affordances the participant hints at most strongly that important clues are gleaned towards the participant's conception of data.}:
\startextract
Participant: As part of the Information flows to those, I may include a little bit of raw Data, but not very much.
Interviewer: Does this raw Data ...
Participant: Usually photos or a graph.
Interviewer: So you would say that photos are raw data.
Participant: Yes.
Interviewer: Would you say that the raw data is contained in the Information? i.e. you send them interpretation of measurements. As part of that interpretation you have to send them some of the measurements that are really interesting. Would you say that the Data is inside the Information flow, and we can just label this as Information? Or would you say that it's Information + Data?
Participant: I'd keep it inside the Information flow, because if it was just raw Data. They could very easily reach what I think is the wrong idea -- misinterpret.
Interviewer: Therefore, you're not going: \quotation{Here's the Information, here's the Data.} You're going \quotation{Here's the Information, here's some Data inside the Information to back it up.}
Participant: Yeah, that's right. But with just the raw numbers and no context, that's Data.
\stopextract
This discussion, if fruitful, leads into definitions. Having examined the ontological transitions above, the participant may now be prepared to examine the ontological definitions of the various categories. This was one of the more treacherous parts of the interview, because it would have been extremely easy to put words into the participant's mouth through suggestions or overly specific leading questions.
Instead, I tried to let the participants apply their own inductive process to the categories and transitions they had defined. Normally, the basis of the definitions emerges in the discussion of transitions, but it may not happen in every case. If possible, guide the participants to identify and discuss their own thoughts on how they categorized something as data.
Although the relationship questions are normally deeper, leading into this discussion through transitions means that the various affordances and other philosophical handles of data, information, and knowledge are discussed first. Data, lacking form, has no \quotation{natural} or non-constructed affordances; the reification of data through the \SDFN\ therefore led participants to suggest their own affordances, and thereby strongly hint at their conception of data. The other component is to ask participants to discuss how they know something is data, or how they categorize it. The only structure available here is that provided by the \SDFN\ itself, and participants were also prompted to explain their categorization methodologies.
\stopcomponent
\startcomponent c_3_sdfn_post
\product prd_Chapter3_Methodology
\project project_thesis
\stopcomponent
\startcomponent c_3_survey
\product prd_Chapter3_Methodology
\project project_thesis
After the completion of the interview process, I decided to create a survey. The survey was to test two things: whether the success of the interview technique could be replicated in a more automated form, and whether the different constructions of data would be evident in a more varied audience. Unfortunately, the survey suffers from both methodological and coding flaws, and is therefore presented for intellectual curiosity only; it is clearly a necessary direction for future research. One of the most critical problems is the framing of the survey, which explicitly asks for a differentiation between data, information, and knowledge: \quotation{This survey is exploring what you think about Data. To do that, the survey will present a list of short "scenarios". We will ask you to categorize the scenario as involving Data, Information, Knowledge, or something else, depending on your own understanding of the terms.} Unfortunately, this posits an artificial distinction between data, information, and knowledge that the participants may not originally have perceived.
During the process of collecting data, an unexpected opportunity arose: a mailing list of retired intelligence officers and agents was interested in my research. To take advantage of this opportunity, I created a survey. Optimistically, the first survey was a direct copy of the interview process, starting with a complex demographic interrogation, asking the participants to create flows and entities, and then asking them to self-direct their own investigation into their understanding of data.
It was a complete failure.
In the first survey pilot test, 18 people attempted the survey. Only the one person who had already participated in my interviews had any idea what the survey was asking, and even that attempt produced no useful data. Most participants, taking the demographics section as an exemplar of the whole, stalled horribly at the \quotation{now describe an entity} section and failed to complete the survey.
I believe the people who attempted the survey ran into two significant problems. The first, and more critical, was the symptom known as \quotation{tl;dr}, or \quotation{too long; did not read.} An associate practiced in survey creation suggested that no one taking a survey will read more than three sentences of instructions. As this survey presented multiple paragraphs detailing and defining terms, the obstacle of tl;dr was clearly in full effect.
More subtly, though, the very abstract and theoretical nature of the questions was a problem in creating scaffolding. In the interview, because I was able to provide assistance and incremental steps according to {\em my assessment} of the participant's comprehension, I do not believe that any participant found the process exceedingly difficult. Rather, because the survey was self-guided, its impersonality was its primary point of failure. In the interest of making a survey that people could finish quickly, I had created one that was not able to adapt the scaffolding processes that made the interviews successful. Therefore, the only people to complete it were those who {\em already} knew about the concepts being discussed: one of my interview participants, and an academic who specialized in teaching the \DFD\ methodology.
From this failure, I learned that a new methodology would be required. The primary lesson was that the direct translation of interview techniques failed; my intuition was that the success of the interview rested on the feedback given by the interviewer, not the structure of the interview {\em per se}. In an online survey, people expect mostly to click answers rather than to type essays into a web form. Very few long-answer questions are appropriate to such a format, and a survey composed entirely of them is wholly inappropriate for anything but a final exam. The informal nature of a survey makes the focused concentration required for long answers quite difficult, especially given the lack of any reward besides completion itself. It was also a mistake to establish expectations in the demographic area of the survey and then violate them on the next page with a longer theoretical component.
When considering what to include in my second attempt, I could not simply catalogue the myriad ways in which the first survey failed; it was also important to understand the few ways in which it succeeded. Its two principal successes were the demographic section and the tool itself. The demographic section captured interesting demographic information at high granularity. The tool, Limesurvey, performed far beyond expectations: it is well written, database-agnostic, secure, free, and open source, and its mechanism for importing and exporting surveys is streamlined and very functional.
A slavish copy of the interview methodology was clearly unsuccessful, and any theme and variation on it would almost certainly share the same fate. I had to reconsider what question I was trying to answer. In the first survey, the question had developed into: \quotation{With a self-constructed \SDFN, can you articulate your own philosophies of data, information, and knowledge?} The respondents presented a very straightforward answer: \quotation{No.} The essence of the \SDFN\ lies in the process of categorization: although the interview's length lent itself to a thorough exploration of self-declared roles and their data transfers, what mattered was enticing the categorization of many different, distinct flows.
I realized that it was possible to remove person-specific flows and have people classify a general set of scenarios. I wanted to explore a specific question: \quotation{How does a specific role categorize data, information, and knowledge?} The question of role was tricky, despite the success of the demographics section; participants' answers did not reveal which role's headspace they were answering from. A working hypothesis was that people would give different categorizations depending on the role they were thinking in at the time, an explanation substantiated by the remarkably different answers one participant gave when interviewed twice about very different topics. I needed to assess the person's role, rather than just generalized demographics, while keeping the results completely anonymous. As the scope of the prior survey was in many ways its fatal flaw, minimalism was the rule of the day in the second attempt.
The survey opened with: \quotation{This survey has requested that you answer it from the perspective of one of the jobs that you do. Please describe the duties of that job (in general).} Just before this, the survey told participants: \quotation{We believe that people can have different philosophies, depending on what job they're doing. For this survey we ask that you think about the scenarios from the perspective of one of your jobs.} The phrasing of the first sentence was unfortunate, invalidating the survey's \quotation{scientific reliability.} It is my belief, however, that the question solicits all necessary demographic information without extending beyond the participant's comfort zone of anonymity.
The survey questions after this point all had the same format. They would begin with: \quotation{I am trying to understand what you think of as Data, and why. The questions below ask you to categorize the scenario, and then explain the categorization. Please read the following one sentence scenario. Categorize the highlighted word or phrase in context of the scenario.}
Then the scenario was presented. The scenarios are listed below in order; they were chosen so that the highlighted phrase would help differentiate between the three constructions of data found during the interviews. This strategy was not particularly effective.
\startitemize
\item Alice receives a letter from Bob.
\item Alice receives a letter from Bob containing instructions on how to build a machine.
\item Alice receives a letter from Bob containing a short story he has written.
\item Alice determines the locations for parts of a Rube Goldberg style machine to cook her breakfast.
\item Alice receives a letter from Bob. The letter is a time chart of what shows he has watched on TV for the last week.
\item Bob receives an e-mail from Alice, it is a record of the daily temperatures outside her apartment for the last week.
\item Bob receives a flash drive from Alice. It contains mp3 music files.
\item Bob attends a symphony with Alice and enjoys the live music.
\item Bob ignores the traffic noise outside the symphony.
\item As Bob is mugged walking home, the mugger demands his wallet and watch.
\item Charlotte finds a microfilm in a hollow coin, it contains a list of numbers and times about something unknown.
\item Charlotte finds a microfilm in a hollow coin, but cannot decipher the code.
\item Charlotte finds the secret key to the code, and realizes it’s a letter for technical support to the spy’s handlers.
\item Charlotte finds a microSD card in a hollow coin, it contains a planning program for something unknown.
\item Charlotte creates a statistical profile of a spy, to predict their actions.
\item Dave lectures to a classroom about database design.
\item Dave grades quizzes from a relational algebra course.
\item Dave discusses the reasons behind one of Eve’s incorrect answers.
\item Dave writes a survey asking people to describe their impressions of a user interface.
\item Dave saves an empty word document in preparation for his later work on a conference paper.
\item Eve writes poetry describing the winter wind.
\item Eve interviews students for the campus TV station and gets short quotes for her topic.
\item Eve looks at the weather report and decides to bring an umbrella.
\item Eve receives a letter from an ex-boyfriend, telling her to take her stuff back.
\item Frank selects which instrument readings to include in his experiment.
\item Frank designs an experiment.
\stopitemize
Each of these scenarios asked the participant to categorize the highlighted phrase as data, information, knowledge, or other; choosing \quotation{other} provided a text box for elaboration. Participants were then given a large text area to explain their choice if they wished. This survey structure allowed for the same kinds of self-reflection found in the \SDFN, but did not phrase the questions adequately.
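The question format can be summarized as a small data structure pairing each scenario with its highlighted phrase and the fixed set of category choices. The sketch below is a hypothetical rendering for illustration only; the actual Limesurvey instrument is reproduced in Appendix A, and the highlighted phrase shown here is a stand-in, not necessarily the one used in the survey.
\starttyping
# Hypothetical rendering of the survey's question format (illustration only).

from dataclasses import dataclass

CATEGORIES = ["Data", "Information", "Knowledge", "Other"]

@dataclass
class ScenarioItem:
    scenario: str   # one-sentence scenario shown to the participant
    highlight: str  # the bolded word or phrase to be categorized

    def render(self):
        print(self.scenario)
        print("Categorize '%s': %s" % (self.highlight, ", ".join(CATEGORIES)))
        print("If Other: a text box appears for your own classification.")
        print("Optionally, explain your categorization in one or two sentences.")

# The highlight here is a guess for illustration; see Appendix A for the original.
ScenarioItem("Alice receives a letter from Bob.", "letter").render()
\stoptyping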
\stopcomponent
\startcomponent c_3_survey_tools
\product prd_Chapter3_Methodology
\project project_thesis
With minimalism in mind, the first survey was completely reinterpreted. The final section, which asked people to write an essay on their own conception of data, was removed; the demographics section was reduced to one question; and the flow diagramming was reduced to simple categorization.
While people were quite willing to answer the demographic questions, I feel, as stated earlier, that those initial questions distracted from the rest of the survey and reduced people's attention span for it. The only demographic question that really matters concerns the participant's mindset.
The new survey asked participants to \quotation{vividly imagine} a role, thereby artificially putting them into that role's mindset. Asking participants to engage in a specific mindset is effectively asking them to play a role: to pretend to think in ways foreign to their current state of mind, while retaining their authentic expertise in the domain they have chosen. By engaging in role-playing, participants assume that role's understanding of data, as the scenarios are filtered through the mental maps the role imposes.
By asking participants to vividly imagine and then {\em describe} that role, the survey made it possible for them to reveal as little or as much as they wanted about their thinking methodology without disclosing potentially identifiable information. The survey gave the following instructions for the role:
\startextract
This survey is exploring what you think about Data. To do that, the survey will present a list of short scenarios. We will ask you to categorize the scenario as involving Data, Information, Knowledge, or something else, depending on your own understanding of the terms.
We believe that people can have different philosophies, depending on what job they're doing. For this survey we ask that you think about the scenarios from the perspective of one of your jobs.
\stopextract
While this was a long-answer question, the fact that it came first and simply asked participants to describe what they imagined seemed to make it effective.
\placefigure[]
[fig:scenario1]
{Scenario 1 from the survey. The dropdown box asks participants to choose Data, Information, Knowledge, or Other. If they choose Other, a text box appears for more details.
}{\externalfigure[Chapter3/Survey1.png][frame=on]}
After the participant had vividly imagined and described a role, the survey moved into the pure measuring phase: the presentation of one-sentence scenarios, each with a term highlighted in bold\footnote{The full survey, in printable version, can be found in Appendix A.}. The first scenario was \quotation{Alice receives a letter from Bob.} The survey requested: \quotation{Please read the following one sentence scenario. Categorize the highlighted word or phrase in the context of the scenario.} Participants classified each scenario via a drop-down box as \quotation{Data, Information, Knowledge, or Other}; if they chose Other, a text box appeared so that they could enter their own classification. Happily, this option was often used, suggesting that participants did not feel forced into a false choice among data, information, and knowledge.
When participants selected a choice, they were then encouraged to explain themselves: \quotation{Please explain in one or two sentences why you categorized the scenario that way.} This phrasing let participants engage in as much self-reflection as they wished about the one thing that mattered: the act of categorization. While a more comprehensive survey would have been desirable, I feel that the second survey incorporated the lessons learned from the first and produced some surprising results despite its small scope.
The audience for this survey was chosen very informally from three roughly distinct groups. The first group \quotation{chosen} was from my social networks via Twitter and Facebook. This group was initially contacted to pilot test the survey. The power of social networking tools in this kind of research cannot be overstated.
As the first few results came in, the survey looked sufficiently effective at capturing participants' understanding of data to launch without modification. The second group consisted of respondents to the flyer\footnote{See Appendix C.} sent to the INTELST Forum, a mailing list coordinated by the U.S. Pentagon and Army. The people who responded to the flyer were then e-mailed a link to the survey. Very little can be said about this group, save that they are all active or retired intelligence professionals from both the military and civilian sides. The responses from this group were fascinating, and clearly reflected a personal construction of data different from that of the other groups.
The third group was recruited through a presentation at the company, summarizing the findings of my interview research and inviting participants to take the survey. Unlike the more focused set of my \quotation{initial trial} group, a large subset of researchers and staff from the company was invited to my talk. This sampling allowed me to invite many distinct people to take the survey, and I feel that quite a few different jobs from the company were represented in the final results.
\stopcomponent
Results Chapter
Introduction
This chapter presents excerpted research results from my interviews, as well as a presentation of data collected from the surveys. The interviews presented here feature lengthy excerpts representative of each interview, as well as anonymized \SDFN\ diagrams.
The first interview presented, the pilot interview, will serve to illustrate the methodological explanations I provided to participants.
\startcomponent c_3_bubble
\product prd_Chapter3_Methodology
\project project_thesis
The technical pilot interview was the first interview conducted, and it served two purposes: to vet the equipment and to test the methodology. After exposure to my methodology, my advocate could then use their experience to persuade their co-workers to participate in my interviews. Due to the more public nature of this interview, as well as the uncertainty of the equipment, my advocate and I chose a harmless topic: our mutual participation in an online game.
\stopcomponent
\startcomponent c_3_bubble_collect
\product prd_Chapter3_Methodology
\project project_thesis
\placefigure[]
[fig:bubble1]
{A sample entity dictionary. The participant was brainstorming possible entities for us to explore.}
{\externalfigure[Chapter3/bubble1.png][factor=fit,frame=on]}
\placefigure[]
[fig:bubble3]
{The first \quotation{\SDFN\ diagram.} Note how each flow is categorized below the flow and labeled above the flow, showing the necessity for curved flows. }{\externalfigure[Chapter3/bubble3.png][factor=fit,frame=on]}
\placefigure[]
[fig:bubble4]
{Another \SDFN\ diagram. Note the presence of wormholes.}
{\externalfigure[Chapter3/bubble4.png][factor=fit,frame=on]}
As this was the technical pilot, \in{figure}[fig:bubble1] illustrates the creation of an entity dictionary. As we can see, the entities are just sketchy bubbles with entity names in them, which may or may not reappear in later cases.
The following is my explanation of the process to the participant:
\startextract
Interviewer: Well, we'll be getting back to basically this question after we build the data flow diagrams. This is something to let simmer in your subconscious.
Participant: Yeah it would... it's something that we should discuss more. Yeah, there's context of knowledge, and there's context of data. Maybe in a superficial analysis they don't meld. But if you think about it more deeply, you go, 'Oh, hang on.' It's not just making arbitrary distinction that this is knowledge and this is data. Think about, \quotation{Why do I consider that knowledge? Why do I consider that data?} That really is knowledge because its stuff I know not just there You're doing the Ph.D.
Interviewer: Well, yes, but I'm doing the Ph.D. based on what you tell me. So, when we're building this data flow diagram. What I'm going to be doing is two symbols. Well, we're going to be making circle symbol and a circle symbol will have some sort of designator that is important for you in it. It represents a person, or an entity, that communicates, transmits, verbs, data. Whatever you consider data to be.
Participant: Does that include knowledge?
Interviewer: Does it include knowledge?
Participant: Point Taken.
Interviewer: And then are going to be arcing lines sometimes with an arrow on it to another entity. The arcing lines we'll label with whatever you would classify the data/knowledge/information, whatever term you want to do. It's, for example, you said that there was a 'fail res'. As an example I'm not sure you would use this, one transmission of data would be you to the server, res the fort. We would label this line, 'you, server, rez fort.' And if you want to, you could say this is data, or that this isn't data, this is information. Whatever you think is important to define about that transmission. We'll label that line as.
Participant: Oh, so are they different classifications?
Interviewer: There are whatever classifications you want. What I'm going to be doing here, is looking at how you classify these things, and what small groups you identify and how you think that other people classify these things. And then compare it to how other people classify the things, and look at how the perceptions shift from person to person. There's no detail too trivial for this discussion. Because I'll be using it to look at what other people will be doing.
\stopextract
In this exchange, the theory of the \SDFN\ is not deeply explored. The important thing, before the process starts, is to gently explain the process and the diagramming techniques that will be employed in the rest of the interview. As participants tend to be unsure at this stage, it is also important to avoid giving them definitions of data, information, or knowledge, as they may be looking for cues suggesting which construction of data to use.
In this \SDFN, I did not ask the participant whether they wanted to build an entity dictionary; I simply started the process of articulating the entities. It is acceptable for the participant to be chatty during this process. Not only does this help define the universe of discourse and establish them in their own minds as subject-matter experts, but being chatty also lets them slip more deeply into the role they are discussing.
\startextract
Interviewer: This is literally trying to render how you perceive the game. ... We'll start with a circle. How would you like to label this circle? What do you think is a representation of you? We can write you here, we can write the computer here.
Participant: We can talk about me as a Clan Master. I'd put that one there. That's to start with. I'd like to have another one, for my different roles. Cause I'm also a player of the game. Which is different from their role as a Clan Master for sure. Because they're often in conflict.
Interviewer: What other entities can we identify in general?
Participant: Across the whole game?
Interviewer: Well, that will transmit data, whatever that is, or information or knowledge. When I say data, feel free to assume I'm also talking about information or knowledge if that helps.
Participant: That's good. Cause often, if you get too transfixed on your own view of data, jeez, there's lots. Like the developers. You've got individuals within there. Now, I guess this is when the granularity comes out, because you can talk about other clan masters as a circle. But they're individuals amongst themselves. So I can identify. Within that, you have the concept of a clan master. There's [Active Clan Master 1] but then they have other concepts such as --
Interviewer: [Active Clan Master 1] is a person, yes?
Participant: Person, yeah. You've got active CMs, which [Active Clan Master 1] can be part of this group. And you've got inactives.
Interviewer: Inactive CMs?
Participant: Yeah. So there's individuals, but within there, you could break that. And then say: "Well, look [Active Clan Master 1]'s here, or something like that. And [Clan Master 2]'s here. We'll often talk about semi-active as well. There's me as a thing and there's also others within, and break it down like that.
Interviewer: So what we'll want to do here is create some sort of representation so that you can talk about classes of people. Or individually people if you feel that they're important to be talked about as an individual. So, from this, you can say, \quotation{Well, this is a communication from me, to the active Clan Masters.} \quotation{And I do this sort of communication.} \quotation{Or, this is a communication from me to [Active Clan Master 1].} Whatever represents what you're doing.
Participant: Look, cause sometimes -- Well, the information we can talk about, what, information, data? is transmitted. That's why I sort of did them separately. Cause I want to transmit me and [Active Clan Master 1] will talk about something differently than we'll talk about with others here. That can be a slightly different form of communication with these and with that. You deal with the individual differently even in a group.
Interviewer: And that's what I'm trying to tease out here. Thank you. We've got you as a player, you as a clan master, the Devs. Right now, we're just going to make bubbles and we'll take these bubbles to a third page when we're drawing lines. This is brainstorming.
Participant: We'll keep it sort of then at a higher level.
Interviewer: Whatever you want to do, if you want me to draw it, I'll draw it. If you want to draw it, go for it.
Participant: Nah, you might as well do it. So there's other CMs then, as part of the clan. And, I'll use that term to classify all the people that can rez the fort, per se.
Interviewer: Okay, so you define Clan Master as someone who's able to rez the fort?
Participant: Well, not really. But I'll group them together. There's what we call Martini admirals in [Clan] which aren't Clan Masters per se, but they have almost the same power as a clan Master. They can't be kicked. But I'll group them together. Because they're really, as you know in [Interviewer's Clan]. There's either those at the top and there's the rest of the players. So I'll call that other clan masters. And then there's other players. That's probably an easier, higher level distinction up there. And the next natural bubble is \quotation{other clans.}
Interviewer: So I've got other clan players, and then other clans?
Participant: Um. No, I would put them together, so just other clans.
Interviewer: What about other clan masters? Or is that other clans?
Participant: I would put them as just other clans at the moment.
\stopextract
The other aspect of the \SDFN\ that I was teaching the participant about here was the appropriate scope of an entity as well as the desired granularity of their universe of discourse, as it is bounded by both scope and detail: only so many actions are of interest, and some actions are too trivial to diagram.
The mistakes in \in{figure}[fig:bubble2] show the participant misunderstanding the \SDFN\ by drawing their own hierarchy of authority within their organization. The creation of such side artifacts as part of the entity diagram is acceptable, especially as a way of pinpointing the desired granularity of entities, but participants should not come to think of the \SDFN\ as a hierarchy.
The start of the \SDFN\ can be quite subtle. In \in{figure}[fig:bubble3], the rezzing-the-fort \SDFN\ began with a \quotation{walk me through the process}:
\startextract
Participant: So the next would be -- Maybe if I described the process ...
Interviewer: Walk me through the process.
Participant: Well, this is the case. I say \quotation{It's time clan war.} This is where I bring in the extra bubbles.
Interviewer: Let's trace this and see where we get off these bubbles from the process.
Participant: We're at clan war. \quotation{I want to rez the fort.} A command to rez the fort, it sends me back. Now, I guess to introduce the other bubble here as other clan members.
Interviewer: So, we're going to say internal clan members? Can we say other than members because that conflicts with clan master?
Participant: Clan Players?
Interviewer: Players. ICP.
Participant: I'll leave you with the acronyms. I tell them something now as well.
Interviewer: Now, when you tell them something.
Participant: There's a lot. That's a really detailed line between us and
Interviewer: Do we want to multiple lines?
Participant: Yeah. That's cool What we're dealing with at the moment. I'm going through the process of rezzing the fort, which is a common thing we want to do.
Interviewer: So shall we label this rezzing the fort?
Participant: So I'm telling them, I'm sending them information. That they can now war. That they can start.
Interviewer: And so this is a what? Is this a status, is this a command, is this something else?
Participant: It's information because it does require context. But it's like a status it's saying: \quotation{you can now war. We can start fighting.}
Interviewer: So it's a status. What other flows do you have to the internal clan players?
Participant: Apart from that? We obviously maintain that they're sending stuff back to me.
Interviewer: What are they doing there?
Participant: The line going back. They're sending me, also status updates. Whether they're ready to play, whether they're there. How much AP they have and things like that.
Interviewer: And they're sending you these as?
Participant: Textual information.
Interviewer: So it's text information?
Participant: Received via MSN.
Interviewer: Do we want to have MSN here or is MSN not at this level?
Participant: No, MSN is at that level. I'd say it is. I see MSN as -- it's true, these would in essence, I don't see them as MSN. MSN is a like a tool or a spanner. As an intermediary because I send it here (MSN bubble) and then to there (Player bubble.) And that is true because I don't talk directly to them, per se.
\stopextract
I started by exploring the entities we had described in the entity diagram, along with a process that had come up in small talk. Diagramming a single process is about the right complexity for an \SDFN. As we can see, the advent of a second \SDFN\ in Case-4 meant that the process of \quotation{rezzing the fort} was a little too simple; it costs nothing to make a second diagram if sufficient time remains.
The other important element is the asking of questions. The point of the interview is to tease out the participant's understanding, and to do that, they have to keep talking. Open questions, confirmations, and other prompts keep them talking without guiding them down any specific path.
\stopcomponent