search.json
[
{
"objectID": "assessment/index.html",
"href": "assessment/index.html",
"title": "Assessment",
"section": "",
"text": "3,500-word long research report\n 26th April 2023\n Submit to Turnitin via Canvas\n\nMore detailed information and submission link are available on the Canvas site"
},
{
"objectID": "data/data-documentation.html",
"href": "data/data-documentation.html",
"title": "Data documentation",
"section": "",
"text": "The datasets used in this course and available for download from the course website are the following:\n\n\n\nFile name\nOriginal name\nType\nVersion\nSurvey\nLinks\n\n\n\n\neb89.1\nZA6963_v1-0-0\n.dta\n.sav\n1.0.0\nEurobarometer; 89.1 (March 2018)\nSource\nQuestionnaire\nCodebook\n\n\ness9\nESS9e03_1\n.dta\n.sav\n3.1\nEuropean Social Survey; Integrated file, Round 9 (2018)\nSource\nQuestionnaire\nCodebook\n\n\nevs5\nZA7500_v4-0-0\n.dta\n.sav\n4.0.0\nEuropean Values Study; Wave 5 (2017-2020)\nSource\nQuestionnaire\nCodebook\n\n\nEUinUK2018\nEUinUK2018_Polish\n.dta\n-\nSurvey data collected by McGhee and Moreh (2018), ESRC Centre for Population Change\nSource\nQuestionnaire\nCodebook\n\n\nLaddLenz\nLaddLenz\n.dta\n-\nReplication data for Ladd and Lenz (2009), based on British Election Panel Study data\nSource\nQuestionnaire\nCodebook\n\n\nosterman\nReplication_data_ESS1-9_20201113\n.dta\n-\nReplication data for Österman (2020), based on European Social Survey Rounds 1-9 data\nSource\nQuestionnaire\nCodebook\n\n\n\nThe datasets can be read into R from \"https://cgmoreh.github.io/SSC7001M/data/FILE_NAME\" using an appropriate command from the haven package or other importing function.\n\n\n\n\n\n\n\n\n\nFile\n\n\nOriginal name\n\n\nType\n\n\nVersion\n\n\nOrigin\n\n\nAccess\n\n\n\n\n\n\nosterman\n\n\nReplication_data_ESS1-9_20201113\n\n\n.dta\n\n\nNA\n\n\nReplication data for Österman (2021), based on European Social Survey Rounds 1-9 data\n\n\nSource Questionnaire Codebook\n\n\n\n\nLaddLenz\n\n\nLaddLenz\n\n\n.dta\n\n\nNA\n\n\nReplication data for Ladd and Lenz (2009), based on British Election Panel Study data. Included in Hainmueller (2012)\n\n\nSource Questionnaire Codebook\n\n\n\n\n\n\n\n\n\n\nReferences\n\nHainmueller, Jens. 2012. “Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies.” Political Analysis 20 (1): 25–46. https://doi.org/10.1093/pan/mpr025.\n\n\nLadd, Jonathan McDonald, and Gabriel S. Lenz. 2009. “Exploiting a Rare Communication Shift to Document the Persuasive Power of the News Media.” American Journal of Political Science 53 (2): 394–410. https://doi.org/10.1111/j.1540-5907.2009.00377.x.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nÖsterman, Marcus. 2021. “Can We Trust Education for Fostering Trust? Quasi-experimental Evidence on the Effect of Education and Tracking on Social Trust.” Social Indicators Research 154 (1): 211–33. https://doi.org/10.1007/s11205-020-02529-y."
},
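The entry above says these files can be read into R straight from the course URL with a haven import function. A minimal sketch of that import step, assuming the "File name" column combines with one of the listed extensions (so ess9.dta and evs5.sav are assumed names, not verified paths):

```r
# Read the course datasets into R straight from the website.
# haven's read_* functions download http(s) paths automatically.
library(haven)

base_url <- "https://cgmoreh.github.io/SSC7001M/data/"

ess9 <- read_dta(paste0(base_url, "ess9.dta"))  # Stata (.dta) version
evs5 <- read_sav(paste0(base_url, "evs5.sav"))  # SPSS (.sav) version
```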
{
"objectID": "data/index.html",
"href": "data/index.html",
"title": "Data documentation",
"section": "",
"text": "File name\n\n\nType\n\n\nDescription\n\n\nLink to source\n\n\n\n\n\n\nevs5\n\n\n.sav\n\n\nEuropean Values Study; Wave 5 (2017-2021)\n\n\nSource\n\n\n\n\nosterman\n\n\n.dta\n\n\nReplication data for Österman (2021), based on European Social Survey Rounds 1-9 data\n\n\nData source Open access article Supplementary materials\n\n\n\n\nLaddLenz\n\n\n.dta\n\n\nReplication data for Ladd and Lenz (2009), based on British Election Panel Study data. Included in Hainmueller (2012)\n\n\nSource\n\n\n\n\nEverydayTrust\n\n\n.Rds\n\n\nReplication data for Weiss et al. (2021)\n\n\nSource\n\n\n\n\ngaltonpeas\n\n\n.Rds\n\n\nData underpinning a paper presented by Sir Francis Galton to the Royal Institute on February 9, 1877, summarising his experiments on sweet peas in which he compared the size of peas produced by parent plants to those produced by offspring plants.\n\n\nSource\n\n\n\n\ngalton1886\n\n\n.dta\n\n\nSir Francis Galton’s famous data on the heights or parents and their children underpinning his 1886 paper (Galton 1886).\n\n\nSource and more info\n\n\n\n\nValentino17\n\n\n.dta\n\n\nReplication data for Valentino et al. (2019), based on original data collected through YouGov in 11 countries. The original dataset provided by the authors is called imm.bjpols.dta and the original analysis was performed in Stata.\n\n\nData source Open access article Supplementary materials\n\n\n\n\nEjrnaes21\n\n\n.dta\n\n\nReplication data for Ejrnæs and Jensen (2021), based on data from the European Social Survey Round 8. The original dataset provided by the authors is called G&O_Final.tab and the original analysis was performed in Stata.\n\n\nData source Open access article Supplementary materials\n\n\n\n\nworkout\n\n\n.Rds\n\n\nExample dataset from Mehmetoglu and Mittner (2021); a combined version of the original workout2 and workout3 datasets included in the {astatur} package\n\n\nData source\n\n\n\n\n\n\nThe datasets can be downloaded by clicking on the file name, or read into R directly from \"https://cgmoreh.github.io/HSS8005/data/___\" (using a type-appropriate read function and replacing ___ with “File name” and “Type” extension; e.g. haven::read_dta(\"https://cgmoreh.github.io/HSS8005/data/dataset.dta\")).\n\n\n\n\nReferences\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEjrnæs, Anders, and Mads Dagnis Jensen. 2021. “Go Your Own Way: The Pathways to Exiting the European Union.” Government and Opposition, February, 1–23. https://doi.org/10.1017/gov.2020.37.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGalton, Francis. 1886. “Regression Towards Mediocrity in Hereditary Stature.” The Journal of the Anthropological Institute of Great Britain and Ireland 15: 246–63. https://doi.org/10.2307/2841583.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nHainmueller, Jens. 2012. “Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies.” Political Analysis 20 (1): 25–46. https://doi.org/10.1093/pan/mpr025.\n\n\nLadd, Jonathan McDonald, and Gabriel S. Lenz. 2009. 
“Exploiting a Rare Communication Shift to Document the Persuasive Power of the News Media.” American Journal of Political Science 53 (2): 394–410. https://doi.org/10.1111/j.1540-5907.2009.00377.x.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMehmetoglu, Mehmet, and Matthias Mittner. 2021. Applied Statistics Using R: A Guide for the Social & Natural Sciences. First. Thousand Oaks: SAGE Publications.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nÖsterman, Marcus. 2021. “Can We Trust Education for Fostering Trust? Quasi-experimental Evidence on the Effect of Education and Tracking on Social Trust.” Social Indicators Research 154 (1): 211–33. https://doi.org/10.1007/s11205-020-02529-y.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489.\n\n\nValentino, Nicholas A., Stuart N. Soroka, Shanto Iyengar, Toril Aalberg, Raymond Duch, Marta Fraile, Kyu S. Hahn, et al. 2019. “Economic and Cultural Drivers of Immigrant Support Worldwide.” British Journal of Political Science 49 (4): 1201–26. https://doi.org/10.1017/S000712341700031X.\n\n\nWeiss, Alexa, Corinna Michels, Pascal Burgmer, Thomas Mussweiler, Axel Ockenfels, and Wilhelm Hofmann. 2021. “Trust in Everyday Life.” Journal of Personality and Social Psychology 121: 95–114. https://doi.org/10.1037/pspi0000334."
},
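As a companion to this entry's note about type-appropriate read functions, a minimal sketch covering the two file types in the table; LaddLenz.dta and workout.Rds are the table's file names with their listed extensions appended (assumed, not verified, URLs):

```r
base_url <- "https://cgmoreh.github.io/HSS8005/data/"

# Stata file: haven reads remote paths directly
laddlenz <- haven::read_dta(paste0(base_url, "LaddLenz.dta"))

# .Rds file: download to a temporary file, then read with base R
tmp <- tempfile(fileext = ".Rds")
download.file(paste0(base_url, "workout.Rds"), tmp, mode = "wb")
workout <- readRDS(tmp)
```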
{
"objectID": "index.html",
"href": "index.html",
"title": "\n Quantitative analysis \n ",
"section": "",
"text": "Quantitative analysis \n \n \n HSS8005 • Intermediate/Advanced stream • 2023\nNewcastle University (UK)\n \n \n A second course in applied statistics and probability for the understanding of society and culture. It is aimed at an interdisciplinary audience through real-life research examples from various fields in the social sciences and humanities. The course emphasizes the scientific application of statistical methods, developing a reproducible research workflow, and computational techniques.\n \n \n\n\n\n\n\nModule leader\n\n Dr. Chris Moreh\n HDB.4.106\n chris.moreh@newcastle.ac.uk\n Tutorial booker\n \n\n\n\nTeaching Assistants\n\n Bilal Alsharif\n Fengting Du\n\n\n\n\n\nSession dates\n\n Thursdays\n Check on your Timetable app\n Lecture: 10:00-11:30\n Labs: 13:00-14:30 (Group 03) 14:30-16:00 (Group 04) \n\n\n\nAssessment\n\n 26th April 2023\n 3,500-word long research report\n Submit to Turnitin via Canvas\n\n\n\n\n Chris’s mastodon feed\nwhere he posts stuff of interest to #HSS8005\n\n\n\n\n\n\nModule overview\nThis module is offered by School X - Researcher Education and Development to postgraduate students within the Faculty of Humanities and Social Sciences at Newcastle University. The module aims to provide a broad applied introduction to more advanced methods in quantitative analysis for students from various disciplinary backgrounds. See the module plan page for details about the methods covered. The course content consists of eight lectures (1.5 hours each) and eight IT labs (1.5 hours) . The course stands on three pillars: application, reproducibility and computation.\nApplication: we will work with real data originating from large-scale representative surveys or published research, with the aim of applying methods to concrete research scenarios. IT lab exercises will involve reproducing small bits of published research, using the data and (critically) the modelling approaches used by the authors. The aim is to see how methods have been used in practice in various disciplines and learn how to reproduce (and potentially improve) those analyses. This will then enable students to apply this knowledge to their own research questions. The data used in IT labs may be cleansed to allow focusing more on modelling tasks than on data wrangling, but exercises will address some of the more common data manipulation challenges and will cover essential functions. Data cleansing scripts will also be provided so that interested students can use them in their own work.\nReproducibility: developing a reproducible workflow that allows your future self or a reviewer of your work to understand your process of analysis and reproduce your results is essential for reliable and collaborative scientific research. We enforce the ideas and procedures of reproducible research both through replicating published research (see above) and in our practice (in the IT labs and the assignment). For an overview of why it’s important to develop a reproducible workflow early on in your research career and how to do it using (some) of the tools used in this module, read Chapter 3 of TSD (see Resources>Readings). It’s also worth reading through Kieran Healy’s The Plain Person’s Guide to Plain Text Social Science, although there are now better software options than those discussed there. 
In this course, we will be using a suite of well-integrated free and open-source software to aid our reproducible workflow: the R statistical programming language and its currently most popular dialect – the {tidyverse} – via the RStudio IDE for data analysis, and Quarto for scientific writing and publishing (see Resources>Software).\nComputation: the development of computational methods underpins the application of the most important statistical ideas of the past 50 years (see Andrew Gelman’s article on these developments here or an online workshop talk here; Richard McElreath’s great talk on Science as Amateur Software Development is well worth watching too). This module aims to develop basic computational skills that allow the application of complex statistical models to practical scientific problems without advanced mathematical knowledge, and which lay the foundation on which students can then pursue further learning and research in computational humanities and social sciences.\n\nThe course and the website were written and are maintained by Chris Moreh.\n\n\nPrerequisites\nTo benefit the most from this module, students are expected to have a foundational level of knowledge in quantitative methods: a good understanding of data types and distributions, familiarity with inferential statistics, and some exposure to linear regression. This is roughly equivalent to the content covered in the Introductory stream of the module or a textbook such as OpenIntro Statistics (which you can download for free in PDF).\nThose who don’t feel completely up to date with linear regression but are determined to advance more quickly and read/practice beyond the compulsory material during weeks 1-3 are also encouraged to sign up.\nThose with a stronger background in multiple linear regression (e.g. students with undergraduate-level training in econometrics) will still benefit from weeks 1-3 as the approach we are taking is probably different from the one they are familiar with.\nNo previous knowledge of R or other command-based statistical analysis software is needed. Gaining experience with using statistical software is part of the skills development aims of the module. However, it is not a general data science module, and the IT labs will cover a very limited number of functions (from base R, the tidyverse, and other reliable user-written packages) that are most useful for tackling specific analysis tasks. Students are advised to complete some additional self-paced free online training in the use of the software, such as Data Carpentry’s R for Social Scientists, and to consult Wickham, Çetinkaya-Rundel and Grolemund’s R for Data Science (2nd ed.) online book."
},
{
"objectID": "materials/handouts/index.html",
"href": "materials/handouts/index.html",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "",
"text": "Title\n\n\nDescription\n\n\nReading Time\n\n\n\n\n\n\nWeek 1 handout\n\n\n\n\n0 min\n\n\n\n\nWeek 2 handout\n\n\n\n\n0 min\n\n\n\n\nWeek 3 handout\n\n\n\n\n0 min\n\n\n\n\nWeek 4 handout\n\n\n\n\n0 min\n\n\n\n\nWeek 5 handout\n\n\n\n\n0 min\n\n\n\n\nWeek 6 handout\n\n\n\n\n0 min\n\n\n\n\nWeek 7 handout\n\n\n\n\n0 min\n\n\n\n\nWeek 8 handout\n\n\n\n\n0 min\n\n\n\n\nWeek 1 handout sheet\n\n\n\n\n0 min\n\n\n\n\n\n\nNo matching items"
},
{
"objectID": "materials/index.html",
"href": "materials/index.html",
"title": "Materials",
"section": "",
"text": "Materials for each week are available from the side menu. The table below outlines the weekly topics.\n\n\n\n\n\n\nWeekly topics\n\n\n\n\n\n\n\n\nWeek 1 Gamblers, God, Guinness and peas\n\n\nA brief history of statistics\n\n\n\n\nWeek 2 Revisiting Flatland\n\n\nA review of general linear models\n\n\n\n\nWeek 3 Dear Prudence, Help! I may be cheating with my X\n\n\nInteractions and the logic of causal inference\n\n\n\n\nWeek 4 The Y question\n\n\nGeneralised linear models\n\n\n\n\nWeek 5 Do we live in a simulation?\n\n\nBasic data simulation for statistical inference and power analysis\n\n\n\n\nWeek 6 Challenging hierarchies\n\n\nMultilevel models\n\n\n\n\nWeek 7 The unobserved\n\n\nLatent variables and structural models\n\n\n\n\nWeek 8 Words, words, mere words…\n\n\nText as data\n\n\n\n\n\n\nNo matching items"
},
{
"objectID": "materials/info/index.html",
"href": "materials/info/index.html",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "",
"text": "Title\n\n\nSubtitle\n\n\n\n\n\n\nWeek 1 Gamblers, God, Guinness and peas\n\n\nA brief history of statistics\n\n\n\n\nWeek 2 Revisiting Flatland\n\n\nA review of general linear models\n\n\n\n\nWeek 3 Dear Prudence, Help! I may be cheating with my X\n\n\nInteractions and the logic of causal inference\n\n\n\n\nWeek 4 The Y question\n\n\nGeneralised linear models\n\n\n\n\nWeek 5 Do we live in a simulation?\n\n\nBasic data simulation for statistical inference and power analysis\n\n\n\n\nWeek 6 Challenging hierarchies\n\n\nMultilevel models\n\n\n\n\nWeek 7 The unobserved\n\n\nLatent variables and structural models\n\n\n\n\nWeek 8 Words, words, mere words…\n\n\nText as data\n\n\n\n\n\n\nNo matching items\n\nReferences\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/info/info_w01.html",
"href": "materials/info/info_w01.html",
"title": "Week 1 Gamblers, God, Guinness and peas",
"section": "",
"text": "Readings\nTextbook readings\n\nROS: Chapters 1 and 2\nTSD: Chapters 1, 2 and 3 (“Foundations”)\nR4DS: Chapters 1-10 (“Whole game”)\n\nIntuition building\n\nJaynes, E. T. (2003). Probability theory: The logic of science. Cambridge University Press (available via the NU library)\n\nPreface: pp. xix-xxvii\nChapter 16 (“Orthodox methods: historical background”): pp. 490-506\n\nMcElreath, R. (2020). Statistical rethinking: A Bayesian course with examples in R and Stan (2nd ed.). Taylor and Francis, CRC Press (available online)\n\nChapter 1: pp. 1-18\n\n\n\n\n\n\n\nReferences\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/info/info_w02.html",
"href": "materials/info/info_w02.html",
"title": "Week 2 Revisiting Flatland",
"section": "",
"text": "Readings\nStatistics\n\nROS: Chapters 3, 4, 6-12\nTSD: Chapter 12 (“Linear models”)\n\nCoding\n\nTSD: Chapters 9 and 11\nR4DS: Chapters 11, 12\n\nApplication\n\nÖsterman, Marcus. 2021. ‘Can We Trust Education for Fostering Trust? Quasi-Experimental Evidence on the Effect of Education and Tracking on Social Trust’. Social Indicators Research 154(1):211–33 - (online)\n\n\n\n\n\n\nReferences\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/info/info_w03.html",
"href": "materials/info/info_w03.html",
"title": "Week 3 Dear Prudence, Help! I may be cheating with my X",
"section": "",
"text": "Readings\nStatistics\n\nROS: Chapters 10-12, 18-20\n\nCoding\n\nTSD: Chapter 14\n\nApplication\n\nÖsterman, Marcus. 2021. ‘Can We Trust Education for Fostering Trust? Quasi-Experimental Evidence on the Effect of Education and Tracking on Social Trust’. Social Indicators Research 154(1):211–33 - (online)\n\n\n\n\n\n\nReferences\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/info/info_w04.html",
"href": "materials/info/info_w04.html",
"title": "Week 4 The Y question",
"section": "",
"text": "Readings\nStatistics\n\nROS: Chapters 13-15\n\nCoding\n\nTSD: Chapter 13\n\nApplication\n\nLadd, Jonathan McDonald, and Gabriel S. Lenz. 2009. ‘Exploiting a Rare Communication Shift to Document the Persuasive Power of the News Media’. American Journal of Political Science 53(2):394–410. doi: 10.1111/j.1540-5907.2009.00377.x.(published version should be accessible with university login; additional Appendix available here)\nWeiss, Alexa, Corinna Michels, Pascal Burgmer, Thomas Mussweiler, Axel Ockenfels, and Wilhelm Hofmann. 2021. ‘Trust in Everyday Life’. Journal of Personality and Social Psychology 121:95–114. doi: 10.1037/pspi0000334 (access preprint version here)\n\n\n\n\n\n\nReferences\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/info/info_w05.html",
"href": "materials/info/info_w05.html",
"title": "Week 5 Do we live in a simulation?",
"section": "",
"text": "Readings\n\nROS: Chapters 5 (pp. 69-76) and 16 (pp. 291-310)\nTSD: TDS makes extensive use of simulation methods for various purposes at different stages of a research project (e.g. from data preparation through statistical inference to sharing results and data openly). A search on a keyword stub “simulat” can point you various sections of interest that are all worth reading.\n\n\n\n\n\n\nReferences\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/info/info_w06.html",
"href": "materials/info/info_w06.html",
"title": "Week 6 Challenging hierarchies",
"section": "",
"text": "Readings\nTextbook\n\nARM: Chapters 11 (pp. 237-249) and 12 (pp. 251-278)\nTSD: Chapter section 15.2\n\nApplication\n\nValentino et al. (2017) Economic and cultural drivers of immigrant support worldwide. British Journal of Political Science, 49(4), 1201–1226. (The accepted manuscript version can be downloaded from here; Note: this version of the article also contains a brief “response to reviewers” by the authors, which you may find interesting)\n\n\n\nFurther readings\n\nARM: Chapters 13 (pp. 279-299), 14 (pp. 301-323) and 15 (pp. 325-342)\n\n\n\n\n\n\nReferences\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/info/info_w07.html",
"href": "materials/info/info_w07.html",
"title": "Week 7 The unobserved",
"section": "",
"text": "Readings\nTextbook\n\nChapters 13 and 14 in Mehmetoglu, M. & Mittner, M. (2022) Applied statistics using R: a guide for the social sciences. London: Sage (NCL library access here)\n\nVideo\n\nKubinec, R. (2019) An introduction to latent variable models for data science. Sage Research Methods (video file, 00:17:44) (NCL library access here)\n\nApplication\n\nEjrnæs, A., & Jensen, M. D. (2022) Go Your Own Way: The Pathways to Exiting the European Union. Government and Opposition, 57(2), 253-275. https://doi.org/10.1017/gov.2020.37 (The accepted manuscript version can be downloaded from here)\n\n\n\n\n\n\nReferences\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/info/info_w08.html",
"href": "materials/info/info_w08.html",
"title": "Week 8 Words, words, mere words…",
"section": "",
"text": "References\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/notes/draft-notes_w01.html",
"href": "materials/notes/draft-notes_w01.html",
"title": "Gamblers, God, Guinness and peas",
"section": "",
"text": "References\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/notes/draft-notes_w02.html",
"href": "materials/notes/draft-notes_w02.html",
"title": "Revisiting Flatland",
"section": "",
"text": "References\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/notes/draft-notes_w03.html",
"href": "materials/notes/draft-notes_w03.html",
"title": "Dear Prudence, Help! I may be cheating with my X",
"section": "",
"text": "References\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/notes/draft-notes_w04.html",
"href": "materials/notes/draft-notes_w04.html",
"title": "The Y question",
"section": "",
"text": "References\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/notes/draft-notes_w05.html",
"href": "materials/notes/draft-notes_w05.html",
"title": "Do we live in a simulation?",
"section": "",
"text": "References\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/notes/draft-notes_w06.html",
"href": "materials/notes/draft-notes_w06.html",
"title": "Challenging hierarchies",
"section": "",
"text": "References\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/notes/draft-notes_w07.html",
"href": "materials/notes/draft-notes_w07.html",
"title": "The unobserved",
"section": "",
"text": "References\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/notes/draft-notes_w08.html",
"href": "materials/notes/draft-notes_w08.html",
"title": "Words, words, mere words…",
"section": "",
"text": "References\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/notes/index.html",
"href": "materials/notes/index.html",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "",
"text": "Title\n\n\nDescription\n\n\nReading Time\n\n\n\n\n\n\nGamblers, God, Guinness and peas\n\n\nIn the first contribution to a series of articles on the history of probability and statistics in the journal Biometrika, Florence Nightingale David (1955) (no linear relationship with the famous social reformer) paraphrased a contemporary archaeologist who quipped that “a symptom of decadence in a civilization is when men become interested in their own history”, giving the interest in his own discipline as proof of the validity of his statement. David, however, thought that this does not stand true also for scientists’ and statisticians’ own emerging interest in their disciplines. He was right, in that the critical examination of the intellectual development of statistics and probability theory that followed has improved the discipline by excavating ideas that had been buried by mainstream statistics, but he was also mistaken, in that this activity threw light on the decadence of mainstream statistical practice. In this lecture we will look back on the development of some basic statistical concepts and learn about the ideas and preoccupations that influenced them over the centuries. The aim of this overview is to build up essential intuition about the concepts and methods that we will learn later. Brains-on activities will include casting astragali, fighting Laplace’s Demon, tasting tea, and comparing peas in a pod. By the end, we will gain a clearer understanding of the limits of statistical analysis and the dangers of not acknowledging those limits.\nThe IT lab will provide a very hands-on practical introduction to the statistical software that will be used in the module.\n\n\n0 min\n\n\n\n\nRevisiting Flatland\n\n\nIn Edwin Abbott’s 1884 novella, the inhabitants of Flatland are geometric shapes living in a two-dimensional world, incapable of imagining the existence of higher dimensions. A sphere passing through the plain of their world is a fascinating but incomprehensible event: Flatlanders can only see a dot becoming a circle, increasing in circumference, then shrinking back in size and disappearing. There are, in this universe, worlds with even more limited views, like the one-dimensional Lineland and the zero-dimensional Pointland. Any attempt to expand the perspective of their inhabitant(s) is doomed to failure. But as in any good adventure story, a chosen Flatland native embarks on a journey of discovery and revelation - and ostracism and imprisonment. The story is interpreted as an allegorical criticism of Victorian-age social structure, but can equally describe the limitations of inhabiting uncritically a methodological world in which all data are ‘normal’ and all relationships are linear. Moving beyond linearity and acquiring the statistical intuition needed to think in higher dimensions and perceive more complex relationships is indeed a matter of practice-induced revelation. It’s unlikely that we will reach statistical nirvana in this short course, but we’ll attempt to build some more substantial structures upon the arid plains of linear regression. We start by looking around in the Flat-, Line- and Point-lands of quantitative analysis. Incorrigible procrastinators may want to check out a full-length computer animated film version of Flatland on YouTube. Others may be better served by this brief TED-Ed animation.\n\n\n0 min\n\n\n\n\nDear Prudence, Help! I may be cheating with my X\n\n\nMuch of what we do in quantitative data analysis is about examining relationships. 
We are often interested in proposing and testing models of relationships between two or more variables. Sometimes our variables cry out to us begging for help, and we turn into agony aunts and uncles to our data. Other times we must psychoanalyse our data to uncover hidden associations and interactions. This is not an easy task. Do it carelessly, and you may unwittingly cheat yourself and the readers of your research. This week we’ll build some intuition for detecting complex and uneasy relationships within the design matrix X - that promiscuous commune on the right-hand side of our regression equations. We’ll expand on the linear additive models that we looked at in the previous week by considering interactions among our predictor variables, we’ll explore the possibilities and challenges of asking causal questions of observational data, and we’ll think about ways to avoid what evolutionary anthropologist Richard McElreath calls ‘causal salad’. We may get an uncomfortable feeling that we may have cheated with our Xs in the past, but we’ll look towards the future. By the way, Dear Prudence is Slate magazine’s advice column; I like the name because being prudent really is essential in data analysis and interpretation. If you’re done with the readings for this week, you may indulge in some Prudie advice on matters more serious than statistics.\n\n\n0 min\n\n\n\n\nThe Y question\n\n\nIt wasn’t until the last quarter of the 20th century that a unified vision of statistical modelling emerged, allowing practitioners to see how the general linear model we have explored so far is only a specific case of a more general class of models. We could have had a fancy, memorable name for this class of models - as John Nelder, one of its inventors, acknowledged later in life (Senn 2003, 127) - but back then academics were not required to undertake marketing training on the tweetability factor of the chosen names for their theories; so we ended up with “generalised linear models”. These models can be applied to explananda (“explained”, “response”, “outcome”, “dependent” etc. variables, our ys) whose possible values have certain constraints (such as being limited by a lower bound or constrained to discrete choices) that make the parameters of the Gaussian (‘normal’) distribution inefficient in describing them. Instead, they follow some of the other “exponential distributions” (and not only the exponential: cf. Gelman, Hill, and Vehtari (2020, 264)), of which the Poisson, gamma, beta, binomial and multinomial are probably the most common in human and social sciences research. Their “generalised linear modelling” involves mapping them onto a linear model using a so-called “link function”. We will explore what all of this means in practice and how it can be applied to the data we are most interested in within our respective fields of study.\n\n\n0 min\n\n\n\n\nDo we live in a simulation?\n\n\nWe have known ever since science-fiction author Philip K. Dick’s memorable “Metz address” of 1977 that our world is a computer simulation. Of course, like some common-currency theories in the social sciences, this knowledge will never be truly verified. We won’t even attempt to get to the bottom of it in class; instead, we’ll practice some basic methods of computer simulation for statistical inference and for generating data that has some idealised characteristics. Such methods play an increasingly important role in computational statistics and are extremely useful for designing robust data collection and analysis plans. 
If you make a mistake in the code and end up in an infinite loop, but you’re afraid that stopping the process may cause the known universe to implode, you can watch Dick on YouTube while you wait. If something like this can happen to our data, who says it couldn’t happen to us?\n\n\n0 min\n\n\n\n\nChallenging hierarchies\n\n\nBy now we have gained a sense that every new thing we learn about turns out to be merely a specific case of a larger class of things. So, all the models we covered so far are specific, single-level versions of multilevel models, in which our cases can be seen as clustered within larger entities. Sometimes they are part of several cross-cutting clusters and/or the clusters are themselves clustered. In general terms, we must acknowledge that there are dependencies in our data that may influence their behaviour. It turns out that data about humans living in societies look somewhat like humans living in societies. The importance of including information about hierarchical dependencies in our models is probably emphasised by no one more than McElreath (2020, 15), who wants “to convince the reader of something that appears unreasonable: multilevel regression deserves to be the default form of regression. Papers that do not use multilevel models should have to justify not using a multilevel approach.” We will encounter some of the uses and challenges of multilevel modelling.\n\n\n0 min\n\n\n\n\nThe unobserved\n\n\nThe unobserved sounds like the title of a promising horror film; if we have achieved our aims in the module so far, our horror should be ‘merely’ metaphysical by now (Kołakowski anyone? No? Okay, never mind). We have already had to deal with various aspects of latency in our analyses. At the most fundamental level, we speak about population parameters, but we never actually observe them; even a sample statistic can be a purely imaginary case that doesn’t occur in real life. We have discussed the effects of omitted variables, which are thus unobserved by our model, but which we may have access to in our data. And, of course, our most interesting measurements are likely to be proxies of some unobservable theoretical construct (Mulvin (2021) has recently published a wonderfully rich book about proxies in general). This week we pick up an earlier thread from week 4, where we thought about binary and ordered multinomial variables as discretised manifestations of some continuous ‘latent variable’. We expand on this idea by exploring simple and then more complex latent variable models (factor analysis, structural equation modelling), as a further generalisation of the hierarchical perspective introduced earlier. This gives us a few more tools to deal with our radical uncertainty. (n.b. missing data points are another challenge that could fall under this heading, and learning how to deal with them is extremely important; but “The missing” is too good a title not to deserve a high-budget, weak-storyline, full-on special effects sequel somewhere else)\n\n\n0 min\n\n\n\n\nWords, words, mere words…\n\n\nAs researchers in the humanities and the social sciences, we use words both as tools of analysis and as sources of data. Words, and more broadly, texts, are also increasingly important for quantitative research in an age of so-called ‘big data’, when the digital world is saturated with unstructured textual information. But the statistical inspection of text is neither new nor restricted to the humanistic tail of the social sciences. 
For example, a documented interest in the statistical study of literary style for the purposes of attributing authorship dates back to the mid-1850s (see Lord 1958); and investors can use textual data such as minutes from the Bank of England’s Monetary Policy Committee’s deliberations to estimate future monetary policy decisions before they are actually taken (cf. El-Shagi and Jung 2015). Methods for the collection and quantitative analysis of large-scale textual data are increasingly available, but their technical implementation is complex and requires an efficient combination of humanistic subject knowledge and statistical expertise. Faced with words, one is understandably caught between Shakespeare’s Troilus and Wilde’s Dorian Gray. “Words, words, mere words, no matter from the heart; th’ effect doth operate another way. … My love with words and errors still she feeds, but edifies another with her deeds” - believed the betrayed Troilus. “Words! Mere words! How terrible they were! How clear, and vivid, and cruel! One could not escape from them. And yet what a subtle magic there was in them! They seemed to be able to give a plastic form to formless things, and to have a music of their own as sweet as that of viol or of lute. Mere words! Was there anything so real as words?” - pondered Dorian.\n\n\n\n\nReferences\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics I. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and Other Stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/slides-frame/index.html",
"href": "materials/slides-frame/index.html",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "",
"text": "Title\n\n\nDescription\n\n\n\n\n\n\nWeek 1 Gamblers, God, Guinness and peas\n\n\n\n\n\n\nWeek 2 Revisiting Flatland\n\n\n\n\n\n\nWeek 3 Dear Prudence, Help! I may be cheating with my X\n\n\n\n\n\n\nWeek 4 The Y question\n\n\n\n\n\n\nWeek 5 Do we live in a simulation?\n\n\n\n\n\n\nWeek 6 Challenging hierarchies\n\n\n\n\n\n\nWeek 7 The unobserved\n\n\n\n\n\n\nWeek 8 Words, words, mere words…\n\n\n\n\n\n\n\n\nNo matching items\n\nReferences\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/slides-frame/slides-frame_w01.html",
"href": "materials/slides-frame/slides-frame_w01.html",
"title": "Week 1 Gamblers, God, Guinness and peas",
"section": "",
"text": "View the slides full-screen in a standalone browser window here. The lecture recording is available on ReCap (requires Newcastle University login)\n\n\n\n\n\nReferences\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/slides-frame/slides-frame_w02.html",
"href": "materials/slides-frame/slides-frame_w02.html",
"title": "Week 2 Revisiting Flatland",
"section": "",
"text": "View the slides full-screen in a standalone browser window here. The lecture recording is available on ReCap (requires Newcastle University login)\n\n\n\n\n\nReferences\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/slides-frame/slides-frame_w03.html",
"href": "materials/slides-frame/slides-frame_w03.html",
"title": "Week 3 Dear Prudence, Help! I may be cheating with my X",
"section": "",
"text": "View the slides full-screen in a standalone browser window here. The lecture recording is available on ReCap (requires Newcastle University login)\n\n\n\n\n\nReferences\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/slides-frame/slides-frame_w04.html",
"href": "materials/slides-frame/slides-frame_w04.html",
"title": "Week 4 The Y question",
"section": "",
"text": "View the slides full-screen in a standalone browser window here. The lecture recording is available on ReCap (requires Newcastle University login)\n\n\n\n\n\nReferences\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/slides-frame/slides-frame_w05.html",
"href": "materials/slides-frame/slides-frame_w05.html",
"title": "Week 5 Do we live in a simulation?",
"section": "",
"text": "View the slides full-screen in a standalone browser window here. The lecture recording is available on ReCap (requires Newcastle University login)\n\n\n\n\n\nReferences\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/slides-frame/slides-frame_w06.html",
"href": "materials/slides-frame/slides-frame_w06.html",
"title": "Week 6 Challenging hierarchies",
"section": "",
"text": "View the slides full-screen in a standalone browser window here. The lecture recording is available on ReCap (requires Newcastle University login)\n\n\n\n\n\nReferences\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/slides-frame/slides-frame_w07.html",
"href": "materials/slides-frame/slides-frame_w07.html",
"title": "Week 7 The unobserved",
"section": "",
"text": "View the slides full-screen in a standalone browser window here. The lecture recording is available on ReCap (requires Newcastle University login)\n\n\n\n\n\nReferences\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/slides-frame/slides-frame_w08.html",
"href": "materials/slides-frame/slides-frame_w08.html",
"title": "Week 8 Words, words, mere words…",
"section": "",
"text": "View the slides full-screen in a standalone browser window here. The lecture recording is available on ReCap (requires Newcastle University login)\n\n\n\n\n\nReferences\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/slides/dice-calculations.html#the-number-of-possible-outcomes-from-throwing-three-dice-without-counting-the-order",
"href": "materials/slides/dice-calculations.html#the-number-of-possible-outcomes-from-throwing-three-dice-without-counting-the-order",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "The number of possible outcomes from throwing three dice, without counting the order",
"text": "The number of possible outcomes from throwing three dice, without counting the order\nThe number of possible outcomes when throwing three six-sided dice, without considering the order of the numbers, is nCr(6+3-1, 3), where nCr represents the number of combinations.\nIn this case, nCr(6+3-1, 3) = nCr(8, 3) = 56. So there are 56 possible outcomes.\nThe formula nCr(n, r) represents the number of combinations of r elements from a set of n elements.\nIn the formula nCr(6+3-1, 3), n is 6+3-1 = 8, which represents the total number of elements in the set when considering all the possible outcomes from rolling three six-sided dice. The -1 in the formula accounts for the fact that we’re considering repeating dice rolls.\nr is 3, which represents the number of elements we want to choose (in this case, the number of dice rolls we want to consider in a combination).\nSo, nCr(6+3-1, 3) gives us the number of possible combinations of 3 dice rolls from a set of 8 possible outcomes (the 6 possible outcomes from each dice roll plus 2 duplicates). This is also known as the number of ways to choose 3 dice rolls out of 8 possible outcomes.\nThe result nCr(8, 3) = 56 means there are 56 possible combinations of 3 dice rolls from 8 possible outcomes.\nHow the ‘n’ is calculated?\nThe n in nCr(n, r) represents the total number of elements in the set from which we want to choose r elements.\nIn the case of three six-sided dice, each dice has 6 possible outcomes. If we roll three dice, the total number of possible outcomes is 6 * 6 * 6 = 216. However, because we don’t care about the order in which the dice land, we need to remove the duplicate combinations that would result from simply rearranging the order of the same three outcomes.\nTo account for duplicates, we consider the problem as choosing r elements from a set of n elements, where n is the total number of unique combinations that can be made from rolling three dice.\nTo calculate n, we can use the formula n = n + r - 1, where n is the total number of outcomes from rolling the dice (in this case, 6), and r is the number of dice we’re rolling (in this case, 3).\nSo in this case, n = 6 + 3 - 1 = 8. This means that we have 8 unique combinations of dice rolls, and we can choose 3 dice rolls from this set to form a combination. And the number of combinations of 3 dice rolls from this set of 8 unique combinations is nCr(8, 3) = 56.\nHow is ‘n = n + r - 1’ is derived?\nThe formula n = n + r - 1 is used to find the number of unique combinations of r elements from a set of n elements, when the elements can be repeated.\nThe reasoning behind this formula is as follows:\nImagine you have n distinct elements, and you want to form combinations of r elements. In this case, the number of unique combinations is simply nCr(n, r).\nHowever, when elements can be repeated, the number of unique combinations is not simply nCr(n, r), because the same combination of elements can be formed by rearranging the same r elements in different orders. To account for this, we need to consider the problem as choosing r elements from a set of n + r - 1 elements, where n + r - 1 is the total number of unique combinations that can be made from the n elements.\nThe -1 in the formula accounts for the fact that we’re considering repeating elements, so we need to remove one duplicate combination that would result from simply rearranging the same r elements.\nSo in the case of three six-sided dice, each dice has 6 possible outcomes, giving us n = 6 unique elements. 
If we roll three dice, we have r = 3 elements. And to find the number of unique combinations of r elements from a set of n elements, where elements can be repeated, we use the formula n = n + r - 1, giving us n = 6 + 3 - 1 = 8.\nThis means that there are 8 unique combinations of dice rolls that can be formed from rolling three six-sided dice, and we can choose 3 dice rolls from this set to form a combination. The number of combinations of 3 dice rolls from this set of 8 unique combinations is nCr(8, 3) = 56.\nHere’s an empirical demonstration in R using the formula:\n\n# Define the number of unique elements and the number of elements to choose\nn <- 6\nr <- 3\n\n# Calculate the number of unique combinations using the formula n = n + r - 1\nm <- n + r - 1\n\n# Calculate the number of combinations using the combination formula nCr(m, r)\ncombinations <- choose(m, r)\ncombinations\n\n[1] 56\n\n\nThis code returns the value 56, which is the number of unique combinations of r elements from a set of m elements, where m = n + r - 1 and n is the number of unique elements and r is the number of elements to choose. This is the same result as we found earlier using nCr(8, 3).\nAn empirical demonstration in R by enumerating all possible combinations and counting the number of unique combinations (without the choose function) (Not correct!!)\n\n# Define the number of unique elements and the number of elements to choose\nn <- 6\nr <- 3\n\n# Calculate the number of unique combinations using the formula n = n + r - 1\nm <- n + r - 1\n\n# Generate all possible combinations of length r from 1 to m\nall_combinations <- combn(1:m, r)\n\n# Convert the combinations to a matrix for easier manipulation\ncombination_matrix <- as.matrix(all_combinations)\n\n# Find the unique rows in the combination matrix\nunique_combinations <- unique(combination_matrix)\n\n# Count the number of unique combinations\nnum_unique_combinations <- nrow(unique_combinations)\nnum_unique_combinations\n\n[1] 3\n\n\nAn empirical demonstation by first finding all possible outcomes and then selecting out the duplicates: (not correct!!)\n\n# Define the number of unique elements and the number of elements to choose\nn <- 6\nr <- 3\n\n# Generate all possible combinations of length r from 1 to n\nall_combinations <- combn(1:n, r)\n\n# Convert the combinations to a matrix for easier manipulation\ncombination_matrix <- as.matrix(all_combinations)\n\n# Find the unique rows in the combination matrix\nunique_combinations <- unique(combination_matrix)\n\n# Count the number of unique combinations\nnum_unique_combinations <- nrow(unique_combinations)\nnum_unique_combinations\n\n[1] 3"
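\nA demonstration by enumeration that does work is sketched below (an addition for completeness, using base R only): enumerate all 6^3 = 216 ordered outcomes, sort each triple so that order is ignored, and count the distinct results:\n\n# Enumerate all 216 ordered outcomes of three six-sided dice\nall_rolls <- expand.grid(die1 = 1:6, die2 = 1:6, die3 = 1:6)\n\n# Sort each triple so that order no longer matters, then collapse it to a label\nunordered <- apply(all_rolls, 1, function(x) paste(sort(x), collapse = \"-\"))\n\n# Count the distinct unordered outcomes\nlength(unique(unordered))\n\n[1] 56"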
},
{
"objectID": "materials/slides/w1.html#not-an-outline-slide",
"href": "materials/slides/w1.html#not-an-outline-slide",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Not an outline slide",
"text": "Not an outline slide\n\n\nGaming chance\nsecond topic\nThird topic\nForth topic"
},
{
"objectID": "materials/slides/w1.html#section",
"href": "materials/slides/w1.html#section",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "",
"text": "Gaming chance"
},
{
"objectID": "materials/slides/w1.html#testing",
"href": "materials/slides/w1.html#testing",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Testing",
"text": "Testing\n\n\nTesting\n\n\nhow\n\n\nfragments work in\n\n\nreality"
},
{
"objectID": "materials/slides/w1.html#testing-2",
"href": "materials/slides/w1.html#testing-2",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Testing 2",
"text": "Testing 2\n\nTesting\n. . .\nHow\n. . .\nfragments work\n. . .\nreally"
},
{
"objectID": "materials/slides/w1.html#statistics",
"href": "materials/slides/w1.html#statistics",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Statistics",
"text": "Statistics\n\nand the state"
},
{
"objectID": "materials/slides/w1.html#statistics-1",
"href": "materials/slides/w1.html#statistics-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Statistics",
"text": "Statistics\nand probability\n\n\n\nStatistics as the mathematical science of using probability to describe uncertainty"
},
{
"objectID": "materials/slides/w1.html#gaming-chance",
"href": "materials/slides/w1.html#gaming-chance",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Gaming chance",
"text": "Gaming chance\n\n\n\nWe may never know when humans started playing games of chance, but archaeological findings suggest it was a rather long time ago\nDuring the the First Dynasty in Egypt (c. 3500 B.C.) variants of a game involving astragali (small bones in the ankle of an animal) were already documented\nOne of the chief games may have been the simple one of throwing four astragali together and noting which sides fell uppermost"
},
{
"objectID": "materials/slides/w1.html#ālea-iacta-est",
"href": "materials/slides/w1.html#ālea-iacta-est",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Ālea iacta est",
"text": "Ālea iacta est\n\n\n\n\nThe six-sided die we know today may have been obtained from the astragalus by grinding it down until it formed a rough cube\nDice became common in the Ptolemaic dynasty (300 to 30 B.C.)\nThere is evidence that dice were used for divination rites in this period - one carried the sacred symbols of Osiris, Horus, Isis, Nebhat, Hathor and Horhudet engraved on its six sides\nIn Roman times, rule by divination attained great proportions; Emperors Septimius Severus (Emperor A.D. 193-211) and Diocletian (Emperor AD. 284-305) were notorious for their reliance on the whims of the gods"
},
{
"objectID": "materials/slides/w1.html#fat-chance",
"href": "materials/slides/w1.html#fat-chance",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Fat chance",
"text": "Fat chance\n\n\n\nHe threw four knucklebones on to the table and committed his hopes to the throw. If he threw well, particularly if he obtained the image of the goddess herself, no two showing the same number, he adored the goddess, and was in high hopes of gratifying his passion; if he threw badly, as usually happens, and got an unlucky combination, he called down imprecations on all Cnidos, and was as much overcome by grief as if he had suffered some personal loss.\n— Lucian of Samosata (c. 125 – 180), writing in his trademark satirical style about a young man who fell in love with Praxiteles’s Aphrodite of Knidos; cited in F. N. David (1955:8)"
},
{
"objectID": "materials/slides/w1.html#chance-with-limitations",
"href": "materials/slides/w1.html#chance-with-limitations",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Chance with limitations",
"text": "Chance with limitations\n\n\n\nDice were sometimes faked. Sometimes numbers were left off or duplicated; hollow dice have been found dating from Roman time\nDice were also imperfect; a “fair” die was the exception rather than the rule\nExperiment by F. N. David using three dice from the British Museum:"
},
{
"objectID": "materials/slides/w1.html#exercise",
"href": "materials/slides/w1.html#exercise",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Exercise",
"text": "Exercise\n\n\nWhich of the three dice (if any) would you call “fair”?\nWhat distribution of outcomes would you expect 204 fair dice rolls to produce prior to seeing any results?\nHow would you expect that distribution to change as the number of rolls progresses towards \\(\\infty\\)?\nWhat name would you give to that distribution?\nverv\nrever"
},
{
"objectID": "materials/slides/w1.html#from-chance-to-probability",
"href": "materials/slides/w1.html#from-chance-to-probability",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "From chance to probability\n",
"text": "From chance to probability\n\n\n\n\nUntil 18th century people had mostly used probability to solve problems about dice throwing and other games of chance\nJacob (Jacques/James) Bernoulli (1654/1655-1705), a Swiss mathematician trained as a theologian and ordained as a minister of the Reformed church in Basel, began asking questions about probabilistic inference instead\nHis work focused on the mathematics of uncertainty - what he came to call “stochastics” (from the Greek word \\(στόχος\\) [stókhos] meaning to “aim” or “guess’)\n\nArs Conjectandi (The Art of Conjecturing) - published posthumously in 1713"
},
{
"objectID": "materials/slides/w1.html#inferential-questions",
"href": "materials/slides/w1.html#inferential-questions",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Inferential questions",
"text": "Inferential questions\n\n\n\nSuppose you are presented with a large urn full of tiny white and black pebbles, in a ratio that’s unknown to you. You begin selecting pebbles from the urn and recording their colors, black or white. How do you use these results to make a guess about the ratio of pebble colors in the urn as a whole?\n\n\nBernoulli’s solution: if you take a large enough sample, you can be very sure, to within a small margin of absolute certainty, that the proportion of white pebbles you observe in the sample is close to the proportion of white pebbles in the urn.\nA first version of the Law of Large Numbers"
},
{
"objectID": "materials/slides/w1.html#large-numbers",
"href": "materials/slides/w1.html#large-numbers",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Large numbers",
"text": "Large numbers\n\n\nBernoulli’s solution, more technically: For any given \\(\\epsilon\\) > 0 and any \\(s\\) > 0, there is a sample size \\(n\\) such that, with \\(w\\) being the number of white pebbles counted in the sample and \\(f\\) being the true fraction of white pebbles in the urn, the probability of \\(w/n\\) falling between \\(f − \\epsilon\\) and \\(f + \\epsilon\\) is greater than \\(1 − s\\).\nthe fraction \\(w/n\\) is the ratio of white to total pebbles we observe in our sample\n\\(\\epsilon\\) (epsilon) captures the fact that we may not see the true urn ratio exactly thanks to random variation in the sample; larger samples help assure that we get closer to the “true” value, but uncertainty always remains\n\\(s\\) reflects just how sure we want to be; for example, set \\(s\\) = 0.01 and be 99% percent sure.\n“moral certainty” as distinct from absolute certainty of the kind logical deduction provides"
},
{
"objectID": "materials/slides/w2.html#guessing-game",
"href": "materials/slides/w2.html#guessing-game",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Guessing game",
"text": "Guessing game\n\n\n\n\n\n\n152 140 137 157 145 164 149 169 148 165 154 151 145 150 150 163 157 144 122 105 86 161 156 130 109 146 149 147 137 126 114 148 162 146 146 153 143 143 148 161 152 163 171 147 148 145 122 129 98 154 144 147 157 127 110 98 166 152 142 159 156 164 152 161 154 145 145 152 164 144 130 130 154 143 146 167 158 91 166 150 148 138 155 161 162 148 114 159 149 137 158 145 157 179 119 170 146 147 113 163 134 152 160 150 143 167 159 155 149 111 112 163 152 124 112 86 170 146 159 151 161 170 159 74 150 153 97 162 163 149 117 100 163 162 145 163 151 150 142 171 91 157 152 149 130 147 145 122 114 157 154 121 116 167 143 152 97 160 159 150 161 161 149 125 141 155 142 160 150 156 104 95 156 153 167 150 148 159 162 156 159 147 173 166 142 143 133 128 119 152 157 149 157 150 148 102 153 161 149 114 101 138 91 163 149 159 150 158 156 149 144 154 131 157 157 154 108 168 145 148 101 113 149 155 163 157 123 161 145 144 149 110 150 166 144 157 154 164 156 154 135 144 114 163 146 121 155 145 107 147 152 164 166 156 152 140 158 163 151 171 150 164 142 94 149 105 146 161 163 145 145 171 127 159 159 154 160 150 149 127 143 142 147 163 164 160 154 167 151 148 125 111 153 139 152 155 148 144 118 144 93 148 156 150 156 154 131 102 157 169 150 112 160 168 144 145 160 147 164 153 149 160 149 85 84 60 93 111 91 154 100 62 82 97 80 150 152 141 88 158 149 152 155 124 104 161 149 97 93 161 157 167 157 91 60 137 152 152 81 109 71 89 67 85 70 162 152 89 90 72 84 159 142 142 169 123 75 74 91 160 68 136 158 85 93 152 156 154 157 120 114 84 156 137 114 94 168 148 140 157 76 66 161 114 146 161 70 134 68 150 163 149 149 162 154 69 151 164 153 152 132 156 140 159 143 84 152 161 128 161 145 132 118 160 155 161 166 158 155 98 64 161 147 147 147 173 158 147 125 106 166 150 76 162 140 67 63 164 148 160 155 152 62 146 152 157 56 61 152 145 118 78 161 151 122 93 154 147 140 157 91 155 144 83 158 147 124 89 160 137 165 155 111 154 145 142 145 164 161 155 161 170 150 124 85 161 155 106 126 166 148 124 90 102 152 149 154 54 147 57 101 122 82 155 156 133 125 102 161 146 133 88 156 152 163 115 68 143 77 145 163 156 71 159\n\n\n\n\nWhat are the most appropriate summary statistics for this sample?\nWhat is the Mean (\\(\\bar{x}\\)) of this dataset? What is its standard deviation (\\(s\\))?"
},
{
"objectID": "materials/slides/w2.html#revised-guesses",
"href": "materials/slides/w2.html#revised-guesses",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Revised guesses",
"text": "Revised guesses\n\n\nThe data consist of 544 measurements of human height\nThe mean of the data is 138.2635963\nThe standard deviation is 27.6024476"
},
{
"objectID": "materials/slides/w2.html#kung-demography",
"href": "materials/slides/w2.html#kung-demography",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "!Kung demography",
"text": "!Kung demography\n\n\nthis is a bulletpoint\nanother bullet\nfinal one bites the bullet"
},
{
"objectID": "materials/slides/w2.html#section",
"href": "materials/slides/w2.html#section",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "",
"text": "library(patchwork)\n\ny <- tibble(y)\n\nggplot(y, aes(x = y)) + scale_x_continuous(n.breaks = 20, limits = c(50, 180)) +\n geom_boxplot() + theme_void() +\nggplot(y, aes(x = y)) + \n geom_histogram() + scale_x_continuous(n.breaks = 20, limits = c(50, 180)) + scale_y_continuous(n.breaks = 10) +\n geom_vline(aes(xintercept = median(y))) +\n#box + hist + \n plot_layout(nrow = 2, heights = c(0.2, 4))"
},
{
"objectID": "materials/slides/w2.html#section-1",
"href": "materials/slides/w2.html#section-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "",
"text": "ggplot(y, aes(x = y)) + \n geom_histogram() + scale_x_continuous(n.breaks = 20) + scale_y_continuous(n.breaks = 10) +\n geom_vline(aes(xintercept = median(y)))\n\n\ny <- y |> mutate(height = y, age = howell$age, male = as_factor(howell$male))\n\nggplot(y, aes(x = height, fill = factor(round(age)))) + guides(fill = FALSE) +\n geom_histogram() + scale_x_continuous(n.breaks = 20) + scale_y_continuous(n.breaks = 10) + \n scale_fill_grey(start = 0.9, end = 0.1)"
},
{
"objectID": "materials/slides/w2.html#section-2",
"href": "materials/slides/w2.html#section-2",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "",
"text": "y <- y |> mutate(x = howell$age, z = as_factor(howell$male))\n\n#theme_set(theme_bw())\n\nplot(y, aes(x = x, y = y, colour = z)) + \n geom_point(alpha = 0.7) + scale_x_continuous(n.breaks = 10) + scale_y_continuous(n.breaks = 10) + \n labs(title = \"\", x = \"x\", y = \"y\", colour = \"\") + \n theme(legend.position = c(0.92,0.16)) +\n scale_color_manual(name = \"Sex\", labels = c(\"Female\", \"Male\"), values = c(100,200))\n\n#+ scale_color_brewer(palette = \"Dark2\") \n \n \n #xlab(\"\\nx\") + ylab(\"y\\n\") + scale_color_brewer(palette = \"Dark2\") #+ scale_colour_manual(\"\", values = c(1, 2), labels = c(\"Female\", \"Male\"))"
},
{
"objectID": "materials/slides/w2.html#section-3",
"href": "materials/slides/w2.html#section-3",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "",
"text": "summary(howell)\n\n height weight age male \n Min. : 53.98 Min. : 4.252 Min. : 0.00 Min. :0.0000 \n 1st Qu.:125.09 1st Qu.:22.008 1st Qu.:12.00 1st Qu.:0.0000 \n Median :148.59 Median :40.058 Median :27.00 Median :0.0000 \n Mean :138.26 Mean :35.611 Mean :29.34 Mean :0.4724 \n 3rd Qu.:157.48 3rd Qu.:47.209 3rd Qu.:43.00 3rd Qu.:1.0000 \n Max. :179.07 Max. :62.993 Max. :88.00 Max. :1.0000 \n\n\n\n\nsummary(lm(height ~ weight, data = howell))\n\n\nCall:\nlm(formula = height ~ weight, data = howell)\n\nResiduals:\n Min 1Q Median 3Q Max \n-28.9634 -5.7794 0.7503 6.7207 20.7799 \n\nCoefficients:\n Estimate Std. Error t value Pr(>|t|) \n(Intercept) 75.4359 1.0517 71.72 <2e-16 ***\nweight 1.7643 0.0273 64.63 <2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nResidual standard error: 9.363 on 542 degrees of freedom\nMultiple R-squared: 0.8851, Adjusted R-squared: 0.8849 \nF-statistic: 4177 on 1 and 542 DF, p-value: < 2.2e-16\n\n\n\n\nsummary(lm(height ~ weight + age + male, data = howell))\n\n\nCall:\nlm(formula = height ~ weight + age + male, data = howell)\n\nResiduals:\n Min 1Q Median 3Q Max \n-29.011 -5.409 0.730 6.490 19.735 \n\nCoefficients:\n Estimate Std. Error t value Pr(>|t|) \n(Intercept) 75.94238 1.06471 71.327 < 2e-16 ***\nweight 1.65634 0.03740 44.289 < 2e-16 ***\nage 0.11247 0.02621 4.291 2.11e-05 ***\nmale 0.07931 0.80942 0.098 0.922 \n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nResidual standard error: 9.222 on 540 degrees of freedom\nMultiple R-squared: 0.889, Adjusted R-squared: 0.8884 \nF-statistic: 1441 on 3 and 540 DF, p-value: < 2.2e-16"
},
{
"objectID": "materials/slides/w4.html#revisiting-trust-end-education",
"href": "materials/slides/w4.html#revisiting-trust-end-education",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Revisiting trust end education\n",
"text": "Revisiting trust end education\n\n\nosterman <- sjlabelled::read_stata(\"https://cgmoreh.github.io/HSS8005-data/osterman.dta\")\n\nggplot(osterman, aes(y = trustindex3, x = eduyrs25)) +\n geom_jitter(alpha = 0.03) +\n geom_smooth(method = \"lm\") + \n scale_y_continuous(n.breaks = 15) + \n scale_x_continuous(n.breaks = 10)"
},
{
"objectID": "materials/slides/w4.html#three-linear-models",
"href": "materials/slides/w4.html#three-linear-models",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Three linear models",
"text": "Three linear models\n\nlm_data <- osterman |> select(trustindex3, eduyrs25, agea, female) |> \n mutate(med_trust = trustindex3 - median(trustindex3)) |> \n mutate(d_trust = trustindex3 >= median(trustindex3))"
},
{
"objectID": "materials/slides/w4.html#three-linear-models-1",
"href": "materials/slides/w4.html#three-linear-models-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Three linear models",
"text": "Three linear models\n\nlm_data <- osterman |> select(trustindex3, eduyrs25, agea, female) |> \n mutate(med_trust = trustindex3 - median(trustindex3)) |> \n mutate(d_trust = trustindex3 >= median(trustindex3))\n\nlm1 <- lm(trustindex3 ~ eduyrs25 + agea + female, lm_data)\nlm2 <- lm(med_trust ~ eduyrs25 + agea + female, lm_data)\nlm3 <- lm(d_trust ~ eduyrs25 + agea + female, lm_data)"
},
{
"objectID": "materials/slides/w4.html#three-linear-models-2",
"href": "materials/slides/w4.html#three-linear-models-2",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Three linear models",
"text": "Three linear models\n\nlm_data <- osterman |> select(trustindex3, eduyrs25, agea, female) |> \n mutate(med_trust = trustindex3 - median(trustindex3)) |> \n mutate(d_trust = trustindex3 >= median(trustindex3))\n\nlm1 <- lm(trustindex3 ~ eduyrs25 + agea + female, lm_data)\nlm2 <- lm(med_trust ~ eduyrs25 + agea + female, lm_data)\nlm3 <- lm(d_trust ~ eduyrs25 + agea + female, lm_data)\n\nlm_table <- modelsummary::modelsummary(list(\n \"lm\" = lm1, \n \"lm_cent\" = lm2, \n \"lm_prob\" = lm3), \n statistic = 'conf.int')\n\nlm_table"
},
{
"objectID": "materials/slides/w4.html#three-linear-models-3",
"href": "materials/slides/w4.html#three-linear-models-3",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Three linear models",
"text": "Three linear models\n\n\n\n\n\n \n lm \n lm_cent \n lm_prob \n \n\n\n (Intercept) \n 2.955 \n −2.379 \n 0.009 \n \n\n \n [2.871, 3.039] \n [−2.463, −2.295] \n [−0.013, 0.031] \n \n\n eduyrs25 \n 0.116 \n 0.116 \n 0.027 \n \n\n \n [0.113, 0.120] \n [0.113, 0.120] \n [0.026, 0.028] \n \n\n agea \n 0.016 \n 0.016 \n 0.004 \n \n\n \n [0.014, 0.017] \n [0.014, 0.017] \n [0.003, 0.004] \n \n\n female \n 0.041 \n 0.041 \n 0.010 \n \n\n \n [0.013, 0.069] \n [0.013, 0.069] \n [0.003, 0.018] \n \n\n\n\n\n\ntidy(m_logit) |> select(term, estimate) |> rename(logit = estimate) |> \n add_column(\n tidy(m_probit) |> select(estimate) |> rename(\"probit <br> longer name\" = estimate)) |> kable(escape = FALSE)"
},
{
"objectID": "materials/slides/w4.html#the-logit-model",
"href": "materials/slides/w4.html#the-logit-model",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "The logit model",
"text": "The logit model\n\nm_logit <- glm(d_trust ~ eduyrs25 + agea + female, family = binomial(link = \"logit\"), data = lm_data)\nplot_predictions(m_logit, condition = c(\"eduyrs25\"))"
},
{
"objectID": "materials/slides/w4.html#the-probit-model",
"href": "materials/slides/w4.html#the-probit-model",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "The probit model",
"text": "The probit model\n\nm_probit <- glm(d_trust ~ eduyrs25 + agea + female, family = binomial(link = \"probit\"), data = lm_data)\nplot_predictions(m_probit, condition = c(\"eduyrs25\"))"
},
{
"objectID": "materials/slides/w4.html#model-comparisons",
"href": "materials/slides/w4.html#model-comparisons",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Model comparisons",
"text": "Model comparisons\n\n\n\n\n\n \n lm \n lm_cent \n lm_prob \n Logit \n Probit \n \n\n\n (Intercept) \n 2.955 \n −2.379 \n 0.009 \n −2.103 \n −1.289 \n \n\n \n [2.871, 3.039] \n [−2.463, −2.295] \n [−0.013, 0.031] \n [−2.201, −2.006] \n [−1.348, −1.230] \n \n\n eduyrs25 \n 0.116 \n 0.116 \n 0.027 \n 0.116 \n 0.071 \n \n\n \n [0.113, 0.120] \n [0.113, 0.120] \n [0.026, 0.028] \n [0.112, 0.120] \n [0.069, 0.074] \n \n\n agea \n 0.016 \n 0.016 \n 0.004 \n 0.015 \n 0.009 \n \n\n \n [0.014, 0.017] \n [0.014, 0.017] \n [0.003, 0.004] \n [0.014, 0.017] \n [0.009, 0.010] \n \n\n female \n 0.041 \n 0.041 \n 0.010 \n 0.042 \n 0.026 \n \n\n \n [0.013, 0.069] \n [0.013, 0.069] \n [0.003, 0.018] \n [0.011, 0.073] \n [0.007, 0.046]"
},
{
"objectID": "materials/slides/w4.html#probabilities-odds-log-odds-logit-odds-ratios",
"href": "materials/slides/w4.html#probabilities-odds-log-odds-logit-odds-ratios",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Probabilities, Odds, Log-odds (logit), Odds Ratios",
"text": "Probabilities, Odds, Log-odds (logit), Odds Ratios\n\n\n\n\n\n\n\\(prob = \\{0, \\dots, 1\\}\\)\n\n\\(odds = {prob \\over (1-prob)}\\)\n\n\\(log\\_odds = \\ln(odds)\\)\n\n\\(odds = {\\rm e}^{\\ln(odds)}\\)\n\n\\(odds\\_ratio = {odds1 \\over odds2} = {{prob1 \\over (1-prob1)} \\over {prob2 \\over (1-prob2)}}\\)\n\n\n\n\n\n\n\n probs \n odds \n log_odds \n \n\n\n 0.01 \n 0.01 \n -4.60 \n \n\n 0.05 \n 0.05 \n -2.94 \n \n\n 0.10 \n 0.11 \n -2.20 \n \n\n 0.20 \n 0.25 \n -1.39 \n \n\n 0.33 \n 0.50 \n -0.69 \n \n\n 0.40 \n 0.67 \n -0.41 \n \n\n 0.50 \n 1.00 \n 0.00 \n \n\n 0.60 \n 1.50 \n 0.41 \n \n\n 0.67 \n 2.00 \n 0.69 \n \n\n 0.80 \n 4.00 \n 1.39 \n \n\n 0.90 \n 9.00 \n 2.20 \n \n\n 0.95 \n 19.00 \n 2.94 \n \n\n 0.99 \n 99.00 \n 4.60"
},
{
"objectID": "materials/slides/w4.html#probabilities-odds-log-odds-logit-odds-ratios-1",
"href": "materials/slides/w4.html#probabilities-odds-log-odds-logit-odds-ratios-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Probabilities, Odds, Log-odds (logit), Odds Ratios",
"text": "Probabilities, Odds, Log-odds (logit), Odds Ratios\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n probs \n odds \n log_odds \n \n\n\n 0.01 \n 0.01 \n -4.60 \n \n\n 0.05 \n 0.05 \n -2.94 \n \n\n 0.10 \n 0.11 \n -2.20 \n \n\n 0.20 \n 0.25 \n -1.39 \n \n\n 0.33 \n 0.50 \n -0.69 \n \n\n 0.40 \n 0.67 \n -0.41 \n \n\n 0.50 \n 1.00 \n 0.00 \n \n\n 0.60 \n 1.50 \n 0.41 \n \n\n 0.67 \n 2.00 \n 0.69 \n \n\n 0.80 \n 4.00 \n 1.39 \n \n\n 0.90 \n 9.00 \n 2.20 \n \n\n 0.95 \n 19.00 \n 2.94 \n \n\n 0.99 \n 99.00 \n 4.60"
},
{
"objectID": "materials/slides/w4.html#poisson-model",
"href": "materials/slides/w4.html#poisson-model",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Poisson model",
"text": "Poisson model\n\n\n# A tibble: 78 × 3\n department course number_of_A\n <chr> <chr> <int>\n 1 1 DEP_1_a 4\n 2 1 DEP_1_b 2\n 3 1 DEP_1_c 5\n 4 1 DEP_1_d 4\n 5 1 DEP_1_e 1\n 6 1 DEP_1_f 4\n 7 1 DEP_1_g 3\n 8 1 DEP_1_h 3\n 9 1 DEP_1_i 3\n10 1 DEP_1_j 3\n# … with 68 more rows"
},
{
"objectID": "materials/slides/w4.html#poisson-model-1",
"href": "materials/slides/w4.html#poisson-model-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Poisson model",
"text": "Poisson model"
},
{
"objectID": "materials/slides/w4.html#poisson-model-2",
"href": "materials/slides/w4.html#poisson-model-2",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Poisson model",
"text": "Poisson model\n\ngrades_base <-\n glm(\n number_of_A ~ department,\n data = count_of_A,\n family = \"poisson\"\n )\nsummary(grades_base)\n\n\nCall:\nglm(formula = number_of_A ~ department, family = \"poisson\", data = count_of_A)\n\nDeviance Residuals: \n Min 1Q Median 3Q Max \n-2.61555 -0.69944 -0.09568 0.60343 2.28141 \n\nCoefficients:\n Estimate Std. Error z value Pr(>|z|) \n(Intercept) 1.3269 0.1010 13.135 < 2e-16 ***\ndepartment2 0.8831 0.1201 7.353 1.94e-13 ***\ndepartment3 1.7029 0.1098 15.505 < 2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\n(Dispersion parameter for poisson family taken to be 1)\n\n Null deviance: 426.201 on 77 degrees of freedom\nResidual deviance: 75.574 on 75 degrees of freedom\nAIC: 392.55\n\nNumber of Fisher Scoring iterations: 4"
},
{
"objectID": "materials/slides/w5.html#the-uses-of-simulation-methods",
"href": "materials/slides/w5.html#the-uses-of-simulation-methods",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "The uses of simulation methods",
"text": "The uses of simulation methods\n\nSimulating data uses generating random data sets with known properties using code (or some other method). This can be useful in various contexts.\n\n\nTo better understand our models. Probability models mimic variation in the world, and the tools of simulation can help us better understand this variation. Patterns of randomness are contrary to normal human thinking and simulation helps in training our intuitions about averages and variation\n\nTo run statistical analyses (e.g., simulating a null distribution against which to compare a sample)\n\nTo approximate the sampling distribution of data and propagate this to the sampling distribution of statistical estimates and procedures"
},
{
"objectID": "materials/slides/w5.html#distribution-functions",
"href": "materials/slides/w5.html#distribution-functions",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Distribution functions",
"text": "Distribution functions\n\nBase R functions:\n\nrnorm(): sampling from a normal distribution\nrunif(): sampling from a uniform distribution\nrbinom(): sampling from a binomial distribution\nrpois(): sampling from a Poisson distribution\n\n(Other distributions are also available)\n\nsample(): sampling elements from an R object with or without replacement\nreplicate(): often plays a role in conjunction with sampling functions; it is used to evaluate an expression N number of times repeatedly\n\nFrom non-base packages:\n\n\nMASS::mvtnorm(): multivariate normal; sampling multiple variables with a known correlation structure (i.e., we can tell R how variables should be correlated with one another) and normally distributed errors"
},
{
"objectID": "materials/slides/w5.html#sampling-from-a-uniform-distribution",
"href": "materials/slides/w5.html#sampling-from-a-uniform-distribution",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a uniform distribution",
"text": "Sampling from a uniform distribution\nThe runif function returns some number (n) of random numbers from a uniform distribution with a range from \\(a\\) (min) to \\(b\\) (max) such that \\(X\\sim\\mathcal U(a,b)\\) (verbally, \\(X\\) is sampled from a uniform distribution with the parameters \\(a\\) and \\(b\\)), where \\(-\\infty < a < b < \\infty\\) (verbally, \\(a\\) is greater than negative infinity but less than \\(b\\), and \\(b\\) is finite). The default is to draw from a standard uniform distribution (i.e., \\(a = 0\\) and \\(b = 1\\)):\n\n\n# Sample a vector of ten numbers and store the results in the object `rand_unifs`\n# Note that the numbers will be different each time we re-run the `runif` function above.\n# If we want to recreate the same sample, we should set a `seed` number first\n\nrand_unifs <- runif(n = 10000, min = 0, max = 1);\n\n\n\nThe first 40 numbers from the sample are:\n\n\n [1] 0.73389592 0.77027279 0.12883356 0.62677799 0.07682038 0.08668081\n [7] 0.95609747 0.76159718 0.55481559 0.61747149 0.25032236 0.19532391\n[13] 0.16115864 0.97814687 0.99120674 0.09791592 0.93735431 0.53521339\n[19] 0.47323976 0.32125960 0.04244730 0.59705072 0.07353607 0.76877016\n[25] 0.38614356 0.67211119 0.26172603 0.32942547 0.92414770 0.28457958\n[31] 0.25625157 0.26928066 0.66945283 0.08099618 0.27268495 0.60555933\n[37] 0.07795224 0.30725433 0.47694105 0.34310998 0.94421787 0.12908665\n[43] 0.18597160 0.14777209 0.46402415 0.61465427 0.78012954 0.59137894\n[49] 0.58202826 0.22381850"
},
{
"objectID": "materials/slides/w5.html#sampling-from-a-uniform-distribution-1",
"href": "materials/slides/w5.html#sampling-from-a-uniform-distribution-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a uniform distribution",
"text": "Sampling from a uniform distribution\nTo visualise the entire sample, we can plot it on a histogram:"
},
{
"objectID": "materials/slides/w5.html#sampling-from-a-normal-distribution",
"href": "materials/slides/w5.html#sampling-from-a-normal-distribution",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a normal distribution",
"text": "Sampling from a normal distribution\nThe rnorm function returns some number (n) of randomly generated values given a set mean (\\(\\mu\\); mean) and standard deviation (\\(\\sigma\\); sd), such that \\(X\\sim\\mathcal N(\\mu,\\sigma^2)\\). The default is to draw from a standard normal (a.k.a., “Gaussian”) distribution (i.e., \\(\\mu = 0\\) and \\(\\sigma = 1\\)):\n\n\nrand_norms_10000 <- rnorm(n = 10000, mean = 0, sd = 1)\n\nprint(rand_norms_10000[1:20])\n\n\n\n [1] 0.8247128 -0.2646844 -0.8189774 -1.1496807 -0.9199141 1.9054621\n [7] 2.1109840 -0.2281314 2.5573187 -1.1336439 -1.8498121 0.7892403\n[13] 0.1478274 -1.1718075 0.9450400 -0.6083184 0.9430121 -1.0393722\n[19] -0.6519066 0.4566983"
},
{
"objectID": "materials/slides/w5.html#sampling-from-a-normal-distribution-1",
"href": "materials/slides/w5.html#sampling-from-a-normal-distribution-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a normal distribution",
"text": "Sampling from a normal distribution\n\nHistograms allow us to check how samples from the same distribution might vary.\nExercise: Compare the above distribution with a normal distribution that had a standard deviation of 2 instead of 1.\nSample 10,000 new values in rnorm with sd = 2 instead of sd = 1 and create a new histogram with hist.\nTo see what the distribution of sampled data might look like given a low sample size (e.g., 10), repeat the process of sampling from rnorm(n = 10, mean = 0, sd = 1) multiple times and look at the shape of the resulting histogram."
},
{
"objectID": "materials/slides/w5.html#sampling-from-a-poisson-distribution",
"href": "materials/slides/w5.html#sampling-from-a-poisson-distribution",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a Poisson distribution",
"text": "Sampling from a Poisson distribution\nA Poisson process describes events happening with some given probability over an area of time or space such that \\(X\\sim Poisson(\\lambda)\\), where the rate parameter \\(\\lambda\\) is both the mean and variance of the Poisson distribution (note that by definition, \\(\\lambda > 0\\), and although \\(\\lambda\\) can be any positive real number, data are always integers, as with count data).\n\nSampling from a Poisson distribution can be done in R with rpois, which takes only two arguments specifying the number of values to be returned (n) and the rate parameter (lambda). There are no default values for rpois.\n\n\nrand_poissons <- rpois(n = 10, lambda = 1.5)\n\nprint(rand_poissons)\n\n\n\n [1] 2 2 0 1 3 2 3 5 3 0"
},
{
"objectID": "materials/slides/w5.html#sampling-from-a-poisson-distribution-1",
"href": "materials/slides/w5.html#sampling-from-a-poisson-distribution-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a Poisson distribution",
"text": "Sampling from a Poisson distribution\nA histogram of a large number of values to see the distribution when \\(\\lambda = 4.5\\):\n\nrand_poissons_10000 <- rpois(n = 10000, lambda = 4.5)"
},
{
"objectID": "materials/slides/w5.html#sampling-from-a-binomial-distribution",
"href": "materials/slides/w5.html#sampling-from-a-binomial-distribution",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a binomial distribution",
"text": "Sampling from a binomial distribution\n\nA binomial distribution describes the number of ‘successes’ for some number of independent trials (\\(\\Pr(success) = p\\)).\nThe rbinom function returns the number of successes after size trials, in which the probability of success in each trial is prob.\nSampling from a binomial distribution in R with rbinom is a bit more complex than using runif, rnorm, or rpois.\nLike those previous functions, the rbinom function returns some number (n) of random numbers, but the arguments and output can be slightly confusing at first."
},
{
"objectID": "materials/slides/w5.html#sampling-from-a-binomial-distribution-1",
"href": "materials/slides/w5.html#sampling-from-a-binomial-distribution-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a binomial distribution",
"text": "Sampling from a binomial distribution\n\nFor example, suppose we want to simulate the flipping of a fair coin 1000 times, and we want to know how many times that coin comes up heads (‘success’). We can do this with the following code:\n\n\ncoin_flips <- rbinom(n = 1, size = 1000, prob = 0.5)\n\ncoin_flips\n\n[1] 493\n\n\n\n\nThe above result shows that the coin came up heads 493 times. But note the (required) argument n. This allows us to set the number of sequences to run.\nIf we instead set n = 2, then this could simulate the flipping of a fair coin 1000 times once to see how many times heads comes up, then repeating the whole process a second time to see how many times heads comes up again (or, if it is more intuitive, the flipping of two separate fair coins 1000 times at the same time)."
},
{
"objectID": "materials/slides/w5.html#sampling-from-a-binomial-distribution-2",
"href": "materials/slides/w5.html#sampling-from-a-binomial-distribution-2",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a binomial distribution",
"text": "Sampling from a binomial distribution\n\ncoin_flips_2 <- rbinom(n = 2, size = 1000, prob = 0.5)\n\ncoin_flips_2\n\n[1] 480 476\n\n\n\nA coin was flipped 1000 times and returned 480 heads, and then another fair coin was flipped 1000 times and returned 476 heads.\n\n\n\nAs with the rnorm and runif functions, we can check to see what the distribution of the binomial function looks like if we repeat this process."
},
{
"objectID": "materials/slides/w5.html#sampling-from-a-binomial-distribution-3",
"href": "materials/slides/w5.html#sampling-from-a-binomial-distribution-3",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a binomial distribution",
"text": "Sampling from a binomial distribution\n\nSuppose that we want to see the distribution of the number of times heads comes up after 1000 flips. We can simulate the process of flipping 1000 times in a row with 10000 different coins:\n\n\ncoin_flips_10000 <- rbinom(n = 10000, size = 1000, prob = 0.5)"
},
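{
"objectID": "materials/slides/w5.html#sampling-from-a-binomial-distribution-4",
"href": "materials/slides/w5.html#sampling-from-a-binomial-distribution-4",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a binomial distribution",
"text": "Sampling from a binomial distribution\nA minimal sketch of how the distribution of coin_flips_10000 could be visualised (the plot styling is our assumption):\n\ncoin_flips_10000 <- rbinom(n = 10000, size = 1000, prob = 0.5)\n\nhist(coin_flips_10000, main = \"\", xlab = \"Heads in 1000 flips\", col = \"grey\")\n\nThe histogram should be roughly symmetric around 500, the expected number of heads for a fair coin."
},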
{
"objectID": "materials/slides/w5.html#random-sampling-using-sample",
"href": "materials/slides/w5.html#random-sampling-using-sample",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Random sampling using sample\n",
"text": "Random sampling using sample\n\n\nSometimes it is useful to sample a set of values from a vector or list. The R function sample is very flexible for sampling a subset of numbers or elements from some structure (x) in R according to some set probabilities (prob).\nElements can be sampled from x some number of times (size) with or without replacement (replace), though an error will be returned if the size of the sample is larger than x but replace = FALSE (default).\nSuppose we want to ask R to pick a random number from one to ten with equal probability:\n\n\nrand_number_1 <- sample(x = 1:10, size = 1)\n\nprint(rand_number_1)\n\n[1] 2"
},
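{
"objectID": "materials/slides/w5.html#random-sampling-using-sample-3",
"href": "materials/slides/w5.html#random-sampling-using-sample-3",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Random sampling using sample\n",
"text": "Random sampling using sample\n\nA quick sketch of the error mentioned above, with the failing call wrapped in try() so that a script containing it keeps running:\n\n# Requesting more values than x contains, with the default replace = FALSE, fails:\ntry(sample(x = 1:10, size = 11))\n\n# Allowing replacement removes the problem:\nsample(x = 1:10, size = 11, replace = TRUE)"
},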
{
"objectID": "materials/slides/w5.html#random-sampling-using-sample-1",
"href": "materials/slides/w5.html#random-sampling-using-sample-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Random sampling using sample\n",
"text": "Random sampling using sample\n\n\nWe can increase the size of the sample to 10:\n\n\nrand_number_10 <- sample(x = 1:10, size = 10)\nprint(rand_number_10)\n\n [1] 5 3 1 7 6 9 2 8 4 10\n\n\n\nNote that all numbers from 1 to 10 have been sampled, but in a random order. This is because the default is to sample without replacement, meaning that once a number has been sampled for the first element in rand_number_10, it is no longer available to be sampled again.\n\n\n\nWe can change this and allow for sampling with replacement:\n\n\nrand_number_10_r <- sample(x = 1:10, size = 10, replace = TRUE)\n\nprint(rand_number_10_r)\n\n [1] 4 10 1 9 3 4 10 9 6 10\n\n\n\nNote that the numbers {4, 9, 10} are now repeated in the set of randomly sampled values above."
},
{
"objectID": "materials/slides/w5.html#random-sampling-using-sample-2",
"href": "materials/slides/w5.html#random-sampling-using-sample-2",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Random sampling using sample\n",
"text": "Random sampling using sample\n\n\nSo far, because we have not specified a probability vector prob, the function assumes that every element in 1:10 is sampled with equal probability\nHere’s an example in which the numbers 1-5 are sampled with a probability of 0.05, while the numbers 6-10 are sampled with a probability of 0.15, thereby biasing sampling toward larger numbers; we always need to ensure that these probabilities need to sum to 1.\n\n\nprob_vec <- c( rep(x = 0.05, times = 5), rep(x = 0.15, times = 5))\n\nrand_num_bias <- sample(x = 1:10, size = 10, replace = TRUE, prob = prob_vec)\n\nprint(rand_num_bias)\n\n [1] 1 2 6 3 9 9 8 8 6 10"
},
{
"objectID": "materials/slides/w5.html#sampling-random-characters-from-a-list",
"href": "materials/slides/w5.html#sampling-random-characters-from-a-list",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling random characters from a list",
"text": "Sampling random characters from a list\n\nWe can also sample characters from a list of elements; it is no different than sampling numbers\nFor example, if we want to create a simulated data set that includes three different species of some plant or animal, we could create a vector of species identities from which to sample:\n\n\nspecies <- c(\"species_A\", \"species_B\", \"species_C\");\n\n\nWe can then sample from these three possible categories. For example:\n\n\nsp_sample <- sample(x = species, size = 24, replace = TRUE, \n prob = c(0.5, 0.25, 0.25))\n\n\nWhat did the code above do?\n\n\n\n\n [1] \"species_B\" \"species_C\" \"species_A\" \"species_A\" \"species_B\" \"species_A\"\n [7] \"species_A\" \"species_C\" \"species_C\" \"species_B\" \"species_C\" \"species_A\"\n[13] \"species_C\" \"species_A\" \"species_A\" \"species_A\" \"species_A\" \"species_C\"\n[19] \"species_B\" \"species_A\" \"species_C\" \"species_B\" \"species_A\" \"species_B\""
},
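{
"objectID": "materials/slides/w5.html#sampling-random-characters-from-a-list-1",
"href": "materials/slides/w5.html#sampling-random-characters-from-a-list-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling random characters from a list",
"text": "Sampling random characters from a list\nA small check of what the sampling above produced, assuming the sp_sample object from the previous code; table() counts each category, and dividing by the sample size gives proportions that should sit near the prob values of 0.5, 0.25 and 0.25:\n\ntable(sp_sample)  # counts of each species\n\ntable(sp_sample) / length(sp_sample)  # observed proportions"
},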
{
"objectID": "materials/slides/w5.html#simulating-data-with-known-correlations",
"href": "materials/slides/w5.html#simulating-data-with-known-correlations",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Simulating data with known correlations",
"text": "Simulating data with known correlations\n\nWe can generate variables \\(X_{1}\\) and \\(X_{2}\\) that have known correlations \\(\\rho\\) with with one another.\nFor example: two standard normal random variables with a sample size of 10000, and with correlation between them of 0.3:\n\n\nN <- 10000\nrho <- 0.3\nx1 <- rnorm(n = N, mean = 0, sd = 1)\nx2 <- (rho * x1) + sqrt(1 - rho*rho) * rnorm(n = N, mean = 0, sd = 1)\n\n\nThese variables are generated by first simulating the sample \\(x_{1}\\) (x1 above) from a standard normal distribution. Then, \\(x_{2}\\) (x2 above) is calculated as\n\n\\(x_{2} = \\rho x_{1} + \\sqrt{1 - \\rho^{2}}x_{rand}\\),\nwhere \\(x_{rand}\\) is a sample from a normal distribution with the same variance as \\(x_{1}\\)."
},
{
"objectID": "materials/slides/w5.html#simulating-data-with-known-correlations-1",
"href": "materials/slides/w5.html#simulating-data-with-known-correlations-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Simulating data with known correlations",
"text": "Simulating data with known correlations\n\nWe can generate variables \\(X_{1}\\) and \\(X_{2}\\) that have known correlations \\(\\rho\\) with with one another.\nFor example: two standard normal random variables with a sample size of 10000, and with correlation between them of 0.3:\n\n\nN <- 10000\nrho <- 0.3\nx1 <- rnorm(n = N, mean = 0, sd = 1)\nx2 <- (rho * x1) + sqrt(1 - rho*rho) * rnorm(n = N, mean = 0, sd = 1)\n\n\nDoes the correlation equal rho (with some sampling error)?\n\n\ncor(x1, x2)\n\n[1] 0.2952028"
},
{
"objectID": "materials/slides/w5.html#simulating-data-with-known-correlations-2",
"href": "materials/slides/w5.html#simulating-data-with-known-correlations-2",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Simulating data with known correlations",
"text": "Simulating data with known correlations\n\nThere is a more efficient way to generate any number of variables with different variances and correlations to one another.\nWe need to use the MASS library, which can be installed and loaded as below:\n\n\ninstall.packages(\"MASS\")\nlibrary(\"MASS\")\n\n\n\n\n\nIn the MASS library, the function mvrnorm can be used to generate any number of variables for a pre-specified covariance structure."
},
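{
"objectID": "materials/slides/w5.html#simulating-data-with-known-correlations-3",
"href": "materials/slides/w5.html#simulating-data-with-known-correlations-3",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Simulating data with known correlations",
"text": "Simulating data with known correlations\nA minimal sketch of mvrnorm reproducing the earlier two-variable example; the covariance matrix values are our choice, matching \\(\\rho = 0.3\\) with unit variances:\n\nlibrary(MASS)\n\nN <- 10000\nSigma <- matrix(c(1.0, 0.3,\n                  0.3, 1.0), nrow = 2)  # unit variances, covariance 0.3\nxs <- mvrnorm(n = N, mu = c(0, 0), Sigma = Sigma)\n\ncor(xs[, 1], xs[, 2])  # should be close to 0.3, give or take sampling error"
},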
{
"objectID": "materials/slides/w5.html#statistical-power",
"href": "materials/slides/w5.html#statistical-power",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Statistical power",
"text": "Statistical power\n\n\nStatistical power is defined as the probability, before a study is performed, that a particular comparison will achieve “statistical significance” at some predetermined level (typically a p-value below 0.05), given some assumed true effect size\nIf a certain effect of interest exists (e.g. a difference between two groups) power is the chance that we actually find the effect in a given study\nA power analysis is performed by first hypothesizing an effect size, then making some assumptions about the variation in the data and the sample size of the study to be conducted, and finally using probability calculations to determine the chance of the p-value being below the threshold\nThe conventional view is that you should avoid low-power studies because they are unlikely to succeed\nThere are several problems with this view, but it’s often required by research funding bodies"
},
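{
"objectID": "materials/slides/w5.html#statistical-power-1",
"href": "materials/slides/w5.html#statistical-power-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Statistical power",
"text": "Statistical power\nA minimal sketch of the simulation logic described above; the effect size (a 5-point difference), standard deviation (15), per-group sample size (50) and the use of a t-test are all illustrative assumptions, not part of the study design discussed next:\n\nset.seed(1)\nn_sims <- 1000\np_vals <- replicate(n_sims, t.test(rnorm(50, mean = 50, sd = 15), rnorm(50, mean = 45, sd = 15))$p.value)\n\nmean(p_vals < 0.05)  # estimated power: the share of simulated studies reaching significance"
},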
{
"objectID": "materials/slides/w5.html#example-simulating-a-regression-design",
"href": "materials/slides/w5.html#example-simulating-a-regression-design",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Example: simulating a regression design",
"text": "Example: simulating a regression design\n\nWe can use simulation to test rather complex study designs\nImagine you are interested in students attitude towards smoking and how it depends on the medium of the message and the focus of the message\nWe want to know whether people’s attitude is different after seeing a visual anti-smoking message (these pictures on the package) vs a text-message (the text belonging to that picture)\nWe are interested in whether the attitude that people report is different after seeing a message that regards the consequences on other people (e.g. smoking can harm your loved ones) as compared to yourself (smoking can cause cancer)"
},
{
"objectID": "materials/slides/w5.html#example-simulating-a-regression-design-1",
"href": "materials/slides/w5.html#example-simulating-a-regression-design-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Example: simulating a regression design",
"text": "Example: simulating a regression design\nStudy design:\nDV: attitude towards smoking (0-100) IV1: medium (text vs. visual) IV2: focus (internal vs. external)\nThis is, there are 4 groups:\n\ngroup_TI will receive text-messages that are internal\ngroup_TE will receive text-messages that are external\ngroup_VI will receive visual messages that are internal\ngroup_VE will receive visual messages that are external"
},
{
"objectID": "materials/slides/w5.html#example-simulating-a-regression-design-2",
"href": "materials/slides/w5.html#example-simulating-a-regression-design-2",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Example: simulating a regression design",
"text": "Example: simulating a regression design\n\nassume that we expect that people’s attitude will be more negative after seeing a visual rather than text message if the focus is internal (i.e. the message is about yourself) because it might be difficult to imagine that oneself would get cancer after reading a text but seeing a picture might cause fear regardless\nfor the external focus on the other hand, we expect a more negative attitude after reading a text as compared to seeing a picture, as it might have more impact on attitude to imagine a loved one get hurt than seeing a stranger in a picture suffering from the consequences of second-hand smoking\nwe expect that the internal focus messages will be related to lower attitudes compared to the external focus messages on average but we expect no main-effect of picture vs. text-messages"
},
{
"objectID": "materials/slides/w5.html#example-simulating-a-regression-design-3",
"href": "materials/slides/w5.html#example-simulating-a-regression-design-3",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Example: simulating a regression design",
"text": "Example: simulating a regression design\n\nvisualize some rough means that show the desired behavior that we described in words earlier and see where we are going\nwe could make the overall mean of the internal focus groups (group_TI and group_VI) 20 and the mean of the external groups (group_TE and group_VE) 50 (this would already reflect the main-effect but also a belief that the smoking-attitudes are on average quite negative as we assume both means to be on the low end of the scale)\nassume that the mean of group_TI is 30 while the mean of group_VI is 10 and we could assume that the mean of group_TE is 40 and the mean of group_VE is 60"
},
{
"objectID": "materials/slides/w5.html#section-2",
"href": "materials/slides/w5.html#section-2",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "",
"text": "focus <- rep(c(\"internal\", \"external\"), each = 2)\nmedia <- rep(c(\"text\", \"visual\"), times = 2)\nmean_TI <- 50\nmean_VI <- 20\nmean_TE <- 30\nmean_VE <- 60\n\npd <- data.frame(score = c(mean_TI, mean_VI, mean_TE, mean_VE), focus = focus, media = media)\n\ninteraction.plot(pd$focus, pd$media, pd$score, ylim = c(0,100))"
},
{
"objectID": "materials/slides/w5.html#section-3",
"href": "materials/slides/w5.html#section-3",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "",
"text": "focus <- rep(c(\"internal\", \"external\"), each = 2)\nmedia <- rep(c(\"text\", \"visual\"), times = 2)\nmean_TI <- 43\nmean_VI <- 40\nmean_TE <- 45\nmean_VE <- 47\n\npd <- data.frame(score = c(mean_TI, mean_VI, mean_TE, mean_VE), focus = focus, media = media)\n\ninteraction.plot(pd$focus, pd$media, pd$score, ylim = c(0,100))"
},
{
"objectID": "materials/slides/w5.html#example-simulating-a-regression-design-4",
"href": "materials/slides/w5.html#example-simulating-a-regression-design-4",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Example: simulating a regression design",
"text": "Example: simulating a regression design\n\nin the new example there is a difference between the two media groups on average but it is only .50 points, so arguably it is small enough to represent the assumption of “no” effect, as in real-life “no” effect in terms of a difference being actually 0 is rather rare\ncome up with some reasonable standard-deviation; if we start at 50 and we want most people to be < 80, we can set the 2-SD bound at 80 to get a standard-deviation of 15 (80-50)/2.\nlet’s assume that each of our groups has a standard-deviation of 15 points.\n\ngroup_TI = normal(n, 43, 15)\ngroup_VI = normal(n, 40, 15)\ngroup_TE = normal(n, 45, 15)\ngroup_VE = normal(n, 47, 15)"
},
{
"objectID": "materials/slides/w5.html#example-simulating-a-regression-design-5",
"href": "materials/slides/w5.html#example-simulating-a-regression-design-5",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Example: simulating a regression design",
"text": "Example: simulating a regression design\n\nn <- 1e5\ngroup_TI <- rnorm(n, 43, 15)\ngroup_VI <- rnorm(n, 40, 15)\ngroup_TE <- rnorm(n, 45, 15)\ngroup_VE <- rnorm(n, 47, 15)\n\nparticipant <- c(1:(n*4))\nfocus <- rep(c(\"internal\", \"external\"), each = n*2)\nmedia <- rep(c(\"text\", \"visual\"), each = n, times = 2)\n\ndata <- data.frame(participant = participant, focus = focus, media = media, score = c(group_TI, group_VI, group_TE, group_VE))\n\nsummary(data)\n\n participant focus media score \n Min. :1e+00 Length:400000 Length:400000 Min. :-36.68 \n 1st Qu.:1e+05 Class :character Class :character 1st Qu.: 33.48 \n Median :2e+05 Mode :character Mode :character Median : 43.73 \n Mean :2e+05 Mean : 43.74 \n 3rd Qu.:3e+05 3rd Qu.: 54.00 \n Max. :4e+05 Max. :117.63"
},
{
"objectID": "materials/slides/w5.html#ready-for-power-analysis",
"href": "materials/slides/w5.html#ready-for-power-analysis",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Ready for power-analysis",
"text": "Ready for power-analysis\n\nSome additional assumptions: suppose we have enought funding for a sizeable data collection and the aim is to ensure that we do not draw unwarranted conclusions from the research\nWe should then set the alpha-level at a more conservative value (\\(\\alpha = .001\\)); with this, we expect to draw non-realistic conclusions in the interaction effect in only about 1 in every 1,000 experiments\nWe also want to be sure that we do detect an existing effect and keep our power high at 95%; with this, we expect that if there is an interaction effect, we would detect it in 19 out of 20 cases (only miss it in 1 out of 20, or 5%)\nRunning the power-simulation can be very memory-demanding and the code can run a very long time to complete; it’s advised to start from various “low-resolution” sample-sizes (e.g. n = 10, n = 100, n = 200, etc.) to get a rough idea of where we can expect our loop to end. Then, the search can be made more specific in order to identify a more precise sample size."
},
{
"objectID": "materials/slides/w5.html#ready-for-power-analysis-1",
"href": "materials/slides/w5.html#ready-for-power-analysis-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Ready for power-analysis",
"text": "Ready for power-analysis\n\nset.seed(1)\nn_sims <- 1000 # we want 1000 simulations\np_vals <- c()\npower_at_n <- c(0) # this vector will contain the power for each sample-size (it needs the initial 0 for the while-loop to work)\nn <- 100 # sample-size and start at 100 as we can be pretty sure this will not suffice for such a small effect\nn_increase <- 100 # by which stepsize should n be increased\ni <- 2\n\npower_crit <- .95\nalpha <- .001\n\nwhile(power_at_n[i-1] < power_crit){\n for(sim in 1:n_sims){\n group_TI <- rnorm(n, 43, 15)\n group_VI <- rnorm(n, 40, 15)\n group_TE <- rnorm(n, 45, 15)\n group_VE <- rnorm(n, 47, 15)\n \n participant <- c(1:(n*4))\n focus <- rep(c(\"internal\", \"external\"), each = n*2)\n media <- rep(c(\"text\", \"visual\"), each = n, times = 2)\n \n data <- data.frame(participant = participant, focus = focus, media = media, score = c(group_TI, group_VI, group_TE, group_VE))\n data$media_sum_num <- ifelse(data$media == \"text\", 1, -1) # apply sum-to-zero coding\n data$focus_sum_num <- ifelse(data$focus == \"external\", 1, -1) \n lm_int <- lm(score ~ 1 + focus_sum_num + media_sum_num + focus_sum_num:media_sum_num, data = data) # fit the model with the interaction\n lm_null <- lm(score ~ 1 + focus_sum_num + media_sum_num, data = data) # fit the model without the interaction\n p_vals[sim] <- anova(lm_int, lm_null)$`Pr(>F)`[2] # put the p-values in a list\n }\n print(n)\n power_at_n[i] <- mean(p_vals < alpha) # check power (i.e. proportion of p-values that are smaller than alpha-level of .10)\n names(power_at_n)[i] <- n\n n <- n+n_increase # increase sample-size by 100 for low-resolution testing first\n i <- i+1 # increase index of the while-loop by 1 to save power and cohens d to vector\n}\n\n[1] 100\n[1] 200\n[1] 300\n[1] 400\n[1] 500\n[1] 600\n[1] 700\n[1] 800\n[1] 900\n\npower_at_n <- power_at_n[-1] # delete first 0 from the vector"
},
{
"objectID": "materials/slides/w5.html#example-simulating-a-regression-design-6",
"href": "materials/slides/w5.html#example-simulating-a-regression-design-6",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Example: simulating a regression design",
"text": "Example: simulating a regression design\nWe can plot the results form the power-simulation:\n\n\nAt roughly 900 participants we observe sufficient power"
},
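{
"objectID": "materials/slides/w5.html#example-simulating-a-regression-design-7",
"href": "materials/slides/w5.html#example-simulating-a-regression-design-7",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Example: simulating a regression design",
"text": "Example: simulating a regression design\nOne way such a power curve could be drawn, assuming the power_at_n vector produced by the loop above (its names hold the sample sizes; the styling choices are ours):\n\nplot(x = as.numeric(names(power_at_n)), y = power_at_n, type = \"b\",\n     xlab = \"Sample size per group\", ylab = \"Estimated power\")\nabline(h = .95, lty = 2)  # the 95% power criterion"
},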
{
"objectID": "materials/slides/w6.html#the-uses-of-simulation-methods",
"href": "materials/slides/w6.html#the-uses-of-simulation-methods",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "The uses of simulation methods",
"text": "The uses of simulation methods\n\nSimulating data uses generating random data sets with known properties using code (or some other method). This can be useful in various contexts.\n\n\nTo better understand our models. Probability models mimic variation in the world, and the tools of simulation can help us better understand this variation. Patterns of randomness are contrary to normal human thinking and simulation helps in training our intuitions about averages and variation\n\nTo run statistical analyses (e.g., simulating a null distribution against which to compare a sample)\n\nTo approximate the sampling distribution of data and propagate this to the sampling distribution of statistical estimates and procedures"
},
{
"objectID": "materials/slides/w6.html#distribution-functions",
"href": "materials/slides/w6.html#distribution-functions",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Distribution functions",
"text": "Distribution functions\n\nBase R functions:\n\nrnorm(): sampling from a normal distribution\nrunif(): sampling from a uniform distribution\nrbinom(): sampling from a binomial distribution\nrpois(): sampling from a Poisson distribution\n\n(Other distributions are also available)\n\nsample(): sampling elements from an R object with or without replacement\nreplicate(): often plays a role in conjunction with sampling functions; it is used to evaluate an expression N number of times repeatedly\n\nFrom non-base packages:\n\n\nMASS::mvtnorm(): multivariate normal; sampling multiple variables with a known correlation structure (i.e., we can tell R how variables should be correlated with one another) and normally distributed errors"
},
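{
"objectID": "materials/slides/w6.html#distribution-functions-1",
"href": "materials/slides/w6.html#distribution-functions-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Distribution functions",
"text": "Distribution functions\nA small sketch of replicate in combination with a sampling function: each evaluation of the expression draws a fresh sample, so we get five independent sample means (the sample size of 100 is arbitrary):\n\nsample_means <- replicate(n = 5, expr = mean(rnorm(n = 100)))\n\nprint(sample_means)"
},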
{
"objectID": "materials/slides/w6.html#sampling-from-a-uniform-distribution",
"href": "materials/slides/w6.html#sampling-from-a-uniform-distribution",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a uniform distribution",
"text": "Sampling from a uniform distribution\nThe runif function returns some number (n) of random numbers from a uniform distribution with a range from \\(a\\) (min) to \\(b\\) (max) such that \\(X\\sim\\mathcal U(a,b)\\) (verbally, \\(X\\) is sampled from a uniform distribution with the parameters \\(a\\) and \\(b\\)), where \\(-\\infty < a < b < \\infty\\) (verbally, \\(a\\) is greater than negative infinity but less than \\(b\\), and \\(b\\) is finite). The default is to draw from a standard uniform distribution (i.e., \\(a = 0\\) and \\(b = 1\\)):\n\n\n# Sample a vector of ten numbers and store the results in the object `rand_unifs`\n# Note that the numbers will be different each time we re-run the `runif` function above.\n# If we want to recreate the same sample, we should set a `seed` number first\n\nrand_unifs <- runif(n = 10000, min = 0, max = 1);\n\n\n\nThe first 40 numbers from the sample are:\n\n\n [1] 0.99434864 0.96431541 0.30580586 0.33276507 0.84967627 0.81678374\n [7] 0.51459419 0.10484424 0.91966070 0.26868353 0.83214199 0.87764814\n[13] 0.34502670 0.10482143 0.62726192 0.78069416 0.28723441 0.99650014\n[19] 0.06301852 0.52260594 0.45916681 0.03622946 0.72827098 0.40192253\n[25] 0.77440006 0.15546010 0.35228083 0.07063814 0.56907472 0.29733538\n[31] 0.68845754 0.43638929 0.45369228 0.62800198 0.35717584 0.48973529\n[37] 0.68858431 0.76422131 0.94052166 0.86891697 0.10667781 0.67989207\n[43] 0.41068036 0.21645607 0.21990561 0.29829047 0.48076992 0.92049340\n[49] 0.55169980 0.02008806"
},
{
"objectID": "materials/slides/w6.html#sampling-from-a-uniform-distribution-1",
"href": "materials/slides/w6.html#sampling-from-a-uniform-distribution-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a uniform distribution",
"text": "Sampling from a uniform distribution\nTo visualise the entire sample, we can plot it on a histogram:"
},
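{
"objectID": "materials/slides/w6.html#sampling-from-a-uniform-distribution-2",
"href": "materials/slides/w6.html#sampling-from-a-uniform-distribution-2",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a uniform distribution",
"text": "Sampling from a uniform distribution\nA minimal sketch of the histogram call for this slide, assuming the rand_unifs object sampled earlier (the labelling arguments are our assumption):\n\nhist(rand_unifs, main = \"\", xlab = \"Random value (X)\", col = \"grey\")\n\nWith 10,000 draws, the bars should be roughly flat across the 0-1 range."
},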
{
"objectID": "materials/slides/w6.html#sampling-from-a-normal-distribution",
"href": "materials/slides/w6.html#sampling-from-a-normal-distribution",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a normal distribution",
"text": "Sampling from a normal distribution\nThe rnorm function returns some number (n) of randomly generated values given a set mean (\\(\\mu\\); mean) and standard deviation (\\(\\sigma\\); sd), such that \\(X\\sim\\mathcal N(\\mu,\\sigma^2)\\). The default is to draw from a standard normal (a.k.a., “Gaussian”) distribution (i.e., \\(\\mu = 0\\) and \\(\\sigma = 1\\)):\n\n\nrand_norms_10000 <- rnorm(n = 10000, mean = 0, sd = 1)\n\nprint(rand_norms_10000[1:20])\n\n\n\n [1] -1.4294749 0.7034161 0.2124047 -0.7159934 1.9414967 2.2186264\n [7] 0.9284274 0.5083624 0.4948380 -0.9719948 -0.4409321 1.5844225\n[13] -0.3116727 1.3008722 -2.1558888 -0.5306407 -0.5345091 1.5416112\n[19] 1.5387759 -1.0633193"
},
{
"objectID": "materials/slides/w6.html#sampling-from-a-normal-distribution-1",
"href": "materials/slides/w6.html#sampling-from-a-normal-distribution-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a normal distribution",
"text": "Sampling from a normal distribution\n\nHistograms allow us to check how samples from the same distribution might vary.\nExercise: Compare the above distribution with a normal distribution that had a standard deviation of 2 instead of 1.\nSample 10,000 new values in rnorm with sd = 2 instead of sd = 1 and create a new histogram with hist.\nTo see what the distribution of sampled data might look like given a low sample size (e.g., 10), repeat the process of sampling from rnorm(n = 10, mean = 0, sd = 1) multiple times and look at the shape of the resulting histogram."
},
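{
"objectID": "materials/slides/w6.html#sampling-from-a-normal-distribution-2",
"href": "materials/slides/w6.html#sampling-from-a-normal-distribution-2",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a normal distribution",
"text": "Sampling from a normal distribution\nA sketch of the exercise above; the object name and plot styling are our assumptions:\n\nrand_norms_sd2 <- rnorm(n = 10000, mean = 0, sd = 2)\n\nhist(rand_norms_sd2, main = \"\", xlab = \"Random value (X)\")\n\n# Low sample size: re-run this line several times and watch the shape change\nhist(rnorm(n = 10, mean = 0, sd = 1))"
},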
{
"objectID": "materials/slides/w6.html#sampling-from-a-poisson-distribution",
"href": "materials/slides/w6.html#sampling-from-a-poisson-distribution",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a Poisson distribution",
"text": "Sampling from a Poisson distribution\nA Poisson process describes events happening with some given probability over an area of time or space such that \\(X\\sim Poisson(\\lambda)\\), where the rate parameter \\(\\lambda\\) is both the mean and variance of the Poisson distribution (note that by definition, \\(\\lambda > 0\\), and although \\(\\lambda\\) can be any positive real number, data are always integers, as with count data).\n\nSampling from a Poisson distribution can be done in R with rpois, which takes only two arguments specifying the number of values to be returned (n) and the rate parameter (lambda). There are no default values for rpois.\n\n\nrand_poissons <- rpois(n = 10, lambda = 1.5)\n\nprint(rand_poissons)\n\n\n\n [1] 1 0 2 1 0 2 1 0 0 1"
},
{
"objectID": "materials/slides/w6.html#sampling-from-a-poisson-distribution-1",
"href": "materials/slides/w6.html#sampling-from-a-poisson-distribution-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a Poisson distribution",
"text": "Sampling from a Poisson distribution\nA histogram of a large number of values to see the distribution when \\(\\lambda = 4.5\\):\n\nrand_poissons_10000 <- rpois(n = 10000, lambda = 4.5)"
},
{
"objectID": "materials/slides/w6.html#sampling-from-a-binomial-distribution",
"href": "materials/slides/w6.html#sampling-from-a-binomial-distribution",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a binomial distribution",
"text": "Sampling from a binomial distribution\n\nA binomial distribution describes the number of ‘successes’ for some number of independent trials (\\(\\Pr(success) = p\\)).\nThe rbinom function returns the number of successes after size trials, in which the probability of success in each trial is prob.\nSampling from a binomial distribution in R with rbinom is a bit more complex than using runif, rnorm, or rpois.\nLike those previous functions, the rbinom function returns some number (n) of random numbers, but the arguments and output can be slightly confusing at first."
},
{
"objectID": "materials/slides/w6.html#sampling-from-a-binomial-distribution-1",
"href": "materials/slides/w6.html#sampling-from-a-binomial-distribution-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a binomial distribution",
"text": "Sampling from a binomial distribution\n\nFor example, suppose we want to simulate the flipping of a fair coin 1000 times, and we want to know how many times that coin comes up heads (‘success’). We can do this with the following code:\n\n\ncoin_flips <- rbinom(n = 1, size = 1000, prob = 0.5)\n\ncoin_flips\n\n[1] 543\n\n\n\n\nThe above result shows that the coin came up heads 543 times. But note the (required) argument n. This allows us to set the number of sequences to run.\nIf we instead set n = 2, then this could simulate the flipping of a fair coin 1000 times once to see how many times heads comes up, then repeating the whole process a second time to see how many times heads comes up again (or, if it is more intuitive, the flipping of two separate fair coins 1000 times at the same time)."
},
{
"objectID": "materials/slides/w6.html#sampling-from-a-binomial-distribution-2",
"href": "materials/slides/w6.html#sampling-from-a-binomial-distribution-2",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a binomial distribution",
"text": "Sampling from a binomial distribution\n\ncoin_flips_2 <- rbinom(n = 2, size = 1000, prob = 0.5)\n\ncoin_flips_2\n\n[1] 466 501\n\n\n\nA coin was flipped 1000 times and returned 466 heads, and then another fair coin was flipped 1000 times and returned 501 heads.\n\n\n\nAs with the rnorm and runif functions, we can check to see what the distribution of the binomial function looks like if we repeat this process."
},
{
"objectID": "materials/slides/w6.html#sampling-from-a-binomial-distribution-3",
"href": "materials/slides/w6.html#sampling-from-a-binomial-distribution-3",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling from a binomial distribution",
"text": "Sampling from a binomial distribution\n\nSuppose that we want to see the distribution of the number of times heads comes up after 1000 flips. We can simulate the process of flipping 1000 times in a row with 10000 different coins:\n\n\ncoin_flips_10000 <- rbinom(n = 10000, size = 1000, prob = 0.5)"
},
{
"objectID": "materials/slides/w6.html#random-sampling-using-sample",
"href": "materials/slides/w6.html#random-sampling-using-sample",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Random sampling using sample\n",
"text": "Random sampling using sample\n\n\nSometimes it is useful to sample a set of values from a vector or list. The R function sample is very flexible for sampling a subset of numbers or elements from some structure (x) in R according to some set probabilities (prob).\nElements can be sampled from x some number of times (size) with or without replacement (replace), though an error will be returned if the size of the sample is larger than x but replace = FALSE (default).\nSuppose we want to ask R to pick a random number from one to ten with equal probability:\n\n\nrand_number_1 <- sample(x = 1:10, size = 1)\n\nprint(rand_number_1)\n\n[1] 6"
},
{
"objectID": "materials/slides/w6.html#random-sampling-using-sample-1",
"href": "materials/slides/w6.html#random-sampling-using-sample-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Random sampling using sample\n",
"text": "Random sampling using sample\n\n\nWe can increase the size of the sample to 10:\n\n\nrand_number_10 <- sample(x = 1:10, size = 10)\nprint(rand_number_10)\n\n [1] 5 4 2 7 8 9 6 1 10 3\n\n\n\nNote that all numbers from 1 to 10 have been sampled, but in a random order. This is because the default is to sample without replacement, meaning that once a number has been sampled for the first element in rand_number_10, it is no longer available to be sampled again.\n\n\n\nWe can change this and allow for sampling with replacement:\n\n\nrand_number_10_r <- sample(x = 1:10, size = 10, replace = TRUE)\n\nprint(rand_number_10_r)\n\n [1] 10 8 4 3 7 8 8 1 1 7\n\n\n\nNote that the numbers {1, 7, 8} are now repeated in the set of randomly sampled values above."
},
{
"objectID": "materials/slides/w6.html#random-sampling-using-sample-2",
"href": "materials/slides/w6.html#random-sampling-using-sample-2",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Random sampling using sample\n",
"text": "Random sampling using sample\n\n\nSo far, because we have not specified a probability vector prob, the function assumes that every element in 1:10 is sampled with equal probability\nHere’s an example in which the numbers 1-5 are sampled with a probability of 0.05, while the numbers 6-10 are sampled with a probability of 0.15, thereby biasing sampling toward larger numbers; we always need to ensure that these probabilities need to sum to 1.\n\n\nprob_vec <- c( rep(x = 0.05, times = 5), rep(x = 0.15, times = 5))\n\nrand_num_bias <- sample(x = 1:10, size = 10, replace = TRUE, prob = prob_vec)\n\nprint(rand_num_bias)\n\n [1] 7 5 4 6 4 10 8 7 8 9"
},
{
"objectID": "materials/slides/w6.html#sampling-random-characters-from-a-list",
"href": "materials/slides/w6.html#sampling-random-characters-from-a-list",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Sampling random characters from a list",
"text": "Sampling random characters from a list\n\nWe can also sample characters from a list of elements; it is no different than sampling numbers\nFor example, if we want to create a simulated data set that includes three different species of some plant or animal, we could create a vector of species identities from which to sample:\n\n\nspecies <- c(\"species_A\", \"species_B\", \"species_C\");\n\n\nWe can then sample from these three possible categories. For example:\n\n\nsp_sample <- sample(x = species, size = 24, replace = TRUE, \n prob = c(0.5, 0.25, 0.25))\n\n\nWhat did the code above do?\n\n\n\n\n [1] \"species_B\" \"species_C\" \"species_B\" \"species_B\" \"species_C\" \"species_B\"\n [7] \"species_A\" \"species_B\" \"species_C\" \"species_A\" \"species_B\" \"species_A\"\n[13] \"species_B\" \"species_A\" \"species_A\" \"species_B\" \"species_A\" \"species_A\"\n[19] \"species_A\" \"species_B\" \"species_A\" \"species_C\" \"species_C\" \"species_A\""
},
{
"objectID": "materials/slides/w6.html#simulating-data-with-known-correlations",
"href": "materials/slides/w6.html#simulating-data-with-known-correlations",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Simulating data with known correlations",
"text": "Simulating data with known correlations\n\nWe can generate variables \\(X_{1}\\) and \\(X_{2}\\) that have known correlations \\(\\rho\\) with with one another.\nFor example: two standard normal random variables with a sample size of 10000, and with correlation between them of 0.3:\n\n\nN <- 10000\nrho <- 0.3\nx1 <- rnorm(n = N, mean = 0, sd = 1)\nx2 <- (rho * x1) + sqrt(1 - rho*rho) * rnorm(n = N, mean = 0, sd = 1)\n\n\nThese variables are generated by first simulating the sample \\(x_{1}\\) (x1 above) from a standard normal distribution. Then, \\(x_{2}\\) (x2 above) is calculated as\n\n\\(x_{2} = \\rho x_{1} + \\sqrt{1 - \\rho^{2}}x_{rand}\\),\nwhere \\(x_{rand}\\) is a sample from a normal distribution with the same variance as \\(x_{1}\\)."
},
{
"objectID": "materials/slides/w6.html#simulating-data-with-known-correlations-1",
"href": "materials/slides/w6.html#simulating-data-with-known-correlations-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Simulating data with known correlations",
"text": "Simulating data with known correlations\n\nWe can generate variables \\(X_{1}\\) and \\(X_{2}\\) that have known correlations \\(\\rho\\) with with one another.\nFor example: two standard normal random variables with a sample size of 10000, and with correlation between them of 0.3:\n\n\nN <- 10000\nrho <- 0.3\nx1 <- rnorm(n = N, mean = 0, sd = 1)\nx2 <- (rho * x1) + sqrt(1 - rho*rho) * rnorm(n = N, mean = 0, sd = 1)\n\n\nDoes the correlation equal rho (with some sampling error)?\n\n\ncor(x1, x2)\n\n[1] 0.3126728"
},
{
"objectID": "materials/slides/w6.html#simulating-data-with-known-correlations-2",
"href": "materials/slides/w6.html#simulating-data-with-known-correlations-2",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Simulating data with known correlations",
"text": "Simulating data with known correlations\n\nThere is a more efficient way to generate any number of variables with different variances and correlations to one another.\nWe need to use the MASS library, which can be installed and loaded as below:\n\n\ninstall.packages(\"MASS\")\nlibrary(\"MASS\")\n\n\n\n\n\nIn the MASS library, the function mvrnorm can be used to generate any number of variables for a pre-specified covariance structure."
},
{
"objectID": "materials/slides/w6.html#statistical-power",
"href": "materials/slides/w6.html#statistical-power",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Statistical power",
"text": "Statistical power\n\n\nStatistical power is defined as the probability, before a study is performed, that a particular comparison will achieve “statistical significance” at some predetermined level (typically a p-value below 0.05), given some assumed true effect size\nIf a certain effect of interest exists (e.g. a difference between two groups) power is the chance that we actually find the effect in a given study\nA power analysis is performed by first hypothesizing an effect size, then making some assumptions about the variation in the data and the sample size of the study to be conducted, and finally using probability calculations to determine the chance of the p-value being below the threshold\nThe conventional view is that you should avoid low-power studies because they are unlikely to succeed\nThere are several problems with this view, but it’s often required by research funding bodies"
},
{
"objectID": "materials/slides/w6.html#example-simulating-a-regression-design",
"href": "materials/slides/w6.html#example-simulating-a-regression-design",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Example: simulating a regression design",
"text": "Example: simulating a regression design\n\nWe can use simulation to test rather complex study designs\nImagine you are interested in students attitude towards smoking and how it depends on the medium of the message and the focus of the message\nWe want to know whether people’s attitude is different after seeing a visual anti-smoking message (these pictures on the package) vs a text-message (the text belonging to that picture)\nWe are interested in whether the attitude that people report is different after seeing a message that regards the consequences on other people (e.g. smoking can harm your loved ones) as compared to yourself (smoking can cause cancer)"
},
{
"objectID": "materials/slides/w6.html#example-simulating-a-regression-design-1",
"href": "materials/slides/w6.html#example-simulating-a-regression-design-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Example: simulating a regression design",
"text": "Example: simulating a regression design\nStudy design:\nDV: attitude towards smoking (0-100) IV1: medium (text vs. visual) IV2: focus (internal vs. external)\nThis is, there are 4 groups:\n\ngroup_TI will receive text-messages that are internal\ngroup_TE will receive text-messages that are external\ngroup_VI will receive visual messages that are internal\ngroup_VE will receive visual messages that are external"
},
{
"objectID": "materials/slides/w6.html#example-simulating-a-regression-design-2",
"href": "materials/slides/w6.html#example-simulating-a-regression-design-2",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Example: simulating a regression design",
"text": "Example: simulating a regression design\n\nassume that we expect that people’s attitude will be more negative after seeing a visual rather than text message if the focus is internal (i.e. the message is about yourself) because it might be difficult to imagine that oneself would get cancer after reading a text but seeing a picture might cause fear regardless\nfor the external focus on the other hand, we expect a more negative attitude after reading a text as compared to seeing a picture, as it might have more impact on attitude to imagine a loved one get hurt than seeing a stranger in a picture suffering from the consequences of second-hand smoking\nwe expect that the internal focus messages will be related to lower attitudes compared to the external focus messages on average but we expect no main-effect of picture vs. text-messages"
},
{
"objectID": "materials/slides/w6.html#example-simulating-a-regression-design-3",
"href": "materials/slides/w6.html#example-simulating-a-regression-design-3",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Example: simulating a regression design",
"text": "Example: simulating a regression design\n\nvisualize some rough means that show the desired behavior that we described in words earlier and see where we are going\nwe could make the overall mean of the internal focus groups (group_TI and group_VI) 20 and the mean of the external groups (group_TE and group_VE) 50 (this would already reflect the main-effect but also a belief that the smoking-attitudes are on average quite negative as we assume both means to be on the low end of the scale)\nassume that the mean of group_TI is 30 while the mean of group_VI is 10 and we could assume that the mean of group_TE is 40 and the mean of group_VE is 60"
},
{
"objectID": "materials/slides/w6.html#section-2",
"href": "materials/slides/w6.html#section-2",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "",
"text": "focus <- rep(c(\"internal\", \"external\"), each = 2)\nmedia <- rep(c(\"text\", \"visual\"), times = 2)\nmean_TI <- 50\nmean_VI <- 20\nmean_TE <- 30\nmean_VE <- 60\n\npd <- data.frame(score = c(mean_TI, mean_VI, mean_TE, mean_VE), focus = focus, media = media)\n\ninteraction.plot(pd$focus, pd$media, pd$score, ylim = c(0,100))"
},
{
"objectID": "materials/slides/w6.html#section-3",
"href": "materials/slides/w6.html#section-3",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "",
"text": "focus <- rep(c(\"internal\", \"external\"), each = 2)\nmedia <- rep(c(\"text\", \"visual\"), times = 2)\nmean_TI <- 43\nmean_VI <- 40\nmean_TE <- 45\nmean_VE <- 47\n\npd <- data.frame(score = c(mean_TI, mean_VI, mean_TE, mean_VE), focus = focus, media = media)\n\ninteraction.plot(pd$focus, pd$media, pd$score, ylim = c(0,100))"
},
{
"objectID": "materials/slides/w6.html#example-simulating-a-regression-design-4",
"href": "materials/slides/w6.html#example-simulating-a-regression-design-4",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Example: simulating a regression design",
"text": "Example: simulating a regression design\n\nin the new example there is a difference between the two media groups on average but it is only .50 points, so arguably it is small enough to represent the assumption of “no” effect, as in real-life “no” effect in terms of a difference being actually 0 is rather rare\ncome up with some reasonable standard-deviation; if we start at 50 and we want most people to be < 80, we can set the 2-SD bound at 80 to get a standard-deviation of 15 (80-50)/2.\nlet’s assume that each of our groups has a standard-deviation of 15 points.\n\ngroup_TI = normal(n, 43, 15)\ngroup_VI = normal(n, 40, 15)\ngroup_TE = normal(n, 45, 15)\ngroup_VE = normal(n, 47, 15)"
},
{
"objectID": "materials/slides/w6.html#example-simulating-a-regression-design-5",
"href": "materials/slides/w6.html#example-simulating-a-regression-design-5",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Example: simulating a regression design",
"text": "Example: simulating a regression design\n\nn <- 1e5\ngroup_TI <- rnorm(n, 43, 15)\ngroup_VI <- rnorm(n, 40, 15)\ngroup_TE <- rnorm(n, 45, 15)\ngroup_VE <- rnorm(n, 47, 15)\n\nparticipant <- c(1:(n*4))\nfocus <- rep(c(\"internal\", \"external\"), each = n*2)\nmedia <- rep(c(\"text\", \"visual\"), each = n, times = 2)\n\ndata <- data.frame(participant = participant, focus = focus, media = media, score = c(group_TI, group_VI, group_TE, group_VE))\n\nsummary(data)\n\n participant focus media score \n Min. :1e+00 Length:400000 Length:400000 Min. :-27.34 \n 1st Qu.:1e+05 Class :character Class :character 1st Qu.: 33.48 \n Median :2e+05 Mode :character Mode :character Median : 43.77 \n Mean :2e+05 Mean : 43.75 \n 3rd Qu.:3e+05 3rd Qu.: 54.01 \n Max. :4e+05 Max. :117.23"
},
{
"objectID": "materials/slides/w6.html#ready-for-power-analysis",
"href": "materials/slides/w6.html#ready-for-power-analysis",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Ready for power-analysis",
"text": "Ready for power-analysis\n\nSome additional assumptions: suppose we have enought funding for a sizeable data collection and the aim is to ensure that we do not draw unwarranted conclusions from the research\nWe should then set the alpha-level at a more conservative value (\\(\\alpha = .001\\)); with this, we expect to draw non-realistic conclusions in the interaction effect in only about 1 in every 1,000 experiments\nWe also want to be sure that we do detect an existing effect and keep our power high at 95%; with this, we expect that if there is an interaction effect, we would detect it in 19 out of 20 cases (only miss it in 1 out of 20, or 5%)\nRunning the power-simulation can be very memory-demanding and the code can run a very long time to complete; it’s advised to start from various “low-resolution” sample-sizes (e.g. n = 10, n = 100, n = 200, etc.) to get a rough idea of where we can expect our loop to end. Then, the search can be made more specific in order to identify a more precise sample size."
},
{
"objectID": "materials/slides/w6.html#ready-for-power-analysis-1",
"href": "materials/slides/w6.html#ready-for-power-analysis-1",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Ready for power-analysis",
"text": "Ready for power-analysis\n\nset.seed(1)\nn_sims <- 1000 # we want 1000 simulations\np_vals <- c()\npower_at_n <- c(0) # this vector will contain the power for each sample-size (it needs the initial 0 for the while-loop to work)\nn <- 100 # sample-size and start at 100 as we can be pretty sure this will not suffice for such a small effect\nn_increase <- 100 # by which stepsize should n be increased\ni <- 2\n\npower_crit <- .95\nalpha <- .001\n\nwhile(power_at_n[i-1] < power_crit){\n for(sim in 1:n_sims){\n group_TI <- rnorm(n, 43, 15)\n group_VI <- rnorm(n, 40, 15)\n group_TE <- rnorm(n, 45, 15)\n group_VE <- rnorm(n, 47, 15)\n \n participant <- c(1:(n*4))\n focus <- rep(c(\"internal\", \"external\"), each = n*2)\n media <- rep(c(\"text\", \"visual\"), each = n, times = 2)\n \n data <- data.frame(participant = participant, focus = focus, media = media, score = c(group_TI, group_VI, group_TE, group_VE))\n data$media_sum_num <- ifelse(data$media == \"text\", 1, -1) # apply sum-to-zero coding\n data$focus_sum_num <- ifelse(data$focus == \"external\", 1, -1) \n lm_int <- lm(score ~ 1 + focus_sum_num + media_sum_num + focus_sum_num:media_sum_num, data = data) # fit the model with the interaction\n lm_null <- lm(score ~ 1 + focus_sum_num + media_sum_num, data = data) # fit the model without the interaction\n p_vals[sim] <- anova(lm_int, lm_null)$`Pr(>F)`[2] # put the p-values in a list\n }\n print(n)\n power_at_n[i] <- mean(p_vals < alpha) # check power (i.e. proportion of p-values that are smaller than alpha-level of .10)\n names(power_at_n)[i] <- n\n n <- n+n_increase # increase sample-size by 100 for low-resolution testing first\n i <- i+1 # increase index of the while-loop by 1 to save power and cohens d to vector\n}\n\n[1] 100\n[1] 200\n[1] 300\n[1] 400\n[1] 500\n[1] 600\n[1] 700\n[1] 800\n[1] 900\n\npower_at_n <- power_at_n[-1] # delete first 0 from the vector"
},
{
"objectID": "materials/slides/w6.html#example-simulating-a-regression-design-6",
"href": "materials/slides/w6.html#example-simulating-a-regression-design-6",
"title": "HSS8005 {{< iconify line-md plus >}}",
"section": "Example: simulating a regression design",
"text": "Example: simulating a regression design\nWe can plot the results form the power-simulation:\n\n\nAt roughly 900 participants we observe sufficient power"
},
{
"objectID": "materials/worksheets/index.html",
"href": "materials/worksheets/index.html",
"title": "Worksheets",
"section": "",
"text": "References\n\nDavid, F. N. 1955. “Studies in the History of Probability and Statistics i. Dicing and Gaming (a Note on the History of Probability).” Biometrika 42 (1/2): 1–15. https://doi.org/10.2307/2333419.\n\n\nEl-Shagi, Makram, and Alexander Jung. 2015. “Have Minutes Helped Markets to Predict the MPC’s Monetary Policy Decisions?” European Journal of Political Economy 39 (September): 222–34. https://doi.org/10.1016/j.ejpoleco.2015.05.004.\n\n\nGelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and other stories. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139161879.\n\n\nLord, R. D. 1958. “Studies in the History of Probability and Statistics.: VIII. De Morgan and the Statistical Study of Literary Style.” Biometrika 45 (1/2): 282–82. https://doi.org/10.2307/2333072.\n\n\nMcElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Second. CRC Texts in Statistical Science. Boca Raton: Taylor and Francis, CRC Press.\n\n\nMulvin, Dylan. 2021. Proxies: The Cultural Work of Standing in. Infrastructures Series. Cambridge, Massachusetts: The MIT Press.\n\n\nSenn, Stephen. 2003. “A Conversation with John Nelder.” Statistical Science 18 (1): 118–31. https://doi.org/10.1214/ss/1056397489."
},
{
"objectID": "materials/worksheets/worksheets_w01.html",
"href": "materials/worksheets/worksheets_w01.html",
"title": "Week 1 Computer Lab Worksheet",
"section": "",
"text": "This lab is an introduction to R and RStudio for the purposes of this module. It is expected that those new to R will complete the R for Social Scientists online training course on their own (estimated to take around 5-6 hours), as well as read through the assigned chapters from the R4DS textbook. The aims of this session are more limited than the contents of those resources, while at the same time offering something additional to those already familiar with basic operations in R.\nBy the end of the session, you will:\n\nunderstand how to use the most important panels in the RStudio interface\ncreate an RStudio Project to store your work throughout the course\nbegin using R scripts (.R) and Quarto notebooks (.qmd) to record and document your coding progress\nunderstand data types and basic operations in the R language\nunderstand the principles behind functions\n\nknow how to install, load and use functions from user-written packages\ngain familiarity with some useful functions from packages included in the tidyverse ecosystem\nThese few tasks should be enough to get you started with R and RStudio. If you haven’t yet done so, complete the R for Social Scientists online training too sometime over the next week. From next week we will begin working actively with real data and address specific data management challenges that arise from there.\nThose of you who have worked on the advanced user exercise can check some optional solutions below."
},
{
"objectID": "materials/worksheets/worksheets_w01.html#r-and-rstudio",
"href": "materials/worksheets/worksheets_w01.html#r-and-rstudio",
"title": "Week 1 Computer Lab Worksheet",
"section": "R and RStudio",
"text": "R and RStudio\nIf you are working on university desktops in the IT labs, recent versions of both R and RStudio will already be installed. To install them on your personal computers, follow the steps outlined here based on your operating system.\nAlthough you will likely only interact directly with RStudio, R needs to be installed first. Think of the relationship between the two as that between the engine of a car (R) and the dashboard of a car (RStudio); or, imagine driving this (R) versus this (RStudio).\nYour first task is to take RStudio for a spin and get to know some of its more commonly used panes. The four main panes are:\n\nThe R Console Pane\nThe R Console, by default the left or lower-left pane in R Studio, is the home of the R “engine”. This is where the commands are actually run and non-graphic outputs and error/warning messages appear. The Console is the direct interface to the R software itself; it’s what we get if instead of RStudio we open the R software: a direct interface to the R programming language, where we can type commands and where results/messages are printed.\nYou can directly enter and run commands in the R Console, but realize that these commands are not saved as they are when running commands from a script. For this reason, we should not use the Console pane directly too much. For typing commands that we want R to execute, we should instead use an R script file, where everything we type can be saved for later and complex analyses can be built up.\nThe Source Pane\nThis pane, by default in the upper-left, is a space to work with scripts and other text files. This pane is also where datasets (data frames) open up for viewing.\n\n\n\n\n\n\nNote\n\n\n\nNote\nIf your RStudio displays only one pane on the left, it is because you have no scripts open yet. We can open an existing one or create a new one. We’ ll do that a bit later.\n\n\nThe Environment Pane\nThis pane, by default in the upper-right, is most often used to see brief summaries of “objects” that are available in an active session. Datasets loaded for analysis would appear here\n\n\n\n\n\n\nNote\n\n\n\nNote\nIf your Environment is empty, it means that you don’t have any “objects” loaded or created yet. We will be creating some objects later and we will also import an example dataset.\n\n\nFiles, Plots, Packages, Help, etc. The lower-right pane includes several tabs including plots (display of graphics including maps), help, a file library, and available R packages (including installation/update options).\n\n\n\n\n\n\nTip\n\n\n\nTip\nYou can arrange the panes in various ways, depending on your preferences, using Tools > Global Options in the top menu. So the arrangement of panes may look different on different computers.\n\n\nGeneral settings\nYou can personalise the look and feel of your RStudio setup in various ways using Tools > Global Options from the top menu, but setting some options as default from the very start is highly recommended. You can see these in the pictures below:\n\n\n\n\n\n\n\n\n\n\n\nThe most important setting in the picture on the left is the one to restore .RData at startup and saving the workspace as .RData on exit. Make sure these are un-ticked and set to ‘Never’, respectively, as shown in the picture. It’s always safer to start each RStudio session in a clean state, without anything automatically pre-loaded from a previos session. 
That could lead to serious and hard to trace complications.\nIn the picture on the right, you have the option to select that the new native pipe operator (we’ll talk about it later!) be inserted using the Ctrl+Shift+M keyboard shortcut instead of the older version of the pipe (|>).\n\nThese settings will make more sense later, but it’s a good idea to have them sorted at the very beginning."
},
{
"objectID": "materials/worksheets/worksheets_w01.html#task-1-use-r-as-a-simple-calculator",
"href": "materials/worksheets/worksheets_w01.html#task-1-use-r-as-a-simple-calculator",
"title": "Week 1 Computer Lab Worksheet",
"section": "Task 1: Use R as a simple calculator",
"text": "Task 1: Use R as a simple calculator\nThe most elementary yet still handy task you can use R for is to perform basic arithmetic operations. This is useful for getting a first experience doing things in the R language. Let’s have a look at a few operations using the Console directly. Let’s say we want to know the result of adding up three numbers: 1, 3 and 5. In the Console pane, type the command below and then click Enter:\n\n1 + 3 + 5\n\nThis will print out the result (9) in the Console:\n\n\n[1] 9\n\n\nThe [1] in the result is just the line number; in this case, our result only consists of a single line.\nWe can also save the result of this operation as an object, so we can use it for further operations. We create objects by using the so-called assignment operator consisting of the characters <-. A command involving <- can be read as “assign the value of the result from the operation on the right hand side (some expression) to the object on the left hand side (short name of object, single word, with no spaces)”. For example, let’s save our result in an object called “nine”:\n\nnine <- 1 + 3 + 5\n\nNotice that there is no output printed in the Console this time. But there are also no error messages, so the operation must have run without problems. Instead, if we look at the Environment pane, we notice that it is no longer empty, but contains an object called “nine” that stores the value “9” in it. We can now use this object for other operations, such as:\n\nnine - 3\n\n[1] 6\n\nnine + 15\n\n[1] 24\n\nnine / 3\n\n[1] 3\n\nnine * 9\n\n[1] 81\n\n\nWe see the results of these operations printed out in the Console.\nWe can also check results of so-called relational operations. There are several relational operators that allow us to compare objects in R. The most useful of these are the following:\n\n\n> greater than, >= greater than or equal to\n\n< less than, <= less than or equal to\n\n== equal to\n\n!= not equal to\n\nWhen we use these to compare two objects in R, we end us with a logical object.\nFor example, let’s check whether 9 is greater than 5, and whether it is lower than 8:\n\n9 > 5\n\n[1] TRUE\n\n9 < 8\n\n[1] FALSE\n\n\nR treats our inputs as statements that we are asking it to evaluate, and we get the answers “TRUE” and “FALSE”, respectively, as we would expect. Let’s now check whether our object “nine” is equal to the number 9. We may assume that we can achieve this by typing “nine = 9”, but let’s see what that results in:\n\nnine = 9\n\nDid we get the result we expected? Nothing was printed in the output, so seemingly nothing happened… That’s because the “=” sign is also used as an assignment operator in R, just like “<-”. So we basically assigned the value “9” to the object “nine” again. To use the equal sign as a logical operator we must type it twice (==). Let’s see:\n\nnine == 9\n\n[1] TRUE\n\n\nNow we get the answer “TRUE”, as expected.\nThis distinction between “=” and “==” is important to keep in mind. What would have happened if we had tried to test whether our object “nine” equals value “5” or not, and instead of the logical operator (==) we used the assignment operator (=)? Let’s see:\n\nnine = 5\n\nIn the Console we again see no results printed, but if we check our Environment, we see that the value of the object “nine” was changed to 5. So it can be a dangerous business. We’ll be using the “<-” as assignment operator instead of “=” to avoid any confusion in this respect. 
The distinction between == and = will also emerge in other contexts later.\nSo, try out the following commands in turn now and check if the results are what you’d expect:\n\nnine == 9\n\n[1] FALSE\n\nnine == 5\n\n[1] TRUE\n\nfive <- 9\nnine == five\n\n[1] FALSE\n\nfive = nine\nnine == five\n\n[1] TRUE\n\nnine + five <= 10 # lower than or equal to ...\n\n[1] TRUE\n\n\nThe text following the hashtag (#) in the last line is a comment. If you’d like to comment on any code you write, just add a hash (#) or a series of hashes in front of it so that R knows it should not evaluate it as a command. This will be useful when writing your commands in an R script that you can save for later, rather than interacting with R live in the Console."
},
{
"objectID": "materials/worksheets/worksheets_w01.html#scripts-markdown-documents-and-projects",
"href": "materials/worksheets/worksheets_w01.html#scripts-markdown-documents-and-projects",
"title": "Week 1 Computer Lab Worksheet",
"section": "Scripts, markdown documents and projects",
"text": "Scripts, markdown documents and projects\nBefore learning to do more with R, let’s learn about some further file types and complete our RStudio setup. Writing brief commands that you want to test out in the Console is okay, but what you really want is to save your commands as part of a workflow in a dedicated file that you can reuse, extend and share with others. In every quantitative analysis, we need to ensure that each step in our analysis is traceable and reproducible. This is increasingly a professional standard expected of all data analysts in the social sciences. This means that we need to have an efficient way in which to share our analysis code, as well as our outputs and interpretations of our findings. RStudio has an efficient way of handling this requirement with the use of R script files and versions of the Markdown markup language that allow the efficient combining of plain text (as in the main body of an article) with analysis code and outputs produced in R. The table below lists the main characteristics of these file types:\n\n\nFormat\nExtension\nDescription\n\n\n\nR Script\n.R\nUse an R script if you want to document a large amount of code used for a particular analysis project. Scripts should contain working R commands and human-readable comments explaining the code. Commands can be run line-by-line, or the whole R script can be run at once. For example, one can write an R script containing a few hundred or thousands of lines of code that gathers and prepares raw, unruly data for analysis; if this script can run without any errors, then it can be saved and sourced from within another script that contains code that undertakes the analysis using the cleansed dataset. Comments can be added by appending them with a hashtag (#).\n\n\nR Markdown\n.Rmd\n\nMarkdown is a simple markup language that allows the formatting of plain text documents. R Markdown is a version of this language written by the R Studio team, which also allows for R code to be included. Plain text documents having the .Rmd extension and containing R Markdown-specific code can be “knitted” (exported) directly into published output document formats such as HTML, PDF or Microsoft Word, which contain both normal text as well as tables and charts produced with the embedded R code. The code itself can also be printed to the output documents.\n\n\nQuarto document\n.qmd\nQuarto is a newer version of R Markdown which allows better compatibility with other programming languages. It is a broader ecosystem design for academic publishing and communication (for example, the course website was built using quarto), but you will be using only Quarto documents in this module. There isn’t much difference between .Rmd and .qmd documents for their uses-cases on this module, so one could easily change and .Rmd extension to .qmd and still produce the same output. .qmd documents are “rendered” instead of “knitted”, but for RStudio users the underlying engine doing the conversion from Quarto/R Markdown to standard Markdown to output file (HTML, PDF, Word, etc.) is the same. Read more about Quarto document in the TSD textbook.\n\n\n\nCreating new files can be done easily via the options File > New File > from the top RStudio menu.\nThe best way to use these files are as part of R project folders, which allow for cross-references to documents and datasets to be made relative to the path of the project folder root. This makes sure that no absolute paths to files (i.e. 
things like “C:/Documents/Chris/my_article/data_files/my_dataset.rds”) need to be used (instead, if the “my_article” folder was set up as an R Project, you would write something like “data_files/my_dataset.rds”, a path relative to the project root). This allows the same code file to be run on another computer without errors, ensuring a minimal expected level of reproducibility in your workflow.\nSetting up an existing or a new folder as an R Project involves having a file with the extension .Rproj saved in it. This can be done easily via the options File > New Project from the top RStudio menu.
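\nAs a brief illustration of such a project-relative path (a hypothetical sketch; the folder and file names here are made up): with the project open, a line like the one below would read a dataset saved in a “data_files” subfolder of the project, and would work unchanged on any computer holding a copy of the project folder:\n\nmy_dataset <- readRDS(\"data_files/my_dataset.rds\") # path resolved from the project root"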
},
{
"objectID": "materials/worksheets/worksheets_w01.html#task-2-set-up-a-new-r-project-with-an-.r-script-and-a-.qmd-document-included",
"href": "materials/worksheets/worksheets_w01.html#task-2-set-up-a-new-r-project-with-an-.r-script-and-a-.qmd-document-included",
"title": "Week 1 Computer Lab Worksheet",
"section": "Task 2: Set up a new R Project, with an .R script and a .qmd document included:",
"text": "Task 2: Set up a new R Project, with an .R script and a .qmd document included:\n\nCreate a new folder set up as an R project; call the folder “HSS8005_labs”; when done, you should have an empty folder with a file called “HSS8005_labs.Rproj” in it\nCreate a new R script (.R); once created, save it as “Lab_1.R” within the “HSS8005_labs” folder\nCreate a new Quarto document (.qmd); once created, save it as “Lab_1.qmd” within the “HSS8005_labs” folder\n\nYou will work in each of these new documents in this lab to gain experience with them."
},
{
"objectID": "materials/worksheets/worksheets_w01.html#data-types-and-structures",
"href": "materials/worksheets/worksheets_w01.html#data-types-and-structures",
"title": "Week 1 Computer Lab Worksheet",
"section": "Data types and structures",
"text": "Data types and structures\nThe basic elements of data in R are called vectors. The objects that we have in the Environment, the ones we created in Task 1 are simple numeric vectors of length 1. R has 6 basic data types that you should be aware of:\n\ncharacter: a text string, e.g. “name”\nnumeric: a real or decimal number\ninteger: non-decimal number; often represented by a number followed by the letter “L”, e.g. 5L\nlogical: TRUE or FALSE\ncomplex: complex numbers with real and imaginary parts\n\nR provides several functions to examine features of vectors and other objects, for example:\n\nclass() - what kind of object is it (high-level)?\ntypeof() - what is the object’s data type (low-level)?\nlength() - how long is it? What about two dimensional objects?\nattributes() - does it have any metadata?"
},
{
"objectID": "materials/worksheets/worksheets_w01.html#task-3-vector-operations-in-the-r-script",
"href": "materials/worksheets/worksheets_w01.html#task-3-vector-operations-in-the-r-script",
"title": "Week 1 Computer Lab Worksheet",
"section": "Task 3: Vector operations in the R script",
"text": "Task 3: Vector operations in the R script\nLet’s learn a few vector operations. Type/copy the code below to the R script file we created earlier, and save it at the end for your records.\nFirst, let’s use the c() function to concatenate vector elements:\n\nx <- c(2.2, 6.2, 1.2, 5.5, 20.1)\n\nTo run this line of code in an R script, place the cursor on the line you want to execute and either click on the small “Run” tab in the upper-right corner of the script’s task bar, or click Ctrl+Enter (on Windows PCs).\nThe vector called x that we just created appears in the Environment. We can examine some of its features:\n\nclass(x)\n\n[1] \"numeric\"\n\ntypeof(x)\n\n[1] \"double\"\n\nlength(x)\n\n[1] 5\n\nattributes(x)\n\nNULL\n\n\nThese tell us something about the characteristics of the object, but not much about its content (apart from the fact that it has a length of 5). Functions such as min, max, range, mean, median, sum or summary give us some summary statistics about the object:\n\nmin(x)\n\n[1] 1.2\n\nmax(x)\n\n[1] 20.1\n\nrange(x)\n\n[1] 1.2 20.1\n\nmean(x)\n\n[1] 7.04\n\nmedian(x)\n\n[1] 5.5\n\nsum(x)\n\n[1] 35.2\n\nsummary(x)\n\n Min. 1st Qu. Median Mean 3rd Qu. Max. \n 1.20 2.20 5.50 7.04 6.20 20.10 \n\n\nThe seq() function lets us create a sequence from a starting point to an ending point. If you specify the by argument, you can skip values. For instance, if we wanted a vector of every 5th number between 0 and 100, we could write:\n\nnumbers <- seq(from = 0, to = 100, by = 5)\n\nTo print out the result in the console, we can simply type the name of the object:\n\nnumbers\n\n [1] 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90\n[20] 95 100\n\n\nA shorthand version to get a sequence between two numbers counting by 1s is to use the : sign. For example, print out all the numbers between 200 and 250:\n\n200:250\n\n [1] 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218\n[20] 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237\n[39] 238 239 240 241 242 243 244 245 246 247 248 249 250\n\n\nTo access a single element of a vector by position in the vector, use the square brackets []:\n\nx[2]\n\n[1] 6.2\n\n\nIf you want to access more than one element of a vector, put a vector of the positions you want to access in the brackets:\n\nx[c(2, 5)]\n\n[1] 6.2 20.1\n\n\nIf you try to access an element past the length of the vector, it will return a missing value NA:\n\nx[10]\n\n[1] NA\n\n\nIf you accidentally subset a vector by NA (the missing value), you get the vector back with all its entries replaced by NA:\n\nx[NA]\n\n[1] NA NA NA NA NA\n\n\nLet’s say you want to modify one value in your vector. 
You can combine the square bracket subset [] with the assignment operator <- to replace a particular value:\n\nx\n\n[1] 2.2 6.2 1.2 5.5 20.1\n\nx[3] <- 50.3\nx\n\n[1] 2.2 6.2 50.3 5.5 20.1\n\n\nYou can replace multiple values at the same time by using a vector for subsetting:\n\nx\n\n[1] 2.2 6.2 50.3 5.5 20.1\n\nx[1:2] <- c(-1.3, 42)\nx\n\n[1] -1.3 42.0 50.3 5.5 20.1\n\n\nIf the replacement vector (the right-hand side) is shorter than what you are assigning to (the left-hand side), the values will “recycle”, i.e. repeat as necessary:\n\nx[1:2] <- 3.2\nx\n\n[1] 3.2 3.2 50.3 5.5 20.1\n\nx[1:4] <- c(1.2, 2.4)\nx\n\n[1] 1.2 2.4 1.2 2.4 20.1\n\n\nYou can also create a vector of characters (words, letters, punctuation, etc.):\n\njedi <- c(\"Yoda\", \"Obi-Wan\", \"Luke\", \"Leia\", \"Rey\")\n\nNote that you cannot mix characters and numbers in the same vector. If you add even a single character element, the whole vector gets converted to character:\n\n### output is numeric\nx\n\n[1] 1.2 2.4 1.2 2.4 20.1\n\n### output is now character\nc(x, \"hey\")\n\n[1] \"1.2\" \"2.4\" \"1.2\" \"2.4\" \"20.1\" \"hey\" \n\n\nLogical vectors are just vectors that only contain the special R values TRUE or FALSE.\n\nlogical <- c(TRUE, FALSE, TRUE, TRUE, FALSE)\nlogical\n\n[1] TRUE FALSE TRUE TRUE FALSE\n\n\nYou could, but never should, shorten TRUE to T and FALSE to F. It’s easy for this shortening to go wrong, so it’s better just to spell out the full words. Also note that R is case-sensitive, so each of the following will produce an error:\n\ntrue\n\nError in eval(expr, envir, enclos): object 'true' not found\n\nTrue\n\nError in eval(expr, envir, enclos): object 'True' not found\n\nfalse\n\nError in eval(expr, envir, enclos): object 'false' not found
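\nOne handy extension of the above (an addition to the worksheet, not one of its tasks): logical vectors can themselves be used for subsetting, keeping only the elements in the positions where the value is TRUE. Combined with the relational operators from Task 1, this is a common way of filtering a vector:\n\nx[logical]\n\n[1] 1.2 1.2 2.4\n\nx[x > 2] # keep only the elements of x that are greater than 2\n\n[1] 2.4 2.4 20.1"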
},
{
"objectID": "materials/worksheets/worksheets_w01.html#data-frames",
"href": "materials/worksheets/worksheets_w01.html#data-frames",
"title": "Week 1 Computer Lab Worksheet",
"section": "Data frames",
"text": "Data frames\nIt is useful to know about vectors, but we will use them primarily as part of larger data frames. Data frames are objects that contain several vectors of similar length. In a data frame each column is a variable and each row is a case. They look like spreadsheets containing data. There are several toy data frames built into R, and we can have a look at one to see how it looks like. For example, the cars data frame is built into R and so you can access it without loading any files. To get the dimensions, you can use dim(), nrow(), and ncol().\n\ndim(mtcars)\n\n[1] 32 11\n\nnrow(mtcars)\n\n[1] 32\n\nncol(mtcars)\n\n[1] 11\n\n\nWe can also load the dataset into our Environment and look at it manually:\n\nmtcars <- mtcars\n\nThe new object has appeared in the Environment under a new section called Data. We can click on it and the dataset will open up in the Source pane. What do you think this dataset is about?\nYou can select each column/variable from the data frame use the $, turning it into a vector:\n\nmtcars$wt\n\n [1] 2.620 2.875 2.320 3.215 3.440 3.460 3.570 3.190 3.150 3.440 3.440 4.070\n[13] 3.730 3.780 5.250 5.424 5.345 2.200 1.615 1.835 2.465 3.520 3.435 3.840\n[25] 3.845 1.935 2.140 1.513 3.170 2.770 3.570 2.780\n\n\nYou can now treat this just like a vector, with the subsets and all.\n\nmtcars$wt[1]\n\n[1] 2.62\n\n\nWe can subset to the first/last k rows of a data frame\n\nhead(mtcars)\n\n mpg cyl disp hp drat wt qsec vs am gear carb\nMazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4\nMazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4\nDatsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1\nHornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1\nHornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2\nValiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1\n\ntail(mtcars)\n\n mpg cyl disp hp drat wt qsec vs am gear carb\nPorsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2\nLotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2\nFord Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4\nFerrari Dino 19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6\nMaserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8\nVolvo 142E 21.4 4 121.0 109 4.11 2.780 18.6 1 1 4 2\n\n\nThere are various ways in which one can further subset and wrangle vectors and data frames using base R functions, but the tidyverse and other user-written packages provide more functionality and ease of use. In this course, we will rely mostly on these."
},
{
"objectID": "materials/worksheets/worksheets_w01.html#functions",
"href": "materials/worksheets/worksheets_w01.html#functions",
"title": "Week 1 Computer Lab Worksheet",
"section": "Functions",
"text": "Functions\nWe have already encountered some basic functions earlier. Most of the work in R is done using functions. It’s possible to create your own functions. This makes R extremely powerful and extendible. We’re not going to cover making your own functions in this course, but it’s important to be aware of this capability. There are plenty of good resources online for learning how to do this, including this one.\nAdvanced user exercise: leap year functions\nIf you have more advanced knowledge of R, here’s and exercise for you. Suppose you want to write a function that lists all the leap years between two specified years. How would you go about writing it? What are the information that you need first? What are the steps that you would take to build up the function? There are several ways of achieving such a function, and you can find three options at the bottom of this worksheet. Work individually or in a small group. Compare your results to the options given at the end."
},
{
"objectID": "materials/worksheets/worksheets_w01.html#packages",
"href": "materials/worksheets/worksheets_w01.html#packages",
"title": "Week 1 Computer Lab Worksheet",
"section": "Packages",
"text": "Packages\nInstead of programming your own functions in the R language, you can rely on functions written by other people and bundled within a package that performs some set task. There are a large number of reliable, tested and oft-used packages containing functions that are particularly useful for social scientists.\nSome particularly useful packages: - the tidyverse bundle of packages, which includes the dplyr package (for data manipulation) and additional R packages for reading in (readr), transforming (tidyr) and visualizing (ggplot2) datasets. - to import datasets in non-native formats and to manage attached labels (a concept familiar from other statistical packages but foreign to R), load the sjlabelled package (an alternative to haven and labelled, which work in a similar way but provide less functionality) - the sjmisc package contains very useful functions for undertaking data transformations on labelled variables (recoding, grouping, missing values, etc); also has some useful tabulation functions - the sjPlot package contains functions for graphing and tabulating results from regression models\nPackages are often available from the Comprehensive R Archive Network (CRAN) or private repositories such as Bioconductor, GitHub etc. Packages made available on CRAN can be installed using the command install.packages(\"packagename\"). Once the package/library is installed (i.e. it is sitting somewhere on your computer), we then need to load it to the current R session using the command library(packagename).\nSo using a package/library is a two-stage process. We:\n\n\nInstall the package/library onto your computer (from the internet)\n\nLoad the package/library into your current session using the library command.\n\nLet’s start by installing the ‘tidyverse’ package, and then load it:\n\ninstall.packages(\"tidyverse\") ## this command installs packages from CRAN; note the quotation marks around the package name\n\nYou can check the suite of packages that are loaded when you load the Tidyverse library using a command from the tidyverse itself:\n\ntidyverse_packages()\n\n\nQuestion\nWhy do you think we got an error message when we tried to run the above command?\n\nBecause tidyverse_packages() is itself a function from the tidyverse, in order to use that function we need not only to install the tidyverse but also to make its functions available. In other words, we did not yet load the tidyverse for use in our R session, we only just installed it on our computers.\nIf we don’t want to load a package that we have downloaded - because maybe we only want to use a single function once and we don’t want to burden our computer’s memory, we can state explicitly which package the function is from in the following way:\n\ntidyverse::tidyverse_packages() # Here we state the package followed by two colons, then followed by the function we want\n\nBut in many cases we do want to use several functions at various points in an analysis session, so it is usually useful to load the entire package or set of packages:\n\nlibrary(tidyverse)\n\nNow we can use functions from that package without having to explicitly state the name of the package. We can still state the name explicitly, and that may be useful for readers of our code to understand what package a function come from. Also, it may happen that different packages have similarly named functions, and if all those packages are loaded, then the functions from a package loaded later will override that in the package loaded earlier. 
R will note in a message whether any functions from a package are masked by another, so it’s worth paying attention to the messages and warnings printed by R when we load packages.\nThere are also convenience tools (e.g. the pacman package) that make it easier to load several packages at once, installing any package that has not yet been installed on our computer.\nFor example, we can install and load a number of packages with the command below:\n\n# Install 'pacman' if not yet installed:\n\nif (!require(\"pacman\")) install.packages(\"pacman\") \n\n# Then load/install other packages using 'pacman':\n\npacman::p_load(\n tidyverse, # general data management tools ('dplyr', etc.)\n sjlabelled, # data import from other software (alternative to 'haven') and labels management\n sjmisc # data transformation on variables (recoding, grouping, missing values, etc.)\n )"
},
{
"objectID": "materials/worksheets/worksheets_w01.html#about-the-tidyverse",
"href": "materials/worksheets/worksheets_w01.html#about-the-tidyverse",
"title": "Week 1 Computer Lab Worksheet",
"section": "About the Tidyverse\n",
"text": "About the Tidyverse\n\nData frames and ‘tibbles’\nThe Tidyverse is built around the basic concept that data in a table should have one observation per row, one variable per column, and only one value per cell. Once data is in this ‘tidy’ format, it can be transformed, visualized and modelled for an analysis.\nWhen using functions in the Tidyverse ecosystem, most data is returned as a tibble object. Tibbles are very similar to the data.frames (which are the basic types of object storing datasets in base R) and it is perfectly fine to use Tidyverse functions on a data.frame object. Just be aware that in most cases, the Tidyverse function will transform your data into a tibble. If you are unobservant, you won’t even notice a difference. However, there are a few differences between the two data types, most of which are just designed to make your life easier. For more info, check R4DS.\nSelected dplyr functions\nThe dplyr package is designed to make it easier to manipulate flat (2-D) data (i.e. the type of datasets we are most likely to use, which are laid out as in a standard spreadsheet, with rows referring to cases (observations; respondents) and columns referring to variables. dplyr provides simple “verbs”, functions that correspond to the most common data manipulation tasks, to help you translate your thoughts into code. Here are some of the most common functions in dplyr:\n\n\nfilter() chooses rows based on column values.\n\narrange() changes the order of the rows.\n\nselect() changes whether or not a column is included.\n\nrename() changes the name of columns.\n\nmutate()/transmute() changes the values of columns and creates new columns (variables)\n\nsummarise() compute statistical summaries (e.g., computing the mean or the sum)\n\ngroup_by() group data into rows with the same values\n\nungroup() remove grouping information from data frame.\n\ndistinct() remove duplicate rows.\n\nAll these functions work similarly as follows:\n\nThe first argument is a data frame/tibble\nThe subsequent arguments are comma separated list of unquoted variable names and the specification of what you want to do\nThe result is a new data frame\n\nFor more info, check R for Social Scientists\nThe forward-pipe (%>%/|>) workflow\nAll of the dplyr functions take a data frame or tibble as the first argument. Rather than forcing the user to either save intermediate objects or nest functions, dplyr provides the forward-pipe operator %>% from the magrittr package. This operator allows us to combine multiple operations into a single sequential chain of actions. As of R 4.1.0 there is also a native pipe operator in R (|>), and in RStudio one can set the shortcut to paste the new pipe operator instead (as we have done at the beginning of the lab). Going forward, we’ll use this version of the pipe operator for simplicity, but it’s likely that you will encounter the older version of the operator too in various scripts.\nLet’s start with a hypothetical example. Say you would like to perform a sequence of operations on data frame x using hypothetical functions f(), g(), and h():\n\nTake x then\n\nUse x as an input to a function f() then\n\nUse the output of f(x) as an input to a function g() then\n\nUse the output of g(f(x)) as an input to a function h()\n\nOne way to achieve this sequence of operations is by using nesting parentheses as follows:\nh(g(f(x)))\nThis code isn’t so hard to read since we are applying only three functions: f(), then g(), then h() and each of the functions is short in its name. 
Further, each of these functions also only has one argument. However, you can imagine that this will get progressively harder to read as the number of functions applied in your sequence increases and the number of arguments in each function increases as well. This is where the pipe operator |> comes in handy. |> takes the output of one function and then “pipes” it to be the input of the next function. Furthermore, a helpful trick is to read |> as “then” or “and then.” For example, you can obtain the same output as the hypothetical sequence of functions as follows:\nx |> \n f() |> \n g() |> \n h()\nYou would read this sequence as:\n\nTake x then\n\nUse this output as the input to the next function f() then\n\nUse this output as the input to the next function g() then\n\nUse this output as the input to the next function h()\n\nSo while both approaches achieve the same goal, the latter is much more human-readable because you can clearly read the sequence of operations line-by-line. Instead of typing out the characters of the operator by hand, you can use the keyboard shortcut Ctrl + Shift + M (Windows) or Cmd + Shift + M (MacOS) to paste the operator.
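\nTo make this concrete with the mtcars data from earlier, here is a sketch of a small dplyr pipeline (assuming the tidyverse has been loaded as above; output omitted):\n\nmtcars |>\n  filter(cyl == 4) |> # keep only the four-cylinder cars\n  select(mpg, wt) |> # keep only the mpg and wt columns\n  arrange(desc(mpg)) # sort by mpg, highest first\n\nReading |> as “then”, this says: take mtcars, then keep the four-cylinder cars, then keep the mpg and wt columns, then sort them by mpg in descending order."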
},
{
"objectID": "materials/worksheets/worksheets_w01.html#task-4-data-frame-operations-in-a-quarto-document",
"href": "materials/worksheets/worksheets_w01.html#task-4-data-frame-operations-in-a-quarto-document",
"title": "Week 1 Computer Lab Worksheet",
"section": "Task 4: Data frame operations in a Quarto document",