Terminology improvements (#17)
* Remove replicate term

* Use simulation repetition terminology

* Clarify use of nrep

* Tweak use of terminology

* Use "repetitions" in chapter overview in index.qmd and README.md as well

* Shorten "simulation repetitions" to "repetitions" in most contexts, but especially when used in combination with "number of"
Reword "simulation repetitions of N(0,1)" to "repetitions simulating N(0,1)"

* Add "replications" once as a common alternative term

---------

Co-authored-by: Julian Lange <32773007+langejulian@users.noreply.github.com>
NeuroShepherd and langejulian authored Sep 12, 2024
1 parent 07e2dfe commit 3134c25
Showing 11 changed files with 38 additions and 36 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -30,7 +30,7 @@ It is necessary that you work through the sections of the tutorial in order. Ple
* [Repeat](./tutorial_pages/repeat.qmd) – How to repeat the generation of random numbers multiple times?
* [Setting the seed](./tutorial_pages/seed.qmd) – How can you generate the same random numbers?
* [Sample size `n`](./tutorial_pages/sample-size-n.qmd) – How many values should you generate within a simulation?
* [Number of simulations `nrep`](./tutorial_pages/number-of-simulations-nrep.qmd) – How many repeats of a simulation should you run?
* [Number of repetitions `nrep`](./tutorial_pages/number-of-simulations-nrep.qmd) – How many repeats of a simulation should you run?
* [DRY rule](./tutorial_pages/dry-rule.qmd) – How to write your own functions?
* [Simulate to check alpha](./tutorial_pages/check-alpha.qmd) – Write your first simulation and check the rate of false-positive findings.
* [Simulate to check power](./tutorial_pages/check-power.qmd) – Simulate data to perform a power analysis.
2 changes: 1 addition & 1 deletion index.qmd
@@ -30,7 +30,7 @@ It is necessary that you work through the sections of the tutorial in order. Ple
* [Repeat](./tutorial_pages/repeat.qmd) – How to repeat the generation of random numbers multiple times?
* [Setting the seed](./tutorial_pages/seed.qmd) – How can you generate the same random numbers?
* [Sample size `n`](./tutorial_pages/sample-size-n.qmd) – How many values should you generate within a simulation?
* [Number of simulations `nrep`](./tutorial_pages/number-of-simulations-nrep.qmd) – How many repeats of a simulation should you run?
* [Number of repetitions `nrep`](./tutorial_pages/number-of-simulations-nrep.qmd) – How many repeats of a simulation should you run?
* [DRY rule](./tutorial_pages/dry-rule.qmd) – How to write your own functions?
* [Simulate to check alpha](./tutorial_pages/check-alpha.qmd) – Write your first simulation and check the rate of false-positive findings.
* [Simulate to check power](./tutorial_pages/check-power.qmd) – Simulate data to perform a power analysis.
4 changes: 2 additions & 2 deletions tutorial_pages/basic-principles.qmd
@@ -1,12 +1,12 @@
# Basic principles

Basically, a simulation consist in:
Basically, a simulation consists of:
**1) Generating `n` random numbers from a known distribution.**
**2) Repeating this `nrep` times.**

Once you know how to do this, the questions we will explore are:
**1) What sample size `n` should we use within a simulation?**
**2) How many simulations `nrep` should we run?**
**2) How many simulation repetitions `nrep` should we run?**

***
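
The two steps listed in `basic-principles.qmd` can be sketched in R roughly as follows. This is a minimal illustration, not code from the tutorial: the values of `n` and `nrep`, the seed, and the object names `x` and `sims` are assumptions chosen for the example.

```r
# Illustrative values; both are assumptions chosen for this sketch.
n    <- 10    # sample size within one repetition
nrep <- 100   # number of repetitions

set.seed(1)   # optional: makes the pseudo-random draws reproducible

# Step 1: generate n random numbers from a known distribution, here N(0,1).
x <- rnorm(n, mean = 0, sd = 1)

# Step 2: repeat this nrep times; each column of `sims` holds one repetition.
sims <- replicate(nrep, rnorm(n, mean = 0, sd = 1))

dim(sims)  # n rows, nrep columns
```

Keeping `n` and `nrep` as separate variables mirrors the two questions the page asks: one controls the sample size within a repetition, the other how often the whole draw is repeated.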

14 changes: 7 additions & 7 deletions tutorial_pages/check-alpha.qmd
@@ -15,12 +15,12 @@ If you draw from the same normal distribution twice, will the mean of the two sa
3. Figure out how to extract the *p*-value from that object (explore your R object with the functions `str()` or `names()`).
4. Write a function `simT()` that generates two vectors of `n` values drawn from a standard normal distribution (N(0,1)), compares them with a *t*-test, and returns the *p*-value.
5. Test your function by calling it for `n = 50`.
6. For `n = 10`, generate `nrep = 20` repetitions and draw a histogram.
7. Repeat the previous task with `nrep = 100`.
6. For a sample size of `n = 10`, generate `nrep = 20` repetitions and draw a histogram.
7. Repeat the previous task with `nrep = 100` repetitions.

***

***p*-values of *t*-tests comparing means from 20 or 100 simulations of N(0,1) with n = 10:**
***p*-values of *t*-tests comparing means from 20 or 100 repetitions simulating N(0,1) with n = 10:**
<br/>
<img src="../assets/ttest-changing-nrep.png" width="500">
<br/>
@@ -33,17 +33,17 @@ Are those deviations meaningful? Are they significant?
***

**YOUR TURN:**
1. Plot a histogram of `nrep = 1000` outputs of the function `simT` with `n = 10`.
2. Plot a histogram of `nrep = 1000` outputs of the function `simT` with `n = 100`.
1. Plot a histogram of `nrep = 1000` repetitions of the function `simT` with `n = 10`.
2. Plot a histogram of `nrep = 1000` repetitions of the function `simT` with `n = 100`.

***

***p*-values of *t*-tests comparing means from 1000 simulations of N(0,1) with n=10 or n=100:**
***p*-values of *t*-tests comparing means from 1000 repetitions simulating N(0,1) with n=10 or n=100:**
<br/>
<img src="../assets/ttest-changing-n.png" width="500">
<br/>

In both cases, we expect 50 out of the 1000 tests to be significant by chance (i.e. with a *p*-value under 0.05). In my simulations, I get 40 and 45 false positive results, for `n = 10` and `n = 100`, respectively. How many did you get?
In both cases, we expect 50 out of the 1000 tests to be significant by chance (i.e. with a *p*-value under 0.05). In my simulation repetitions, I get 40 and 45 false positive results for `n = 10` and `n = 100`, respectively. How many did you get?

These proportions are not significantly different from 5%.
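
One possible way to work through the `simT()` exercise and the false-positive count discussed above is sketched below. The function body, the seed, and the use of `binom.test()` to compare the observed proportion with 5% are assumptions of this sketch rather than the tutorial's own solution, and the counts you obtain will differ from the 40 and 45 quoted in the text.

```r
# simT(): draw two samples of size n from N(0,1), compare them with a
# t-test, and return the p-value.
simT <- function(n) {
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  t.test(x1, x2)$p.value
}

nrep <- 1000  # number of repetitions
set.seed(42)  # arbitrary seed, an assumption of this sketch

p_n10  <- replicate(nrep, simT(n = 10))
p_n100 <- replicate(nrep, simT(n = 100))

# Number of false positives; about 0.05 * nrep = 50 are expected by chance.
fp_n10  <- sum(p_n10 < 0.05)
fp_n100 <- sum(p_n100 < 0.05)

# Exact binomial test: is the observed proportion compatible with 5%?
binom.test(fp_n10, nrep, p = 0.05)$p.value
binom.test(fp_n100, nrep, p = 0.05)$p.value
```

Calling `hist(p_n10)` and `hist(p_n100)` afterwards would give histograms comparable to the figure; under the null hypothesis the *p*-values should look roughly uniform between 0 and 1.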

4 changes: 2 additions & 2 deletions tutorial_pages/check-power.qmd
@@ -38,12 +38,12 @@ If we sample values from two normal distributions with different means (e.g. N(0
i. Draws `n` values from a random normal distribution with `mean1` and another `n` values from a normal distribution with `mean2`.
ii. Compares the means of these two samples with a *t*-test and extracts the *p*-value.

2. Replicate the function 1000 times using the parameters used in the power calculation above (that used the `power.t.test()` function).
2. Repeat the function 1000 times using the parameters used in the power calculation above (that used the `power.t.test()` function).
3. Calculate the proportion of *p*-values that are smaller than 0.05.

***

***p*-values of *t*-tests comparing means from 1000 simulations of N(0,1) and N(0.5,1) with n = 64:**
***p*-values of *t*-tests comparing means from 1000 repetitions simulating N(0,1) and N(0.5,1) with n = 64:**

<br/>
<img src="../assets/hist-power.png" width="500">
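
The power exercise above could be approached along these lines. This is a hedged sketch: the function name `simT2()` and the seed are hypothetical, and the parameters (`n = 64`, means 0 and 0.5, SD 1) are taken from the figure caption. Note that `t.test()` defaults to Welch's test while `power.t.test()` assumes equal variances, so a small discrepancy between the two numbers is expected.

```r
# simT2() (hypothetical name): draw n values from each of two normal
# distributions with different means, and return the t-test p-value.
simT2 <- function(n, mean1, mean2) {
  x1 <- rnorm(n, mean = mean1)
  x2 <- rnorm(n, mean = mean2)
  t.test(x1, x2)$p.value
}

nrep <- 1000   # number of repetitions
set.seed(123)  # arbitrary seed, an assumption of this sketch

p <- replicate(nrep, simT2(n = 64, mean1 = 0, mean2 = 0.5))

# Simulated power: proportion of repetitions with p < 0.05.
sim_power <- mean(p < 0.05)
sim_power

# Analytical power for the same design, for comparison.
ana_power <- power.t.test(n = 64, delta = 0.5, sd = 1)$power
ana_power
```

If the simulation is set up consistently with the analytical calculation, the two values should agree up to simulation noise, which shrinks as `nrep` grows.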
6 changes: 3 additions & 3 deletions tutorial_pages/dry-rule.qmd
@@ -28,11 +28,11 @@ To build up a function, start by writing the "`stuff`" outside the function to t
***

**YOUR TURN:**
1. Create a function that repeats the calculation of the mean of 100 values drawn from a standard normal distribution (use `mean(rnorm(n = 100))` for this calculation) `nrep` times and returns a histogram of the `nrep` means.
2. Modify your function such that, in addition to `nrep`, the number of drawn values `n` (i.e. argument `n` of the `rnorm()` function) can also be varied when calling your function.
1. Create a function that repeats the calculation of the mean of 100 values drawn from a standard normal distribution (use `mean(rnorm(n = 100))` for this calculation) `nrep` times and returns a histogram of the `nrep` means.
2. Modify your function such that, in addition to the number of repetitions `nrep`, the number of drawn values `n` (i.e. argument `n` of the `rnorm()` function) can also be varied when calling your function.


***

Note that it is useful to define `nrep` outside of the function, so users of your script can more easily change that value, e.g. from a low number (to verify the script runs without error) to a large number (to obtain reliable results).
Note that it is useful to define the number of repetitions `nrep` outside of the function, so users of your script can more easily change that value, e.g. from a low number (to verify the script runs without error) to a large number (to obtain reliable results).
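
A function along the lines of the exercise above might look like the following. The name `hist_of_means()` and the default `n = 100` are assumptions for illustration; the tutorial does not prescribe them.

```r
# hist_of_means() (hypothetical name): repeat the calculation of the mean
# of n values drawn from N(0,1) nrep times and draw a histogram of the means.
hist_of_means <- function(nrep, n = 100) {
  means <- replicate(nrep, mean(rnorm(n = n)))
  # hist() draws the plot and returns the histogram object invisibly.
  hist(means, main = paste0("nrep = ", nrep, ", n = ", n))
}

# Define the number of repetitions outside the function so it is easy to
# change: a low value to check the script runs, a large one for results.
nrep <- 10

h <- hist_of_means(nrep = nrep, n = 100)
```

Once the script runs without error, raising `nrep` (for example to 1000) is then a one-line change at the top of the script.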

6 changes: 3 additions & 3 deletions tutorial_pages/general-structure.qmd
@@ -1,16 +1,16 @@
# General structure of a simulation

1. **Define** what type of data and variables need to be simulated, i.e. their **distribution**, their class (e.g. factor vs. numerical values), **sample sizes** (within a dataset and number of replicates), what will need to vary (e.g. the strength of relationship), etc.
1. **Define** what type of data and variables need to be simulated, i.e. their **distribution**, their class (e.g. factor vs. numerical values), **sample sizes** (within a dataset and number of repetitions), what will need to vary (e.g. the strength of relationship), etc.

2. **Generate data**, random data or data including an effect (e.g. an imposed correlation between two variables).

3. **Run the statistical test** you think is appropriate and record the relevant statistic (e.g. *p*-value).

4. **Replicate** step 2 and 3 to get the distribution of the statistic of interest.
4. **Repeat** step 2 and 3 to get the distribution of the statistic of interest.

5. Try out different parameter sets (**explore the parameter space** for which results are similar).

6. **Analyse and interpret the combined results of many simulations** within each set of parameters. For instance, check that you only get a significant result in 5% of the simulations (if `alpha = 0.05`) when you simulated no effect and that you get a significant result in 80% of the simulations (if you targeted a power of 80%) when you simulated an effect.
6. **Analyse and interpret the combined results of many simulation repetitions** within each set of parameters. For instance, check that you only get a significant result in 5% of the repetitions (if `alpha = 0.05`) when you simulated no effect and that you get a significant result in 80% of the repetitions (if you targeted a power of 80%) when you simulated an effect.

***

13 changes: 7 additions & 6 deletions tutorial_pages/number-of-simulations-nrep.qmd
@@ -1,22 +1,23 @@
# Number of simulations `nrep`
# Number of repetitions `nrep`

Sampling theory applies to the number of simulations `nrep` just as much as the sample size `n` within a simulation.
Sampling theory applies to the number of repetitions `nrep` (also referred to as the number of *replications*) just as much as it does to the sample size `n` within a simulation.

**Means and SDs from 24 simulations of N(0,1) with n = 10:**
**Means and SDs from 24 repetitions simulating N(0,1) with n = 10:**
<br/>
<img src="../assets/musd-24-10-N01.png" width="500">
<br/>

Now, let's do the same with a number of repeats `nrep` of 1000.
Now, let's do the same with a number of repetitions `nrep` of 1000.

**Means and SDs from 1000 simulations of N(0,1) with n = 10:**
**Means and SDs from 1000 repetitions simulating N(0,1) with n = 10:**
<br/>
<img src="../assets/1000hist10N01.png" width="500">
<br/>


### Conclusion
The number of simulations needs to be a large enough number to obtain a good representation of the distribution of the simulation results, e.g. 1000.

The number of repetitions needs to be a large enough number to obtain a good representation of the distribution of the simulation results, e.g. 1000.

***
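
The effect of `nrep` described in this page can be checked numerically with a short sketch such as the one below. The seed and object names are assumptions; the comparison value `1 / sqrt(n)` is the theoretical standard deviation of the mean of `n` draws from N(0,1).

```r
n <- 10         # sample size within each repetition
set.seed(2024)  # arbitrary seed, an assumption of this sketch

# Means of n draws from N(0,1), for a small and a large number of repetitions.
means_24   <- replicate(24,   mean(rnorm(n)))
means_1000 <- replicate(1000, mean(rnorm(n)))

# Centre and spread of the simulated means for each nrep.
c(mean = mean(means_24),   sd = sd(means_24))
c(mean = mean(means_1000), sd = sd(means_1000))

# Theoretical values: the means centre on 0 with SD 1 / sqrt(n).
1 / sqrt(n)
```

With only 24 repetitions the estimated centre and spread can land noticeably off the theoretical values; with 1000 repetitions they are typically much closer.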

2 changes: 1 addition & 1 deletion tutorial_pages/purpose.qmd
@@ -11,7 +11,7 @@ You can use computer simulations to:
<br/>

* **Perform power analyses.**
* *Example: Assess whether the sample size (within a replicate) is high enough to detect a simulated effect in more than 80% of the cases.*
* *Example: Assess whether the sample size (within a simulation repetition) is high enough to detect a simulated effect in more than 80% of the cases.*
<br/>

* **Perform bootstrapping to get a confidence interval around a parameter estimate.**
8 changes: 4 additions & 4 deletions tutorial_pages/real-life-example.qmd
@@ -12,7 +12,7 @@ The R script screenshot below, `glm_Freq_vs_YN.R`, can be found in the folder [I
This walkthrough will use the steps as defined on the page '[General structure](./general-structure.qmd)'.


1. **Define sample sizes** (within a dataset and number of replicates), **experimental design** (fixed dataset structure, e.g. treatment groups, factors), and **parameters** that will need to vary (here, the strength of the effect).
1. **Define sample sizes** (within a dataset and number of repetitions), **experimental design** (fixed dataset structure, e.g. treatment groups, factors), and **parameters** that will need to vary (here, the strength of the effect).

<img src="../assets/define.png" width="1000">
<br/>
@@ -28,7 +28,7 @@ This walkthrough will use the steps as defined on the page '[General structure](
<br/>


4. **Replicate** steps 2 (data simulation) and 3 (data analyses) to get the distribution of the parameter estimates by wrapping these steps in a function.
4. **Repeat** steps 2 (data simulation) and 3 (data analyses) to get the distribution of the parameter estimates by wrapping these steps in a function.

Definition of the function at the beginning:
<br/>
@@ -38,7 +38,7 @@ This walkthrough will use the steps as defined on the page '[General structure](
<br/>
<img src="../assets/replicate2.png" width="1000">
<br/>
Replicate the function `nrep` times. Here, `pbreplicate()` is used to provide a bar of progress for R to run this command.
Repeat the function `nrep` times. Here, `pbreplicate()` is used to provide a bar of progress for R to run this command.
<br/>
<img src="../assets/replicate3.png" width="1000">
<br/>
@@ -48,7 +48,7 @@ This walkthrough will use the steps as defined on the page '[General structure](
<img src="../assets/explore.png" width="1000">
<br/>

6. **Analyse and interpret the combined results of many simulations**. In this case, the results of the two models were qualitatively the same (comparison of results for a few simulations), and both models gave the same expected 5% false positive results when no effect was simulated. Varying the effect (the probability of sampling 0 or 1 depending on the experimental treatment) allowed us to find the minimum effect size for which the number of positive results of the tests is over 80%.
6. **Analyse and interpret the combined results of many simulation repetitions**. In this case, the results of the two models were qualitatively the same (comparison of results for a few different parameter values), and both models gave the same expected 5% false positive results when no effect was simulated. Varying the effect (the probability of sampling 0 or 1 depending on the experimental treatment) allowed us to find the minimum effect size for which the number of positive results of the tests is over 80%.

<img src="../assets/conclude.png" width="1000">
<br/>
13 changes: 7 additions & 6 deletions tutorial_pages/sample-size-n.qmd
@@ -4,13 +4,13 @@ How many values should you generate within a simulation? Let's explore.

If I draw 10 data points from a normal distribution with a mean of 0 and a standard deviation of 1 (i.e N(0,1)), after setting the seed to 10 (for no specific reason), here is the distribution of the values I get:

**1 simulation of N(0,1) with n = 10:**
**1 repetition simulating N(0,1) with n = 10:**
<img src="../assets/hist10N01.png" width="300">
<br/>

If I repeat this simulation 24 times, here are the distributions of the 10 values pseudo-randomly sampled from N(0,1):

**24 simulations of N(0,1) with n = 10:**
**24 repetitions simulating N(0,1) with n = 10:**
<br/>
<img src="../assets/24hist10N01.png" width="600">
<br/>
@@ -19,21 +19,21 @@ If I repeat this simulation 24 times, here are the distributions of the 10 value
Note that because we are drawing from N(0,1), we expect the mean of the values drawn (mean(x), blue lines) to be very close to 0, i.e. the mean of the normal distribution we sample from (red dashed lines).
<br/>

How are the means and standard deviations of the 24 simulations distributed?
How are the means and standard deviations of the 24 repetitions distributed?

**Distributions of the means and SDs from 24 simulations of N(0,1) with n = 10:**
**Distributions of the means and SDs from 24 repetitions simulating N(0,1) with n = 10:**
<br/>
<img src="../assets/musd-24-10-N01.png" width="500">
<br/>

Now, let's do the same with a sample size `n` of 1000.

**24 simulations of the same distribution, i.e. N(0,1), with n = 1000:**
**24 repetitions simulating the same distribution, i.e. N(0,1), with n = 1000:**
<br/>
<img src="../assets/24hist1000N01.png" width="600">
<br/>

**Distributions of the means and SDs from 24 simulations of N(0,1) with n = 1000:**
**Distributions of the means and SDs from 24 repetitions simulating N(0,1) with n = 1000:**
<br/>
<img src="../assets/musd-24-1000-N01.png" width="500">
<br/>
@@ -43,6 +43,7 @@ Now, let's do the same with a sample size `n` of 1000.
The sample size within a simulation affects the **precision** with which the parameters of that distribution can be estimated.

What should determine the sample size within your simulation?

Choose a sample size that is relevant to the context of the simulation, e.g. the sample size you will be able to reach in your study or the minimum sample size that would allow you to detect the smallest effect of interest (as determined by a power analysis, which we will cover in a moment).

***
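
The statement that the sample size affects precision can be illustrated with a small sketch like this one. The object names are assumptions, and the seed of 10 simply follows the page's own arbitrary choice; the exact numbers will not match the figures.

```r
nrep <- 24    # number of repetitions, as in the figures
set.seed(10)  # arbitrary seed

# Means of samples drawn from N(0,1), for a small and a large sample size.
means_n10   <- replicate(nrep, mean(rnorm(10)))
means_n1000 <- replicate(nrep, mean(rnorm(1000)))

# Spread of the estimated means across repetitions.
sd(means_n10)    # theory: 1 / sqrt(10), about 0.32
sd(means_n1000)  # theory: 1 / sqrt(1000), about 0.03
```

The smaller spread for `n = 1000` is what "precision" refers to here: each repetition's mean sits closer to the true mean of 0.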