Merge branch 'main' into update-r-ii
diazrenata authored Oct 1, 2024
2 parents a2451d4 + 79d32ec commit 7c81131
Showing 7 changed files with 133 additions and 37 deletions.
8 changes: 3 additions & 5 deletions _freeze/index/execute-results/html.json

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions _quarto.yml
@@ -1,5 +1,7 @@
project:
type: website
resources:
- "robots.txt"
render:
- "*.qmd"

@@ -12,6 +14,10 @@ website:
repo-actions: [edit, issue]
site-url: https://cct-datascience.github.io/repro-data-sci/
navbar:
tools:
- icon: github
text: Source #shows on hover
href: https://github.com/cct-datascience/repro-data-sci
left:
- href: index.qmd
text: Home
2 changes: 1 addition & 1 deletion index.qmd
@@ -13,7 +13,7 @@ Welcome to the syllabus for the CCT Data Science fall workshop series: **Reprodu
We'll meet on Tuesdays and Thursdays from 11 a.m. to 1 p.m.
via Zoom (link pinned in Slack channel)

<!-- Edit by editing schedule.csv. To add links, use markdown [text](url) separated by commas. This code turns that into a bulleted list. -->
<!-- Edit by editing schedule.csv. To add links, use markdown [text](url) separated by commas. This code turns that into a bulleted list.-->

```{r}
#| echo: false
23 changes: 13 additions & 10 deletions lessons/10-get-credit/notes.qmd
@@ -8,9 +8,13 @@ bibliography: references.bib

## Objective

Learn the wrap-up steps to publish/archive a research compendium with a DOI.
Understand reproducible computational environment.
Learn `renv` and discuss Docker (concept).
- Add a basic CITATION.cff file to your repo

- Practice the wrap-up steps to publish/archive a research compendium with a DOI.

- Understand the concept of a reproducible computational environment.

- Learn `renv` and discuss Docker (concept).

## Lesson Outline

@@ -31,22 +35,21 @@ Learn `renv` and discuss Docker (concept).

- Show CITATION.cff files for this repo and maybe one for a research compendium
- Show "cite this" button on GitHub
- Show CITATION.cff creation tool [CFFINIT](https://citation-file-format.github.io/cff-initializer-javascript/#/)
- Maybe mention `cffr::cff_validate()`
- Everyone use CITATION.cff creation tool [CFFINIT](https://citation-file-format.github.io/cff-initializer-javascript/#/) to create a *basic* CITATION.cff
- *Maybe* mention `cffr::cff_validate()`

- Archiving

- Most participants probably won't be ready to follow along with their own repos, but we will be there to help when they are ready
- Demo archiving a repo with Zenodo using this repo
- Exercise: guide everyone through archiving a repo with Zenodo using sandbox.zenodo.org
- Add DOI badge to readme
- Update CITATION.cff with DOI

- `renv`

- Discuss why
- Ask students to activate `renv` for a project and inspect files it creates (have co-instructor share screen)
- Ask students to activate `renv` for a project and inspect files it creates
- Explain how `renv` works, especially `renv::status()` and `renv::snapshot()`
- Clone co-instructor's repo with `renv` files
- Clone demo repo with `renv` files
- Show that no packages are available initially (project is isolated)
- Run `renv::restore()`
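
The `renv` steps in this outline can be sketched roughly as below. This is a minimal illustration, not the instructors' demo script: it assumes the `renv` package is installed, and the two wrapper function names are hypothetical. In a live session these calls would more likely be typed one at a time at the console.

```r
# Sketch only: assumes the renv package is installed and that `project`
# points at an R project directory. Both function names are hypothetical.

# Author's side: set up renv once, then record package versions as you work.
renv_author_workflow <- function(project = ".") {
  # Creates renv.lock, .Rprofile, and renv/ in the project.
  # restart = FALSE keeps the current R session running so the next calls execute.
  renv::init(project = project, restart = FALSE)
  # Reports whether the project library and the lockfile agree.
  renv::status(project = project)
  # Records the package versions currently in use in renv.lock.
  renv::snapshot(project = project)
}

# Collaborator's side: after cloning a repo that already contains renv.lock.
renv_collaborator_workflow <- function(project = ".") {
  # Installs the package versions recorded in renv.lock into the project library.
  renv::restore(project = project)
}
```

Nothing runs until one of the functions is called, so the block is safe to source in a project that does not use `renv` yet.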

@@ -58,4 +61,4 @@ Learn `renv` and discuss Docker (concept).

## Homework

- Prep for showcase session
- Prep for reproducibility colloquium
78 changes: 58 additions & 20 deletions lessons/10-get-credit/slides.qmd
@@ -5,7 +5,7 @@ date: "2024-10-03"
date-format: long
format:
uaz-revealjs:
reference-location: document
reference-location: section
link-external-newwindow: true
chalkboard: true
logo: "../../logo.png"
@@ -27,7 +27,7 @@ Figure 2 from @maitner
Figure 2: Predicted impacts of code-sharing on cumulative citations.
Predicted values shown are for the mean Impact Factor (3.0) across the publications analyzed.
Fully open = open code and open-access publication; fully closed = closed-access publication and no publicly available code.
Predictions are based on estimated model coe cients (Table 2)
Predictions are based on estimated model coefficients (Table 2)
:::

## Getting credit for your code
@@ -43,33 +43,51 @@ Make it **easier** for people to cite your code
## CITATION.cff

::: incremental
- Citation File Format are plain text files written in YAML
- A `CITATION.cff` file contains citation information written in YAML

- Adding a `CITATION.cff` file to your repo...

- Puts a "cite this repository" button on GitHub

- Helps code archive tools fill out metadata correctly when you archive your repo

- Create a CITATION.cff file with this [helper](https://citation-file-format.github.io/cff-initializer-javascript/#/)
- Learn more and create your own: <https://citation-file-format.github.io/>

- See example [here](https://github.com/cct-datascience/repro-data-sci/blob/main/CITATION.cff)
:::
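
For a sense of what a *basic* `CITATION.cff` contains, here is a minimal sketch that writes one from R using only base functions. The title, author name, version, and date are placeholders, and the file goes to a temporary directory so that nothing in a real repo is overwritten. The four keys `cff-version`, `message`, `title`, and `authors` are the required ones in CFF 1.2.0; everything else here is optional.

```r
# Sketch only: every value below is a placeholder, not metadata for a real project.
cff_lines <- c(
  'cff-version: 1.2.0',
  'message: "If you use this software, please cite it as below."',
  'title: "My Research Compendium"',  # placeholder title
  'authors:',
  '  - family-names: "Doe"',          # placeholder author
  '    given-names: "Jane"',
  'version: 0.1.0',                   # optional
  'date-released: "2024-10-03"'       # optional
)

# Write to a temporary location; in practice the file lives at the repo root.
cff_path <- file.path(tempdir(), "CITATION.cff")
writeLines(cff_lines, cff_path)

# Print the file to check what was written.
cat(readLines(cff_path), sep = "\n")
```

If the `cffr` package is installed, passing the file path to `cffr::cff_validate()` is one way to check the result against the schema.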

## Options for archiving {.smaller}

| Service | Versioned DOIs? | Free? | GitHub integration? | Notes |
|--------------|--------------|--------------|--------------|----------------|
| Zenodo | Yes | Yes | Yes | Backed by CERN, built with code and data in mind |
| Dryad | Yes | No, but some publishers cover cost | No | Intended for data, not code. Partners with Zenodo |
| Figshare | Yes | Yes | Yes | Can't choose your license |
| UA ReDATA | Yes | Yes (for UA researchers) | No | University of Arizona Libraries |
| Service | Versioned DOIs? | Free? | GitHub integration? | Notes |
|---------------|---------------|---------------|---------------|---------------|
| Zenodo | Yes | Yes | Yes | Backed by CERN, built with code and data in mind |
| Dryad | Yes | No, but some publishers cover cost | No | Intended for data, not code. Partners with Zenodo |
| Figshare | Yes | Yes | Yes | Can't choose your license |
| UA ReDATA | Yes | Yes (for UA researchers) | No | University of Arizona Libraries |

::: aside
More detailed comparisons [here](https://www.agu.org/-/media/Files/Publications/Generalist-Data-Repository-Grid.pdf)
:::

# Zenodo Archiving Demo
## Zenodo Archiving Demo

1. Log in to [sandbox.zenodo.org](https://sandbox.zenodo.org) using GitHub[^1]
2. In drop-down menu with your username, select "GitHub"
3. Find your repo in the list and flip the switch next to it
4. Go to your repo on GitHub and make a release
5. On sandbox.zenodo.org, get markdown to add a badge to README.md

::: notes
Demo archiving a repo with sandbox.zenodo.org and have everyone follow along with the repo they've been using for notes.
:::
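
The badge in step 5 is a small piece of markdown. The sketch below shows its general shape, assuming the commonly seen `zenodo.org/badge/DOI/<doi>.svg` image URL. The helper name and the DOI are hypothetical, and in practice it is safer to copy the markdown that Zenodo generates for your record.

```r
# Hypothetical helper: builds README badge markdown from a DOI string.
# The URL pattern is an assumption; prefer the markdown Zenodo gives you.
zenodo_badge <- function(doi) {
  sprintf(
    "[![DOI](https://zenodo.org/badge/DOI/%s.svg)](https://doi.org/%s)",
    doi, doi
  )
}

# Placeholder DOI for illustration only.
badge <- zenodo_badge("10.5281/zenodo.1234567")
cat(badge, "\n")
```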

## When to archive?

No hard rules on this, but my preference:

1. Just before submitting a manuscript: release v 0.1.0
2. After responding to reviewers or re-submitting: increment "minor" version, e.g. v 0.2.0
3. After acceptance: release v 1.0.0
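
The release sequence above follows a "major.minor.patch" numbering. As a small illustration of how each step changes the number, here is a hypothetical base R helper (not part of any package) that increments one part and resets the parts after it.

```r
# Hypothetical helper: bump one part of a "major.minor.patch" version string.
bump_version <- function(version, part = c("minor", "major", "patch")) {
  part <- match.arg(part)
  v <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(v) == 3L, !anyNA(v))
  if (part == "major") v <- c(v[1] + 1L, 0L, 0L)       # e.g. after acceptance
  if (part == "minor") v <- c(v[1], v[2] + 1L, 0L)     # e.g. after revisions
  if (part == "patch") v <- c(v[1], v[2], v[3] + 1L)   # small fixes
  paste(v, collapse = ".")
}

bump_version("0.1.0", "minor")  # "0.2.0"
bump_version("0.2.0", "major")  # "1.0.0"
```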

## Reproducible computational environments

@@ -113,15 +131,20 @@ To deactivate `renv`, run `renv::deactivate()`.
To also remove all the files it created, run `renv::deactivate(clean = TRUE)` instead.
:::

::: notes
Briefly discuss the contents of `renv.lock`, `.Rprofile`, and `renv/`.
Point out `renv/.gitignore` — trust it about which files should go on GitHub.
:::

## Limitations of `renv`

- Only tracks R packages [^1]
- Only tracks R packages [^2]

- Can't reproduce operating system or system libraries

- Sometimes quite annoying to use (but it's getting better!)

## Reproducible Everything with Docker
## *More* Reproducibility with Docker

Docker containers...

@@ -135,7 +158,7 @@ Docker containers...
- Can be downloaded and run from the command line
:::

## Making a Docker Container
## Making a Docker Container <!--# skipped for 2024 --> {visibility="hidden"}

A **Dockerfile** holds instructions on what to install and what code to run.
Actually creating a Docker container is beyond the scope of this workshop, but you can [learn how](https://www.r-bloggers.com/2019/02/running-your-r-script-in-docker/)!
@@ -159,7 +182,7 @@ RUN Rscript /02_code/install_packages.R

## Hold up, what *is* reproducibility again?

There is a reproducibility **tradeoff** for using `renv` and Docker---robust computational reproducibility **but** harder for novices to reproduce
There is a reproducibility **trade-off** for using `renv` and Docker---robust computational reproducibility **but** harder for novices to reproduce

. . .

@@ -169,17 +192,32 @@ If you use these tools, provide:
- Where to go for help troubleshooting
- Ways to access your code *without* extra layers

## Drop-in Session & Showcase
## Resources

- [`renv` website](https://rstudio.github.io/renv/articles/renv.html)

- CITATION.cff [info](https://citation-file-format.github.io/) and [creator tool](https://citation-file-format.github.io/cff-initializer-javascript/#/)

- [Zenodo-GitHub integration](https://zenodo.org/account/settings/github/)

- Next week (10/10, 10/12): No workshops!
- [Reproducible computational environments with Docker](https://reproducibility.rocks/materials/day4/01-docker/)

- Tuesday 10/17: Drop-in help session
## Next week

- Tuesday 10/24: Reproducibility show & tell
- Tuesday 10/8: **Drop-in co-working session**

- Come and work on your reproducibility colloquium project/presentation

- Thursday 10/10: **Reproducibility Colloquium!**

- Invite your lab-mates, PI, friends!

## References

::: refs
:::

[^1]: `renv` can also be set up to track [Python dependencies](https://rstudio.github.io/renv/articles/python.html)
[^1]: sandbox.zenodo.org is just for practice.
Use zenodo.org when you're ready to get a *real* DOI!

[^2]: `renv` can also be set up to track [Python dependencies](https://rstudio.github.io/renv/articles/python.html)
51 changes: 51 additions & 0 deletions robots.txt
@@ -0,0 +1,51 @@
# sources:
# https://www.cyberciti.biz/web-developer/block-openai-bard-bing-ai-crawler-bots-using-robots-txt-file/
# https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/

# Data from Common Crawl is used to train ChatGPT, Bard, etc.
User-agent: CCBot
Disallow: /

# Stops ChatGPT users from instructing ChatGPT to access our site
User-agent: ChatGPT-User
Disallow: /

# Don't add any content to the GPT model
User-agent: GPTBot
Disallow: /

# Blocks Bard and VertexAI. Does not impact search indexing.
User-agent: Google-Extended
Disallow: /

# webz.io. They sell data for training LLMs
User-agent: Omgilibot
Disallow: /

User-agent: Omgili
Disallow: /

# Specific to AI. Won't prevent previews from showing up correctly on Facebook posts
User-agent: FacebookBot
Disallow: /

# Anthropic AI (Claude)
User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: ClaudeBot
Disallow: /

# ByteDance's bot for gathering LLM training data
User-agent: Bytespider
Disallow: /

User-agent: ImagesiftBot
Disallow: /

# Takes content and re-writes it using genAI
User-agent: PerplexityBot
Disallow: /
2 changes: 1 addition & 1 deletion schedule.csv
@@ -8,6 +8,6 @@ Lesson,Date,Theme,Topic,Links,Notes
7,2024-09-24,Tidy & Wrangle,Data manipulation & coding best practices,[slides](lessons/7-data-manipulation/slides.html),data carpentry R ecology episodes 2 & 3
8,2024-09-26,Repeat & Reproduce,Intermediate R programming I,[slides](lessons/8-intermediate-r-1/slides.html),functions
9,2024-10-01,Repeat & Reproduce,Intermediate R programming II,[slides](lessons/9-intermediate-r-2/slides.html),iteration
10,2024-10-03,Document & Publish,Getting credit for your hard work,,"renv, LICENSE, CITATION.cff, Zenodo, GitHub releases"
10,2024-10-03,Document & Publish,Getting credit for your hard work,[slides](lessons/10-get-credit/slides.html),"renv, LICENSE, CITATION.cff, Zenodo, GitHub releases"
11,2024-10-08,Review,Drop-in co-working,,
12,2024-10-10,Reproducibility Colloquium,An opportunity for you to show off what you've learned,[guidelines](colloquium.html),
