Update 04152024
sungsy12345 committed Apr 15, 2024
1 parent 9f43357 commit 98b5099
Showing 2 changed files with 30 additions and 54 deletions.
17 changes: 0 additions & 17 deletions 01-data-filtering.md
@@ -13,22 +13,17 @@ Copy cleaned data sets (tidy df) from the `SSRP_cleanining` repo into this repo

<div class="knitr-options" data-fig-width="576" data-fig-height="460"></div>


```r
tidy_dfs <- list.files(paste0(clean_path, "/processed/"))
file.copy(paste0(clean_path, "/processed/", tidy_dfs),
          paste0("./processed/", tidy_dfs),
          overwrite = TRUE)
```



```
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
```
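The copy step above assumes `./processed/` already exists. `file.copy()` reports failure through its logical return value (the row of `TRUE`s printed above) rather than by raising an error, so a missing folder would not stop the script. A minimal sketch, assuming one wants the script to create the folder and stop on any failed copy, wraps the step in a hypothetical helper `copy_processed()`. The temporary folders exist only to make the example self-contained.

```r
# Hypothetical helper (not part of the original repo): copy every file from
# <clean_path>/processed into `dest`, creating `dest` first and failing loudly.
copy_processed <- function(clean_path, dest = "./processed") {
  src_dir <- file.path(clean_path, "processed")
  tidy_dfs <- list.files(src_dir)
  # file.copy() returns FALSE (not an error) when the target folder is missing,
  # so create it up front.
  dir.create(dest, showWarnings = FALSE, recursive = TRUE)
  copied <- file.copy(file.path(src_dir, tidy_dfs),
                      file.path(dest, tidy_dfs),
                      overwrite = TRUE)
  if (!all(copied)) {
    stop("Failed to copy: ", paste(tidy_dfs[!copied], collapse = ", "))
  }
  invisible(tidy_dfs)
}

# Usage with throwaway folders so the example runs anywhere.
clean_path <- file.path(tempdir(), "SSRP_demo")
dir.create(file.path(clean_path, "processed"), recursive = TRUE, showWarnings = FALSE)
writeLines("a,b\n1,2", file.path(clean_path, "processed", "tidy_di_df.csv"))

dest <- file.path(tempdir(), "local_processed")
copied_files <- copy_processed(clean_path, dest)
print(copied_files)
```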



### First: basic cleanup
- load data
- rename the title of a reproduction
@@ -38,7 +33,6 @@ file.copy(paste0(clean_path,"/processed/",tidy_dfs),

<div class="knitr-options" data-fig-width="576" data-fig-height="460"></div>


```r
dis_df <- read_csv("./processed/tidy_di_df.csv") %>%
  mutate(
@@ -54,13 +48,10 @@ dis_df <- read_csv("./processed/tidy_di_df.csv") %>%
  filter(!is.na(repro_score))
```
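The cleanup above renames a title with `mutate(ifelse(...))` and then drops rows with `filter(!is.na(repro_score))`. The exact title condition is elided in this diff, so the sketch below uses made-up titles, IDs, and scores. It mirrors the same two steps in base R on a toy data frame, assuming only the `reproduction_id`, `paper_title`, and `repro_score` columns visible in the code.

```r
# Toy display-item data; all values are invented for illustration.
dis_toy <- data.frame(
  reproduction_id = c("r1", "r1", "r2", "r3"),
  paper_title     = c("Old Title", "Old Title", "Another Paper", "Third Paper"),
  repro_score     = c(7, NA, 5, NA),
  stringsAsFactors = FALSE
)

# Same idea as mutate(paper_title = ifelse(<condition>, <new title>, paper_title)):
# replace one specific title and leave every other title unchanged.
dis_toy$paper_title <- ifelse(dis_toy$paper_title == "Old Title",
                              "Corrected Title",
                              dis_toy$paper_title)

# Same idea as filter(!is.na(repro_score)): keep only scored display items.
dis_toy <- dis_toy[!is.na(dis_toy$repro_score), ]
print(dis_toy)
```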



- Same, but for claims-level data

<div class="knitr-options" data-fig-width="576" data-fig-height="460"></div>


```r
claims_df <- read_csv("./processed/tidy_claim_df.csv") %>%
  mutate(paper_title = ifelse(
@@ -75,13 +66,10 @@ claims_df <- read_csv("./processed/tidy_claim_df.csv") %>%
) # 98 reproductions (no abandoned)
```
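The comment `# 98 reproductions (no abandoned)` records an expected count, but a comment cannot detect drift when the processed files are re-copied. A minimal sketch, assuming the count refers to distinct `reproduction_id` values, turns such a comment into an assertion. The toy data and `expected_n` are hypothetical. For the real `claims_df` the expected value would be 98, and in dplyr `n_distinct(claims_df$reproduction_id)` computes the same count.

```r
# Toy claims-level data; several claims can belong to one reproduction.
claims_toy <- data.frame(
  reproduction_id = c("r1", "r1", "r2", "r3", "r3"),
  claim_id        = c("c1", "c2", "c1", "c1", "c2"),
  stringsAsFactors = FALSE
)

# Count reproductions, not rows.
n_repro <- length(unique(claims_toy$reproduction_id))

# Hypothetical expected value; stop the script if the data no longer match it.
expected_n <- 3
stopifnot(n_repro == expected_n)
n_repro
```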



- Same for abandoned papers

<div class="knitr-options" data-fig-width="576" data-fig-height="460"></div>


```r
abandoned_df <- read_csv("./processed/tidy_abandoned_df.csv") %>%
  mutate(paper_title = ifelse(
@@ -96,15 +84,12 @@ claims_df <- read_csv("./processed/tidy_claim_df.csv") %>%
) # 104 reproductions total
```
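The same `paper_title` fix is repeated for the display-item, claims, and abandoned data sets. One option, sketched here under the assumption that every fix is an exact title-to-title replacement, is to keep the fixes in one named vector and apply them through a single helper. `title_fixes` and `fix_paper_title()` are hypothetical names, and the real replacement titles are not shown in this diff. Because the helper takes the data frame as its first argument, it could also be dropped into each pipeline as `%>% fix_paper_title()`.

```r
# Hypothetical lookup table: names are old titles, values are corrected titles.
title_fixes <- c("Old Title" = "Corrected Title")

fix_paper_title <- function(df, fixes = title_fixes) {
  # Titles without an entry in `fixes` look up to NA and are kept unchanged.
  replacement <- unname(fixes[df$paper_title])
  df$paper_title <- ifelse(is.na(replacement), df$paper_title, replacement)
  df
}

# Apply the same fix to all three (toy) data sets at once.
dfs <- list(
  dis       = data.frame(paper_title = c("Old Title", "Paper B"), stringsAsFactors = FALSE),
  claims    = data.frame(paper_title = c("Paper C", "Old Title"), stringsAsFactors = FALSE),
  abandoned = data.frame(paper_title = "Old Title", stringsAsFactors = FALSE)
)
dfs <- lapply(dfs, fix_paper_title)
str(dfs)
```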



# Make sure there's at least one scored Display Item

The rationale for this filter is that a paper should be counted only if it has at least one reproducibility score associated with it.

<div class="knitr-options" data-fig-width="576" data-fig-height="460"></div>


```r
has_score <- dis_df %>%
  select(reproduction_id) %>%
@@ -116,8 +101,6 @@ claims_df <- claims_df %>%
```
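The filter above collects the reproductions that have a scored display item and keeps only the claims belonging to them. Part of the code is elided in this diff, so the sketch below is not the original implementation. It shows the same logic in base R on toy data, assuming only the `reproduction_id` and `repro_score` columns seen earlier. In dplyr, the final step corresponds to a semi-join on `reproduction_id`.

```r
# Toy data; values are invented for illustration.
dis_toy <- data.frame(
  reproduction_id = c("r1", "r1", "r2", "r3"),
  repro_score     = c(7, NA, NA, 4),
  stringsAsFactors = FALSE
)
claims_toy <- data.frame(
  reproduction_id = c("r1", "r2", "r3", "r4"),
  claim_id        = c("c1", "c1", "c1", "c1"),
  stringsAsFactors = FALSE
)

# Reproductions with at least one non-missing display-item score.
has_score <- unique(dis_toy$reproduction_id[!is.na(dis_toy$repro_score)])

# Keep only claims whose reproduction appears in `has_score`
# (a semi-join: rows of claims_toy are kept, no columns are added).
claims_toy <- claims_toy[claims_toy$reproduction_id %in% has_score, ]
print(has_score)
print(claims_toy)
```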




# Future Filters

This section explains how to add a filter in the future. Each additional filter should have a title (e.g. "Make sure there's at least one estimate associated with each claim") followed by a description of why the filter is needed.
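For instance, the example filter named above ("Make sure there's at least one estimate associated with each claim") might be sketched as follows. This assumes a hypothetical estimates table keyed by `reproduction_id` and `claim_id` with an `estimate` column. None of these tables or columns are confirmed by this file. The composite key guards against claim IDs that repeat across reproductions.

```r
# Hypothetical claims and estimates tables; all names and values are assumptions.
claims_toy <- data.frame(
  reproduction_id = c("r1", "r1", "r2"),
  claim_id        = c("c1", "c2", "c1"),
  stringsAsFactors = FALSE
)
estimates_toy <- data.frame(
  reproduction_id = c("r1", "r1", "r2"),
  claim_id        = c("c1", "c2", "c1"),
  estimate        = c(0.42, NA, 1.3),
  stringsAsFactors = FALSE
)

# Composite key so "c1" in r1 is not confused with "c1" in r2.
key <- function(df) paste(df$reproduction_id, df$claim_id, sep = "::")

# Claims with at least one non-missing estimate.
has_estimate <- unique(key(estimates_toy)[!is.na(estimates_toy$estimate)])

# Keep only those claims.
claims_toy <- claims_toy[key(claims_toy) %in% has_estimate, ]
print(claims_toy)
```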