Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First draft for text description of the data #69

Merged
merged 16 commits into from
May 16, 2024

Conversation

Aastedet
Copy link
Collaborator

@Aastedet Aastedet commented Mar 22, 2024

Just want some feedback on the current format before I fill out the remaining sections describing the raw data with similar content.

I suspect @lwjohnst86 will want to automate the data source descriptions at some point. Then we might have to add a column with the english register names to the variable description csv.

@Aastedet
Copy link
Collaborator Author

Relates to #35

…uture improvements.

Added description of raw data. National Patient Register only for now.
Moved changes section to #78
@Aastedet Aastedet changed the title First draft for text description of the algorithm First draft for text description of the data Apr 26, 2024
Anders Aasted Isaksen added 2 commits April 26, 2024 14:12
@Aastedet Aastedet marked this pull request as ready for review April 26, 2024 13:03
Copy link
Contributor

@signekb signekb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I have added some minor suggestions/questions.
Do you expect to fill in the todo items in this PR as well, or is your plan to create new PRs for those? :)

Comment on lines 25 to 26
in future revisions. Refer to the other vignettes for background
information and a more general description of the algorithm.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add links to specific vignettes with relevant information here?


## Contents

This document describes the structure of the data components processed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have used the word "data sources". Do we prefer "data components"? Or "data objects" as in the title? I think it makes sense to align this, so the same words are used for the same things throughout the documentation :)

vignettes/algorithm_data.Rmd Outdated Show resolved Hide resolved
vignettes/algorithm_data.Rmd Outdated Show resolved Hide resolved
Comment on lines 44 to 47
In a future revision, the algorithm can also utilise the Danish Medical
Birth Register to extend the period of time of valid inclusions further
back in time compared to what is possible using obstetric codes from the
National Patient Register.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to #78 ?

Comment on lines 53 to 54
assumes that raw data is stored/structured in the most common format for
raw data provided on Statistics Denmark's servers (from our experience).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add which specific formats you are talking about here? :)

vignettes/algorithm_data.Rmd Outdated Show resolved Hide resolved
`d_inddto`/`dato_start`.

- Named `lpr_adm` in the LPR2-formatted data prior to 2019, and
`kontakter` in contact-based LPR3-formatted data from 2019
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to use the English abbreviation DNPR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also add what the abbreviation comes from? I.e., Landspatientregisteret?
Or do we assume readers know this?

@lwjohnst86
Copy link
Member

@Aastedet and @signekb I've updated this vignette so the descriptions and fake data as tables are included automatically from the sources in data/

Comment on lines +9 to +10
rlang::check_installed("glue")
rlang::check_installed("knitr")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whenever you use a function from a package that you set as "suggests" as a dependency, you need to include a check function to inform the user to install these packages if they are not installed. So that's what these do (when you use use_package("packagename", "suggests"), it tells you exactly what to do)


variable_description |>
dplyr::select(
.data$register_name,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I mentioned this already, but the .data$ is used to masked the variable so that CRAN doesn't warn of "undeclared variables". Since we declare this placeholder variable ".data" already.

the `dw_ek_kontakt` variable (LPR3 data).

```{r}
for (register in osdc:::get_register_abbrev()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I almost always suggest not to use for loops, this is one of those cases that you need to, since these functions are used to create Markdown text, and it doesn't really work in a "functional" way.

register,
caption = glue::glue("Variables and their descriptions within the `{register}` register.")
) |>
print()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to print because only the last thing in a for loop is output, but we want all these things to output.

```{r, include = FALSE}
knitr::opts_chunk$set(
echo = FALSE,
results = "asis",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tells knitr to treat all the output as plain text rather than as code output text

Copy link
Collaborator Author

@Aastedet Aastedet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AWESOME!

sources:

```{r, results='asis'}
osdc:::registers_as_md_table("Danish registers used in the OSDC algorithm.")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The triple-colons is because this is an internal function/object or what is going on here?
Also, a note to my future self: you need to build the package in order to access internal objects

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haah yes! Or use Ctrl-Shift-L to load the package. And yea, ::: accesses all internal objects in a package, neat trick!

@lwjohnst86 lwjohnst86 merged commit 7e2cf66 into main May 16, 2024
2 of 3 checks passed
@lwjohnst86 lwjohnst86 deleted the general-logic-description branch May 16, 2024 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

3 participants