An R package to help researchers explore publicly available metadata from health datasets, categorising variables into research domains

aim-rsf/browseMetadata

browseMetadata

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Table of Contents

  1. What is the browseMetadata package?
  2. Getting started with browseMetadata
  3. Using a custom metadata input
  4. Using a custom domain list input
  5. Using a custom lookup table input
  6. Tips and future steps
  7. License
  8. Citation
  9. Contributing
  10. Acknowledgements

What is the browseMetadata package?

The browseMetadata package allows researchers to explore publicly available metadata from the Health Data Research Gateway and the connected Metadata Catalogue. This tool helps researchers plan projects by interacting with metadata prior to gaining full access to health datasets. Learn more about health metadata here.

At the early stages of a project, researchers can use this tool to browse datasets and categorise variables.

Browse metadata

What datasets are available? Which datasets fit my research?

The tool summarises datasets and their tables, and displays how many variables within each table have descriptions.

Figure: example bar plot showing the number of variables for each table, alongside counts of whether variables have missing descriptions.

Map metadata

Which variables align with my research domains?
(e.g. socioeconomic, childhood adverse events, diagnoses, culture and community)

After browsing, users can categorise each variable into predefined research domains. To speed up this manual process, the function automatically categorises frequently used variables (e.g. ID, Sex, Age). The function also accounts for variables that appear across multiple tables and allows users to copy their categorisations to ensure consistency. The output files can be used in later analyses to filter and visualise variables by category.

Getting started with browseMetadata

Installation and set-up

Run in the R console:

install.packages("devtools")
devtools::install_github("aim-rsf/browseMetadata")

Load the library:

library(browseMetadata)

Set your working directory to an empty folder:

setwd("/Users/your-username/test-browseMetadata")

Demo (using the RStudio IDE)

For a longer, more detailed demo, see the Getting Started page on the package website.

There are four main functions you can interact with: browseMetadata(), mapMetadata(), mapMetadata_compare_outputs(), and mapMetadata_convert_outputs(). For more information on any function, type ?function_name. For example: ?browseMetadata.

browseMetadata()

This function is easy to run and doesn't require user interaction. Run it in demo mode using the demo JSON file located in the inst/inputs directory:

browseMetadata()

Upon success, you should see:

ℹ Three outputs have been saved to your output directory.
ℹ Open the two HTML files in your browser for full-screen viewing.

The output files are saved to your working directory. You can change the save location by adjusting the output_dir argument. Examples of outputs are available in inst/outputs.
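
As a minimal sketch of redirecting the outputs, the snippet below relies only on what is stated above: that browseMetadata() accepts an output_dir argument. The folder name is a placeholder, and the call is guarded so the snippet also runs where the package is not installed.

```r
# Hypothetical output folder; any writable path works.
out_dir <- file.path(tempdir(), "browseMetadata-outputs")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Run the demo only if the package is installed.
if (requireNamespace("browseMetadata", quietly = TRUE)) {
  browseMetadata::browseMetadata(output_dir = out_dir)
  # List whatever files were written to the chosen folder.
  print(list.files(out_dir))
}
```

Keeping the outputs in a dedicated folder makes them easy to find when you later run mapMetadata().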

mapMetadata()

Use the outputs from browseMetadata() as a reference when running mapMetadata().

To run the mapping function in demo mode, use:

mapMetadata()

In demo mode, the function processes only the first 20 variables from selected tables. Follow the on-screen instructions and categorise variables into research domains, using the Plot tab as your reference. The demo uses a simplified domain list for ease of use; in a real scenario, you can define more specific domains.

Upon completion, your categorisations, session log, and a summary plot will be saved in your output directory.

Using a custom metadata input (recommended)

You can run mapMetadata() and browseMetadata() using a custom JSON file instead of the demo input:

new_json_file <- "path/your_new_json.json"
demo_domains_file <- system.file("inputs/domain_list_demo.csv", package = "browseMetadata")

browseMetadata(json_file = new_json_file)
mapMetadata(json_file = new_json_file, domain_file = demo_domains_file)

Currently, the recommended way to retrieve these metadata JSON files is to download them from the Metadata Catalogue. Navigate to the Data Model page of interest and use the drop-down button to select the JSON format for download.

Using a custom domain list input (recommended)

You can replace the default demo domains with research-specific domains. Codes 0, 1, 2 and 3 are automatically added to the start of any domain list you supply, so do not include them in your domain file.
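
One way to prepare a custom domain file is to copy the layout of the demo file shipped with the package. The sketch below assumes the file is a plain one-column CSV with one domain per row and no header. That layout is an assumption, so compare it against the demo file before relying on it. The domain names and the file name are placeholders.

```r
# Inspect the demo domain list (if the package is installed) to confirm the layout.
demo_domains_file <- system.file("inputs/domain_list_demo.csv", package = "browseMetadata")
if (nzchar(demo_domains_file)) {
  print(readLines(demo_domains_file))
}

# Placeholder research domains. Codes 0 to 3 are added automatically,
# so they are deliberately left out here.
my_domains <- c("Socioeconomic", "Childhood adverse events",
                "Diagnoses", "Culture and community")

# ASSUMPTION: one domain per row, no header row.
my_domains_file <- file.path(tempdir(), "my_domain_list.csv")
write.table(my_domains, my_domains_file,
            row.names = FALSE, col.names = FALSE, sep = ",")

# The file can then be passed to the mapping function, for example:
# mapMetadata(json_file = new_json_file, domain_file = my_domains_file)
```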

Using a custom lookup table input (advanced)

The lookup table governs the automatic categorisations. If you modify the default lookup file, ensure that all domain codes in the lookup file are also included in your domain file for valid outputs.
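
The consistency requirement can be checked before a run. This sketch uses made-up code vectors rather than reading the real files, because the column layout of the lookup and domain files is not described here. It also assumes, for illustration only, that custom domains are numbered consecutively after the automatic codes 0 to 3.

```r
# Hypothetical: four custom domains, assumed to receive codes 4..7
# after the automatically added codes 0..3.
n_custom_domains <- 4
valid_codes <- c(0:3, 3 + seq_len(n_custom_domains))

# Hypothetical domain codes extracted from an edited lookup file.
lookup_codes <- c(2, 3, 3, 5, 9)

# Any code used by the lookup but absent from the domain list is a problem.
missing_codes <- setdiff(unique(lookup_codes), valid_codes)
if (length(missing_codes) > 0) {
  message("Lookup codes missing from the domain list: ",
          paste(missing_codes, collapse = ", "))
}
```

In this made-up example, code 9 is flagged because it has no matching domain.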

Tips and future steps

  • You can process a subset of variables in one session and complete the rest later.
  • If you're processing multiple tables, save all outputs in the same directory to enable table copying. This feature will speed up categorisation and ensure consistency.
  • You can compare categorisations across researchers using the mapMetadata_compare_outputs() function.
  • Use the output file from the mapMetadata() function as input for subsequent analysis to filter and visualise variables by research domain.

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
For more information, refer to GNU General Public License.

Citation

To cite browseMetadata in publications:

Stickland R (2024). browseMetadata: Browse and categorise metadata for datasets. R package version 1.2.2.

A BibTeX entry for LaTeX users:

  @Manual{,
    title = {browseMetadata: Browse and categorise health metadata},
    author = {Rachael Stickland},
    year = {2024},
    note = {R package version 1.2.2},
    doi = {10.5281/zenodo.10581499},
  }

Contributing

We welcome contributions to browseMetadata. Please read our Contribution Guidelines for details on how to contribute.

  • Report Issues: Found a bug? Have a feature request? Report it on GitHub Issues.
  • Submit Pull Requests: Follow our Contribution Guidelines for pull requests.
  • Feedback: Share your thoughts by opening an issue.

Contributors ✨

Thanks go to these wonderful people (emoji key):

  • Rachael Stickland: 🖋 📖 🚧 🤔 📆 👀
  • Batool Almarzouq: 📓 👀 🤔 📆
  • Mahwish Mohammad: 📓 👀 🤔
  • Daniel Delbarre: 🤔 📓
  • NidaZiaS: 🤔

This project follows the all-contributors specification. Contributions of any kind are welcome!

Acknowledgements ✨

Thanks to the MELD-B research project, the SAIL Databank team, and the Health Data Research Innovation Gateway for ideas, feedback, and hosting open metadata.

This project is funded by the NIHR Artificial Intelligence for Multiple Long-Term Conditions (AIM) programme (NIHR202647). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.
