- What is the browseMetadata package?
- Getting started with browseMetadata
- Using a custom metadata input
- Using a custom domain list input
- Using a custom lookup table input
- Tips and future steps
- License
- Citation
- Contributing
- Acknowledgements
The browseMetadata package allows researchers to explore publicly available metadata from the Health Data Research Gateway and the connected Metadata Catalogue. This tool helps researchers plan projects by interacting with metadata prior to gaining full access to health datasets. Learn more about health metadata here.
At the early stages of a project, researchers can use this tool to browse datasets and categorise variables.
What datasets are available? Which datasets fit my research?
The tool summarises datasets and their tables, and displays how many variables within each table have descriptions.
Which variables align with my research domains?
(e.g. socioeconomic, childhood adverse events, diagnoses, culture and community)
After browsing, users can categorise each variable into predefined research domains. To speed up this manual process, the function automatically categorises frequently used variables (e.g. ID, Sex, Age). The function also accounts for variables that appear across multiple tables and allows users to copy their categorisations to ensure consistency. The output files can be used in later analyses to filter and visualise variables by category.
Run in the R console:
install.packages("devtools")
devtools::install_github("aim-rsf/browseMetadata")
Load the library:
library(browseMetadata)
Set your working directory to an empty folder:
setwd("/Users/your-username/test-browseMetadata")
For a longer, more detailed demo, see the Getting Started page on the package website.
There are four main functions you can interact with: browseMetadata(), mapMetadata(), mapMetadata_compare_outputs(), and mapMetadata_convert_outputs(). For more information on any function, type ?function_name. For example: ?browseMetadata.
This function is easy to run and doesn't require user interaction. Run it in demo mode using the demo JSON file located in the inst/inputs directory:
browseMetadata()
Upon success, you should see:
ℹ Three outputs have been saved to your output directory.
ℹ Open the two HTML files in your browser for full-screen viewing.
The output files are saved to your working directory. You can change the save location by adjusting the output_dir argument. Examples of outputs are available in inst/outputs.
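As a minimal sketch of changing the save location, the snippet below creates a dedicated folder and passes it to output_dir. The folder name browse-outputs is a placeholder, and the call is skipped when the package is not installed.

```r
# Placeholder output folder inside the current working directory.
out_dir <- file.path(getwd(), "browse-outputs")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Run the demo only when the package is available.
if (requireNamespace("browseMetadata", quietly = TRUE)) {
  browseMetadata::browseMetadata(output_dir = out_dir)
}
```

Keeping outputs in their own folder makes it easier to find the two HTML files afterwards.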
Use the outputs from browseMetadata() as a reference when running mapMetadata().
To run the mapping function in demo mode, use:
mapMetadata()
In demo mode, the function processes only the first 20 variables from selected tables. Follow the on-screen instructions, and categorise variables into research domains, using the Plot tab as your reference. The demo will simplify domains for ease of use; in a real scenario, you can define more specific domains.
Upon completion, your categorisations, session log, and a summary plot will be saved in your output directory.
You can run mapMetadata() and browseMetadata() using a custom JSON file instead of the demo input:
new_json_file <- "path/your_new_json.json"
demo_domains_file <- system.file("inputs/domain_list_demo.csv", package = "browseMetadata")
browseMetadata(json_file = new_json_file)
mapMetadata(json_file = new_json_file, domain_file = demo_domains_file)
Currently, the recommended way of retrieving these metadata JSON files is to download them from the Metadata Catalogue. Navigate to the Data Model page of interest and use the drop-down button to select the JSON format to download.
You can replace the default demo domains with research-specific domains. Remember that codes 0, 1, 2 and 3 are automatically added to the start of any domain list you supply, so do not include them in your domain file.
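A minimal sketch of preparing a custom domain file follows. It assumes the file is a one-column CSV with one domain per row and no header; that layout is an assumption, so compare it against inputs/domain_list_demo.csv before relying on it. The domain names and the my_domains.csv file name are placeholders.

```r
# Placeholder research domains. Do not add entries for codes 0 to 3,
# because those are added automatically.
my_domains <- data.frame(
  domain = c("Socioeconomic", "Childhood adverse events", "Diagnoses")
)

# Assumed layout: one domain per row, no header (check the demo file).
domain_path <- file.path(tempdir(), "my_domains.csv")
write.table(my_domains, domain_path, sep = ",",
            row.names = FALSE, col.names = FALSE, quote = TRUE)

# Read the file back to confirm what was written.
written <- read.csv(domain_path, header = FALSE)
print(written)

# The file can then be passed in place of the demo domains, for example:
# mapMetadata(json_file = new_json_file, domain_file = domain_path)
```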
The lookup table governs the automatic categorisations. If you modify the default lookup file, ensure that all domain codes in the lookup file are also included in your domain file for valid outputs.
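One way to check this before running mapMetadata() is to compare the codes used in the two files. The sketch below uses small inline data frames in place of the real files. The column names (variable, domain_code), the example values, and the rule that custom domains are numbered from 4 upward in file order are all assumptions for illustration, not the package's documented format.

```r
# Stand-in for a custom domain file with three custom domains.
custom_domains <- c("Socioeconomic", "Childhood adverse events", "Diagnoses")

# Assumption: codes 0 to 3 are reserved and custom domains follow from 4.
valid_codes <- c(0:3, seq_along(custom_domains) + 3)

# Stand-in for a lookup table; column names and values are made up.
lookup <- data.frame(
  variable    = c("ID", "Sex", "Age", "Income"),
  domain_code = c(2, 3, 3, 7)
)

# Codes used by the lookup table but absent from the domain list.
missing_codes <- setdiff(lookup$domain_code, valid_codes)

if (length(missing_codes) > 0) {
  message("Lookup codes with no matching domain: ",
          paste(missing_codes, collapse = ", "))
}
```

Here the made-up lookup table refers to code 7, which the three-domain list does not cover, so the check flags it.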
- You can process a subset of variables in one session and complete the rest later.
- If you're processing multiple tables, save all outputs in the same directory to enable table copying. This feature will speed up categorisation and ensure consistency.
- You can compare categorisations across researchers using the mapMetadata_compare_outputs() function.
- Use the output file from the mapMetadata() function as input for subsequent analysis to filter and visualise variables by research domain.
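The last tip can be sketched in base R. The data frame below stands in for a mapMetadata() output file; its column names (table_name, variable, domain_code) and values are hypothetical, so adjust them to match the file you actually get.

```r
# Stand-in for a mapMetadata() output; real column names may differ.
mapped <- data.frame(
  table_name  = c("table_a", "table_a", "table_b"),
  variable    = c("event_code", "event_date", "sex"),
  domain_code = c(6, 6, 3)
)

# Keep only the variables categorised under one domain code.
selected <- subset(mapped, domain_code == 6)
print(selected)

# Count variables per domain code, a starting point for a summary plot.
domain_counts <- table(mapped$domain_code)
print(domain_counts)
```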
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
For more information, refer to GNU General Public License.
To cite browseMetadata in publications:
Stickland R (2024). browseMetadata: Browse and categorise metadata for datasets. R package version 1.2.2.
A BibTeX entry for LaTeX users:
@Manual{,
title = {browseMetadata: Browse and categorise health metadata},
author = {Rachael Stickland},
year = {2024},
note = {R package version 1.2.2},
doi = {https://doi.org/10.5281/zenodo.10581499},
}
We welcome contributions to browseMetadata. Please read our Contribution Guidelines for details on how to contribute.
- Report Issues: Found a bug? Have a feature request? Report it on GitHub Issues.
- Submit Pull Requests: Follow our Contribution Guidelines for pull requests.
- Feedback: Share your thoughts by opening an issue.
Thanks go to these wonderful people (emoji key):
- Rachael Stickland 🖋 📖 🚧 🤔 📆 👀
- Batool Almarzouq 📓 👀 🤔 📆
- Mahwish Mohammad 📓 👀 🤔
- Daniel Delbarre 🤔 📓
- NidaZiaS 🤔
This project follows the all-contributors specification. Contributions of any kind are welcome!
Thanks to the MELD-B research project, the SAIL Databank team, and the Health Data Research Innovation Gateway for ideas, feedback, and hosting open metadata.
This project is funded by the NIHR Artificial Intelligence for Multiple Long-Term Conditions (AIM) programme (NIHR202647). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.