Skip to content

Latest commit

 

History

History
53 lines (48 loc) · 3.48 KB

2024.09.09-newsletter-vignette.md

File metadata and controls

53 lines (48 loc) · 3.48 KB

Network News: CoE activities, project milestones, outputs, or tools.

Network News: CoE activities, project milestones, outputs, or tools.

Only a little over one decade ago, there was a deadly outbreak of carbapenem-resistant K. pneumoniae at the U.S. National Institutes of Health Clinical Center despite early infection control procedures. A study published in the Science Translational Medicine journal used whole genome sequencing to investigate the outbreak’s causes and transmission routes. In 2012, integrating K. pneumoniae isolates genomic data with mined epidemiologic data allowed the research team to retroactively infer transmission routes for the 18 affected patients, 11 of whom died.

Modernized genomic surveillance systems of today like those being developed by the Georgia Tech Research Institute (GTRI), in partnership with the Georgia Department of Public Health (GDPH), are rapidly moving toward near real-time analysis that could save lives. Researchers and engineers at GTRI are investigating how cases within robust antimicrobial resistant (AMR) genomic analysis pipelines can be flagged rapidly using modernized data linkage methods and the automation of manual, human-in-the-loop workflows.

GTRI researchers and software engineers of the Center for Applied Pathogen Epidemiology and Outbreak Response (CAPE) team are developing infrastructure built on Amazon Web Services, which leverages a data lakehouse and stores raw and cleaned data of various types and sources. Using automated data transformation, data outputs are cleaned, different sources are integrated, then ported back to the data lakehouse for human analysis by epidemiologists and further automated analysis as appropriate. In addition to the data lakehouse, the team envisions including a compute environment for the execution of analysis pipelines. The lakehouse and compute environment would provide an end-to-end data-to-analysis pipeline framework. The CAPE infrastructure is being built with discoverability in mind, taking into consideration the varying needs of different public health professionals, and intended to support numerous known use cases as well as those yet to be revealed.

One use case of CAPE’s infrastructure is the Antimicrobial Resistance Log (AR Log), which was developed to integrate various inputs from the data lakehouse for optimization of public health workflows leading to faster decision-making. Some examples of the sources being integrated into the AR Log include data from: the Laboratory Information Management System (LIMS), Georgia’s State Electronic Notifiable Disease Surveillance System (SendSS), and Excel spreadsheets containing non-standard fields, emphasizing the AR Log’s ability to process diverse formats, including raw data, which are translated to a common, query-able format. The AR Log’s Healthcare-Acquired Infections (HAI) analysis pipeline can be further tested and refined using a vast repository of sample data with known AMR genes retrieved from the National Center for Biotechnology Information (NCBI)’s Sequence Read Archive (SRA). Looking ahead, with continued collaboration made possible through the PGCoE, GTRI researchers envision that further innovations such as flagging cluster cases that are genomically similar with spatial-temporal considerations could make vast improvements for the timeliness, accuracy, and effectiveness of public health response in epidemiologic investigations.