Project Goals
The overall scientific goal is to generate high-quality single-cell mRNA sequencing data representing the full expression profiles of all major and minor cell types from 25 organs across eight healthy adult individuals, and to make the data broadly available to the scientific community through the Human Cell Atlas Data Coordination Platform (DCP). The data and analysis will be a first-draft benchmark HCA reference set establishing the composition and variance of cell types within and among normal individuals. This will serve as a benchmark for later comparisons to disease states as well as normal developing and aging tissues.
Our first aim is to create a high-quality two-million-cell gene expression reference data set representing all major and minor human cell types across 25 organs from eight individuals. We will use the Stanford Biobank and Rapid Autopsy program to obtain four female and four male individuals as diverse in age and ethnicity as possible, although we will not have precise control over timing, age, sex, or race. We have previously developed techniques to preserve tissues that are compatible with single-cell transcriptome analysis.12 Data generation capacity will be built assuming samples become available at the rate of one every two months. To serve as a benchmark, the data must provide a comprehensive representation of cell types within tissues, between tissue types, and between organs of different individuals. To this end, the two million profiled cells will be distributed evenly at ~250,000 cells per individual. This will yield ~80,000 total cells for each of the 25 organs, allowing full characterization of cellular composition, including rare cell types down to ~0.01% abundance. Cell preparations from each organ will be surface-stained for tissue-compartment-specific antigens (epithelial, endothelial, stromal, immune, neural/glial) and then FACS- or MACS-sorted prior to cDNA generation to ensure an optimal balance of cells profiled across tissue compartments, as we did for human lung. Additional FACS enrichment markers chosen by tissue experts will be used where necessary to reveal the full cellular heterogeneity in an organ. mRNA profiling will be done by a mix of droplet-based and plate-based cDNA generation. Droplet methods profile far greater numbers of cells, allowing identification of rare cell types and of the overall composition of cell types.
Plate-based methods detect more genes per individual cell and provide full-length mRNA transcript information, allowing one to see lowly expressed genes such as transcription factors and many disease genes, and can reveal common SNPs and alternative splicing. We will employ both methods on all 25 tissues at a typical ratio of 90% droplet to 10% plate-seq analyzed cells; this ratio was chosen to balance cost, effort, and scientific value of the data generated. The sample processing workflow will be modeled on the Tabula Muris and Tabula Microcebus projects. All 25 organs for each individual will be processed in a single day in a fixed order and timeline to minimize individual-to-individual technical differences. Upon receipt, organs will be perfused, dissected, and disseminated to labs with tissue-specific expertise to be dissociated into cell suspensions and stained for compartment-specific surface markers using protocols validated in the pilot experiments described above. Each lab will deliver its stained, viable cell suspension to the Stanford CZ Biohub site to be divided and (1) sorted into pre-made 384- or 96-well cell lysis plates using the Sony SH800 FACS instrument, and (2) processed into cDNA on the 10X Genomics GemCode Single-Cell Instrument. The resulting lysed cells in plates and the 10X cDNA libraries will be frozen and sent for processing in a central automated sequencing pipeline at the CZ Biohub Mission Bay site using the existing infrastructure.
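The cell-number targets in this aim follow from simple arithmetic. The sketch below reproduces them under the assumption that cells are spread evenly across individuals and organs; in practice, compartment sorting and sample availability will shift these numbers. The function name `sampling_plan` and its dictionary keys are illustrative only and are not part of the project's tooling.

```python
def sampling_plan(total_cells=2_000_000, individuals=8, organs=25,
                  droplet_fraction=0.90, rare_abundance=0.0001):
    """Derive per-unit cell counts from the planned totals (even split assumed)."""
    per_organ = total_cells // organs
    droplet = round(per_organ * droplet_fraction)
    return {
        "cells_per_individual": total_cells // individuals,
        "cells_per_organ": per_organ,
        "cells_per_organ_per_individual": total_cells // (individuals * organs),
        "droplet_cells_per_organ": droplet,
        "plate_cells_per_organ": per_organ - droplet,
        # Expected count of a cell type present at `rare_abundance` (0.01%).
        "expected_rare_cells_per_organ": per_organ * rare_abundance,
        # Chance of capturing at least one such cell, assuming independent sampling.
        "p_at_least_one_rare_cell": 1 - (1 - rare_abundance) ** per_organ,
    }


plan = sampling_plan()
for name, value in plan.items():
    print(f"{name}: {value}")
```

With the defaults this gives 250,000 cells per individual and 80,000 per organ (10,000 per organ per individual), split into 72,000 droplet and 8,000 plate cells per organ, and an expected eight or so cells per organ for a type at 0.01% abundance.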
Our second aim is to rapidly map sequence reads to genes, cluster cells with common expression patterns, and annotate the cell types indicated by these clusters for all 25 tissue types using a consistent, accepted ontology. This will be aided by the comprehensive lists for each human organ of all known cell types and their relative abundance, compiled from the literature by each of the human organ-specific experts, as we did for the human lung. We propose to sequence 25 organs selected from among those sequenced in Tabula Muris and Tabula Microcebus (Figure 3), prioritizing organs of greatest interest to human health. The comparison to mouse and mouse lemur is of scientific interest and will aid annotation and quality control. To rapidly identify cell types we will leverage a data processing pipeline developed for Tabula Muris to produce unbiased cell clusters to be annotated by tissue experts based on abundance of known marker genes and molecular homology to mouse and mouse lemur cell types. The annotation pipeline is a prescribed set of steps within the Seurat package which separates cells from each organ into their major compartments, finds the most variable genes, performs a principal component analysis (PCA) on the variable genes for dimension reduction, and then clusters the cells using a nearest-neighbor graph within PCA space. The defined nature of the pipeline will allow organ-specific annotation experts to iteratively cluster and sub-cluster cells to find known and novel populations while keeping the annotation process consistent enough for the results to be reassembled and for appropriate quality control to be applied. Cell type naming will be made consistent through use of a controlled Cell Ontology Vocabulary to facilitate comparisons with Tabula Muris, Tabula Microcebus, and outside data sets, including the commonly accepted cell type names and synonyms used in the field.
The histology and locations in the organ of any newly identified cell types or subtypes (15 in our pilot with human lung) will be defined using the multiplex single molecule in-situ hybridization approach established for human lung.
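The clustering steps described in this aim (variable-gene selection, PCA, nearest-neighbor graph clustering) can be illustrated with a toy sketch. This is not the Seurat pipeline itself: it is a dependency-free Python stand-in run on synthetic data, it omits normalization and scaling, and it replaces Seurat's graph-based community detection with plain connected components of a k-nearest-neighbor graph. All function names and parameter values here are hypothetical.

```python
import math
import random


def top_variable_genes(cells, n_genes):
    """Indices of the n_genes columns (genes) with the highest sample variance."""
    n = len(cells)
    variances = []
    for g in range(len(cells[0])):
        column = [cell[g] for cell in cells]
        mean = sum(column) / n
        variances.append(sum((x - mean) ** 2 for x in column) / (n - 1))
    ranked = sorted(range(len(variances)), key=lambda g: variances[g], reverse=True)
    return ranked[:n_genes]


def pca_scores(cells, n_components, n_iter=100, seed=0):
    """Project cells onto leading principal components (power iteration + deflation)."""
    rng = random.Random(seed)
    n, p = len(cells), len(cells[0])
    means = [sum(cell[j] for cell in cells) / n for j in range(p)]
    centered = [[cell[j] - means[j] for j in range(p)] for cell in cells]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0, 1) for _ in range(p)]
        for _ in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        cov_v = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        eigenvalue = sum(v[i] * cov_v[i] for i in range(p))
        components.append(v)
        # Deflation: subtract the component just found so the next pass finds the next one.
        for i in range(p):
            for j in range(p):
                cov[i][j] -= eigenvalue * v[i] * v[j]
    return [[sum(c[j] * comp[j] for j in range(p)) for comp in components]
            for c in centered]


def knn_graph_clusters(points, k):
    """Label connected components of a symmetrized k-nearest-neighbor graph."""
    n = len(points)
    adjacency = [set() for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            adjacency[i].add(j)
            adjacency[j].add(i)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in adjacency[i]:
                if labels[j] == -1:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def synthetic_cells(n_per_type=20, n_genes=6, seed=1):
    """Two made-up cell types; gene 0 marks the first type and gene 1 the second."""
    rng = random.Random(seed)
    cells = []
    for marker in (0, 1):
        for _ in range(n_per_type):
            cell = [1.0 + rng.uniform(-1, 1) for _ in range(n_genes)]
            cell[marker] += 9.0
            cells.append(cell)
    return cells


cells = synthetic_cells()
genes = top_variable_genes(cells, n_genes=3)
reduced = [[cell[g] for g in genes] for cell in cells]
scores = pca_scores(reduced, n_components=2)
labels = knn_graph_clusters(scores, k=10)
print("variable genes:", genes)
print("cluster labels:", labels)
```

On this toy data the two marker genes rank as the most variable, and the two synthetic populations end up in separate clusters. Real data would need normalization, many more components, and a proper community-detection step, which is why a fixed, shared pipeline matters for consistent annotation across organs.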
Our third aim is to deliver the data to the wider HCA and scientific communities as rapidly as possible, consistent with growing the HCA infrastructure and with the needs of the broader scientific and especially medical community to access the data. To this end, consistent with our past practice, we will release a preprint of the manuscript and all raw data in a form accessible to the entire scientific community on the same day that the paper is submitted for peer review. We will also provide access to the data for non-experts via a simple web interface similar to the browser we developed for Tabula Muris (www.tabula-muris.ds.biohub.org). This project will use the existing Tabula Muris framework to be consistent and forward compatible with the growing HCA infrastructure. Being part of the Seed Network and having the CZ Biohub as part of the project ensures this. In terms of cell numbers, the project is a ten-fold scale-up from Tabula Muris and Tabula Microcebus. Resources will be dedicated to ensuring adequate storage and computing power for users to rapidly query the atlas.
Having all tissues from a single donor enables unprecedented experimental control over variation due to age, environmental exposure, epigenetic effects, and genetic background within the data. Repeating this experiment with multiple donors will enable us to gain an initial understanding of biological variation due to age, gender, and ethnicity. Our consortium brings unique expertise in performing large-scale, multi-organ studies from the same individual donor. We have performed a 20-organ study on mouse (the Tabula Muris) and a 30-organ study on the mouse lemur (the Tabula Microcebus); these enabled us to assemble a diverse set of teams with the coordination required to maximize the scientific contribution of each donor. Our teams have the experience of working together on a tight schedule and with a timed protocol that optimizes the order of tissue preparation and harvesting. We have also established well-defined procedures and tools to perform consistent data analysis across all of the tissues. We will share these protocols and best practices with other Seed Networks. Our consortium has access to the resources of the CZ Biohub, which include its Genomics Platform to perform the library generation and sequencing, and its Data Science Platform to help coordinate the data analysis and to develop further tools for browsing the data. The Genomics Platform includes sufficient robotics and sequencing infrastructure to perform the work proposed here without the need for further capital equipment. The Data Science team already has the pipeline in place for processing and analyzing data based on our previous whole-organism projects, both of which were done in close collaboration with the CZ Biohub. We will also share these resources with the other Seed Networks. The major resource we will develop in this project is the actual data on human cell types, and this will be shared not only with the other Seed Networks but also with the entire scientific community.
We will also develop software for analyzing and annotating the tissues, all of which will be shared broadly. Finally, we will develop a browser to enable non-expert access to the data across the scientific and medical communities.
The Tabula Sapiens Consortium