This file serves to collect thoughts on how to stimulate the reuse of open data, and on how to quantify such reuse. Reuse is to be understood here as non-primary use, i.e. use for a purpose other than the one for which the respective data were originally collected. This blog post considers some of the conditions that have to be met for reuse to happen, and the importance of generativity in the systems involved. Another blog post highlighted open tools, open media, and syndication as key factors in achieving that. There are others, and the precise mixture probably depends a lot on context.
- How are research objects being used?
- What kinds of reuses are there?
- What are useful ways to track these kinds of reuses?
- How is reuse distributed over time, space, discipline, application sector, user populations, operating systems?
- How can reuse be encouraged?
- Google search
- Data reuse stories
- all examples given are from Scientific Data
- The (Re)usable Data Project
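Questions like how reuse is distributed over time, discipline or application sector become tractable once individual reuse events are recorded somewhere. A minimal sketch in Python, assuming a hypothetical record format with one entry per observed reuse (the class, field names and sample data are made up for illustration): tallying along any one dimension is then a simple count.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReuseEvent:
    """One observed reuse of a research object (hypothetical record format)."""
    object_id: str   # identifier of the reused object, e.g. a dataset DOI
    year: int
    discipline: str
    sector: str      # application sector of the reusing party


def distribution(events, dimension):
    """Count reuse events along one dimension, given as a ReuseEvent field name."""
    return Counter(getattr(event, dimension) for event in events)


def distribution_by_object(events, dimension):
    """For each reused object, count its reuse events along one dimension."""
    table = defaultdict(Counter)
    for event in events:
        table[event.object_id][getattr(event, dimension)] += 1
    return dict(table)


# Made-up sample data, for illustration only.
events = [
    ReuseEvent("dataset-A", 2015, "ecology", "research"),
    ReuseEvent("dataset-A", 2016, "medicine", "research"),
    ReuseEvent("dataset-B", 2016, "ecology", "industry"),
]
print(distribution(events, "year"))  # Counter({2016: 2, 2015: 1})
print(distribution_by_object(events, "discipline"))
```

The hard part in practice is populating such records, which is what most of the tracking approaches listed on this page are about.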
Examining the genomes within a sample from a wastewater treatment plant in Austria, Schulz et al. assembled a previously undiscovered giant virus genome, which they used to mine genetic databases for related viruses.
Here we report the discovery of a group of giant viruses (Klosneuviruses) in metagenomic data.
- From the abstract of Integrated Molecular Meta-Analysis of 1,000 Pediatric High-Grade and Diffuse Intrinsic Pontine Glioma (emphasis added):
We collated data from 157 unpublished cases of pediatric high-grade glioma and diffuse intrinsic pontine glioma and 20 publicly available datasets in an integrated analysis of >1,000 cases. We identified co-segregating mutations in histone-mutant subgroups including loss of FBXW7 in H3.3G34R/V, TOP3A rearrangements in H3.3K27M, and BCOR mutations in H3.1K27M. Histone wild-type subgroups are refined by the presence of key oncogenic events or methylation profiles more closely resembling lower-grade tumors. Genomic aberrations increase with age, highlighting the infant population as biologically and clinically distinct. Uncommon pathway dysregulation is seen in small subsets of tumors, further defining the molecular diversity of the disease, opening up avenues for biological study and providing a basis for functionally defined future treatment stratification.
- From the abstract of Novel antigenic shift in HA sequences of H1N1 viruses detected by big data analysis (emphasis added):
The influenza virus H1N1 has been prevalent all over the world for nearly a century. Many studies on its evolutionary history, substitution rate and antigenicity-associated sites have been done with small datasets. To have a complete view, we analysed 3171 full-length HA sequences from human H1N1 viruses sampled from 1918 to 2016, and discovered a new clade has formed with sequences isolated in Iran.
- https://twitter.com/blueraster/status/1168933523924889603
- intended for monitoring fires but can be used for monitoring hurricanes as well
PubMed Central (PMC) is a repository for scholarly literature in the biomedical field. Some of its content is available under terms that allow for Reusing, Revising, Remixing and Redistributing Research, e.g. to extract audio and video materials from these articles and upload them to Wikimedia Commons, as the Open Access Media Importer does.
The bot's activity has revealed a number of inconsistencies in the XML at PMC, since the XML standard in use at PMC (JATS) is by design not very prescriptive and leaves a lot of room for interpretation.
This sparked the formation of the JATS for Reuse (JATS4R) Working Group, which develops recommendations on how best to tag articles in JATS so as to facilitate reuse (overview).
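To make the interpretation problem concrete: a bot deciding whether media from a PMC article may go to Wikimedia Commons first has to find the article's license, and JATS allows that to be tagged in several ways. The Python sketch below is an illustration, not the Open Access Media Importer's actual code. It assumes the article XML is already at hand, checks three tagging variants (an `xlink:href` on `<license>`, the `<ali:license_ref>` element that JATS4R recommends, and a link inside `<license-p>`), and applies a deliberately rough Commons-compatibility test. The function names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespaced names as ElementTree spells them; JATS elements themselves
# are not in a namespace.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
ALI_LICENSE_REF = "{http://www.niso.org/schemas/ali/1.0/}license_ref"


def license_urls(jats_xml: str) -> list[str]:
    """Collect license URLs from a JATS article, checking three tagging variants."""
    root = ET.fromstring(jats_xml)
    urls = []
    for lic in root.iter("license"):
        href = lic.get(XLINK_HREF)  # variant 1: attribute on <license>
        if href:
            urls.append(href.strip())
        for ref in lic.iter(ALI_LICENSE_REF):  # variant 2: <ali:license_ref>
            if ref.text and ref.text.strip():
                urls.append(ref.text.strip())
        for link in lic.iter("ext-link"):  # variant 3: link in the license text
            href = link.get(XLINK_HREF)
            if href:
                urls.append(href.strip())
    return list(dict.fromkeys(urls))  # de-duplicate, keep first-seen order


def commons_compatible(url: str) -> bool:
    """Rough check: Wikimedia Commons accepts CC0 and CC BY(-SA), not NC or ND."""
    parts = [p for p in url.lower().split("/") if p]
    if not any(p.endswith("creativecommons.org") for p in parts):
        return False
    if "publicdomain" in parts and "zero" in parts:
        return True
    if "licenses" in parts:
        i = parts.index("licenses")
        if i + 1 < len(parts):
            terms = parts[i + 1].split("-")  # e.g. "by-nc-sa" -> ["by", "nc", "sa"]
            return "by" in terms and "nc" not in terms and "nd" not in terms
    return False


sample = """<article xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:ali="http://www.niso.org/schemas/ali/1.0/">
 <front><article-meta><permissions>
  <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
   <ali:license_ref>https://creativecommons.org/licenses/by/4.0/</ali:license_ref>
   <license-p>This article is distributed under CC BY 4.0.</license-p>
  </license>
 </permissions></article-meta></front>
</article>"""

urls = license_urls(sample)
print(urls)  # ['https://creativecommons.org/licenses/by/4.0/']
print(all(commons_compatible(u) for u in urls))  # True
```

Checking all variants and de-duplicating is the defensive choice when the tagging is not prescribed; consistent adoption of the JATS4R recommendation would let a consumer rely on a single location.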
- adopting JATS4R recommendations
- providing high-res images via API
- making the ingestion XSLT and schematrons public (and under an open license)
- probably useful for PMC partner repositories (think PMC International but also SciELO or NASA)
- useful to publishers
- useful for Wikimedia
- search by license via the OA Web Service API
- more fine-grained search, e.g. for supplementary video or audio files
- help standardize the implementation of data citation as per JATS 1.1d2 (cf. JATS-Con paper)
- JATS is one of the suggested metadata formats in NISO's Protocol for Exchanging Serial Content (PESC)
- Page views of medical pages on the English Wikipedia
- GLAMorous
- BaGLAMa
- Cite-o-Meter
- Cocytus
- Total DOIs cited from the English Wikipedia
- OPENi is a searchable collection of images from PubMed Central and other sources
- ImageJ is a widely used software package for visualizing and analyzing biomedical data.
- UKSG 2015: Mechanical Curator and British Library Labs
- contains overview of use and reuse of the collection
- "A metadata specification for the social and behavioral sciences"
- "an effort to create an international standard in XML for metadata describing social science data"
- Reusing Te Papa’s collections images, by the numbers
- qualitative and quantitative indications of reuse, e.g.
- “Uploading to wikimedia and wikipedia article” (no known copyright restrictions)
- “I knit, and would love to make this into a knitting pattern…” (item is NC-ND-licensed)
- GitHub repo
- qualitative and quantitative indications of reuse, e.g.
- Wikimedia Commons as a media source
- Fig. 5 of Three-dimensional Magnetic Resonance Imaging of fossils across taxa (CC BY) is reused in the final figure of NMR Studies of Fossilized Wood (paywalled)
- File:Ernst-Abbe-Denkmal Jena Fürstengraben - 20140802 125709.jpg (CC0) reused in Cell imaging: Beyond the limits (paywalled)
- Deletion discussion about a video, prompted by the deletion of images that had been used in it
- Pop Culture Pulsar: Origin Story of Joy Division’s Unknown Pleasures Album Cover
- If We Share Data, Will Anyone Use Them? Data Sharing and Reuse in the Long Tail of Science and Technology
- Public Domain Status for Publicly Funded Works in the EU
- NIH 3D Print Exchange
- reuse biographic info from ORCID to start an NIH Biosketch
- Europe PMC has entity-recognition tools and uses them to mine its corpus for identifiers from around 20 databases
- Atul Butte's work
- Drug repositioning
- “Open” disclosure of innovations, incentives and follow-on reuse: Theory on processes of cumulative innovation and a field experiment in computational biology
- Library usage records as a way to track usage
- "using text mining to track how the museum's 80-million-specimen collection is used in research papers"
- NIH Data Sharing Repositories — "This table lists NIH-supported data repositories that make data accessible for reuse."
- Think about tracking reuse of software libraries
- Public Domain Protection: Uses and Reuses of Public Domain Works
- On the formalization and reuse of scientific research
- Open Data and Data Re-use
- 10 Simple rules for design, provision, and reuse of persistent identifiers for life science data
- open data as a form of OER
- Planning for Data Reuse Checklist
- A century of trends in adult human height
- reuses 1472 datasets
- SciDataCon 2016 session Data fitness: What are the processes and components to assess and communicate re-usability of data?
- Robotics in general
- Whole genome resequencing of a laboratory-adapted Drosophila melanogaster population sample
- 407 publications used GBIF data in 2015 (as per the GBIF Science Review 2016)
- http://plos.io/allofplos (3.9GB zip)
- Raw diffraction data preservation and reuse: overview, update on practicalities and metadata requirements
- Measurement of the Earth's rotation: 720 BC to AD 2015
- reuses astronomical data recorded on Babylonian clay tablets
- 1970s and ‘Patient 0’ HIV-1 genomes illuminate early HIV/AIDS history in North America
- reused blood samples to sequence patient genomes to analyze the spread of HIV
- Reilly Center's annual Top 10 List of Ethical Dilemmas and Policy Issues in Science and Technology
- The Sharing Experimental Animal Resources, Coordinating Holdings (SEARCH) Framework: Encouraging Reduction, Replacement, and Refinement in Animal Research
- reCAPTCHA
- Why Are Scientific Data Rarely Reused?
- Australian National Data Service: Data Impact Book
- UK Data Service: Case Studies
- Drug repurposing/drug repositioning
- 2016 dataset usage stats at Dryad
- "This is what #OpenScience is all about. A team in France used our data to build The Virtual Mouse Brain. Great work!"
- EU action Promoting sharing and reuse of IT solutions
- Sharing and Reuse Awards — for shared IT solutions in public administration
- How is Open Data being re-used in Europe?
- Communicating Use and Reuse in the Digital Collection Interface — on reuse policies at various GLAM institutions
- University of Oklahoma Astrophysicists Discover Extragalactic Planets for First Time — based on open data from NASA's Chandra X-ray Observatory
- Crowdsourcing 600 Years of Human History — uses family-level genealogy data to analyze population-level history
- Reusing photos from iNaturalist to study goat molt
- Global Infections by the Numbers — journalistic infographic reusing data from various sources
- Data Use Ontology
- Reanalyzing environmental lidar data for archaeology: Mesoamerican applications and implications