This documents the changes from release to release.
This release updates pancreas common data elements as follows:
- exocrine_clinical_M_AJCC_8 is now optional
- exocrine_clinical_N_AJCC_8 is now optional
- exocrine_clinical_T_AJCC_8 is now optional
- exocrine_group_stage_AJCC_8 is now optional
- exocrine_pathologic_M_AJCC_8 is now optional
- exocrine_pathologic_N_AJCC_8 is now optional
- exocrine_pathologic_T_AJCC_8 is now optional
- final_path_duct_communication is now a string and is optional
- histological_subtypes_ipmn now has a new permissible value: Mixed
- lesion_size is now optional
- neuroendocrine_clinical_M_AJCC_8 is now optional
- neuroendocrine_clinical_N_AJCC_8 is now optional
- neuroendocrine_clinical_T_AJCC_8 is now optional
- neuroendocrine_group_stage is now optional
- neuroendocrine_pathologic_M_AJCC_8 is now optional
- neuroendocrine_pathologic_N_AJCC_8 is now optional
- neuroendocrine_pathologic_T_AJCC_8 is now optional
- number_lesions is now optional
- path_acc_num_diag_biopsy is now optional
- path_management_recommendation has a new permissible value: Pancreaticoduodenectomy
- path_number_of_tumors is now optional
- path_tumor_size_largest_lesion is now a string and is optional
- tumor_pathology_location is now a string; the TumorPathologyLocation enumeration no longer exists
This release simply adds unknown as a possible value to the Mode enumeration, thus satisfying issue 21.
This release increases the limits on sizes in Biospecimen for the following fields:
- specimen_ID_local: 25 → 200
- specimen_parent_ID: 50 → 200
This release accommodates listings in the CDE changelog from 2022-03-15–2022-04-11, notably:
- pancreaticoduodenectomy is a new enumerated value in RulesOfAcquisition.
- higpin is a new enumerated value in Precancers.
- sarcoma is a new enumerated value in Lesion.
- In class Biospecimen, these fields are no longer enumerations or floats, but nullable strings:
  - storage_method
  - section_thickness
  - shipping_destination
- In class Genomics, sequencing_platform is now a nullable string.
The CDE changelog also mentions that location_extent_extraprostatic_extension and seminal_vesicle_invasion (although it misspells it "vessicle") have two new enumerated values; however, in Sickbay these are simply strings, so there's no need to change anything.
This release contains some incompatible changes to accommodate CDE updates from 2022-01-20 through 2022-01-31. See the CDE changelog for details. However, given the moribund nature of the Consortium for Molecular and Cellular Characterization of Screen-Detected Lesions, these changes are simplified for the following reasons:
- The project has concluded and we must intake any form of outstanding data presented (as Kristen Anton said: "MCL is ended, we should take what data we can.")
- No one actually writes queries for the relational database or uses the query interface of the ORM.
- The user interface just wants plain text over JSON anyway.
Given that, this and all future releases no longer model 1-to-many relationships as traditional relational database structures of 1-table-row-to-multiple-other-table-rows. Instead, we just use plain text and expect pipe-separated (|) strings. This is what the input to this database is (pipe-separated cells in a spreadsheet) and what the UI expects, so the Herculean overhead of making the tables and ORM models to support this is ultimately fatuous.
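As a rough illustration of the pipe-separated convention, a consumer of such a field might split and rejoin values along these lines. The helper names and the sample cell are hypothetical and are not part of Sickbay; this is a minimal sketch that assumes surrounding whitespace and empty segments should be ignored.

```python
def split_pipe_values(cell):
    """Split a pipe-separated cell into a list of trimmed, non-empty values.

    Hypothetical helper: treats None as "no values" and drops empty segments.
    """
    if cell is None:
        return []
    return [part.strip() for part in cell.split('|') if part.strip()]


def join_pipe_values(values):
    """Join values back into one pipe-separated string for storage."""
    return '|'.join(str(value).strip() for value in values)


# A made-up spreadsheet cell with stray spaces and a trailing pipe.
stored = 'first value | second value|'
print(split_pipe_values(stored))   # ['first value', 'second value']
print(join_pipe_values(['first value', 'second value']))  # first value|second value
```

Keeping the split tolerant of stray whitespace and trailing pipes is a design choice for hand-edited spreadsheets; a stricter consumer could reject such cells instead.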
The changes in this release include:
- The enumeration GroupStage8 now has an additional value, unknown. This is not currently reflected in the Lung CDEs, but we're not concerned.
- In the LungOrgan class:
  - The following values are now string types instead of enumerations:
    - primary_adenocarcinoma_differentiation_type
    - prior_treatment
  - The ajcc_8_lung_pathologic_m had an error; the enumeration name was duplicated as metastasis_enum; it's now metastasis_enum8.
  - The changelog for CDEs says that all 14 AJCC fields are now optional; but this was already the case for Sickbay since 1.1.0.
- In the ProstateOrgan class, the following values are now string types instead of enumerations:
  - location_dominant_nodule
  - location_secondary_nodule
  - location_extent_extraprostatic_extension
  - location_nature_positive_margins
  - summed_length_positive_margin
  - seminal_vesicle_invasion
This release contains some incompatible changes in order to accommodate CDE updates from 2021-08-26 through 2021-11-18. Please see the CDE changelog for highly pedantic details of these updates. The changes to the software include:
- On class Organ:
  - histopathology_precancer_type was a 1-to-many attribute of LungOrgan only; now it belongs to all organs as 1-to-many.
  - This base class now has the following optional attributes:
    - ajcc_clinical_m
    - ajcc_clinical_n
    - ajcc_clinical_t
    - ajcc_clinical_stage
    - ajcc_pathologic_m
    - ajcc_pathologic_n
    - ajcc_pathologic_t
    - ajcc_pathologic_stage
    - lymph_nodes_tested
    - lymph_node_location
- On the class LungOrgan:
  - There are numerous changes. For one, the ajcc_staging_system_edition indicates whether the entire record uses the AJCC Staging edition 7 or 8, and depending on this, it tells which set of attributes to use.
    - The attributes are:
      - ajcc_7_lung_clinical_m
      - ajcc_7_lung_clinical_n
      - ajcc_7_lung_clinical_t
      - ajcc_7_lung_disease_stage
      - ajcc_7_lung_pathologic_m
      - ajcc_7_lung_pathologic_n
      - ajcc_7_lung_pathologic_t
      - ajcc_8_lung_clinical_m
      - ajcc_8_lung_clinical_n
      - ajcc_8_lung_clinical_t
      - ajcc_8_lung_disease_stage
      - ajcc_8_lung_pathologic_m
      - ajcc_8_lung_pathologic_n
      - ajcc_8_lung_pathologic_t
    - Note that all of these attributes are optional; this is because it's also possible that ajcc_staging_system_edition is unknown or not_reported, in which case we can't enforce that a specific set of the above attributes is actually used.
  - Lungs also have a new attribute: lymph_nodes_positive, an optional integer.
- On the class ProstateOrgan, these attributes have moved "up" into the superclass Organ:
  - lymph_nodes_tested
  - lymph_node_location
  - ajcc_clinical_m
  - ajcc_clinical_n
  - ajcc_clinical_t
  - ajcc_clinical_stage
  - ajcc_pathologic_m
  - ajcc_pathologic_n
  - ajcc_pathologic_t
  - ajcc_pathologic_stage
- In class Biospecimen, these attributes were required and are now optional:
  - days_to_collection
  - time_excision_to_processing
  - days_to_storage
- The following enumerated types have changed:
  - TStage7 no longer includes the terms t1c or t1mi
  - ClinicalMStage7 has dropped the terms M1c and pM1
  - For GroupStage7, the following permissible values are no longer permissible:
    - ia1
    - ia2
    - ia3
    - iva
    - ivb
  - Precancers now includes a normal kind
  - Fixatives now supports a not_applicable value
  - When it comes to Storage you now have two new options:
    - room_temperature_then_refrigerated
    - frozen_at__20c
  - SlideCharges has made these values impermissible: cm0, cm1, pm1, pm1a, pm1b, pm1c
  - We now finally have a blessed description for Treatment instead of the kind contrived by a mere software developer
  - At long last an expert has realized that cannot_be_determine should be cannot_be_determined in Necrosis
  - The following new enumerations are ready for use:
    - ClinicalMStage8 with 8 values
    - ClinicalNStage8 with 7 values
    - GroupStage8 with 17 values
    - AJCCMetastasisStage8 with 8 values
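For the LungOrgan change described above, where ajcc_staging_system_edition decides which set of AJCC attributes applies, the selection could be sketched roughly as follows. The function name and the assumption that the edition arrives as the string '7' or '8' are hypothetical; only the attribute names come from this changelog.

```python
# Suffixes shared by the edition 7 and edition 8 LungOrgan attributes listed above.
_STAGING_SUFFIXES = (
    'clinical_m', 'clinical_n', 'clinical_t', 'disease_stage',
    'pathologic_m', 'pathologic_n', 'pathologic_t',
)


def staging_attributes(edition):
    """Return the LungOrgan attribute names that apply to an AJCC edition.

    Assumes `edition` is the string '7' or '8' (an illustrative encoding).
    Any other value, such as 'unknown' or 'not_reported', selects no
    attributes, which is why none of them can be required.
    """
    if edition not in ('7', '8'):
        return []
    return [f'ajcc_{edition}_lung_{suffix}' for suffix in _STAGING_SUFFIXES]


print(staging_attributes('8')[0])         # ajcc_8_lung_clinical_m
print(len(staging_attributes('7')))       # 7
print(staging_attributes('not_reported')) # []
```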
- Removal of zc.buildout. We cannot recommend this tool less. Just use virtual environments like everyone else.
For issue https://github.com/EDRN/MCL-metadata/issues/22
- Additional permissible value on sequencing_platform (enum GenomicAnalyzer), namely illumina_hiseq_1500.
- Changed the read_length from numeric to a string (10)
- Note that we do not have schema migrations set up so these steps must be run manually:
    ALTER TABLE "genomics" ALTER "read_length" SET DATA TYPE CHARACTER VARYING(10);
    ALTER TYPE "genomic_analyzier_enum" ADD VALUE 'illumina_hiseq_1500' AFTER 'illumina_genome_analyzer_iix';
- This version adds the human-readable label plus the token value to all enumerations over the JSON; see #16 for more information.
- A "more official" release.
For issue #1:
- On ClinicalCore:
  - The race attribute is now a 1-to-many mapping to CoreRace via core_races
  - The type_tobacco_used is now a 1-to-many mapping to CoreTobacco via core_tobaccos
  - The attribute days_to_birth is now required
- On Biospecimen:
  - The enumeration for Precancers has a whole bunch of new permitted values
- On BreastOrgan:
  - The enumeration for PrecancerousHistopathology contains values for "unknown" and "data not available"
  - The enumeration for BreastSite now has an unknown value
  - A new value pending is available for GeneticTestingAnswer, TestResults, EstrogenTestResults
  - The enumeration HER2Results adds pending and unknown values
  - The enumeration BreastImagingWorkup adds an unknown value
  - The enumeration BIRADSTissues adds values for "unknown" and "data not available"
- New LungOrgan plus (bogus) test data for it
- New PancreasOrgan plus (bogus) test data for it
- Updated ProstateOrgan
  - Previously, this was just a placeholder to test multiple inheritance from the common Organ class in terms of both Python class hierarchy and database hierarchy
  - Now it's completely filled out with the v0 prostate common data elements with its numerous controlled vocabularies
- Expanded enumerations: ClinicalMStage7, TStage7, ClinicalNStage7, GroupStage7, MarginalStatus
- New enumerations, far too many to enumerate 😏
For issue #4:
- All fields in LabCASMetadata are now String.
For issue #3:
- inscribed_clinicalCore_participant_ID is a new field on PriorLesion, CoreRace, and CoreTobacco
- inscribed_biospecimen_identifier is a new field on AdjacentSpecimen
For issue #5:
- The following updates diverge from the data dictionaries of the common data elements:
  - participant_ID is now 50 characters (along with foreign keys and inscribed fields), up from 14
  - specimen_ID is now 50 characters (along with foreign keys and inscribed fields), up from 16
And finally, for issue #6 … we add unknown to all enumerations that didn't have it already.
- Rename inscribed_participant_ID → inscribed_clinicalCore_participant_ID
- Rename inscribed_specimen_ID → inscribed_biospecimen_specimen_ID
- Addresses #2 by:
  - Adding inscribed_participant_ID and inscribed_specimen_ID to Genomics
  - Adding inscribed_participant_ID and inscribed_specimen_ID to Imaging
  - Adding inscribed_participant_ID to Biospecimen
  - (It also adds some test data to these fields.)
In this release:
- The labcasFileURL field is now just labcasID; everything else is the same except the name (and the semantics; it no longer is used to hold URLs)
- The Organ class now has an inscribed_participant_ID field you can use to note a future participant ID association with a ClinicalCore
- All enumerations now use advanced enumerations for their base class.
- All enumerations now have a case-insensitive lookup.
In practice, that last bullet means:
>>> from mcl.sickbay.model.enums import Race
>>> Race.black_or_african_american == Race('Black or African American')
True
>>> Race.black_or_african_american == Race['Black or African American']
True
>>> Race.black_or_african_american == Race['black or african american']
True
>>> Race('black or african american')
Traceback (most recent call last):
...
ValueError: 'black or african american' is not a valid Race
So if you want case-insensitive lookups, use brackets, not parentheses.
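To illustrate how bracket lookups can be made case-insensitive while call lookups stay strict, here is a minimal sketch using only the standard library's enum module. It is not Sickbay's actual base class: CaseInsensitiveMeta is a hypothetical name, and the Race enumeration below is cut down to a single member.

```python
import enum


class CaseInsensitiveMeta(enum.EnumMeta):
    """Hypothetical metaclass: bracket lookup ignores case and accepts
    either a member's name or its value."""

    def __getitem__(cls, key):
        wanted = str(key).lower()
        for member in cls:
            if member.name.lower() == wanted or str(member.value).lower() == wanted:
                return member
        raise KeyError(key)


class Race(enum.Enum, metaclass=CaseInsensitiveMeta):
    # Only one member is shown here; the real enumeration has more.
    black_or_african_american = 'Black or African American'


# Brackets go through the overridden __getitem__, so case does not matter.
print(Race['black or african american'] is Race.black_or_african_american)  # True
print(Race['BLACK_OR_AFRICAN_AMERICAN'] is Race.black_or_african_american)  # True

# Parentheses still use the standard exact-value lookup.
try:
    Race('black or african american')
except ValueError as error:
    print(error)  # 'black or african american' is not a valid Race
```

Overriding only the bracket lookup leaves the call syntax with the enum module's usual exact-match behavior, which matches the distinction drawn above.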
In this release:
- Base metadata for all classes now includes:
  - consortium, a nullable string that can be used to contain an RDF URI to the consortium that originated the data, such as https://mcl.nci.nih.gov/ for the Consortium for Molecular and Cellular Characterization of Screen-Detected Lesions.
  - protocolID, a nullable integer that tells the research protocol that generated the data.
- Kristen's sample data (--add-sample-data) includes these consortium and protocol IDs
This release fixes:
- In BreastOrgan, the field her2_in_situ_hybridization was the wrong enumerated type. It should've been HER2InSituHybridization.
- In the enums, add the type HER2InSituHybridization.
- Add test data from 12_78_BreastCore_20200625_0.
- Removed foreign key constraint from Biospecimen.specimen_parent_ID because the parent ID may be either another biospecimen or a participant (clinical core) object.
- New class AdjacentSpecimen to work around circular dependency problem of having adjacent specimens directly on Biospecimen.
- New JSON serialization for adjacent_specimens on Biospecimen
- Misspelled enumeration AnatomicalSite: pancrease → pancreas
- Change create-demo-db to create-clinical-db since this is no longer a demo but the real deal
- Transition from old style setup.py to everything in setup.cfg
In this release, 0.0.5, we also finally start keeping a changelog 😮