Releases: datacommonsorg/import
0.1-alpha.1k
Includes the following fixes:
- Support for CSVs from MS Excel (with BOM characters)
- Allows empty column names in CSV
- Propagate exceptions properly so the tool no longer fails silently when errors occur
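Excel's "CSV UTF-8" export writes a byte order mark (BOM) at the start of the file. The Python sketch below shows why that matters. It is only an illustration, not the tool's (Java) implementation. If the file is decoded as plain UTF-8, the BOM sticks to the first column name, so a column expected to be called place silently fails to match. Python's utf-8-sig codec strips the BOM. The sample data and the read_header helper are hypothetical.

```python
import csv
import io

# Hypothetical sample: Excel's "CSV UTF-8" export starts the file with a BOM.
EXCEL_CSV = "\ufeffplace,year,value\ngeoId/06,2020,39538223\n".encode("utf-8")


def read_header(raw: bytes, encoding: str) -> list[str]:
    """Decode raw CSV bytes with `encoding` and return the header row."""
    reader = csv.reader(io.StringIO(raw.decode(encoding)))
    return next(reader)


# Plain "utf-8" keeps the BOM, so the first column name becomes "\ufeffplace".
naive = read_header(EXCEL_CSV, "utf-8")
# "utf-8-sig" strips a leading BOM if present and is harmless when there is none.
fixed = read_header(EXCEL_CSV, "utf-8-sig")

print(repr(naive[0]))  # '\ufeffplace'
print(repr(fixed[0]))  # 'place'
```

Because utf-8-sig also decodes BOM-less files correctly, it is a safe default when CSVs may come from Excel or from other tools.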
0.1-alpha.1j
Bug Fixes:
- Fixes exceptions thrown when a series has a mix of numeric and non-numeric values
0.1-alpha.1h
Highlights
- Added support for categorical variables (SVs with statType: measurementResult)
- Performance optimizations with ~25% expected speed gains
Changelog
New checks and verifications
- Added support for categorical variables
  - Any non-numeric StatVarObservation values must now be explicitly allowed with the --allow-non-numeric-obs-values=true flag.
  - Categorical variables (SVs with statType: measurementResult) can be checked for existence by specifying --check-measurement-result=true.
  - Some common checks (inconsistent values, date gaps) apply to all time series, including categorical variables.
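Here is a minimal sketch of two of the behaviors above, under stated assumptions. First, non-numeric values are reported unless they are explicitly allowed. The allow_non_numeric parameter stands in for --allow-non-numeric-obs-values=true. Second, an inconsistent-values check compares only raw value strings, so it applies equally to numeric and categorical series. The check_series and is_numeric helpers are hypothetical and do not reflect the tool's actual Java code or its error counters.

```python
from collections import defaultdict


def is_numeric(value: str) -> bool:
    """Return True if the value parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def check_series(points, allow_non_numeric=False):
    """Return a list of problem strings for one time series (hypothetical helper).

    points: iterable of (date, value) string pairs for a single series.
    allow_non_numeric: stands in for --allow-non-numeric-obs-values=true.
    """
    problems = []
    values_by_date = defaultdict(set)
    for date, value in points:
        if not is_numeric(value) and not allow_non_numeric:
            problems.append(f"non-numeric value {value!r} on {date}")
        values_by_date[date].add(value)
    # Inconsistent-values check: it compares raw value strings only, so it
    # works the same way for numeric and categorical (string) series.
    for date, values in sorted(values_by_date.items()):
        if len(values) > 1:
            problems.append(f"inconsistent values on {date}: {sorted(values)}")
    return problems


series = [("2020", "Good"), ("2021", "Poor"), ("2021", "Good")]
print(check_series(series, allow_non_numeric=True))
# ["inconsistent values on 2021: ['Good', 'Poor']"]
```

Without allow_non_numeric=True, the same series would also produce one non-numeric report per value.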
Speed Optimizations
- Added a heuristic to date checking that greatly speeds up the “correct path”. Expected improvement is ~25%.
Summary Report changes
- Added “expand/collapse all” buttons that work for all collapsible tags on the report
- Changed chart style to highlight data points and support datasets with many data points
Bug fixes
- Fixed an issue where the line number of the last CSV row was incorrect
- Fixed an issue where logs were duplicated when more than two values had the same date
- Fixed flaky ordering of output in some test goldens
0.1-alpha.1g
What’s new:
- Improvements to speed when using the tool:
  - Allow external IDs to be resolved using local-side MCF. This removes the need to first get new external IDs added to the reconciliation API, which could take days before the tool could verify them.
  - Optimized performance for an estimated ~10% raw speed boost.
- Expanded checks to catch more issues and support additional data types:
  - Existence checks for “observationAbout” references (behind a new flag, -ep)
  - Expanded validation to recently introduced statTypes (confidence interval {upper, lower} limit, kurtosis, skewness, growth rate).
  - Support schemaless SVs with init-cap mprop
- Added documentation for:
  - Tool usage (docs/usage.md)
  - Error counters (docs/counters.md)
  - Complex Values (docs/complex_values.md)
- Summary Report improvements:
  - Added missing observationPeriods field
  - Added table of contents
  - Made tables sortable on click
  - Separated the display of time series facets
  - Displayed human-readable names for places, taking priority over DCIDs
  - Improved sample place heuristics
- Bug fixes
  - Fixed an issue where a time series with a single data point smaller than -1 would cause a fatal crash
  - Fixed the order of census area codes for resolution
0.1-alpha.1f
What's new:
- Fix HTTP exception in DC calls when running on Java 11.x
- Fix runtime errors in chart generation
- Remove the requirement for StatVars to have a populationType
- Fix bug in percentile* statType validation
0.1-alpha.1e
This release includes:
- Support for generating an HTML Summary Report
  - Enabled by default. To disable, pass -sr=false
- Upgrade log4j version to 2.16.0
- Minor bug fixes and updates
0.1-alpha.1d
This release includes:
- Support for Stat Checker
  - Enabled by default. To disable, pass -s=false
- Support for Resolution (aka resolving local-refs and generating DCIDs for nodes)
  - Defaults to local mode (-r=LOCAL), for use when you already reference place DCIDs.
  - To resolve external IDs to DCIDs for places, pass -r=FULL. This will make Recon API calls.
  - To disable resolution, pass -r=NONE
- Support for parallel processing of CSV files
  - Parallel processing happens when there are multiple CSV files
  - Defaults to no parallelism. Set -n=<number-of-threads> to increase parallelism
- More batching for existence checks
  - This is enabled by default. To disable, pass -e=false
- Changes in default output directory
  - Default is now dc_generated/ in the current directory. To change, set -o=<your-directory>
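Below is a sketch of parallel CSV processing combined with batched existence checks. It is a hypothetical Python illustration, not the tool's implementation. The num_threads parameter plays the role of -n, and the place column name is an assumption. The lookup_batch callable stands in for a single request to the existence-check API. Files are read concurrently, references are deduplicated, and each batch costs one lookup.

```python
import csv
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunks(items, size):
    """Yield successive lists of at most `size` items from `items`."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def collect_refs(path):
    """Hypothetical per-file work: gather the values of the `place` column."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return {row["place"] for row in csv.DictReader(f)}


def find_missing(paths, lookup_batch, num_threads=1, batch_size=100):
    """Read CSVs on up to num_threads workers, then check references in batches.

    num_threads mirrors -n (1 means sequential). lookup_batch is a hypothetical
    callable standing in for one API request; it returns the DCIDs that exist.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        per_file = list(pool.map(collect_refs, paths))
    refs = sorted(set().union(*per_file))  # dedupe so each DCID is queried once
    missing = []
    for batch in chunks(refs, batch_size):
        found = lookup_batch(batch)
        missing.extend(d for d in batch if d not in found)
    return missing


# Usage with two small CSVs and a fake lookup that records each request.
known = {"geoId/06", "geoId/36"}
requests = []


def fake_lookup(batch):
    requests.append(batch)
    return {d for d in batch if d in known}


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, places in [("a.csv", ["geoId/06", "geoId/99"]),
                         ("b.csv", ["geoId/36", "geoId/06"])]:
        path = os.path.join(tmp, name)
        with open(path, "w", newline="", encoding="utf-8") as out:
            out.write("place,value\n" + "".join(f"{p},1\n" for p in places))
        paths.append(path)
    missing = find_missing(paths, fake_lookup, num_threads=2, batch_size=2)

print(missing)        # ['geoId/99']
print(len(requests))  # 2
```

Batching replaces one round trip per reference with one per batch_size references. In CPython, threads mainly help when the per-file work is I/O-bound.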
0.1-alpha.1c
Intermediate release with partial stat check and resolution support.
0.1-alpha.1b
This release includes:
- Existence checks for DCID references using DC Staging API
- End-to-end integration tests (refer to test-cases here)
- Several bug fixes
0.1-alpha.1
An early version of the import tool to get user feedback.