v3.1.0

RobinL released this 03 Aug 15:44

eb37154

What's Changed

Warning
Version 3.1.0 makes a small, backwards-incompatible API change to the SparkLinker, which is a minor violation of semver.

The changes affect the SparkLinker only:

The default break_lineage_method changes to parquet
The break_lineage_after_blocking param is renamed to repartition_after_blocking for clarity
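
For code that builds SparkLinker keyword arguments programmatically, the rename above could be handled with a small shim along the following lines. This is a minimal sketch, not part of Splink: `migrate_spark_linker_kwargs` is a hypothetical helper, and it only rewrites a plain dict of keyword arguments without importing or calling Splink itself. Only the two parameter names are taken from these release notes.

```python
# Hypothetical helper (not part of Splink): renames the pre-3.1.0 keyword
# argument so that an existing kwargs dict matches the 3.1.0 parameter name.
OLD_NAME = "break_lineage_after_blocking"
NEW_NAME = "repartition_after_blocking"


def migrate_spark_linker_kwargs(kwargs):
    """Return a copy of kwargs with the old parameter name replaced."""
    migrated = dict(kwargs)  # copy so the caller's dict is left untouched
    if OLD_NAME in migrated:
        if NEW_NAME in migrated:
            # Both spellings supplied: refuse to guess which value is intended.
            raise ValueError(f"Pass only {NEW_NAME!r}, not both names")
        migrated[NEW_NAME] = migrated.pop(OLD_NAME)
    return migrated


old_style = {"break_lineage_after_blocking": True}
new_style = migrate_spark_linker_kwargs(old_style)
print(new_style)  # {'repartition_after_blocking': True}
```

Because the default break_lineage_method also changes, callers who depend on a particular method may prefer to pass break_lineage_method explicitly rather than rely on the default.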

Features

Add the ability to use pyarrow + on-disk parquet/csv in duckdb by @ThomasHepworth in #684
Add completeness (by dataset) chart by @samnlindsay in #669
Add cumulative blocking rule comparison chart by @ThomasHepworth in #660
Allow find_matches_to_new_records to take table name as input, in addition to rows by @RobinL in #659

Bugfixes

remove duplicate column selections by @ThomasHepworth in #681
fix em training tooltip by @ThomasHepworth in #665

Maintenance

[MAINT] Clarify sql execution function names by @RobinL in #690
[MAINT] Clarify Spark Linker caching logic by @RobinL in #691
[MAINT] Bump version to 3.1.0 by @RobinL in #693
Fix code formatting on count_num_comparisons_from_blocking_rules_for_prediction by @RobinL in #661
Add salting to spark full test by @RobinL in #655

Docs

Improve customising comparisons topic guide by @RobinL in #667
[DOCS] Performance topic guide, covering blocking by @RobinL in #675
[docs] Add issue template for bug report by @RobinL in #676
[DOCS] Add topic guide for optimising spark jobs by @RobinL in #679
[DOCS] Fix problem with spark docs copy by @RobinL in #685
[Docs] Developers' guide to caching and pipelining by @RobinL in #686
[Docs] Developer guide: Understanding and debugging Splink's computations by @RobinL in #688
[DOCS] Developers' guide to spark caching and pipelining by @RobinL in #689

Full Changelog: v3.0.1...v3.1.0

Contributors

RobinL, samnlindsay, and ThomasHepworth
