v3.1.0

@RobinL released this 03 Aug 15:44
· 6961 commits to master since this release

What's Changed

Warning
Version 3.1.0 makes a small, backwards-incompatible API change to the SparkLinker, i.e. it is a minor violation of semver.

The changes affect the SparkLinker only:

  • The default break_lineage_method changes to parquet
  • The break_lineage_after_blocking param is renamed to repartition_after_blocking for clarity
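
For code that builds its SparkLinker options as a keyword dictionary, the rename above could be handled with a small shim. This is only a sketch: `migrate_spark_linker_kwargs` is a hypothetical helper and not part of Splink, and the option values shown are placeholders (the helper never inspects them). Only the two parameter names and the `parquet` value come from these release notes.

```python
# Hypothetical helper, NOT part of Splink: renames the pre-3.1.0 keyword
# so that older calling code keeps working after the upgrade.
OLD_NAME = "break_lineage_after_blocking"
NEW_NAME = "repartition_after_blocking"


def migrate_spark_linker_kwargs(kwargs):
    """Return a copy of kwargs with the old parameter name replaced."""
    migrated = dict(kwargs)  # copy, so the caller's dictionary is left untouched
    if OLD_NAME in migrated:
        if NEW_NAME in migrated:
            # Both names supplied: refuse to guess which value the caller meant.
            raise ValueError(f"Pass only {NEW_NAME!r}, not {OLD_NAME!r} as well")
        migrated[NEW_NAME] = migrated.pop(OLD_NAME)
    return migrated


# Placeholder option values, for illustration only.
old_style = {
    "break_lineage_after_blocking": True,
    "break_lineage_method": "parquet",
}
new_style = migrate_spark_linker_kwargs(old_style)
print(new_style)
```

The migrated dictionary can then be unpacked into the linker constructor alongside its other arguments. Setting break_lineage_method explicitly, as in the placeholder above, also keeps a job independent of the changed default.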

Features

  • Add the ability to use pyarrow + on-disk parquet/csv in duckdb by @ThomasHepworth in #684
  • Add completeness (by dataset) chart by @samnlindsay in #669
  • Add cumulative blocking rule comparison chart by @ThomasHepworth in #660
  • Allow find_matches_to_new_records to take table name as input, in addition to rows by @RobinL in #659

Bugfixes

Maintenance

  • [MAINT] Clarify sql execution function names by @RobinL in #690
  • [MAINT] Clarify Spark Linker caching logic by @RobinL in #691
  • [MAINT] Bump version to 3.1.0 by @RobinL in #693
  • Fix code formatting on count_num_comparisons_from_blocking_rules_for_prediction by @RobinL in #661
  • Add salting to spark full test by @RobinL in #655

Docs

  • Improve customising comparisons topic guide by @RobinL in #667
  • [DOCS] Performance topic guide, covering blocking by @RobinL in #675
  • [docs] Add issue template for bug report by @RobinL in #676
  • [DOCS] Add topic guide for optimising spark jobs by @RobinL in #679
  • [DOCS] Fix problem with spark docs copy by @RobinL in #685
  • [Docs] Developers' guide to caching and pipelining by @RobinL in #686
  • [Docs] Developer guide: Understanding and debugging Splink's computations by @RobinL in #688
  • [DOCS] Developers' guide to spark caching and pipelining by @RobinL in #689

Full Changelog: v3.0.1...v3.1.0