This repository has been archived by the owner on May 3, 2023. It is now read-only.

Changelog

v0.23.0 (2023-04-26)

Feature

  • Add logging and choose sfi types (d5f8e23)
  • Create example scripts (76e063a)
  • Initial text model pipelines (1934db0)
  • Add tests (d7a8bab)
  • Initial simple preprocessing pipeline for all sfis (f941a4d)
  • Add include_sfi_name in load_text_split (4605c88)
  • Include_sfi_name arg (58baf9a)
  • Fit and load tfidf, bow, and lda models (3d33d9b)

Fix

  • Preprocess to one regex (c716653)
  • Remove symbols again (1210b7e)
  • Based on HLasse's comments (32da48f)
  • Insert model type in filename (1457387)
  • Add doc strings to preprocessing functions (4e27650)
  • Remove log.info and small fixes (84f3cc3)
  • Ruff fixes (ea9c564)
  • Return vectorizer and matrix + clean-up (e1c48a0)
  • Query string (cb7424c)
  • Naming and doc string update (141e52a)
  • General clean-up and change corpus in fit functions to list (22b6a9e)
  • Change ngram default and clean-up (387f845)
  • Small fixes to logging (c3a3f53)
  • Remove old comments (4b88514)
  • Change view name (a9bb0fc)
  • Move save_text_model_to_dir to utils (469df3b)
  • Move save_text_model_to_dir to utils (26a80d2)
  • Renaming in preprocessing (c381768)
  • Remove stop_words arg and return models (3d29012)
  • Change arg path to path_str (f781a74)
  • Enable multiple splits when loading data + add n_rows arg (8ae2d2e)
  • Remove Path from arg (29b442b)

v0.22.0 (2023-04-24)

Feature

  • Add feature descriptions for text features (84c696a)

Documentation

v0.21.4 (2023-04-04)

Fix

  • Remove unreasonably high or low bmi values (07f52c2)

v0.21.3 (2023-04-03)

Fix

  • Make sql query executable (e006490)
  • Str turned into list of characters instead of list of words (0fae478)

v0.21.2 (2023-03-27)

Fix

  • Add unpack args to skema 2 wo nutrition (95c35c8)

v0.21.1 (2023-03-22)

Fix

  • Only keep weights above 0.5 kg (8a5a104)
  • Do not load invalid weights (7be4653)

v0.21.0 (2023-03-22)

Feature

  • Support new pipe annotation (a1bde17)

Fix

v0.20.3 (2023-03-14)

Fix

  • Set unpack_to_intervals to default (64391ca)
  • Remove unintended space (9c6cd33)

v0.20.2 (2023-03-14)

Fix

  • Add skema_2_without_nutrition again (685c5cb)

v0.20.1 (2023-03-11)

Fix

  • Cruft github action (c8f6278)
  • Bug in cruft action (ec8267a)
  • Remove psycop-ml-utils, no longer exists (d8fbb65)

v0.20.0 (2023-03-09)

Feature

  • Add more glc loaders (b765e77)
  • Add type 1 diabetes loaders (b682984)
  • Make sql loader verbose (602f4f3)
  • Add caching to sql_load (a68c15d)
  • Ibid (46da732)
  • Add support for keeping code col when loading diagnoses (51ca63e)
  • Add t2d diagnosis loading (6b8231c)
  • Add ogtt (f6c07a9)
  • Update current blood sugar measurements (5e8051a)

Fix

  • Lacking prefix on loading glc (d9bdbcb)
  • Inappropriate matching (e2409ed)
  • Poetry formatted dependencies (125500a)

v0.19.2 (2023-03-06)

Fix

v0.19.1 (2023-03-06)

Fix

  • Drop rows with NaT (5a1d908)
  • Round timestamps to whole seconds before dropping duplicates (e503bf3)

v0.19.0 (2023-03-03)

Feature

  • Add option for which timestamp to get when loading physical visits (ef369b8)

Fix

  • Drop duplicates in the output_df (636cc48)
  • Don't load duplicate visits (5028b1d)
  • Physical visits should only load physical visits (b7c50cf)
  • Did not rename to timestamp before returning (f43522c)

v0.18.4 (2023-02-22)

Fix

  • Loader names still too long (3321b88)

v0.18.3 (2023-02-22)

Fix

  • Loader names too long for wandb (cc14da2)

v0.18.2 (2023-02-21)

Fix

v0.18.1 (2023-02-15)

Fix

  • Adjust function for saving integrity checks (de2577e)
  • Restructure overarching description func (54c24a2)

Documentation

  • Better function description (7eb9e54)

v0.18.0 (2023-02-14)

Feature

  • Add arg for choosing timestamp and add warning (159a176)

v0.17.2 (2023-02-13)

Fix

  • Make naming scheme consistent (c125b48)
  • Attempted rename of unspecified df (c266bd8)
  • Revert logic (ad110ee)
  • Quarantine_df and quarantine_days can be left as None (f130370)

v0.17.1 (2023-02-10)

Fix

  • Allowed types works again (dbe75ca)
  • All arg names now congruent, visit_types takes a list of visit types instead of string (e63e9d4)

v0.17.0 (2023-02-09)

Feature

v0.16.1 (2023-01-31)

Fix

  • Use acute outpatient visits as well (659af23)
  • Typo, and use newest data (bbbc8f5)
  • Use end dates for all contacts (d8940c1)
  • Use end times for all diagnosis loading (4d9e600)

v0.16.0 (2023-01-27)

Feature

  • Remove try/except to avoid debugger getting stuck on it (3884ab8)

Fix

  • Move all str operations into the if statement (91f9174)

v0.15.0 (2022-12-19)

Feature

  • Move logs next to their dataset (e0ed033)

Documentation

  • Improve quarantine docs (1b23f19)

v0.14.0 (2022-12-16)

Feature

  • Name wandb project_name-feature-generation (b601d80)

v0.13.0 (2022-12-16)

Feature

  • Improve logging in flatten_dataset (63f252f)
  • Enable minimum specifications (669e3ed)
  • Enable minimum specifications (523cfd1)
  • Log rows dropped by PredictionTimeFilterer (7e02d8e)
  • Add moves loader (0521dd0)
  • First stab at loader (f9048b8)

Fix

  • Add pred_time_uuid if not specified when filtering (acca5b9)

Performance

  • Avoid groupby in filter_prediction_times (a66e361)

v0.12.0 (2022-12-15)

Feature

  • Add rows dropped logging (33ba525)
  • Allow filtering based on quarantine dates (3deb052)
  • Improve logging - debug to file, info to stdout (aff10a9)
  • Move wandb init earlier so wandb_alerts can cover values_df loading (6c153b1)
  • Generate full feature set (9ba907a)
  • Wrap as much of main as possible in wandb exception (3b085af)
  • Allow timestamps only return from visit loaders for use as pred_times (f9534e0)
  • Migrate some loaders to logging. (f81fd92)
  • More explicit logging (7969210)
  • Init changes (f257daa)

Fix

  • Use lookbehind instead of interval days (7e14ad5)
  • Only one feature cache per project (cb0b8b0)
  • Unused input args (fa14461)
  • Wandb util was missing text kwarg (64c1729)

Performance

  • Infer CPU cores from logical cores (309e9d2)

v0.11.0 (2022-12-13)

Feature

  • Add wandb alert on exception (3ff6e37)

Documentation

v0.10.0 (2022-11-21)

Feature

  • Add n_hba1c_within_n_lookahead_days (e84b591)
  • Add outcome (cd39dd6)
  • Add birth year as a predictor (7b186d2)
  • Allow exclusion of specific atc codes (75619a1)

Fix

  • Date of birth col name should respect output prefix (6ec6535)
  • Incorrect column name when adding age as predictor (cdbf25c)
  • Errors in sql loaders after refactor (28c9f63)
  • Correct type hinting in load_diagnoses (f2d5c5b)

Documentation

  • Specify that n_rows = None returns all rows. (a4720a8)

Performance

  • Shuffle feature specs to even out compute vs. IO load (0db9f0f)
  • Tweak n_workers for more performance (3eeee4d)
  • Segment feature loading for more parallelisation (9ee5c87)
  • Rotate feature addition for debugging (76af9c7)
  • Parallelise temporal predictor loading (8d53f16)
  • Only create one subprocess per values loader (1a3e5de)
  • Parallelise groupspec combination creation (9ccba2a)

v0.9.0 (2022-11-18)

Feature

  • At groupspec init, iterate over values_loader and check that they exist in the loader registry (04dfd7e)

Fix

  • More explanation in error message (b784991)
  • Better valueerror message formatting (7b3b994)
  • Better valueerror message (d92f798)
  • Find invalid loaders (ba2d4c5)

v0.8.0 (2022-11-17)

Feature

  • Allow load_medications to concat a list of medications (d78f465)

Fix

  • Remove original functions (da59110)

Documentation

v0.7.0 (2022-11-16)

Feature

  • Full run (142212f)
  • Rename resolve_multiple registry keys to their previous one (3fd3f35)
  • Reimplement (c99585f)
  • Use lru cache decorator for values_df loading (4006818)
  • Add support for loader kwargs (127f821)
  • Move values_df resolution to _AnySpec object (714e83f)
  • Make date of birth output prefix a param (0ed1198)
  • Ensure that dfs are sorted and of same length before concat (84a4d65)
  • Use pandas with set_index for concat (b93290a)
  • Use pandas with set_index for concat (995da41)
  • Speed up dask join by using index (3402281)
  • Require feature name for all features, ensures proper specification (6af454a)
  • First stab at adapting generate_main (7243130)
  • Add exclusion timestamp (b02de1a)
  • Improve dd.concat (429da34)
  • Handle strs for generate_feature_spec (7d54488)
  • Convert to dd before concat (06101d8)
  • Add n hba1c (3780d84)
  • Add n hba1c (614245e)

Fix

  • Coerce by default (60adb99)
  • Output_col_name_override applied at loading, not flattening (95a96ce)
  • Typo (01240ed)
  • Incorrect attribute addressing (a6e82b5)
  • Correctly resolve values_df (def67cd)
  • MinGroupSpec should take a sequence of names to permute over (f0c8140)
  • Typo (61c7241)
  • Remove resolve_multiple_fn_name (617d386)
  • Old concat resulted in wrong ordering of rows. (3759f71)
  • Set hba1c as eval (89fe6d2)
  • Typos (6eac440)
  • Correct col name inference for static predictors (dfe5dc7)
  • Misc. fixes (45f8348)
  • Generate the correct amount of combinations when creating specs (c472b3c)
  • Typo resulted in cache breaking (fdd47d7)
  • Correct col naming (bc74ae3)
  • Do not infer feature name from values_df (150569f)
  • Misc. errors found from tests (3a1b5db)
  • Revert flattened dataset to use specs (e4fada7)
  • Misc. errors after introducing feature specs (0308eca)
  • Correctly merge dataframes (a907885)
  • Cache error because of loss of UUID (89d7f6f)
  • New bugs in resolve_multiple (5714a39)
  • Rename outcomespec appropriately (41fa220)
  • Lookbehind_days must be iterable (cc879e9)

Documentation

Performance

  • Move pd->dd into subprocesses (dc5f38d)

v0.6.3 (2022-10-18)

Fix

  • Remove shak_code + operator check (f97aee8)

v0.6.2 (2022-10-17)

Fix

  • Ignore cat_features (2052505)
  • Failing test (f8190b4)
  • Incorrect 'latest' and handling of NaN in cache (dc33f7e)

v0.6.1 (2022-10-13)

Fix

  • Check for value column prediction_times_df (5356464)
  • Change variable name (990a848)
  • More flex loaders (bcad700)

v0.6.0 (2022-10-13)

Feature

  • Use wandb to monitor script errors (67ae9b9)

Fix

  • Duplicate loading when pre_loading dfs (7f864dc)

v0.5.2 (2022-10-12)

Fix

v0.5.1 (2022-10-10)

Fix

  • Change_per_day functions (d696389)
  • Change_per_day function (4c8c118)

v0.5.0 (2022-10-10)

Feature

  • Add variance to resolve multiple functions (8c471df)

Fix

  • Add variance resolve multiple (7a64c5b)

v0.4.4 (2022-10-10)

Fix

  • Deleted_irritating_blank_space (a4cdfc5)

v0.4.3 (2022-10-10)

Fix

  • Auto inferred cat features (ea0d946)
  • Auto inferred cat features error (f244715)
  • Resolves errors caused from auto cat features (667a905)

v0.4.2 (2022-10-06)

Fix

  • Incorrect function argument (33e0a3e)
  • Expanded test to include outcome, now passes locally (640e7ec)
  • Passing local tests (6ed4b2e)
  • First stab at bug fix (339d793)

v0.4.1 (2022-10-06)

Fix

  • Add parents to wandb dir init (5eefe3a)

v0.4.0 (2022-10-06)

Feature

Fix

  • Refactor feature spec generation (17e9f16)
  • Align arguments with colnames in SQL (09ae5f7)
  • Refactor feature specification (373b0f0)

v0.3.2 (2022-10-05)

Fix

v0.3.1 (2022-10-05)

Fix

  • Mismatched version in .toml (292979b)

v0.3.0 (2022-10-05)

Feature

Fix

  • Pass value_col only when necessary (dc1019f)
  • Pass value_col (4674e4a)
  • Don't remove NaNs, might be informative. (1ad5d81)
  • Remove parquet default argument except in top level functions (ec3a98b)
  • Align .toml and release version (80adbde)
  • Failing tests (b5e4321)
  • Incorrect feature sets path, linting (605ccb7)
  • Handle dicts for duplicate checking (34524c0)
  • Check for duplicates in feature combinations (63ad162)
  • Remove duplicate alat key which prevented file saving (f0c3e00)
  • Incorrect argument (b97d54b)
  • Linting (7406288)
  • Use suffix instead of string parsing (cfa96f0)
  • Refactor dataset loading into a separate function (bca8cbf)
  • More migration to parquet (f1bc2b7)
  • Mark hf embedding test as slow, only run if passing --runslow to pytest (0e03395)

v0.2.4 (2022-10-04)

Fix

  • Wandb not logging on overtaci. (3baab57)

v0.2.3 (2022-10-04)

Fix

  • Use dask for concatenation, increases perf (4235f5c)

v0.2.2 (2022-10-03)

Fix

  • Use pypi release of psycopmlutils (5283b05)

v0.2.1 (2022-10-03)

Fix

v0.2.0 (2022-09-30)

Feature

  • Add test for chunking logic (199ee6b)

Fix

v0.1.0 (2022-09-30)

Feature

Fix

  • Force dtype for windows (2e6e8bf)
  • Linting (5cdfcfa)
  • Pre code-split import statements need to be updated (a9e0639)
  • Misspecified python version in action (fdde2d2)