Changelog

v0.23.0 (2023-04-26)

Feature

Add logging and choose sfi types (d5f8e23)
Create example scripts (76e063a)
Initial text model pipelines (1934db0)
Add tests (d7a8bab)
Initial simple preprocessing pipeline for all sfis (f941a4d)
Add include_sfi_name in load_text_split (4605c88)
Include_sfi_name arg (58baf9a)
Fit and load tfidf, bow, and lda models (3d33d9b)

Fix

Preprocess to one regex (c716653)
Remove symbols again (1210b7e)
Based on HLasse's comments (32da48f)
Insert model type in filename (1457387)
Add doc strings to preprocessing functions (4e27650)
Remove log.info and small fixes (84f3cc3)
Ruff fixes (ea9c564)
Return vectorizer and matrix + clean-up (e1c48a0)
Query string (cb7424c)
Naming and doc string update (141e52a)
General clean-up and change corpus in fit functions to list (22b6a9e)
Change ngram default and clean-up (387f845)
Small fixes to logging (c3a3f53)
Remove old comments (4b88514)
Change view name (a9bb0fc)
Move save_text_model_to_dir to utils (469df3b)
Move save_text_model_to_dir to utils (26a80d2)
Renaming in preprocessing (c381768)
Remove stop_words arg and return models (3d29012)
Change arg path to path_str (f781a74)
Enable multiple splits when loading data + add n_rows arg (8ae2d2e)
Remove Path from arg (29b442b)

v0.22.0 (2023-04-24)

Feature

Add feature descriptions for text features (84c696a)

Documentation

Add readme link (217e550)

v0.21.4 (2023-04-04)

Fix

Remove unreasonably high or low bmi values (07f52c2)

v0.21.3 (2023-04-03)

Fix

Make sql query executable (e006490)
Str turned into list of characters instead of list of words (0fae478)

v0.21.2 (2023-03-27)

Fix

Add unpack args to skema 2 wo nutrition (95c35c8)

v0.21.1 (2023-03-22)

Fix

Only keep weights above 0.5 kg (8a5a104)
Do not load invalid weights (7be4653)

v0.21.0 (2023-03-22)

Feature

Support new pipe annotation (a1bde17)

Fix

Correct types (5cb0d5d)

v0.20.3 (2023-03-14)

Fix

Set unpack_to_intervals to default (64391ca)
Remove unintended space (9c6cd33)

v0.20.2 (2023-03-14)

Fix

Add skema_2_without_nutrition again (685c5cb)

v0.20.1 (2023-03-11)

Fix

Cruft github action (c8f6278)
Bug in cruft action (ec8267a)
Remove psycop-ml-utils, no longer exists (d8fbb65)

v0.20.0 (2023-03-09)

Feature

Add more glc loaders (b765e77)
Add type 1 diabetes loaders (b682984)
Make sql loader verbose (602f4f3)
Add caching to sql_load (a68c15d)
Ibid (46da732)
Add support for keeping code col when loading diagnoses (51ca63e)
Add t2d diagnosis loading (6b8231c)
Add ogtt (f6c07a9)
Update current blood sugar measurements (5e8051a)

Fix

Lacking prefix on loading glc (d9bdbcb)
Inappropriate matching (e2409ed)
Poetry formatted dependencies (125500a)

v0.19.2 (2023-03-06)

Fix

Disable cache (0242114)

v0.19.1 (2023-03-06)

Fix

Drop rows with NaT (5a1d908)
Round timestamps to whole seconds before dropping duplicates (e503bf3)

v0.19.0 (2023-03-03)

Feature

Add option for which timestamp to get when loading physical visits (ef369b8)

Fix

Drop duplicates in the output_df (636cc48)
Don't load duplicate visits (5028b1d)
Physical visits should only load physical visits (b7c50cf)
Did not rename to timestamp before returning (f43522c)

v0.18.4 (2023-02-22)

Fix

Loader names still too long (3321b88)

v0.18.3 (2023-02-22)

Fix

Loader names too long for wandb (cc14da2)

v0.18.2 (2023-02-21)

Fix

ValueError correction (595479e)

v0.18.1 (2023-02-15)

Fix

Adjust function for saving integrity checks (de2577e)
Restructure overarching description func (54c24a2)

Documentation

Better function description (7eb9e54)

v0.18.0 (2023-02-14)

Feature

Add arg for choosing timestamp and add warning (159a176)

v0.17.2 (2023-02-13)

Fix

Make naming scheme consistent (c125b48)
Attempted rename of unspecified df (c266bd8)
Revert logic (ad110ee)
Quarantine_df and quarantine_days can be left as None (f130370)

v0.17.1 (2023-02-10)

Fix

Allowed types works again (dbe75ca)
All arg names now congruent, visit_types takes a list of visit types instead of string (e63e9d4)

v0.17.0 (2023-02-09)

Feature

Add text loaders (9c7d959)

v0.16.1 (2023-01-31)

Fix

Use acute outpatient visits as well (659af23)
Typo, and use newest data (bbbc8f5)
Use end dates for all contacts (d8940c1)
Use end times for all diagnosis loading (4d9e600)

v0.16.0 (2023-01-27)

Feature

Remove try/except to avoid debugger getting stuck on it (3884ab8)

Fix

Move all str operations into the if statement (91f9174)

v0.15.0 (2022-12-19)

Feature

Move logs next to their dataset (e0ed033)

Documentation

Improve quarantine docs (1b23f19)

v0.14.0 (2022-12-16)

Feature

Name wandb project_name-feature-generation (b601d80)

v0.13.0 (2022-12-16)

Feature

Improve logging in flatten_dataset (63f252f)
Enable minimum specifications (669e3ed)
Enable minimum specifications (523cfd1)
Log rows dropped by PredictionTimeFilterer (7e02d8e)
Add moves loader (0521dd0)
First stab at loader (f9048b8)

Fix

Add pred_time_uuid if not specified when filtering (acca5b9)

Performance

Avoid groupby in filter_prediction_times (a66e361)

v0.12.0 (2022-12-15)

Feature

Add rows dropped logging (33ba525)
Allow filtering based on quarantine dates (3deb052)
Improve logging - debug to file, info to stdout (aff10a9)
Move wandb init earlier so wandb_alerts can cover values_df loading (6c153b1)
Generate full feature set (9ba907a)
Wrap as much of main as possible in wandb exception (3b085af)
Allow timestamps only return from visit loaders for use as pred_times (f9534e0)
Migrate some loaders to logging. (f81fd92)
More explicit logging (7969210)
Init changes (f257daa)

Fix

Use lookbehind instead of interval days (7e14ad5)
Only one feature cache per project (cb0b8b0)
Unused input args (fa14461)
Wandb util was missing text kwarg (64c1729)

Performance

Infer CPU cores from logical cores (309e9d2)

v0.11.0 (2022-12-13)

Feature

Add wandb alert on exception (3ff6e37)

Documentation

Improve create_flattened_dataset docs (637edfe)
Misc. docs (4eac2ba)
Fix github test badge (dffeedc)

v0.10.0 (2022-11-21)

Feature

Add n_hba1c_within_n_lookahead_days (e84b591)
Add outcome (cd39dd6)
Add birth year as a predictor (7b186d2)
Allow exclusion of specific atc codes (75619a1)

Fix

Date of birth col name should respect output prefix (6ec6535)
Incorrect column name when adding age as predictor (cdbf25c)
Errors in sql loaders after refactor (28c9f63)
Correct type hinting in load_diagnoses (f2d5c5b)

Documentation

Specify that n_rows = None returns all rows. (a4720a8)

Performance

Shuffle feature specs to even out compute vs. IO load (0db9f0f)
Tweak n_workers for more performance (3eeee4d)
Segment feature loading for more parallelisation (9ee5c87)
Rotate feature addition for debugging (76af9c7)
Parallelise temporal predictor loading (8d53f16)
Only create one subprocess per values loader (1a3e5de)
Parallelise groupspec combination creation (9ccba2a)

v0.9.0 (2022-11-18)

Feature

At groupspec init, iterate over values_loader and check that they exist in the loader registry (04dfd7e)

Fix

More explanation in error message (b784991)
Better valueerror message formatting (7b3b994)
Better valueerror message (d92f798)
Find invalid loaders (ba2d4c5)

v0.8.0 (2022-11-17)

Feature

Allow load_medications to concat a list of medications (d78f465)

Fix

Remove original functions (da59110)

Documentation

Improve docs (9aad0af)

v0.7.0 (2022-11-16)

Feature

Full run (142212f)
Rename resolve_multiple registry keys to their previous one (3fd3f35)
Reimplement (c99585f)
Use lru cache decorator for values_df loading (4006818)
Add support for loader kwargs (127f821)
Move values_df resolution to _AnySpec object (714e83f)
Make date of birth output prefix a param (0ed1198)
Ensure that dfs are sorted and of same length before concat (84a4d65)
Use pandas with set_index for concat (b93290a)
Use pandas with set_index for concat (995da41)
Speed up dask join by using index (3402281)
Require feature name for all features, ensures proper specification (6af454a)
First stab at adapting generate_main (7243130)
Add exclusion timestamp (b02de1a)
Improve dd.concat (429da34)
Handle strs for generate_feature_spec (7d54488)
Convert to dd before concat (06101d8)
Add n hba1c (3780d84)
Add n hba1c (614245e)

Fix

Coerce by default (60adb99)
Output_col_name_override applied at loading, not flattening (95a96ce)
Typo (01240ed)
Incorrect attribute addressing (a6e82b5)
Correctly resolve values_df (def67cd)
MinGroupSpec should take a sequence of name to permute over (f0c8140)
Typo (61c7241)
Remove resolve_multiple_fn_name (617d386)
Old concat resulted in wrong ordering of rows. (3759f71)
Set hba1c as eval (89fe6d2)
Typos (6eac440)
Correct col name inference for static predictors (dfe5dc7)
Misc. fixes (45f8348)
Generate the correct amount of combinations when creating specs (c472b3c)
Typo resulted in cache breaking (fdd47d7)
Correct col naming (bc74ae3)
Do not infer feature name from values_df (150569f)
Misc. errors found from tests (3a1b5db)
Revert flattened dataset to use specs (e4fada7)
Misc. errors after introducing feature specs (0308eca)
Correctly merge dataframes (a907885)
Cache error because of loss of UUID (89d7f6f)
New bugs in resolve_multiple (5714a39)
Rename outcomespec appropriately (41fa220)
Lookbehind_days must be iterable (cc879e9)

Documentation

Document feature spec objects (c7f1074)
Typo (6bc7140)

Performance

Move pd->dd into subprocesses (dc5f38d)

v0.6.3 (2022-10-18)

Fix

Remove shak_code + operator check (f97aee8)

v0.6.2 (2022-10-17)

Fix

Ignore cat_features (2052505)
Failing test (f8190b4)
Incorrect 'latest' and handling of NaN in cache (dc33f7e)

v0.6.1 (2022-10-13)

Fix

Check for value column prediction_times_df (5356464)
Change variable name (990a848)
More flex loaders (bcad700)

v0.6.0 (2022-10-13)

Feature

Use wandb to monitor script errors (67ae9b9)

Fix

Duplicate loading when pre_loading dfs (7f864dc)

v0.5.2 (2022-10-12)

Fix

Change_per_day function (bf4f18c)
Change_per_day function (b11bcaa)

v0.5.1 (2022-10-10)

Fix

Change_per_day functions (d696389)
Change_per_day function (4c8c118)

v0.5.0 (2022-10-10)

Feature

Add variance to resolve multiple functions (8c471df)

Fix

Add variance resolve multiple (7a64c5b)

v0.4.4 (2022-10-10)

Fix

Deleted_irritating_blank_space (a4cdfc5)

v0.4.3 (2022-10-10)

Fix

Auto inferred cat features (ea0d946)
Auto inferred cat features error (f244715)
Resolves errors caused from auto cat features (667a905)

v0.4.2 (2022-10-06)

Fix

Incorrect function argument (33e0a3e)
Expanded test to include outcome, now passes locally (640e7ec)
Passing local tests (6ed4b2e)
First stab at bug fix (339d793)

v0.4.1 (2022-10-06)

Fix

Add parents to wandb dir init (5eefe3a)

v0.4.0 (2022-10-06)

Feature

Add BMI loader (b6681ea)

Fix

Refactor feature spec generation (17e9f16)
Align arguments with colnames in SQL (09ae5f7)
Refactor feature specification (373b0f0)

v0.3.2 (2022-10-05)

Fix

Hardcoded file suffix (0101acc)

v0.3.1 (2022-10-05)

Fix

Mismatched version in .toml (292979b)

v0.3.0 (2022-10-05)

Feature

Update PR template (dfbf153)
Migrate to parquet (a027549)
Set ranges for dependencies (e98b2a7)

Fix

Pass value_col only when necessary (dc1019f)
Pass value_col (4674e4a)
Don't remove NaNs, might be informative. (1ad5d81)
Remove parquet default argument except in top level functions (ec3a98b)
Align .toml and release version (80adbde)
Failing tests (b5e4321)
Incorrect feature sets path, linting (605ccb7)
Handle dicts for duplicate checking (34524c0)
Check for duplicates in feature combinations (63ad162)
Remove duplicate alat key which prevented file saving (f0c3e00)
Incorrect argument (b97d54b)
Linting (7406288)
Use suffix instead of string parsing (cfa96f0)
Refactor dataset loading into a separate function (bca8cbf)
More migration to parquet (f1bc2b7)
Mark hf embedding test as slow, only run if passing --runslow to pytest (0e03395)

v0.2.4 (2022-10-04)

Fix

Wandb not logging on overtaci. (3baab57)

v0.2.3 (2022-10-04)

Fix

Use dask for concatenation, increases perf (4235f5c)

v0.2.2 (2022-10-03)

Fix

Use pypi release of psycopmlutils (5283b05)

v0.2.1 (2022-10-03)

Fix

First release to pypi (c29aa3c)

v0.2.0 (2022-09-30)

Feature

Add test for chunking logic (199ee6b)

Fix

Pre-commit edits (94af649)
Remove unnecessary comment (3931395)

v0.1.0 (2022-09-30)

Feature

First release! (95a557c)
Add automatic release (a5023e5)
Update dependencies (34efeaf)
First rename (879bde9)
Init commit (cdcab07)

Fix

Force dtype for windows (2e6e8bf)
Linting (5cdfcfa)
Pre code-split import statements need to be updated (a9e0639)
Misspecified python version in action (fdde2d2)
