- Add logging and choose sfi types (
d5f8e23
) - Create example scripts (
76e063a
) - Initial text model pipelines (
1934db0
) - Add tests (
d7a8bab
) - Initial simple preprocessing pipeline for all sfis (
f941a4d
) - Add include_sfi_name in load_text_split (
4605c88
) - Include_sfi_name arg (
58baf9a
) - Fit and load tfidf, bow, and lda models (
3d33d9b
)
- Preprocess to one regex (
c716653
) - Remove symbols again (
1210b7e
) - Based on HLasse's comments (
32da48f
) - Insert model type in filename (
1457387
) - Add doc strings to preprocessing functions (
4e27650
) - Remove log.info and small fixes (
84f3cc3
) - Ruff fixes (
ea9c564
) - Return vectorizer and matrix + clean-up (
e1c48a0
) - Query string (
cb7424c
) - Naming and doc string update (
141e52a
) - General clean-up and change corpus in fit functions to list (
22b6a9e
) - Change ngram default and clean-up (
387f845
) - Small fixes to logging (
c3a3f53
) - Remove old comments (
4b88514
) - Change view name (
a9bb0fc
) - Move save_text_model_to_dir to utils (
469df3b
) - Move save_text_model_to_dir to utils (
26a80d2
) - Renaming in preprocessing (
c381768
) - Remove stop_words arg and return models (
3d29012
) - Change arg path to path_str (
f781a74
) - Enable multiple splits when loading data + add n_rows arg (
8ae2d2e
) - Remove Path from arg (
29b442b
)
- Add feature descriptions for text features (
84c696a
)
- Add readme link (
217e550
)
- Remove unreasonably high or low bmi values (
07f52c2
)
- Make sql query executable (
e006490
) - Str turned into list of characters instead of list of words (
0fae478
)
- Add unpack args to skema_2_without_nutrition (
95c35c8
)
- Support new pipe annotation (
a1bde17
)
- Correct types (
5cb0d5d
)
- Add skema_2_without_nutrition again (
685c5cb
)
- Cruft github action (
c8f6278
) - Bug in cruft action (
ec8267a
) - Remove psycop-ml-utils, no longer exists (
d8fbb65
)
- Add more glc loaders (
b765e77
) - Add type 1 diabetes loaders (
b682984
) - Make sql loader verbose (
602f4f3
) - Add caching to sql_load (
a68c15d
) - Ibid (
46da732
) - Add support for keeping code col when loading diagnoses (
51ca63e
) - Add t2d diagnosis loading (
6b8231c
) - Add ogtt (
f6c07a9
) - Update current blood sugar measurements (
5e8051a
)
- Lacking prefix on loading glc (
d9bdbcb
) - Inappropriate matching (
e2409ed
) - Poetry formatted dependencies (
125500a
)
- Disable cache (
0242114
)
- Add option for which timestamp to get when loading physical visits (
ef369b8
)
- Drop duplicates in the output_df (
636cc48
) - Don't load duplicate visits (
5028b1d
) - Physical visits should only load physical visits (
b7c50cf
) - Did not rename to timestamp before returning (
f43522c
)
- Loader names still too long (
3321b88
)
- Loader names too long for wandb (
cc14da2
)
- ValueError correction (
595479e
)
- Adjust function for saving integrity checks (
de2577e
) - Restructure overarching description func (
54c24a2
)
- Better function description (
7eb9e54
)
- Add arg for choosing timestamp and add warning (
159a176
)
- Make naming scheme consistent (
c125b48
) - Attempted rename of unspecified df (
c266bd8
) - Revert logic (
ad110ee
) - Quarantine_df and quarantine_days can be left as None (
f130370
)
- Allowed types works again (
dbe75ca
) - All arg names now congruent, visit_types takes a list of visit types instead of string (
e63e9d4
)
- Add text loaders (
9c7d959
)
- Use acute outpatient visits as well (
659af23
) - Typo, and use newest data (
bbbc8f5
) - Use end dates for all contacts (
d8940c1
) - Use end times for all diagnosis loading (
4d9e600
)
- Remove try/except to avoid debugger getting stuck on it (
3884ab8
)
- Move all str operations into the if statement (
91f9174
)
- Move logs next to their dataset (
e0ed033
)
- Improve quarantine docs (
1b23f19
)
- Name wandb project_name-feature-generation (
b601d80
)
- Improve logging in flatten_dataset (
63f252f
) - Enable minimum specifications (
669e3ed
) - Enable minimum specifications (
523cfd1
) - Log rows dropped by PredictionTimeFilterer (
7e02d8e
) - Add moves loader (
0521dd0
) - First stab at loader (
f9048b8
)
- Add pred_time_uuid if not specified when filtering (
acca5b9
)
- Avoid groupby in filter_prediction_times (
a66e361
)
- Add rows dropped logging (
33ba525
) - Allow filtering based on quarantine dates (
3deb052
) - Improve logging - debug to file, info to stdout (
aff10a9
) - Move wandb init earlier so wandb_alerts can cover values_df loading (
6c153b1
) - Generate full feature set (
9ba907a
) - Wrap as much of main as possible in wandb exception (
3b085af
) - Allow timestamps only return from visit loaders for use as pred_times (
f9534e0
) - Migrate some loaders to logging. (
f81fd92
) - More explicit logging (
7969210
) - Init changes (
f257daa
)
- Use lookbehind instead of interval days (
7e14ad5
) - Only one feature cache per project (
cb0b8b0
) - Unused input args (
fa14461
) - Wandb util was missing text kwarg (
64c1729
)
- Infer CPU cores from logical cores (
309e9d2
)
- Add wandb alert on exception (
3ff6e37
)
- Improve create_flattened_dataset docs (
637edfe
) - Misc. docs (
4eac2ba
) - Fix github test badge (
dffeedc
)
- Add n_hba1c_within_n_lookahead_days (
e84b591
) - Add outcome (
cd39dd6
) - Add birth year as a predictor (
7b186d2
) - Allow exclusion of specific atc codes (
75619a1
)
- Date of birth col name should respect output prefix (
6ec6535
) - Incorrect column name when adding age as predictor (
cdbf25c
) - Errors in sql loaders after refactor (
28c9f63
) - Correct type hinting in load_diagnoses (
f2d5c5b
)
- Specify that n_rows = None returns all rows. (
a4720a8
)
- Shuffle feature specs to even out compute vs. IO load (
0db9f0f
) - Tweak n_workers for more performance (
3eeee4d
) - Segment feature loading for more parallelisation (
9ee5c87
) - Rotate feature addition for debugging (
76af9c7
) - Parallelise temporal predictor loading (
8d53f16
) - Only create one subprocess per values loader (
1a3e5de
) - Parallelise groupspec combination creation (
9ccba2a
)
- At groupspec init, iterate over values_loader and check that they exist in the loader registry (
04dfd7e
)
- More explanation in error message (
b784991
) - Better ValueError message formatting (
7b3b994
) - Better ValueError message (
d92f798
) - Find invalid loaders (
ba2d4c5
)
- Allow load_medications to concat a list of medications (
d78f465
)
- Remove original functions (
da59110
)
- Improve docs (
9aad0af
)
- Full run (
142212f
) - Rename resolve_multiple registry keys to their previous one (
3fd3f35
) - Reimplement (
c99585f
) - Use lru cache decorator for values_df loading (
4006818
) - Add support for loader kwargs (
127f821
) - Move values_df resolution to _AnySpec object (
714e83f
) - Make date of birth output prefix a param (
0ed1198
) - Ensure that dfs are sorted and of same length before concat (
84a4d65
) - Use pandas with set_index for concat (
b93290a
) - Use pandas with set_index for concat (
995da41
) - Speed up dask join by using index (
3402281
) - Require feature name for all features, ensures proper specification (
6af454a
) - First stab at adapting generate_main (
7243130
) - Add exclusion timestamp (
b02de1a
) - Improve dd.concat (
429da34
) - Handle strs for generate_feature_spec (
7d54488
) - Convert to dd before concat (
06101d8
) - Add n hba1c (
3780d84
) - Add n hba1c (
614245e
)
- Coerce by default (
60adb99
) - Output_col_name_override applied at loading, not flattening (
95a96ce
) - Typo (
01240ed
) - Incorrect attribute addressing (
a6e82b5
) - Correctly resolve values_df (
def67cd
) - MinGroupSpec should take a sequence of name to permute over (
f0c8140
) - Typo (
61c7241
) - Remove resolve_multiple_fn_name (
617d386
) - Old concat resulted in wrong ordering of rows. (
3759f71
) - Set hba1c as eval (
89fe6d2
) - Typos (
6eac440
) - Correct col name inference for static predictors (
dfe5dc7
) - Misc. fixes (
45f8348
) - Generate the correct amount of combinations when creating specs (
c472b3c
) - Typo resulted in cache breaking (
fdd47d7
) - Correct col naming (
bc74ae3
) - Do not infer feature name from values_df (
150569f
) - Misc. errors found from tests (
3a1b5db
) - Revert flattened dataset to use specs (
e4fada7
) - Misc. errors after introducing feature specs (
0308eca
) - Correctly merge dataframes (
a907885
) - Cache error because of loss of UUID (
89d7f6f
) - New bugs in resolve_multiple (
5714a39
) - Rename outcomespec appropriately (
41fa220
) - Lookbehind_days must be iterable (
cc879e9
)
- Move pd->dd into subprocesses (
dc5f38d
)
- Remove shak_code + operator check (
f97aee8
)
- Ignore cat_features (
2052505
) - Failing test (
f8190b4
) - Incorrect 'latest' and handling of NaN in cache (
dc33f7e
)
- Check for value column prediction_times_df (
5356464
) - Change variable name (
990a848
) - More flex loaders (
bcad700
)
- Use wandb to monitor script errors (
67ae9b9
)
- Duplicate loading when pre_loading dfs (
7f864dc
)
- Add variance to resolve multiple functions (
8c471df
)
- Add variance resolve multiple (
7a64c5b
)
- Delete irritating blank space (
a4cdfc5
)
- Auto inferred cat features (
ea0d946
) - Auto inferred cat features error (
f244715
) - Resolves errors caused from auto cat features (
667a905
)
- Incorrect function argument (
33e0a3e
) - Expanded test to include outcome, now passes locally (
640e7ec
) - Passing local tests (
6ed4b2e
) - First stab at bug fix (
339d793
)
- Add parents to wandb dir init (
5eefe3a
)
- Add BMI loader (
b6681ea
)
- Refactor feature spec generation (
17e9f16
) - Align arguments with colnames in SQL (
09ae5f7
) - Refactor feature specification (
373b0f0
)
- Hardcoded file suffix (
0101acc
)
- Mismatched version in .toml (
292979b
)
- Pass value_col only when necessary (
dc1019f
) - Pass value_col (
4674e4a
) - Don't remove NaNs, might be informative. (
1ad5d81
) - Remove parquet default argument except in top level functions (
ec3a98b
) - Align .toml and release version (
80adbde
) - Failing tests (
b5e4321
) - Incorrect feature sets path, linting (
605ccb7
) - Handle dicts for duplicate checking (
34524c0
) - Check for duplicates in feature combinations (
63ad162
) - Remove duplicate alat key which prevented file saving (
f0c3e00
) - Incorrect argument (
b97d54b
) - Linting (
7406288
) - Use suffix instead of string parsing (
cfa96f0
) - Refactor dataset loading into a separate function (
bca8cbf
) - More migration to parquet (
f1bc2b7
) - Mark hf embedding test as slow, only run if passing --runslow to pytest (
0e03395
)
- Wandb not logging on overtaci. (
3baab57
)
- Use dask for concatenation, increases perf (
4235f5c
)
- Use pypi release of psycopmlutils (
5283b05
)
- First release to pypi (
c29aa3c
)
- Add test for chunking logic (
199ee6b
)