Feat/35 implement regex matching #44

Merged: 20 commits, Jan 29, 2024

PiaSchroeder (Collaborator) commented Dec 5, 2023

closes #35

Changes:

  • In config:
    • added regexes
    • removed active_db
  • In db:
    • removed set_db() and get_db() and all references to them
    • added match_db(), a function that matches item names to a database using regex
    • added a db_name parameter to all other functions in db
  • In cache:
    • adjusted JOB_ID_PATTERN and normalize_name() to also work with the ZensusDB and RegioDB item names
  • In helloworld:
    • added db_name to logincheck()
  • In http_helper:
    • adjusted JOB_ID_PATTERN to be more specific
    • added db matching to get_data_from_endpoint()
    • added _get_db_from_user_input() for when matching fails
  • nb/01_Databases.ipynb might be obsolete?
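
To illustrate the idea behind match_db(), here is a minimal sketch of matching an item name against one set of regular expressions per database. The patterns and the database keys below are assumptions made up for this example; the real expressions live in the pystatis config and may look different.

```python
import re

# Hypothetical patterns, for illustration only. They are NOT the
# expressions shipped with pystatis.
DB_PATTERNS: dict[str, list[str]] = {
    "genesis": [r"^\d{5}-\d{4}$"],
    "zensus": [r"^\d{4}[A-Z]-\d{4}$"],
    "regio": [r"^\d{5}-\d{2}-\d{2}-\d$"],
}


def match_db(item: str) -> list[str]:
    """Return the names of all databases whose patterns match the item name."""
    return [
        db_name
        for db_name, patterns in DB_PATTERNS.items()
        if any(re.match(pattern, item) for pattern in patterns)
    ]
```

Returning a list rather than a single name leaves the decision to the caller when no pattern matches or when more than one does, which is the case where a fallback such as _get_db_from_user_input() would come in.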

pmayd (Collaborator) left a comment

I have left a few comments. The main issue I see is that, with this new approach, data can no longer be fetched from non-data endpoints such as catalogue and find unless the user specifies the database name interactively. That might be OK in interactive sessions, but not when the library is used in a script, which is a use case we considered and should support. I am not sure what the best solution is here. We may not need one if no one is going to use the library like that, so it might be OK for the moment. However, I would like a proper solution instead of the mix we have right now. Maybe we could try to integrate results from all supported databases, but that would increase the runtime a lot...

I guess we can live with it for now, but we should try to think of a better approach.
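
The interactive fallback discussed above could look roughly like the following sketch. The function name, the list of database keys, and the injectable `ask` parameter are assumptions for illustration, not the actual _get_db_from_user_input() implementation.

```python
from typing import Callable

# Assumed database keys, for illustration only.
SUPPORTED_DBS = ("genesis", "zensus", "regio")


def get_db_from_user_input(
    candidates: tuple[str, ...] = SUPPORTED_DBS,
    ask: Callable[[str], str] = input,
) -> str:
    """Keep asking until the answer names one of the candidate databases."""
    prompt = f"Choose a database ({', '.join(candidates)}): "
    while True:
        answer = ask(prompt).strip().lower()
        if answer in candidates:
            return answer
```

With the default `ask=input`, a script running without an attached stdin would fail with EOFError at the prompt, and an unattended script with a terminal would simply block. That is the limitation raised in this comment. Passing a different callable keeps the function testable without a terminal.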

pmayd (Collaborator) commented Dec 8, 2023

I just noticed that my objection to the user input is not as strong as I thought because, currently, we only use such endpoints in find. I guess no one is going to use the Find class in a script or to automate something; it will only be used in an interactive session to search for something. Nevertheless, I think the best thing to do would be to specify the database in the Find class and pass it down to the load_data function, the same way we do for logincheck etc. So you can either pass a database or, if none is given, we can try to infer it from the name.

I can implement this idea or leave it to you, @PiaSchroeder. I can also fix the tests if you want.
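
A minimal sketch of the proposal above: an explicitly passed database always wins, and regex matching is only used when none is given. The pattern table, resolve_db(), the load_data() stub, and the simplified Find class are hypothetical stand-ins and do not reproduce the pystatis signatures.

```python
import re
from typing import Optional

# Hypothetical patterns, for illustration only.
_PATTERNS = {
    "genesis": r"^\d{5}-\d{4}$",
    "zensus": r"^\d{4}[A-Z]-\d{4}$",
}


def resolve_db(name: str, db_name: Optional[str] = None) -> str:
    """Prefer an explicit db_name; otherwise infer it from the item name."""
    if db_name is not None:
        return db_name
    matches = [db for db, pattern in _PATTERNS.items() if re.match(pattern, name)]
    if len(matches) != 1:
        raise ValueError(f"Cannot infer a unique database for {name!r}: {matches}")
    return matches[0]


def load_data(endpoint: str, params: dict, db_name: Optional[str] = None) -> dict:
    """Stub standing in for the real request; it only records the chosen database."""
    return {
        "endpoint": endpoint,
        "params": params,
        "db": resolve_db(params["name"], db_name),
    }


class Find:
    """Simplified stand-in: search terms carry no database hint, so db_name is required."""

    def __init__(self, query: str, db_name: str) -> None:
        self.query = query
        self.db_name = db_name

    def run(self) -> dict:
        return load_data("find", {"name": self.query}, db_name=self.db_name)
```

Raising an error instead of prompting keeps this path usable in scripts; an interactive prompt could still be layered on top by catching the ValueError.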

pmayd (Collaborator) commented Dec 8, 2023

I fixed all the tests and updated the branch

@pmayd pmayd force-pushed the feat/35-implement-regex-matching branch from 549bff3 to f897c77 Compare December 8, 2023 22:12
pmayd (Collaborator) commented Dec 8, 2023

I implemented simple support in the Find class for setting the db parameter, so Find uses db_name to avoid asking the user. Everything else should work as before.

pmayd (Collaborator) commented Dec 12, 2023

It should be clearer now what is still open and what is not.

@pmayd pmayd merged commit ceff3ad into dev Jan 29, 2024
9 checks passed
@pmayd pmayd deleted the feat/35-implement-regex-matching branch January 29, 2024 14:13
pmayd added a commit that referenced this pull request Jan 29, 2024
* Implemented regex matching, initial commit

* Added credentials check for cubes and removed all references to set_db()

* Implemented regex matching, initial commit

* Added credentials check for cubes and removed all references to set_db()

* fix tests

* refactoring Find and Result class to work with new database detection logic; because find does not use names like Table and Cube, the user has to specify the database

* fix tests
---------

Co-authored-by: Michael Aydinbas <michael.aydinbas@gmail.com>
Co-authored-by: Michael Aydinbas <michael.aydinbas@new-work.se>
pmayd added a commit that referenced this pull request Feb 20, 2024
* Bump version to next major version #9

* Revert flake8 to ^3.0 for docstrings #9

* add a notebook that shows how to run init_config

* Make dev dependencies optional, update lock and README #9

* Update workflow install --with dev, add matrix poetry version #9

* Fix python and poetry version definition #9

* Fix python and poetry version definition #9

* fix lock file

* update dev dependencies and add python-dotenv to dev

* improve readme

* update readme

* Feat/8 handle multiple databases and users (#20)

* change config module to handle multiple databases

* finalize work on config module to handle multiple databases; significantly reduced lines of code by getting rid of the settings.ini

* add a new db module that serves as a layer between the user and the config. Can set the current active database and get the settings from the config

* simplify config module

* refactor code to implement new config; correct tests

* fix all remaining tests

* fix all text issues

* update notebooks according to latest changes in config

* drop support for Python 3.9 due to pipe operator for types and set supported versions to 3.10 and 3.11

* fix problem with config dir creation during setup

* fix isort

* Improve clear_cache output for full wipe, remove unused import

* Address all non global-related pylint issues #20

* because of complexity get rid of the current support of custom config dir and always use the default config dir under user home.

* fix all tests; get rid of settings.ini and functionality for user to define own config path; pystatis supports only default config path but custom data cache path

* fix all tests; get rid of settings.ini and functionality for user to define own config path; pystatis supports only default config path but custom data cache path

* refactor config module to work with a ConfigParser global config object instead of overwriting the config variable within the functions using global (bad style according to pylint)

* address pylint issues

* fix mypy issues

* fix pylint issues

---------

Co-authored-by: MarcoHuebner <marco_huebner1@gmx.de>

* update README to the latest changes of multi database support

* Added lists of all available statistics and tables

* Feat/10 update and auto deploy sphinx (#27)

* Updated dev-dependencies, added first version of Sphinx documentation, including built html documentation.

* Added Logo, updated theme, updated GitHub workflow, fixed docstrings in cache and cube. Hosting on ReadTheDocs has to be done by Owner/ CorrelAid (but can be requested and triggered that way).

* Updated urllib3 version, but everything <2.0.0 (deprecating `strict`) should be fine...

* Updated poetry as recommended in cachecontrol issue report.

* Fixed black formatting, fixed make docs (is now run by poetry).

* Fixed linting issue, updated packages, updated make docs.

* Updated ReadMe, added developer sphinx documentation, added custom pre-commit hook and changed to hard-coded version in docs, added built documentation to artifacts, #3

* Add deployment workflow, needs Repo updates

* Update dependencies for Sphinx documentation #10

* Remove redundant docu information #10

Render parts of the README.md in the respective .rst files

* Remove unused mdinclude, fix run-test py version, update pre-commit #10

* Fix dependency group for Sphinx workflow #10

* Fix docstring parameter rendering in Sphinx #10

* Fix image rendering by mimicking folder structure #10

* Add comment on warnings related to ext.napoleon #10

* Rename deploy-docs #10

* Fix black format issue in conf.py #10

* Update deploy key, add deploy trigger comment #10

* Update documentation deploy workflow #10

* Switch to matrix.os definition #10

* Fix pull_request target in deploy workflow #10

* Update poetry.lock #10

* Import package version to Sphinx docu #10

* Manually fix black formatting issue #10

* With auto-deploy working, decrease retention days #10

* Update readme and Sphinx header references #10

* Fix deploy to update files on the remote #10

* fix cube functionality: it seems like structure of QEI header part was changed as well as DQA no longer has information about axis so we assume that the order is preserved (#43)

* add jupytext and new nb for presentation

* Feat/35 implement regex matching (#44)

* Implemented regex matching, initial commit

* Added credentials check for cubes and removed all references to set_db()

* Implemented regex matching, initial commit

* Added credentials check for cubes and removed all references to set_db()

* fix tests

* refactoring Find and Result class to work with new database detection logic; because find does not use names like Table and Cube, the user has to specify the database

* fix tests
---------

Co-authored-by: Michael Aydinbas <michael.aydinbas@gmail.com>
Co-authored-by: Michael Aydinbas <michael.aydinbas@new-work.se>

* add presentation nb

* remove presentation nb for now

* Feat/19 improve readability of the table format (#42)

* Reformatting the raw data tables for readability

* Adding comments

* Applied suggested changes and run code formatting

* add tests for Table

---------

Co-authored-by: Michael Aydinbas <michael.aydinbas@new-work.se>

* prepare Table so it can parse data from three different sources

* Added description and examples of Find

* implement parse logic for prettify zensus tables

* fix pylint issues

* edits on Find section

* fixing overwritten changes

* update presentation nb

* add genesis parse code for regio, too, for the moment.

* Feat/34 visualization examples (#48)

* Add 02_Geo_visualization_example.ipynb

* changed '-' to 0 instead of nan --> reproduce Simons result

* new case study in visualization notebook, integration to presentation notebook

* catch NA-values in read_csv and added Auspraegung_Code to table.py to have the unique region identifiers

---------

Co-authored-by: jkrause <jkrause123@users.noreply.github.com>

* final presentation nb and shape data; omit file check in pre-commit

* fixed typo and beautified plots in presentation.ipynb /.py

* add a first workaround for the new Zensus zip content type

* fix all tests; separate Find and Results classes into own modules

* update dependencies

* update README

* set version to 0.2

* remove Cubes from package for now; we no longer support cubes until they are requested

* fix all tests; fix all relevant nb;

* fix pylint issues

* fix mypy issues

* add documentation key

* update changelog

---------

Co-authored-by: MarcoHuebner <marco_huebner1@gmx.de>
Co-authored-by: Pia <45008571+PiaSchroeder@users.noreply.github.com>
Co-authored-by: MarcoHuebner <57489799+MarcoHuebner@users.noreply.github.com>
Co-authored-by: zosiaboro <50183305+zosiaboro@users.noreply.github.com>
Co-authored-by: Zosia Borowska <zofia.anna.borowska@gmail.com>
Co-authored-by: jkrause123 <89632018+jkrause123@users.noreply.github.com>
Co-authored-by: jkrause <jkrause123@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Implement regex matching
2 participants