[WIP] SA2.0, model/data-access edits, unit testing #17551

Closed
wants to merge 39 commits into from


jdavcs
Member

@jdavcs jdavcs commented Feb 27, 2024

Simple model and data-access unit testing. The db is created and loaded once per test session and cleared after each test. Dependent model instances are created recursively as needed, with provided arguments or reasonable defaults.
Draft.
Builds on #17180.

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@jdavcs jdavcs added kind/feature area/database Galaxy's database or data access layer labels Feb 27, 2024
@jdavcs jdavcs added this to the 24.1 milestone Feb 27, 2024
@jdavcs jdavcs force-pushed the dev_data_access branch 4 times, most recently from 776e6d4 to 7041b3a February 28, 2024 04:23
Comment on lines 1184 to 1186
job_id = self.sa_session.get(Job, self.job_id)
assert job_id
return job_id
Member


Just realised that this is actually a job, not a job_id, so the variable name is wrong, sorry.

Member Author


I'll squash all the commits from the previous SA20 PR into one to avoid confusion (this one has to build on that one, which is not yet merged).

@jdavcs jdavcs force-pushed the dev_data_access branch 6 times, most recently from befbbf3 to a3bde5f March 4, 2024 22:49
@jdavcs jdavcs force-pushed the dev_data_access branch 5 times, most recently from 7ea1b84 to d2f8e8b March 8, 2024 20:07
@jdavcs jdavcs changed the title [WIP] Model and data-access unit testing [WIP] SA2.0 upgrades; model/data-access improvements and unit testing Mar 8, 2024
@jdavcs jdavcs changed the title [WIP] SA2.0 upgrades; model/data-access improvements and unit testing [WIP] SA2.0, model/data-access edits, unit testing Mar 8, 2024
Upgrade SQLAlchemy to 2.0

This conflicts with the dependency requirements of graphene-sqlalchemy
(used only in the toolshed's new WIP client).

Remove RemovedIn20Warning from config

This does not exist in SQLAlchemy 2.0

Update import path for DeclarativeMeta

Move declaration of injected attrs into constructor
Remove unused import

For context: https://github.com/galaxyproject/galaxy/pull/14717/files#r1486979280

Also, remove model attr type hints that conflict with SA2.0

Apply Mapped/mapped_column to model definitions

Included models: galaxy, tool shed, tool shed install
Column types:
- DateTime
- Integer
- Boolean
- Unicode
- String (Text/TEXT/TrimmedString/VARCHAR)
- UUID
- Numeric

NOTE on typing of nullability: db schema != python app

- Mapped[datetime] specifies the correct type for the python app;
- nullable=True specifies the correct mapping to the db schema (that's what
  the CREATE TABLE sql statement will reflect).

mapped_column.nullable takes precedence over the nullability derived from
the Mapped annotation. So, if we have:

foo: Mapped[str] = mapped_column(String, nullable=True)

- the foo db field will allow NULL, but the python app (as checked by mypy)
  will not allow foo = None. And vice versa:

bar: Mapped[Optional[str]] = mapped_column(String, nullable=False)

- the bar db field is NOT NULL, but bar = None type-checks in the python app.

This might need to be applied to other column definitions, but for now
this addresses specific mypy errors.

Ref: https://docs.sqlalchemy.org/en/20/orm/declarative_tables.html#mapped-column-derives-the-datatype-and-nullability-from-the-mapped-annotation

Add typing to JSON columns, fix related mypy errors

Columns:
MutableJSONType
JSONType
DoubleEncodedJsonType

TODO: I think we need a type alias for JSON-typed columns: bytes supports
iteration, but not access by key.

Use correct type hints to define common model attrs

Start applying Mapped to relationship definitions in the model

Remove column declaration from HasTags parent class

Fix SA2.0 error: wrap sql in text()

Fix SA2.0 error: pass bind to create_all
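A minimal sketch of the fix with a hypothetical `job` table: since SQLAlchemy 2.0 removed bound `MetaData`, the engine (or a Connection) is passed to `create_all()` explicitly.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, inspect

# SA 2.0: MetaData is never bound to an engine, so create_all() needs a bind.
metadata = MetaData()
Table(
    "job",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("tool_id", String(255)),
)

engine = create_engine("sqlite://")
metadata.create_all(bind=engine)  # an Engine or a Connection works as bind

table_names = inspect(engine).get_table_names()
```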

Fix SA2.0 error: use Row._mapping for keyed attribute access

Ref: https://docs.sqlalchemy.org/en/20/changelog/migration_20.html#result-rows-act-like-named-tuples
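A short sketch of the row access styles in 2.0: rows behave like named tuples (attribute and integer index access), and dict-style access by key goes through `Row._mapping`. The column names are illustrative.

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")
with engine.connect() as conn:
    row = conn.execute(text("SELECT 1 AS id, 'cat1' AS tool_id")).one()

# Named-tuple style access.
by_attr = row.tool_id
by_index = row[1]
# Keyed (dict-style) access goes through Row._mapping.
by_key = row._mapping["tool_id"]
as_dict = dict(row._mapping)
```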

Fix SA2.0 error: show password in url

SA 1.4: str(url) renders connection string with password
SA 2.0: str(url) renders connection string WITHOUT password
Solution: Use render_as_string(hide_password=False)

Fix SA2.0 error: use attribute_keyed_dict

Replaces attribute_mapped_collection (SA20)

Fix SA2.0 error: make select stmt a subquery

Rename variable to fix mypy

Fix SA2.0 error: explicitly use subquery() for select-from argument

Fix SA2.0 error: replace session.bind with session.get_bind()

Fix SA2.0 error: joinedload does not take str args

Fix use of table model attribute

- Use __table__ (SA attr) instead of table (galaxy attr) on mapped classes
- Drop .table and .table.c where redundant

Fix bug: fix HistoryAudit model

It is not a child of RepresentById because it does not and should not have an id attr.

Duplicating the __repr__ definition in the HistoryAudit class is a
temporary fix: a proper fix requires changing all models (id and
__repr__ should be split into 2 mixins): to be done in a follow-up PR.

Fix bug: check if template.fields is not null before iterating

Fix bug: call unique() on result, not select stmt

Fix bug: do not pass subquery to in_

Fix bug/typo: use select_from

Fix bug: if using alias on ORM entity, use __table__ as valid FromClause

Fix bug: HDAH model is not serializable (caught by mypy)

Fix typing error: migrations.base

Fix typing error: managers.secured

This fixed 58 mypy errors!

Fix typing error: session type

Fix typing error: use Session instead of scoped_session

No need to pass around scoped_session as arguments

Fix typing error: sharable

Fix SA2.0 error: sqlalchemy exceptions import; minor mypy fix

Mypy: type-ignore: this is never SessionlessContext

Mypy: use verbose assignment to help mypy

Mypy: add assert stmt

Mypy: add assert to ensure session is not None

Calling that method when a User obj is not attached to a session should
not happen.

Mypy: return 0 if no results

Mypy: type-ignore: scoped_session vs. install_model_session

We use the distinction for DI.

Mypy: refactor to one-liner

Mypy: add assert stmts where we know session returns an object

Mypy: rename wfi_step > wfi_step_sq when it becomes a subquery

Job search refactor: factor out build_job_subquery

Job search refactor: build_stmt_for_hda

Job search refactor: build_stmt_for_ldda

Job search refactor: build_stmt_for_hdca

Job search refactor: build_stmt_for_dce

Job search refactor: rename query >> stmt

Mypy: add anno for Lists; type-ignore for HDAs

Note: type-ignore is due to imperative mapping of HDAs (and LDDAs). This
will be removed once we map those models declaratively.

Mypy: managers.histories

Mypy: model.deferred

Mypy: arg passed to template can be None

Mypy: celery tasks

type ignore arg: we need to map DatasetInstance classes declaratively
for that to work correctly.

Mypy: type-ignore hda attr-defined error

Need to map declaratively to remove this

Convert visualization manager index query to SA Core

Mypy: session is not none

Mypy: type-ignore what requires more refactoring

Mypy: type-ignore hda, ldda attrs: need declarative mapping

Also, minor SA2.0 syntax fix

Mypy: type-ignores to handle late evaluation of relationship arguments

Mypy: type-ignore column property assignments (type is correct)

Mypy: typing errors, misc. fixes

Mypy: all statements are reachable

Mypy: need to map hda declaratively, then its parent is model.Base

Fix typing errors: sharable, secured

Fix package mypy errors

Fix SA2.0 error: celery task

1. In 2.0, when the statement contains "returning", the result type is
   ChunkedIteratorResult, which does not have the rowcount attr,
   because:
2. result.rowcount should not be used for statements containing the returning clause.

Ref: https://docs.sqlalchemy.org/en/20/core/connections.html#sqlalchemy.engine.CursorResult.rowcount

Wrap call to ensure session is closed

Otherwise there's an idle transaction left in the database (+locks)
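A sketch of why the wrapping matters, with a hypothetical `load_registry` function standing in for the wrapped call: in 2.0 the first `execute()` autobegins a transaction that stays open (an "idle in transaction" connection on PostgreSQL) until the session is committed or closed, and a `with Session(...)` block guarantees `close()`.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session

engine = create_engine("sqlite://")

# Without closing: autobegin leaves a transaction open after the query.
session = Session(engine)
session.execute(text("SELECT 1"))
open_before_close = session.in_transaction()
session.close()
open_after_close = session.in_transaction()


def load_registry(engine):
    """Hypothetical loader: the with-block calls Session.close() on exit."""
    with Session(engine) as session:
        rows = session.execute(text("SELECT 1 AS id")).all()
    return rows, session.in_transaction()


rows, still_open = load_registry(engine)
```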

Ensure session is closed on TS Registry load

Same as prev. commit: otherwise db locks are left

Fix SA2.0 error: list arg to select; mypy

Use NullPool for sqlite engines

This restores the behavior under SQLAlchemy 1.4
(Note that we set the pool for sqlite only if it's not an in-memory db.)
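A minimal sketch of that rule in a hypothetical `build_engine` helper (not Galaxy's actual engine factory). SQLAlchemy 1.4 defaulted to `NullPool` for file-based SQLite, while 2.0 defaults to `QueuePool`. An in-memory db is excluded because, under `NullPool`, every new connection would open a fresh, empty database.

```python
from typing import Any, Dict

from sqlalchemy import create_engine
from sqlalchemy.engine import Engine, make_url
from sqlalchemy.pool import NullPool


def build_engine(url: str) -> Engine:
    """Hypothetical helper: NullPool for file-based SQLite, defaults otherwise."""
    parsed = make_url(url)
    kwargs: Dict[str, Any] = {}
    in_memory = parsed.database in (None, "", ":memory:")
    if parsed.get_backend_name() == "sqlite" and not in_memory:
        kwargs["poolclass"] = NullPool  # restore the SA 1.4 default
    return create_engine(url, **kwargs)


# create_engine() is lazy: no file is created until a connection is made.
file_engine = build_engine("sqlite:///galaxy.sqlite")
memory_engine = build_engine("sqlite://")
```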

Help mypy: job is never None
This tested SQLAlchemy:
1. create: dataset, job
2. set: dataset.job = job
3. save, then load dataset by dataset.id; verify that dataset.job = job
Tested SQLAlchemy:
- create job
- set job.tool_id
- save, load job by id, verify job.tool_id
Tests SQLAlchemy:

1. Create tag
2. Verify there's no FooTagAssociation with this tag
3. Add new FooTagAssociation to Foo and save to db
4. Verify there is one FooTagAssociation with this tag
No assert (only TODO); but would test SQLAlchemy
Tests SQLAlchemy, user-history relationship mapping
Tests SQLAlchemy (create foo + set foo.bar + load foo + check bar)
mypy bug workaround no longer needed as we are no longer specifying a metaclass
HasTable was a hack to accommodate declarative mapping without changing
thousands of lines that referred to the `table` attribute of model
instances.

The one type:ignore added to managers.datasets can be removed after we
map DatasetInstance models declaratively.
Uncovered by adding missing type hints to model
Drop syntax that can be inferred from Mapped[] type hint:
- basic datatypes
- mapped_column() expression where the only argument is the datatype

Ref: https://docs.sqlalchemy.org/en/20/changelog/whatsnew_20.html#migrating-an-existing-mapping