Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
refactor: Add
ruff
rules, improve type annotations, improve ci perf…
…ormance (#3431) * ci: Add additional `ruff` rules to `pyproject.toml` Only highly autofixable groups from [#3430](#3430). Aiming to refine further, depending on how much manual fixing this generates. * fix: add item to `pyproject.toml` to silence import errors for `pylance|pyright` * ci: add subset of `SIM` to `extend-safe-fixes` * ci: add unfixable `RUF` rules to `ignore` [RUF002](https://docs.astral.sh/ruff/rules/ambiguous-unicode-character-docstring/) may be avoidable during schema generation, so I'm not doing any manual fixes right now. * refactor: apply new `ruff` rules, fix and reformat * test: Skip tests on Win that require a tz database See apache/arrow#36996 * ci: enable `tool.ruff.lint.preview` Testing the effect this has with existing rules, prior to adding `pylint`, `refurb` * fix: replace [F841](https://docs.astral.sh/ruff/rules/unused-variable/) violations with dummy variable * ci: add additional `preview` fixes to `extend-safe-fixes` * refactor: apply `preview` fixes for existing `ruff` rules, reformat * ci: add `preview` category `FURB` rules Almost all have autofixes, splitting out from [pylint](https://docs.astral.sh/ruff/rules/#pylint-pl) which is much larger in scope + fewer fixes * refactor: apply `FURB` rule fixes, manually fix `FURB101/3` * fix: Revert newer fstring syntax, not available to `sphinx` * ci: add fixable `pylint` rules Across the 4 `PL.` categories, there are over 50 rules, with a mix of fixable, preview. Tried to be picky with what is added, because adding even 1 of the groups would generate a lot of manual fixes. * ci: add `pylint` fixes to `extend-safe-fixes` * refactor: apply `pylint` rule fixes, add an inline optimization for `_selection` `param_kwds` in `_selection` triggered `PLR6201`. Instead rewrote as a dictcomp. * fix: Recover comments lost during linting #3431 (comment) * fix: Replace sources of `RUF002` violations * fix: manual fix `RUF002` in `api` * ci: add `PTH` rules `flake8-use-pathlib` rules all require manual fixes. > Found 70 errors. <details> <summary>Details</summary> ```md 20 PTH118 `os.path.join()` should be replaced by `Path` with `/` operator 12 PTH120 `os.path.dirname()` should be replaced by `Path.parent` 11 PTH100 `os.path.abspath()` should be replaced by `Path.resolve()` 11 PTH123 `open()` should be replaced by `Path.open()` 7 PTH110 `os.path.exists()` should be replaced by `Path.exists()` 6 PTH107 `os.remove()` should be replaced by `Path.unlink()` 3 PTH103 `os.makedirs()` should be replaced by `Path.mkdir(parents=True)` ``` * refactor: Manually fix `PTH` rule violations * fix: Use safer `inspect.getattr_static` in `update_init_file` Avoids [triggering](https://docs.python.org/3/library/inspect.html#inspect.getattr_static) on objects like `datum`, `expr` * fix: Use correct f-string `!s` modifier * fix: Resolve `doc:build-html` error ```log Extension error (sphinxext.altairgallery): Handler <function main at 0x7f0873471ab0> for event 'builder-inited' threw an exception (exception: unsupported operand type(s) for +: 'PosixPath' and 'str') ``` * fix: Resolve utf-8 error for [emoji example](https://altair-viz.github.io/gallery/isotype_emoji.html) * Update sphinxext/altairgallery.py Co-authored-by: Stefan Binder <binder_stefan@outlook.com> * build: bump `docbuild.yml` python `3.10` -> `3.12` See @binste [comment](#3431 (comment)) * style: Simplify `PluginRegistry` repr #3431 (comment) * revert: ignore `suppressible-exception` rule in `pyproject.toml` #3431 (comment) * refactor: remove need for exception handling in `utils._vegafusion_data.get_inline_tables` I'm happy to revert back to the original `try-catch`, but think it was less both less efficient and understandable. If I've understood correctly: - Every time this was called, all tables would be iterated over - On the first call, tables processed by the `vegafusion_data_transformer` would popped and returned - On all calls, other tables would be ignored but always trigger exceptions - On any call other than the first, every table triggers an exception This behaviour is easily reduced using [set methods](https://docs.python.org/3/library/stdtypes.html#set) directly, as an empty `set` would produce an empty `dict` during the `dictcomp`. #3431 (comment) * refactor: remove `python<3.3` compat code in `utils.core.use_signature` Was replaced with `contextlib.suppress` in a3fd77a but targets long-unsupported python versions. For future reference, there are also related `stdlib` functions that may have not been available when this was written: - [inspect.getdoc](https://docs.python.org/3/library/inspect.html#inspect.getdoc) - [functools.wraps](https://docs.python.org/3/library/functools.html#functools.wraps) Wouldn't cover everything there, but may replace the need for some parts. #3431 (comment) * revert: replace `PLW1514` fixes with "utf-8" The rule remains in `pyproject.toml`, but is no longer autofixed with undesirable `locale.getprefferedencoding(False)`. #3431 (comment) * refactor: minor simplify and sort `__all__` from `generate_schema_wrapper` * docs: `FA100` on most of `tools/`*` The start of a cautious introduction of `from __future__ import annotations`. Excludes `schemaapi.schemaapi.py` for now, but I think that would be safe as well. * docs: `FA100` on `sphinxext/*` * docs: `FA100` on `alt.jupyter` * docs: `FA100` on `alt.expr` * docs: `FA100` on `alt.vegalite`, excluding `v5.api`, `v5.schema/*` - Covers non-autogenerated code. - `api` is being special-cased for now. - Will be a great candidate for these fixes, but there are many simpler modules in `alt.utils` that I'm priotizing in case they highlight issues * docs: `FA100` on private `alt.utils` modules * docs(typing): Add annotations for `sphinxext` - Making use of `FA100` enabled syntax - Eventually would like to improve the performance of this, since `sphinxext` is by far the longest build step - Towards that end, the annotations give a contextual starting point * docs: `FA100` on public `alt.utils` modules, excluding `schemapi` - Also did made some improvements to annotations more broadly, now that the newer syntax is allowed * ci: update `pyproject.toml`, add new granular `tool.ruff.lint.per-file-ignores` setting - So far been making the changes one file at a time, followed by tests - Not encountered any runtime issues - `mypy` has complained occasionally, but has been resolvable * revert: Change `write_file_or_filename` default `encoding` to `None` * docs: `FA100` on `tools.schemapi.schemapi` Also adds some annotations that are now possible without circular import concerns * feat(typing): add `from __future__ import annotations` for each file in `generate_schema_wrapper` Also: - ensure `annotations` excluded from `__all__` - minor standardizing import order - adjust `ruff` rules to correct schemagen * ci(typing): adds `generate-schema-wrapper` script to `hatch` Rewrites and reformats all annotations in `altair/vegalite/v?/schema` - Meaning that the original code stays the same: ``` >>> ruff check altair/vegalite/v?/schema --fix --exit-zero Found 13006 errors (13006 fixed, 0 remaining). >>> ruff format altair/vegalite/v?/schema 4 files reformatted >>> ruff check altair/vegalite/v?/schema --statistics ``` * refactor(typing): Improve annotations and narrowing in `schemapi` - Changes are mainly on `_FromDict.from_dict` - Avoids type errors due to multiple reassignments - Adds a range of overloads that preserve the `SchemaBase` subclass - Not possible in all cases - replacing `cls` with `tp` - when used in an instance method - The former is quite strange to see outside of a `@classmethod` - Especially in a class that has a `@classmethod` - updated `_default_wrapper_classes` to match `_subclasses` return type in `generate_schema_wrapper` * build: run `generate-schema-wrapper` The cumulative effect of `FA100`, `UP`, `TCH` rules applied to `v5.schema`. All tests pass! * revert(typing): Roll back runtime evaluated types not possible on older python `from __future__ import annotations` applies in annotation scope, but not in aliases. Until `altair` requires `python>=3.9-10`, some syntax will need to differ - like many of the changes seen in this commit. # Note The current `ruff` config will only upgrade syntax in annotation scope, until `tool.ruff.target-version` "unlocks" new features. These were changes I made manually, not having tested against all versions - which are being reverted. * fix(typing): Adds overloads and reduce ignore comments relating to `spec_to_mimebundle` The repetition in `utils.save` is unfortunate, but is required to satisfy mypy. The issues are from overloading `spec_to_mimebundle` with `tuple/dict` as each has different indexing constraints. A longer term solution would be to adding an intermediate function to return the nested value. Due to `spec_to_mimebundle`'s use elsewhere operating directly on the full output. * revert(typing): Roll back additional runtime evaluated types for `python=3.8` * fix(typing): Use `...` for `DataTransformerType` Original `pass` keyword was equivalent to a `None` return * docs(typing): `FA100` on `alt.vegalite.v5.api` and manual annotation rewrites ### `api` A unique, but significant, difference to the other commits is the large number of module prefixed annotations that can be shortened. `TopLevelMixin.project` shows the greatest reduction in combined characters for annotations, but there are small improvements throughout. I've made all of these `TYPE_CHECKING`-only imports explicit, to highlight a new path forward this opens up. There are many repeat/similar `Union` types, which can now be declared in another module - without concern for circular import issues. ### `v5.__init__` These changes have also removed the need for both the import of `expr` in `api`, and the `# type: ignore[no-redef]` in `v5.__init__`. * test: add pytest rules `PT` and fix `PT001` violations https://docs.astral.sh/ruff/rules/pytest-fixture-incorrect-parentheses-style/ * test: add ignores for `PT011` on existing tests Probably useful to enforce on new code, https://docs.astral.sh/ruff/rules/pytest-raises-too-broad/ * test: fix `PT018` violation https://docs.astral.sh/ruff/rules/pytest-composite-assertion/ * test: fix `PT006` violations https://docs.astral.sh/ruff/rules/pytest-parametrize-names-wrong-type/ * test: add ignore for `PT012` violation Not sure how this could be rewritten. https://docs.astral.sh/ruff/rules/pytest-raises-with-multiple-statements/ * test: add `pytest.mark.xfail` for flaky `scatter_with_layered_histogram` test Struggling to find the source of the failure, the `mark` is my best guess but currently can't reproduce the following error: ``` ___________________________________________ test_compound_chart_examples[False-scatter_with_layered_histogram.py-all_rows18-all_cols18] ___________________________________________ [gw2] win32 -- Python 3.8.19 C:\Users\*\AppData\Local\hatch\env\virtual\altair\CXM7NV9I\hatch-test.py3.8\Scripts\python.exe filename = 'scatter_with_layered_histogram.py', all_rows = [2, 17], all_cols = [['gender'], ['__count']], to_reconstruct = False @pytest.mark.skipif(vf is None, reason="vegafusion not installed") # fmt: off @pytest.mark.parametrize("filename,all_rows,all_cols", [ ("errorbars_with_std.py", [10, 10], [["upper_yield"], ["extent_yield"]]), ("candlestick_chart.py", [44, 44], [["low"], ["close"]]), ("co2_concentration.py", [713, 7, 7], [["first_date"], ["scaled_date"], ["end"]]), ("falkensee.py", [2, 38, 38], [["event"], ["population"], ["population"]]), ("heat_lane.py", [10, 10], [["bin_count_start"], ["y2"]]), ("histogram_responsive.py", [20, 20], [["__count"], ["__count"]]), ("histogram_with_a_global_mean_overlay.py", [9, 1], [["__count"], ["mean_IMDB_Rating"]]), ("horizon_graph.py", [20, 20], [["x"], ["ny"]]), ("interactive_cross_highlight.py", [64, 64, 13], [["__count"], ["__count"], ["Major_Genre"]]), ("interval_selection.py", [123, 123], [["price_start"], ["date"]]), ("layered_chart_with_dual_axis.py", [12, 12], [["month_date"], ["average_precipitation"]]), ("layered_heatmap_text.py", [9, 9], [["Cylinders"], ["mean_horsepower"]]), ("multiline_highlight.py", [560, 560], [["price"], ["date"]]), ("multiline_tooltip.py", [300, 300, 300, 0, 300], [["x"], ["y"], ["y"], ["x"], ["x"]]), ("pie_chart_with_labels.py", [6, 6], [["category"], ["value"]]), ("radial_chart.py", [6, 6], [["values"], ["values_start"]]), ("scatter_linked_table.py", [392, 14, 14, 14], [["Year"], ["Year"], ["Year"], ["Year"]]), ("scatter_marginal_hist.py", [34, 150, 27], [["__count"], ["species"], ["__count"]]), ("scatter_with_layered_histogram.py", [2, 17], [["gender"], ["__count"]]), ("scatter_with_minimap.py", [1461, 1461], [["date"], ["date"]]), ("scatter_with_rolling_mean.py", [1461, 1461], [["date"], ["rolling_mean"]]), ("seattle_weather_interactive.py", [1461, 5], [["date"], ["__count"]]), ("select_detail.py", [20, 1000], [["id"], ["x"]]), ("simple_scatter_with_errorbars.py", [5, 5], [["x"], ["upper_ymin"]]), ("stacked_bar_chart_with_text.py", [60, 60], [["site"], ["site"]]), ("us_employment.py", [120, 1, 2], [["month"], ["president"], ["president"]]), ("us_population_pyramid_over_time.py", [19, 38, 19], [["gender"], ["year"], ["gender"]]), ]) # fmt: on @pytest.mark.parametrize("to_reconstruct", [True, False]) def test_compound_chart_examples(filename, all_rows, all_cols, to_reconstruct): source = pkgutil.get_data(examples_methods_syntax.__name__, filename) chart = eval_block(source) if to_reconstruct: # When reconstructing a Chart, Altair uses different classes # then what might have been originally used. See # vega/vegafusion#354 for more info. chart = alt.Chart.from_dict(chart.to_dict()) dfs = chart.transformed_data() if not to_reconstruct: # Only run assert statements if the chart is not reconstructed. Reason # is that for some charts, the original chart contained duplicated datasets # which disappear when reconstructing the chart. assert len(dfs) == len(all_rows) for df, rows, cols in zip(dfs, all_rows, all_cols): > assert len(df) == rows E assert 19 == 17 E + where 19 = len( bin_step_5_age bin_step_5_age_end gender __count\n0 45.0 50.0 M 247\n1 ... 11\n17 30.0 35.0 F 5\n18 70.0 75.0 F 1) tests\test_transformed_data.py:132: AssertionError ``` * ci: tidy up `ruff` section of `pyproject.toml` - removed `SIM` rules from `extend-safe-fixes`, only 1 required it and manual fixes should be fine for new code - Provided links for new config groups/settings #3431 (comment) * perf: adds config `pyproject.toml` for faster build, test runs - `pip` -> `uv` - `pytest` single-core -> logical max https://hatch.pypa.io/latest/how-to/environment/select-installer/#enabling-uv https://pytest-xdist.readthedocs.io/en/latest/distribution.html#running-tests-across-multiple-cpus Should speed up CI, although GitHub Actions only runs on 4 cores iirc * ci: adds `update-init-file` hatch script * ci: adds newer-style `hatch` test config I've attempted to recreate the existing `test` scripts, but haven't quite worked out the `coverage` parts. This provides: matrix python version testing, parallel tests (within a python version), and slightly shorter commands: ```bash >>> hatch run test >>> hatch test >>> hatch run test-coverage >>> hatch test --cover ``` * feat(typing): Use `TYPE_CHECKING` block for `schema` modules - Removes the need for `_Parameter` protocol - Reduce lines of code by more than 5000, by removing prefix for `SchemaBase`, `Parameter` * test: use a more reliable `xfail` condition for flaky test The difficulty in reproducing came from the python debugger disabling `xdist`. * build: Embed extra `ruff` calls in `generate-schema-wrapper` into the python script Hopefully solves: https://github.com/vega/altair/actions/runs/9456437889/job/26048327764?pr=3431 * build: Manually rewrite certain exceptions with flaky autofix These were previously fixed by `EM` rules, but seem to not always occur - leading to CI fail https://github.com/vega/altair/actions/runs/9466550650/job/26078482982#step:8:60 * refactor: remove now-unneeded `PARAMETER_PROTOCOL` * refactor: replace existing references to `typing.Optional` * refactor(typing): rename `T` TypeVar to `TSchemaBase` * feat(typing): Adds dedicated `Optional` alias for `Union[..., UndefinedType]` `typing.Optional` is deprecated and is no longer needed in `altair`. The new alias improves the readability of generated signatures, by reducing noise (repeating `UndefinedType`) and clearly grouping together the options for an argument. * refactor(typing): Remove `UndefinedType` dependency in `api` * refactor(typing): Remove `UndefinedType` dependency in `api` * refactor(typing): Remove annotation scope `UndefinedType` dependency in `api` Aligns `api` with `schema` changes in fbc02e2 * ci: add `W291` to ensure trailing whitespace is autofixed * refactor: define non-relevant attributes closer to imports in `update_init_file` Will make it easier to keep the two in sync, and now they can be copy/pasted - rather than adding an `or attr is _` expression * feat(typing): Adds `_TypeAliasTracer` and reorders `generate_schema_wrapper` to collect unique `Literal` types The next commit will show the changes from build * build: run `generate-schema-wrapper` using `_TypeAliasTracer` This commit reduces the length of each module in `schema` by ~50-70% and provides the each alias in the new `_typing` module. ## Existing art [`pandas`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py#L199) and [`polars`](https://github.com/pola-rs/polars/blob/faf0f061eead985a8b29e4584b619fb5d8647e31/py-polars/polars/dataframe/frame.py#L105) similarly define a large number of `TypeAlias`, `Union`, `Literal` types in a single file. These definitions can be imported anywhere in the project - without concern for circular dependencies. It helps keep signatures consistent, and in many cases, simplifies those signatures such that the user can quickly understand what they represent. ## Future config The generated names are currently: ```py f"{SchemaInfo.title}_T" ``` so as to not conflict with the names defined in `core`/`channels`. I have left this configurable via: ```py _TypeAliasTracer(fmt=...) ``` Personally don't mind what the aliases are called, so if anyone has any ideas please feel free to change. * fix: Add missing `LiteralString` import * test: add `pytest.mark.filterwarnings` for tests that cannot avoid them - `pandas` warnings cannot trivially be avoided, as they need to be compatible with `pandas~=0.25` and are used in examples. - I have been unable to track down what specific calls are triggering the `traitlets` warnings. There seems to be multiple levels of indirection, but the stacklevel reported is external to `altair`. * refactor(typing): Replace `Literal[None]` -> `None` Came across [Literal[None] conversion](https://typing.readthedocs.io/en/latest/spec/literal.html#legal-parameters-for-literal-at-type-check-time) during #3426 (comment) * test: Change `skipif` -> `xfail` for `test_sanitize_pyarrow_table_columns` on `win32` Seems to be a better fit, considering this would fail for `win32` devs not using `pyarrow` but have it installed. This could be removed in the future if installing/confirming install `tzdata` to the correct location was handled during environment activation. Possibly resolves: #3431 (comment) * ci: bump `actions/setup-python`, `python-version`, use `uv` in `lint.yml` I've made quite a few improvements to performance locally like fd20f00, but only `docbuild.yml` uses `hatch` so they haven't reduced CI run time. This is more of a test run on the simplest workflow to identify any issues. * ci: create venv before uv pip install * ci: ensure venv is activated after install * ci: use environment provided by `hatch` * ci: testing `pytest-xdist` in `build` workflow Would be faster if using `uv`/`hatch` instead of `pip` - but I'm not sure how to untangle `pip` from this yet. * test(perf): adds significant parallelism for slow `test_examples` #3431 (comment) * refactor, perf: reduce `sys.path` usage, use dictcomp in `update_init_file` # `sys.path` Mainly modules in `tools`, but also `tests`, `sphinxext` repeat the same pattern of manipulating `sys.path` to find imports. In some cases, I think this wasn't need at all - but others were resolves by adding `tools.__init__.py`. - Aligns with many of the other sub/packages - Removes need for ignoring `E402` - Makes many of the imports in these modules explicit, and therefore easier to follow. # `update_init_file` While rewriting the imports here, I saw an opportunity to simplify gathering relevant attributes. - `attr = getattr(alt, attr_name)` using the keys of `__dict__`, was a slow equivalent of a dictionary comprehension. - `str(Path(alt.__file__).parent)` is a verbose `str(Path.cwd())` - An earlier commit of mine used this instead of `os.path` - This version is much shorter and obvious in what it does * build: adding debug message to help with build failure The previous run worked locally due to `hatch` managing `sys.path` - but not in CI, which doesn't use hatch yet https://github.com/vega/altair/actions/runs/9527747043/job/26264700374 * build: add cwd to debug message * fix: possibly fix cwd not on path * ci: testing invoking script with `-m` for cwd * fix: remove debug code * fix: resolve lint, format, type conflicts following rebase Fixes https://github.com/vega/altair/actions/runs/9581196541/job/26417448238?pr=3431 * DO NOT MERGE - TESTING DOC PERF * revert: undo last commit Intended to discard the changes, but accidentally pushed * refactor: Remove commented out dead code Fixes #3431 (comment) * ci: Remove extra whitespace in `build.yml` Fixes #3431 (comment) --------- Co-authored-by: Stefan Binder <binder_stefan@outlook.com>
- Loading branch information