SDK and CLI for CrateDB Cloud Cluster APIs #81

Open

amotl wants to merge 15 commits into main from amo/cloud-second

Conversation

@amotl (Member) commented Nov 14, 2023

About

Wraps a few croud calls in a different kind of API, mostly for managing clusters. It is in the same spirit as, and also builds upon, GH-73.

Configuration

Configuration works by defining a few environment variables. For convenience, put them within a dotenv file (.env).

CRATEDB_CLOUD_CLUSTER_NAME=Hotzenplotz
CRATEDB_USERNAME='admin'
CRATEDB_PASSWORD='H3IgNXNvQBJM3CiElOiVHuSp6CjXMCiQYhB4I9dLccVHGvvvitPSYr1vTpt4'
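
If the program does not pick up the dotenv file by itself, a couple of lines with python-dotenv will load it. A minimal sketch, assuming the python-dotenv package is installed:

import os

from dotenv import load_dotenv

# Load CRATEDB_CLOUD_CLUSTER_NAME, CRATEDB_USERNAME, and CRATEDB_PASSWORD
# from the `.env` file into the process environment.
load_dotenv()
print(os.environ["CRATEDB_CLOUD_CLUSTER_NAME"])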

Python SDK

Deploy cluster, run database workload, and stop cluster again.

import json

from cratedb_toolkit import ManagedCluster

# Acquire database cluster handle, obtaining cluster identifier
# or name from the user's environment.
cluster = ManagedCluster.from_env().start()

# Run database workload.
cratedb = cluster.get_client_bundle()
results = cratedb.adapter.run_sql("SELECT * from sys.summits LIMIT 2;", records=True)
print(json.dumps(results, indent=2))  # noqa: T201

# Stop cluster again.
cluster.stop()

See also examples/python/cloud_cluster.py and examples/python/cloud_import.py.

CLI interface

Deploy cluster, import data, and query a few samples worth of data.

ctk cluster start
ctk load table "https://github.com/crate/cratedb-datasets/raw/main/machine-learning/timeseries/nab-machine-failure.csv"
ctk shell --command 'SELECT * FROM "nab-machine-failure" LIMIT 10;'

See also examples/shell/cloud_cluster.sh and examples/shell/cloud_import.sh.

References

Backlog

  • Software tests
  • Documentation

Comment on lines 74 to 107
# TODO: `--product-name=crfree` is not always the right choice. ;]
# TODO: How to select CrateDB nightly, like `--version=nightly`?
# TODO: Add more parameters, like `--org-id`, `--channel`, `--unit`, and more.
# TODO: What about `--sudo`?
@amotl (Member, Author) commented Nov 14, 2023

A few spots to be addressed in another iteration.


codecov bot commented Nov 15, 2023

Codecov Report

Attention: Patch coverage is 78.34395% with 136 lines in your changes missing coverage. Please review.

Project coverage is 83.10%. Comparing base (8bbbaff) to head (0eabc99).

Files                                               Patch %   Lines
cratedb_toolkit/testing/testcontainers/util.py      31.25%    22 Missing ⚠️
cratedb_toolkit/api/main.py                         85.03%    19 Missing ⚠️
cratedb_toolkit/cluster/croud.py                    76.71%    17 Missing ⚠️
cratedb_toolkit/util/runtime.py                     46.87%    17 Missing ⚠️
cratedb_toolkit/util/setting.py                     71.18%    17 Missing ⚠️
cratedb_toolkit/io/croud.py                         83.60%    10 Missing ⚠️
cratedb_toolkit/cluster/cli.py                      65.00%     7 Missing ⚠️
cratedb_toolkit/config.py                           93.18%     3 Missing ⚠️
cratedb_toolkit/io/cli.py                           76.92%     3 Missing ⚠️
cratedb_toolkit/testing/testcontainers/cratedb.py   25.00%     3 Missing ⚠️
... and 9 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #81      +/-   ##
==========================================
+ Coverage   81.17%   83.10%   +1.92%     
==========================================
  Files          73       84      +11     
  Lines        2826     3409     +583     
==========================================
+ Hits         2294     2833     +539     
- Misses        532      576      +44     
Flag       Coverage Δ
influxdb   28.58% <9.10%>   (-6.66%) ⬇️
main       62.56% <44.74%>  (-6.79%) ⬇️
mongodb    47.81% <43.64%>  (+3.01%) ⬆️

Flags with carried forward coverage won't be shown.


Comment on lines +146 to +154
# FIXME: Fix `croud clusters deploy`.
# It yields *two* payloads to stdout, making it
# unusable in JSON-capturing situations.
# The main advantage of the `JSONDecoder` class is that it also provides
# a `.raw_decode` method, which will ignore extra data after the end of the JSON.
# https://stackoverflow.com/a/75168292
payload = wr.invoke()
decoder = json.JSONDecoder()
data = decoder.raw_decode(payload)
@amotl (Member, Author)

Observation

This works around a minor flaw of croud, which emits the JSON payload twice, rendering the output unparseable.

Suggestion

croud should be changed to emit only a single JSON payload to stdout when invoking croud clusters deploy.
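
For illustration, a minimal sketch of the raw_decode workaround, using a made-up duplicated payload: raw_decode() returns an (object, end_index) tuple and simply ignores any trailing data after the first JSON document.

import json

# Made-up example of stdout carrying two JSON payloads back to back.
payload = '{"id": "123", "name": "hotzenplotz"}\n{"status": "SUCCEEDED"}'

decoder = json.JSONDecoder()
data, end = decoder.raw_decode(payload)

print(data)                   # {'id': '123', 'name': 'hotzenplotz'}
print(payload[end:].strip())  # trailing payload that gets ignored: {"status": "SUCCEEDED"}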

Comment on lines 66 to 86
def fix_job_info_table_name(self):
    """
    Adjust the fully qualified table name by adding appropriate quotes.
    Fixes a minor flaw of the upstream API.

    Currently, the API returns `testdrive.pems-1`, but that value cannot be
    used 1:1, because it is not properly quoted.

    So, convert the table name into `"testdrive"."pems-1"` manually, to obtain
    a fully qualified representation.

    FIXME: Remove after upstream has fixed the flaw.
    """
    job_info = self.info
    if "destination" in job_info and "table" in job_info["destination"]:
        table = job_info["destination"]["table"]
        if '"' not in table and "." in table:
            schema, table = table.split(".")
            table = f'"{schema}"."{table}"'
            job_info["destination"]["table"] = table
@amotl (Member, Author)

Observation

This works around another minor upstream flaw.

Suggestion

The API should either return schema and table name as separate attributes (preferred), or quote the value of the existing destination.table attribute, so that it can be reused without further ado.

Example

{
  "destination": {
    "table": "\"testdrive\".\"pems-1\""
  }
}
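
For comparison, a hypothetical response shape using separate attributes, as preferred above:

{
  "destination": {
    "schema": "testdrive",
    "table": "pems-1"
  }
}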

Comment on lines +67 to +97
def configure_croud(no_spinner: bool = None, use_spinner: bool = None):
    """
    Turn off croud's Halo spinner when running in Jupyter Notebooks. It does not work well.

    - https://github.com/ManrajGrover/halo/issues/32
    - https://github.com/manrajgrover/halo/issues/179
    """
    if no_spinner or ((CONFIG.RUNNING_ON_JUPYTER or CONFIG.RUNNING_ON_PYTEST) and not use_spinner):
        mod = types.ModuleType(
            "croud.tools.spinner", "Mocking the croud.tools.spinner module, to turn off the Halo spinner"
        )
        setattr(mod, "HALO", NoopContextManager())  # noqa: B010
        sys.modules["croud.tools.spinner"] = mod


class NoopContextManager:
    """
    For making the Halo progressbar a no-op.
    """

    def __init__(self, *args, **kwargs):
        pass

    def __enter__(self):
        pass

    def __exit__(self, exc_type, exc_value, exc_traceback):
        pass

    def stop(self):
        pass
@amotl (Member, Author) commented Nov 17, 2023

Observation

We had to turn off the Halo spinner currently used by croud, because it did not work well within a Jupyter Notebook environment. We are observing exactly the issues linked above, even though the former has officially been resolved. Apparently, it came back.

Suggestion

Submit a patch to croud so that it only uses interactivity when is_tty() is True, or similar. At the very least, do not start the HALO spinner at module scope, but initialize/configure it at runtime instead.
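
A rough sketch of what such a runtime guard could look like, assuming the regular halo package; the make_spinner name and its placement inside croud are hypothetical.

import sys
from contextlib import nullcontext

from halo import Halo


def make_spinner(text: str = "Processing"):
    """Return an interactive spinner only when stdout is attached to a terminal."""
    if sys.stdout.isatty():
        return Halo(text=text, spinner="dots")
    # Non-interactive environments (Jupyter, pytest, CI) get a no-op
    # context manager, similar to the `NoopContextManager` above.
    return nullcontext()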

amotl force-pushed the amo/cloud-second branch 3 times, most recently from 062ee9b to 7d89748 on November 18, 2023 01:56
Comment on lines +18 to +19
# Log in to CrateDB Cloud.
croud login --idp azuread
@amotl (Member, Author) commented Nov 20, 2023

Observation

There is an alternative, headless way of doing that: croud config show displays the croud configuration in YAML format.

current-profile: cratedb.cloud
default-format: table
profiles:
  cratedb.cloud:
    auth-token: xxxxxxxxxx
    endpoint: https://console.cratedb.cloud
    key: REDACTED
    organization-id: null
    region: _any_
    secret: xxxxxxxxxx

If you fill in the key and secret values, obtained by running croud api-keys create to create an API key [1], operations like croud clusters list will start working without further ado, even after logging out again using croud logout.

It works well. Thanks, @proddata.

Suggestion

Based on those insights, improve the SDK correspondingly, by also accepting environment variables CRATEDB_CLOUD_KEY and CRATEDB_CLOUD_SECRET.

Footnotes

  1. Alternatively, you can obtain an API key on your account page, at the "API keys" section. -- https://console.cratedb.cloud/account/settings
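
A minimal sketch of the suggested environment-variable support; the variable names follow the suggestion above, while the helper itself is hypothetical.

import os


def croud_credentials_from_env():
    """Read CrateDB Cloud API credentials from the environment, if present (hypothetical helper)."""
    key = os.environ.get("CRATEDB_CLOUD_KEY")
    secret = os.environ.get("CRATEDB_CLOUD_SECRET")
    if key and secret:
        return {"key": key, "secret": secret}
    return None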

Comment on lines +110 to +115
cratedb_toolkit.configure(
    runtime_errors="exit",
    settings_accept_cli=True,
    settings_accept_env=True,
    settings_errors="exit",
)
@amotl (Member, Author)

This may eventually be refactored into a context manager.
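
For illustration, one possible shape of such a context manager; the configured name is hypothetical, and whether the previous configuration can be restored afterwards is an open question.

from contextlib import contextmanager

import cratedb_toolkit


@contextmanager
def configured(**settings):
    # Apply the settings for the duration of the block.
    cratedb_toolkit.configure(**settings)
    try:
        yield
    finally:
        # Restoring the previous configuration would go here, if supported.
        pass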

Comment on lines +87 to +94
@pytest.mark.skip(
    "Does not work: Apparently, the 'responses' mockery is not properly activated when evaluating the notebook"
)
@responses.activate
def test_example_cloud_import_notebook(mocker, mock_cloud_cluster_exists):
    """
    Verify the Jupyter Notebook example works.
    """
@amotl (Member, Author) commented Nov 20, 2023

In the meantime, we discovered a solution based on testbook, which lets you execute a notebook on behalf of a pytest test case, while also using corresponding mocking to manage the environment.
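
A sketch of what the testbook-based approach could look like; the notebook path and the patch target are hypothetical.

from testbook import testbook


@testbook("examples/notebook/cloud_import.ipynb", execute=False)
def test_example_cloud_import_notebook(tb):
    # Patch the cloud interaction inside the notebook kernel before executing the cells.
    with tb.patch("cratedb_toolkit.ManagedCluster.start"):
        tb.execute()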

Applying that here will make it possible to actually unskip the test case.

It is just a rough sketch, which barely works.
amotl force-pushed the amo/cloud-second branch 2 times, most recently from da905dc to 8ca895a on July 6, 2024 10:26
amotl added 11 commits July 6, 2024 12:46
- Naming things: Use more appropriate names for variables.
- OO: Add `EnvironmentConfiguration`, to manage information about the
  toolkit environment.
- OO: Add `CloudJob` entity, to manage information about a cloud job,
  and to streamline passing of information.
- OO: Refactor low-level methods to `CloudCluster`.
- Capability to configure error handling based on the environment.
- Capability to conveniently acquire configuration settings from the
  environment, for obtaining the CrateDB Cluster identifier or name.
- Tests: Add `DockerSkippingContainer` and `PytestTestcontainerAdapter`.
  Both skip test execution when the Docker daemon is not running, or not
  available to the environment.
- Examples: Improve UX of `cloud_*.py` example programs.
It can be used to acquire corresponding client handles (adapter, dbapi,
sqlalchemy), in order to communicate with the database.
When not logged in using `croud login`, the "create project" operation
needs an organization identifier.