SDK and CLI for CrateDB Cloud Cluster APIs #81

Open

amotl wants to merge 15 commits into main from amo/cloud-second

Conversation

@amotl (Member) commented Nov 14, 2023

About

Wraps a few croud calls in a different kind of API, mostly for managing clusters. It is in the same spirit as, and also builds upon, GH-73.

Configuration

Configuration works by defining a few environment variables. For convenience, put them within a dotenv file (.env).

CRATEDB_CLOUD_CLUSTER_NAME=Hotzenplotz
CRATEDB_USERNAME='admin'
CRATEDB_PASSWORD='H3IgNXNvQBJM3CiElOiVHuSp6CjXMCiQYhB4I9dLccVHGvvvitPSYr1vTpt4'
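
If the program does not pick up the dotenv file by itself, a couple of lines with python-dotenv will load it. A minimal sketch, assuming the python-dotenv package is installed:

import os

from dotenv import load_dotenv

# Load CRATEDB_CLOUD_CLUSTER_NAME, CRATEDB_USERNAME, and CRATEDB_PASSWORD
# from the `.env` file into the process environment.
load_dotenv()
print(os.environ["CRATEDB_CLOUD_CLUSTER_NAME"])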

Python SDK

Deploy cluster, run database workload, and stop cluster again.

import json

from cratedb_toolkit import ManagedCluster

# Acquire database cluster handle, obtaining cluster identifier
# or name from the user's environment.
cluster = ManagedCluster.from_env().start()

# Run database workload.
cratedb = cluster.get_client_bundle()
results = cratedb.adapter.run_sql("SELECT * from sys.summits LIMIT 2;", records=True)
print(json.dumps(results, indent=2))  # noqa: T201

# Stop cluster again.
cluster.stop()

See also examples/python/cloud_cluster.py and examples/python/cloud_import.py.

CLI interface

Deploy cluster, import data, and query a few samples worth of data.

ctk cluster start
ctk load table "https://github.com/crate/cratedb-datasets/raw/main/machine-learning/timeseries/nab-machine-failure.csv"
ctk shell --command 'SELECT * FROM "nab-machine-failure" LIMIT 10;'

See also examples/shell/cloud_cluster.sh and examples/shell/cloud_import.sh.

References

Backlog

  • Software tests
  • Documentation

Comment on lines 74 to 107
# TODO: `--product-name=crfree` is not always the right choice. ;]
# TODO: How to select CrateDB nightly, like `--version=nightly`?
# TODO: Add more parameters, like `--org-id`, `--channel`, `--unit`, and more.
# TODO: What about `--sudo`?
@amotl (Member, Author) commented Nov 14, 2023

A few spots to be addressed in another iteration.


codecov bot commented Nov 15, 2023

Codecov Report

Attention: Patch coverage is 78.34395% with 136 lines in your changes missing coverage. Please review.

Project coverage is 83.10%. Comparing base (8bbbaff) to head (0eabc99).

Files                                               Patch %   Lines
cratedb_toolkit/testing/testcontainers/util.py      31.25%    22 Missing ⚠️
cratedb_toolkit/api/main.py                         85.03%    19 Missing ⚠️
cratedb_toolkit/cluster/croud.py                    76.71%    17 Missing ⚠️
cratedb_toolkit/util/runtime.py                     46.87%    17 Missing ⚠️
cratedb_toolkit/util/setting.py                     71.18%    17 Missing ⚠️
cratedb_toolkit/io/croud.py                         83.60%    10 Missing ⚠️
cratedb_toolkit/cluster/cli.py                      65.00%     7 Missing ⚠️
cratedb_toolkit/config.py                           93.18%     3 Missing ⚠️
cratedb_toolkit/io/cli.py                           76.92%     3 Missing ⚠️
cratedb_toolkit/testing/testcontainers/cratedb.py   25.00%     3 Missing ⚠️
... and 9 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #81      +/-   ##
==========================================
+ Coverage   81.17%   83.10%   +1.92%     
==========================================
  Files          73       84      +11     
  Lines        2826     3409     +583     
==========================================
+ Hits         2294     2833     +539     
- Misses        532      576      +44     
Flag       Coverage Δ
influxdb   28.58% <9.10%>   (-6.66%) ⬇️
main       62.56% <44.74%>  (-6.79%) ⬇️
mongodb    47.81% <43.64%>  (+3.01%) ⬆️

Flags with carried forward coverage won't be shown.


Comment on lines +146 to +154
# FIXME: Fix `croud clusters deploy`.
# It yields *two* payloads to stdout, making it
# unusable in JSON-capturing situations.
# The main advantage of the `JSONDecoder` class is that it also provides
# a `.raw_decode` method, which will ignore extra data after the end of the JSON.
# https://stackoverflow.com/a/75168292
payload = wr.invoke()
decoder = json.JSONDecoder()
data = decoder.raw_decode(payload)
@amotl (Member, Author)

Observation

This works around a minor flaw of croud, which emits the JSON payload twice, rendering the output unparseable.

Suggestion

croud should be changed to emit only a single JSON payload to stdout when invoking croud clusters deploy.
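
For illustration, a minimal sketch of the raw_decode workaround, using a made-up duplicated payload: raw_decode() returns an (object, end_index) tuple and simply ignores any trailing data after the first JSON document.

import json

# Made-up example of stdout carrying two JSON payloads back to back.
payload = '{"id": "123", "name": "hotzenplotz"}\n{"status": "SUCCEEDED"}'

decoder = json.JSONDecoder()
data, end = decoder.raw_decode(payload)

print(data)                   # {'id': '123', 'name': 'hotzenplotz'}
print(payload[end:].strip())  # trailing payload that gets ignored: {"status": "SUCCEEDED"}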

Comment on lines 66 to 86
def fix_job_info_table_name(self):
    """
    Adjust the fully qualified table name by adding appropriate quotes.
    Fixes a minor flaw of the upstream API.

    Currently, the API returns `testdrive.pems-1`, but that value cannot be
    used 1:1, because it is not properly quoted.

    So, convert the table name into `"testdrive"."pems-1"` manually, to obtain
    a fully qualified representation.

    FIXME: Remove after upstream has fixed the flaw.
    """
    job_info = self.info
    if "destination" in job_info and "table" in job_info["destination"]:
        table = job_info["destination"]["table"]
        if '"' not in table and "." in table:
            schema, table = table.split(".")
            table = f'"{schema}"."{table}"'
            job_info["destination"]["table"] = table
@amotl (Member, Author)

Observation

This works around another minor upstream flaw.

Suggestion

The API should either return schema and table name as separate attributes (preferred), or quote the value of the existing destination.table attribute, so that it can be reused without further ado.

Example

{
  "destination": {
    "table": "\"testdrive\".\"pems-1\""
  }
}
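
For comparison, a hypothetical response shape using separate attributes, as preferred above:

{
  "destination": {
    "schema": "testdrive",
    "table": "pems-1"
  }
}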

Comment on lines +67 to +97
def configure_croud(no_spinner: bool = None, use_spinner: bool = None):
    """
    Turn off croud's Halo spinner when running in Jupyter Notebooks. It does not work well.

    - https://github.com/ManrajGrover/halo/issues/32
    - https://github.com/manrajgrover/halo/issues/179
    """
    if no_spinner or ((CONFIG.RUNNING_ON_JUPYTER or CONFIG.RUNNING_ON_PYTEST) and not use_spinner):
        mod = types.ModuleType(
            "croud.tools.spinner", "Mocking the croud.tools.spinner module, to turn off the Halo spinner"
        )
        setattr(mod, "HALO", NoopContextManager())  # noqa: B010
        sys.modules["croud.tools.spinner"] = mod


class NoopContextManager:
    """
    For making the Halo progressbar a no-op.
    """

    def __init__(self, *args, **kwargs):
        pass

    def __enter__(self):
        pass

    def __exit__(self, exc_type, exc_value, exc_traceback):
        pass

    def stop(self):
        pass
@amotl (Member, Author) commented Nov 17, 2023

Observation

We had to turn off the Halo spinner currently used by croud, because it did not work well within a Jupyter Notebook environment. We are observing exactly the issues linked above, even though the former has officially been resolved. Apparently, it came back.

Suggestion

Submit a patch to croud so that it only uses interactivity when is_tty() is True, or similar. At the very least, do not start the HALO spinner at module scope, but initialize/configure it at runtime instead.
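
A rough sketch of what such a runtime guard could look like, assuming the regular halo package; the make_spinner name and its placement inside croud are hypothetical.

import sys
from contextlib import nullcontext

from halo import Halo


def make_spinner(text: str = "Processing"):
    """Return an interactive spinner only when stdout is attached to a terminal."""
    if sys.stdout.isatty():
        return Halo(text=text, spinner="dots")
    # Non-interactive environments (Jupyter, pytest, CI) get a no-op
    # context manager, similar to the `NoopContextManager` above.
    return nullcontext()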

amotl force-pushed the amo/cloud-second branch 3 times, most recently from 062ee9b to 7d89748 on November 18, 2023 01:56
Comment on lines +18 to +19
# Log in to CrateDB Cloud.
croud login --idp azuread
@amotl (Member, Author) commented Nov 20, 2023

Observation

There is an alternative, headless way of doing that: croud config show displays the croud configuration in YAML format.

current-profile: cratedb.cloud
default-format: table
profiles:
  cratedb.cloud:
    auth-token: xxxxxxxxxx
    endpoint: https://console.cratedb.cloud
    key: REDACTED
    organization-id: null
    region: _any_
    secret: xxxxxxxxxx

If you fill in the key and secret values, obtained by running croud api-keys create to create an API key [1], operations like croud clusters list will start working without further ado, even after logging out again using croud logout.

It works well. Thanks, @proddata.

Suggestion

Based on those insights, improve the SDK correspondingly, by also accepting environment variables CRATEDB_CLOUD_KEY and CRATEDB_CLOUD_SECRET.

Footnotes

  1. Alternatively, you can obtain an API key on your account page, at the "API keys" section. -- https://console.cratedb.cloud/account/settings
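
A minimal sketch of the suggested environment-variable support; the variable names follow the suggestion above, while the helper itself is hypothetical.

import os


def croud_credentials_from_env():
    """Read CrateDB Cloud API credentials from the environment, if present (hypothetical helper)."""
    key = os.environ.get("CRATEDB_CLOUD_KEY")
    secret = os.environ.get("CRATEDB_CLOUD_SECRET")
    if key and secret:
        return {"key": key, "secret": secret}
    return None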

Comment on lines +110 to +115
cratedb_toolkit.configure(
    runtime_errors="exit",
    settings_accept_cli=True,
    settings_accept_env=True,
    settings_errors="exit",
)
@amotl (Member, Author)

This may eventually be refactored into a context manager.
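
For illustration, one possible shape of such a context manager; the configured name is hypothetical, and whether the previous configuration can be restored afterwards is an open question.

from contextlib import contextmanager

import cratedb_toolkit


@contextmanager
def configured(**settings):
    # Apply the settings for the duration of the block.
    cratedb_toolkit.configure(**settings)
    try:
        yield
    finally:
        # Restoring the previous configuration would go here, if supported.
        pass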

Comment on lines +87 to +94
@pytest.mark.skip(
    "Does not work: Apparently, the 'responses' mockery is not properly activated when evaluating the notebook"
)
@responses.activate
def test_example_cloud_import_notebook(mocker, mock_cloud_cluster_exists):
    """
    Verify the Jupyter Notebook example works.
    """
@amotl (Member, Author) commented Nov 20, 2023

In the meantime, we discovered a solution based on testbook, which lets you execute a notebook on behalf of a pytest test case, while also using corresponding mocking to manage the environment.
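
A sketch of what the testbook-based approach could look like; the notebook path and the patch target are hypothetical.

from testbook import testbook


@testbook("examples/notebook/cloud_import.ipynb", execute=False)
def test_example_cloud_import_notebook(tb):
    # Patch the cloud interaction inside the notebook kernel before executing the cells.
    with tb.patch("cratedb_toolkit.ManagedCluster.start"):
        tb.execute()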

Applying that here will make it possible to actually unskip the test case.

It is just a rough sketch, which barely works.
amotl force-pushed the amo/cloud-second branch 2 times, most recently from da905dc to 8ca895a on July 6, 2024 10:26
amotl added 11 commits July 6, 2024 12:46
- Naming things: Use more appropriate names for variables.
- OO: Add `EnvironmentConfiguration`, to manage information about the
  toolkit environment.
- OO: Add `CloudJob` entity, to manage information about a cloud job,
  and to streamline passing of information.
- OO: Refactor low-level methods to `CloudCluster`.
- Capability to configure error handling based on the environment.
- Capability to conveniently acquire configuration settings from the
  environment, for obtaining the CrateDB Cluster identifier or name.
- Tests: Add `DockerSkippingContainer` and `PytestTestcontainerAdapter`.
  Both skip test execution when the Docker daemon is not running, or not
  available to the environment.
- Examples: Improve UX of `cloud_*.py` example programs.
It can be used to acquire corresponding client handles (adapter, dbapi,
sqlalchemy), in order to communicate with the database.
When not logged in using `croud login`, the "create project" operation
needs an organization identifier.