diff --git a/docs/connect/df.md b/docs/connect/df.md index e1b5454..c484cc5 100644 --- a/docs/connect/df.md +++ b/docs/connect/df.md @@ -2,11 +2,36 @@ (dataframes)= # CrateDB and DataFrame libraries -This documentation section lists DataFrame libraries and frameworks which can -be used together with CrateDB. Hands-on tutorials about them can be found -on the ["connect" section of the CrateDB Guide]. +Data frame libraries and frameworks which can +be used together with CrateDB. +:::::{grid} 1 2 2 2 +:margin: 4 4 0 0 +:padding: 0 +:gutter: 2 + +::::{grid-item-card} {material-outlined}`lightbulb;2em` Tutorials +:link: guide:dataframes +:link-type: ref +Learn how to use CrateDB together with popular open-source data frame +libraries, through hands-on tutorials and code examples. ++++ +{tag-info}`Dask` {tag-info}`pandas` {tag-info}`Polars` +:::: + +::::{grid-item-card} {material-outlined}`read_more;2em` SQLAlchemy +CrateDB's SQLAlchemy dialect implementation provides the fundamental infrastructure +for integrations with Dask, pandas, and Polars. ++++ +[ORM Guides](inv:guide#orm) • +{ref}`ORM Catalog ` +:::: + +::::: + + +(dask)= ## Dask [Dask] is a parallel computing library for analytics with task scheduling. @@ -31,6 +56,7 @@ the Python libraries that you know and love, like NumPy, pandas, and scikit-lear ``` +(pandas)= ## pandas ```{div} @@ -41,11 +67,34 @@ the Python libraries that you know and love, like NumPy, pandas, and scikit-lear [pandas] is a fast, powerful, flexible, and easy to use open source data analysis and manipulation tool, built on top of the Python programming language. +Pandas (stylized as pandas) is a software library written for the Python programming +language for data manipulation and analysis. In particular, it offers data structures +and operations for manipulating numerical tables and time series. + +:::{rubric} Data Model +::: +- Pandas is built around data structures called Series and DataFrames. 
Data for these + collections can be imported from various file formats such as comma-separated values, + JSON, Parquet, SQL database tables or queries, and Microsoft Excel. +- A Series is a 1-dimensional data structure built on top of NumPy's array. +- Pandas includes support for time series, such as the ability to interpolate values + and filter using a range of timestamps. +- By default, a Pandas index is a series of integers ascending from 0, similar to the + indices of Python arrays. However, indices can use any NumPy data type, including + floating point, timestamps, or strings. +- Pandas supports hierarchical indices with multiple values per data point. An index + with this structure, called a "MultiIndex", allows a single DataFrame to represent + multiple dimensions, similar to a pivot table in Microsoft Excel. Each level of a + MultiIndex can be given a unique name. + +::: + ```{div} :style: "clear: both" ``` +(polars)= ## Polars ```{div} @@ -83,7 +132,8 @@ vectorized query engine, it is open source, and written in Rust. community of developers. Everyone is encouraged to add new features and contribute. It is free to use under the MIT license. -**Data formats** +:::{rubric} Data formats +::: Polars supports reading and writing to many common data formats. This allows you to easily integrate Polars into your existing data stack. @@ -101,7 +151,6 @@ This allows you to easily integrate Polars into your existing data stack. 
[Apache Arrow]: https://arrow.apache.org/ -["connect" section of the CrateDB Guide]: inv:guide:*:label#connect [Dask]: https://www.dask.org/ [Dask DataFrames]: https://docs.dask.org/en/latest/dataframe.html [Dask Futures]: https://docs.dask.org/en/latest/futures.html diff --git a/docs/connect/orm.md b/docs/connect/orm.md index afee63d..3e93196 100644 --- a/docs/connect/orm.md +++ b/docs/connect/orm.md @@ -1,9 +1,21 @@ (orm)= # CrateDB and ORM libraries -This documentation section lists ORM libraries and frameworks which can -be used together with CrateDB. Hands-on tutorials about them can be found -on the ["connect" section of the CrateDB Guide]. +ORM libraries and frameworks which can +be used together with CrateDB. + + +::::{card} {material-outlined}`lightbulb;2em` Tutorials +:margin: 0 0 5 5 +:shadow: md +:link: guide:orm +:link-type: ref + +Learn how to use CrateDB together with popular open-source ORM libraries. ++++ +{tag}`ORM` {tag-info}`SQLAlchemy` +:::: + ## SQLAlchemy @@ -16,9 +28,9 @@ on the ["connect" section of the CrateDB Guide]. [SQLAlchemy] is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL. -It plays an important role, because popular Python-based [DataFrame](df.md) +Python-based [DataFrame](df.md) and [ML](../integrate/ml.md) libraries, and a few [ETL](../integrate/etl.md) -frameworks, are using SQLAlchemy as data abstraction library when connecting to +frameworks, use SQLAlchemy as a database adapter library when connecting to [RDBMS]. 
```{div} @@ -26,5 +38,5 @@ frameworks, are using SQLAlchemy as data abstraction library when connecting to ``` -["connect" section of the CrateDB Guide]: inv:guide:*:label#connect [RDBMS]: https://en.wikipedia.org/wiki/RDBMS +[SQLAlchemy]: https://www.sqlalchemy.org/ diff --git a/docs/index.md b/docs/index.md index 8320127..7ef8a2a 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,28 +1,33 @@ (index)= +(catalog)= (drivers)= +(frameworks)= (integrations)= -# CrateDB Drivers and Integrations +# CrateDB Ecosystem Catalog +Database drivers, libraries, frameworks, and applications for CrateDB. -## About CrateDB - -CrateDB is a distributed and scalable open-source SQL database for storing and -analyzing massive amounts of data in near real-time, even with complex queries. -It is PostgreSQL-compatible, and based on Lucene. - -Users are operating CrateDB clusters that store information in the range of -billions of records, and terabytes of data, equally accessible without any -retrieval penalty on data point age. +:::{rubric} About CrateDB +::: +CrateDB is a distributed and scalable open-source SQL database based on Lucene, +with PostgreSQL compatibility. +CrateDB clusters store information in the range of billions of records, and +terabytes of data, and run analytics in near real time, even with complex +queries. +CrateDB can be used for enterprise data warehouse workloads; it +works across clouds and scales with your data. ## Connectivity -This section introduces you to the canonical set of database drivers, client- -and developer-applications, and how to configure them to connect to CrateDB. -Just to name a few, it is about the CrateDB Admin UI, `crash`, `psql`, -DataGrip, and DBeaver applications, the Java/JDBC/Python drivers, the SQLAlchemy -and Flink dialects, and more. +The canonical set of database drivers, client- and developer-applications, and +how to configure them to connect to CrateDB. 
+ +Just to name a few, the sections below are about the CrateDB Admin UI, the +Crash CLI terminal program, connecting with PostgreSQL's psql client, the +DataGrip and DBeaver IDE applications, the Java/JDBC/Python drivers, the +SQLAlchemy and Flink dialects, and more. ::::{grid} 1 2 2 2 :margin: 4 4 0 0 diff --git a/docs/integrate/bi.md b/docs/integrate/bi.md index 5913764..acdd276 100644 --- a/docs/integrate/bi.md +++ b/docs/integrate/bi.md @@ -2,9 +2,21 @@ (bi-tools)= # Business Analytics and Intelligence with CrateDB -This documentation section lists business analytics applications +Business analytics applications and frameworks, which can be used together with CrateDB. +::::{card} {material-outlined}`lightbulb;2em` Tutorials +:margin: 0 0 5 5 +:shadow: md +:link: guide:bi +:link-type: ref + +Guidelines about integrating CrateDB with business analytics and intelligence +software. ++++ +{tag}`BI` {tag}`DataViz` {tag-success}`PowerBI` {tag-success}`Rill` {tag-success}`Tableau` +:::: + (powerbi)= ## Microsoft Power BI @@ -39,7 +51,7 @@ possible to publish your dashboards, in order to share them with others. ```{div} :style: "float: right; margin-left: 0.5em" -[![](https://github.com/rilldata/rill/blob/main/docs/static/img/rill-logo-dark.svg){w=180px}](https://www.rilldata.com/) +[![](https://github.com/rilldata/rill/raw/main/docs/static/img/rill-logo-light.svg){w=180px}](https://www.rilldata.com/) ``` [Rill] is an open-source operational BI framework for effortlessly transforming @@ -57,7 +69,8 @@ This methodology allows for versioning and tracking, thus improving collaboratio on BI projects using code, which is more efficient and scalable than traditional BI tools, also breaking down information and knowledge barriers. 
-**Rill's design principles** +:::{rubric} Rill's design principles +::: - **Feels good to use** – powered by Sveltekit & DuckDB = conversation-fast, not wait-ten-seconds-for-result-set fast @@ -80,16 +93,13 @@ BI tools, also breaking down information and knowledge barriers. ## Tableau ```{div} -:style: "float: right" +:style: "float: right; margin-left: 0.5em" [![](https://upload.wikimedia.org/wikipedia/en/thumb/0/06/Tableau_logo.svg/500px-Tableau_logo.svg.png?20200509180027){w=180px}](https://www.tableau.com/) ``` [Tableau] is a visual business intelligence and analytics software platform. It expresses data by translating drag-and-drop actions into data queries through an intuitive interface. -[Connecting to CrateDB from Tableau with JDBC] and [Using CrateDB with Tableau] will -guide you through the process of setting it up correctly with CrateDB. - ![](https://cratedb.com/hs-fs/hubfs/08-index.png?width=1536&name=08-index.png){h=200px} ```{seealso} @@ -97,7 +107,6 @@ guide you through the process of setting it up correctly with CrateDB. ``` -[Connecting to CrateDB from Tableau with JDBC]: https://cratedb.com/blog/connecting-to-cratedb-from-tableau-with-jdbc [CrateDB and Tableau]: https://cratedb.com/integrations/cratedb-and-tableau [CrateDB and Power BI]: https://cratedb.com/integrations/cratedb-and-power-bi [PostgreSQL ODBC driver]: https://odbc.postgresql.org/ @@ -106,4 +115,3 @@ guide you through the process of setting it up correctly with CrateDB. 
[Power Query PostgreSQL connector]: https://learn.microsoft.com/en-us/power-query/connectors/postgresql [Rill]: https://www.rilldata.com/ [Tableau]: https://www.tableau.com/ -[Using CrateDB with Tableau]: https://community.cratedb.com/t/using-cratedb-with-tableau/1192 diff --git a/docs/integrate/etl.md b/docs/integrate/etl.md index 586c3e0..9b03090 100644 --- a/docs/integrate/etl.md +++ b/docs/integrate/etl.md @@ -1,9 +1,20 @@ (etl)= # ETL with CrateDB -Use ETL / data pipeline applications and frameworks for transferring data in -and out of CrateDB. Corresponding tutorials can be found within the -[CrateDB Guide: Integration Tutorials] section of the documentation. +ETL / data pipeline applications and frameworks for transferring data in +and out of CrateDB. + + +::::{card} {material-outlined}`lightbulb;2em` Tutorials +:margin: 0 0 5 5 +:shadow: md +:link: guide:etl +:link-type: ref + +Learn how to integrate CrateDB with popular ETL frameworks and applications. ++++ +{tag}`Extract, Transform, Load` {tag}`Data I/O, Import/Export` {tag}`ETL` {tag}`ELT` +:::: (apache-airflow)= @@ -11,6 +22,12 @@ and out of CrateDB. Corresponding tutorials can be found within the (astronomer)= ## Apache Airflow / Astronomer +```{div} +:style: "float: right" +[![](https://19927462.fs1.hubspotusercontent-na1.net/hub/19927462/hubfs/Partner%20Logos/392x140/Apache-Airflow-Logo-392x140.png?width=784&height=280&name=Apache-Airflow-Logo-392x140.png){w=180px}](https://airflow.apache.org/) + +[![](https://logowik.com/content/uploads/images/astronomer2824.jpg){w=180px}](https://www.astronomer.io/) +``` [Apache Airflow] is an open source software platform to programmatically author, schedule, and monitor workflows, written in Python. [Astronomer] offers managed Airflow services on the cloud of your choice, in @@ -23,18 +40,15 @@ dynamic pipeline generation and on-demand, code-driven pipeline invocation. Pipeline parametrization is using the powerful Jinja templating engine. 
To extend the system, you can define your own operators and extend libraries to fit the level of abstraction that suits your environment. - ```{div} -:style: "float: right" -[![](https://19927462.fs1.hubspotusercontent-na1.net/hub/19927462/hubfs/Partner%20Logos/392x140/Apache-Airflow-Logo-392x140.png?width=784&height=280&name=Apache-Airflow-Logo-392x140.png){w=180px}](https://airflow.apache.org/) - -[![](https://logowik.com/content/uploads/images/astronomer2824.jpg){w=180px}](https://www.astronomer.io/) +:style: "clear: both" ``` ```{seealso} [CrateDB and Apache Airflow] ``` + :::{dropdown} **Managed Airflow** ```{div} @@ -334,7 +348,6 @@ an SSIS Catalog database to store, run, and manage packages. [CrateDB and Apache Kafka]: https://cratedb.com/integrations/cratedb-and-kafka [CrateDB and Kestra]: https://cratedb.com/integrations/cratedb-and-kestra [CrateDB and Node-RED]: https://cratedb.com/integrations/cratedb-and-node-red -[CrateDB Guide: Integration Tutorials]: inv:guide:*:label#integrate [dbt]: https://www.getdbt.com/ [dbt Cloud]: https://www.getdbt.com/product/dbt-cloud/ [Debezium]: https://debezium.io/ diff --git a/docs/integrate/metrics.md b/docs/integrate/metrics.md index 592a58f..ca3c7d9 100644 --- a/docs/integrate/metrics.md +++ b/docs/integrate/metrics.md @@ -2,9 +2,20 @@ # Monitoring and Metrics with CrateDB Storing metrics data for the long term is a common need in systems monitoring -scenarios. CrateDB offers corresponding integration adapters. Relevant tutorials -can be found within the [CrateDB Guide: Integration Tutorials] section of the -documentation. +scenarios. CrateDB offers corresponding integration adapters. + +::::{card} {material-outlined}`lightbulb;2em` Tutorials +:margin: 0 0 5 5 +:shadow: md +:link: guide:metrics +:link-type: ref + +Learn how to use CrateDB together with popular metrics collection agents, +brokers, and stores. 
++++ +{tag}`Logs` {tag}`Metrics` {tag}`Monitoring` {tag}`Telemetry` {tag-info}`Prometheus` {tag-info}`Telegraf` +:::: + (prometheus)= ## Prometheus @@ -21,8 +32,8 @@ Prometheus collects and stores its metrics as time series data, i.e. metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels. -**Features** - +:::{rubric} Features +::: Prometheus's main features are: - a multi-dimensional data model with time series data identified by metric name and key/value pairs @@ -34,8 +45,8 @@ Prometheus's main features are: - multiple modes of graphing and dashboarding support -**Remote Endpoints and Storage** - +:::{rubric} Remote Endpoints and Storage +::: The [Prometheus remote endpoints and storage] subsystem, based on its [remote write] and [remote read] features, allows to transparently send and receive metric samples. It is primarily intended for long term @@ -75,7 +86,8 @@ events from databases, systems, and IoT sensors. Telegraf is written in Go and compiles into a single binary with no external dependencies, and requires a very minimal memory footprint. -**Overview** +:::{rubric} Overview +::: - **IoT sensors**: Collect critical stateful data (pressure levels, temperature levels, etc.) with popular protocols like MQTT, ModBus, OPC-UA, and Kafka. diff --git a/docs/integrate/ml.md b/docs/integrate/ml.md index 74c0966..7fa4f9b 100644 --- a/docs/integrate/ml.md +++ b/docs/integrate/ml.md @@ -2,9 +2,21 @@ (ml-tools)= # Machine Learning with CrateDB -This documentation section lists machine learning applications and frameworks -which can be used together with CrateDB. Relevant tutorials can be found within -the [CrateDB Guide: Machine Learning Tutorials] section of the documentation. +Machine learning applications and frameworks +which can be used together with CrateDB. 
+ +::::{card} {material-outlined}`lightbulb;2em` Tutorials +:margin: 0 0 5 5 +:shadow: md +:link: guide:ml +:link-type: ref + +Learn how to integrate CrateDB with machine learning frameworks and tools, +for MLOps and Vector database operations. ++++ +{tag}`MLOps` {tag}`Vector Store` {tag}`Embeddings` +{tag}`Hybrid Search` {tag}`LLM` {tag}`RAG` +:::: ## LangChain @@ -86,12 +98,37 @@ of the underlying model architectures and parameters. [![](https://jupyter.org/assets/logos/rectanglelogo-greytext-orangebody-greymoons.svg){w=180px}](https://jupyter.org/) ``` +:::{rubric} scikit-learn +::: +_Machine Learning in Python._ + +- Simple and efficient tools for predictive data analysis +- Accessible to everybody, and reusable in various contexts +- Built on NumPy, SciPy, and matplotlib + +:::{rubric} pandas +::: +_The open source data analysis and manipulation tool._ + +Pandas is a software library written for the Python programming +language for data manipulation and analysis. In particular, it offers data structures +and operations for manipulating numerical tables and time series. + +:::{rubric} Project Jupyter +::: +_Interactive computing across all programming languages._ + +JupyterLab is the latest web-based interactive development environment for notebooks, +code, and data. Its flexible interface allows users to configure and arrange workflows +in data science, scientific computing, computational journalism, and machine learning. +A modular design invites extensions to expand and enrich functionality. 
+ + ```{div} :style: "clear: both" ``` -[CrateDB Guide: Machine Learning Tutorials]: inv:guide:*:label#ml [LangChain]: https://python.langchain.com/ [LangChain adapter for CrateDB]: https://github.com/crate-workbench/langchain [MLflow]: https://mlflow.org/ diff --git a/docs/integrate/visualize.md b/docs/integrate/visualize.md index cb5d010..829a021 100644 --- a/docs/integrate/visualize.md +++ b/docs/integrate/visualize.md @@ -1,9 +1,20 @@ (visualize)= # Visualize data in CrateDB -Use dashboard and other data visualization applications and toolkits for +Dashboard and other data visualization applications and toolkits for visualizing data stored inside CrateDB. +::::{card} {material-outlined}`lightbulb;2em` Tutorials +:margin: 0 0 5 5 +:shadow: md +:link: guide:visualization +:link-type: ref + +Guidelines about data analysis and visualization with CrateDB. ++++ +{tag}`DataViz` {tag}`EDA` {tag}`BI` +:::: + (apache-superset)= (preset)=