docs: add mdBook based documentation #2

Merged
merged 3 commits into from
Jan 15, 2025
3 changes: 3 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,3 @@
## Problem

## Summary of changes
4 changes: 4 additions & 0 deletions README.md
@@ -1,2 +1,6 @@
# optd
Query Optimizer Service

## Documentation

The [docs](docs/) directory contains high-level documentation of optd and RFCs in the mdBook format.
1 change: 1 addition & 0 deletions docs/.gitignore
@@ -0,0 +1 @@
book
12 changes: 12 additions & 0 deletions docs/README.md
@@ -0,0 +1,12 @@
# Development Documentation

The `src` folder contains the documentation on the optd query optimizer in the `mdBook` format.


To view the documentation locally, you can follow the [`mdbook` installation guide](https://rust-lang.github.io/mdBook/guide/installation.html) to set up the environment. After installing `mdBook`, run the following command from the root of the optd repository:

```shell
mdbook serve --open docs/
```

If you want to edit or add a chapter to the book, start from [SUMMARY.md](./src/SUMMARY.md) which lists a table of contents. For more information, please check out the [mdBook documentation](https://rust-lang.github.io/mdBook/format/index.html).
9 changes: 9 additions & 0 deletions docs/book.toml
@@ -0,0 +1,9 @@
[book]
authors = ["Yuchen Liang"]
language = "en"
multilingual = false
src = "src"
title = "The optd Query Optimizer Documentation"

[output.html]
additional-css = ["custom.css"]
5 changes: 5 additions & 0 deletions docs/custom.css
@@ -0,0 +1,5 @@
.content img {
    margin-left: auto;
    margin-right: auto;
    display: block;
}
16 changes: 16 additions & 0 deletions docs/src/SUMMARY.md
@@ -0,0 +1,16 @@
# Summary

[Overview](./overview.md)

# Architecture

- [Glossary](./architecture/glossary.md)

# Contributor Guide

- [Installation]()

# RFCs

- [Writing an RFC](./rfcs/README.md)
- [RFC-0001: The Core Objects in a Cascades-style Query Optimizer and How to Store Them]()
74 changes: 74 additions & 0 deletions docs/src/architecture/glossary.md
@@ -0,0 +1,74 @@
# Glossary

Definitions in query optimization can get very overloaded. Below is the language optd developers speak.

### Relational operator
A **relational operator** (`RelNode`) describes an operation that can be evaluated to obtain a bag of tuples. In other literature this is also referred to as a query plan. A relational operator can be either logical or physical.

### Scalar operator

A **scalar operator** (`ScalarNode`) describes an operation that can be evaluated to obtain a single value. In other literature this is also referred to as a SQL expression or a row expression.
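
As a rough illustration of the two operator kinds, the sketch below models relational and scalar operators as Rust enums. Only the names `RelNode` and `ScalarNode` come from the glossary; every variant, field, and the `is_logical` helper are hypothetical and do not reflect optd's actual types.

```rust
/// Hypothetical scalar operators: each evaluates to a single value.
#[derive(Debug, Clone, PartialEq)]
enum ScalarNode {
    /// Reference to an input column by position (illustrative).
    ColumnRef(usize),
    /// An integer literal (illustrative).
    Constant(i64),
    /// Equality comparison between two scalar operators.
    Equal(Box<ScalarNode>, Box<ScalarNode>),
}

/// Hypothetical relational operators: each evaluates to a bag of tuples.
#[derive(Debug, Clone, PartialEq)]
enum RelNode {
    // Logical operators describe what to compute.
    LogicalScan { table: String },
    LogicalFilter { child: Box<RelNode>, predicate: ScalarNode },
    // Physical operators describe how to compute it.
    PhysicalTableScan { table: String },
    PhysicalFilter { child: Box<RelNode>, predicate: ScalarNode },
}

impl RelNode {
    /// Returns true if this operator is a logical one.
    fn is_logical(&self) -> bool {
        matches!(
            self,
            RelNode::LogicalScan { .. } | RelNode::LogicalFilter { .. }
        )
    }
}
```

Keeping logical and physical variants in one enum is only one possible layout; separate types per kind would work equally well for this illustration.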

## Cascades

### Expressions

A **logical expression** is a tree/DAG of logical operators.

A **physical expression** is a tree/DAG of physical operators.

### Properties

**Properties** are metadata computed (and sometimes stored) for each node in an expression.
Properties of an expression may be **required** by the original SQL query or **derived** from **physical properties of one of its inputs.**


**Logical properties** describe the structure and content of data returned by an expression.

- Examples: row count, operator type, statistics, whether relational output columns can contain nulls.

**Physical properties** are characteristics of an expression that
impact its layout, presentation, or location, but not its logical content.

- Examples: order and data distribution.
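
The distinction between the two property kinds could be sketched as follows. All type and field names here (`LogicalProps`, `PhysicalProps`, `sort_order`, `satisfies`) are assumptions made for illustration, and the prefix-based ordering check is one common convention rather than optd's confirmed behavior.

```rust
/// Hypothetical logical properties: describe the content of the output.
#[derive(Debug, Clone, PartialEq)]
struct LogicalProps {
    /// Estimated number of rows produced.
    row_count: u64,
    /// Per output column: whether it can contain nulls.
    nullable: Vec<bool>,
}

/// Hypothetical physical properties: describe how the output is presented.
#[derive(Debug, Clone, PartialEq, Default)]
struct PhysicalProps {
    /// Column indices the output is sorted by, most significant first.
    /// An empty vector means "no ordering".
    sort_order: Vec<usize>,
}

impl PhysicalProps {
    /// A delivered property satisfies a required one when the required
    /// ordering is a prefix of the delivered ordering (assumed convention).
    fn satisfies(&self, required: &PhysicalProps) -> bool {
        self.sort_order.starts_with(&required.sort_order)
    }
}
```

Under this convention, output sorted by columns `(0, 1)` satisfies a requirement of sorting by column `0`, while the reverse does not hold.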


### Equivalence

Two logical expressions are equivalent if the logical properties of the two expressions are the same. They should produce the same set of rows and columns.

Two physical expressions are equivalent if their logical and physical properties are the same.

A logical expression with a required physical property is equivalent to a physical expression if the physical expression has the same logical properties and delivers the required physical property.


### Group

A **group** consists of equivalent logical expressions.

A **relational group** consists of logically equivalent logical relational operators.

A **scalar group** consists of logically equivalent logical scalar operators.
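
To make the idea of a group concrete, here is a minimal sketch of a memo-like structure that stores equivalent logical expressions together. The names `Memo`, `GroupId`, `LogicalExpr`, `add_new`, and `add_to_group` are all hypothetical, and the sketch ignores scalar groups and group merging.

```rust
use std::collections::HashMap;

/// Hypothetical group identifier.
type GroupId = u32;

/// Hypothetical logical expression whose children are group ids.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum LogicalExpr {
    Scan(String),
    Join(GroupId, GroupId),
}

/// Hypothetical memo: each group is a list of equivalent expressions.
#[derive(Default)]
struct Memo {
    groups: Vec<Vec<LogicalExpr>>,
    /// Maps every known expression to the group that contains it.
    index: HashMap<LogicalExpr, GroupId>,
}

impl Memo {
    /// Puts `expr` into a fresh group unless it is already known,
    /// and returns the id of the group that contains it.
    fn add_new(&mut self, expr: LogicalExpr) -> GroupId {
        if let Some(&id) = self.index.get(&expr) {
            return id;
        }
        let id = self.groups.len() as GroupId;
        self.groups.push(vec![expr.clone()]);
        self.index.insert(expr, id);
        id
    }

    /// Records `expr` as equivalent to the expressions already in `group`.
    /// Expressions that are already known are left where they are.
    fn add_to_group(&mut self, expr: LogicalExpr, group: GroupId) {
        if !self.index.contains_key(&expr) {
            self.groups[group as usize].push(expr.clone());
            self.index.insert(expr, group);
        }
    }
}
```

Because children are referenced by `GroupId`, `Join(a, b)` and `Join(b, a)` can sit in the same group without duplicating the subtrees underneath them.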

### Rule

A **rule** in Cascades transforms an expression into equivalent expressions. It has the following interface.

```rust
trait Rule {
    /// Checks whether the rule is applicable to the input expression.
    fn check_pattern(expr: Expr) -> bool;
    /// Transforms the expression into one or more equivalent expressions.
    fn transform(expr: Expr) -> Vec<Expr>;
}
```

A **transformation rule** transforms a **part** of the logical expression into logical expressions. This is also called a logical to logical transformation in other systems.

An **implementation rule** transforms a **part** of a logical expression into an equivalent physical expression with physical properties.

In Cascades, you don't need to materialize the entire query tree when applying rules. Instead, you can materialize expressions on demand while leaving unrelated parts of the tree as group identifiers.
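
As an illustration of a transformation rule applied to a partially materialized expression, the sketch below implements the `Rule` interface for join commutativity. The `Expr` type, its variants, and the `JoinCommute` rule are hypothetical stand-ins, and the rule assumes inner joins, for which swapping the inputs preserves the result.

```rust
/// Hypothetical expression type: a child is either materialized
/// or left as a group identifier.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// An unmaterialized child, referenced only by its group id.
    Group(u32),
    Scan(String),
    /// An inner join of two inputs (assumed for this sketch).
    Join(Box<Expr>, Box<Expr>),
}

/// The rule interface from the glossary, repeated so the sketch compiles alone.
trait Rule {
    /// Checks whether the rule is applicable to the input expression.
    fn check_pattern(expr: Expr) -> bool;
    /// Transforms the expression into one or more equivalent expressions.
    fn transform(expr: Expr) -> Vec<Expr>;
}

/// Hypothetical transformation rule: Join(A, B) => Join(B, A).
struct JoinCommute;

impl Rule for JoinCommute {
    fn check_pattern(expr: Expr) -> bool {
        // Only the top-level operator needs to be materialized.
        matches!(expr, Expr::Join(_, _))
    }

    fn transform(expr: Expr) -> Vec<Expr> {
        match expr {
            // The children are moved as-is, so group ids stay unexpanded.
            Expr::Join(left, right) => vec![Expr::Join(right, left)],
            // The rule yields nothing for expressions it does not match.
            _ => vec![],
        }
    }
}
```

The rule only inspects the join at the root, so both inputs can remain `Expr::Group` placeholders and nothing below them has to be expanded.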

Other systems also use physical-to-physical expression transformations for execution-engine-specific optimization, physical property enforcement, or distributed planning. At the moment, we are **not** considering physical-to-physical transformations.

**Enforcer rule:** *TODO!*
5 changes: 5 additions & 0 deletions docs/src/citations.md
@@ -0,0 +1,5 @@
[1] Bailu Ding, Vivek Narasayya and Surajit Chaudhuri (2024), "Extensible Query Optimizers in Practice", Foundations and Trends® in Databases: Vol. 14: No. 3-4, pp 186-402. http://dx.doi.org/10.1561/1900000077

[2] Guido Moerkotte, Pit Fender, and Marius Eich. 2013. On the correct and complete enumeration of the core search space. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD '13). Association for Computing Machinery, New York, NY, USA, 493–504. https://doi.org/10.1145/2463676.2465314

[3] Florian M. Waas and Joseph M. Hellerstein. 2009. Parallelizing extensible query optimizers. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data (SIGMOD '09). Association for Computing Machinery, New York, NY, USA, 871–878. https://doi.org/10.1145/1559845.1559938
13 changes: 13 additions & 0 deletions docs/src/overview.md
@@ -0,0 +1,13 @@
# Introduction

optd is a database query optimizer service. The project is in active development.

## Our Wishlist

- Correct and complete enumeration of the search space.
- An accurate cost model powered with advanced statistics that can differentiate plans under a mix of optimization objectives.
- An efficient search algorithm to navigate the vast search space.
- A persistent storage (cache) of query optimizer state that allows us to reuse past optimizations for future queries.
- An explainable, self-correcting, and human-assisted optimization process by producing and consuming a trail of breadcrumbs that could explain every decision that the optimizer makes.
- An intelligent scheduler that exploits parallelism in modern hardware to boost search performance.
- An extensible operator and rule system.
27 changes: 27 additions & 0 deletions docs/src/rfcs/0000-template.md
@@ -0,0 +1,27 @@
- Feature Name: (fill me in with a unique feature name)
- Authors: (fill me in with the name of the authors)
- Start Date: (fill me in with today's date, YYYY-MM-DD)
- RFC PR: [cmu-db/optd#0000](https://github.com/cmu-db/optd/pull/0000)
- Tracking Issue: [cmu-db/optd#0000](https://github.com/cmu-db/optd/issues/0000)

## Summary

## Motivation

## Non Goals (if relevant)

## Impacted components (e.g. core, memo table, representation, rule engine, etc.)

## Proposed implementation

### Reliability, failure modes and corner cases (if relevant)

### Scalability (if relevant)

### Unresolved questions (if relevant)

## Alternative implementation (if relevant)

## Pros/cons of proposed approaches (if relevant)

## Definition of Done (if relevant)
7 changes: 7 additions & 0 deletions docs/src/rfcs/README.md
@@ -0,0 +1,7 @@
# RFCs

This section contains RFCs for features and technical concepts proposed for integration into the system. Some RFCs may be retroactive and document why certain design decisions were made. Writing RFCs lets us validate our concepts early and keeps peers informed about the design of the codebase. Since it is not our goal to keep these documents up to date, please refer to the Architecture section if you are looking for the most up-to-date description of the system. Read in context, however, these RFCs should still provide useful insight into the reasoning behind certain design decisions.

To write a new RFC, copy `docs/src/rfcs/0000-template.md` to a new file, then edit the header and all the other sections.

For your RFC PR, you should also edit [SUMMARY.md](../SUMMARY.md) to make the RFC visible in the mdBook docs. Make sure your RFC number does not collide with RFCs that other people have written but not yet merged.