Columnar data documentation (Terminology) #783

Merged
merged 12 commits into from
Oct 18, 2024
4 changes: 3 additions & 1 deletion docs/conf.py
@@ -61,7 +61,9 @@
# label references for depth of headers: label name in anchor slug structure
myst_heading_anchors = 4
# execute jupyter notebooks output before building webpage
jupyter_execute_notebooks = "off"
nb_execution_mode = "off"
nb_execution_excludepatterns = ["*/_build/*"]

# Extensions in myst
myst_enable_extensions = [
"dollarmath",
114 changes: 98 additions & 16 deletions docs/user_manual/dataset-terminology.md
@@ -10,23 +10,105 @@ Some terms regarding the data structures are explained here, including the defin

## Data structures

- **Dataset:** Either a single or a batch dataset.
- **SingleDataset:** A data type storing input data (i.e. all elements of all components) for a single scenario.
- **BatchDataset:** A data type storing update and or output data for one or more scenarios. A batch dataset can contain sparse or dense data, depending on the component.
- **DataArray** A data array can be a single or a batch array. It is a numpy structured array.
- **SingleArray** A dictionary where the keys are the component types and the values are one-dimensional structured numpy arrays.
- **BatchArray:** An array of dictionaries where the keys are the component types and the values are two-dimensional structured numpy arrays.
- **DenseBatchArray:** A two-dimensional structured numpy array containing a list of components of the same type for each scenario.
- **SparseBatchArray:** A dictionary with a one-dimensional numpy int64 array and a one-dimensional structured numpy arrays.

### Type of Dataset

The types of `Dataset` include the following: `input`, `update`, `sym_output`, `asym_output`, and `sc_output`:
Exemplery datasets attributes are given in a dataset containing a `line` component.

- **input:** Contains attributes relevant to configuration of grid.
```{mermaid}
graph TD
subgraph Other numpy arrays
IndexPointer
SingleColumn
BatchColumn
end

subgraph Datasets
Dataset --> SingleDataset
Dataset --> BatchDataset
end


click Dataset href "../api_reference/python-api-reference.html#power_grid_model.data_types.Dataset"
click SingleDataset href "../api_reference/python-api-reference.html#power_grid_model.data_types.SingleDataset"
click BatchDataset href "../api_reference/python-api-reference.html#power_grid_model.data_types.BatchDataset"

click IndexPointer href "../api_reference/python-api-reference.html#power_grid_model.data_types.IndexPointer"
click SingleColumn href "../api_reference/python-api-reference.html#power_grid_model.data_types.SingleColumn"
click BatchColumn href "../api_reference/python-api-reference.html#power_grid_model.data_types.BatchColumn"
```

```{mermaid}
graph TD
subgraph Dataset values
ComponentData --> DataArray
ComponentData --> ColumnarData

DataArray --> SingleArray
DataArray --> BatchArray

BatchArray --> DenseBatchArray
BatchArray --> SparseBatchArray

ColumnarData --> SingleColumnarData
ColumnarData --> BatchColumnarData

BatchColumnarData --> DenseBatchColumnarData
BatchColumnarData --> SparseBatchColumnarData
end

click ComponentData href "../api_reference/python-api-reference.html#power_grid_model.data_types.ComponentData"
click DataArray href "../api_reference/python-api-reference.html#power_grid_model.data_types.DataArray"
click ColumnarData href "../api_reference/python-api-reference.html#power_grid_model.data_types.ColumnarData"
click SingleArray href "../api_reference/python-api-reference.html#power_grid_model.data_types.SingleArray"
click BatchArray href "../api_reference/python-api-reference.html#power_grid_model.data_types.BatchArray"
click DenseBatchArray href "../api_reference/python-api-reference.html#power_grid_model.data_types.DenseBatchArray"
click SparseBatchArray href "../api_reference/python-api-reference.html#power_grid_model.data_types.SparseBatchArray"
click SingleColumnarData href "../api_reference/python-api-reference.html#power_grid_model.data_types.SingleColumnarData"
click BatchColumnarData href "../api_reference/python-api-reference.html#power_grid_model.data_types.BatchColumnarData"
click DenseBatchColumnarData href "../api_reference/python-api-reference.html#power_grid_model.data_types.DenseBatchColumnarData"
click SparseBatchColumnarData href "../api_reference/python-api-reference.html#power_grid_model.data_types.SparseBatchColumnarData"

```

- **{py:class}`Dataset <power_grid_model.data_types.Dataset>`:** Either a single or a batch dataset. It is a dictionary with component types (e.g. `line`, `node`) as keys and **ComponentData** as values.
- **{py:class}`SingleDataset <power_grid_model.data_types.SingleDataset>`:** A data type storing input data (i.e. all elements of all components) for a single scenario.
- **{py:class}`BatchDataset <power_grid_model.data_types.BatchDataset>`:** A data type storing update and/or output data for one or more scenarios. A batch dataset can contain sparse or dense data, depending on the component.

- **{py:class}`ComponentData <power_grid_model.data_types.ComponentData>`:** The data corresponding to the component.
- **{py:class}`DataArray <power_grid_model.data_types.DataArray>`:** A data array can be a single or a batch array. It is a numpy structured array.
- **{py:class}`SingleArray <power_grid_model.data_types.SingleArray>`:** A 1D numpy structured array corresponding to a single dataset.
- **{py:class}`BatchArray <power_grid_model.data_types.BatchArray>`:** Multiple batches of data can be represented in sparse or dense forms.
- **{py:class}`DenseBatchArray <power_grid_model.data_types.DenseBatchArray>`:** A 2D structured numpy array containing a list of components of the same type for each scenario.
- **{py:class}`SparseBatchArray <power_grid_model.data_types.SparseBatchArray>`:** A typed dictionary with a 1D numpy array of `IndexPointer` type under the `indptr` key and a `SingleArray` under the `data` key, which holds all components flattened over all batches.
- **{py:class}`ColumnarData <power_grid_model.data_types.ColumnarData>`:** A dictionary of attributes as keys and individual numpy arrays as values.
- **{py:class}`SingleColumnarData <power_grid_model.data_types.SingleColumnarData>`:** A dictionary of attributes as keys and `SingleColumn` as values in a single dataset.
- **{py:class}`BatchColumnarData <power_grid_model.data_types.BatchColumnarData>`:** Multiple batches of data can be represented in sparse or dense forms.
- **{py:class}`DenseBatchColumnarData <power_grid_model.data_types.DenseBatchColumnarData>`:** A dictionary with attributes as keys and 2D/3D numpy arrays of `BatchColumn` type as values in a batch dataset.
- **{py:class}`SparseBatchColumnarData <power_grid_model.data_types.SparseBatchColumnarData>`:** A typed dictionary with a 1D numpy array of `IndexPointer` type under the `indptr` key and `SingleColumn` under the `data` key, which holds all components flattened over all batches.
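
The row-based and columnar forms above can be sketched with plain numpy. This is a minimal illustration, not the library's own construction code: the `line_dtype` below is a hypothetical, simplified dtype, and the attribute values are made up.

```python
import numpy as np

# Hypothetical, simplified dtype for a `line` component (illustration only;
# the library defines its own structured dtypes).
line_dtype = np.dtype(
    [("id", np.int32), ("from_status", np.int8), ("to_status", np.int8)]
)

# SingleArray: 1D structured array, one record per component.
single_array = np.zeros(3, dtype=line_dtype)
single_array["id"] = [4, 5, 6]

# DenseBatchArray: 2D structured array, shape (n_scenarios, n_components).
dense_batch_array = np.zeros((2, 3), dtype=line_dtype)
dense_batch_array["id"] = [4, 5, 6]  # broadcast to every scenario

# SparseBatchArray: index pointer plus all records flattened over scenarios.
sparse_batch_array = {
    "indptr": np.array([0, 1, 3, 3], dtype=np.int64),
    "data": single_array.copy(),
}

# SingleColumnarData: one plain 1D array per attribute.
single_columnar = {
    "id": np.array([4, 5, 6], dtype=np.int32),
    "from_status": np.array([1, 1, 0], dtype=np.int8),
}

# DenseBatchColumnarData: one plain 2D array per attribute.
dense_batch_columnar = {
    "id": np.array([[4, 5, 6], [4, 5, 6]], dtype=np.int32),
    "from_status": np.array([[1, 1, 0], [0, 1, 1]], dtype=np.int8),
}

# A Dataset maps component types to any of the ComponentData forms above.
input_dataset = {"line": single_array}
update_dataset = {"line": dense_batch_columnar}
```

In the row-based forms every record carries all attributes of the dtype, while the columnar forms only hold the attributes that are listed as keys.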

- **{py:class}`IndexPointer <power_grid_model.data_types.IndexPointer>`:** A 1D numpy array of int64 type used to specify sparse batches. It indicates the range of components within each scenario. For example, an index pointer of [0, 1, 3, 3] describes 3 scenarios: the element at index 0 belongs to the 1st scenario, the elements at indices [1, 2] belong to the 2nd scenario, and the 3rd scenario has no elements.
- **{py:class}`SingleColumn <power_grid_model.data_types.SingleColumn>`:** A 1D/2D numpy array of values corresponding to a specific attribute.
- **{py:class}`BatchColumn <power_grid_model.data_types.BatchColumn>`:** A 2D/3D numpy array of values corresponding to a specific attribute.
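
As a rough sketch of how an index pointer can be read, the snippet below splits flattened data into per-scenario slices. It assumes the usual compressed-sparse convention, in which scenario `i` spans `data[indptr[i]:indptr[i + 1]]`; the `data` values are placeholders standing in for flattened component data.

```python
import numpy as np

indptr = np.array([0, 1, 3, 3], dtype=np.int64)
data = np.array([10, 20, 30])  # placeholder for flattened component data

# An index pointer with n + 1 entries describes n scenarios.
n_scenarios = len(indptr) - 1

# Scenario i owns the half-open range [indptr[i], indptr[i + 1]).
per_scenario = [
    data[indptr[i]:indptr[i + 1]].tolist() for i in range(n_scenarios)
]

print(n_scenarios)   # 3
print(per_scenario)  # [[10], [20, 30], []]
```

Under this convention the last entry of the index pointer equals the total number of flattened elements.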

### Dimensions of numpy arrays

The dimensions of the numpy arrays and the interpretation of each dimension are as follows.

| **Data Type** | **1D** |**2D** | **3D** |
|--------------------------|-----------------------------------|-------------------------------------------------------|-------------------------------------------------------------------------------|
| **SingleArray** | Corresponds to a single dataset. | &#10060; | &#10060; |
| **DenseBatchArray** | &#10060; | Batch number $\times$ Component within that batch | &#10060; |
| **SingleColumn** | Component within that batch. | Component within that batch $\times$ Phases &#10024; | &#10060; |
| **BatchColumn** | &#10060; | Batch number $\times$ Component within that batch | Batch number $\times$ Component within that batch $\times$ Phases &#10024; |

```{note}
&#10024; The "Phases" dimension is optional and is available only when the attributes are asymmetric.
```
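
The shapes in the table can be illustrated with plain numpy arrays. The sizes below are arbitrary, and a phase count of 3 is an assumption made for this sketch.

```python
import numpy as np

# Arbitrary sizes for illustration; 3 phases is an assumption of this sketch.
n_scenarios, n_components, n_phases = 4, 2, 3

# SingleColumn: 1D for symmetric attributes, 2D when a phase axis is present.
single_column_sym = np.zeros(n_components)
single_column_asym = np.zeros((n_components, n_phases))

# BatchColumn: the batch (scenario) axis always comes first.
batch_column_sym = np.zeros((n_scenarios, n_components))
batch_column_asym = np.zeros((n_scenarios, n_components, n_phases))

for name, column in [
    ("SingleColumn (sym)", single_column_sym),
    ("SingleColumn (asym)", single_column_asym),
    ("BatchColumn (sym)", batch_column_sym),
    ("BatchColumn (asym)", batch_column_asym),
]:
    print(f"{name}: ndim={column.ndim}, shape={column.shape}")
```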

### Type of Dataset

The types of `Dataset` include the following: `input`, `update`, `sym_output`, `asym_output`, and `sc_output`. They are included under the enum {py:class}`DatasetType <power_grid_model.typing.DatasetType>`.
Exemplary dataset attributes are given for a dataset containing a `line` component.

- **input:** Contains attributes relevant to configuration of grid.
- Example: `id`, `from_node`, `from_status`
- **update:** Contains attributes relevant to multiple scenarios.
- Example: `from_status`,`to_status`
- **sym_output:** Contains attributes relevant to symmetrical steady state output of power flow or state estimation calculation.
- Example: `p_from`, `p_to`