Feature - columnar performance docs #801

Merged
merged 14 commits into from
Oct 23, 2024
14 changes: 7 additions & 7 deletions docs/user_manual/calculations.md
@@ -659,9 +659,9 @@ Internally, to achieve an optimal regulated tap position, the control algorithm

Given the discrete nature of the finite tap ranges, we use the following search methods to find the next tap position along the exploitation direction.

-| Search method | Description |
-| ------------- | -------------------------------------------------------------------------------------- |
-| linear search | Start with an initial guess and do a local search with step size 1 for each iteration step. |
+| Search method | Description |
+| ------------- | ----------------------------------------------------------------------------------------------- |
+| linear search | Start with an initial guess and do a local search with step size 1 for each iteration step. |
| binary search | Start with a large search region and reduce the search region by half for every iteration step. |
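
The difference between the two search methods in the table can be sketched as follows. This is only an illustration, not the library's internal implementation: `is_ok` is a hypothetical predicate standing in for "the regulated voltage is acceptable at this tap position", and it is assumed to switch from `False` to `True` exactly once along the search direction.

```python
from typing import Callable, Tuple


def linear_search(is_ok: Callable[[int], bool], tap_min: int, tap_max: int) -> Tuple[int, int]:
    """Step through the tap range with step size 1; return (tap, number of evaluations)."""
    evaluations = 0
    for tap in range(tap_min, tap_max + 1):
        evaluations += 1
        if is_ok(tap):
            return tap, evaluations
    return tap_max, evaluations


def binary_search(is_ok: Callable[[int], bool], tap_min: int, tap_max: int) -> Tuple[int, int]:
    """Halve the search region every step; return (tap, number of evaluations)."""
    evaluations = 0
    low, high = tap_min, tap_max
    while low < high:
        mid = (low + high) // 2
        evaluations += 1
        if is_ok(mid):
            high = mid  # the first acceptable tap is at mid or below
        else:
            low = mid + 1  # the first acceptable tap is above mid
    return low, evaluations


# Hypothetical example: taps 0..16, acceptable from tap 11 upwards.
first_ok = lambda tap: tap >= 11
print(linear_search(first_ok, 0, 16))  # (11, 12)
print(binary_search(first_ok, 0, 16))  # (11, 4)
```

In this toy case both methods find the same tap position, but the binary search needs far fewer evaluations when the initial guess is far from the solution.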


@@ -675,9 +675,9 @@ The framework for creating the batches is the same for all types of calculations
For every component, the attributes that can be updated in a batch scenario are mentioned in [Components](components.md).
Examples of batch calculations for timeseries and contingency analysis are given in [Power Flow Example](../examples/Power%20Flow%20Example.ipynb).

-The same method as for single calculations, `calculate_power_flow`, can be used to calculate a number of scenarios in one go.
-To do this, you need to supply an `update_data` argument.
-This argument contains a dictionary of 2D update arrays (one array per component type).
+The same method as for single calculations, {py:class}`power_grid_model.PowerGridModel.calculate_power_flow`, can be used to calculate a number of scenarios in one go.
+To do this, you need to supply an `update_data` keyword argument.
+This keyword argument contains a dictionary of 2D update arrays (one array per component type).

The performance of different batches varies. The power-grid-model automatically makes efficient calculations whenever possible. See the [Performance Guide](performance-guide.md#topology-caching) for ways to make optimal use of the performance optimizations.

@@ -736,7 +736,7 @@ independent_update_data = {'line': line_update}
The batch calculation supports shared memory multi-threading parallel computing.
The common internal states and variables are shared as much as possible to reduce memory usage and avoid copying.

-You can set `threading` parameter in `calculate_power_flow()` or `calculate_state_estimation()` to enable/disable parallel computing.
+You can set the `threading` keyword argument in the `calculate_*` functions (like {py:class}`calculate_power_flow() <power_grid_model.PowerGridModel.calculate_power_flow>`) to enable/disable parallel computing.

- `threading=-1`, use sequential computing (default)
- `threading=0`, use number of threads available from the machine hardware (recommended)
51 changes: 51 additions & 0 deletions docs/user_manual/performance-guide.md
@@ -6,6 +6,57 @@ SPDX-License-Identifier: MPL-2.0

# Guidelines for performance enhancement

The `power-grid-model` is a library that shines in its ability to handle calculations at scale.
It remains performant, even when doing calculations with one or a combination of the following extremes (non-exhaustive):

- Large grids
- Batch calculations with many scenarios
- Many changes in the grid in each scenario

To achieve that high performance, several optimizations are made in the code.
To use those optimizations to their fullest extent, we recommend that users follow the guidelines below.

## Data validity

Many of our optimizations rely on assuming input data validity and the fact that the provided grid is reasonably close to realistic.
Non-convergence, underdetermined equations (singular matrices) or other unexpected behavior may therefore be encountered when the data is not realistic.

To keep the PGM performant, checks on hard physical bounds are offloaded to a separate tool, the [data validator](data-validator.md).
However, these checks are extremely expensive and should therefore not be used in production environments at scale when performance matters.
Instead, we recommend using the data validator specifically for debugging purposes.

```{note}
Some combinations of input data are not forbidden by physics, but still pose unrealistic conditions, e.g. a source with a very low short-circuit power.
These cases may result in unexpected behavior of the calculation core under the optimizations in the code.
Vagueness and case-dependence make it hard to check what can be considered "unrealistic", and the [data validator](data-validator.md) will therefore not catch such cases.
We recommend that users provide reasonably realistic scenarios to prevent these edge cases from happening.
```

## Data format

The data format of input, output and update data can have a big effect on memory and computational cost.

### Input/update data volume

Row-based data (created, e.g., using {py:class}`power_grid_model.initialize_array` in Python) constructs input/update data with all attributes for a given dataset type.
However, many component attributes are optional.
If your use case does not depend on these attributes, this means that a lot of data is needlessly created and initialized.
If you are running on a system on which memory is the bottleneck, using a columnar data format may reduce the memory burden, at the cost of a slight computational overhead during the calculations.
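
As a rough illustration of the memory argument, the sketch below compares the storage needed for row-based data holding every attribute with columnar data holding only the attributes that are actually provided. The field list is a hypothetical, simplified layout chosen for this example; it is not the actual attribute set or memory layout of any power-grid-model component, and real row-based arrays may contain additional padding.

```python
import struct

# Hypothetical attributes of a load-like component and their C types
# ("i" = 32-bit integer, "b" = 8-bit integer, "d" = 64-bit float).
ROW_FIELDS = {
    "id": "i",
    "node": "i",
    "status": "b",
    "type": "b",
    "p_specified": "d",
    "q_specified": "d",
}


def row_based_bytes(n_components: int) -> int:
    """Every row stores all attributes, whether they are used or not."""
    row_size = struct.calcsize("<" + "".join(ROW_FIELDS.values()))
    return n_components * row_size


def columnar_bytes(n_components: int, attributes: list) -> int:
    """Only the listed attributes are stored, one contiguous column each."""
    return n_components * sum(struct.calcsize("<" + ROW_FIELDS[name]) for name in attributes)


# Example: an update that only changes the active power of 1000 components.
print(row_based_bytes(1000))                        # 26000
print(columnar_bytes(1000, ["id", "p_specified"]))  # 12000
```

The saving grows with the number of optional attributes that are left unused.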

### Output data volume

For most use cases, only certain output values are relevant.
For instance, if you are only interested in line loading, outputting all other components and attributes results in unnecessary overhead.
The output data may be a significant, if not the dominant, contributor to memory load, particularly when running batch calculations with many scenarios.
We therefore recommend restricting the output data in such production environments to only the components and attributes that you actually use.
In Python, you can do so with the `output_component_types` keyword argument of the `calculate_*` functions (like {py:class}`power_grid_model.PowerGridModel.calculate_power_flow`).
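
To get a feeling for how quickly batch output grows, consider the back-of-the-envelope estimate below. All numbers are hypothetical (scenario count, component count, attributes per component and 8 bytes per value); they are not measurements of the power-grid-model.

```python
def batch_output_bytes(n_scenarios: int, n_components: int, n_attributes: int, bytes_per_value: int = 8) -> int:
    """Estimate output size: one value per scenario, component and attribute."""
    return n_scenarios * n_components * n_attributes * bytes_per_value


# Hypothetical batch: 10 000 scenarios on a grid with 5 000 lines.
all_attributes = batch_output_bytes(n_scenarios=10_000, n_components=5_000, n_attributes=12)
loading_only = batch_output_bytes(n_scenarios=10_000, n_components=5_000, n_attributes=1)

print(all_attributes / 1e9)  # 4.8 (GB)
print(loading_only / 1e9)    # 0.4 (GB)
```

Under these assumptions, keeping only the line loading reduces the output for this single component type by more than an order of magnitude.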

### Database integration

Most databases store their data in a columnar data format.
When such data is converted to a row-based format, copying, reserving unused memory and cache misses may cause unnecessary memory and computational overhead.
Integration of the PGM with databases using a columnar data format is therefore often not only easier and more natural, but also more performant.

## Batch calculations

Depending on the details of the batch, a number of performance optimizations are possible: