Feature - columnar performance docs #801

Merged · 14 commits · Oct 23, 2024
14 changes: 7 additions & 7 deletions docs/user_manual/calculations.md
@@ -659,9 +659,9 @@ Internally, to achieve an optimal regulated tap position, the control algorithm

Given the discrete nature of the finite tap ranges, we use the following search methods to find the next tap position along the exploitation direction.

| Search method | Description |
| ------------- | ----------------------------------------------------------------------------------------------- |
| linear search | Start with an initial guess and do a local search with step size 1 for each iteration step. |
| binary search | Start with a large search region and reduce the search region by half for every iteration step. |


@@ -675,9 +675,9 @@ The framework for creating the batches is the same for all types of calculations
For every component, the attributes that can be updated in a batch scenario are mentioned in [Components](components.md).
Examples of batch calculations for time series and contingency analysis are given in [Power Flow Example](../examples/Power%20Flow%20Example.ipynb).

The same method as for single calculations, {py:class}`power_grid_model.PowerGridModel.calculate_power_flow`, can be used to calculate a number of scenarios in one go.
To do this, you need to supply an `update_data` keyword argument.
This keyword argument contains a dictionary of 2D update arrays (one array per component type).
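
For illustration, a minimal sketch of such a batch call (the component IDs, sizes and values below are made up, and `model` is assumed to be an existing {py:class}`power_grid_model.PowerGridModel` instance):

```python
import numpy as np

from power_grid_model import initialize_array

# 10 scenarios, each updating the same 2 loads -> one 2D (10 x 2) update array
load_update = initialize_array("update", "sym_load", (10, 2))
load_update["id"] = [[4, 7]] * 10  # IDs of the components to update (hypothetical)
load_update["p_specified"] = np.linspace(1.0e6, 2.0e6, 10).reshape(-1, 1)  # per-scenario active power

update_data = {"sym_load": load_update}  # one 2D array per component type
batch_output = model.calculate_power_flow(update_data=update_data)
```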

The performance of different batch types varies. power-grid-model automatically applies efficient calculation methods whenever possible. See the [Performance Guide](performance-guide.md#topology-caching) for ways to make optimal use of these performance optimizations.

@@ -736,7 +736,7 @@ independent_update_data = {'line': line_update}
Batch calculations support shared-memory multi-threaded parallel computing.
The common internal states and variables are shared as much as possible to reduce memory usage and avoid copies.

You can set the `threading` keyword argument in the `calculate_*` functions (like {py:class}`calculate_power_flow() <power_grid_model.PowerGridModel.calculate_power_flow>`) to enable/disable parallel computing.

- `threading=-1`, use sequential computing (default)
- `threading=0`, use number of threads available from the machine hardware (recommended)
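
For example, a sketch reusing the `model` and `update_data` from the batch example above:

```python
# run the batch in parallel using all available hardware threads
batch_output = model.calculate_power_flow(update_data=update_data, threading=0)
```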
51 changes: 51 additions & 0 deletions docs/user_manual/performance-guide.md
@@ -6,6 +6,57 @@ SPDX-License-Identifier: MPL-2.0

# Guidelines for performance enhancement

The `power-grid-model` library shines in its ability to handle calculations at scale.
It remains performant even when a calculation combines one or more of the following (non-exhaustive) extremes:

- Large grids
- Batch calculations with many scenarios
- Many changes in the grid in each scenario

To achieve that high performance, several optimizations are made.
To use those optimizations to the fullest, we recommend following the guidelines below.

## Data validity

Many of our optimizations assume input data validity and rely on the fact that the provided grid is reasonably close to realistic.
Non-convergence, underdetermined equations or other unexpected behavior may therefore be encountered when the data is not realistic.

To keep the PGM performant, checks on hard physical bounds are offloaded to a separate tool: the [data validator](data-validator.md).
However, these checks can be prohibitively expensive, so applying them at scale in production environments is not recommended when performance matters.
Instead, we recommend using the data validator specifically for debugging purposes.
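
As a sketch, a debugging session could validate the data explicitly before calculating (assuming an existing `input_data` dataset):

```python
from power_grid_model import CalculationType
from power_grid_model.validation import validate_input_data

# returns a list of validation errors, or None if the data is valid
errors = validate_input_data(input_data, calculation_type=CalculationType.power_flow)
if errors:
    for error in errors:
        print(error)
```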

```{note}
Some combinations of input data are not forbidden by physics, but still pose unrealistic conditions, e.g., a source with a very low short-circuit power.
These cases may result in unexpected behavior of the calculation core.
Vagueness and case-dependence make it hard to check what can be considered "unrealistic", and the [data validator](data-validator.md) will therefore not catch such cases.
We recommend providing reasonably realistic scenarios to prevent these edge cases from happening.
```

## Data format

The data format of input, output and update data can have a big effect on memory and computational cost.

### Input/update data volume

Row-based data (created, e.g., using {py:class}`power_grid_model.initialize_array` in Python) constructs input/update data with all attributes for a given dataset type.
However, many component attributes are optional.
If your use case does not depend on these attributes, a lot of data is needlessly created and initialized.
If you are running on a system where memory is the bottleneck, using a columnar data format may reduce the memory footprint. This may or may not induce a slight computational overhead during calculations.
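
As a sketch, a columnar dataset is a dictionary of single-attribute NumPy arrays per component type, containing only the attributes you actually need (the components, attributes and values below are illustrative; see `power_grid_model.power_grid_meta_data` for the exact dtypes):

```python
import numpy as np

# columnar format: per component type, a dictionary of attribute arrays,
# instead of one row-based structured array carrying all attributes
input_data = {
    "node": {
        "id": np.array([1, 2], dtype=np.int32),
        "u_rated": np.array([10.5e3, 10.5e3], dtype=np.float64),
    },
    # ... other component types ...
}
```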

### Output data volume

For most use cases, only certain output values are relevant.
For example, if you are only interested in line loading, outputting all other components and attributes results in unnecessary overhead.
The output data may be a significant, if not the dominant, contributor to memory load, particularly when running large batch calculations.
We therefore recommend restricting the output data in such production environments to only the components and attributes that are actually used.
In Python, you can do so using the `output_component_types` keyword argument in the `calculate_*` functions (like {py:class}`power_grid_model.PowerGridModel.calculate_power_flow`).
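
As a sketch, assuming a version with columnar output support, restricting the output to just the line loading could look like this:

```python
# only the loading attribute of lines is produced; all other components
# and attributes are omitted from the result
output_data = model.calculate_power_flow(
    output_component_types={"line": ["loading"]},
)
line_loading = output_data["line"]["loading"]
```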

### Database integration

Most databases store their data in a columnar data format.
Copying, reserving unused memory, and cache misses can lead to unnecessary memory usage and computational overhead.
With the introduction of columnar data input to PGM, integrating with databases using this format becomes easier, more natural, and more efficient.
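
As a hypothetical sketch, column arrays fetched from a database can be passed on directly as a columnar update dataset (`fetch_column` stands in for your database client and is not part of PGM):

```python
import numpy as np

# hypothetical helper returning one database column as a NumPy array, e.g.
# line_ids = fetch_column("SELECT id FROM line_updates")
# from_status = fetch_column("SELECT from_status FROM line_updates")
line_ids = np.array([5, 6], dtype=np.int32)
from_status = np.array([0, 1], dtype=np.int8)

# no restructuring into row-based structured arrays is needed
update_data = {"line": {"id": line_ids, "from_status": from_status}}
```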

## Batch calculations

Depending on the details of the batch, a number of performance optimizations are possible: