feat(load): add arrow endpoints #2200

Open
wants to merge 44 commits into
base: dev
ad63883
feat: first commit
TheoPascoli Oct 24, 2024
62f1b1d
feat: first commit
TheoPascoli Oct 24, 2024
cc01641
Merge branch 'dev' into feat/add-load-endpoints-with-arrow
TheoPascoli Oct 24, 2024
d73c8de
Merge branch 'dev' into feat/add-load-endpoints-with-arrow
TheoPascoli Oct 25, 2024
998c04e
feat(bc): use `update_config` instead of `update_bc` for multiple upd…
MartinBelthle Oct 25, 2024
f4206fe
build(python): bump python version to use v3.11 (#2164)
MartinBelthle Oct 29, 2024
f8b0f8a
feat(ts-gen): display progress bar via websockets (#2194)
MartinBelthle Oct 29, 2024
ebd2df4
feat: first commit
TheoPascoli Oct 24, 2024
9a3591a
feat: first commit
TheoPascoli Oct 24, 2024
fddf3b8
feat: first commit
TheoPascoli Oct 24, 2024
582aed0
feat: first commit
TheoPascoli Oct 24, 2024
72ec467
Merge branch 'feat/add-load-endpoints-with-arrow' of https://github.c…
TheoPascoli Oct 30, 2024
43a5e40
feat: Refactor load management to use LoadInfoDTO
TheoPascoli Oct 30, 2024
4e74852
feat: refactor load series API response model
TheoPascoli Oct 30, 2024
c99cdb1
feat: add support for pyarrow in load management
TheoPascoli Oct 30, 2024
0a4e6f5
feat: refactor load series api response model
TheoPascoli Oct 30, 2024
1b6dc2b
feat: add support for pyarrow in load management
TheoPascoli Oct 30, 2024
cae2930
Merge remote-tracking branch 'origin/feat/add-load-endpoints-with-arr…
TheoPascoli Oct 30, 2024
1b86d9a
feat: fix whitespace issue in load management module
TheoPascoli Oct 30, 2024
f45f065
feat: refactor load management response
TheoPascoli Oct 31, 2024
6506459
Merge remote-tracking branch 'origin/dev' into feat/add-load-endpoint…
TheoPascoli Oct 31, 2024
213cbaf
feat: refactor `get_load_matrix` method to improve readability
TheoPascoli Nov 4, 2024
7b5e0a0
feat: add integration test for load series endpoint
TheoPascoli Nov 4, 2024
37b7108
feat: add endpoint to update load series data
TheoPascoli Nov 6, 2024
1ef1ff1
feat: refactor LoadDTO and add LoadProperties model
TheoPascoli Nov 6, 2024
4a234e6
feat: refactor link path handling in link management.
TheoPascoli Nov 6, 2024
c0a82de
Merge remote-tracking branch 'origin/dev' into feat/add-load-endpoint…
TheoPascoli Nov 7, 2024
e91dc71
Merge remote-tracking branch 'origin/dev' into feat/add-load-endpoint…
TheoPascoli Nov 7, 2024
997803a
feat: refactor load series endpoint paths and add data conversion
TheoPascoli Nov 7, 2024
255b104
Merge branch 'dev' into feat/add-load-endpoints-with-arrow
TheoPascoli Nov 12, 2024
aff9c83
feat: remove JSON format support for load matrix endpoints
TheoPascoli Nov 12, 2024
d63cb8e
feat: add docstrings and fix IO imports in load management
TheoPascoli Nov 12, 2024
a522a22
feat: add docstrings and fix IO imports in load management
TheoPascoli Nov 12, 2024
8fe2863
Merge remote-tracking branch 'origin/dev' into feat/add-load-endpoint…
TheoPascoli Nov 12, 2024
9d55b36
Merge remote-tracking branch 'origin/feat/add-load-endpoints-with-arr…
TheoPascoli Nov 12, 2024
5df3960
feat: add docstrings and fix IO imports in load management
TheoPascoli Nov 12, 2024
09bc218
Merge branch 'dev' into feat/add-load-endpoints-with-arrow
TheoPascoli Nov 12, 2024
7e3b03a
feat: remove web considerations that were inside the business layer
TheoPascoli Nov 28, 2024
6277ae0
Merge remote-tracking branch 'origin/dev' into feat/add-load-endpoint…
TheoPascoli Dec 3, 2024
5fd6880
feat: set up GET endpoint that returns a arrow matrix with MatrixInde…
TheoPascoli Dec 5, 2024
feb957f
feat: remove useless load_model.py
TheoPascoli Dec 5, 2024
cb3d3f4
feat: remove useless load_model.py
TheoPascoli Dec 5, 2024
fe6acbd
feat: change way of dealing with the writing of the feather file
TheoPascoli Dec 5, 2024
3218843
Merge branch 'dev' into feat/add-load-endpoints-with-arrow
TheoPascoli Jan 20, 2025
47 changes: 47 additions & 0 deletions antarest/study/business/arrow_utils.py
@@ -0,0 +1,47 @@
# Copyright (c) 2024, RTE (https://www.rte-france.com)
#
# See AUTHORS.txt
#
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
#
# SPDX-License-Identifier: MPL-2.0
#
# This file is part of the Antares project.
import os
import tempfile
import typing as t
from io import BytesIO

import pandas as pd
import pyarrow as pa
from pyarrow import feather
from pyarrow.feather import write_feather


def dataframe_to_bytes(df: pd.DataFrame, metadata: t.Optional[t.Dict[str | bytes, str | bytes]]) -> bytes:
    table: pa.Table = pa.Table.from_pandas(df, preserve_index=False)

    if metadata:
        metadata_bytes = {str(k): str(v) for k, v in metadata.items()}
        schema_metadata: t.Dict[str | bytes, str | bytes] = {k: v for k, v in metadata_bytes.items()}
        table = table.replace_schema_metadata(schema_metadata)

    buffer = BytesIO()
    write_feather(df=table, dest=buffer)  # type:ignore

    return buffer.getvalue()


def bytes_to_dataframe(buffer: bytes) -> pd.DataFrame:
    data = BytesIO(buffer)
    table = feather.read_table(data)

    df = table.to_pandas()

    metadata = table.schema.metadata
    if metadata:
        df.metadata = {k.decode("utf8"): v.decode("utf8") for k, v in metadata.items()}

    return df
55 changes: 55 additions & 0 deletions antarest/study/business/load_management.py
@@ -0,0 +1,55 @@
# Copyright (c) 2024, RTE (https://www.rte-france.com)
#
# See AUTHORS.txt
#
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
#
# SPDX-License-Identifier: MPL-2.0
#
# This file is part of the Antares project.

import typing as t

import pandas as pd

from antarest.study.model import MatrixIndex, Study
from antarest.study.storage.rawstudy.model.filesystem.matrix.input_series_matrix import InputSeriesMatrix
from antarest.study.storage.storage_service import StudyStorageService
from antarest.study.storage.utils import get_start_date

LOAD_PATH = "input/load/series/load_{area_id}"
matrix_columns = ["ts-0"]


class LoadManager:
    def __init__(self, storage_service: StudyStorageService) -> None:
        self.storage_service = storage_service

    def get_load_matrix(self, study: Study, area_id: str) -> t.Tuple[pd.DataFrame, t.Dict[str | bytes, str | bytes]]:
        file_study = self.storage_service.get_storage(study).get_raw(study)
        load_path = LOAD_PATH.format(area_id=area_id).split("/")

        node = file_study.tree.get_node(load_path)

        if not isinstance(node, InputSeriesMatrix):
            raise TypeError(f"Expected node of type 'InputSeriesMatrix', but got '{type(node).__name__}'")

        matrix_data = InputSeriesMatrix.parse(node, return_dataframe=True)

        matrix_df = t.cast(pd.DataFrame, matrix_data)
        matrix_df.columns = matrix_df.columns.map(str)

        matrix_df.columns = pd.Index(matrix_columns)

        matrix_index: MatrixIndex = get_start_date(file_study)

        metadata: t.Dict[str | bytes, str | bytes] = {
            "start_date": str(matrix_index.start_date),
            "steps": str(matrix_index.steps),
            "first_week_size": str(matrix_index.first_week_size),
            "level": str(matrix_index.level),
        }

        return matrix_df, metadata
6 changes: 6 additions & 0 deletions antarest/study/model.py
@@ -16,6 +16,7 @@
import typing as t
import uuid
from datetime import datetime, timedelta
from enum import StrEnum
from pathlib import Path

from antares.study.version import StudyVersion
@@ -550,6 +551,11 @@ def suffix(self) -> str:
        return mapping[self]


class MatrixFormat(StrEnum):
    JSON = "json"
    ARROW = "arrow"


class StudyDownloadDTO(AntaresBaseModel):
    """
    DTO used to download outputs
2 changes: 2 additions & 0 deletions antarest/study/service.py
@@ -90,6 +90,7 @@
from antarest.study.business.district_manager import DistrictManager
from antarest.study.business.general_management import GeneralManager
from antarest.study.business.link_management import LinkManager
from antarest.study.business.load_management import LoadManager
from antarest.study.business.matrix_management import MatrixManager, MatrixManagerError
from antarest.study.business.model.link_model import LinkBaseDTO, LinkDTO
from antarest.study.business.optimization_management import OptimizationManager
@@ -398,6 +399,7 @@ def __init__(
        self.adequacy_patch_manager = AdequacyPatchManager(self.storage_service)
        self.advanced_parameters_manager = AdvancedParamsManager(self.storage_service)
        self.hydro_manager = HydroManager(self.storage_service)
        self.load_manager = LoadManager(self.storage_service)

Review comment (Contributor):

More generally, what is your vision for the other matrices? For hydro, for instance, do we want to add the code inside the HydroManager, or do we want to create a dedicated manager alongside it? I personally prefer the first option.

        self.allocation_manager = AllocationManager(self.storage_service)
        self.properties_manager = PropertiesManager(self.storage_service)
        self.renewable_manager = RenewableManager(self.storage_service)
29 changes: 28 additions & 1 deletion antarest/study/web/study_data_blueprint.py
@@ -17,7 +17,7 @@

import typing_extensions as te
from fastapi import APIRouter, Body, Depends, Query
-from starlette.responses import RedirectResponse
+from starlette.responses import RedirectResponse, Response

from antarest.core.config import Config
from antarest.core.jwt import JWTUser
@@ -53,6 +53,7 @@
    ThermalClusterOutput,
    ThermalManager,
)
from antarest.study.business.arrow_utils import dataframe_to_bytes
from antarest.study.business.binding_constraint_management import (
    ConstraintCreation,
    ConstraintFilters,
@@ -543,6 +544,32 @@ def update_inflow_structure(
        study = study_service.check_study_access(uuid, StudyPermissionType.WRITE, params)
        study_service.hydro_manager.update_inflow_structure(study, area_id, values)

    @bp.get(
        "/studies/{uuid}/{area_id}/load/series",
        tags=[APITag.study_data],
        summary="Get load series data",
    )
    def get_load_series(
        uuid: str,
        area_id: str,
        current_user: JWTUser = Depends(auth.get_current_user),
    ) -> Response:
        """Return the load matrix in ARROW format."""
        logger.info(
            msg=f"Getting load series data for area {area_id} of study {uuid}",
            extra={"user": current_user.id},
        )
        params = RequestParameters(user=current_user)
        study = study_service.check_study_access(uuid, StudyPermissionType.READ, params)

        try:
            df, metadata = study_service.load_manager.get_load_matrix(study, area_id)
        except TypeError as e:
            return Response(content=str(e), status_code=400)

        buffer = dataframe_to_bytes(df, metadata)
        return Response(content=buffer, media_type="application/vnd.apache.arrow.file")

    @bp.put(
        "/studies/{uuid}/matrix",
        tags=[APITag.study_data],
2 changes: 2 additions & 0 deletions requirements.txt
@@ -38,6 +38,8 @@ pandas~=2.2.3
paramiko~=3.4.1
plyer~=2.0.0
psycopg2-binary~=2.9.9
pyarrow~=18.1.0
pyarrow-stubs~=10.0.1.7
py7zr~=0.20.6
python-json-logger~=2.0.7
PyYAML~=5.3.1
50 changes: 50 additions & 0 deletions tests/integration/study_data_blueprint/test_load.py
@@ -0,0 +1,50 @@
# Copyright (c) 2024, RTE (https://www.rte-france.com)
#
# See AUTHORS.txt
#
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
#
# SPDX-License-Identifier: MPL-2.0
#
# This file is part of the Antares project.
from io import BytesIO

import pytest
from starlette.testclient import TestClient

from antarest.study.business.arrow_utils import bytes_to_dataframe
from tests.integration.prepare_proxy import PreparerProxy


@pytest.mark.unit_test
class TestLoad:
    @pytest.mark.parametrize("study_type", ["raw", "variant"])
    def test_load(self, client: TestClient, user_access_token: str, study_type: str) -> None:
        client.headers = {"Authorization": f"Bearer {user_access_token}"}  # type: ignore

        preparer = PreparerProxy(client, user_access_token)
        study_id = preparer.create_study("foo", version=880)

        if study_type == "variant":
            study_id = preparer.create_variant(study_id, name="Variant 1")

        area1_id = preparer.create_area(study_id, name="Area1")["id"]

        # Test simple get ARROW

        res = client.get(f"/v1/studies/{study_id}/{area1_id}/load/series")
        assert res.status_code == 200
        assert res.headers["content-type"] == "application/vnd.apache.arrow.file"

        df = bytes_to_dataframe(res.content)

        column_name = ["ts-0"]
        assert column_name == list(df.columns)
        assert df.metadata == {
            "first_week_size": "7",
            "level": "hourly",
            "start_date": "2018-01-01 00:00:00",
            "steps": "8760",
        }
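
The metadata asserted in this test is what a client would presumably use to rebuild the time axis, since the Arrow payload is written with `preserve_index=False`. The sketch below is one possible client-side helper and is not part of the PR: `build_time_index` is a hypothetical name, and it assumes that `level == "hourly"` means one row per hour starting at `start_date`.

```python
import pandas as pd


def build_time_index(metadata: dict[str, str]) -> pd.DatetimeIndex:
    """Hypothetical helper: rebuild an hourly index from the response metadata."""
    if metadata["level"] != "hourly":
        # Other levels are out of scope for this sketch.
        raise ValueError(f"Unsupported level: {metadata['level']}")
    return pd.date_range(
        start=pd.Timestamp(metadata["start_date"]),
        periods=int(metadata["steps"]),
        freq=pd.Timedelta(hours=1),
    )


# Metadata values copied from the assertion in the test above.
metadata = {
    "first_week_size": "7",
    "level": "hourly",
    "start_date": "2018-01-01 00:00:00",
    "steps": "8760",
}
index = build_time_index(metadata)
```

A caller could then assign the result to the decoded DataFrame with `df.index = index`, provided the number of rows matches `steps`.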