Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Handling of Duplicate Columns with different Units #250

Closed
wfjvdham opened this issue May 8, 2024 · 12 comments
Closed

[BUG] Handling of Duplicate Columns with different Units #250

wfjvdham opened this issue May 8, 2024 · 12 comments
Labels
bug Something isn't working

Comments

@wfjvdham
Copy link

wfjvdham commented May 8, 2024

Describe the bug

Currently columns are checked for duplicated and renamed if the combination of column name and unit is the same (in ExcelFileStore._group_columns_by_index()). When only the column name is the same but the unit is different the columns are read as is. This can cause a problem later when using TabularData()._apply_unit_conversion()

To Reproduce

If for example an excel sheet contains two columns:

  • P with unit MW and with numerical values
  • P without unit and with string values

Those are not renamed because they are not identified as being a duplicate column. But here:

The value of field is just P so the content of both columns is returned giving an error because string values can not be converted to W.

Expected behavior

The problem can be solved by changing table_data[field] *= multiplier by table_data[(field, unit)] *= multiplier but I am not sure if two columns with the same name should exists (even though they have different units). Maybe better would be to do the grouping on only the first value of the index in the ExcelFileStore._group_columns_by_index() function.

@wfjvdham wfjvdham added the bug Something isn't working label May 8, 2024
@mgovers
Copy link
Member

mgovers commented May 8, 2024

Hi Wim,

Thank you for your clear report and repro case. We will investigate.

@Jerry-Jinfeng-Guo
Copy link
Contributor

Dear Wim @wfjvdham ,

Thank you for reporting the issue.

This is an known issue for us. The issue lies more on the Vision side as we assume the users would provide input up to specs as per Pandas dictation. Since the excel handling within PGM-io was built on top of Pandas, the solution for now is to not include the extra information in an additional P field when exporting from Vision. We will make a more permanent solution in the coming sprints based on priorities.

@wfjvdham
Copy link
Author

wfjvdham commented May 8, 2024

I can also make the solution, what would be the right way to do it?

  • Rename the columns internally to P and P_1
  • Change table_data[field] *= multiplier by table_data[(field, unit)] *= multiplier

@mgovers
Copy link
Member

mgovers commented May 8, 2024

what would be the right way to do it?

* Rename the columns internally to `P` and `P_1`

* Change `table_data[field] *= multiplier` by `table_data[(field, unit)] *= multiplier`

it is on our agenda for the next refinement this afternoon. If we get to a design, we will post it here.

I can also make the solution

After that, whoever wants to pick it up can do that (based on priorities and availability between team AKI and team DGC). It's good to know that that's an option as well!

@mgovers
Copy link
Member

mgovers commented May 8, 2024

@wfjvdham the conclusion of the refinement is that there's some unknowns surrounding the (un)stability of the Vision Excel export. Next sprint, we will first do some more investigation to this before we finalize a design. Implementation will probably come the sprint after that, as agreed with Jaap.

@Jerry-Jinfeng-Guo
Copy link
Contributor

@wfjvdham Please find explicit addressing of this issue in the documentation as well. #251 N.B., the solution offered in the documentation is based on the user feedback from GOAT, who up until this point has experienced no problems with this solution.

@wfjvdham
Copy link
Author

Ok thanks 👍

@Jaap-van-Wijck
Copy link

Hi @Jerry-Jinfeng-Guo @petersalemink95 , do you have any updates about this bug?

I read #251 and it does not seem to provide a solution for us. Our team automates netchecks for K&O, and many usefull/critical information is stored in the "specifics" / "Bijzonderheden" tabs in Vision. For example MSR ID's, HLD ID's and "special contract types".

Wim has made a suggestion above how to solve this bug. Are there reasons not to implement this solution? If develop-capacity is an issue, we can also create a PR so that you only need to review. We can also create a feature branch to test the changes on all Vision files of all regions before merging to main. I prefer working (on the long term) from the main branch of PGM io for AKI, instead of using a feature branch ...

@Jerry-Jinfeng-Guo
Copy link
Contributor

Jerry-Jinfeng-Guo commented Jul 1, 2024

Hi @Jaap-van-Wijck , thank you for letting us know that disabling "specifics" is not enough for you. Internally renaming P and P_1 is probably not the best way to go given the unstable nature of Vision. On top of this, not all teams depend on the "specifics" at the same level, the introduced logic can be a potential source of bug.

I would like to ask you the following questions:

  • This might be first thought to you, but have you tried clean up the sheet before feeding to pgm-io?
  • There is a terms_changed input parameter that could be used (to change column names) when instancing the VisionExcel converter, have you tried that?
  • One other approach you could try is to play with the yaml configuration file. In *_loads, you could find P under multiply field. Have you tried this?

If above is not up to your expectation, please let us know and we will bring that up during the team meetings.

@wfjvdham
Copy link
Author

wfjvdham commented Jul 1, 2024

@Jerry-Jinfeng-Guo

  • Cleaning up the sheet before feeding it to pgm-io manually you mean? We want to do the import automatically so that is not an option for us.
  • Using terms_changed is not an option I think, because there are two columns with the same name so how can I rename only 1 of them?
  • We use P under the multiply field, the problem is that is also tries to convert the other P column which contains strings.

Before I suggested 2 options, 1 was renaming one of the columns. If that is not the best way to go what about the other option I suggested:

  • Change table_data[field] *= multiplier by table_data[(field, unit)] *= multiplier

There nothing changes to the names of the columns

@mgovers mgovers moved this to Q3 2024 in Power Grid Model Jul 3, 2024
@Jerry-Jinfeng-Guo
Copy link
Contributor

Hi @wfjvdham @Jaap-van-Wijck , please see: #261

@Jerry-Jinfeng-Guo
Copy link
Contributor

Closed by #261

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: Q3 2024
Development

No branches or pull requests

4 participants