Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combine engine_options and read_options into a single parameter in read_excel #17265

Open
mcrumiller opened this issue Jun 28, 2024 · 7 comments
Assignees
Labels
A-io-spreadsheet Area: reading/writing Excel/ODS files enhancement New feature or an improvement of an existing feature

Comments

@mcrumiller
Copy link
Contributor

mcrumiller commented Jun 28, 2024

Description

Somewhat related to #17263.

read_excel has two parameters, engine_options and read_options:

  • engine_options - Additional options passed to the underlying engine’s primary parsing constructor (given below), if supported
  • read_options - Options passed to the underlying engine method that reads the sheet data

This is really confusing from a user perspective. What's the difference between engine options that parse the data and engine options that reads the data? IMO, it would be easier to provide a universal set of options that are converted to the engine-specific options behind the scenes. All engines provide the same functionality for reading at this point, so it would be nice to instead have the following parameters (feel free to add):

  • sheet_name - specifies sheet by name
  • sheet_id - specifies sheet by ID, 1-indexed. If 0, return all sheets as a dict.
  • range - e.g. B3:AC52 or something like [(2,5), (83, 15)]
  • dtypes - dtypes of columns (already available as "schema_overrides")
@mcrumiller mcrumiller added the enhancement New feature or an improvement of an existing feature label Jun 28, 2024
@stinodego
Copy link
Member

I'm pretty sure @alexander-beedie already has a design for this (see #15808 (comment)). Not sure it will make it into 1.0.0 though.

@stinodego stinodego added the A-io-spreadsheet Area: reading/writing Excel/ODS files label Jun 28, 2024
@alexander-beedie
Copy link
Collaborator

I do, and it's "mostly" option 3, where the most common options are all exposed at the top-level and we convert them to engine-specific calls ourselves so the user doesn't have to (as we're now doing with schema_overrides, for example).

We will still maintain read_options and engine_options for what should become a vanishingly small number of cases not handled by the top-level parameters.

@mcrumiller
Copy link
Contributor Author

Thanks; should I go ahead and close this one?

@alexander-beedie alexander-beedie self-assigned this Jun 28, 2024
@alexander-beedie
Copy link
Collaborator

Thanks; should I go ahead and close this one?

It's ok, can leave it here so I have something to focus on and close when it's done (hopefully very shortly) 🤣

@alexander-beedie
Copy link
Collaborator

FYI: #17263 now contains a common columns param - more to come...

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Aug 7, 2024

...and just added a common has_header param for all engines too👌

@acxz
Copy link

acxz commented Sep 7, 2024

I'd like to comment on the suggestion for range with a +1.
Often times spreadsheets have extra information in additional cells, so it's important to build the dataframe from a specified range or something like an excel table (in calamine and usage)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-io-spreadsheet Area: reading/writing Excel/ODS files enhancement New feature or an improvement of an existing feature
Projects
None yet
Development

No branches or pull requests

4 participants