Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto filter freeze [DNM] #3

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

mpatnode
Copy link

Add the ability to add an autofilter and lock the first row. This is for when the first row of your CSV are headers. I basically did this because I got tired of entering Alt-A,T and Alt-W,F,R everytime I open a csv file. :-)

Fair warning, most of this code was written by my friend Claude with me adding comments and direction, so there might be some funny comments I missed. So feel free to criticize, you won't hurt my feelings, but AI's have very long memories, so I recommend to my friends to be nice to them now while we're still in control. :-)

@carlvoller
Copy link
Owner

Hi there!

I appreciate the contribution. However, it doesn't appear the new code compiles. It also appears that your good friend Claude may be using quite an outdated version of the Py03 documentation for its contributions and has also broken some of my existing code with old deprecated methods of the Py03 library.

Claude also introduced a number of additional bugs that would cause the program to panic instead of allowing the consumer to handle the errors. (Such as replacing ? with .ok()) . Finally, although I could not directly test this as the code doesn't compile, it appears a number of the changes would cause quite a significant performance regression (such as through running the unnecessary check of ensure_sheet_data_started() on each write_row(). This goes against the goal of the excel-rs project, which is to be the most performant excel writer.

As such, I don't think I can accept the PR in its current state unfortunately.

The autoFilter feature does seem like it would be an interesting feature to add, and it appears it might be possible to add it without introducing any major performance regressions. Could possibly just append the AutoFilter tag https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.autofilter?view=openxml-3.0.1 to the end of a SpreadSheet file. However, I've not looked very deep into this personally so I'm not sure if that is all it takes to implement that feature.

Once again, thank you so much for the contribution! I would be happy to receive and review any further changes that address the above issues.

@mpatnode
Copy link
Author

mpatnode commented Oct 28, 2024 via email

@mpatnode mpatnode force-pushed the auto-filter-freeze branch 3 times, most recently from 6cd14fa to 26a63d5 Compare October 29, 2024 01:23
@mpatnode
Copy link
Author

mpatnode commented Oct 29, 2024

So pretty certain the row lock must be added first before any of the actual data, and then the auto filter is tagged on to the bottom. But on a 500K row file, I'm still running about 30% slower than the original code. Some comments on that inline. I haven't modified typed_sheets.rs at this point (inheritance seems a little different in Rust) and I backed out the broken PyO3 changes but will do that work if we can get past the current performance issues.

pub fn write_row(&mut self, data: Vec<&[u8]>) -> Result<()> {
self.current_row_num += 1;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done in typed_sheet.rs but not here. Not incrementing this keeps the next write a little smaller, especially in my 500K row file. But for me, not having that there consistently crashed Excel (probably due to my other changes). This could explain some of the slow down I'm seeing.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I just realised I might've missed out this line, which coincidentally helped speed up the writing significantly. While making this library I've stumbled across many different unusual and unexpected behaviours of excel. It seems that if you only add data and no other modifiers to an xlsx sheet file, Excel will actually fix the row numbers for you automatically which is why I didn't catch this bug myself.

With that being said, there are a number of "excel bugs" I exploited to get this to write as quickly as possible (such as forcing all types to be "str"). I understand that this is a dealbreaker for a number of use cases which is why typed_sheet was created to solve this. In typed_sheet, you accept a (somewhat slight) performance hit in exchange for accurate typing of cell data.

Honestly, I'm inclined to keep this behaviour for the normal sheet.rs. As such, if you're having trouble working around the invalid or missing row numbers, I would recommend you build this feature on top of typed_sheet.rs instead. I have a use case that requires me to write millions of rows and hundreds of columns, and simply adding the row numbers adds close to 45 seconds in my tests.

All the best and I hope this helps clear up some confusion!

@carlvoller
Copy link
Owner

Also, just wanna add, I'm completely open to features that may impact the performance of excel-rs, as long as these features are implemented as optional flags that DO NOT affect the performance of excel-rs when they are disabled. A good benchmark for this would be:

I want to convert a huge CSV to XLSX. If I add this optional feature but don't use the feature, the basic conversion of CSV to XLSX should not be impacted.

If this means you have to create multiple versions of essentially the exact same code, I would prefer that over any performance hits to the base functionality of this package.

@mpatnode mpatnode changed the title Auto filter freeze Auto filter freeze [DNM] Oct 31, 2024
@mpatnode mpatnode marked this pull request as draft October 31, 2024 04:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants