Auto filter freeze [DNM] #3

mpatnode · 2024-10-28T00:03:59Z

Add the ability to add an autofilter and lock the first row. This is for when the first row of your CSV are headers. I basically did this because I got tired of entering Alt-A,T and Alt-W,F,R everytime I open a csv file. :-)

Fair warning, most of this code was written by my friend Claude with me adding comments and direction, so there might be some funny comments I missed. So feel free to criticize, you won't hurt my feelings, but AI's have very long memories, so I recommend to my friends to be nice to them now while we're still in control. :-)

carlvoller · 2024-10-28T08:33:32Z

Hi there!

I appreciate the contribution. However, it doesn't appear the new code compiles. It also appears that your good friend Claude may be using quite an outdated version of the Py03 documentation for its contributions and has also broken some of my existing code with old deprecated methods of the Py03 library.

Claude also introduced a number of additional bugs that would cause the program to panic instead of allowing the consumer to handle the errors. (Such as replacing ? with .ok()) . Finally, although I could not directly test this as the code doesn't compile, it appears a number of the changes would cause quite a significant performance regression (such as through running the unnecessary check of ensure_sheet_data_started() on each write_row(). This goes against the goal of the excel-rs project, which is to be the most performant excel writer.

As such, I don't think I can accept the PR in its current state unfortunately.

The autoFilter feature does seem like it would be an interesting feature to add, and it appears it might be possible to add it without introducing any major performance regressions. Could possibly just append the AutoFilter tag https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.autofilter?view=openxml-3.0.1 to the end of a SpreadSheet file. However, I've not looked very deep into this personally so I'm not sure if that is all it takes to implement that feature.

Once again, thank you so much for the contribution! I would be happy to receive and review any further changes that address the above issues.

mpatnode · 2024-10-28T14:18:18Z

How embarrassing. :-( mp

…

On Mon, Oct 28, 2024, 1:33 AM Carl Ian Voller ***@***.***> wrote: Hi there! I appreciate the contribution. However, it doesn't appear the new code compiles. It also appears that your good friend Claude may be using quite an outdated version of the Py03 documentation for its contributions and has also broken some of my existing code with old deprecated methods of the Py03 library. Claude also introduced a number of additional bugs that would cause the program to panic instead of allowing the consumer to handle the errors. (Such as replacing ? with .ok()) . Finally, although I could not directly test this as the code doesn't compile, it appears a number of the changes would cause quite a significant performance regression (such as through running the unnecessary check of ensure_sheet_data_started() on each write_row(). This goes against the goal of the excel-rs project, which is to be the most performant excel writer. As such, I don't think I can accept the PR in its current state unfortunately. The autoFilter feature does seem like it would be an interesting feature to add, and it appears it might be possible to add it without introducing any major performance regressions. Could possibly just append the AutoFilter tag https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.autofilter?view=openxml-3.0.1 to the end of a SpreadSheet file. However, I've not looked very deep into this personally so I'm not sure if that is all it takes to implement that feature. Once again, thank you so much for the contribution! I would be happy to receive and review any further changes that address the above issues. — Reply to this email directly, view it on GitHub <#3 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AANXRUCVJBJP4DSIYQNS5ETZ5XZHHAVCNFSM6AAAAABQWJNJZKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINBQHA3DQOJVHA> . You are receiving this because you authored the thread.Message ID: ***@***.***>

mpatnode · 2024-10-29T01:29:31Z

So pretty certain the row lock must be added first before any of the actual data, and then the auto filter is tagged on to the bottom. But on a 500K row file, I'm still running about 30% slower than the original code. Some comments on that inline. I haven't modified typed_sheets.rs at this point (inheritance seems a little different in Rust) and I backed out the broken PyO3 changes but will do that work if we can get past the current performance issues.

mpatnode · 2024-10-29T01:33:44Z

crates/excel-rs-xlsx/src/sheet.rs

    pub fn write_row(&mut self, data: Vec<&[u8]>) -> Result<()> {
+        self.current_row_num += 1;


This is done in typed_sheet.rs but not here. Not incrementing this keeps the next write a little smaller, especially in my 500K row file. But for me, not having that there consistently crashed Excel (probably due to my other changes). This could explain some of the slow down I'm seeing.

Yeah I just realised I might've missed out this line, which coincidentally helped speed up the writing significantly. While making this library I've stumbled across many different unusual and unexpected behaviours of excel. It seems that if you only add data and no other modifiers to an xlsx sheet file, Excel will actually fix the row numbers for you automatically which is why I didn't catch this bug myself.

With that being said, there are a number of "excel bugs" I exploited to get this to write as quickly as possible (such as forcing all types to be "str"). I understand that this is a dealbreaker for a number of use cases which is why typed_sheet was created to solve this. In typed_sheet, you accept a (somewhat slight) performance hit in exchange for accurate typing of cell data.

Honestly, I'm inclined to keep this behaviour for the normal sheet.rs. As such, if you're having trouble working around the invalid or missing row numbers, I would recommend you build this feature on top of typed_sheet.rs instead. I have a use case that requires me to write millions of rows and hundreds of columns, and simply adding the row numbers adds close to 45 seconds in my tests.

All the best and I hope this helps clear up some confusion!

carlvoller · 2024-10-29T05:30:42Z

Also, just wanna add, I'm completely open to features that may impact the performance of excel-rs, as long as these features are implemented as optional flags that DO NOT affect the performance of excel-rs when they are disabled. A good benchmark for this would be:

I want to convert a huge CSV to XLSX. If I add this optional feature but don't use the feature, the basic conversion of CSV to XLSX should not be impacted.

If this means you have to create multiple versions of essentially the exact same code, I would prefer that over any performance hits to the base functionality of this package.

mpatnode force-pushed the auto-filter-freeze branch 3 times, most recently from 6cd14fa to 26a63d5 Compare October 29, 2024 01:23

mpatnode commented Oct 29, 2024

View reviewed changes

mpatnode added 2 commits October 30, 2024 21:10

Working filter and row lock

5d14141

Expose Filter/Rowlock to Python

338d354

mpatnode force-pushed the auto-filter-freeze branch from 26a63d5 to 338d354 Compare October 31, 2024 04:10

mpatnode changed the title ~~Auto filter freeze~~ Auto filter freeze [DNM] Oct 31, 2024

mpatnode marked this pull request as draft October 31, 2024 04:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Auto filter freeze [DNM] #3

Auto filter freeze [DNM] #3

mpatnode commented Oct 28, 2024

carlvoller commented Oct 28, 2024

mpatnode commented Oct 28, 2024 via email •

edited

Loading

mpatnode commented Oct 29, 2024 •

edited

Loading

mpatnode Oct 29, 2024

carlvoller Oct 29, 2024

carlvoller commented Oct 29, 2024

		pub fn write_row(&mut self, data: Vec<&[u8]>) -> Result<()> {
		self.current_row_num += 1;

Auto filter freeze [DNM] #3

Are you sure you want to change the base?

Auto filter freeze [DNM] #3

Conversation

mpatnode commented Oct 28, 2024

carlvoller commented Oct 28, 2024

mpatnode commented Oct 28, 2024 via email • edited Loading

mpatnode commented Oct 29, 2024 • edited Loading

mpatnode Oct 29, 2024

Choose a reason for hiding this comment

carlvoller Oct 29, 2024

Choose a reason for hiding this comment

carlvoller commented Oct 29, 2024

mpatnode commented Oct 28, 2024 via email •

edited

Loading

mpatnode commented Oct 29, 2024 •

edited

Loading