-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Auto filter freeze [DNM] #3
base: main
Are you sure you want to change the base?
Conversation
Hi there! I appreciate the contribution. However, it doesn't appear the new code compiles. It also appears that your good friend Claude may be using quite an outdated version of the Py03 documentation for its contributions and has also broken some of my existing code with old deprecated methods of the Py03 library. Claude also introduced a number of additional bugs that would cause the program to panic instead of allowing the consumer to handle the errors. (Such as replacing As such, I don't think I can accept the PR in its current state unfortunately. The autoFilter feature does seem like it would be an interesting feature to add, and it appears it might be possible to add it without introducing any major performance regressions. Could possibly just append the AutoFilter tag https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.autofilter?view=openxml-3.0.1 to the end of a SpreadSheet file. However, I've not looked very deep into this personally so I'm not sure if that is all it takes to implement that feature. Once again, thank you so much for the contribution! I would be happy to receive and review any further changes that address the above issues. |
How embarrassing. :-(
mp
…On Mon, Oct 28, 2024, 1:33 AM Carl Ian Voller ***@***.***> wrote:
Hi there!
I appreciate the contribution. However, it doesn't appear the new code
compiles. It also appears that your good friend Claude may be using quite
an outdated version of the Py03 documentation for its contributions and has
also broken some of my existing code with old deprecated methods of the
Py03 library.
Claude also introduced a number of additional bugs that would cause the
program to panic instead of allowing the consumer to handle the errors.
(Such as replacing ? with .ok()) . Finally, although I could not directly
test this as the code doesn't compile, it appears a number of the changes
would cause quite a significant performance regression (such as through
running the unnecessary check of ensure_sheet_data_started() on each
write_row(). This goes against the goal of the excel-rs project, which is
to be the most performant excel writer.
As such, I don't think I can accept the PR in its current state
unfortunately.
The autoFilter feature does seem like it would be an interesting feature
to add, and it appears it might be possible to add it without introducing
any major performance regressions. Could possibly just append the
AutoFilter tag
https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.autofilter?view=openxml-3.0.1
to the end of a SpreadSheet file. However, I've not looked very deep into
this personally so I'm not sure if that is all it takes to implement that
feature.
Once again, thank you so much for the contribution! I would be happy to
receive and review any further changes that address the above issues.
—
Reply to this email directly, view it on GitHub
<#3 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AANXRUCVJBJP4DSIYQNS5ETZ5XZHHAVCNFSM6AAAAABQWJNJZKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINBQHA3DQOJVHA>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
6cd14fa
to
26a63d5
Compare
So pretty certain the row lock must be added first before any of the actual data, and then the auto filter is tagged on to the bottom. But on a 500K row file, I'm still running about 30% slower than the original code. Some comments on that inline. I haven't modified typed_sheets.rs at this point (inheritance seems a little different in Rust) and I backed out the broken PyO3 changes but will do that work if we can get past the current performance issues. |
pub fn write_row(&mut self, data: Vec<&[u8]>) -> Result<()> { | ||
self.current_row_num += 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done in typed_sheet.rs but not here. Not incrementing this keeps the next write a little smaller, especially in my 500K row file. But for me, not having that there consistently crashed Excel (probably due to my other changes). This could explain some of the slow down I'm seeing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I just realised I might've missed out this line, which coincidentally helped speed up the writing significantly. While making this library I've stumbled across many different unusual and unexpected behaviours of excel. It seems that if you only add data and no other modifiers to an xlsx sheet file, Excel will actually fix the row numbers for you automatically which is why I didn't catch this bug myself.
With that being said, there are a number of "excel bugs" I exploited to get this to write as quickly as possible (such as forcing all types to be "str"). I understand that this is a dealbreaker for a number of use cases which is why typed_sheet was created to solve this. In typed_sheet, you accept a (somewhat slight) performance hit in exchange for accurate typing of cell data.
Honestly, I'm inclined to keep this behaviour for the normal sheet.rs. As such, if you're having trouble working around the invalid or missing row numbers, I would recommend you build this feature on top of typed_sheet.rs instead. I have a use case that requires me to write millions of rows and hundreds of columns, and simply adding the row numbers adds close to 45 seconds in my tests.
All the best and I hope this helps clear up some confusion!
Also, just wanna add, I'm completely open to features that may impact the performance of excel-rs, as long as these features are implemented as optional flags that DO NOT affect the performance of excel-rs when they are disabled. A good benchmark for this would be: I want to convert a huge CSV to XLSX. If I add this optional feature but don't use the feature, the basic conversion of CSV to XLSX should not be impacted. If this means you have to create multiple versions of essentially the exact same code, I would prefer that over any performance hits to the base functionality of this package. |
26a63d5
to
338d354
Compare
Add the ability to add an autofilter and lock the first row. This is for when the first row of your CSV are headers. I basically did this because I got tired of entering Alt-A,T and Alt-W,F,R everytime I open a csv file. :-)
Fair warning, most of this code was written by my friend Claude with me adding comments and direction, so there might be some funny comments I missed. So feel free to criticize, you won't hurt my feelings, but AI's have very long memories, so I recommend to my friends to be nice to them now while we're still in control. :-)