implement sync_diffs_filediffs #47

Open
Cattes opened this issue Aug 28, 2020 · 0 comments
Labels
enhancement New feature or request

Cattes commented Aug 28, 2020

To reduce the memory required to write large dataframes to the database, a new mode, `sync_filediffs`, is being implemented in the `mysql.Connection` class.

The approach is to do as much of the work as possible on disk instead of in memory.
When a dataframe is received, it is first written to disk.
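The issue does not say how the dataframe is serialised. As a minimal, stdlib-only sketch, the example below treats the dataframe as an iterable of row tuples and writes it as headerless CSV; with pandas the equivalent would be roughly `df.to_csv(path, index=False, header=False)`. The function name `write_frame_to_disk` is hypothetical. What matters for the later diff step is that the incoming data and the downloaded table are serialised the same way, so identical rows become identical lines.

```python
import csv
import os
import tempfile


def write_frame_to_disk(rows, path):
    """Write an iterable of row tuples to `path` as headerless CSV.

    Every line ends with a bare newline, so rows serialised here and rows
    serialised from the database table compare equal line by line.
    Returns the number of rows written. (Hypothetical helper.)
    """
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, lineterminator="\n")
        for row in rows:
            writer.writerow(row)
            count += 1
    return count


if __name__ == "__main__":
    frame = [(1, "alice"), (2, "bob")]  # stand-in for the incoming dataframe
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "incoming.csv")
        write_frame_to_disk(frame, path)
        with open(path, newline="", encoding="utf-8") as f:
            print(f.read(), end="")
```

Only the rows are written (no index, no header), because an extra header line on one side would show up as a spurious difference.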

The database table to be updated is also downloaded to disk, chunk by chunk.
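A chunkwise download can be sketched with the standard DB-API `cursor.fetchmany(size)` call. In the sketch below, `sqlite3` stands in for the MySQL connection so that it runs without a server, and `download_table_chunkwise` is a hypothetical name. With a MySQL driver, an unbuffered (server-side) cursor, for example PyMySQL's `SSCursor`, is typically needed; otherwise the driver may buffer the whole result set in client memory before `fetchmany` ever runs.

```python
import csv
import os
import sqlite3
import tempfile


def download_table_chunkwise(conn, table, path, chunk_size=10_000):
    """Copy every row of `table` into a headerless CSV file at `path`.

    At most `chunk_size` fetched rows are held in Python at once.
    Returns the number of rows written. (Hypothetical helper.)
    """
    cursor = conn.cursor()
    try:
        # NOTE: `table` is interpolated into the SQL; pass trusted names only.
        cursor.execute(f"SELECT * FROM {table}")
        written = 0
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f, lineterminator="\n")
            while True:
                chunk = cursor.fetchmany(chunk_size)
                if not chunk:  # empty list means the result set is exhausted
                    break
                writer.writerows(chunk)
                written += len(chunk)
        return written
    finally:
        cursor.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
    conn.execute("CREATE TABLE people (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO people VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "table.csv")
        print(download_table_chunkwise(conn, "people", path, chunk_size=1))
```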

The `filediffs` package is then used to find the differences between the two files and save them to disk.
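The `filediffs` package's own API and algorithm are not reproduced here. As a hedged stand-in that shows the idea, the sketch below compares two line-based files in two passes, keeping only a set of 16-byte BLAKE2b digests of one file in memory at a time and streaming the other. `diff_files` and its output paths are hypothetical names. It assumes one row per line (CSV fields with embedded newlines would break it) and treats duplicate rows as a set.

```python
import hashlib
import os
import tempfile


def _digest(line):
    # Hash the line without its terminator, so "\n" vs "\r\n" or a missing
    # final newline does not make otherwise identical rows look different.
    return hashlib.blake2b(line.rstrip("\r\n").encode("utf-8"), digest_size=16).digest()


def _digests_of(path):
    # Only the fixed-size digests are collected; each line is read and dropped.
    with open(path, newline="", encoding="utf-8") as f:
        return {_digest(line) for line in f}


def _copy_lines_not_in(src_path, seen, out_path):
    """Stream src_path and write every line whose digest is not in `seen`."""
    count = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as out:
        for line in src:
            if _digest(line) not in seen:
                out.write(line.rstrip("\r\n") + "\n")
                count += 1
    return count


def diff_files(old_path, new_path, only_old_path, only_new_path):
    """Write lines found only in old_path and only in new_path to the two
    output files; return (n_only_old, n_only_new). (Hypothetical helper.)"""
    n_only_new = _copy_lines_not_in(new_path, _digests_of(old_path), only_new_path)
    n_only_old = _copy_lines_not_in(old_path, _digests_of(new_path), only_old_path)
    return n_only_old, n_only_new


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        old, new = os.path.join(tmp, "old.csv"), os.path.join(tmp, "new.csv")
        with open(old, "w", encoding="utf-8") as f:
            f.write("1,alice\n2,bob\n")
        with open(new, "w", encoding="utf-8") as f:
            f.write("1,alice\n2,bobby\n")
        print(diff_files(old, new, os.path.join(tmp, "d.csv"), os.path.join(tmp, "u.csv")))
```

Under this scheme a changed row shows up twice: its old version in the "only old" file and its new version in the "only new" file.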

Finally, the update part and the delete part are read back into memory and applied to the database.
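One plausible way to apply the two parts, sketched under assumptions the issue does not state: the update part holds rows present only in the new data, the delete part holds rows present only in the current table, and rows are matched on a single key column. Deleting first and inserting second then replaces changed rows, removes vanished ones and adds new ones, all in one transaction. `sqlite3` stands in for MySQL (MySQL drivers usually use `%s` placeholders instead of `?`), and `apply_diff` is a hypothetical name.

```python
import csv
import os
import sqlite3
import tempfile


def _read_rows(path):
    # The diff parts are assumed to be small compared to the full table,
    # so this is the one step that loads data fully into memory.
    with open(path, newline="", encoding="utf-8") as f:
        return [tuple(row) for row in csv.reader(f)]


def apply_diff(conn, table, columns, key_column, update_path, delete_path):
    """Delete rows listed in delete_path (matched on key_column), then insert
    rows from update_path, in one transaction. Returns the sizes of the
    update and delete parts. (Hypothetical helper.)"""
    update_rows = _read_rows(update_path)
    delete_rows = _read_rows(delete_path)
    key_idx = columns.index(key_column)
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)  # sqlite3 paramstyle
    # NOTE: table and column names are interpolated; pass trusted names only.
    with conn:  # commits on success, rolls back if a statement raises
        conn.executemany(
            f"DELETE FROM {table} WHERE {key_column} = ?",
            [(row[key_idx],) for row in delete_rows],
        )
        conn.executemany(
            f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})",
            update_rows,
        )
    return len(update_rows), len(delete_rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
    conn.execute("CREATE TABLE people (id TEXT PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO people VALUES (?, ?)", [("1", "alice"), ("2", "bob")])
    conn.commit()
    with tempfile.TemporaryDirectory() as tmp:
        up, de = os.path.join(tmp, "update.csv"), os.path.join(tmp, "delete.csv")
        with open(up, "w", encoding="utf-8") as f:
            f.write("2,bobby\n")
        with open(de, "w", encoding="utf-8") as f:
            f.write("2,bob\n")
        apply_diff(conn, "people", ["id", "name"], "id", up, de)
    print(conn.execute("SELECT id, name FROM people ORDER BY id").fetchall())
```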

A first version has already been implemented on the `sync_filediffs` branch.

Still-open issues:

  1. The verbose logging needs to be improved so that it integrates better with the rest of the codebase.
  2. The temporary file management needs to be improved.
  3. The output format of the query method needs to be resolved; changing it appears to be a breaking change.
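For open issues 1 and 2, one common pattern (not necessarily what the branch does) is a module-level logger from `logging.getLogger(__name__)`, which lets applications control verbosity through standard logging configuration, combined with a context manager that creates one temporary directory per sync run and always removes it, even when the sync fails. The names `sync_workspace` and the intermediate file names are hypothetical.

```python
import contextlib
import logging
import os
import shutil
import tempfile

# Library code logs through a named logger; the application decides the level.
logger = logging.getLogger(__name__)


@contextlib.contextmanager
def sync_workspace(prefix="sync_filediffs_", keep=False):
    """Yield paths for one sync run's intermediate files inside a fresh
    temporary directory, and delete that directory afterwards unless
    keep=True. (Hypothetical helper.)"""
    workdir = tempfile.mkdtemp(prefix=prefix)
    logger.debug("created sync workspace %s", workdir)
    paths = {
        name: os.path.join(workdir, f"{name}.csv")
        for name in ("incoming", "table", "update", "delete")
    }
    try:
        yield paths
    finally:
        # Runs on normal exit and when the body raises, so no files leak.
        if keep:
            logger.info("keeping sync workspace %s for inspection", workdir)
        else:
            shutil.rmtree(workdir, ignore_errors=True)
            logger.debug("removed sync workspace %s", workdir)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    with sync_workspace() as paths:
        print(sorted(paths))
```

The `keep` flag is an illustrative debugging aid: it leaves the intermediate files in place so a failed sync can be inspected.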

Cattes added the enhancement (New feature or request) label on Aug 28, 2020
Cattes self-assigned this on Aug 28, 2020
Cattes removed their assignment on Jun 28, 2023