
Extraction starts from the beginning in the retry cycle in case of an error #1

Open
vertuk opened this issue Feb 5, 2024 · 1 comment

Comments

@vertuk

vertuk commented Feb 5, 2024

It goes like this:

$ vkimexp -b brave [ID]
Estimating…  1377 queries, 137565 messages
------------------------------------------------------------------------------
S     1/1377  0.07%  offset 137530…    213b       0   
S     2/1377  0.15%  offset 137430…    213b       0   
S     3/1377  0.22%  offset 137330…    213b       0   
S     4/1377  0.29%  offset 137230…   110kb      77   (5)I                  
S     5/1377  0.36%  offset 137130…   133kb     100   III                   
S     6/1377  0.44%  offset 137030…   132kb     100   IIP                   
S     7/1377  0.51%  offset 136930…   166kb     100   (6)I(7)P              
S     8/1377  0.58%  offset 136830…   170kb     100   (6)I(5)P
...
S   190/1377  13.8%  offset 118630…   124kb     100   P                     
S   191/1377  13.9%  offset 118530…   135kb     100   (5)P                  
S   192/1377  13.9%  offset 118430…   134kb     100   (19)I(10)P            
S   193/1377  14.0%  offset 118330…   151kb     100   (10)P                 
·   194/1377  14.1%  offset 118230…   139kb     100+  IPP·                  expected str, bytes or os.PathLike object, not NoneType
Attempt 2/10, will retry in 3.8 seconds...
Estimating…  1377 queries, 137565 messages
------------------------------------------------------------------------------
S     1/1377  0.07%  offset 137530…    213b       0   
S     2/1377  0.15%  offset 137430…    213b       0   
S     3/1377  0.22%  offset 137330…    213b       0   
S     4/1377  0.29%  offset 137230…   110kb      77   (5)I                  
S     5/1377  0.36%  offset 137130…   133kb     100   III                   
S     6/1377  0.44%  offset 137030…   132kb     100   IIP                   
S     7/1377  0.51%  offset 136930…   166kb     100   (6)I(7)P

You see? It starts cycling through the data from the beginning, not from the place where it got the error (as I would imagine it should). This repeats until it runs out of retries, 10 in total, as indicated by Attempt 2/10, will retry in 3.8 seconds...
Thankfully it doesn't redownload everything every time; it just goes through the already downloaded data, probably checking its existence and integrity, at about one line a second (sometimes up to 10, maybe 30 seconds per line), but it still adds up.
And it never gets past the failing spot until the retries are exhausted.

My setup:
OS: Ubuntu 23.10
Python: Python 3.12.0 (main, Oct 4 2023, 06:27:34) [GCC 13.2.0]

@delameter
Owner

delameter commented Feb 5, 2024

That is expected behaviour, because there is no mechanism for resuming a failed export (yet). The good news is that I was planning to implement it, just not in the first release, rather a bit later.

Tracking, and therefore preserving, media files (photos etc.) is much simpler, because 1) each of them has a unique hash from the start, and 2) this hash does not change.

Resuming the message export is a bit more complicated, because the app must keep all earlier messages somewhere, in memory or on disk; otherwise the output files would contain less data than they potentially could. Consider this example: the export process downloaded 3 pages of the history and failed, and the next attempt started from page 4, which contains a reply to a message from page 3. The app in the first attempt had that message from page 3 in memory and could insert a quotation of the earlier message next to the later one. But the app after a restart would download messages starting right from page 4 and therefore would not be able to get the data for the quoted message.

So a correct implementation should not only resume the export from some preserved page number, but also recover the full state of the previous export attempt, which means that this state has to be written somewhere to begin with. It is not really hard to implement; what is hard is making the feature flawless and elegant from the start. That's why it was deferred.
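For illustration only, here is a rough sketch of what such a checkpointing mechanism could look like. None of these names or files exist in vkimexp; the state file, the `fetch_page` callback, and the field names are all assumptions made up for the example:

```python
# Hypothetical sketch (not vkimexp's actual code): persist export progress so
# a retry can resume from the last completed offset instead of page 1.
import json
import os

STATE_FILE = "export_state.json"  # assumed checkpoint path


def load_state():
    """Return previously saved progress, or a fresh state if none exists."""
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE, encoding="utf-8") as f:
            return json.load(f)
    return {"next_offset": None, "messages_by_id": {}}


def save_state(state):
    """Write the current progress to disk atomically."""
    tmp = STATE_FILE + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp, STATE_FILE)


def export(fetch_page, first_offset, page_size=100):
    """Download message pages, checkpointing after each one.

    `fetch_page(offset)` stands in for whatever call hits the VK API; it is
    assumed to return a list of message dicts with an "id" field.
    """
    state = load_state()
    offset = state["next_offset"] if state["next_offset"] is not None else first_offset
    while offset >= 0:
        page = fetch_page(offset)
        # Keep every message seen so far, so later pages can still render
        # quotations of messages fetched in an earlier attempt.
        for msg in page:
            state["messages_by_id"][str(msg["id"])] = msg
        offset -= page_size
        state["next_offset"] = offset
        save_state(state)  # on a crash + retry, the export resumes here
    return state["messages_by_id"]
```

The point of the sketch is that both pieces of state get persisted together: the next offset to fetch (so the retry skips already-downloaded pages) and the accumulated messages (so quotations of earlier messages still resolve after a restart).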
