WantedIndex
Notes to self on strange-seeming development choices, a.k.a. This is Why You Did That Stupid Thing
The processing of each subscription channel needs to deal with two different information sources, the feed and locally stored information (from an earlier feed grab). Then the processing needs to answer two questions for each entry:
- Do we want the file?
- Do we already have the file?
While it is possible to run through the feed and the locally stored information separately, this would double the effort. Instead we amalgamate the information into one combo feed and, in so doing, collapse the decision tree. Since some podcasts keep short memories, it is likely that at some point previously downloaded entries will no longer be included in the available feed. We therefore need to store that information locally and integrate it with the information contained in the feed.
Poca relies on the order of entries in lists to tell which is the more recent. Therefore we add the previously downloaded entries that are no longer in the current feed to the end of the list, assuming that they are older. Should space requirements dictate that there is no longer sufficient space for the new as well as the old files, the latter should be the first to go.
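As a minimal sketch of the amalgamation described above (not poca's actual code; `build_combo` and the episode names are made up for illustration, and entries are assumed to be comparable identifiers listed newest first):

```python
def build_combo(feed_entries, stored_entries):
    """Return the feed entries followed by stored entries missing from the feed.

    Both lists are assumed to be ordered from most recent to least recent.
    """
    in_feed = set(feed_entries)
    # Previously downloaded entries that the feed has since dropped
    # are assumed to be older, so they go at the end.
    leftovers = [entry for entry in stored_entries if entry not in in_feed]
    return list(feed_entries) + leftovers


feed = ["ep5", "ep4", "ep3"]
stored = ["ep4", "ep3", "ep2"]  # ep2 has dropped out of the feed
combo = build_combo(feed, stored)
print(combo)
```

Because the leftovers sit at the end of the list, they are the first candidates to be cut when space runs out.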
Determining the answer to the first question is simply a matter of adding entries to the wanted list, from more recent to less recent, until the disk space quota would be broken by adding more. As for the second question, we look for each wanted entry in the 'have already' list and download it if it isn't there.
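The two questions can be sketched roughly as follows. This is an illustration under assumptions, not the real implementation: `select_wanted`, `to_download`, the `sizes` mapping and the quota figure are all hypothetical, and sizes are taken to be known up front.

```python
def select_wanted(combo, sizes, quota):
    """Question 1: take entries, newest first, until the next would break the quota."""
    wanted = []
    used = 0
    for entry in combo:
        if used + sizes[entry] > quota:
            break  # adding this entry would exceed the quota, so stop here
        wanted.append(entry)
        used += sizes[entry]
    return wanted


def to_download(wanted, have_already):
    """Question 2: wanted entries that are not in the 'have already' list."""
    return [entry for entry in wanted if entry not in have_already]


combo = ["ep5", "ep4", "ep3", "ep2"]
sizes = {"ep5": 40, "ep4": 30, "ep3": 30, "ep2": 30}  # made-up sizes in MB
wanted = select_wanted(combo, sizes, quota=100)
downloads = to_download(wanted, have_already=["ep4", "ep3", "ep2"])
print(wanted, downloads)
```

Here ep2 is the entry that no longer fits, so it falls outside the wanted list even though we already have it.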
Originally poca created a new history file every time new files were downloaded. This wasn't optimal, seeing as a crash or disconnect would mean that the pre-existing files would be purged from the database: they came later in the list and would only get added once the downloads were finished, which would never happen if the program got interrupted.
Therefore, rather than creating a new history file, we work with the old one, removing old entries and inserting new ones in the correct place. This 'correct place' is the entry's place - or index - in wanted.lst. This will insert newer files before older ones.
Suppose the history contains two old files (jar.lst[0] and jar.lst[1]) but wanted.lst has a new file (wanted.lst[0]) and has the two older files at wanted.lst[1] and wanted.lst[2] respectively. The new file will get inserted in jar.lst at its wanted index, i.e. 0. This will push the old files in jar.lst to the same index positions as in wanted.lst.
Suppose further that the user has increased the max_number so that now the sub has room for 4 episodes. This will get us a further older episode with a wanted index number of 3. When this is downloaded it will then get inserted in jar.lst at position 3 - i.e. after the old episodes which are now at 1 and 2, respectively.
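The two scenarios above can be replayed with plain Python lists. This is only a sketch: the variable names `wanted` and `jar` mirror wanted.lst and jar.lst, the episode names are made up, and every download is assumed to succeed.

```python
# wanted.lst after max_number was raised to 4: one new episode,
# the two old ones, and one further older episode.
wanted = ["new", "old_a", "old_b", "older"]

# jar.lst (the history) before this run: just the two old files.
jar = ["old_a", "old_b"]

for wanted_index, entry in enumerate(wanted):
    if entry not in jar:
        # Pretend the download succeeded, then record the entry
        # in the history at its wanted index.
        jar.insert(wanted_index, entry)

print(jar)
```

The new episode lands at 0, pushing the old files to 1 and 2, and the older episode is then inserted at 3, so the history ends up in the same order as the wanted list.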
Take the same case but now there are two new episodes. However, now the first download fails. This is a problem because the wanted index is now out of joint: the second file's wanted index is 1. Inserting this file at 1 would push it in between the two old files, i.e. not in the order in which the episodes were actually published. Therefore we count the number of failed downloads and subtract this number from every wanted index. As there can be no failures before the first download, it is not affected. The second download will get its index reduced by 1, so it's 0. Inserting it at index 0 puts it rightly first, before the old files.
Note that none of this affects the index of the files already in the history file, as they are not re-written but simply remain in place.
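A minimal sketch of the failure correction, assuming a hypothetical `update_history` helper and a `download` callable that returns True on success (neither is taken from poca's code):

```python
def update_history(jar, wanted, download):
    """Insert successfully downloaded entries into a copy of jar.

    Each entry goes in at its wanted index minus the number of
    downloads that have failed so far.
    """
    jar = list(jar)  # work on a copy; existing entries stay in place
    failed = 0
    for wanted_index, entry in enumerate(wanted):
        if entry in jar:
            continue  # already in the history, nothing to do
        if download(entry):
            jar.insert(wanted_index - failed, entry)
        else:
            failed += 1
    return jar


wanted = ["new_1", "new_2", "old_a", "old_b"]
jar = ["old_a", "old_b"]

# Simulate the first download failing and the second succeeding.
result = update_history(jar, wanted, download=lambda entry: entry != "new_1")
print(result)
```

Without the subtraction, new_2 would be inserted at index 1 and end up between old_a and old_b.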