
Use the local computer index if possible #131

Closed
Hugal31 opened this issue Oct 9, 2020 · 7 comments


Hugal31 commented Oct 9, 2020

As is already possible for .deb files with --cache-dir, it would be nice to be able to reuse the index files in /var/lib/apt/lists to avoid re-downloading them.

This would make it possible to benefit from apt's patch-delta update algorithm (see #29) without implementing it, provided the two computers share some source lists.

@rickysarraf (Owner)

Well, .debs work that way. When we generate the signature, the apt db has all the metadata about a .deb file's integrity. A .deb is self-contained and doesn't change its form.

In the case of the apt metadata, it is not that simple. We'd have to extract the base signed file, which stores all the checksums, then correlate it with the unpacked data, which has varying names, and then see whether anything is worth reusing.

It is a very unusual use case.

@rickysarraf (Owner)

Or, the other way would be to just pick everything as is and then, on the no-network machine, simply run it. If the files were tampered with, the signatures would be invalid.

This wouldn't really be a cache-style operation but rather a copy-data one, which would bridge/replace the entire download_from_web operation with a copy_from_local_apt style operation.

Sounds like a decent feature, but I won't have any near-term time to get this done. If anyone else would like to propose this as a PR, that's welcome.


Hugal31 commented Oct 12, 2020

I have a script to apply apt-offline on multiple machines, and it happens that the "host" machine (the one connected to the internet) has the same (or almost the same) archive lists as the "no-network" machines.

What I do is run apt update and apt-offline set --update on the "host" machine, then compare its signature (.sig) file with the signatures of the "no-network" machines, using for instance comm -12 to detect common files. I then remove those common files from the "no-network" machine's sig file so it only downloads what is missing, and merge the two sets of files. It's a bit hacky, but it reduces a "no-op" apt update from 15 minutes to 30 seconds on my internet connection.

Of course, the ideal would be to implement #29.
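The comm-based comparison described above can be sketched in shell. The sig file contents here are made-up single-URL lines standing in for real apt-offline output:

```shell
#!/bin/sh
# Sketch of the sig-comparison trick: find items common to the host's sig
# and a no-network machine's sig, and items the target still needs.
set -e
wd1=$(mktemp -d)

# Hypothetical sig contents for the host and a no-network target machine.
printf 'http://deb.example/a.deb\nhttp://deb.example/b.deb\n' > "$wd1/host.sig"
printf 'http://deb.example/b.deb\nhttp://deb.example/c.deb\n' > "$wd1/target.sig"

# comm(1) requires sorted input.
sort -o "$wd1/host.sig" "$wd1/host.sig"
sort -o "$wd1/target.sig" "$wd1/target.sig"

# Lines common to both sigs: already covered by the host's own update.
comm -12 "$wd1/host.sig" "$wd1/target.sig" > "$wd1/common"

# Lines only in the target sig: these still need to be downloaded.
comm -13 "$wd1/host.sig" "$wd1/target.sig" > "$wd1/missing"

cat "$wd1/missing"   # prints http://deb.example/c.deb
```

The `comm -13` output is what remains to feed to `apt-offline get` for the target machine.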

@sanjarcode

sanjarcode commented Feb 28, 2021

This would be a great feature. I am running Ubuntu 20.04 and run several VMs of the same. It is wasteful to re-download updates for all my VMs when they are just one update behind the host OS.

Could apt-offline take a folder (for example /var/cache/apt) as an argument and include only the packages that are verified and required per the .sig file? This would be very efficient.


rickysarraf commented Mar 1, 2021

I'm going to mark this closed, but I'll give you my thoughts on the best way to achieve a workflow for a heterogeneous setup.

Machine 1: Ubuntu Blah
Machine 2: Debian Unstable
Machine 3: Debian Testing
Machine 4: Debian GNU/BSD
Machine 5: Debian GNU/Hurd

  1. Run apt-offline set /tmp/machine-num.sig --update on each machine.
  2. Collate the sig data from all 5 machines into a single file.
  3. Take that file to the machine with network access.
  4. Download all the data with apt-offline get /tmp/collated.sig --bundle /tmp/collated-data.zip.
  5. Bring the data file back to each machine in your setup.
  6. Each machine will read the data and pick only the items relevant to it.

Given that this workflow fulfills the use case, I see no point in over-engineering apt-offline. Please feel free to re-open if you have a different point.
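The collation in step 2 can be sketched in shell. The per-machine sig files below are invented stand-ins for what `apt-offline set` would write; the `apt-offline get` invocation itself is left as a comment since it needs a real network:

```shell
#!/bin/sh
# Merge several machines' sig files into one collated sig, dropping
# duplicate lines so shared items are fetched only once.
set -e
wd2=$(mktemp -d)

# Hypothetical per-machine sig files (one download item per line).
printf 'http://deb.example/x.deb\nhttp://deb.example/y.deb\n' > "$wd2/machine-1.sig"
printf 'http://deb.example/y.deb\nhttp://deb.example/z.deb\n' > "$wd2/machine-2.sig"

# Step 2: collate, de-duplicating the overlap (y.deb appears in both).
sort -u "$wd2"/machine-*.sig > "$wd2/collated.sig"

# Step 4, on the networked machine (not executed here):
#   apt-offline get "$wd2/collated.sig" --bundle /tmp/collated-data.zip
wc -l < "$wd2/collated.sig"   # 3 unique items instead of 4
```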

@sanjarcode

sanjarcode commented Mar 2, 2021

I think you missed a point, @rickysarraf. The workflow you provided does save internet data, but there's still one part left. Can apt-offline do this (pseudocode to spare wordy sentences):

# at the 'online computer'

for pkg in collated.sig:
    if pkg in host_pc and checksum(local_pkg) == checksum_of_pkg_at_source:  # uses very little data
        use the local pkg (i.e. skip the download)
    else:
        continue as usual  # uses significant data

This will significantly minimize internet consumption, especially for a more homogeneous setup.
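A minimal shell sketch of that loop, assuming a hypothetical `URL checksum` line format for the sig file (not apt-offline's actual format):

```shell
#!/bin/sh
# Skip the download when a locally cached copy's checksum matches the sig.
set -e
wd3=$(mktemp -d)
mkdir "$wd3/cache"

# One package already present locally, with a matching checksum.
printf 'hello\n' > "$wd3/cache/a.deb"
sum=$(sha256sum "$wd3/cache/a.deb" | cut -d' ' -f1)

# Hypothetical collated sig: a cached entry and an uncached one.
printf 'http://deb.example/a.deb %s\nhttp://deb.example/d.deb 0000\n' "$sum" \
    > "$wd3/collated.sig"

> "$wd3/to-download"
while read -r url want; do
    f="$wd3/cache/$(basename "$url")"
    if [ -f "$f" ] && [ "$(sha256sum "$f" | cut -d' ' -f1)" = "$want" ]; then
        echo "skip $url"                      # verified local copy, no download
    else
        echo "$url" >> "$wd3/to-download"     # would be fetched as usual
    fi
done < "$wd3/collated.sig"

cat "$wd3/to-download"   # only http://deb.example/d.deb remains
```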

It'd be great if you could suggest a workflow (if this doesn't seem like a worthy feature).
Thanks for your previous answer. 👍️

@rickysarraf (Owner)

I think you missed a point, @rickysarraf. The workflow you provided does save internet data, but there's still one part left. Can apt-offline do this (pseudocode to spare wordy sentences):

# at the 'online computer'

for pkg in collated.sig:
    if pkg in host_pc and checksum(local_pkg) == checksum_of_pkg_at_source:  # uses very little data
        use the local pkg (i.e. skip the download)
    else:
        continue as usual  # uses significant data

No, that apt-offline doesn't do. The downloaded indices, whose checksums are available in the digital signatures, come in some sort of compressed format. OTOH, the downloaded indices, after verification, may be unpacked and saved into the /var/lib/apt/lists/ db for performance reasons.

This will significantly minimize internet consumption, especially for a more homogeneous setup.

You are describing a very narrow corner case. But I do have the same scenario, and I'll document what I do to make the best use of every bit of internet traffic I have.

It'd be great if you could suggest a workflow (if this doesn't seem like a worthy feature).
Thanks for your previous answer.

On the networked machine, set up an apt proxy. You could use whatever you like (and also test its integration with apt-offline). To my knowledge, the available options are:

  • apt-cacher
  • apt-cacher-ng
  • approx

Now, when you invoke apt-offline on the networked machine, invoke it with the proxy option, as in:

apt-offline get /tmp/collate.sig --bundle collate.zip --proxy-host network-host-name --proxy-port port-no

That should address your last request about optimal utilization of network resources. I use a setup derived from the same. Keep in mind that the mentioned proxies have some issues (on the remote proxy side) when used in combination with apt-offline, but that is a different topic altogether.

And if you come across any other proxies that work in this setup, please share them with me too. I'm always on the lookout for new proxy servers for this use case.
