Provide easier workflow for uploading rpms into a repository #994

jlsherrill · 2024-06-18T12:48:04Z

Summary

Today with pulp-cli, if a user wants to upload 3 rpms, they have to run:

pulp artifact upload --file "$PKG1"
pulp artifact upload --file "$PKG2"
pulp artifact upload --file "$PKG3"

Then for each of those you have to grab the href of each artifact and fetch the sha256:

pulp show --href "${ARTIFACT_HREF_1}" | jq -r '.sha256')
pulp show --href "${ARTIFACT_HREF_2}" | jq -r '.sha256')
pulp show --href "${ARTIFACT_HREF_3}" | jq -r '.sha256'

Then for each of those, they have to use the sha256 to create a content unit and grab the unit href:

PACKAGE_HREF1=$(pulp rpm content create --sha256 "${ARTIFACT_SHA256}" | jq -r '.pulp_href')
PACKAGE_HREF2=$(pulp rpm content create --sha256 "${ARTIFACT_SHA256}" | jq -r '.pulp_href')
PACKAGE_HREF3=$(pulp rpm content create --sha256 "${ARTIFACT_SHA256}" | jq -r '.pulp_href')

Finally you can add these to a repository:

TASK_HREF=$(pulp rpm repository content modify
--repository "${REPO_NAME}"
--add-content "[{"pulp_href": "${PACKAGE_HREF1}"}, {"pulp_href": "${PACKAGE_HREF2}"}, {"pulp_href": "${PACKAGE_HREF3}"}]"
2>&1 >/dev/null | awk '{print $4}')

This is a LOT of commands to run for a user and there isn't an easy way for me to easily give a couple commands to a user to run to upload an arbitrary list of rpms. Having a command that handled everything and simply took in a list of rpms and outputted a repo version would be much simpler.

An example of what i'm thinking of:

# pulp rpm repository content upload  --repository "${REPO_NAME}"  --files=*.rpm

A new repository version has been created at  /pulp/api/v3/repositories/rpm/abcdef/versions/5

This command would need to handle things such as:

An rpm already exists (such as if i interrupt the command half way through and restart it)

Examples

The text was updated successfully, but these errors were encountered:

mdellweg · 2024-06-18T14:02:53Z

pulp rpm content upload does exactly that.

ggainey · 2024-06-18T14:09:39Z

pulp rpm content upload does exactly that.

Concur:

$ pulp rpm content upload --help
Usage: pulp rpm content upload [OPTIONS]

  Create a content unit by uploading a file

Options:
  --chunk-size TEXT     Chunk size to break up rpm package into. Defaults to
                        1MB
  --relative-path TEXT  Relative path within a distribution of the entity
  --file FILENAME       An RPM binary  [required]
  --repository TEXT     Repository to add the content to in the form
                        '[[<plugin>:]<resource_type>:]<name>' or by href.
  --help                Show this message and exit.

jlsherrill · 2024-06-18T14:20:58Z

oh great! I missed that in the docs (maybe i overlooked it?)

Then i guess the only thing missing is the ability to upload multiple rpms rather than just one?

jlsherrill · 2024-06-18T14:22:24Z

The big difference between this and just running the command multiple times is that if you were to run this 100 times, you would get 100 new repo versions

ggainey · 2024-06-18T14:28:40Z

if you want to add 100 files as an atomic operation, using a file: repo and syncing them is going to be more efficient. I'm not sure how the REST-API would handle trying to stream, say, 100 files in one enormous request.

Making and removing versions is pretty straightforward - it's publishing that can take time, in pulp_rpm.

jlsherrill · 2024-06-18T14:50:26Z

Doesn't it use chunking to upload chunks of each file? Katello's cli supports this (also via chunking).

Yes making/removing versions is very straight forward, but if i need to tell a user how to upload a local set of rpms to a server, a single command versus run this command 100 times in a bash loop and then go delete 99 repo versions isn't ideal.

This is all about usability.

ggainey · 2024-06-18T14:58:54Z

Hm, ok. So if we had something like --directory=./foo --cleanup-versions True, that (in the CLI process) call the pulp_rpm/restapi.html#tag/Content:-Packages/operation/content_rpm_packages_create API in a loop for all-directory-contents-ending-in-.rpm, remembering "current version at start" and cleaning up all interim-versions for the user - that would do the deed, yeah?

Could add a --dry-run as well, to output "this will add the following RPMs, taking up N GB, to repository foo"

jlsherrill · 2024-06-18T15:04:11Z

oh the https://docs.pulpproject.org/pulp_rpm/restapi.html#tag/Content:-Packages/operation/content_rpm_packages_create api doesnt' support chunking, thats not good. There are limits to buffer sizes on web servers that prevent rpms from being uploaded in a lot of cases. On satellite for example you HAVE to use a chunked api in order to upload anything over ~5 mb which is pretty small.

I don't think that api is the right one to use in this case (or any case for the actual file upload IMO)

jlsherrill · 2024-06-18T15:08:11Z

forgot to add: if you want to create 100 versions and delete 99 of them, that would be okay, but the apis today support just creating 1 version with the 100 rpms i thought, so that seems like the simpler option?

ggainey · 2024-06-18T15:13:48Z

The problem we have here, is "orphan-upload" (ie not directly into a repository) collides really badly with RBAC and orphan-cleanup and "who owns Artifacts" when they are de-duplicated entities. We're going to have think pretty hard about how to address this.

mdellweg · 2024-06-18T15:47:45Z

Upload absolutely supports chunking. (I never remember the versions when something was added...)
And the CLI handles the Upload object transparently for the user.

At this point I'd say: Create a temporary repository with retain_version=1, upload all the packages there, and then move them all to the destination repository in one go, clean up the tmp repo. And that is a process the cli can do as a high level workflow.

jlsherrill · 2024-06-20T20:25:55Z

I don't quite understand the purpose of using a temporary repository to host the units? You can already add them to a repository here: https://pulpproject.org/pulp_rpm/restapi/#tag/Repositories:-Rpm/operation/repositories_rpm_rpm_modify

mdellweg · 2024-06-21T08:39:27Z

The benefits are: You only create a single new repository version (on the target repository), while never needing to deal with orphans.
The whole idea of users needing to create orphaned content prior to using is not sane in a multi user environment.

jlsherrill · 2024-06-21T13:26:02Z

Could you explain why its not sane?

mdellweg · 2024-06-21T14:13:44Z

Orphans can fall victim to orphan cleanup basically at any time. At the same time because content is shared (and therefore must be immutable) it cannot be owned by someone. The only way to prevent it being orphan is to put it in a repository (usually owned by someone).
There has been a lot of discussions in the past and i found at least some here (at different states of implementation):
https://discourse.pulpproject.org/t/multi-tennancy-and-the-question-of-owning-content/237
https://discourse.pulpproject.org/t/pulp-rpm-content-create-as-non-admin-user/1216
https://discourse.pulpproject.org/t/normal-user-got-permission-problem-when-uploading-big-rpm-file-as-artifact/1023

jlsherrill · 2024-06-21T14:25:56Z

I thought orphan cleanup only cleaned up units that were older than orphan_protection_time ? As long as the user uses a reasonable orphan_protection_time this shouldn't be an issue?

mdellweg · 2024-06-21T15:33:12Z

You can always provide the time with the call and that can be zero. Because "I really need this artifact to leave the system now."
But in the end the whole idea of letting users care about orphans could have been avoided, because Pulp should be about "Content in Repositories" instead of "Repositories and Content".

jlsherrill · 2024-06-21T17:46:25Z

To me that is a big edge case and one that is a problem with the api/server, not one that should be solved via a cli workflow. I could see a server side setting for minimum protection time, but to me this is not the correct place to solve this problem.

decko · 2024-06-21T18:42:07Z

This threads remind me of one of first ideas I had for pulp-cli: What if we could write a recipe and use it as a parameter for the CLI.
Like:

Create a mirror of fedora-updates
Create an on-demand repo of python packages
...
without entering in the details of Pulp architecture

Could something like: "Upload all rpm packages from $PWD" help you somehow @jlsherrill, or do you need this "sugar" on the API level?

dkliban · 2024-06-21T20:22:21Z

I believe that users would benefit from being able to just run

pulp rpm content upload --dir $PWD --repository my-repo

and have all the RPMs added to the the repository in one repository version.

ggainey · 2024-06-21T20:40:01Z

Was playing with the existing functionality today; let me record here for posterity.

wget -r -np -R "index.html*" https://fixtures.pulpproject.org/rpm-signed/
cd fixtures.pulpproject.org/rpm-signed
pulp rpm repository create --name uploaded --retain-repo-versions 1 
pulp rpm distribution create --name uploaded --repository uploaded --base-path uploaded
for f in *.rpm; do echo $f ; pulp rpm content -t package upload --repository rpm:rpm:uploaded --file "${f}" ; done
pulp rpm publication create --repository uploaded
wget http://localhost:5001/pulp/content/uploaded/

bmbouter · 2024-06-25T15:00:04Z

We discussed at pulpcore meeting today. Here's my summary of what I learned and what I think the plan is:

Our desired usage here is for non-admin users
Non-admin users can upload an orphaned artifact
Non-admin users, can't create RPMs from orphaned artifacts so we can't just upload RPMs without sending them to a destination repo.
If auto-publish is enabled (likely) and you're uploading 100 RPMs you'll get 99 publish tasks and 99 versions (undesirable).

Do we want to build this CLI functionality without rethinking any server-side APIs? Yes

Therefore the only viable path is to have the CLI:

Create a temporary repo with auto-publish disabled
Upload everything in foo directory to that repo
Publish (if necessary)
Use the copy (rpm only API) or modify (all plugins generic API) to copy everything into the destination repo
Delete the temporary repo

Also here's some info on the desired timeline and usage.

It would be nice if this worked for all content types, but in terms of actual need RPM is all we really need for now
We need something to test by mid-July, hopefully fully merged and released by end-of-July.

@pulp/core FYI

bmbouter · 2024-06-25T18:47:17Z

After more discussion with @jlsherrill the issue with this plan is architecturally the end user performing the upload can't make calls to create or delete repos. I'm going to schedule us a 30 minute call to talk over some options, hopefully that's helpful.

@pulp/core

This finds all *.rpm in the specified directory and arranges for them to be added to the specified --repository as a single repository-version. closes pulp#994.

ggainey · 2024-06-25T19:12:08Z

Well that's...a shame. Um, I have a working POC - but it absolutely requires the ability to create a repository under the invoking user's credentials.

ggainey · 2024-06-25T19:14:57Z

For reference, here's example output from the current approach:

(oci-env) (994_directory_upload) ~/github/Pulp3/pulp-cli $ pulp rpm repository create --name upload
(oci-env) (994_directory_upload) ~/github/Pulp3/pulp-cli $ pulp rpm content upload --directory ./signed/fixtures.pulpproject.org/rpm-signed/ --repository rpm:rpm:upload
About to upload 35 files for upload into tmp-repository uploadtmp_upload_de488023-3e4c-4e4d-878f-fd9d76fc1f72
Started background task /pulp/api/v3/tasks/019050cd-2c15-7b7c-bf9a-6d985fd7827b/
Done.
Uploaded ./signed/fixtures.pulpproject.org/rpm-signed/crow-0.8-1.noarch.rpm...
Started background task /pulp/api/v3/tasks/019050cd-303a-73df-8324-adb8b96891a1/
Done.
Uploaded ./signed/fixtures.pulpproject.org/rpm-signed/cheetah-1.25.3-5.noarch.rpm...

[lots of uploads happen...]

Started background task /pulp/api/v3/tasks/019050cd-d0f0-75a9-813e-11d4787b2ab0/
Done.
Uploaded ./signed/fixtures.pulpproject.org/rpm-signed/camel-0.1-1.noarch.rpm...
Creating new version of repository upload
Started background task /pulp/api/v3/tasks/019050cd-d653-7bde-a5f5-b5be967a6b7d/
Done.
Created new version in upload: /pulp/api/v3/repositories/rpm/rpm/019050cd-1490-7e65-94e3-616225428889/versions/1/
Removing tmp-repository uploadtmp_upload_de488023-3e4c-4e4d-878f-fd9d76fc1f72
Started background task /pulp/api/v3/tasks/019050cd-d90f-7b9f-b2c5-f4e08be8fe27/
Done.
(oci-env) (994_directory_upload) ~/github/Pulp3/pulp-cli $

This finds all *.rpm in the specified directory and arranges for them to be added to the specified --repository as a single repository-version. closes pulp#994.

dralley · 2024-06-25T23:28:47Z

With the exception of the detail that the CLI is responsible for doing these things, we had this same discussion with COPR and came to the same conclusion - that if you're constructing a new release, you want to upload your RPMs into some temporary holding area (a repository) and then transfer all of them into the main repository in one operation later on.

ggainey · 2024-06-25T23:59:04Z

@dralley Yeah, the current state of the PR is "pretty close" to what COPR is asking for. However, there's no (current) way to get here starting from "a user who can upload content but is not allowed to create repositories". If there existed a subclass of RpmRepository, something like RpmTmpRepository, that could have its own set of RBAC controls, that might work? An RpmTmpRepository would be a limited version of RpmRepository - no remote allowed, can't be sync'd, retain-versions==1, refuses to be published or distributed (however that would be be implemented). Would be great if it could be automatically reaped, eventually? Is it obvious I am making this up?

That is, of course, a significantly new kind-of Thing, and A LOT more work than #1000 is. And getting it working, tested, and released in a few weeks, with a major holiday between now and then, would be...exciting.

mdellweg · 2024-06-26T06:46:44Z

Then give the user repo-creation... Would this be such a bad idea?
Possible solutions (server side):

TmpRepository: To me that should be a content agnostic container. If we don't also add a periodic cleanup task for them, I don't see the added value.
UsrRepository: We could give every user their own persistent content bucket (also content agnostic).
TarUpload: Consume a bunch of packages from a tar file. The cli could handle the packing and chunking on the fly.

dralley · 2024-06-26T18:21:53Z

Why would it be content agnostic? In many cases (including COPR's specifically) the temporary repository should be publishable on its own so that it would be possible to test the packages before making them live.

That requires that it be a standard typed repository, and in any case an untyped repository would be something completely new and different from what currently exists.

ggainey · 2024-06-27T15:10:24Z

Proposal:

First - modify the existing POC to just-upload-directly-to-repository
- Why? requesting user is actually OK with "one revision per RPM"
- Problem: how can user tell Justin's app "I'm done now, please publish this repo"
  - we don't have a way to do this currently if the user doesn't have access to create-publication
  - maybe expose publication through jsherril's API?
Second, more-complicated desire
- add list-of-upload-hrefs-and names to rpm\Package\Create
- for a directory-full-of-rpms, use core\Upload\Create to get an RPM into upload-staging and remember name-and href.
- Once all rpms in dir are uploaded, call rpm\Package\Create with dictionary-of-(name: content-href) list

More discussion incoming tomorrow, will add minutes here.

mdellweg · 2024-06-27T15:14:24Z

Why would it be content agnostic? In many cases (including COPR's specifically) the temporary repository should be publishable on its own so that it would be possible to test the packages before making them live.

That requires that it be a standard typed repository, and in any case an untyped repository would be something completely new and different from what currently exists.

What you describe here is a completely different workflow, because here the "temporary repository" would not be an intermediary asset of the single cli call which it will delete in the end, but a created artifact you want to use.
So in that case, you'd need to build the workflow on a higher level. Creating the temporary repository, filling it and then moving it's content to the final destination is no longer an atomic step.

This finds all *.rpm in the specified directory and arranges for them to be added to the specified --repository, and then publishes the final resulting repository-version. closes pulp#994.

ggainey · 2024-06-28T15:16:32Z

More discussion happened today. Some notes:

Added --directory to rpm content -t package upload. #1003 POC uploads serially - how can we parallelize this?
- First cut is OK to be serial
Exactly which REST APIs are involved here?
- AI: ggainey to run the Added --directory to rpm content -t package upload. #1003 POC and list the REST calls involved
On the pulp-side - how is the Upload API controlled, in the RBAC context?
- Users can start uploads
What kind of new API would make this better?
- A modification to Rpm::Content:;Package::Create to accept "a list of upload-hrefs" (in addition to/in place of "a single upload-href")?
- even with multipart/form, there will be a limit to the number of hrefs acceptable in a single API call
- adding/extending the API is a "nice to have" - maybe some day, but not Now
- what if the API change was "here is One Gigantic tarfile"?
  - "duplicate size problem" rears its ugly head
  - would have to be supported by creating/streaming "on the fly" to the new API
  - consider multi-plugin/multi-content-type support here

Today's conclusions:

ggainey to provide list of REST calls
Added --directory to rpm content -t package upload. #1003 is going to be "good enough" for the immediate ('by July') need
Added --directory to rpm content -t package upload. #1000 approach is/may be useful to other upstream-Pulp users - keep and extend in addition
- more investigation needed

ggainey · 2024-06-28T15:35:58Z

"ggainey to provide list of REST calls" - see #1003 (comment)

mdellweg · 2024-06-29T08:44:30Z

One more thought: pulp_container implemented pending_blobs and pending_manifests in order to allow podman push to first push a bunch of blobs, then push a manifest needing them and (optionally) combine manifests in a manifest list after that. And only the last object gets a tag and is added (with all its dependents) to the repository. Maybe we can do something similar here.
This basically is content that belongs to a repository (solves the rbac side) but is not part of it yet (don't collect garbage, create single new version on an event).

This finds all *.rpm in the specified directory and arranges for them to be added to the specified --repository, and then publishes the final resulting repository-version. closes pulp#994.

jlsherrill added feature request New feature request (template-set) Triage-Needed Needs to be reviewed at next pulp-cli mtg labels Jun 18, 2024

ggainey self-assigned this Jun 25, 2024

mdellweg removed the Triage-Needed Needs to be reviewed at next pulp-cli mtg label Jun 25, 2024

ggainey mentioned this issue Jun 25, 2024

Added --directory to rpm content -t package upload. #1000

Closed

3 tasks

ggainey mentioned this issue Jun 27, 2024

Added --directory to rpm content -t package upload. #1003

Merged

ggainey closed this as completed in #1003 Jul 15, 2024

ggainey closed this as completed in 51498c3 Jul 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Provide easier workflow for uploading rpms into a repository #994

Provide easier workflow for uploading rpms into a repository #994

jlsherrill commented Jun 18, 2024 •

edited

Loading

mdellweg commented Jun 18, 2024

ggainey commented Jun 18, 2024

jlsherrill commented Jun 18, 2024

jlsherrill commented Jun 18, 2024

ggainey commented Jun 18, 2024

jlsherrill commented Jun 18, 2024

ggainey commented Jun 18, 2024

jlsherrill commented Jun 18, 2024 •

edited

Loading

jlsherrill commented Jun 18, 2024

ggainey commented Jun 18, 2024

mdellweg commented Jun 18, 2024

jlsherrill commented Jun 20, 2024

mdellweg commented Jun 21, 2024

jlsherrill commented Jun 21, 2024

mdellweg commented Jun 21, 2024

jlsherrill commented Jun 21, 2024

mdellweg commented Jun 21, 2024

jlsherrill commented Jun 21, 2024

decko commented Jun 21, 2024

dkliban commented Jun 21, 2024 •

edited

Loading

ggainey commented Jun 21, 2024

bmbouter commented Jun 25, 2024 •

edited

Loading

bmbouter commented Jun 25, 2024

ggainey commented Jun 25, 2024

ggainey commented Jun 25, 2024

dralley commented Jun 25, 2024

ggainey commented Jun 25, 2024

mdellweg commented Jun 26, 2024

dralley commented Jun 26, 2024 •

edited

Loading

ggainey commented Jun 27, 2024

mdellweg commented Jun 27, 2024

ggainey commented Jun 28, 2024

ggainey commented Jun 28, 2024

mdellweg commented Jun 29, 2024

Provide easier workflow for uploading rpms into a repository #994

Provide easier workflow for uploading rpms into a repository #994

Comments

jlsherrill commented Jun 18, 2024 • edited Loading

Summary

Examples

mdellweg commented Jun 18, 2024

ggainey commented Jun 18, 2024

jlsherrill commented Jun 18, 2024

jlsherrill commented Jun 18, 2024

ggainey commented Jun 18, 2024

jlsherrill commented Jun 18, 2024

ggainey commented Jun 18, 2024

jlsherrill commented Jun 18, 2024 • edited Loading

jlsherrill commented Jun 18, 2024

ggainey commented Jun 18, 2024

mdellweg commented Jun 18, 2024

jlsherrill commented Jun 20, 2024

mdellweg commented Jun 21, 2024

jlsherrill commented Jun 21, 2024

mdellweg commented Jun 21, 2024

jlsherrill commented Jun 21, 2024

mdellweg commented Jun 21, 2024

jlsherrill commented Jun 21, 2024

decko commented Jun 21, 2024

dkliban commented Jun 21, 2024 • edited Loading

ggainey commented Jun 21, 2024

bmbouter commented Jun 25, 2024 • edited Loading

bmbouter commented Jun 25, 2024

ggainey commented Jun 25, 2024

ggainey commented Jun 25, 2024

dralley commented Jun 25, 2024

ggainey commented Jun 25, 2024

mdellweg commented Jun 26, 2024

dralley commented Jun 26, 2024 • edited Loading

ggainey commented Jun 27, 2024

mdellweg commented Jun 27, 2024

ggainey commented Jun 28, 2024

ggainey commented Jun 28, 2024

mdellweg commented Jun 29, 2024

jlsherrill commented Jun 18, 2024 •

edited

Loading

jlsherrill commented Jun 18, 2024 •

edited

Loading

dkliban commented Jun 21, 2024 •

edited

Loading

bmbouter commented Jun 25, 2024 •

edited

Loading

dralley commented Jun 26, 2024 •

edited

Loading