Provide easier workflow for uploading rpms into a repository #994
Comments
|
Concur:
|
Oh great! I missed that in the docs (maybe I overlooked it?). Then I guess the only thing missing is the ability to upload multiple RPMs rather than just one? |
The big difference between this and just running the command multiple times is that if you were to run this 100 times, you would get 100 new repo versions |
if you want to add 100 files as an atomic operation, using a file: repo and syncing them is going to be more efficient. I'm not sure how the REST-API would handle trying to stream, say, 100 files in one enormous request. Making and removing versions is pretty straightforward - it's publishing that can take time, in pulp_rpm. |
Doesn't it use chunking to upload chunks of each file? Katello's CLI supports this (also via chunking). Yes, making/removing versions is very straightforward, but if I need to tell a user how to upload a local set of RPMs to a server, a single command is far better than "run this command 100 times in a bash loop and then go delete 99 repo versions". This is all about usability. |
Hm, ok. So if we had something like Could add a --dry-run as well, to output "this will add the following RPMs, taking up N GB, to repository foo" |
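A minimal sketch of what such a --dry-run report could look like, assuming it only needs to look at the local directory. The function name dry_run_report and the exact wording of the report are made up for illustration; nothing here talks to a Pulp server, and sizes are reported in bytes rather than GB to keep the arithmetic simple.

```shell
# Hypothetical helper (not part of pulp-cli): list the *.rpm files directly
# inside a directory and report their combined size in bytes.
dry_run_report() {
    local dir repo total count size f
    dir="$1"
    repo="$2"
    total=0
    count=0
    for f in "$dir"/*.rpm; do
        # Skip the literal pattern when nothing matches, and anything that
        # is not a regular file.
        [ -f "$f" ] || continue
        size=$(wc -c < "$f")
        total=$((total + size))
        count=$((count + 1))
        printf '  %s\n' "$f"
    done
    printf 'This will add %d RPM(s), taking up %d bytes, to repository %s\n' \
        "$count" "$total" "$repo"
}
```

Calling dry_run_report ./rpms foo prints one line per package found and then the summary line.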
Oh, the https://docs.pulpproject.org/pulp_rpm/restapi.html#tag/Content:-Packages/operation/content_rpm_packages_create API doesn't support chunking; that's not good. There are limits to buffer sizes on web servers that prevent RPMs from being uploaded in a lot of cases. On Satellite, for example, you HAVE to use a chunked API in order to upload anything over ~5 MB, which is pretty small. I don't think that API is the right one to use in this case (or any case for the actual file upload, IMO). |
Forgot to add: if you want to create 100 versions and delete 99 of them, that would be okay, but I thought the APIs today support creating just 1 version with the 100 RPMs, so that seems like the simpler option? |
The problem we have here is that "orphan-upload" (i.e. not directly into a repository) collides really badly with RBAC and orphan-cleanup and "who owns Artifacts" when they are de-duplicated entities. We're going to have to think pretty hard about how to address this. |
Upload absolutely supports chunking. (I never remember the versions when something was added...) At this point I'd say: Create a temporary repository with |
I don't quite understand the purpose of using a temporary repository to host the units? You can already add them to a repository here: https://pulpproject.org/pulp_rpm/restapi/#tag/Repositories:-Rpm/operation/repositories_rpm_rpm_modify |
The benefits are: You only create a single new repository version (on the target repository), while never needing to deal with orphans. |
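The sequence being argued for can be sketched as a plan that is printed rather than executed. This is a rough illustration, not the CLI's actual behaviour: plan_tmp_repo_upload is a made-up name, the --repository option on upload and the destroy sub-command are assumptions written from memory rather than taken from this thread, and <hrefs-of-uploaded-packages> is a placeholder that the sketch never fills in.

```shell
# Hypothetical planner: prints (does not run) the commands for the
# temporary-repository approach. Sub-commands and flags other than
# "repository content modify --repository/--add-content" are assumptions.
plan_tmp_repo_upload() {
    local target tmp f
    target="$1"
    tmp="$2"
    shift 2
    # 1. A holding repository, so uploaded packages are never orphans.
    printf 'pulp rpm repository create --name %s\n' "$tmp"
    # 2. One upload per file, each landing in the holding repository.
    for f in "$@"; do
        printf 'pulp rpm content upload --file %s --repository %s\n' "$f" "$tmp"
    done
    # 3. A single modify on the target: exactly one new repository version.
    printf 'pulp rpm repository content modify --repository %s --add-content <hrefs-of-uploaded-packages>\n' "$target"
    # 4. Remove the holding repository again.
    printf 'pulp rpm repository destroy --name %s\n' "$tmp"
}
```

For three files, plan_tmp_repo_upload foo tmp-upload a.rpm b.rpm c.rpm prints six commands, only one of which touches the target repository.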
Could you explain why it's not sane? |
Orphans can fall victim to orphan cleanup basically at any time. At the same time because content is shared (and therefore must be immutable) it cannot be owned by someone. The only way to prevent it being orphan is to put it in a repository (usually owned by someone). |
I thought orphan cleanup only cleaned up units that were older than orphan_protection_time ? As long as the user uses a reasonable orphan_protection_time this shouldn't be an issue? |
You can always provide the time with the call and that can be zero. Because "I really need this artifact to leave the system now." |
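To make the disagreement concrete, here is a toy model of the check being discussed. It is only a sketch under assumptions: orphan_is_removable is a made-up function, the protection time is assumed to be given in minutes, and the real server-side comparison may differ in detail.

```shell
# Toy model (not Pulp code): succeeds when an orphan created at epoch
# second $1 is old enough at epoch second $2 to be removed, given a
# protection time of $3 minutes (unit is an assumption).
orphan_is_removable() {
    local created now protection_minutes age
    created="$1"
    now="$2"
    protection_minutes="$3"
    age=$((now - created))
    [ "$age" -ge $((protection_minutes * 60)) ]
}
```

With a protection time of 1440 minutes, a ten-minute-old orphan is safe; with a protection time of 0 passed on the call, the same orphan is removable immediately, which is the case the temporary repository is meant to rule out.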
To me that is a big edge case and one that is a problem with the api/server, not one that should be solved via a cli workflow. I could see a server side setting for minimum protection time, but to me this is not the correct place to solve this problem. |
This thread reminds me of one of the first ideas I had for pulp-cli: what if we could write a recipe and use it as a parameter for the CLI?
Could something like: "Upload all rpm packages from $PWD" help you somehow @jlsherrill, or do you need this "sugar" on the API level? |
I believe that users would benefit from being able to just run
and have all the RPMs added to the repository in one repository version. |
Was playing with the existing functionality today; let me record here for posterity.
|
We discussed at pulpcore meeting today. Here's my summary of what I learned and what I think the plan is:
Do we want to build this CLI functionality without rethinking any server-side APIs? Yes Therefore the only viable path is to have the CLI:
Also here's some info on the desired timeline and usage.
@pulp/core FYI |
After more discussion with @jlsherrill the issue with this plan is architecturally the end user performing the upload can't make calls to create or delete repos. I'm going to schedule us a 30 minute call to talk over some options, hopefully that's helpful. @pulp/core |
This finds all *.rpm in the specified directory and arranges for them to be added to the specified --repository as a single repository-version. closes pulp#994.
Well that's...a shame. Um, I have a working POC - but it absolutely requires the ability to create a repository under the invoking user's credentials. |
For reference, here's example output from the current approach:
|
With the exception of the detail that the CLI is responsible for doing these things, we had this same discussion with COPR and came to the same conclusion - that if you're constructing a new release, you want to upload your RPMs into some temporary holding area (a repository) and then transfer all of them into the main repository in one operation later on. |
@dralley Yeah, the current state of the PR is "pretty close" to what COPR is asking for. However, there's no (current) way to get here starting from "a user who can upload content but is not allowed to create repositories". If there existed a subclass of RpmRepository, something like RpmTmpRepository, that could have its own set of RBAC controls, that might work? An RpmTmpRepository would be a limited version of RpmRepository: no remote allowed, can't be sync'd, retain-versions==1, refuses to be published or distributed (however that would be implemented). Would be great if it could be automatically reaped, eventually? Is it obvious I am making this up? That is, of course, a significantly new kind-of Thing, and A LOT more work than #1000 is. And getting it working, tested, and released in a few weeks, with a major holiday between now and then, would be...exciting. |
Then give the user repo-creation... Would this be such a bad idea?
|
Why would it be content agnostic? In many cases (including COPR's specifically) the temporary repository should be publishable on its own so that it would be possible to test the packages before making them live. That requires that it be a standard typed repository, and in any case an untyped repository would be something completely new and different from what currently exists. |
Proposal:
More discussion incoming tomorrow, will add minutes here. |
What you describe here is a completely different workflow, because here the "temporary repository" would not be an intermediary asset of the single cli call which it will delete in the end, but a created artifact you want to use. |
This finds all *.rpm in the specified directory and arranges for them to be added to the specified --repository, and then publishes the final resulting repository-version. closes pulp#994.
More discussion happened today. Some notes:
Today's conclusions:
|
"ggainey to provide list of REST calls" - see #1003 (comment) |
One more thought: pulp_container implemented pending_blobs and pending_manifests in order to allow |
Summary
Today with pulp-cli, if a user wants to upload 3 rpms, they have to run:
pulp artifact upload --file "$PKG1"
pulp artifact upload --file "$PKG2"
pulp artifact upload --file "$PKG3"
Then for each of those you have to grab the href of each artifact and fetch the sha256:
ARTIFACT_SHA256_1=$(pulp show --href "${ARTIFACT_HREF_1}" | jq -r '.sha256')
ARTIFACT_SHA256_2=$(pulp show --href "${ARTIFACT_HREF_2}" | jq -r '.sha256')
ARTIFACT_SHA256_3=$(pulp show --href "${ARTIFACT_HREF_3}" | jq -r '.sha256')
Then for each of those, they have to use the sha256 to create a content unit and grab the unit href:
PACKAGE_HREF1=$(pulp rpm content create --sha256 "${ARTIFACT_SHA256_1}" | jq -r '.pulp_href')
PACKAGE_HREF2=$(pulp rpm content create --sha256 "${ARTIFACT_SHA256_2}" | jq -r '.pulp_href')
PACKAGE_HREF3=$(pulp rpm content create --sha256 "${ARTIFACT_SHA256_3}" | jq -r '.pulp_href')
Finally you can add these to a repository:
# 2>&1 >/dev/null keeps only stderr; the task href is taken as the fourth word of that output
TASK_HREF=$(pulp rpm repository content modify \
  --repository "${REPO_NAME}" \
  --add-content "[{\"pulp_href\": \"${PACKAGE_HREF1}\"}, {\"pulp_href\": \"${PACKAGE_HREF2}\"}, {\"pulp_href\": \"${PACKAGE_HREF3}\"}]" \
  2>&1 >/dev/null | awk '{print $4}')
This is a LOT of commands for a user to run, and there isn't an easy way for me to give a user a couple of commands that upload an arbitrary list of RPMs. Having a command that handled everything, simply taking in a list of RPMs and outputting a repo version, would be much simpler.
An example of what I'm thinking of:
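Part of the pain is hand-writing the --add-content JSON for an arbitrary number of packages. A small helper, sketched here under the assumption that the hrefs contain no characters needing JSON escaping, could build that argument from a list; build_add_content is a made-up name, not something pulp-cli provides.

```shell
# Hypothetical helper: turn a list of package hrefs into the JSON array
# expected by --add-content. Assumes hrefs contain no quotes or backslashes.
build_add_content() {
    local json sep href
    json='['
    sep=''
    for href in "$@"; do
        json="${json}${sep}{\"pulp_href\": \"${href}\"}"
        sep=', '
    done
    printf '%s]\n' "$json"
}
```

It could then be used as --add-content "$(build_add_content "${PACKAGE_HREF1}" "${PACKAGE_HREF2}" "${PACKAGE_HREF3}")", so the modify call stays the same no matter how many packages were uploaded.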
This command would need to handle things such as:
Examples