Parse and import both rpm and deb packages metadata #9101

Open
wants to merge 39 commits into
base: master
Changes from 25 commits
Commits
39 commits
860a02f
Create lzreposync subdirectory
agraul May 23, 2024
d9b3efe
Correct error message
waterflow80 Aug 4, 2024
e594cfa
Add remote_path column
waterflow80 Aug 4, 2024
199381e
Add expand_full_filelist parameter
waterflow80 Aug 4, 2024
5acf7ab
Update deprecated method
waterflow80 Aug 4, 2024
86df68e
Add import_signatures parameter
waterflow80 Aug 4, 2024
f772e64
Implement Primary.xml file parser
waterflow80 Aug 4, 2024
3b2d236
Implement filelists.xml file parser
waterflow80 Aug 4, 2024
1649f34
Implement full rpm metadata parsing
waterflow80 Aug 4, 2024
a6d462c
Parse and import rpm patches/updates
waterflow80 Aug 4, 2024
0df0030
Import parsed rpm & deb packages to db
waterflow80 Aug 4, 2024
fda9e51
Implement the deb Packages md file
waterflow80 Aug 4, 2024
eee052b
Implement the Translation file parser
waterflow80 Aug 4, 2024
5a51d0d
Implement full deb metadata parsing
waterflow80 Aug 4, 2024
5e26b7a
Fetch repository information from the db
waterflow80 Aug 4, 2024
8ef299a
Complete lzreposync service entry point
waterflow80 Aug 4, 2024
5435907
Add new dependency
waterflow80 Aug 4, 2024
3567817
Add unit tests for rpm metadata parsers
waterflow80 Aug 4, 2024
8bad179
Delete no longer used files
waterflow80 Aug 4, 2024
61b0a74
Remove already defined function
waterflow80 Aug 4, 2024
898d571
Fix linting complain
waterflow80 Aug 4, 2024
4c7db58
Complete code for lzreposync version 0.1
waterflow80 Aug 15, 2024
4f8a070
Complete tests for lzreposync service
waterflow80 Aug 15, 2024
f329e51
Fix error: too many clients already
waterflow80 Aug 15, 2024
913f21c
Complete latest version
waterflow80 Aug 17, 2024
8a49313
Optimize code and do some cleanup
waterflow80 Aug 26, 2024
6ccb3bf
Optimize and consolidate code
waterflow80 Aug 29, 2024
2157d56
Fix cachedir path formatting issue
waterflow80 Aug 29, 2024
dceccec
fixup! Complete lzreposync service entry point
waterflow80 Sep 2, 2024
2a79e72
fixup! Optimize code and do some cleanup
waterflow80 Sep 2, 2024
2f4c998
fixup! Optimize and consolidate code
waterflow80 Sep 2, 2024
9a95e5a
fixup! Complete latest version
waterflow80 Sep 2, 2024
fed31fe
fixup! Optimize and consolidate code
waterflow80 Sep 2, 2024
a033e4d
Complete gpg signature check for rpm
waterflow80 Sep 9, 2024
b890b1c
fixup! Add remote_path column
waterflow80 Sep 9, 2024
7d0c57c
Refactor: Allow more input variants in makedirs()
agraul Sep 9, 2024
7aa602c
Merge pull request #1 from agraul/refactor-makedirs
waterflow80 Sep 9, 2024
4042fc4
fixup! Refactor: Allow more input variants in makedirs()
waterflow80 Sep 9, 2024
a7270cd
Complete gpg signature check for debian
waterflow80 Sep 11, 2024
8 changes: 6 additions & 2 deletions .gitignore
@@ -67,8 +67,6 @@ yarn-error.log
# This should never be used since we use Yarn, but avoid anyone accidentally committing it
package-lock.json

rel-eng/custom/__pycache__

# Intellij IDEA
.idea/
*.iml
@@ -86,6 +84,12 @@ python/.vscode
# Python
venv/
.venv/
*.egg-info/
*.egg
wheels/
__pycache__/
build/
.pytest_cache/

# Schema

75 changes: 75 additions & 0 deletions python/lzreposync/README.md
@@ -0,0 +1,75 @@
# lzreposync

TODO: project description

## How to work in this project

1. Create a new virtual environment
```sh
$ python3.11 -m venv .venv
$ . .venv/bin/activate
```
2. Install `lzreposync` in *editable* mode
``` sh
$ pip install -e .
```
3. Install other required dependencies (required by spacewalk and other modules)
Review comment (Member):

We should add these to pyproject.toml as dependencies, then step 2 will install them as well.

```sh
pip install pytest
pip install pycurl
pip install pyopenssl
pip install rpm
pip install psycopg2-binary
```
4. Add a path configuration file (**Important!**)
Review comment (Member):

The correct fix will be that we build python311 versions of these libraries and install them as RPMs. We'll do that later, it's not for you (and especially not for this PR).

```
echo "absolute/path/to/uyuni/python/" > .venv/lib64/python3.11/site-packages/uyuni_python_paths.pth
# This is a temporary solution that will allow the lzreposync service to recognize/locate other modules like spacewalk, etc...
```
5. Add configuration environment variables
```sh
vim /etc/rhn/rhn.conf # create the directory and file if they do not exist

DB_BACKEND=postgresql
DB_USER=spacewalk
DB_PASSWORD=spacewalk
DB_NAME=susemanager
DB_HOST=127.0.0.1 # might not work with 'localhost'
DB_PORT=5432
PRODUCT_NAME=any
TRACEBACK_MAIL=any
DB_SSL_ENABLED=
DB_SSLROOTCERT=any
DEBUG=1
ENABLE_NVREA=1
MOUNT_POINT=/tmp
SYNC_SOURCE_PACKAGES=0

# Some values might not be the right ones
```
6. Try `lzreposync`
``` sh
$ lzreposync -u https://download.opensuse.org/update/leap/15.5/oss/ --type yum [--import-updates]
```
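
Step 4 relies on Python's path configuration files: the `site` module reads each `*.pth` file in `site-packages` and appends every listed directory that exists to `sys.path`. A minimal sketch of that mechanism follows. It uses a throwaway directory in place of the real `site-packages` and uyuni checkout (both paths are hypothetical stand-ins), and `register_pth` is an illustrative helper, not part of lzreposync.

```python
import site
import sys
import tempfile
from pathlib import Path


def register_pth(site_dir: Path, extra_dir: Path, name: str = "uyuni_python_paths.pth") -> Path:
    """Write a .pth file into site_dir whose single line points at extra_dir."""
    pth_file = site_dir / name
    pth_file.write_text(f"{extra_dir}\n", encoding="utf-8")
    return pth_file


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-ins for .venv/lib64/python3.11/site-packages and uyuni/python
    fake_site_packages = Path(tmp) / "site-packages"
    fake_checkout = Path(tmp) / "uyuni" / "python"
    fake_site_packages.mkdir(parents=True)
    fake_checkout.mkdir(parents=True)

    register_pth(fake_site_packages, fake_checkout)
    # addsitedir() processes the .pth files in the directory, as interpreter startup does
    site.addsitedir(str(fake_site_packages))
    pth_applied = str(fake_checkout) in sys.path
```

Because only existing directories are added, a typo in the path written to the `.pth` file fails silently; checking `sys.path` from inside the virtual environment is a quick way to confirm the file took effect.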

### How do I ...?

- add a new dependency? Add the *pypi* name to the `dependencies` list in the `[project]` section in `pyproject.toml`.

## Tests
The tests use a dedicated PostgreSQL container image in which the full `susemanager` database schema is already built.

To pull and start the database, you should:
```sh
cd /uyuni/java
sudo make -f Makefile.docker EXECUTOR=podman dockerrun_pg
# Wait a few seconds until the db is fully initialized
```
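
Instead of waiting a fixed number of seconds, a test helper could poll until the container accepts TCP connections. This is a sketch under assumptions: the database is taken to listen on 127.0.0.1:5432 as in the configuration above, and `wait_for_port` is a hypothetical helper that is not part of lzreposync. An open port only shows that the server accepts connections, not that the schema has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A test fixture might call `wait_for_port("127.0.0.1", 5432)` once before the first database test and skip the suite when it returns `False`.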

After installing with `pip install .` (or `pip install -e .`), `pytest tests/` runs all tests. Sometimes a `rehash` is required to ensure `.venv/bin/pytest` is used by your shell.

You can connect to the database by:
Review comment (Member):

Just to make it foolproof and explicit that this is the test database (where we can't just use spacewalk-sql -i), I suggest

Suggested change:
- You can connect to the database by:
+ You can connect to the test database by:

```sh
psql -h localhost -d susemanager -U spacewalk # password: spacewalk
```

17 changes: 17 additions & 0 deletions python/lzreposync/pyproject.toml
@@ -0,0 +1,17 @@
[build-system]
requires = ["setuptools", "setuptools-scm"]
build-backend = "setuptools.build_meta"

[project]
name = "lzreposync"
version = "0.1"
dependencies = [
"memory_profiler",
"pytest",
"requests",
"python-gnupg",
"more_itertools"
]

[project.scripts]
lzreposync = "lzreposync:main"
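
The `[project.scripts]` entry maps the `lzreposync` command to an object reference of the form `module:attribute`; the generated console script imports the module and calls that attribute. A simplified sketch of the resolution step is below. `resolve_entry_point` is a hypothetical helper, and a standard-library function stands in for `lzreposync:main` so the example runs without the package installed.

```python
import importlib
from typing import Any


def resolve_entry_point(reference: str) -> Any:
    """Resolve a 'module:attr.path' object reference to the object it names."""
    module_name, _, attr_path = reference.partition(":")
    obj = importlib.import_module(module_name)
    # Walk dotted attribute paths; a reference without ':' resolves to the module itself
    for part in filter(None, attr_path.split(".")):
        obj = getattr(obj, part)
    return obj


# Stand-in for resolve_entry_point("lzreposync:main")
join = resolve_entry_point("os.path:join")
```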
218 changes: 218 additions & 0 deletions python/lzreposync/src/lzreposync/__init__.py
@@ -0,0 +1,218 @@
# pylint: disable=missing-module-docstring

import argparse
import logging

from lzreposync import db_utils, updates_util
from lzreposync.db_utils import (
    get_compatible_arches,
    get_channel_info_by_label,
    get_all_arches,
    create_channel,
    ChannelAlreadyExistsException,
)
from lzreposync.deb_repo import DebRepo
from lzreposync.import_utils import (
    import_package_batch,
    batched,
    import_repository_packages_in_batch,
)
from lzreposync.rpm_repo import RPMRepo


def main():
    parser = argparse.ArgumentParser(
Review comment (Member):

In the future, i.e. not in this PR, we should align the CLI interface as much as possible with spacewalk-repo-sync. We don't have to support all the flags, a subset is fine. Right now we have conflicting flags (e.g. -d means debug here but dry-run in spacewalk-repo-sync)

        description="Lazy reposync service",
        conflict_handler="resolve",
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )

    parser.add_argument(
        "--url",
        "-u",
        help="The target url of the remote repository of which we'll "
        "parse the metadata",
        dest="url",
        type=str,
        default=None,
    )

    parser.add_argument(
        "-n",
        "--name",
        help="Name of the repository",
        dest="name",
        type=str,
        default="noname",
    )

    parser.add_argument(
        "-d",
        "--debug",
        help="Show debug messages",
        action="store_const",
        dest="loglevel",
        const=logging.DEBUG,
        default=logging.INFO,
    )

    parser.add_argument(
        "-c",
        "--cache",
        help="Path to the cache directory",
        dest="cache",
        default=".",
        type=str,
    )

    parser.add_argument(
        "-b",
        "--batch-size",
        help="Size of the batch (num of packages by batch)",
        dest="batch_size",
        default=20,
        type=int,
    )

    parser.add_argument(
        "-a",
        "--arch",
        help="A filter for package architecture. Can be a regex, for example: 'x86_64', '(x86_64|arch_64)'",
        default=".*",
        dest="arch",
        type=str,
    )

    parser.add_argument(
        "--channel",
        help="The channel label of which you want to synchronize repositories",
        dest="channel",
        type=str,
        default=None,
    )

    parser.add_argument(
        "--type",
        help="Repo type (yum or deb)",
        dest="repo_type",
        type=str,
        default=None,
    )

    parser.add_argument(
        "--import-updates",
        help="Import related patches/updates",
        action="store_true",
        dest="import_updates",
        default=False,
    )

    parser.add_argument(
        "--create-channel",
        help="Create a new channel by providing the 'channel_label', and the 'channel_arch' eg: x86_64.\n"
        "Eg: --create-channel test_channel x86_64",
        dest="channel_info",
        type=str,
        nargs=2,
    )

    args = parser.parse_args()

    # Creating a new channel
    if args.channel_info:
        channel_label, channel_arch = args.channel_info[0], args.channel_info[1]
        print(
            f"Creating a new channel with label: {channel_label}, and arch: {channel_arch}"
        )
        try:
            channel = create_channel(
                channel_label=channel_label, channel_arch=channel_arch
            )
            print(
                f"Info: successfully created channel: {channel_label} -> id={channel.get_id()}, name={channel.get_label()}"
            )
        except ChannelAlreadyExistsException:
            print(f"Warn: failed to create channel {channel_label}. Already exists !!")
        return

    arch = args.arch
    if arch != ".*":
        # pylint: disable-next=consider-using-f-string
        arch = "(noarch|{})".format(args.arch)
Review comment (Member):

Instead of silencing pylint, we should follow the rules we have set for it. In this case it's as simple as:

Suggested change:
-        # pylint: disable-next=consider-using-f-string
-        arch = "(noarch|{})".format(args.arch)
+        arch = f"(noarch|{args.arch})"
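
The effect of wrapping a user-supplied filter in `(noarch|...)` can be checked with the `re` module. This sketch assumes the filter is matched against the whole architecture string; how `RPMRepo` actually applies it is not shown in this diff, and `build_arch_filter` and `arch_matches` are hypothetical names.

```python
import re


def build_arch_filter(arch: str) -> str:
    """Always accept noarch packages unless the filter already matches everything."""
    return arch if arch == ".*" else f"(noarch|{arch})"


def arch_matches(arch_filter: str, package_arch: str) -> bool:
    """Assumption: the filter must match the entire architecture string."""
    return re.fullmatch(arch_filter, package_arch) is not None
```

With `--arch x86_64` the resulting filter accepts `x86_64` and `noarch` packages and rejects everything else, while the default `.*` is passed through unchanged.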


    logging.getLogger().setLevel(args.loglevel)
    if args.url:
        if not args.repo_type:
            print("ERROR: --type (yum/deb) must be specified when using --url")
            return
        if args.repo_type == "yum":
            repo = RPMRepo(args.name, args.cache, args.url, arch)
        elif args.repo_type == "deb":
            repo = DebRepo(args.name, args.cache, args.url, None)
        else:
            print(f"ERROR: not supported repo_type: {args.repo_type}")
            return
        compatible_arches = get_all_arches()
        failed = import_repository_packages_in_batch(
            repo, args.batch_size, compatible_arches=compatible_arches
        )
        logging.debug("Completed import with %d failed packages", failed)

    else:
        # No url specified
        if args.channel:
            channel_label = args.channel
            channel = get_channel_info_by_label(channel_label)
            if not channel:
                logging.error("Couldn't fetch channel with label %s", channel_label)
                return
            compatible_arches = get_compatible_arches(channel_label)
            if args.arch and args.arch != ".*" and args.arch not in compatible_arches:
                logging.error(
                    "Not compatible arch: %s for channel: %s",
                    args.arch,
                    args.channel,
                )
                return
            target_repos = db_utils.get_repositories_by_channel_label(channel_label)
            for repo in target_repos:
                if repo.repo_type == "yum":
                    rpm_repo = RPMRepo(
                        repo.repo_label, args.cache, repo.source_url, repo.channel_arch
                    )
                    logging.debug("Importing package for repo %s", repo.repo_label)
                    failed = import_repository_packages_in_batch(
                        rpm_repo,
                        args.batch_size,
                        channel,
                        compatible_arches=compatible_arches,
                        import_updates=args.import_updates,
                    )
                    logging.debug(
                        "Completed import for repo %s with %d failed packages",
                        repo.repo_label,
                        failed,
                    )
                elif repo.repo_type == "deb":
                    deb_repo = DebRepo(
                        repo.repo_label, args.cache, repo.source_url, repo.channel_label
                    )
                    logging.debug("Importing package for repo %s", repo.repo_label)
                    failed = import_repository_packages_in_batch(
                        deb_repo,
                        args.batch_size,
                        channel,
                        compatible_arches=compatible_arches,
                    )
                    logging.debug(
                        "Completed import for repo %s with %d failed packages",
                        repo.repo_label,
                        failed,
                    )
                else:
                    # TODO: handle repositories other than yum and deb
                    logging.debug("Not supported repo type: %s", repo.repo_type)
                    continue

        else:
            logging.error("Either --url or --channel must be specified")
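
`import_repository_packages_in_batch` receives a batch size, and `batched` is imported from `import_utils`, whose implementation is not part of this excerpt. A generic sketch of splitting a lazily parsed package stream into fixed-size batches is shown below; the `batched` here is an illustrative stand-in (similar in spirit to `itertools.batched` from Python 3.12), not the PR's code.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items without loading everything at once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # islice pulls the next batch_size items; an empty list means the source is exhausted
    while batch := list(islice(iterator, batch_size)):
        yield batch
```

Consuming the parser output this way keeps memory use proportional to the batch size rather than to the size of the repository metadata, which is presumably the motivation for the `--batch-size` option.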