Parse and import both rpm and deb packages metadata #9101

Open
wants to merge 39 commits into
base: master

Commits on Aug 2, 2024

  1. Create lzreposync subdirectory

    lzreposync will be a spacewalk-repo-sync replacement written in Python.
    It uses a src layout and a pyproject.toml. The target Python version is
    3.11; compatibility with older Python versions is explicitly not a goal.
    agraul authored and waterflow80 committed Aug 2, 2024
    860a02f

Commits on Aug 4, 2024

  1. Correct error message

    waterflow80 committed Aug 4, 2024
    d9b3efe
  2. Add remote_path column

    Added the remote_path column, which holds the remote path/URL
    of a given package.
    
    This information helps locate the package on the remote
    repository later on and download it.
    waterflow80 committed Aug 4, 2024
    e594cfa
  3. Add expand_full_filelist parameter

    A boolean argument that controls whether we call
    header.hdr.fullFilelist().
    
    We added this argument so that the header.hdr.fullFilelist()
    call can be disabled for the lzreposync service only.
    waterflow80 committed Aug 4, 2024
    199381e
  4. Update deprecated method

    The inspect.getargspec() function is deprecated in Python 3 (and
    was removed in Python 3.11). It can be replaced by
    inspect.getfullargspec().
    waterflow80 committed Aug 4, 2024
    5acf7ab
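To illustrate the replacement named in this commit: `inspect.getfullargspec()` returns a named tuple that also covers keyword-only parameters. This is only a minimal sketch; `sync_channel` is a hypothetical function that exists purely so there is a signature to inspect.

```python
import inspect


def sync_channel(label, batch_size=20, *, dry_run=False):
    """Hypothetical function; exists only so there is a signature to inspect."""


spec = inspect.getfullargspec(sync_channel)

print(spec.args)            # names of positional-or-keyword parameters
print(spec.defaults)        # defaults for the trailing positional parameters
print(spec.kwonlyargs)      # names of keyword-only parameters
print(spec.kwonlydefaults)  # their defaults, as a dict
```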
  5. Add import_signatures parameter

    import_signatures is a boolean argument that specifies
    whether the _import_signatures() method should be executed.
    
    We added this parameter so that _import_signatures() can be
    disabled for the lzreposync service.
    waterflow80 committed Aug 4, 2024
    86df68e
  6. Implement Primary.xml file parser

    Parse the rpm Primary.xml package metadata file using the
    pulldom XML parser, a memory-efficient parsing library.
    
    Note that some attributes in the returned parsed object are
    faked, and may be filled in elsewhere.
    
    Some of the data is faked because the importer service
    requires those attributes.
    waterflow80 committed Aug 4, 2024
    f772e64
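The pulldom approach described in this commit can be sketched roughly as follows. This is not the PR's parser: the sample XML is a cut-down stand-in for a real primary.xml, and `iter_packages`, `_text` and the dict keys are hypothetical names chosen for illustration. The point is that only one `<package>` subtree is expanded into a DOM at a time.

```python
import io
from xml.dom import pulldom

# Tiny stand-in for a real primary.xml (real files carry many more fields).
SAMPLE_PRIMARY = b"""<metadata xmlns="http://linux.duke.edu/metadata/common" packages="2">
  <package type="rpm">
    <name>foo</name>
    <arch>x86_64</arch>
    <version epoch="0" ver="1.0" rel="1"/>
    <checksum type="sha256" pkgid="YES">aaa111</checksum>
  </package>
  <package type="rpm">
    <name>bar</name>
    <arch>noarch</arch>
    <version epoch="1" ver="2.3" rel="4"/>
    <checksum type="sha256" pkgid="YES">bbb222</checksum>
  </package>
</metadata>"""


def _text(package_node, tag):
    """Return the text content of the first <tag> element under the package."""
    return package_node.getElementsByTagName(tag)[0].firstChild.data


def iter_packages(stream):
    """Yield one dict per <package>, expanding a single package subtree at a time."""
    events = pulldom.parse(stream)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "package":
            events.expandNode(node)  # build the DOM for this package only
            node.normalize()         # merge text nodes split by the SAX parser
            version = node.getElementsByTagName("version")[0]
            yield {
                "name": _text(node, "name"),
                "arch": _text(node, "arch"),
                "epoch": version.getAttribute("epoch"),
                "version": version.getAttribute("ver"),
                "release": version.getAttribute("rel"),
                "checksum": _text(node, "checksum"),
            }


for pkg in iter_packages(io.BytesIO(SAMPLE_PRIMARY)):
    print(pkg["name"], pkg["version"], pkg["checksum"])
```

Because `iter_packages` is a generator, a caller can stop early or hand packages to an importer one by one without holding the whole file in memory.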
  7. Implement filelists.xml file parser

    Parse the rpm filelists.xml metadata file using the
    pulldom XML parser, a memory-efficient parsing library.
    
    The parser parses the given filelists.xml file (normally in gz
    format) and caches the filelist information of each package
    in a separate file in the cache directory, using the package's
    hash as the filename, with no file extension.
    waterflow80 committed Aug 4, 2024
    3b2d236
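A rough sketch of the caching scheme this commit describes, under stated assumptions: the on-disk format (one path per line) and the function name `cache_filelists` are hypothetical, and the sample XML is a cut-down stand-in for a real filelists.xml. Only the idea of "one cache file per package, named after the package hash, no extension" comes from the commit message.

```python
import gzip
import tempfile
from pathlib import Path
from xml.dom import pulldom

SAMPLE_FILELISTS = b"""<filelists xmlns="http://linux.duke.edu/metadata/filelists" packages="2">
  <package pkgid="aaa111" name="foo" arch="x86_64">
    <version epoch="0" ver="1.0" rel="1"/>
    <file>/usr/bin/foo</file>
    <file type="dir">/etc/foo</file>
  </package>
  <package pkgid="bbb222" name="bar" arch="noarch">
    <version epoch="1" ver="2.3" rel="4"/>
    <file>/usr/share/bar/data</file>
  </package>
</filelists>"""


def cache_filelists(filelists_gz, cache_dir):
    """Write each package's file list to cache_dir/<pkgid>; return the package count."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    with gzip.open(filelists_gz, "rb") as stream:
        events = pulldom.parse(stream)
        for event, node in events:
            if event == pulldom.START_ELEMENT and node.tagName == "package":
                events.expandNode(node)
                node.normalize()
                paths = [
                    file_node.firstChild.data
                    for file_node in node.getElementsByTagName("file")
                    if file_node.firstChild is not None
                ]
                # The cache file is named after the package hash, with no extension.
                cache_file = cache_dir / node.getAttribute("pkgid")
                cache_file.write_text("\n".join(paths), encoding="utf-8")
                count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "filelists.xml.gz"
    archive.write_bytes(gzip.compress(SAMPLE_FILELISTS))
    cached = cache_filelists(archive, Path(tmp) / "cache")
    foo_files = (Path(tmp) / "cache" / "aaa111").read_text(encoding="utf-8").splitlines()
    print(cached, foo_files)
```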
  8. Implement full rpm metadata parsing

    Using both primary_parser and filelists_parser, return the full
    package metadata, package by package, using lazy parsing.
    
    Note that there are some attributes that are faked, because we
    can't fetch them now and they're required by the package
    importer later on.
    However, we can fake them more efficiently, using less memory.
    waterflow80 committed Aug 4, 2024
    1649f34
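One way to picture the "package by package" combination described in this commit is a generator that joins each primary record with its cached file list only when that package is requested. The sketch below is an assumption-laden illustration: `iter_full_metadata`, the `checksum` and `files` keys, and the one-path-per-line cache layout are all hypothetical.

```python
import tempfile
from pathlib import Path


def iter_full_metadata(primary_packages, cache_dir):
    """Lazily attach cached file lists to packages coming from a primary parser."""
    cache_dir = Path(cache_dir)
    for pkg in primary_packages:
        cache_file = cache_dir / pkg["checksum"]
        if cache_file.exists():
            files = cache_file.read_text(encoding="utf-8").splitlines()
        else:
            files = []  # no cached file list for this package
        yield {**pkg, "files": files}


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "aaa111").write_text("/usr/bin/foo\n/etc/foo", encoding="utf-8")
    primary = iter([
        {"name": "foo", "checksum": "aaa111"},
        {"name": "bar", "checksum": "bbb222"},
    ])
    merged = list(iter_full_metadata(primary, tmp))
    print(merged)
```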
  9. Parse and import rpm patches/updates

    Parsed the update-info.xml file and imported the parsed
    patches/updates to the database.
    
    We used largely the same code as the old Reposync class.
    waterflow80 committed Aug 4, 2024
    a6d462c
  10. Import parsed rpm & deb packages to db

    Import the parsed rpm and Debian packages into the database in
    batches, and associate each package with the corresponding
    channel.
    waterflow80 committed Aug 4, 2024
    0df0030
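Batching a lazy stream of packages, as this commit describes, might look something like the sketch below. The names `batched`, `import_in_batches` and the `import_batch` callback are hypothetical, as is the default batch size; the real importer call in the PR is not shown here.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items without materialising the whole input."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch


def import_in_batches(packages, import_batch, batch_size=20):
    """Feed packages to `import_batch` (a hypothetical importer callback) batch by batch."""
    imported = 0
    for batch in batched(packages, batch_size):
        import_batch(batch)
        imported += len(batch)
    return imported


seen = []
total = import_in_batches((f"pkg-{i}" for i in range(5)), seen.append, batch_size=2)
print(total, seen)
```

A hand-written helper is used because `itertools.batched` only exists from Python 3.12 onward, while the stated target version is 3.11.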
  11. Implement the deb Packages md file

    Parsed the Debian Packages metadata file lazily, yielding
    the metadata of each package separately.
    waterflow80 committed Aug 4, 2024
    fda9e51
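A Debian Packages file is a sequence of stanzas separated by blank lines, each made of `Field: value` lines with indented continuation lines. A lazy parser along the lines this commit describes could be sketched as below; `iter_stanzas` and the sample data are hypothetical, and the sketch skips details a production parser would need (compressed input, comments, strict validation).

```python
import io

SAMPLE_PACKAGES = """\
Package: foo
Version: 1.0-1
Architecture: amd64
Filename: pool/main/f/foo/foo_1.0-1_amd64.deb
Description: example package
 with a continuation line

Package: bar
Version: 2.3-4
Architecture: all
Filename: pool/main/b/bar/bar_2.3-4_all.deb
"""


def iter_stanzas(lines):
    """Yield one dict per stanza, reading the input line by line."""
    stanza, last_key = {}, None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current stanza.
            if stanza:
                yield stanza
            stanza, last_key = {}, None
        elif line[0] in " \t" and last_key is not None:
            # Continuation line: append to the previous field's value.
            stanza[last_key] += "\n" + line.strip()
        else:
            key, _, value = line.partition(":")
            last_key = key.strip()
            stanza[last_key] = value.strip()
    if stanza:
        yield stanza  # last stanza has no trailing blank line


stanzas = list(iter_stanzas(io.StringIO(SAMPLE_PACKAGES)))
print([s["Package"] for s in stanzas])
```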
  12. Implement the Translation file parser

    Parsed Debian's Translation file, which contains the full
    descriptions of packages, grouped by description-md5, and
    cached the parsed descriptions in a cache directory.
    waterflow80 committed Aug 4, 2024
    eee052b
  13. Implement full deb metadata parsing

    Using both packages_parser and translation_parser, return the
    full package metadata, package by package, using lazy
    parsing.
    
    Also store the Debian repository's information in a DebRepo
    class.
    waterflow80 committed Aug 4, 2024
    5a51d0d
  14. Fetch repository information from the db

    Given the channel label, fetch the important repository
    information from the database and store it in a temporary
    RepoDTO object.
    waterflow80 committed Aug 4, 2024
    5e26b7a
  15. Complete lzreposync service entry point

    Added the necessary command line arguments.
    
    Identify the target repositories, prepare the data structures,
    and execute the lazy synchronization of repositories/packages.
    waterflow80 committed Aug 4, 2024
    8ef299a
  16. Add new dependency

    Added a new dependency, python-gnupg, used to verify the repo
    signature.
    waterflow80 committed Aug 4, 2024
    5435907
  17. 3567817
  18. 8bad179
  19. 61b0a74
  20. Fix linting complaint

    Ignored two linting complaints about raising exceptions, following the
    approach in the old reposync.
    
    We could improve the code instead of doing this, though.
    waterflow80 committed Aug 4, 2024
    898d571

Commits on Aug 15, 2024

  1. Complete code for lzreposync version 0.1

    This commit completes almost all the logic and use cases
    of the new lazy reposync.
    
    **Note** that this commit will be restructured and possibly
    divided into smaller and more convenient commits.
    This commit is for review purposes.
    waterflow80 committed Aug 15, 2024
    4c7db58
  2. 4f8a070
  3. Fix error: too many clients already

    This error seemingly happened because we reached the maximum number
    of unclosed db connections. We thought this might be because
    the close() method in the Database class was not implemented,
    so rhnSQL.closeDB() was not closing any connection.
    
    However, we're still unsure whether this is the root cause
    of the problem, because the old (current) reposync was using it
    without any error.
    waterflow80 committed Aug 15, 2024
    f329e51
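The general pattern behind this kind of fix, making sure every opened connection is really closed, can be sketched with the standard library. This is an illustration only: it uses `sqlite3` as a stand-in database and does not touch `rhnSQL` or the PR's `Database` class; `count_rows` is a hypothetical function.

```python
import sqlite3
from contextlib import closing


def count_rows(db_path=":memory:"):
    """Open a connection, run a few statements, and always close the connection."""
    # closing() guarantees conn.close() runs, even if a query raises.
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute("CREATE TABLE pkg (name TEXT)")
        conn.executemany("INSERT INTO pkg VALUES (?)", [("foo",), ("bar",)])
        (count,) = conn.execute("SELECT COUNT(*) FROM pkg").fetchone()
    return count, conn


count, conn = count_rows()
try:
    conn.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    # sqlite3 refuses to operate on a connection that has been closed.
    still_open = False
print(count, still_open)
```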

Commits on Aug 19, 2024

  1. Complete latest version

    This is the latest and almost the final version of the
    lzreposync service. (gpg sig check not complete)
    
    It contains pretty much all the necessary tests,
    including the ones for updates/patches import.
    
    Some of the remaining 'todos' are either for code
    enhancements or some unclear concepts that
    will be discussed with the team.
    
    Of course, this commit will be split into smaller
    ones later after rebase.
    waterflow80 committed Aug 19, 2024
    913f21c

Commits on Aug 26, 2024

  1. Optimize code and do some cleanup

    - Removed some todos.
    - Replaced some SQL queries with equivalent ones using
    JOIN...ON.
    - Some other minor cleanup.
    waterflow80 committed Aug 26, 2024
    8a49313
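As an illustration of the JOIN...ON rewrite mentioned in this commit, the sketch below compares a comma-style join with its explicit equivalent. It assumes the older queries used the comma form, which the commit message does not state; the `channel` and `package` tables are hypothetical, and `sqlite3` stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE channel (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT, channel_id INTEGER);
    INSERT INTO channel VALUES (1, 'stable'), (2, 'testing');
    INSERT INTO package VALUES (1, 'foo', 1), (2, 'bar', 1), (3, 'baz', 2);
""")

# Comma join: the join condition is mixed in with the filter in WHERE.
implicit = """
    SELECT p.name FROM package p, channel c
    WHERE p.channel_id = c.id AND c.label = ?
    ORDER BY p.name
"""
# Explicit join: the join condition lives in ON, the filter stays in WHERE.
explicit = """
    SELECT p.name FROM package p
    JOIN channel c ON p.channel_id = c.id
    WHERE c.label = ?
    ORDER BY p.name
"""
rows_implicit = conn.execute(implicit, ("stable",)).fetchall()
rows_explicit = conn.execute(explicit, ("stable",)).fetchall()
conn.close()
print(rows_implicit == rows_explicit, rows_explicit)
```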

Commits on Aug 29, 2024

  1. Optimize and consolidate code

    Optimized some code by replacing classes and methods
    with free functions in some places.
    
    Consolidated the Debian repo parsing.
    waterflow80 committed Aug 29, 2024
    6ccb3bf
  2. 2157d56

Commits on Sep 2, 2024

  1. dceccec
  2. 2a79e72
  3. 2f4c998
  4. 9a95e5a
  5. fed31fe

Commits on Sep 9, 2024

  1. Complete gpg signature check for rpm

    Completed the gpg signature check for rpm repositories,
    mainly for the repomd.xml file.
    
    This is done by downloading the signature file from the
    remote rpm repo and executing 'gpg verify' to verify the
    repomd.xml file against its signature, using the gpg keys
    already added on the filesystem.
    
    So, if you haven't already added the required gpg keyring
    on your system, you won't be able to verify the repo.
    
    Ideally, run this version directly on the uyuni-server,
    because the gpg keyring will probably be present there.
    waterflow80 committed Sep 9, 2024
    a033e4d
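A detached-signature check of the kind this commit describes can be sketched with the `gpg` command-line tool through `subprocess`. Note that this is a swapped-in technique: the PR adds python-gnupg as a dependency, and its actual verification code is not shown here. The function names and the `repomd.xml.asc` file name are assumptions made for illustration.

```python
import subprocess


def build_verify_command(signature_path, data_path, homedir=None):
    """Build the argv for a detached-signature check with the gpg CLI."""
    command = ["gpg", "--batch", "--verify", signature_path, data_path]
    if homedir is not None:
        # Point gpg at a specific keyring directory instead of the default one.
        command[1:1] = ["--homedir", homedir]
    return command


def verify_detached_signature(signature_path, data_path, homedir=None):
    """Return True if gpg exits with status 0 for this signature and file."""
    result = subprocess.run(
        build_verify_command(signature_path, data_path, homedir),
        capture_output=True,
        check=False,
    )
    return result.returncode == 0


print(build_verify_command("repomd.xml.asc", "repomd.xml"))
```

As the commit message points out, such a check can only succeed if the signing key is already present in the keyring that gpg consults.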
  2. b890b1c
  3. Refactor: Allow more input variants in makedirs()

    makedirs() in uyuni.common.fileutils now accepts relative paths that
    consist of only a directory name or paths with trailing slashes.
    agraul committed Sep 9, 2024
    7d0c57c
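The two input shapes named in this commit, a bare relative directory name and a path with a trailing slash, can be handled in a few lines with the standard library. The helper below is a hypothetical sketch and not the `makedirs()` from `uyuni.common.fileutils`, whose signature and behaviour are not shown in this PR page.

```python
import os
import tempfile


def ensure_dir(path, mode=0o755):
    """Hypothetical helper: create `path` and its parents, returning the normalized path."""
    # normpath drops trailing and duplicate separators, e.g. "cache/" -> "cache".
    normalized = os.path.normpath(path)
    os.makedirs(normalized, mode=mode, exist_ok=True)
    return normalized


with tempfile.TemporaryDirectory() as tmp:
    previous_cwd = os.getcwd()
    os.chdir(tmp)  # relative paths below are resolved inside the temp directory
    try:
        created = [ensure_dir("cache"), ensure_dir("nested/dir/"), ensure_dir("cache/")]
        all_exist = all(os.path.isdir(p) for p in created)
    finally:
        os.chdir(previous_cwd)
print(created, all_exist)
```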
  4. Merge pull request #1 from agraul/refactor-makedirs

    Refactor: Allow more input variants in makedirs()
    waterflow80 authored Sep 9, 2024
    7aa602c
  5. 4042fc4

Commits on Sep 11, 2024

  1. Complete gpg signature check for debian

    Completed the gpg signature check for debian repositories.
    
    If you haven't already added the required gpg keyring
    on your system, you won't be able to verify the repo,
    and you'll normally get a GeneralRepoException.
    
    Ideally, run this version directly on the uyuni-server,
    because the gpg keyring will probably be present there.
    waterflow80 committed Sep 11, 2024
    a7270cd