
Releases: datalad/datalad-next

1.5.0 (2024-06-16)

16 Jun 13:10
@mih
1.5.0
067fa7c

💫 New features

  • new support subpackage for Git's pathspecs [8d05f1cd]
  • add pathspec support to iter_(gitworktree|submodules)() [7622267c]
  • GitWorktreeFileSystemItem:
  • iter_gitworktree:
    • added submodule recursion with pathspec support [2851ec4c]
    • add ability to report untracked content only [2763b10f]
  • iter_submodules:

🐛 Bug Fixes

  • iter_gitstatus:
    • rectify usage of unsupported untracked=no mode in tests [673e6c22]
  • Patches for DataLad (core):

📝 Documentation

  • Contributing guide:
    • expand description of expected commit messages [04f68b49]

1.4.1 (2024-05-22)

21 May 22:16
@mih
1.4.1
9bbe486

🐛 Bug Fixes

  • dependencies: limit test patch import to test runs [905b99b]

📝 Documentation

  • add note of Git >= v2.31 requirement for next-status [093575d]
  • state conventional-commits requirement [a9180fc]

🛡 Tests

  • fixture: add missing import (for non-WebDAV fallback) [ddd6679]

1.4.0 (2024-05-17)

17 May 19:29
@mih
1.4.0
e8647c4

🐛 Bug Fixes

  • RIA over SSH access from Mac clients to a Linux server was broken
    due to an inappropriate platform check that assumed that the local and
    remote platforms are identical.
    Fixes datalad/datalad#7536 via #653 (by @mih)

  • next-status has received a number of fixes:

    • It no longer issues undesirable modification reports
      that are based on mtime changes alone (i.e., no content change).
      Fixes #639 via #650 (by @mih)
    • It now detects staged changes in repositories with no
      commits.
      Fixes #680 via #681 (by @mih)
    • next-status -r mono now reports on new commits in submodules.
      Previously this was ignored, leading to the impression of
      clean datasets despite unsaved changes.
      Fixes #645 via #679 (by @mih)
  • iter_annexworktree() can now also be used on plain Git repos,
    and behaves exactly as if reporting on non-annexed files
    in a git-annex repo. Previously, a cryptic "iterable did not yield
    matching item for route-in item, cardinality mismatch?" error was
    issued in this case.
    Fixes #670 via #673 (by @mih)

💫 Enhancements and new features

  • datalad_next.shell provides a context manager for (long-running)
    shell or interpreter subprocesses. Within the context any number of
    commands can be executed in such a shell, and each command can
    process input (iterables), and yield output (iterables). This feature
    is suitable for running and controlling "remote shells" like a login
    shell on a server via SSH. A range of utilities is provided to
    employ this functionality for special purpose implementations
    (e.g., accept fixed-length or variable-length process output).
    A suite of operations, such as downloading or uploading a file via a
    remote shell, is provided for POSIX-compliant shells in
    datalad_next.shell.operations.posix.
    #596 (by @christian-monch)
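
The core idea of executing many commands in one long-running shell can be illustrated with a stdlib-only sketch. This is not the datalad_next.shell API: the class PersistentShell and its protocol (echoing a random marker plus $? after each command, to find where that command's output ends) are assumptions made for this example, and a POSIX sh is required.

```python
import subprocess
import uuid


class PersistentShell:
    """Hypothetical sketch: one long-running sh process serving many commands."""

    def __init__(self, shell_cmd=("sh",)):
        self._proc = subprocess.Popen(
            list(shell_cmd), stdin=subprocess.PIPE, stdout=subprocess.PIPE
        )

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        # Closing stdin lets the shell reach EOF and exit; then reap it
        self._proc.stdin.close()
        self._proc.wait()
        self._proc.stdout.close()

    def run(self, command: str):
        """Execute `command`, return a tuple (stdout bytes, exit code)."""
        # A fresh random marker, echoed together with $? after the command,
        # tells us where this command's output ends
        marker = uuid.uuid4().hex.encode()
        self._proc.stdin.write(command.encode() + b"\necho " + marker + b" $?\n")
        self._proc.stdin.flush()
        output = []
        for line in self._proc.stdout:
            if marker in line:
                # Output without a trailing newline shares the marker's line
                prefix, _, rest = line.partition(marker)
                output.append(prefix)
                return b"".join(output), int(rest.split()[0])
            output.append(line)
        raise RuntimeError("shell exited before the command completed")
```

Shell state such as variables persists between calls. A command that reads from stdin would consume the protocol lines, and stderr is not captured, which is why this remains a sketch rather than a robust implementation.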

  • A rewrite of SSHRemoteIO, the RIA SSH-operations implementation from
    datalad-core, is provided as a patch. It is based on the new shell
    feature, and provides more robust operations. Its IO performance is
    at the same level as scp-based down/uploads. In contrast to the
    original implementation, it supports fine-grained progress reporting
    for uploads and downloads.
    Via #655 (by @mih)

  • The SpecialRemote base class in datalad-core is patched to support
    a standard close() method for implementing resource release and cleanup
    operations. The main special remote entry point has been altered to
    run implementations within a closing() context manager to guarantee
    execution of such handlers.
    Via #655 (by @mih)
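
The guarantee rests on the standard contextlib.closing() idiom. A minimal sketch, in which the hypothetical class MySpecialRemote stands in for a SpecialRemote subclass (no datalad-core classes are used here):

```python
from contextlib import closing


class MySpecialRemote:
    """Hypothetical special remote holding a resource that needs cleanup."""

    def __init__(self):
        self.connection_open = True
        self.closed = False

    def transfer(self, fail=False):
        if fail:
            raise RuntimeError("transfer failed")
        return "ok"

    def close(self):
        # Release resources here (e.g., a persistent SSH connection)
        self.connection_open = False
        self.closed = True


def run_remote(remote, fail=False):
    # closing() calls remote.close() on exit, even if an exception is raised
    with closing(remote) as r:
        return r.transfer(fail=fail)
```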

  • A new has_initialized_annex() helper function is provided to
    test for a locally initialized annex in a repo.
    Via #673 (by @mih)

  • iter_annexworktree() can now also be used on plain Git repositories,
    and it yields the same output and behavior as running on a git-annex
    repository with no annex'ed content (just tracked with Git).
    Fixes #670 via #673 (by @mih)

  • next-status and iter_gitstatus() have been improved to
    report on further modifications after a file addition has been
    originally staged.
    Fixes #637 via #679 (by @mih)

  • next-status result rendering has been updated to be more markedly
    different from git-status's. Coloring is now exclusively
    determined by the nature of a change, rather than being partially
    similar to git-status's index-updated annotation. This reduces
    the chance for misinterpretations, and does not create an undesirable
    focus on the Git index (which is largely ignored by DataLad).
    Fixes #640 via #679 (by @mih)

  • A large 3k-line patch set replaces almost the entire RIA implementation,
    including the ORA special remote, and the create-sibling-ria command.
    The new implementation brings uniform support for Windows clients, progress
    reporting for uploads and downloads via SSH, and a faster and more
    robust behavior for SSH-based operations (based on the new remote
    shell feature).
    Fixes #654 via #669 (by @christian-monch)

📝 Documentation

  • Git-related subprocess execution helpers are now accessible in the
    rendered documentation, and all supported file collections are now
    mentioned in the ls-file-collection command help.
    Fixes #668 via #671 (by @mih)

🛡 Tests

  • Test setup has been improved to support a uniform, datalad-next
    enabled environment for subprocesses too. This extends the scope
    of testing to special remote implementations and other code that
    is executed in subprocesses, and relies on runtime patches.
    See https://github.com/datalad/datalad-next/pull/665 (by @mih)

1.3.0 (2024-03-19)

19 Mar 12:42
@mih
1.3.0
12c6d53

💫 Enhancements and new features

  • Code organization is adjusted to clearly indicate what is part of the
    package's public Python API. Anything that can be imported directly from
    the top-level of any sub-package is part of the public API.
    As an example: from datalad_next.runners import iter_git_subproc
    imports a part of the public API, but
    from datalad_next.runners.git import iter_git_subproc does not.
    See README.md for more information.
    Fixes #613 via #615 (by @mih), #617 (by @mih), #618 (by @mih), #619 (by @mih),
    #620 (by @mih), #621 (by @mih), #622 (by @mih), #623 (by @mih)

  • New patched_env context manager for patching a process'
    environment. This avoids the need for importing unittest outside
    test implementations.
    Via #633 (by @mih)
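
A context manager of this kind can be written with the standard library alone. The following is a sketch under assumptions: the name patched_env_sketch and its convention that a None value removes a variable are illustrative, and not necessarily the signature or behavior of datalad-next's patched_env.

```python
import os
from contextlib import contextmanager


@contextmanager
def patched_env_sketch(**overrides):
    """Temporarily set (or, for None values, unset) environment variables."""
    saved = {name: os.environ.get(name) for name in overrides}
    try:
        for name, value in overrides.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value
        yield
    finally:
        # Restore the original state, including previously absent variables
        for name, value in saved.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value
```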

  • call_git...() functions received a new force_c_locale
    parameter. This can be set whenever Git output needs to be parsed
    to force running the command with LC_ALL=C. Such an environment
    manipulation is off by default and not done unconditionally to
    let localized messaging through in a user's normal locale.
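
The effect of such a flag can be sketched with subprocess. The helper run_for_parsing() and its signature are hypothetical; they only illustrate that the locale override is applied to a copy of the environment for a single call, leaving the parent process's environment untouched.

```python
import os
import subprocess


def run_for_parsing(args, force_c_locale=False):
    """Run a command and return its stdout as text (hypothetical helper)."""
    env = None  # None means: inherit the current environment unchanged
    if force_c_locale:
        # Copy the environment and override the locale only for this call
        env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(
        args, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout


# Typical use, e.g.: run_for_parsing(["git", "status"], force_c_locale=True)
```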

🐛 Bug Fixes

  • datalad-annex:: Git remote helper now tests for a repository
    deposit, and distinguishes between an absent remote repository deposit
    and cloning from an empty repository deposit. This rectifies
    confusing behavior (successful clones of empty repositories
    from broken URLs), but also fixes subdataset clone
    candidate handling in get (which failed to skip inaccessible
    datalad-annex:: URLs for the same reason).
    Fixes #636 via
    #638 (by @mih)

📝 Documentation

  • API docs have been updated to include all top-level symbols
    of any sub-package, or in other words: the public API.
    See #627 (by @mih)

🏠 Internal

  • The tree command no longer uses the subdatasets command
    for queries, but employs the recently introduced iter_submodules()
    for leaner operations.
    See #628 (by @mih)

  • call_git...() functions are established as the only used abstraction
    to interface with Git and git-annex commands outside the use in
    DataLad's Repo classes. Any usage of DataLad's traditional
    Runner functionality is discontinued.
    Fixes #541 via
    #632 (by @mih)

  • Type annotations have been added to the implementation of the
    uncurl git-annex remote. A number of unhandled conditions have
    been discovered and were rectified.

1.2.0 (2024-02-02)

02 Feb 13:30
@mih
1.2.0

🐛 Bug Fixes

  • Fix an invalid escape sequence in a regex that caused a syntax warning.
    Fixes #602 via
    #603 (by @mih)

💫 Enhancements and new features

  • Speed-up of status reports for repositories with many submodules.
    An early presence check for submodules skips unnecessary evaluation
    steps. Fixes #606 via
    #607 (by @mih)

🏠 Internal

  • Fix implementation error in ParamDictator class that caused a test
    failure. The class itself is unused and has been scheduled for removal.
    See #611 and
    #610 (by @christian-monch)

🛡 Tests

  • Promote a previously internal fixture to provide a standard
    modified_dataset fixture. This fixture is session-scoped, and
    yields a dataset with many facets of modification, suitable for
    testing change reporting. The fixture verifies that no
    modifications have been applied to the testbed. (by @mih)

  • iterable_subprocess tests have been robustified to better handle the
    observed diversity of execution environments. This addresses, for example,
    https://bugs.debian.org/1061739.
    #614 (by @christian-monch)

1.1.0 -- Iterate!

21 Jan 07:51
@mih
1.1.0
2beedba

💫 Enhancements and new features

  • A new paradigm for subprocess execution is introduced. The main
    workhorse is datalad_next.runners.iter_subproc. This is a
    context manager that feeds input to subprocesses via iterables,
    and also exposes their output as an iterable. The implementation
    is based on https://github.com/uktrade/iterable-subprocess, and
    a copy of it is now included in the sources. It has been modified
    to work homogeneously on the Windows platform too.
    This new implementation is leaner and more performant. Benchmarks
    suggest that the execution of multi-step pipe connections of Git
    and git-annex commands is within 5% of the runtime of their direct
    shell-execution equivalent (outside Python).
    See #538 (by @mih),
    #547 (by @mih).

    With this change a number of additional features have been added,
    and internal improvements have been made. For example, any
    use of ThreadedRunner has been discontinued. See
    #539 (by @christian-monch),
    #545 (by @christian-monch),
    #550 (by @christian-monch),
    #573 (by @christian-monch)

    • A new itertools module was added. It provides implementations
      of iterators that can be used in conjunction with iter_subproc
      for standard tasks. This includes the itemization of output
      (e.g., line-by-line) across chunks of bytes read from a process
      (itemize), output decoding (decode_bytes), JSON-loading
      (json_load), and helpers to construct more complex data flows
      (route_out, route_in).

    • The more_itertools package has been added as a new dependency.
      It is used for datalad-next iterator implementations, but is also
      ideal for client code that employs this new functionality.

    • A new iter_annexworktree() provides the analog of iter_gitworktree()
      for git-annex repositories.

    • iter_gitworktree() has been reimplemented around iter_subproc. The
      performance is substantially improved.

    • iter_gitworktree() now also provides file pointers to
      symlinked content. Fixes #553
      via #555 (by @mih)

    • iter_gitworktree() and iter_annexworktree() now support single
      directory (i.e., non-recursive) reporting too.
      See #552

    • A new iter_gittree() that wraps git ls-tree for iterating over
      the content of a Git tree-ish.
      #580 (by @mih).

    • A new iter_gitdiff() wraps git diff-tree|files and provides a flexible
      basis for iteration over changesets.

  • PathBasedItem, a dataclass that is the basis for many item types yielded
    by iterators, now more strictly separates the name property from path
    semantics. The name is a plain string, and an additional, explicit path
    property provides it in the form of a Path. This simplifies code (the
    _ZipFileDirPath utility class became obsolete and was removed), and
    improves performance.
    Fixes #554 and
    #581 via
    #583 (by @mih)
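
The separation of a plain-string name from a derived path can be sketched as a small dataclass. ItemSketch is hypothetical and does not reproduce PathBasedItem's actual fields; using PurePosixPath for the derived path is also an assumption made for this example.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ItemSketch:
    # The name is stored verbatim as a plain string (e.g., as reported by Git)
    name: str

    @property
    def path(self) -> PurePosixPath:
        # A path object is only constructed when explicitly requested
        return PurePosixPath(self.name)
```

Keeping the string as-is matters because path normalization is lossy: a trailing / in a name (as used for directory members of ZIP archives) survives in name, but is dropped in path.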

  • A collection of helpers for running Git commands has been added at
    datalad_next.runners.git. Direct uses of datalad-core runners,
    or subprocess.run() for this purpose have been replaced with calls
    to these utilities.
    #585 (by @mih)

  • The performance of iter_gitworktree() has been improved by about
    10%. Fixes #540
    via #544 (by @mih).

  • New EnsureHashAlgorithm constraint to automatically expose
    and verify algorithm labels from hashlib.algorithms_guaranteed.
    Fixes #346 via
    #492 (by @mslw @adswa)
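
At its core, a constraint of this kind checks a label against hashlib.algorithms_guaranteed and can list the valid choices. The function below is a hypothetical stand-in, not the EnsureHashAlgorithm class itself.

```python
import hashlib


def ensure_hash_algorithm(value: str) -> str:
    """Return `value` if it names a guaranteed hashlib algorithm, else raise."""
    if value not in hashlib.algorithms_guaranteed:
        # Expose the valid labels in the error message
        choices = ", ".join(sorted(hashlib.algorithms_guaranteed))
        raise ValueError(f"{value!r} is not one of: {choices}")
    return value
```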

  • The archivist remote now supports archive type detection
    from *E-type annex keys for .tgz archives too.
    Fixes #517 via
    #518 (by @mih)

  • iter_zip() uses a dedicated, internal PurePath variant to report on
    directories (_ZipFileDirPath). This enables more straightforward
    item.name in zip_archive tests, which require a trailing / for
    directory-type archive members.
    #430 (by @christian-monch)

  • A new ZipArchiveOperations class added support for ZIP files, and enables
    their use together with the archivist git-annex special remote.
    #578 (by @christian-monch)

  • datalad ls-file-collection has learned additional collection types:

    • The new zipfile collection type that enables uniform reporting on
      the additional archive type.

    • The new annexworktree collection that enhances the gitworktree
      collection by also reporting on annexed content, using the new
      iter_annexworktree() implementation. It is about 15% faster than a
      datalad --annex basic --untracked no -e no -t eval.

    • The new gittree collection for listing any Git tree-ish.

    • A new iter_gitstatus() can replace the functionality of
      GitRepo.diffstatus() with a substantially faster implementation.
      It also provides a novel mono recursion mode that completely
      hides the notion of submodules and presents deeply nested
      hierarchies of datasets as a single "monorepo".
      #592 (by @mih)

  • A new next-status command provides a substantially faster
    alternative to the datalad-core status command. It is closely
    aligned to git status semantics, only reports changes (not repository
    listings), and supports type change detection. Moreover, it exposes
    the "monorepo" recursion mode, and single-directory reporting options
    of iter_gitstatus(). It is the first command to use dataclass
    instances as result types, rather than the traditional dictionaries.

  • SshUrlOperations now supports non-standard SSH ports, non-default
    user names, and custom identity file specifications.
    Fixed #571 via
    #570 (by @mih)

  • A new EnsureRemoteName constraint improves the parameter validation
    of create-sibling-webdav. Moreover, the command has been uplifted
    to support uniform parameter validation also for the Python API.
    Missing required remotes, or naming conflicts are now detected and
    reported immediately before the actual command implementation runs.
    Fixes #193 via
    #577 (by @mih)

  • datalad_next.repo_utils provides a collection of implementations
    for common operations on Git repositories. Unlike the datalad-core
    Repo classes, these implementations do not require a specific
    data structure or object type beyond a Path.

🐛 Bug Fixes

  • Add patch to fix update's target detection for adjusted mode datasets
    that can crash under some circumstances.
    See datalad/datalad#7507, fixed via
    #509 (by @mih)

  • Comparison with is and a literal was replaced with a proper construct.
    While having no functional impact, it removes an ugly SyntaxWarning.
    Fixed #526 via
    #527 (by @mih)

📝 Documentation

  • The API documentation has been substantially extended. More already
    documented API components are now actually rendered, and more documentation
    has been written.

🏠 Internal

  • Type annotations have been extended. The development workflows now inform
    about type annotation issues for each proposed change.

  • Constants have been migrated to datalad_next.consts.
    #575 (by @mih)

🛡 Tests

  • A new test verifies compatibility with HTTP servers that do not report
    download progress.
    #369 (by @christian-monch)

  • The overall noise-level in the test battery output has been reduced
    substantially. INFO log messages are no longer shown, and command result
    rendering is largely suppressed. New test fixtures make it easier
    to maintain tidier output: reduce_logging, no_result_rendering.
    The contribution guide has been adjusted to encourage their use.

  • Tests that require an unprivileged system account to run are now skipped
    when executed as root. This fixes an issue of the Debian package.
    #593 (by @adswa)

Full Changelog: 1.0.2...1.1.0

1.0.2 -- Debianize!

23 Oct 13:26
@mih
1.0.2

🏠 Internal

  • The www-authenticate dependency is dropped. The functionality is
    replaced by a requests-based implementation of an alternative parser.
    This trims the dependency footprint and facilitates Debian-packaging.
    The previous test cases are kept and further extended.
    Fixes #493 via
    #495 (by @mih)

🛡 Tests

  • The test battery now honors the DATALAD_TESTS_NONETWORK environment
    variable and skips any tests that require external
    network access. (by @mih)

1.0.1 -- Fix me some 3.8

18 Oct 08:15
@mih
1.0.1

🐛 Bug Fixes

  • Fix f-string syntax in error message of the uncurl remote.
    #455 (by @christian-monch)

  • FileSystemItem.from_path() now honors its link_target parameter, and
    resolves a target for any symlink item conditional on this setting.
    Previously, a symlink target was always resolved.
    Fixes #462 via
    #464 (by @mih)

  • Update the vendored installation of versioneer to v0.29. This
    resolves an installation failure with Python 3.12 due to
    the removal of an ancient class.
    Fixes #475 via
    #483 (by @mih)

  • Bump dependency on Python to 3.8. This is presently the oldest version
    still supported upstream. However, some functionality already used
    3.8 features, so this is also a bug fix.
    Fixes #481 via
    #486 (by @mih)

💫 Enhancements and new features

  • Patch datalad-core's run command to honor configuration defaults
    for substitutions. This enables placeholders like {python} that
    point to sys.executable by default, and need not be explicitly
    defined in system/user/dataset configuration.
    Fixes #478 via
    #485 (by @mih)

📝 Documentation

🛡 Tests

  • Simplified setup for subprocess test-coverage reporting. Standard
    pytest-cov features are now employed, rather than the previous
    approach that was adopted from datalad-core, which originated
    in a time when testing was performed via nose.
    Fixes #453 via
    #457 (by @mih)

1.0.0 -- semantic versioning from here

25 Sep 15:36
@mih
1.0.0
b153dae

This release represents a milestone in the development of the extension.
The package is reorganized to be a collection of more self-contained
mini-packages, each with its own set of tests.

Developer documentation and guidelines have been added to aid further
development. One particular goal is to establish datalad-next as a proxy
for importing datalad-core functionality for other extensions. Direct imports
from datalad-core can be minimized in favor of imports from datalad-next.
This helps identify functionality needed outside the core package,
and guides efforts for future improvements.

The 1.0 release marks the switch to a more standard approach to semantic
versioning. However, although substantial improvements have been made,
the 1.0 version in no way indicates a slowdown of development or a change in the
likelihood of (breaking) changes. They will merely become more easily
discoverable from the version label alone.

Notable high-level features introduced by this major release are:

  • The new UrlOperations framework to provide a set of basic operations like
    download, upload, stat for different protocols. This framework can be
    thought of as a replacement for the "downloaders" functionality in
    datalad-core -- although the feature list is not 100% overlapping. This new
    framework is more easily extensible by 3rd-party code.

  • The Constraints framework elevates parameter/input validation to the next
    level. In contrast to datalad-core, declarative input validation is no longer
    limited to the CLI. Instead, command parameters can now be validated regardless
    of the entrypoint through which a command is used. They can be validated
    individually, but also sets of parameters can be validated jointly to implement
    particular interaction checks. All parameter validations can now be performed
    exhaustively, to present a user with a complete list of validation errors, rather
    than the fail-on-first-error method implemented exclusively in datalad-core.
    Validation errors are now reported using a dedicated structured data type to aid
    their communication via non-console interfaces.
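
The difference between fail-on-first-error and exhaustive validation with structured error reports can be illustrated with a short sketch. The names ParameterError, ValidationErrors and validate_all() are hypothetical and do not mirror the datalad-next Constraints API; joint validation of parameter sets is not shown.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ParameterError:
    # Structured record of one failed validation (not just a message string)
    parameter: str
    value: Any
    message: str


class ValidationErrors(ValueError):
    def __init__(self, errors: List[ParameterError]):
        super().__init__(f"{len(errors)} parameter validation error(s)")
        self.errors = errors


def validate_all(params: Dict[str, Any],
                 validators: Dict[str, Callable[[Any], Any]]) -> Dict[str, Any]:
    """Run every validator, collect all failures, and report them together."""
    validated, errors = {}, []
    for name, validator in validators.items():
        try:
            validated[name] = validator(params.get(name))
        except (ValueError, TypeError) as e:
            # Keep going: record the failure instead of stopping here
            errors.append(ParameterError(name, params.get(name), str(e)))
    if errors:
        raise ValidationErrors(errors)
    return validated
```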

  • The Credentials system has been further refined with more homogenized
    workflows and deeper integration into other subsystems. This release merely
    represents a snapshot of continued development towards a standardization of
    credential handling workflows.

  • The annex remotes uncurl and archivist are replacements for the
    datalad-core implementations datalad and datalad-archive. They offer
    substantially improved configurability and leaner operation -- built on the
    UrlOperations framework.

  • A growing collection of iterators (see iter_collections) aims to provide
    fast (and more Pythonic) operations on common data structures (Git worktrees,
    directories, archives). They can be used as an alternative to the traditional
    Repo classes (GitRepo, AnnexRepo) from datalad-core.

  • Analogous to UrlOperations, the ArchiveOperations framework aims to provide
    an abstraction for operations on different archive types (e.g., TAR). They
    represent an alternative to the traditional implementations of
    ExtractedArchive and ArchivesCache from datalad-core, and aim at leaner
    resource footprints.

  • The collection of runtime patches for datalad-core has been further expanded.
    All patches are now individually documented, and applied using a set of standard
    helpers (see http://docs.datalad.org/projects/next/en/latest/patches.html).

For details, please see the changelogs of the 1.0.0 beta releases below.

💫 Enhancements and new features

  • TarArchiveOperations is the first implementation of the ArchiveOperations
    abstraction, providing archive handlers with a set of standard operations:
    • open to get a file object for a particular archive member
    • __contains__ to check for the presence of a particular archive member
    • __iter__ to get an iterator for processing all archive members
      #415 (by @mih)
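
The three operations map naturally onto Python's tarfile module. The class below is a hypothetical, read-only sketch of such a handler, not the TarArchiveOperations implementation; for simplicity, its iterator only reports regular files.

```python
import tarfile
from pathlib import PurePosixPath
from typing import IO, Iterator


class TarOpsSketch:
    """Hypothetical read-only handler offering open, __contains__ and __iter__."""

    def __init__(self, fileobj: IO[bytes]):
        self._tar = tarfile.open(fileobj=fileobj, mode="r:*")

    def open(self, member: str) -> IO[bytes]:
        # File object for a single archive member (KeyError if absent)
        f = self._tar.extractfile(member)
        if f is None:
            raise ValueError(f"{member!r} is not a regular file")
        return f

    def __contains__(self, member: str) -> bool:
        try:
            self._tar.getmember(member)
        except KeyError:
            return False
        return True

    def __iter__(self) -> Iterator[PurePosixPath]:
        # PurePosixPath reflects that member names always use '/' separators
        for info in self._tar.getmembers():
            if info.isfile():
                yield PurePosixPath(info.name)

    def close(self):
        self._tar.close()
```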

🐛 Bug Fixes

  • Make TarfileItem.name be of type PurePosixPath to reflect the fact
    that a TAR archive can contain members with names that cannot be represented
    unmodified on a non-POSIX file system.
    #422 (by @mih)
    An analog change is done for ZipfileItem.name.
    #409 (by @christian-monch)

  • Fix git ls-files parsing in iter_gitworktree() to be compatible with
    file names that start with a tab character.
    #421 (by @christian-monch)

📝 Documentation

  • Expanded guidelines on test implementations.

  • Add missing and fix wrong docstrings for HTTP/WebDAV server related fixtures.
    #445 (by @adswa)

🏠 Internal

  • Deduplicate configuration handling code in annex remotes.
    #440 (by @adswa)

🛡 Tests

  • New test fixtures have been introduced to replace traditional test helpers
    from datalad-core:

    • datalad_interactive_ui and datalad_noninteractive_ui for testing
      user interactions. They replace with_testsui.
      #427 (by @mih)
  • Expand test coverage for create_sibling_webdav to include recursive
    operation.
    #434 (by @adswa)

1.0.0.b3 (third 1.0 preview release)

09 Jun 09:27
@mih
1.0.0b3
Pre-release

🐛 Bug Fixes

  • Patch CommandError, the standard exception raised for any non-zero exit
    command execution, to now report which command failed with repr() too.
    Previously, only str() would produce an informative message about a failure,
    while repr() would report CommandError(''), unless a dedicated message was
    provided. (by @mih)

  • Some error messages (in particular from within git-annex special remotes)
    exhibited uninformative error messages like CommandError(''). This
    is now fixed by letting CommandError produce the same error rendering
    in __str__ and __repr__. Previously, RuntimeError.__repr__ was used,
    which was unaware of command execution details also available in the exception.
    #386 (by @mih)

  • The datalad-annex Git remote helper can now handle the case where
    a to-be-cloned repository has a configured HEAD ref that does not
    match the locally configured default (e.g., master vs main
    default branch).
    Fixes #412 via
    #411 (by @mih)

  • Patch create_sibling_gitlab to work with present day GitLab deployments.
    This required adjusting the naming scheme for the flat and collection
    layouts. Moreover, the hierarchy layout is removed. It has never been
    fully implemented, and conceptually suffers from various corner-cases
    that cannot be (easily) addressed. Consequently, the collection layout
    is the new default. Its behavior matches that of hierarchy as far as this
    was functional, hence there should be no breakage for active users.
    #413

💫 Enhancements and new features

  • Patch the process entrypoint of DataLad's git-annex special remote
    implementations to funnel internal progress reporting to git-annex
    via standard PROGRESS protocol messages. This makes it obsolete
    (in many cases) to implement custom progress reporting, and the
    use of the standard log_progress() helper (either directly or
    indirectly) is sufficient to let both a parent DataLad process
    or git-annex see progress reports from special remotes.
    Fixes #328 via
    #329 (by @mih)

  • The HttpUrlOperations handler now supports custom HTTP headers.
    This makes it possible to define custom handlers in configuration
    that include such header customization, for example to send
    custom secret or session IDs.
    Fixes #336 (by @mih)

  • Constraint implementations now raise ConstraintError consistently
    on a violation. This now makes it possible to distinguish properly
    handled violations from improper implementation of such checks.
    Moreover, raise_for() is now used consistently, providing
    uniform, structured information on such violations.
    ConstraintError is derived from ValueError (the exception
    that was previously (mostly) raised). Therefore, client-code should
    continue to work without modification, unless a specific wording
    of an exception message is relied upon. In a few cases, an implicit
    TypeError (e.g., EnsureIterableOf) has been replaced by an
    explicit ConstraintError, and client code needs to be adjusted.
    The underlying exception continues to be available via
    ConstraintError.caused_by. (by @mih)
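
The backward-compatibility argument rests on ordinary exception subclassing. A minimal sketch, with a hypothetical ensure_int() check and a ConstraintError stand-in defined locally (not imported from datalad-next, and not its real constructor signature):

```python
class ConstraintError(ValueError):
    """Local stand-in: a ValueError that also records the underlying cause."""

    def __init__(self, msg, value=None, caused_by=None):
        super().__init__(msg)
        self.value = value
        self.caused_by = caused_by


def ensure_int(value):
    try:
        return int(value)
    except (TypeError, ValueError) as e:
        # Re-raise as a structured violation, keeping the original exception
        raise ConstraintError(f"{value!r} is not an integer", value, e) from e
```

Client code catching ValueError keeps working. Code that caught the TypeError raised by int(None) would now have to catch ConstraintError (or ValueError) instead, which mirrors the adjustment mentioned above.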

  • New MultiHash helper to compute multiple hashes in one go.
    Fixes #345 (by @mih)
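
Computing several hashes "in one go" means reading the data once and feeding each chunk to all hash objects. A hypothetical sketch; MultiHashSketch is not the datalad-next class, and its method names are assumptions.

```python
import hashlib
from typing import Dict, Iterable


class MultiHashSketch:
    def __init__(self, algorithms: Iterable[str]):
        self._hashers = {name: hashlib.new(name) for name in algorithms}

    def update(self, chunk: bytes) -> None:
        # A single pass over the data updates every hash object
        for hasher in self._hashers.values():
            hasher.update(chunk)

    def get_hexdigest(self) -> Dict[str, str]:
        return {name: h.hexdigest() for name, h in self._hashers.items()}
```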

  • As a companion of LeanGitRepo a LeanAnnexRepo has been added. This class
    is primarily used to signal that particular code does not require the full
    AnnexRepo API, but works with a much reduced API, as defined by that class.
    The API definition is not final and will grow in future releases to accommodate
    all standard use cases. #387
    (by @mih)

  • Dedicated dataclasses for common types, such as git-annex keys (AnnexKey)
    and dl+archives: URLs (ArchivistLocator) have been added. They support
    parsing and rendering their respective plain-text representations. These new
    types are now also available for more precise type annotation and argument
    validation. (by @mih)
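
Parsing and rendering a plain-text representation can be sketched as a dataclass with a from_str() classmethod and __str__(). AnnexKeySketch is hypothetical, and the key layout it assumes (backend, an optional -sSIZE field, then -- and the name) is a simplified subset of git-annex's key format; mtime and chunk fields are not handled.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Simplified pattern: BACKEND[-sSIZE]--NAME (other key fields are not supported)
_KEY_REGEX = re.compile(r"^(?P<backend>[A-Z0-9]+)(?:-s(?P<size>\d+))?--(?P<name>.*)$")


@dataclass(frozen=True)
class AnnexKeySketch:
    backend: str
    name: str
    size: Optional[int] = None

    @classmethod
    def from_str(cls, key: str) -> "AnnexKeySketch":
        match = _KEY_REGEX.match(key)
        if match is None:
            raise ValueError(f"not a (supported) annex key: {key!r}")
        size = match.group("size")
        return cls(
            backend=match.group("backend"),
            name=match.group("name"),
            size=int(size) if size is not None else None,
        )

    def __str__(self) -> str:
        # Render back to the same plain-text form that was parsed
        size = f"-s{self.size}" if self.size is not None else ""
        return f"{self.backend}{size}--{self.name}"
```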

  • datalad_next.archive_operations has been added, and follows the pattern
    established by the UrlOperations framework, to provide uniform handling
    for different archive types. Two main (read) operations are supported:
    iteration over archive members, and access to individual member content
    via a file-like. (by @mih)

  • New archivist git-annex special remote, as a replacement for the
    datalad-archives remote. It is implemented as a drop-in replacement
    with the ability to also fall-back on the previous implementation.
    In comparison to its predecessor, it reduces the storage overhead
    from 200% to 100% by doing partial extraction from fully downloaded
    archives. It is designed to be extended with support for partial
    access to remote archives (thereby reducing storage overhead to zero),
    but this is not yet implemented.

  • New datalad_next.iter_collections module providing iterators for
    items in particular collections, such as TAR or ZIP archives members,
    the content of a file system directory, or the worktree of a Git repository.
    Iterators yield items of defined types that typically carry information on
    the properties of collection items, and (in the case of files) access to
    their content.

  • New command ls_file_collection() is providing access to a select set
    of collection iterators via the DataLad command. In addition to the
    plain iterators, it provides uniform content hashing across all
    supported collection types.

  • The datalad-annex Git remote helper can now recognize and handle
    legacy repository deposits made by its predecessor from datalad-osf.
    #411 (by @mih)

🏠 Internal

  • Remove DataLad runner performance patch, and all patches to clone
    functionality. They are included in datalad-0.18.1, and the dependency
    has been adjusted accordingly.

  • New deprecated decorator for standardized deprecation handling
    of commands, functions, and also individual keyword arguments of
    callables, and even particular values for such arguments.
    Inspired by datalad/datalad#6998.
    Contributed by @adswa
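
Deprecating an individual keyword argument, or only particular values of it, can be sketched with functools.wraps and warnings. The name deprecated_kwarg and its parameters are hypothetical; this is not the signature of the datalad-next deprecated decorator.

```python
import functools
import warnings


def deprecated_kwarg(name, msg, values=None):
    """Warn when keyword `name` is passed (optionally only for given `values`)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if name in kwargs and (values is None or kwargs[name] in values):
                warnings.warn(
                    f"{func.__name__}(): argument {name!r} "
                    f"(value {kwargs[name]!r}) is deprecated. {msg}",
                    DeprecationWarning,
                    stacklevel=2,
                )
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Only the value "legacy" of the `mode` argument is deprecated here
@deprecated_kwarg("mode", "Use mode='fast' instead.", values=("legacy",))
def process(data, mode="fast"):
    return f"{mode}:{data}"
```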

  • Use the correct type annotation for cfg-parameter of
    datalad_next.utils.requests_auth.DataladAuth.__init__()
    #385 (by @christian-monch)

  • The patch registry has been moved to datalad_next.patches.enabled,
    and the apply_patch() helper is now located in datalad_next.patches
    directly to avoid issues with circular dependencies when patching
    core components like the ConfigManager. The documentation on patching
    has been adjusted accordingly.
    #391 (by @mih)

  • The main() entrypoint of the datalad-annex Git remote helper has
    been generalized to be more re-usable by other (derived) remote helper
    implementations.
    #411 (by @mih)