-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speedup audb.available() for S3/Minio #458
base: main
Are you sure you want to change the base?
Conversation
Reviewer's Guide by SourceryThe implementation optimizes the Sequence diagram for optimized audb.available() functionsequenceDiagram
participant User
participant audb
participant Repository
participant Backend
User->>audb: Call available()
audb->>Repository: Iterate over REPOSITORIES
Repository->>Backend: Create backend interface
Backend-->>Repository: Return interface
Repository->>audb: List database names and versions
audb->>User: Return available databases
Class diagram for audb.available() optimizationclassDiagram
class audb {
+available()
}
class Repository {
+create_backend_interface()
+name
+backend
+host
}
class Backend {
+list_objects()
+exists()
}
audb --> Repository
Repository --> Backend
note for audb "Optimized available() to avoid recursive listing for S3/MinIO"
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @hagenw - I've reviewed your changes - here's some feedback:
Overall Comments:
- The code changes look well-structured, but the benchmarks show this is actually slightly slower (1.9s vs 1.7s). Please investigate why this 'optimization' is resulting in worse performance - perhaps by profiling the code or adding logging to compare the number of backend API calls being made in both approaches.
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @hagenw - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider adding proper interface methods to the backend class instead of accessing _client directly. While the current approach works, it would be cleaner to maintain the abstraction layer. This could be addressed in a follow-up PR.
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟡 Testing: 1 issue found
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
My original plan was also to add this feature directly to |
This speeds up
audb.available()
for the S3 and MinIO backends by avoiding listing all files recursively, but focusing on the database names and the versions of the database header files.Execution time for running
audb.available()
for the repositoriesaudb-public
(containing 7 smaller datasets) andaudb-internal
(containing 1 big dataset).Benchmark code
The pull request also makes sure
audb.available()
is tested for two S3 repositories as well, besides the Artifactory repositories.Summary by Sourcery
Enhancements:
Summary by Sourcery
Enhancements: