-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Implement index exporter #2746
base: main
Are you sure you want to change the base?
feat: Implement index exporter #2746
Conversation
📝 WalkthroughWalkthroughThis pull request introduces a comprehensive implementation of an index exportation job for the Vald project. The changes include a new GitHub Actions workflow, Makefile updates, a Docker configuration, Kubernetes deployment resources, and multiple Go packages that define the configuration, service, and use case for the index exportation functionality. The implementation provides a robust mechanism for exporting and managing index data, with support for concurrent processing, error handling, and observability. Changes
Sequence DiagramsequenceDiagram
participant Main as Main Entrypoint
participant Config as Configuration Loader
participant Exporter as Exporter Service
participant Gateway as gRPC Gateway
participant KVS as Key-Value Store
Main->>Config: Load Configuration
Config-->>Main: Return Configuration
Main->>Exporter: Initialize with Configuration
Exporter->>Gateway: Start Client Connection
Exporter->>KVS: Open Index Path
Exporter->>Gateway: Stream Index Data
Gateway-->>Exporter: Return Stream Results
Exporter->>KVS: Update/Insert Vectors
Possibly related PRs
Suggested Labels
Suggested Reviewers
Tip CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
27d750c
to
49cf2d7
Compare
596dac6
to
15f9fef
Compare
15f9fef
to
9aeb887
Compare
f238829
to
11d9d77
Compare
1d80568
to
8264562
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 17
🧹 Nitpick comments (8)
dockers/index/job/exportation/Dockerfile (1)
20-20
: Remove unused build argumentUPX_OPTIONS
The build argument
UPX_OPTIONS
is set but not used in the Dockerfile. Consider removing it to simplify the build context and avoid confusion.pkg/index/job/exportation/service/exporter.go (1)
182-183
: Ensure errors are properly handled when canceling the contextWhen canceling the context due to a streaming error, make sure to log or handle the error appropriately to aid in debugging.
cmd/index/job/exportation/main.go (2)
29-31
: Consider documenting version constraintsThe version constraints (
maxVersion
andminVersion
) are defined but their significance and the reasoning behind the specific values chosen isn't clear.Consider adding comments explaining:
- Why v0.0.10 is the maximum supported version
- What breaking changes might exist between versions
- When these constraints should be updated
34-58
: Potential improvement in error handling and recoveryWhile the error handling using
safety.RecoverFunc
is good, the error logging could be enhanced.Consider this improvement:
if err := safety.RecoverFunc(func() error { return runner.Do( context.Background(), // ... other options ... ) })(); err != nil { - log.Fatal(err, info.Get()) - return + log.Fatalf("Failed to run %s: %+v\nInfo: %+v", name, err, info.Get()) }pkg/index/job/exportation/config/config.go (3)
24-37
: Consider adding validation tags and defaultsThe configuration struct could benefit from validation tags and default values.
Consider adding validation tags:
type Data struct { config.GlobalConfig `json:",inline" yaml:",inline"` - Server *config.Servers `json:"server_config" yaml:"server_config"` + Server *config.Servers `json:"server_config" yaml:"server_config" validate:"required"` - Observability *config.Observability `json:"observability" yaml:"observability"` + Observability *config.Observability `json:"observability" yaml:"observability" validate:"required"` - Exporter *config.IndexExporter `json:"exporter" yaml:"exporter"` + Exporter *config.IndexExporter `json:"exporter" yaml:"exporter" validate:"required"` }
43-45
: Enhance error context in configuration readingThe error from
config.Read
could benefit from additional context.-if err = config.Read(path, &cfg); err != nil { - return nil, err +if err = config.Read(path, &cfg); err != nil { + return nil, errors.Wrap(err, "failed to read configuration file") }
53-57
: Consider logging when using default Observability configWhen falling back to default Observability configuration, it would be helpful to log this decision.
Consider adding logging:
if cfg.Observability != nil { _ = cfg.Observability.Bind() } else { + log.Debug("Using default Observability configuration") cfg.Observability = new(config.Observability).Bind() }
cmd/index/job/exportation/sample.yaml (1)
71-75
: Consider increasing concurrency for better performanceThe current concurrency setting of 1 might be a bottleneck for large index exports. Consider increasing it based on your system's capacity.
- concurrency: 1 + concurrency: 4 # Adjust based on system capacity
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
hack/actions/gen/main.go
is excluded by!**/gen/**
hack/docker/gen/main.go
is excluded by!**/gen/**
📒 Files selected for processing (15)
.github/workflows/dockers-index-exportation-image.yaml
(1 hunks)Makefile
(1 hunks)Makefile.d/build.mk
(4 hunks)Makefile.d/docker.mk
(3 hunks)cmd/index/job/exportation/main.go
(1 hunks)cmd/index/job/exportation/sample.yaml
(1 hunks)dockers/index/job/exportation/Dockerfile
(1 hunks)internal/config/index_exporter.go
(1 hunks)k8s/index/job/exportation/configmap.yaml
(1 hunks)k8s/index/job/exportation/cronjob.yaml
(1 hunks)k8s/index/job/exportation/pvc.yaml
(1 hunks)pkg/index/job/exportation/config/config.go
(1 hunks)pkg/index/job/exportation/service/exporter.go
(1 hunks)pkg/index/job/exportation/service/options.go
(1 hunks)pkg/index/job/exportation/usecase/exportation.go
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- k8s/index/job/exportation/pvc.yaml
🧰 Additional context used
🪛 yamllint (1.35.1)
k8s/index/job/exportation/cronjob.yaml
[warning] 55-55: wrong indentation: expected 8 but found 12
(indentation)
[error] 67-67: syntax error: expected , but found ''
(syntax)
🔇 Additional comments (10)
dockers/index/job/exportation/Dockerfile (2)
85-86
: Verify the correctness of the COPY
command paths
Ensure that the source path cmd/index/job/exportation/sample.yaml
exists in the build context and that the destination directory /etc/server
exists or is created in the final image. The distroless image is minimal and may not contain /etc/server
.
Consider modifying the COPY
command or ensuring the destination directory is properly set up.
1-89
: Dockerfile is well-structured and follows best practices
The multi-stage build is correctly implemented, and best practices are followed for building and running the index-exportation
application.
internal/config/index_exporter.go (1)
34-43
: Configuration struct and binding method are properly defined
The IndexExporter
struct is well-defined with necessary configuration fields, and the Bind
method correctly resolves actual values and binds the gateway configuration.
pkg/index/job/exportation/service/options.go (1)
25-31
: Review default values for production readiness
The default values need careful review for production deployment:
- 200 concurrent streams might be too high for some environments
- 5s intervals might be too frequent for large datasets
Let's check if these values are documented or discussed in the issue:
Consider making these values configurable through environment variables or configuration files.
k8s/index/job/exportation/cronjob.yaml (1)
138-140
: Verify PVC creation and configuration
The CronJob references a PVC named export-pvc
, but we need to ensure this PVC is created with appropriate storage class and capacity.
✅ Verification successful
This broader search should help us find any PVC definitions, including those that might have different filenames.
This will help us verify the PVC configuration details.
PVC configuration is properly defined and referenced
The PVC export-pvc
is correctly defined in k8s/index/job/exportation/pvc.yaml
with:
- 10Gi storage capacity
- ReadWriteOnce access mode
- local-path storage class
- Matches the name referenced in the CronJob
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if PVC definition exists in the repository
fd -t f "export-pvc" | grep "\.yaml$"
Length of output: 37
Script:
#!/bin/bash
# Search for PVC definitions in yaml files using ripgrep
rg -l "kind:\s*PersistentVolumeClaim" --type yaml
# Also search for potential PVC definitions with the specific name
rg "claimName:\s*export-pvc" --type yaml -B5 -A5
Length of output: 669
Script:
#!/bin/bash
# Let's check the content of the PVC definition file
cat k8s/index/job/exportation/pvc.yaml
Length of output: 252
Makefile.d/build.mk (2)
28-28
: LGTM: Binary build targets properly configured
The new index-exportation targets are correctly added to both binary/build and binary/build/zip lists, maintaining alphabetical order and following the established pattern.
Also applies to: 142-142
89-91
: LGTM: Build configuration properly defined
The build configuration for index-exportation follows best practices:
- Uses static linking (CGO_ENABLED=0)
- Consistent with other index job configurations
.github/workflows/dockers-index-exportation-image.yaml (1)
20-247
: LGTM: GitHub workflow properly configured
The workflow configuration is well-structured:
- Comprehensive path triggers for relevant file changes
- Proper integration with shared Docker build workflow
- Correct secret inheritance for secure builds
Makefile.d/docker.mk (1)
363-368
: LGTM: Docker build target properly configured
The build target configuration follows the standard pattern and includes all necessary parameters.
Makefile (1)
45-45
: LGTM! Verify variable usage across build system.
The INDEX_EXPORTATION_IMAGE
variable is correctly defined following the project's naming conventions and patterns.
Let's verify how this variable is used across the build system:
✅ Verification successful
Variable usage verified and properly integrated
The INDEX_EXPORTATION_IMAGE
variable is correctly integrated into the build system and its ecosystem:
- Used in
Makefile.d/docker.mk
for Docker image building - Has a dedicated Dockerfile in
dockers/index/job/exportation/
- Configured with proper GitHub workflow in
.github/workflows/dockers-index-exportation-image.yaml
- Referenced in Helm charts for deployment configuration
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check usage of INDEX_EXPORTATION_IMAGE variable in build system files
# Search for variable usage in Makefile.d directory
echo "Checking Makefile.d directory for variable usage:"
rg "INDEX_EXPORTATION_IMAGE" Makefile.d/
# Search for variable usage in Docker-related files
echo "Checking Docker-related files:"
rg "index-exportation" .github/workflows/ docker/ charts/
Length of output: 868
Script:
#!/bin/bash
# Let's verify if there are any Dockerfile or configuration files for index-exportation
echo "Checking for index-exportation related files:"
fd "index.*export" --type f
# Check for any potential usage of the image in k8s manifests or helm charts
echo "Checking for index-exportation in k8s/helm related files:"
fd "values.yaml|_test.yaml|deployment.yaml" --type f --exec rg -l "index.*export" {}
# Check the GitHub workflow file content
echo "Checking GitHub workflow content:"
cat .github/workflows/dockers-index-exportation-image.yaml
Length of output: 8811
Script:
#!/bin/bash
# Let's check the Dockerfile and related configuration
echo "Checking Dockerfile content:"
cat dockers/index/job/exportation/Dockerfile
# Let's examine the values.yaml that might use this image
echo "Checking values.yaml content for index exportation:"
rg -A 5 -B 5 "index.*export" charts/vald/values.yaml
Length of output: 4618
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: s-shiraki <s.shiraki.business@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: s-shiraki <s.shiraki.business@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: s-shiraki <s.shiraki.business@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: s-shiraki <s.shiraki.business@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: s-shiraki <s.shiraki.business@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: s-shiraki <s.shiraki.business@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
pkg/index/job/exportation/usecase/exportation.go (1)
130-145
:⚠️ Potential issueImprove error handling and process termination logic.
The current implementation has several issues:
- Using
log.Fatalf
in a goroutine is unsafe- Self-termination using SIGTERM might not be the best approach
Consider implementing a graceful shutdown mechanism using context cancellation instead of process signals.
🧹 Nitpick comments (3)
pkg/index/job/exportation/usecase/exportation.go (3)
38-44
: LGTM! Well-designed struct following dependency injection pattern.The
run
struct is well-designed with clear separation of concerns. Each field serves a specific purpose:
eg
: For concurrent operations managementcfg
: For configuration managementobservability
: For metrics and tracingserver
: For handling network operationsexporter
: For the core business logicConsider adding field-level documentation comments to improve code maintainability.
54-55
: Add comment explaining the static analysis skip.The
skipcq: CRT-D0001
comment skips a static analysis check, but it's not clear why this is necessary.Add a comment explaining why this static analysis check needs to be skipped.
174-187
: Consider using a slice for collecting errors in Stop method.The current error joining approach works but could be more readable using a slice to collect errors.
Consider this improvement:
func (r *run) Stop(ctx context.Context) (errs error) { + var errors []error if r.observability != nil { if err := r.observability.Stop(ctx); err != nil { - errs = errors.Join(errs, err) + errors = append(errors, err) } } if r.server != nil { if err := r.server.Shutdown(ctx); err != nil { - errs = errors.Join(errs, err) + errors = append(errors, err) } } - return errs + return errors.Join(errors...) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
Makefile.d/build.mk
(4 hunks)Makefile.d/docker.mk
(3 hunks)k8s/index/job/exportation/configmap.yaml
(1 hunks)k8s/index/job/exportation/cronjob.yaml
(1 hunks)pkg/index/job/exportation/service/exporter.go
(1 hunks)pkg/index/job/exportation/usecase/exportation.go
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- Makefile.d/docker.mk
- k8s/index/job/exportation/configmap.yaml
- Makefile.d/build.mk
- k8s/index/job/exportation/cronjob.yaml
- pkg/index/job/exportation/service/exporter.go
🔇 Additional comments (2)
pkg/index/job/exportation/usecase/exportation.go (2)
1-36
: LGTM! Well-organized imports and proper license header.
The package structure is clean with all necessary imports properly organized.
107-113
: LGTM! Clean implementation with proper nil checking.
The PreStart method correctly handles the case when observability is not enabled.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2746 +/- ##
==========================================
- Coverage 23.92% 18.03% -5.89%
==========================================
Files 546 406 -140
Lines 54544 33155 -21389
==========================================
- Hits 13050 5980 -7070
+ Misses 40709 26816 -13893
+ Partials 785 359 -426 ☔ View full report in Codecov by Sentry. |
Description
This pull request adds feature of index exporter.
Index exporter is exporting index to specific directory using gateway's endpoint of stream.
It can configure concurenncy of network I/O and disk I/O, output directory.
It also supports eventual consistency, so possibility of not being able to retrieve the latest stored data.
Related Issue
#2655
Versions
Checklist
Special notes for your reviewer
Summary by CodeRabbit
Release Notes
New Features
Improvements
Documentation