Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implement index exporter #2746

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

highpon
Copy link
Contributor

@highpon highpon commented Nov 14, 2024

Description

This pull request adds feature of index exporter.
Index exporter is exporting index to specific directory using gateway's endpoint of stream.
It can configure concurenncy of network I/O and disk I/O, output directory.
It also supports eventual consistency, so possibility of not being able to retrieve the latest stored data.

Related Issue

#2655

Versions

  • Vald Version: v1.7.14
  • Go Version: v1.23.3
  • Rust Version: v1.82.0
  • Docker Version: v27.3.1
  • Kubernetes Version: v1.31.2
  • Helm Version: v3.16.2
  • NGT Version: v2.3.4
  • Faiss Version: v1.9.0

Checklist

Special notes for your reviewer

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a new GitHub Actions workflow for automating Docker image builds for the index-exportation component.
    • Added a new Kubernetes CronJob for the index-exportation service, scheduled to run hourly.
    • Created a new PersistentVolumeClaim for storage needs in the index-exportation job.
    • Implemented a comprehensive configuration for the index-exportation service, including logging, server, and observability settings.
  • Improvements

    • Enhanced the build process with new Docker image targets and zip artifacts for the index-exportation job.
    • Added configuration management structures and options for the exportation job, improving flexibility and error handling.
  • Documentation

    • Updated configuration files and added YAML files for Kubernetes deployment.

Copy link
Contributor

coderabbitai bot commented Nov 14, 2024

📝 Walkthrough

Walkthrough

This pull request introduces a comprehensive implementation of an index exportation job for the Vald project. The changes include a new GitHub Actions workflow, Makefile updates, a Docker configuration, Kubernetes deployment resources, and multiple Go packages that define the configuration, service, and use case for the index exportation functionality. The implementation provides a robust mechanism for exporting and managing index data, with support for concurrent processing, error handling, and observability.

Changes

File Change Summary
.github/workflows/dockers-index-exportation-image.yaml New workflow for building Docker image for index exportation
Makefile, Makefile.d/build.mk, Makefile.d/docker.mk Added new targets and variables for index exportation build and Docker image
cmd/index/job/exportation/main.go Main entry point for index exportation job
cmd/index/job/exportation/sample.yaml Sample configuration file for index exportation
dockers/index/job/exportation/Dockerfile Multi-stage Dockerfile for building index exportation image
internal/config/index_exporter.go Configuration struct and binding method for index exporter
k8s/index/job/exportation/ Kubernetes resources (ConfigMap, CronJob, PVC) for deploying index exportation
pkg/index/job/exportation/ Go packages implementing configuration, service, and use case for index exportation

Sequence Diagram

sequenceDiagram
    participant Main as Main Entrypoint
    participant Config as Configuration Loader
    participant Exporter as Exporter Service
    participant Gateway as gRPC Gateway
    participant KVS as Key-Value Store

    Main->>Config: Load Configuration
    Config-->>Main: Return Configuration
    Main->>Exporter: Initialize with Configuration
    Exporter->>Gateway: Start Client Connection
    Exporter->>KVS: Open Index Path
    Exporter->>Gateway: Stream Index Data
    Gateway-->>Exporter: Return Stream Results
    Exporter->>KVS: Update/Insert Vectors
Loading

Possibly related PRs

Suggested Labels

type/feature, priority/medium, area/internal, area/makefile, area/index/job/creation, area/gateway/lb, actions/e2e-deploy, actions/fossa

Suggested Reviewers

  • hlts2
  • vankichi

Tip

CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command @coderabbitai generate docstrings to have CodeRabbit automatically generate docstrings for your pull request. We would love to hear your feedback on Discord.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@highpon highpon marked this pull request as draft November 14, 2024 15:10
@highpon highpon force-pushed the feat/implement-vald-index-exporter-and-importer branch 4 times, most recently from 27d750c to 49cf2d7 Compare November 21, 2024 04:05
@highpon highpon force-pushed the feat/implement-vald-index-exporter-and-importer branch 2 times, most recently from 596dac6 to 15f9fef Compare November 25, 2024 04:19
@highpon highpon force-pushed the feat/implement-vald-index-exporter-and-importer branch from 15f9fef to 9aeb887 Compare December 12, 2024 01:42
@highpon highpon changed the title [WIP] feat: Implement index exporter and importer [WIP] feat: Implement index exporter Dec 14, 2024
@highpon highpon force-pushed the feat/implement-vald-index-exporter-and-importer branch from f238829 to 11d9d77 Compare December 15, 2024 17:18
@highpon highpon force-pushed the feat/implement-vald-index-exporter-and-importer branch from 1d80568 to 8264562 Compare December 15, 2024 17:25
@highpon highpon changed the title [WIP] feat: Implement index exporter feat: Implement index exporter Dec 16, 2024
@highpon highpon marked this pull request as ready for review December 16, 2024 04:03
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 17

🧹 Nitpick comments (8)
dockers/index/job/exportation/Dockerfile (1)

20-20: Remove unused build argument UPX_OPTIONS

The build argument UPX_OPTIONS is set but not used in the Dockerfile. Consider removing it to simplify the build context and avoid confusion.

pkg/index/job/exportation/service/exporter.go (1)

182-183: Ensure errors are properly handled when canceling the context

When canceling the context due to a streaming error, make sure to log or handle the error appropriately to aid in debugging.

cmd/index/job/exportation/main.go (2)

29-31: Consider documenting version constraints

The version constraints (maxVersion and minVersion) are defined but their significance and the reasoning behind the specific values chosen isn't clear.

Consider adding comments explaining:

  • Why v0.0.10 is the maximum supported version
  • What breaking changes might exist between versions
  • When these constraints should be updated

34-58: Potential improvement in error handling and recovery

While the error handling using safety.RecoverFunc is good, the error logging could be enhanced.

Consider this improvement:

 if err := safety.RecoverFunc(func() error {
   return runner.Do(
     context.Background(),
     // ... other options ...
   )
 })(); err != nil {
-  log.Fatal(err, info.Get())
-  return
+  log.Fatalf("Failed to run %s: %+v\nInfo: %+v", name, err, info.Get())
 }
pkg/index/job/exportation/config/config.go (3)

24-37: Consider adding validation tags and defaults

The configuration struct could benefit from validation tags and default values.

Consider adding validation tags:

 type Data struct {
   config.GlobalConfig `json:",inline" yaml:",inline"`
 
-  Server *config.Servers `json:"server_config" yaml:"server_config"`
+  Server *config.Servers `json:"server_config" yaml:"server_config" validate:"required"`
 
-  Observability *config.Observability `json:"observability" yaml:"observability"`
+  Observability *config.Observability `json:"observability" yaml:"observability" validate:"required"`
 
-  Exporter *config.IndexExporter `json:"exporter" yaml:"exporter"`
+  Exporter *config.IndexExporter `json:"exporter" yaml:"exporter" validate:"required"`
 }

43-45: Enhance error context in configuration reading

The error from config.Read could benefit from additional context.

-if err = config.Read(path, &cfg); err != nil {
-  return nil, err
+if err = config.Read(path, &cfg); err != nil {
+  return nil, errors.Wrap(err, "failed to read configuration file")
 }

53-57: Consider logging when using default Observability config

When falling back to default Observability configuration, it would be helpful to log this decision.

Consider adding logging:

 if cfg.Observability != nil {
   _ = cfg.Observability.Bind()
 } else {
+  log.Debug("Using default Observability configuration")
   cfg.Observability = new(config.Observability).Bind()
 }
cmd/index/job/exportation/sample.yaml (1)

71-75: Consider increasing concurrency for better performance

The current concurrency setting of 1 might be a bottleneck for large index exports. Consider increasing it based on your system's capacity.

-  concurrency: 1
+  concurrency: 4  # Adjust based on system capacity
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd27082 and 247895e.

⛔ Files ignored due to path filters (2)
  • hack/actions/gen/main.go is excluded by !**/gen/**
  • hack/docker/gen/main.go is excluded by !**/gen/**
📒 Files selected for processing (15)
  • .github/workflows/dockers-index-exportation-image.yaml (1 hunks)
  • Makefile (1 hunks)
  • Makefile.d/build.mk (4 hunks)
  • Makefile.d/docker.mk (3 hunks)
  • cmd/index/job/exportation/main.go (1 hunks)
  • cmd/index/job/exportation/sample.yaml (1 hunks)
  • dockers/index/job/exportation/Dockerfile (1 hunks)
  • internal/config/index_exporter.go (1 hunks)
  • k8s/index/job/exportation/configmap.yaml (1 hunks)
  • k8s/index/job/exportation/cronjob.yaml (1 hunks)
  • k8s/index/job/exportation/pvc.yaml (1 hunks)
  • pkg/index/job/exportation/config/config.go (1 hunks)
  • pkg/index/job/exportation/service/exporter.go (1 hunks)
  • pkg/index/job/exportation/service/options.go (1 hunks)
  • pkg/index/job/exportation/usecase/exportation.go (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • k8s/index/job/exportation/pvc.yaml
🧰 Additional context used
🪛 yamllint (1.35.1)
k8s/index/job/exportation/cronjob.yaml

[warning] 55-55: wrong indentation: expected 8 but found 12

(indentation)


[error] 67-67: syntax error: expected , but found ''

(syntax)

🔇 Additional comments (10)
dockers/index/job/exportation/Dockerfile (2)

85-86: Verify the correctness of the COPY command paths

Ensure that the source path cmd/index/job/exportation/sample.yaml exists in the build context and that the destination directory /etc/server exists or is created in the final image. The distroless image is minimal and may not contain /etc/server.

Consider modifying the COPY command or ensuring the destination directory is properly set up.


1-89: Dockerfile is well-structured and follows best practices

The multi-stage build is correctly implemented, and best practices are followed for building and running the index-exportation application.

internal/config/index_exporter.go (1)

34-43: Configuration struct and binding method are properly defined

The IndexExporter struct is well-defined with necessary configuration fields, and the Bind method correctly resolves actual values and binds the gateway configuration.

pkg/index/job/exportation/service/options.go (1)

25-31: Review default values for production readiness

The default values need careful review for production deployment:

  • 200 concurrent streams might be too high for some environments
  • 5s intervals might be too frequent for large datasets

Let's check if these values are documented or discussed in the issue:

Consider making these values configurable through environment variables or configuration files.

k8s/index/job/exportation/cronjob.yaml (1)

138-140: Verify PVC creation and configuration

The CronJob references a PVC named export-pvc, but we need to ensure this PVC is created with appropriate storage class and capacity.

✅ Verification successful

This broader search should help us find any PVC definitions, including those that might have different filenames.


This will help us verify the PVC configuration details.


PVC configuration is properly defined and referenced

The PVC export-pvc is correctly defined in k8s/index/job/exportation/pvc.yaml with:

  • 10Gi storage capacity
  • ReadWriteOnce access mode
  • local-path storage class
  • Matches the name referenced in the CronJob
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if PVC definition exists in the repository
fd -t f "export-pvc" | grep "\.yaml$"

Length of output: 37


Script:

#!/bin/bash
# Search for PVC definitions in yaml files using ripgrep
rg -l "kind:\s*PersistentVolumeClaim" --type yaml

# Also search for potential PVC definitions with the specific name
rg "claimName:\s*export-pvc" --type yaml -B5 -A5

Length of output: 669


Script:

#!/bin/bash
# Let's check the content of the PVC definition file
cat k8s/index/job/exportation/pvc.yaml

Length of output: 252

Makefile.d/build.mk (2)

28-28: LGTM: Binary build targets properly configured

The new index-exportation targets are correctly added to both binary/build and binary/build/zip lists, maintaining alphabetical order and following the established pattern.

Also applies to: 142-142


89-91: LGTM: Build configuration properly defined

The build configuration for index-exportation follows best practices:

  • Uses static linking (CGO_ENABLED=0)
  • Consistent with other index job configurations
.github/workflows/dockers-index-exportation-image.yaml (1)

20-247: LGTM: GitHub workflow properly configured

The workflow configuration is well-structured:

  • Comprehensive path triggers for relevant file changes
  • Proper integration with shared Docker build workflow
  • Correct secret inheritance for secure builds
Makefile.d/docker.mk (1)

363-368: LGTM: Docker build target properly configured

The build target configuration follows the standard pattern and includes all necessary parameters.

Makefile (1)

45-45: LGTM! Verify variable usage across build system.

The INDEX_EXPORTATION_IMAGE variable is correctly defined following the project's naming conventions and patterns.

Let's verify how this variable is used across the build system:

✅ Verification successful

Variable usage verified and properly integrated

The INDEX_EXPORTATION_IMAGE variable is correctly integrated into the build system and its ecosystem:

  • Used in Makefile.d/docker.mk for Docker image building
  • Has a dedicated Dockerfile in dockers/index/job/exportation/
  • Configured with proper GitHub workflow in .github/workflows/dockers-index-exportation-image.yaml
  • Referenced in Helm charts for deployment configuration
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check usage of INDEX_EXPORTATION_IMAGE variable in build system files

# Search for variable usage in Makefile.d directory
echo "Checking Makefile.d directory for variable usage:"
rg "INDEX_EXPORTATION_IMAGE" Makefile.d/

# Search for variable usage in Docker-related files
echo "Checking Docker-related files:"
rg "index-exportation" .github/workflows/ docker/ charts/

Length of output: 868


Script:

#!/bin/bash
# Let's verify if there are any Dockerfile or configuration files for index-exportation
echo "Checking for index-exportation related files:"
fd "index.*export" --type f

# Check for any potential usage of the image in k8s manifests or helm charts
echo "Checking for index-exportation in k8s/helm related files:"
fd "values.yaml|_test.yaml|deployment.yaml" --type f --exec rg -l "index.*export" {}

# Check the GitHub workflow file content
echo "Checking GitHub workflow content:"
cat .github/workflows/dockers-index-exportation-image.yaml

Length of output: 8811


Script:

#!/bin/bash
# Let's check the Dockerfile and related configuration
echo "Checking Dockerfile content:"
cat dockers/index/job/exportation/Dockerfile

# Let's examine the values.yaml that might use this image
echo "Checking values.yaml content for index exportation:"
rg -A 5 -B 5 "index.*export" charts/vald/values.yaml

Length of output: 4618

pkg/index/job/exportation/usecase/exportation.go Outdated Show resolved Hide resolved
pkg/index/job/exportation/service/exporter.go Outdated Show resolved Hide resolved
pkg/index/job/exportation/service/exporter.go Outdated Show resolved Hide resolved
cmd/index/job/exportation/sample.yaml Show resolved Hide resolved
k8s/index/job/exportation/configmap.yaml Show resolved Hide resolved
k8s/index/job/exportation/configmap.yaml Outdated Show resolved Hide resolved
Makefile.d/build.mk Outdated Show resolved Hide resolved
Makefile.d/docker.mk Outdated Show resolved Hide resolved
@kmrmt kmrmt requested review from a team, kmrmt and kpango and removed request for a team December 16, 2024 04:43
highpon and others added 4 commits December 18, 2024 00:07
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: s-shiraki <s.shiraki.business@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: s-shiraki <s.shiraki.business@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: s-shiraki <s.shiraki.business@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: s-shiraki <s.shiraki.business@gmail.com>
highpon and others added 3 commits December 18, 2024 00:22
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: s-shiraki <s.shiraki.business@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: s-shiraki <s.shiraki.business@gmail.com>
coderabbitai[bot]
coderabbitai bot previously approved these changes Dec 17, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
pkg/index/job/exportation/usecase/exportation.go (1)

130-145: ⚠️ Potential issue

Improve error handling and process termination logic.

The current implementation has several issues:

  1. Using log.Fatalf in a goroutine is unsafe
  2. Self-termination using SIGTERM might not be the best approach

Consider implementing a graceful shutdown mechanism using context cancellation instead of process signals.

🧹 Nitpick comments (3)
pkg/index/job/exportation/usecase/exportation.go (3)

38-44: LGTM! Well-designed struct following dependency injection pattern.

The run struct is well-designed with clear separation of concerns. Each field serves a specific purpose:

  • eg: For concurrent operations management
  • cfg: For configuration management
  • observability: For metrics and tracing
  • server: For handling network operations
  • exporter: For the core business logic

Consider adding field-level documentation comments to improve code maintainability.


54-55: Add comment explaining the static analysis skip.

The skipcq: CRT-D0001 comment skips a static analysis check, but it's not clear why this is necessary.

Add a comment explaining why this static analysis check needs to be skipped.


174-187: Consider using a slice for collecting errors in Stop method.

The current error joining approach works but could be more readable using a slice to collect errors.

Consider this improvement:

 func (r *run) Stop(ctx context.Context) (errs error) {
+    var errors []error
     if r.observability != nil {
         if err := r.observability.Stop(ctx); err != nil {
-            errs = errors.Join(errs, err)
+            errors = append(errors, err)
         }
     }
     if r.server != nil {
         if err := r.server.Shutdown(ctx); err != nil {
-            errs = errors.Join(errs, err)
+            errors = append(errors, err)
         }
     }
-    return errs
+    return errors.Join(errors...)
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 247895e and e9db705.

📒 Files selected for processing (6)
  • Makefile.d/build.mk (4 hunks)
  • Makefile.d/docker.mk (3 hunks)
  • k8s/index/job/exportation/configmap.yaml (1 hunks)
  • k8s/index/job/exportation/cronjob.yaml (1 hunks)
  • pkg/index/job/exportation/service/exporter.go (1 hunks)
  • pkg/index/job/exportation/usecase/exportation.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • Makefile.d/docker.mk
  • k8s/index/job/exportation/configmap.yaml
  • Makefile.d/build.mk
  • k8s/index/job/exportation/cronjob.yaml
  • pkg/index/job/exportation/service/exporter.go
🔇 Additional comments (2)
pkg/index/job/exportation/usecase/exportation.go (2)

1-36: LGTM! Well-organized imports and proper license header.

The package structure is clean with all necessary imports properly organized.


107-113: LGTM! Clean implementation with proper nil checking.

The PreStart method correctly handles the case when observability is not enabled.

coderabbitai[bot]
coderabbitai bot previously approved these changes Dec 18, 2024
Copy link

codecov bot commented Dec 18, 2024

Codecov Report

Attention: Patch coverage is 0% with 342 lines in your changes missing coverage. Please review.

Project coverage is 18.03%. Comparing base (d99ad22) to head (7f56a13).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
pkg/index/job/exportation/service/exporter.go 0.00% 125 Missing ⚠️
pkg/index/job/exportation/usecase/exportation.go 0.00% 111 Missing ⚠️
pkg/index/job/exportation/service/options.go 0.00% 49 Missing ⚠️
cmd/index/job/exportation/main.go 0.00% 22 Missing ⚠️
pkg/index/job/exportation/config/config.go 0.00% 22 Missing ⚠️
internal/config/index_exporter.go 0.00% 9 Missing ⚠️
hack/docker/gen/main.go 0.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2746      +/-   ##
==========================================
- Coverage   23.92%   18.03%   -5.89%     
==========================================
  Files         546      406     -140     
  Lines       54544    33155   -21389     
==========================================
- Hits        13050     5980    -7070     
+ Misses      40709    26816   -13893     
+ Partials      785      359     -426     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants