Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport PR #2733 to release/v1.7 for [BUGFIX] resolve agent GetGraphStatistics API double-free error problem #2738

Conversation

vdaas-ci
Copy link
Collaborator

@vdaas-ci vdaas-ci commented Nov 11, 2024

Description

SSIA

Related Issue

Versions

  • Vald Version: v1.7.13
  • Go Version: v1.23.2
  • Rust Version: v1.81.0
  • Docker Version: v27.3.1
  • Kubernetes Version: v1.31.1
  • Helm Version: v3.16.2
  • NGT Version: v2.2.4
  • Faiss Version: v1.9.0

Checklist

Special notes for your reviewer

Summary by CodeRabbit

  • New Features

    • Introduced new Docker configurations for index job deletion.
    • Enhanced the panic recovery mechanism with detailed logging.
    • Added new test cases for the singleflight implementation to improve coverage.
    • Updated the expiredHook function to accept an additional parameter for enhanced functionality.
    • Updated the Rust toolchain to version 1.82.0.
    • Updated various dependency versions to ensure compatibility and access to the latest features.
  • Bug Fixes

    • Updated error handling in various components to improve clarity and accuracy.
    • Corrected log messages in the saveIndex method to accurately reflect operations.
    • Adjusted expected outputs in the usearch tests to align with updated functionality.
  • Documentation

    • Updated version numbers across multiple dependencies and templates for consistency.
  • Chores

    • General maintenance updates to dependency versions for improved stability and performance.

Copy link

cloudflare-workers-and-pages bot commented Nov 11, 2024

Deploying vald with  Cloudflare Pages  Cloudflare Pages

Latest commit: 98ca44a
Status: ✅  Deploy successful!
Preview URL: https://9c8b6614.vald.pages.dev
Branch Preview URL: https://backport-release-v1-7-bugfix-csxi.vald.pages.dev

View logs

Copy link
Contributor

coderabbitai bot commented Nov 11, 2024

📝 Walkthrough

Walkthrough

This pull request includes significant modifications to the project structure, focusing on the addition and renaming of files, particularly in the Dockerfile configurations and GitHub workflows. New files for Docker configurations related to deletion jobs and updates to existing workflow files have been introduced. Additionally, several version updates have been made across various dependencies, including Rust, Go, and Kubernetes versions. The changes also include enhancements to error handling and function signatures in the codebase.

Changes

File Path Change Summary
.github/workflows/dockers-example-client-image.yaml New file added.
.github/workflows/dockers-index-deletion.yaml New file added.
cmd/index/job/deletion/main.go New file added.
cmd/index/job/deletion/sample.yaml New file added.
dockers/index/job/deletion/Dockerfile New file added; modified to copy a configuration file.
.github/ISSUE_TEMPLATE/bug_report.md Updated version numbers for various components.
.github/ISSUE_TEMPLATE/security_issue_report.md Updated version numbers for various components.
.github/PULL_REQUEST_TEMPLATE.md Updated version numbers for various components.
dockers/agent/core/agent/Dockerfile Added newline at the end of the file.
dockers/agent/core/faiss/Dockerfile Added newline at the end of the file.
dockers/agent/core/ngt/Dockerfile Added newline at the end of the file.
dockers/agent/sidecar/Dockerfile Added newline at the end of the file.
dockers/binfmt/Dockerfile Added newline at the end of the file.
dockers/buildbase/Dockerfile Added newline at the end of the file.
dockers/buildkit/Dockerfile Added newline at the end of the file.
dockers/buildkit/syft/scanner/Dockerfile Added newline at the end of the file.
dockers/ci/base/Dockerfile Added newline at the end of the file.
dockers/dev/Dockerfile Expanded installation process and environment setup.
dockers/discoverer/k8s/Dockerfile Added newline at the end of the file.
dockers/example/client/Dockerfile Modified ENTRYPOINT instruction.
dockers/gateway/filter/Dockerfile Updated ENTRYPOINT instruction and added newline.
dockers/gateway/lb/Dockerfile Added newline at the end of the file.
dockers/gateway/mirror/Dockerfile Adjusted ENTRYPOINT instruction and added newline.
dockers/index/job/correction/Dockerfile Added newline at the end of the file.
dockers/index/job/creation/Dockerfile Modified ENTRYPOINT instruction.
dockers/index/job/deletion/Dockerfile Added configuration file copy step.
dockers/index/job/readreplica/rotate/Dockerfile Added newline at the end of the file.
dockers/index/job/save/Dockerfile Added newline at the end of the file.
dockers/index/operator/Dockerfile Added newline at the end of the file.
dockers/manager/index/Dockerfile Modified ENTRYPOINT instruction.
dockers/operator/helm/Dockerfile Adjusted ENTRYPOINT instruction and build process.
dockers/tools/benchmark/job/Dockerfile Added newline at the end of the file.
dockers/tools/benchmark/operator/Dockerfile Added newline at the end of the file.
dockers/tools/cli/loadtest/Dockerfile Added newline at the end of the file.
example/client/go.mod Updated various dependency versions.
go.mod Updated various dependency versions.
hack/go.mod.default Updated various dependency versions.
internal/cache/cache.go Updated expiredHook function signature.
internal/cache/gache/gache.go Updated expiredHook function signature.
internal/cache/gache/gache_test.go Updated test functions to match new expiredHook signature.
internal/cache/gache/option.go Updated WithExpiredHook function signature.
internal/cache/gache/option_test.go Updated test cases to match new WithExpiredHook signature.
internal/cache/option.go Updated WithExpiredHook function signature.
internal/cache/option_test.go Updated test cases to match new WithExpiredHook signature.
internal/client/v1/client/agent/core/client.go Updated client instantiation logic in methods.
internal/core/algorithm/ngt/ngt.go Enhanced GetGraphStatistics method with context handling.
internal/core/algorithm/ngt/ngt_test.go Updated tests for ngt structure.
internal/core/algorithm/usearch/usearch_test.go Removed a test case and updated expected outputs.
internal/errors/errors.go Refactored Is function and added new helper function.
internal/net/dialer.go Updated cacheExpireHook method signature.
internal/net/dialer_test.go Updated tests for cacheExpireHook method.
internal/safety/safety.go Enhanced error handling and logging in recoverFn function.
internal/sync/singleflight/singleflight.go Added ForgetUnshared method to manage keys in the group.
internal/sync/singleflight/singleflight_test.go Added tests for timeout and error handling in Do method.
pkg/agent/core/ngt/service/ngt.go Updated methods to accept context for better cancellation management.
pkg/manager/index/service/indexer.go Updated log messages for clarity.
rust/rust-toolchain Updated Rust toolchain version.
rust/rust-toolchain.toml Updated Rust toolchain version.
versions/BUF_VERSION Updated version number.
versions/GOLANGCILINT_VERSION Updated version number.
versions/GO_VERSION Updated version number.
versions/K3S_VERSION Updated version number.
versions/KIND_VERSION Updated version number.
versions/KUBECTL_VERSION Updated version number.
versions/KUBELINTER_VERSION Updated version number.
versions/NGT_VERSION Updated version number.
versions/OPERATOR_SDK_VERSION Updated version number.
versions/PROMETHEUS_STACK_VERSION Updated version number.
versions/PROTOBUF_VERSION Updated version number.
versions/RUST_VERSION Updated version number.
versions/TELEPRESENCE_VERSION Updated version number.
versions/USEARCH_VERSION Updated version number.
versions/actions/ACTIONS_SETUP_GO Updated version number.
versions/actions/ACTIONS_SETUP_NODE Updated version number.
versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG Updated version number.
versions/actions/GITHUB_CODEQL_ACTION_ANALYZE Updated version number.
versions/actions/GITHUB_CODEQL_ACTION_AUTOBUILD Updated version number.
versions/actions/GITHUB_CODEQL_ACTION_INIT Updated version number.
versions/actions/GITHUB_CODEQL_ACTION_UPLOAD_SARIF Updated version number.
versions/actions/GITHUB_ISSUE_METRICS Updated version number.
versions/actions/SOFTPROPS_ACTION_GH_RELEASE Updated version number.

Possibly related PRs

Suggested reviewers

  • vankichi
  • kmrmt
  • datelier

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

USER root:root
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [hadolint] <DL3002> reported by reviewdog 🐶
Last USER should not be root

@vdaas-ci
Copy link
Collaborator Author

[CHATOPS:HELP] ChatOps commands.

  • 🙆‍♀️ /approve - approve
  • 🍱 /format - format codes and add licenses
  • /gen-test - generate test codes
  • 🏷️ /label - add labels
  • 🔚 2️⃣ 🔚 /label actions/e2e-deploy - run E2E deploy & integration test

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Outside diff range and nitpick comments (20)
internal/cache/cache.go (1)

Line range hint 38-52: Consider adding error handling for expired values.

While the New function implementation looks correct, consider whether the expiredHook should handle errors during cleanup operations, especially since this change is part of fixing a double-free issue.

Consider updating the signature to:

-	expiredHook    func(context.Context, string, V)
+	expiredHook    func(context.Context, string, V) error

This would allow proper error propagation if cleanup operations fail.

internal/cache/gache/gache.go (1)

32-32: Consider adding documentation about hook execution timing.

The expiredHook field would benefit from documentation that clarifies when exactly the hook is executed in relation to the value's expiration and cleanup. This helps implementers understand the lifecycle and avoid potential race conditions.

Add a comment like:

+// expiredHook is called just before the value is removed from the cache.
+// The hook receives the key and value that are about to expire.
 expiredHook    func(context.Context, string, V)
internal/client/v1/client/agent/core/client.go (1)

Line range hint 39-77: Consider adding memory management documentation.

Given that this PR is related to fixing a double-free error, it would be helpful to document any memory management considerations in the client creation process, particularly in the New and NewAgentClient functions.

internal/cache/gache/option_test.go (1)

Line range hint 196-214: Consider adding test cases that utilize the value parameter.

While the current test case validates the basic functionality, consider adding test cases that actually use the value parameter to ensure proper handling of resources during cache expiration. This would help verify that the double-free issue is properly addressed.

Example test case:

 func() test {
-    fn := func(context.Context, string, any) {}
+    var captured any
+    fn := func(_ context.Context, _ string, v any) {
+        captured = v
+    }
     return test{
         name: "set success when f is not nil",
         args: args{
             f: fn,
         },
         want: want{
             want: &T{
                 expiredHook: fn,
             },
         },
         checkFunc: func(w want, g *T) error {
+            // Verify function pointer
             if reflect.ValueOf(w.want.expiredHook).Pointer() != reflect.ValueOf(g.expiredHook).Pointer() {
                 return errors.Errorf("got: \"%#v\",\n\t\t\t\twant: \"%#v\"", g, w)
             }
+            // Call the hook with a test value
+            g.expiredHook(context.Background(), "test-key", "test-value")
+            if captured != "test-value" {
+                return errors.Errorf("captured value = %v, want test-value", captured)
+            }
             return nil
         },
     }
 }()
internal/cache/gache/gache_test.go (1)

Missing test coverage for expired hook value parameter

The test file lacks verification of the expired value being correctly passed to the hook function. While the signature has been updated to func(context.Context, string, any), there are no test cases that:

  • Set up an expired hook that verifies the received value
  • Wait for cache expiration
  • Assert that the hook was called with the correct value

This gap in test coverage could hide potential issues with the double-free fix.

Suggested test additions:

  • Add test case(s) that verify the expired value parameter in the hook function
  • Include scenarios with different value types to ensure proper type handling
  • Test the hook's behavior during natural expiration and forced deletion
🔗 Analysis chain

Line range hint 79-91: LGTM: Signature update aligns with double-free fix.

The expiredHook signature update to include the cached value (any) as the third parameter is consistent with the PR's objective of fixing the double-free error in GetGraphStatistics API. This change allows proper cleanup by providing the actual value being expired.

Let's verify the test coverage for the new parameter:


Let me gather more information about the test coverage by checking the entire test file for expired hook related tests.


Let me check if there are any test cases that specifically verify the expired value is correctly passed to the hook by looking for test cases with expiration.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if there are test cases that verify the expired value is correctly passed to the hook
rg -U "expiredHook.*func.*any.*\{[\s\S]*?\}" internal/cache/gache/gache_test.go

Length of output: 137


Script:

#!/bin/bash
# Get more context around expired hook tests
rg -U "expiredHook.*|expired.*hook" -B 3 -A 10 internal/cache/gache/gache_test.go

Length of output: 6613


Script:

#!/bin/bash
# Search for test cases involving expiration and hooks
rg -U "expire.*test|test.*expire" -B 5 -A 15 internal/cache/gache/gache_test.go

Length of output: 3166

internal/sync/singleflight/singleflight_test.go (1)

661-689: Enhance error testing robustness

While the test covers the basic concurrent error handling, it could be improved for better reliability and coverage:

Consider these improvements:

 func TestDoMultipleErrors(t *testing.T) {
     g := New[string]()
     var calls int32
-    someErr := errors.New("Some error")
+    someErr := errors.New("test error: concurrent error handling")
 
     const n = 10
     var wg sync.WaitGroup
+    results := make(chan error, n)
     for i := 0; i < n; i++ {
         wg.Add(1)
         go func() {
             defer wg.Done()
             v, shared, err := g.Do(context.Background(), "key", func(context.Context) (string, error) {
                 atomic.AddInt32(&calls, 1)
-                time.Sleep(10 * time.Millisecond)
+                // Simulate work without relying on sleep
+                runtime.Gosched()
                 return "", someErr
             })
-            if err != someErr {
+            if !errors.Is(err, someErr) {
                 t.Errorf("Do error = %v; want %v", err, someErr)
             }
             if v != "" {
                 t.Errorf("Do = %v; want empty string", v)
             }
+            if !shared {
+                t.Error("Expected error to be shared between goroutines")
+            }
+            results <- err
         }()
     }
     wg.Wait()
+    close(results)
+
+    // Verify all goroutines received the same error
+    var firstErr error
+    for err := range results {
+        if firstErr == nil {
+            firstErr = err
+            continue
+        }
+        if err != firstErr {
+            t.Error("Expected all goroutines to receive the same error instance")
+        }
+    }
+
     if got := atomic.LoadInt32(&calls); got != 1 {
-        t.Errorf("number of calls = %d; want 1", got)
+        t.Errorf("Expected function to be called exactly once, got %d calls", got)
     }
 }

The improvements include:

  1. More descriptive error message
  2. Proper error comparison using errors.Is
  3. Verification of error sharing between goroutines
  4. Removed sleep in favor of runtime.Gosched()
  5. Better test failure messages
go.mod (1)

Line range hint 355-355: Resolve placeholder versions in dependencies.

Several dependencies are using placeholder versions (00010101000000-000000000000). This could lead to reproducibility issues. Consider pinning these to specific versions:

  • github.com/akrylysov/pogreb
  • github.com/go-redis/redis/v8
  • github.com/gocraft/dbr/v2
  • github.com/gorilla/mux
  • github.com/grafana/pyroscope-go/godeltaprof

Also applies to: 356-356, 357-357, 358-358, 359-359, 360-360, 361-361, 362-362, 363-363, 364-364, 365-365

internal/core/algorithm/ngt/ngt_test.go (2)

175-178: Consider adding test cases for concurrent access.

While the synchronization primitives are properly initialized, there are no test cases that verify the thread-safety of the new fields under concurrent access.

Consider adding test cases like:

func TestNGT_ConcurrentGraphStatistics(t *testing.T) {
    n, err := New(/* options */)
    if err != nil {
        t.Fatal(err)
    }
    defer n.Close()

    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            ctx := context.Background()
            _, err := n.GetGraphStatistics(ctx)
            if err != nil {
                t.Error(err)
            }
        }()
    }
    wg.Wait()
}

337-340: Consistent initialization of synchronization fields across test cases.

The new fields are consistently initialized across all test cases, which is good. However, consider adding assertions to verify that these fields are properly cleaned up in the Close() method.

Add cleanup verification in the test helper:

func verifyCleanup(t *testing.T, n *ngt) {
    // Verify mutex cleanup
    if n.smu != nil {
        t.Error("smu not properly cleaned up")
    }
    // Verify group cleanup
    if n.group != nil {
        t.Error("group not properly cleaned up")
    }
}

Also applies to: 407-410, 477-480, 547-550, 747-750, 798-801

internal/errors/errors.go (3)

Line range hint 168-172: Avoid direct comparison of error messages

Comparing errors using err.Error() == target.Error() and strings.EqualFold(err.Error(), target.Error()) can be fragile since error messages might change and are not unique identifiers. Relying on error message strings for error comparison may lead to incorrect behavior if the error messages are modified.

Consider using more robust methods for error comparison:

  • Use sentinel errors (predefined exported variables) for direct comparison.
  • Implement the Is method for custom error types to enable proper comparison without relying on error messages.
  • Utilize type assertions or error wrapping to compare error types.

191-193: Clarify logical expressions with parentheses

The return statement contains a complex logical expression combining && and ||. Adding parentheses will improve readability and ensure the correct evaluation order, preventing potential misunderstandings due to operator precedence.

Apply this diff to clarify the logical expression:

-return targetComparable && err == target ||
-	err.Error() == target.Error() ||
-	strings.EqualFold(err.Error(), target.Error())
+return (targetComparable && err == target) ||
+	err.Error() == target.Error() ||
+	strings.EqualFold(err.Error(), target.Error())

195-197: Clarify logical expressions with parentheses

Similarly, the return statement here would benefit from added parentheses for better readability and to prevent logical errors.

Apply this diff to improve clarity:

-return targetComparable && err == target ||
-	err.Error() == target.Error() ||
-	strings.EqualFold(err.Error(), target.Error())
+return (targetComparable && err == target) ||
+	err.Error() == target.Error() ||
+	strings.EqualFold(err.Error(), target.Error())
internal/core/algorithm/ngt/ngt.go (3)

Line range hint 1099-1135: Use safer practices when handling C arrays to prevent potential overflows

In fromCGraphStatistics, the use of fixed-size arrays like [1 << 30] when converting C arrays to Go slices could be unsafe and might lead to undefined behavior if the array bounds are exceeded. Consider using unsafe.Slice (available from Go 1.17) to safely handle C arrays.

Apply the following changes to safely handle the C arrays:

 func fromCGraphStatistics(cstats *C.NGTGraphStatistics) *GraphStatistics {
 	if cstats == nil {
 		return nil
 	}
 	goStats := &GraphStatistics{
 		// Field assignments...
 	}

 	// Convert indegreeCount
 	indegreeCountSize := int(cstats.indegreeCountSize)
-	goStats.IndegreeCount = make([]int64, indegreeCountSize)
-	cIndegreeCount := (*[1 << 30]C.size_t)(unsafe.Pointer(cstats.indegreeCount))[:indegreeCountSize:indegreeCountSize]
-	for i := 0; i < indegreeCountSize; i++ {
-		goStats.IndegreeCount[i] = int64(cIndegreeCount[i])
-	}
+	cIndegreeCount := unsafe.Slice(cstats.indegreeCount, indegreeCountSize)
+	goStats.IndegreeCount = make([]int64, indegreeCountSize)
+	for i := 0; i < indegreeCountSize; i++ {
+		goStats.IndegreeCount[i] = int64(cIndegreeCount[i])
+	}

 	// Similar changes for outdegreeHistogram and indegreeHistogram...

 	return goStats
 }

This refactor ensures that array bounds are respected, preventing potential overflows and making the code safer.


486-487: Initialize smu and group fields safely in the constructor

The initialization of n.smu and n.group should be thread-safe and ideally placed in the constructor to ensure they are initialized before use. Ensure that these fields are properly initialized before any method that uses them is called.

Consider adjusting the initialization order or adding comments to confirm that all usages of n.smu and n.group are safe.


119-122: Clarify the necessity of both mu and smu mutexes

The struct ngt now includes both mu *sync.RWMutex and smu *sync.Mutex. Review whether both mutexes are necessary or if they can be consolidated to reduce complexity and potential contention.

Consider whether smu can be replaced with mu if they protect similar critical sections, or document clearly why both are required to help maintainers understand the locking strategy.

pkg/agent/core/ngt/service/ngt.go (1)

1456-1470: Simplify loadStatistics by removing unnecessary goroutine

The use of a goroutine in loadStatistics might be unnecessary since the function waits for it to complete immediately. Consider simplifying the code by removing the goroutine unless asynchronous execution is required.

Here's the refactored code:

 func (n *ngt) loadStatistics(ctx context.Context) (err error) {
     if n.IsStatisticsEnabled() {
         log.Info("loading index statistics to cache")
-        var stats *core.GraphStatistics
-        done := make(chan struct{})
-        n.eg.Go(safety.RecoverFunc(func() error {
-            defer close(done)
-            stats, err = n.core.GetGraphStatistics(ctx, core.AdditionalStatistics)
-            return nil
-        }))
-        select {
-        case <-ctx.Done():
-            return ctx.Err()
-        case <-done:
-        }
+        stats, err := n.core.GetGraphStatistics(ctx, core.AdditionalStatistics)
+        if err != nil {
+            log.Errorf("failed to load index statistics to cache: %v", err)
+            return err
+        }
         n.statisticsCache.Store(&payload.Info_Index_Statistics{
             Valid:                            stats.Valid,
             MedianIndegree:                   stats.MedianIndegree,
             // ... (remaining fields)
         })
     }
     return nil
 }
.gitfiles (4)

92-92: Ensure Consistency in Workflow Naming Convention

The new workflow file .github/workflows/dockers-example-client-image.yaml has been added. Please verify that the naming convention aligns with existing workflow files for consistency (e.g., using hyphens vs. underscores).


543-543: Optimize the New Dockerfile for the Example Client

The addition of dockers/example/client/Dockerfile is noted. Please ensure it follows Docker best practices to optimize image size and build performance, such as leveraging caching layers and minimizing the number of layers.


846-846: Consistency in Configuration File Naming

The file internal/config/index_deleter.go has been added. Consider renaming it to index_deletion.go to maintain consistency with the naming of other components in the index job directory.


1464-1465: Ensure Kubernetes Configurations Follow Best Practices

New Kubernetes configuration files k8s/index/job/deletion/configmap.yaml and cronjob.yaml have been introduced. Please review these files to ensure they adhere to Kubernetes best practices, such as specifying resource limits and proper labeling for easier management.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 63f0478 and 98ca44a.

⛔ Files ignored due to path filters (3)
  • example/client/go.sum is excluded by !**/*.sum
  • go.sum is excluded by !**/*.sum
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (77)
  • .gitfiles (9 hunks)
  • .github/ISSUE_TEMPLATE/bug_report.md (1 hunks)
  • .github/ISSUE_TEMPLATE/security_issue_report.md (1 hunks)
  • .github/PULL_REQUEST_TEMPLATE.md (1 hunks)
  • dockers/agent/core/agent/Dockerfile (1 hunks)
  • dockers/agent/core/faiss/Dockerfile (1 hunks)
  • dockers/agent/core/ngt/Dockerfile (1 hunks)
  • dockers/agent/sidecar/Dockerfile (1 hunks)
  • dockers/binfmt/Dockerfile (1 hunks)
  • dockers/buildbase/Dockerfile (1 hunks)
  • dockers/buildkit/Dockerfile (1 hunks)
  • dockers/buildkit/syft/scanner/Dockerfile (1 hunks)
  • dockers/ci/base/Dockerfile (1 hunks)
  • dockers/dev/Dockerfile (1 hunks)
  • dockers/discoverer/k8s/Dockerfile (1 hunks)
  • dockers/example/client/Dockerfile (1 hunks)
  • dockers/gateway/filter/Dockerfile (1 hunks)
  • dockers/gateway/lb/Dockerfile (1 hunks)
  • dockers/gateway/mirror/Dockerfile (1 hunks)
  • dockers/index/job/correction/Dockerfile (1 hunks)
  • dockers/index/job/creation/Dockerfile (1 hunks)
  • dockers/index/job/deletion/Dockerfile (1 hunks)
  • dockers/index/job/readreplica/rotate/Dockerfile (1 hunks)
  • dockers/index/job/save/Dockerfile (1 hunks)
  • dockers/index/operator/Dockerfile (1 hunks)
  • dockers/manager/index/Dockerfile (1 hunks)
  • dockers/operator/helm/Dockerfile (1 hunks)
  • dockers/tools/benchmark/job/Dockerfile (1 hunks)
  • dockers/tools/benchmark/operator/Dockerfile (1 hunks)
  • dockers/tools/cli/loadtest/Dockerfile (1 hunks)
  • example/client/go.mod (2 hunks)
  • go.mod (19 hunks)
  • hack/go.mod.default (1 hunks)
  • internal/cache/cache.go (1 hunks)
  • internal/cache/gache/gache.go (1 hunks)
  • internal/cache/gache/gache_test.go (6 hunks)
  • internal/cache/gache/option.go (1 hunks)
  • internal/cache/gache/option_test.go (2 hunks)
  • internal/cache/option.go (1 hunks)
  • internal/cache/option_test.go (3 hunks)
  • internal/client/v1/client/agent/core/client.go (3 hunks)
  • internal/core/algorithm/ngt/ngt.go (12 hunks)
  • internal/core/algorithm/ngt/ngt_test.go (16 hunks)
  • internal/core/algorithm/usearch/usearch_test.go (1 hunks)
  • internal/errors/errors.go (2 hunks)
  • internal/net/dialer.go (1 hunks)
  • internal/net/dialer_test.go (2 hunks)
  • internal/safety/safety.go (2 hunks)
  • internal/sync/singleflight/singleflight.go (2 hunks)
  • internal/sync/singleflight/singleflight_test.go (2 hunks)
  • pkg/agent/core/ngt/service/ngt.go (2 hunks)
  • pkg/manager/index/service/indexer.go (1 hunks)
  • rust/rust-toolchain (1 hunks)
  • rust/rust-toolchain.toml (1 hunks)
  • versions/BUF_VERSION (1 hunks)
  • versions/GOLANGCILINT_VERSION (1 hunks)
  • versions/GO_VERSION (1 hunks)
  • versions/K3S_VERSION (1 hunks)
  • versions/KIND_VERSION (1 hunks)
  • versions/KUBECTL_VERSION (1 hunks)
  • versions/KUBELINTER_VERSION (1 hunks)
  • versions/NGT_VERSION (1 hunks)
  • versions/OPERATOR_SDK_VERSION (1 hunks)
  • versions/PROMETHEUS_STACK_VERSION (1 hunks)
  • versions/PROTOBUF_VERSION (1 hunks)
  • versions/RUST_VERSION (1 hunks)
  • versions/TELEPRESENCE_VERSION (1 hunks)
  • versions/USEARCH_VERSION (1 hunks)
  • versions/actions/ACTIONS_SETUP_GO (1 hunks)
  • versions/actions/ACTIONS_SETUP_NODE (1 hunks)
  • versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG (1 hunks)
  • versions/actions/GITHUB_CODEQL_ACTION_ANALYZE (1 hunks)
  • versions/actions/GITHUB_CODEQL_ACTION_AUTOBUILD (1 hunks)
  • versions/actions/GITHUB_CODEQL_ACTION_INIT (1 hunks)
  • versions/actions/GITHUB_CODEQL_ACTION_UPLOAD_SARIF (1 hunks)
  • versions/actions/GITHUB_ISSUE_METRICS (1 hunks)
  • versions/actions/SOFTPROPS_ACTION_GH_RELEASE (1 hunks)
✅ Files skipped from review due to trivial changes (53)
  • .github/ISSUE_TEMPLATE/bug_report.md
  • .github/ISSUE_TEMPLATE/security_issue_report.md
  • .github/PULL_REQUEST_TEMPLATE.md
  • dockers/agent/core/agent/Dockerfile
  • dockers/agent/core/faiss/Dockerfile
  • dockers/agent/core/ngt/Dockerfile
  • dockers/agent/sidecar/Dockerfile
  • dockers/binfmt/Dockerfile
  • dockers/buildbase/Dockerfile
  • dockers/buildkit/Dockerfile
  • dockers/buildkit/syft/scanner/Dockerfile
  • dockers/ci/base/Dockerfile
  • dockers/discoverer/k8s/Dockerfile
  • dockers/example/client/Dockerfile
  • dockers/gateway/filter/Dockerfile
  • dockers/gateway/lb/Dockerfile
  • dockers/gateway/mirror/Dockerfile
  • dockers/index/job/correction/Dockerfile
  • dockers/index/job/creation/Dockerfile
  • dockers/index/job/readreplica/rotate/Dockerfile
  • dockers/index/job/save/Dockerfile
  • dockers/index/operator/Dockerfile
  • dockers/manager/index/Dockerfile
  • dockers/operator/helm/Dockerfile
  • dockers/tools/benchmark/job/Dockerfile
  • dockers/tools/benchmark/operator/Dockerfile
  • dockers/tools/cli/loadtest/Dockerfile
  • pkg/manager/index/service/indexer.go
  • rust/rust-toolchain
  • rust/rust-toolchain.toml
  • versions/BUF_VERSION
  • versions/GOLANGCILINT_VERSION
  • versions/GO_VERSION
  • versions/K3S_VERSION
  • versions/KIND_VERSION
  • versions/KUBECTL_VERSION
  • versions/KUBELINTER_VERSION
  • versions/NGT_VERSION
  • versions/OPERATOR_SDK_VERSION
  • versions/PROMETHEUS_STACK_VERSION
  • versions/PROTOBUF_VERSION
  • versions/RUST_VERSION
  • versions/TELEPRESENCE_VERSION
  • versions/USEARCH_VERSION
  • versions/actions/ACTIONS_SETUP_GO
  • versions/actions/ACTIONS_SETUP_NODE
  • versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
  • versions/actions/GITHUB_CODEQL_ACTION_ANALYZE
  • versions/actions/GITHUB_CODEQL_ACTION_AUTOBUILD
  • versions/actions/GITHUB_CODEQL_ACTION_INIT
  • versions/actions/GITHUB_CODEQL_ACTION_UPLOAD_SARIF
  • versions/actions/GITHUB_ISSUE_METRICS
  • versions/actions/SOFTPROPS_ACTION_GH_RELEASE
🔇 Additional comments (43)
internal/cache/cache.go (1)

33-33: Approve the expiredHook signature change.

The modification to include the value V in the expiredHook signature is a good improvement as it provides access to the expired value during cleanup, which can help prevent memory-related issues like double-free.

Let's verify that this signature change is consistently implemented across the codebase:

✅ Verification successful

Let me verify the usage of expiredHook to ensure it's being called correctly:


Let me check one more thing to see how the hook is actually used in the underlying gache implementation:


The expiredHook signature change is consistently implemented

The verification shows that:

  • No old signatures (without value parameter) remain in the codebase
  • The new signature func(context.Context, string, V) is consistently used in both cache.go and gache.go
  • The hook is properly initialized and enabled in the gache implementation
  • Test files correctly use the new signature with any type
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the expiredHook signature is consistently updated across all related files

# Check for any remaining old signatures
echo "Checking for old expiredHook signatures..."
rg "expiredHook.*func\(context\.Context,\s*string\)" --type go

# Check the new signature implementation
echo "Checking new expiredHook signatures..."
rg "expiredHook.*func\(context\.Context,\s*string,.*\)" --type go

# Check for any direct usage of expiredHook
echo "Checking direct usage of expiredHook..."
ast-grep --pattern 'expiredHook($$$)'

Length of output: 1121


Script:

#!/bin/bash
# Check how expiredHook is called in the codebase
ast-grep --pattern 'if c.expiredHook != nil {
  $$$
}'

# Also check for any direct calls
ast-grep --pattern 'c.expiredHook($$$)'

Length of output: 591


Script:

#!/bin/bash
# Check the gache implementation for expired hook usage
rg "SetExpiredHook|EnableExpiredHook" -A 5 -B 5 internal/cache/gache/gache.go

# Also check the test cases to see the hook being triggered
rg "SetExpiredHook|EnableExpiredHook" -A 5 -B 5 internal/cache/gache/gache_test.go

Length of output: 738

internal/cache/gache/option.go (1)

Line range hint 45-51: Breaking change: Expired hook signature updated.

The modification to include the expired value in the hook callback is a good improvement that provides more context during expiration. However, this is a breaking change that requires updates to all existing expired hook implementations.

Let's verify the impact and implementation across the codebase:

✅ Verification successful

Let me gather more information about the implementation of the expired hook function to verify if all usages have been properly updated.


Breaking change has been properly implemented across the codebase

All usages of WithExpiredHook have been correctly updated to handle the value parameter:

  • The main implementation in internal/net/dialer.go has been updated with the correct signature func(ctx context.Context, addr string, _ *dialerCache)
  • Test implementations in gache_test.go use the correct signature with the value parameter
  • The cache struct definition in gache.go correctly defines expiredHook as func(context.Context, string, V)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for existing WithExpiredHook usage and implementation
# Expected: All implementations should be updated to handle the value parameter

# Find all direct usages of WithExpiredHook
echo "Checking WithExpiredHook usage:"
rg -A 3 "WithExpiredHook\["

# Find potential expired hook implementations
echo -e "\nChecking potential hook implementations:"
rg -A 5 "func.*context\.Context.*string.*\).*{" --type go

# Check test files for hook usage
echo -e "\nChecking test implementations:"
rg -A 3 "WithExpiredHook" --type go --glob "*_test.go"

Length of output: 3756


Script:

#!/bin/bash
# Check the implementation of cacheExpireHook in dialer.go
echo "Checking dialer.go cacheExpireHook implementation:"
rg -B 2 -A 5 "cacheExpireHook" internal/net/dialer.go

# Check the expiredHook implementation in test files
echo -e "\nChecking expiredHook test implementation:"
rg -B 2 -A 5 "expiredHook.*=.*func" internal/cache/gache/gache_test.go

# Check the cache struct definition to verify hook type
echo -e "\nChecking cache struct definition:"
rg -A 5 "type\s+cache\[" internal/cache/gache/gache.go

Length of output: 1471

example/client/go.mod (2)

32-41: Review indirect dependency updates

Several indirect dependencies have been updated. While these are typically handled automatically by Go modules, in a backport scenario we should be cautious about introducing new versions.

Let's verify if any of these indirect dependencies are critical for the fix:

#!/bin/bash
# Description: Check for any breaking changes in the updated indirect dependencies
# that might affect the GetGraphStatistics functionality

# Search for GetGraphStatistics usage with these packages
rg -l "GetGraphStatistics" --type go | xargs rg -l "google.golang.org/genproto|google.golang.org/grpc|golang.org/x/sys"

10-17: Verify the necessity of dependency updates for backport

The extensive updates to multiple dependencies seem broader than typically needed for a backport fix addressing a specific double-free error:

  • Multiple security-related packages (crypto, net, text)
  • gRPC and protobuf-related packages

While keeping dependencies up-to-date is good practice, extensive updates in a backport could introduce unnecessary risks.

Let's verify if these updates are related to the double-free fix:

✅ Verification successful

Dependency updates are necessary for this backport

The git history shows these dependency updates are part of PR #2733 which fixes the GetGraphStatistics API double-free error. The changes include:

  • Updating vald-client-go from v1.7.13 to v1.7.14
  • Updating gRPC and related genproto packages
  • Updating core Go packages (crypto, net, text) which are indirect dependencies

The extensive usage of gRPC and genproto packages throughout the codebase (50+ files) indicates these updates are integral to the fix, as they affect the core communication layer where the double-free issue likely originates.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if the original PR (#2733) modified go.mod with similar changes
# and look for any GetGraphStatistics API related changes

# Check the original PR
gh pr view 2733 --json files -q '.files[].path' | grep -E 'go\.mod|GetGraphStatistics'

# Look for GetGraphStatistics usage in the codebase
rg -l "GetGraphStatistics" 

Length of output: 213


Script:

#!/bin/bash
# Let's try a different approach to verify the dependency changes

# 1. Check the current go.mod requirements
echo "=== Current go.mod requirements ==="
cat example/client/go.mod

# 2. Look for any usage of the updated packages in the codebase
echo -e "\n=== Usage of updated packages ==="
rg -l "google.golang.org/grpc" --type go
rg -l "golang.org/x/crypto" --type go
rg -l "golang.org/x/net" --type go
rg -l "golang.org/x/text" --type go
rg -l "google.golang.org/genproto" --type go
rg -l "github.com/pkg/sftp" --type go

# 3. Check git history for recent changes in go.mod
echo -e "\n=== Recent go.mod changes ==="
git log -p example/client/go.mod | head -n 50

Length of output: 8036

internal/cache/option.go (2)

Line range hint 39-45: Verify fix for GetGraphStatistics API double-free issue

While providing access to the cached value during expiration is a step towards preventing double-free issues, please clarify:

  1. How is this hook used in the GetGraphStatistics API?
  2. Are there complementary changes in other files that complete the fix?

Let's check for related changes:

#!/bin/bash
# Look for GetGraphStatistics implementation
ast-grep --pattern 'func $_($$) GetGraphStatistics($$$) {
  $$$
}'

# Find cache usage in related files
rg -A 5 "GetGraphStatistics.*cache"

Line range hint 39-45: Breaking Change: WithExpiredHook signature updated to include cached value

The function now provides access to the cached value during expiration, which could help prevent double-free issues by allowing proper cleanup in the expiration hook. However, this is a breaking change that requires updates to all existing hook implementations.

Let's verify the impact of this change:

Consider adding migration guidance in the PR description to help users update their hook implementations. Example migration:

- func myHook(ctx context.Context, key string) {
-   // old implementation
- }
+ func myHook[V any](ctx context.Context, key string, value V) {
+   // new implementation with access to the expired value
+ }
✅ Verification successful

Let me gather more information about the hook implementations to understand the full impact.


Breaking change impact is minimal and already handled

The signature change in WithExpiredHook has limited impact and all existing implementations have been properly updated:

  • The only production usage is in internal/net/dialer.go which already implements the new signature with cacheExpireHook(ctx context.Context, addr string, _ *dialerCache)
  • Test implementations in internal/cache/gache/gache_test.go are also updated to match the new signature
  • No other usages of WithExpiredHook were found in the codebase

The change appears to be part of a coordinated update where all callers were modified together with the interface change.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Find all usages of WithExpiredHook to identify affected code
rg -A 2 "WithExpiredHook\[.*\]" 

# Find potential hook function declarations that need updating
ast-grep --pattern 'func $_(context.Context, string)' 

Length of output: 1373


Script:

#!/bin/bash
# Find the implementation of cacheExpireHook in dialer.go
rg -A 5 "cacheExpireHook" internal/net/dialer.go

# Find all hook function implementations in test files
rg -A 5 "expiredHook.*=.*func" internal/cache/

Length of output: 964

internal/safety/safety.go (3)

22-22: LGTM: Import addition is appropriate.

The addition of runtime/debug import is necessary for the new stack trace functionality and follows Go conventions.


40-42: LGTM: Essential nil check addition.

The addition of the nil check is a critical safety improvement that prevents potential panic from nil pointer dereference. The early return with nil is appropriate in this context.


45-46: Consider optimizing string concatenation for large stack traces.

While the addition of stack traces greatly improves debugging capabilities, consider optimizing the string concatenation to handle large stack traces more efficiently. Consider using strings.Builder or bytes.Buffer for better memory efficiency.

Here's an optimized version:

-				ds := debug.Stack()
-				infoStr := info.Get().String() + "\r\n" + conv.Btoa(ds)
+				var sb strings.Builder
+				sb.WriteString(info.Get().String())
+				sb.WriteString("\r\n")
+				sb.WriteString(conv.Btoa(debug.Stack()))
+				infoStr := sb.String()

Let's check if similar string concatenation patterns exist elsewhere in the codebase:

internal/cache/gache/gache.go (1)

32-32: LGTM! This change helps prevent double-free issues.

The addition of the value parameter V to the expiredHook signature ensures that the expired value is safely accessible within the hook, preventing potential access to freed memory. This change aligns well with the PR's objective of resolving the double-free error in the GetGraphStatistics API.

dockers/index/job/deletion/Dockerfile (2)

85-85: Verify the implications of using sample.yaml as default config.

Adding a default config file could potentially:

  1. Override custom configurations if the path is hardcoded
  2. Cause issues if the sample config contains environment-specific settings

Let's verify these concerns:

✅ Verification successful

Using sample.yaml as default config is a standard practice in this codebase

The verification reveals:

  1. The /etc/server/config.yaml path is a standardized configuration location used across all services in the codebase, as evidenced by its usage in multiple Dockerfiles and the default path in internal/params/option.go
  2. The sample config contains only non-sensitive, template configuration data with proper defaults
  3. The configuration is auto-generated and follows a consistent pattern across all services, as shown in hack/docker/gen/main.go
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if config path is hardcoded in code
rg -A 3 "/etc/server/config.yaml"

# Check sample config content for sensitive data
cat cmd/index/job/deletion/sample.yaml || echo "Sample config not found in current branch"

# Check if other services use the same config path
fd -t f "Dockerfile" -x grep -l "/etc/server/config.yaml" {}

Length of output: 12477


85-85: Clarify the relationship with double-free fix.

According to the PR objectives, this change is part of a backport to fix a double-free error in GetGraphStatistics API. However, it's unclear how adding a default configuration file helps resolve memory management issues.

Let's check the original PR for context:

dockers/dev/Dockerfile (2)

148-148: ⚠️ Potential issue

Remove duplicate USER directive.

This USER directive is redundant as it's already set to root:root at the beginning of the Dockerfile (line 25). Additionally, running containers as root is generally discouraged from a security perspective.

Let's verify if this is a duplicate directive:

#!/bin/bash
# Check for existing USER directives in the Dockerfile
rg "^USER root:root" dockers/dev/Dockerfile

Consider either:

  1. Removing this duplicate directive:
-USER root:root
  1. Or if root access is not strictly required, switching to a non-root user for better security:
-USER root:root
+USER nonroot:nonroot

Line range hint 1-148: Verify relevance to double-free bug fix.

This Dockerfile change appears unrelated to the stated PR objective of fixing the GetGraphStatistics API double-free error. Please clarify if this modification is necessary for the backport.

Let's check if this file was modified in the original PR:

internal/client/v1/client/agent/core/client.go (4)

108-108: LGTM: CreateIndex client creation standardization.

The change to use NewAgentClient wrapper is consistent with the standardization pattern.


125-125: LGTM: SaveIndex client creation standardization.

The change to use NewAgentClient wrapper is consistent with the standardization pattern.


142-142: LGTM: CreateAndSaveIndex client creation standardization.

The change to use NewAgentClient wrapper is consistent with the standardization pattern.


108-108: Verify the relationship between client creation changes and double-free fix.

The changes standardize client creation by using NewAgentClient wrapper instead of direct agent.NewAgentClient. While this is a good practice for consistency, it's not immediately clear how these changes relate to resolving the double-free error in GetGraphStatistics API mentioned in the PR objective.

Let's verify the connection:

Also applies to: 125-125, 142-142

internal/sync/singleflight/singleflight.go (2)

107-109: LGTM: Early map initialization improves thread safety.

The change to initialize the map during group creation rather than lazily is a good improvement that eliminates potential race conditions and simplifies the code.

Let's verify that all map accesses are properly protected by mutex:

✅ Verification successful

After analyzing the file content, I can now provide the final response:

Early map initialization is safe and properly synchronized

The code review confirms that all map accesses are properly protected by mutex locks:

  • All map operations are guarded by g.mu.Lock() and g.mu.Unlock()
  • Key operations are protected:
    • Read access in Do and DoChan methods (lines 121, 150)
    • Write access in Do and DoChan methods (lines 134, 158)
    • Delete operations in Forget and ForgetUnshared methods
  • The early initialization eliminates the need for lazy initialization checks while maintaining thread safety
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that all map accesses are protected by mutex locks
# Test: Search for map accesses and ensure they're within mutex-protected blocks

ast-grep --pattern 'g.m[$_]' | while read -r line; do
  # Get 5 lines before each map access to check for mutex lock
  echo "$line" | xargs -I {} rg -B 5 "{}"
done

Length of output: 1391


Script:

#!/bin/bash
# Let's try a different approach to verify mutex protection
# First, get the content of the file to analyze map access patterns

cat internal/sync/singleflight/singleflight.go

Length of output: 7804


248-266: Consider potential race condition in ForgetUnshared.

While the implementation is generally sound, there's a subtle race condition to consider. Even though the mutex protects the map access, the dups field check and map deletion aren't atomic. A new duplicate could arrive between checking dups == 0 and deleting the entry.

Consider the following sequence:

  1. T1: ForgetUnshared checks dups == 0
  2. T2: New caller arrives, increments dups
  3. T1: Deletes the entry despite it now being shared

Let's verify the usage pattern to assess the impact:

Consider these options to address the race:

  1. Use atomic operations for the dups field
  2. Add a "marked for deletion" state to prevent new duplicates
  3. Document the race condition if it's acceptable for your use case
internal/cache/option_test.go (2)

33-33: LGTM: Function signature update aligns with the double-free fix.

The addition of the any parameter to the expired hook function signature provides more context during cache expiration, which is essential for proper resource cleanup.


Line range hint 54-96: LGTM: Test cases properly updated for new signature.

The test cases are well-structured and cover both nil and non-nil function scenarios. The simplified option application without explicit type parameter is a good improvement.

internal/cache/gache/option_test.go (1)

173-173: LGTM: Function signature enhancement improves memory management.

The addition of the cached value parameter to the expired hook callback is a good improvement. This change allows proper cleanup of resources when cache entries expire, which helps prevent memory leaks and double-free issues.

internal/net/dialer.go (1)

Line range hint 457-462: LGTM! Consider adding documentation for the ignored parameter.

The change to add an ignored cache parameter appears to fix the double-free issue by preventing access to potentially freed cache data. However, it would be helpful to add a comment explaining why the parameter is intentionally ignored to prevent future confusion.

Consider adding a comment like:

 func (d *dialer) cacheExpireHook(ctx context.Context, addr string, _ *dialerCache) {
+  // Note: The cache parameter is intentionally ignored as we refresh the DNS entry
+  // by performing a new lookup instead of accessing the expired cache data
   if err := safety.RecoverFunc(func() (err error) {

Let's check if there are similar patterns in the codebase that might need the same fix:

✅ Verification successful

Based on the search results, I can now generate the final response:

No similar patterns requiring the same fix were found

The search results show that the cacheExpireHook pattern is properly implemented across the codebase:

  • The cache package defines a generic WithExpiredHook[V] that expects a function with the correct signature func(context.Context, string, V)
  • The dialer's cacheExpireHook is the only concrete implementation, and it's already been fixed in this PR
  • All other occurrences are either tests or the hook infrastructure itself

The original review comment's suggestion to add documentation for the ignored parameter is still valid.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other expire hook implementations that might have similar issues
rg -g '!vendor/*' -g '*.go' 'ExpiredHook|ExpireHook' -A 5

Length of output: 5270

internal/sync/singleflight/singleflight_test.go (1)

60-62: LGTM: Proper initialization of the group struct

The initialization follows Go's best practices for map initialization in a generic struct.

hack/go.mod.default (2)

332-341: LGTM! Consistent Kubernetes dependency updates.

The updates to k8s.io packages from v0.31.1 to v0.31.2 are patch version updates that maintain version consistency across all Kubernetes components. This is a good practice for ensuring compatibility and stability.


344-344: LGTM! Compatible controller-runtime and merge-diff updates.

The updates to controller-runtime (v0.19.1) and structured-merge-diff (v4.4.1) are compatible with the Kubernetes v0.31.x series. These changes align with the Kubernetes ecosystem's version compatibility matrix.

Let's verify the version compatibility:

Also applies to: 347-347

✅ Verification successful

Version compatibility verified between k8s.io v0.31.2 and controller-runtime v0.19.1

The verification confirms that:

  • All k8s.io dependencies are consistently set to v0.31.2 across go.mod and go.sum files
  • Controller-runtime is correctly set to v0.19.1 in both go.mod and go.sum
  • These versions are properly aligned in both the main go.mod and hack/go.mod.default files
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify version compatibility between k8s.io and controller-runtime
# Expected: All k8s.io dependencies should be at v0.31.2 and controller-runtime at v0.19.1

# Check k8s.io package versions
echo "Checking k8s.io package versions..."
rg "k8s.io/.+ v0.31.2"

# Check controller-runtime version
echo "Checking controller-runtime version..."
rg "controller-runtime v0.19.1"

Length of output: 2736

go.mod (3)

332-332: Kubernetes dependencies synchronized to v0.31.2.

All Kubernetes-related packages have been updated to v0.31.2. This is a synchronized update which is good practice. However, since this is a backport PR:

#!/bin/bash
# Description: Verify that v0.31.2 is compatible with the target release branch (v1.7)

# Check the Kubernetes version compatibility matrix in docs
rg -l "kubernetes.*compatibility|k8s.*compatibility" docs/

Also applies to: 333-333, 334-334, 335-335, 336-336, 414-414, 415-415, 417-417


297-297: Security-sensitive package updates included.

Several security-sensitive packages have been updated:

  • golang.org/x/crypto to v0.29.0
  • golang.org/x/net to v0.31.0
  • golang.org/x/oauth2 to v0.24.0

Please ensure these updates don't introduce any breaking changes while backporting.

#!/bin/bash
# Description: Check for any security-related changes or breaking changes in the updated packages

# Search for security-sensitive code that might be affected
rg -l "crypto|tls|ssh|x509|oauth" 

Also applies to: 400-400, 401-401


279-279: Verify OpenTelemetry updates impact on memory management.

All OpenTelemetry packages have been updated to v1.32.0. Since this PR aims to fix a double-free error, ensure that these updates don't introduce any memory management issues in the telemetry instrumentation.

Also applies to: 285-285, 286-286, 287-287, 288-288, 387-387, 391-391, 392-392, 393-393, 394-394

✅ Verification successful

The previous search wasn't specific enough. Let's focus on finding OpenTelemetry usage and any memory-related patterns in those files to better understand if the version update could affect memory management.


Based on the analysis of the shell script results, I can now provide a final response:

OpenTelemetry update from v1.30.0 to v1.32.0 is safe for memory management

The update to OpenTelemetry v1.32.0 is safe as:

  1. The codebase uses OpenTelemetry primarily for metrics and tracing instrumentation
  2. Memory-related operations are handled by Go's runtime and custom metrics collectors, not directly by OpenTelemetry
  3. The OpenTelemetry packages are used in a read-only manner for reporting metrics, not for memory management
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any memory management related changes in the OpenTelemetry changelog
# between the previous version and v1.32.0

# Search for memory-related changes in the codebase that use OpenTelemetry
rg -i "otel.*free|otel.*memory|otel.*leak" 

Length of output: 4727


Script:

#!/bin/bash
# Let's check the current version of OpenTelemetry packages in go.mod before the update
git show HEAD^:go.mod | grep -A 4 "opentelemetry.io/otel "

# And check if there are any changes in the usage of OpenTelemetry in the codebase
rg "go.opentelemetry.io/otel" -A 2 -B 2

# Also check for any memory-related function calls or patterns in files using OpenTelemetry
rg -l "go.opentelemetry.io/otel" | xargs rg -i "free|alloc|memory|gc|runtime"

Length of output: 40023

internal/core/algorithm/usearch/usearch_test.go (2)

213-213: Verify the change in expected distance calculation.

The expected distance for vector ID 10 has been changed from 3 to 0. This change in behavior needs verification:

  1. Is this change intentional and related to the double-free error fix?
  2. Does it reflect the correct distance calculation for a vector with math.MaxFloat32 as its last component?
#!/bin/bash
# Description: Search for other test cases or implementations that might be affected by this change in distance calculation

# Test: Look for other test cases using math.MaxFloat32
rg -A 5 "math.MaxFloat32" --type go

# Test: Look for distance calculation implementations
ast-grep --pattern 'func $_($$) float32 {
  $$$
}'

219-261: ⚠️ Potential issue

Reconsider removing test coverage.

The test case "return most accurate result after insert 10 vectors with limited size 5" has been commented out. This reduces test coverage for:

  • Accuracy of distance calculations
  • Result ordering with specific vector values
  • Search behavior with limit=5

Please clarify:

  1. Why was this test case removed?
  2. Is this behavior no longer valid after the double-free error fix?
  3. Is this test coverage provided by other test cases?
internal/net/dialer_test.go (2)

1519-1520: LGTM! Parameter addition aligns with method signature change.

The addition of the cache *dialerCache parameter to the args struct correctly reflects the updated method signature of cacheExpireHook.


1589-1589: LGTM! Test invocation updated to match new signature.

The test invocation has been properly updated to include the cache parameter, ensuring compatibility with the modified method signature.

internal/core/algorithm/ngt/ngt_test.go (2)

145-148: LGTM: Proper initialization of synchronization primitives.

The new fields smu and group are correctly initialized in the test struct:

  • smu is initialized as a new mutex for synchronizing graph statistics access
  • group is initialized for deduplicating concurrent statistics requests

1146-1146: Verify mutex initialization in error cases.

The test cases properly initialize the mutex fields but should also verify proper cleanup in error scenarios.

Let's verify the mutex initialization patterns:

Also applies to: 1172-1172, 1196-1196, 1232-1232, 1247-1247

internal/errors/errors.go (1)

159-164: Verify backward compatibility of the Is function signature

The Is function now returns a named boolean same. While naming the return value improves readability, please ensure this change does not affect any interfaces or callers that rely on the exact function signature. It's important to confirm that this modification maintains backward compatibility.

internal/core/algorithm/ngt/ngt.go (1)

90-90: Ensure all calls to GetGraphStatistics include the new context.Context parameter

The method GetGraphStatistics now requires a context.Context parameter. Please verify that all invocations of this method throughout the codebase have been updated to pass the appropriate context.

You can run the following script to locate any calls to GetGraphStatistics that may need updating:

This script searches for all instances where GetGraphStatistics is called, helping identify any that might be missing the new context parameter.

✅ Verification successful

All calls to GetGraphStatistics have been properly updated with context parameter

Based on the search results, there is only one call to GetGraphStatistics in the codebase outside of its definition, which is in pkg/agent/core/ngt/service/ngt.go. This call already includes the ctx parameter:

stats, err = n.core.GetGraphStatistics(ctx, core.AdditionalStatistics)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all calls to `GetGraphStatistics` include the new `context.Context` parameter.

# Exclude the method definition itself
rg --type go --fixed-strings 'GetGraphStatistics(' -g '!ngt.go' --glob '!**/vendor/**'

Length of output: 88


Script:

#!/bin/bash
# Let's try a broader search to find any references to GetGraphStatistics
rg --type go "GetGraphStatistics" -g '!**/vendor/**' -A 2

Length of output: 991


Script:

#!/bin/bash
# Also search for interface implementations or method definitions
ast-grep --pattern 'interface {
  $$$
  GetGraphStatistics($$$) $$$
  $$$
}'

Length of output: 4021


Script:

#!/bin/bash
# Look for any struct types that might implement this method
ast-grep --pattern 'type $_ struct {
  $$$
}

func ($_ $_) GetGraphStatistics($$$) $$$'

Length of output: 89

pkg/agent/core/ngt/service/ngt.go (2)

1309-1309: Ensure proper error handling in RegenerateIndexes

At line 1309, the RegenerateIndexes method now returns the result of n.loadStatistics(ctx). Verify that any errors returned by n.loadStatistics(ctx) are appropriately handled by callers of RegenerateIndexes, so that the error flow remains consistent with the rest of the application.


1453-1453: Check error propagation in CreateIndex

At line 1453, n.loadStatistics(ctx) is now called with its error being returned by CreateIndex. Confirm that this change does not introduce unintended error behaviors in functions that call CreateIndex, ensuring that they handle the new error scenarios properly.

.gitfiles (3)

101-101: Confirm Correct Configuration of New Workflow

The workflow .github/workflows/dockers-index-deletion.yaml has been introduced. Ensure that it is properly configured and integrates seamlessly with the existing CI/CD pipelines.


307-308: Verify Changes to Protobuf Configuration Files

Updates to buf.lock and buf.yaml may affect protobuf generation. Confirm that these changes are intentional and that all protobuf code generation processes function correctly post-update.


551-551: Review Dockerfile Configuration for Index Deletion

The new Dockerfile dockers/index/job/deletion/Dockerfile has been added. Verify that it includes all necessary dependencies and configurations required for the index deletion job to run successfully.

@@ -23,20 +23,20 @@ replace (
require (
github.com/kpango/fuid v0.0.0-20221203053508-503b5ad89aa1
github.com/kpango/glg v1.6.14
github.com/vdaas/vald-client-go v1.7.13
github.com/vdaas/vald-client-go v1.7.14
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Version mismatch with PR objectives

The PR objectives state this is a backport to v1.7.13, but this file requires github.com/vdaas/vald-client-go v1.7.14. This version mismatch could cause compatibility issues.

Please align the version with the backport target (v1.7.13) or provide justification for using a newer version.

@@ -129,7 +129,7 @@ func Test_cache_Start(t *testing.T) {
gache gache.Gache[any]
expireDur time.Duration
expireCheckDur time.Duration
expiredHook func(context.Context, string)
expiredHook func(context.Context, string, any)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add test cases for expired value handling.

While the expiredHook signature has been consistently updated across all test functions to include the expired value parameter, there are no test cases that verify the correct value is passed to the hook when an item expires. Consider adding test cases that:

  1. Set a value with expiration
  2. Wait for expiration
  3. Verify the expired value received by the hook matches the original value

Here's a suggested test case to add:

 func Test_cache_Start(t *testing.T) {
     // ... existing test cases ...
+    {
+        name: "Call Start with expiredHook receiving correct value",
+        args: args{
+            ctx: context.Background(),
+        },
+        fields: fields{
+            gache:          gache.New[any](),
+            expireDur:      100 * time.Millisecond,
+            expireCheckDur: 50 * time.Millisecond,
+            expiredHook: func(ctx context.Context, key string, val any) {
+                if key != "test-key" || val != "test-value" {
+                    t.Errorf("expiredHook received unexpected values: key=%v, val=%v", key, val)
+                }
+            },
+        },
+        beforeFunc: func(args args, c *cache[any]) {
+            c.Set("test-key", "test-value")
+            time.Sleep(200 * time.Millisecond)
+        },
+    },

Also applies to: 202-202, 302-302, 386-386, 490-490

Comment on lines +643 to +659
func TestDoTimeout(t *testing.T) {
g := New[string]()
start := time.Now()
v, _, err := g.Do(context.Background(), "key", func(context.Context) (string, error) {
time.Sleep(100 * time.Millisecond)
return "bar", nil
})
if err != nil {
t.Errorf("Do error: %v", err)
}
if v != "bar" {
t.Errorf("Do = %s; want %s", v, "bar")
}
if time.Since(start) < 100*time.Millisecond {
t.Errorf("Do executed too quickly; expected delay")
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve test reliability and timeout handling

The current implementation has potential reliability issues:

  1. Hard-coded sleep duration could cause flaky tests on slower systems
  2. Direct time comparison could be unreliable
  3. Missing context timeout testing, which would be more appropriate for timeout scenarios

Consider this improved implementation:

 func TestDoTimeout(t *testing.T) {
     g := New[string]()
-    start := time.Now()
+    testDuration := 100 * time.Millisecond
+    tolerance := 20 * time.Millisecond
+
+    ctx, cancel := context.WithTimeout(context.Background(), testDuration * 2)
+    defer cancel()
+
+    start := time.Now()
     v, _, err := g.Do(context.Background(), "key", func(context.Context) (string, error) {
-        time.Sleep(100 * time.Millisecond)
+        time.Sleep(testDuration)
         return "bar", nil
     })
     if err != nil {
         t.Errorf("Do error: %v", err)
     }
     if v != "bar" {
         t.Errorf("Do = %s; want %s", v, "bar")
     }
-    if time.Since(start) < 100*time.Millisecond {
-        t.Errorf("Do executed too quickly; expected delay")
+    elapsed := time.Since(start)
+    if elapsed < testDuration-tolerance || elapsed > testDuration+tolerance {
+        t.Errorf("Expected execution time around %v (±%v), got %v", testDuration, tolerance, elapsed)
     }
 }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +1162 to +1208
func (n *ngt) GetGraphStatistics(
ctx context.Context, m statisticsType,
) (stats *GraphStatistics, err error) {
return n.getGraphStatistics(ctx, m, 10)
}

func (n *ngt) getGraphStatistics(
ctx context.Context, m statisticsType, cnt int,
) (stats *GraphStatistics, err error) {
var shared bool
stats, shared, err = n.group.Do(ctx, "GetGraphStatistics", func(context.Context) (stats *GraphStatistics, err error) {
n.smu.Lock()
defer n.smu.Unlock()
var mode rune
switch m {
case NormalStatistics:
mode = '-'
case AdditionalStatistics:
mode = 'a'
}
err = safety.RecoverFunc(func() (err error) {
ne := n.GetErrorBuffer()
cstats := C.ngt_get_graph_statistics(n.index, C.char(mode), C.size_t(n.ces), ne.err)
if !cstats.valid {
return n.newGoError(ne)
}
n.PutErrorBuffer(ne)
defer C.ngt_free_graph_statistics(&cstats)
stats = fromCGraphStatistics(&cstats)
if stats == nil {
return errors.ErrNGTIndexStatisticsNotReady
}
return nil
})()
if err != nil {
return nil, err
}
return stats, nil
})
if err != nil {
if shared && cnt > 0 && !errors.Is(err, errors.ErrNGTIndexStatisticsNotReady) {
cnt--
return n.getGraphStatistics(ctx, m, cnt)
}
return nil, err
}
n.PutErrorBuffer(ne)
defer C.ngt_free_graph_statistics(&cstats)
return fromCGraphStatistics(&cstats), nil
return stats, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Refactor recursive retry logic in getGraphStatistics to prevent potential stack overflow

The recursive call in getGraphStatistics when err != nil and shared == true could lead to a stack overflow if the retry count (cnt) is large. Consider refactoring the retry mechanism to use an iterative approach to prevent potential stack growth and improve readability.

You can apply the following changes to refactor the recursive retries into a loop:

 func (n *ngt) getGraphStatistics(
 	ctx context.Context, m statisticsType, cnt int,
 ) (stats *GraphStatistics, err error) {
-	var shared bool
-	stats, shared, err = n.group.Do(ctx, "GetGraphStatistics", func(context.Context) (stats *GraphStatistics, err error) {
-		// Function body...
-	})
-	if err != nil {
-		if shared && cnt > 0 && !errors.Is(err, errors.ErrNGTIndexStatisticsNotReady) {
-			cnt--
-			return n.getGraphStatistics(ctx, m, cnt)
-		}
-		return nil, err
-	}
-	return stats, nil
+	for cnt > 0 {
+		var shared bool
+		stats, shared, err = n.group.Do(ctx, "GetGraphStatistics", func(context.Context) (*GraphStatistics, error) {
+			// Function body...
+		})
+		if err != nil {
+			if shared && !errors.Is(err, errors.ErrNGTIndexStatisticsNotReady) {
+				cnt--
+				continue
+			}
+			return nil, err
+		}
+		return stats, nil
+	}
+	return nil, err
 }

This change replaces the recursive call with a loop that continues retries without increasing the call stack depth.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (n *ngt) GetGraphStatistics(
ctx context.Context, m statisticsType,
) (stats *GraphStatistics, err error) {
return n.getGraphStatistics(ctx, m, 10)
}
func (n *ngt) getGraphStatistics(
ctx context.Context, m statisticsType, cnt int,
) (stats *GraphStatistics, err error) {
var shared bool
stats, shared, err = n.group.Do(ctx, "GetGraphStatistics", func(context.Context) (stats *GraphStatistics, err error) {
n.smu.Lock()
defer n.smu.Unlock()
var mode rune
switch m {
case NormalStatistics:
mode = '-'
case AdditionalStatistics:
mode = 'a'
}
err = safety.RecoverFunc(func() (err error) {
ne := n.GetErrorBuffer()
cstats := C.ngt_get_graph_statistics(n.index, C.char(mode), C.size_t(n.ces), ne.err)
if !cstats.valid {
return n.newGoError(ne)
}
n.PutErrorBuffer(ne)
defer C.ngt_free_graph_statistics(&cstats)
stats = fromCGraphStatistics(&cstats)
if stats == nil {
return errors.ErrNGTIndexStatisticsNotReady
}
return nil
})()
if err != nil {
return nil, err
}
return stats, nil
})
if err != nil {
if shared && cnt > 0 && !errors.Is(err, errors.ErrNGTIndexStatisticsNotReady) {
cnt--
return n.getGraphStatistics(ctx, m, cnt)
}
return nil, err
}
n.PutErrorBuffer(ne)
defer C.ngt_free_graph_statistics(&cstats)
return fromCGraphStatistics(&cstats), nil
return stats, nil
func (n *ngt) GetGraphStatistics(
ctx context.Context, m statisticsType,
) (stats *GraphStatistics, err error) {
return n.getGraphStatistics(ctx, m, 10)
}
func (n *ngt) getGraphStatistics(
ctx context.Context, m statisticsType, cnt int,
) (stats *GraphStatistics, err error) {
for cnt > 0 {
var shared bool
stats, shared, err = n.group.Do(ctx, "GetGraphStatistics", func(context.Context) (*GraphStatistics, error) {
n.smu.Lock()
defer n.smu.Unlock()
var mode rune
switch m {
case NormalStatistics:
mode = '-'
case AdditionalStatistics:
mode = 'a'
}
err = safety.RecoverFunc(func() (err error) {
ne := n.GetErrorBuffer()
cstats := C.ngt_get_graph_statistics(n.index, C.char(mode), C.size_t(n.ces), ne.err)
if !cstats.valid {
return n.newGoError(ne)
}
n.PutErrorBuffer(ne)
defer C.ngt_free_graph_statistics(&cstats)
stats = fromCGraphStatistics(&cstats)
if stats == nil {
return errors.ErrNGTIndexStatisticsNotReady
}
return nil
})()
if err != nil {
return nil, err
}
return stats, nil
})
if err != nil {
if shared && !errors.Is(err, errors.ErrNGTIndexStatisticsNotReady) {
cnt--
continue
}
return nil, err
}
return stats, nil
}
return nil, err
}

Comment on lines +1456 to +1470
func (n *ngt) loadStatistics(ctx context.Context) (err error) {
if n.IsStatisticsEnabled() {
log.Info("loading index statistics to cache")
stats, err := n.core.GetGraphStatistics(core.AdditionalStatistics)
var stats *core.GraphStatistics
done := make(chan struct{})
n.eg.Go(safety.RecoverFunc(func() error {
defer close(done)
stats, err = n.core.GetGraphStatistics(ctx, core.AdditionalStatistics)
return nil
}))
select {
case <-ctx.Done():
return ctx.Err()
case <-done:
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix potential data race on err and stats variables

In the loadStatistics function (lines 1456-1470), the variables err and stats are accessed concurrently by the main goroutine and a child goroutine without proper synchronization, leading to a potential data race condition.

Apply this diff to safely handle err and stats using channels:

 func (n *ngt) loadStatistics(ctx context.Context) (err error) {
     if n.IsStatisticsEnabled() {
         log.Info("loading index statistics to cache")
-        var stats *core.GraphStatistics
-        done := make(chan struct{})
+        statsCh := make(chan *core.GraphStatistics, 1)
+        errCh := make(chan error, 1)
         n.eg.Go(safety.RecoverFunc(func() error {
-            defer close(done)
-            stats, err = n.core.GetGraphStatistics(ctx, core.AdditionalStatistics)
-            return nil
+            stats, e := n.core.GetGraphStatistics(ctx, core.AdditionalStatistics)
+            if e != nil {
+                errCh <- e
+                return e
+            }
+            statsCh <- stats
+            return nil
         }))
         select {
         case <-ctx.Done():
             return ctx.Err()
-        case <-done:
+        case err = <-errCh:
+            return err
+        case stats = <-statsCh:
         }
-        if err != nil {
-            log.Errorf("failed to load index statistics to cache: %v", err)
-            return err
-        }
         n.statisticsCache.Store(&payload.Info_Index_Statistics{
             Valid:                            stats.Valid,
             MedianIndegree:                   stats.MedianIndegree,
             // ... (remaining fields)
         })
     }
     return nil
 }

Committable suggestion skipped: line range outside the PR's diff.

@@ -123,7 +125,7 @@
.github/workflows/semver-patch.yaml
.github/workflows/test-hack.yaml
.github/workflows/unit-test.yaml
.github/workflows/update-actions.yaml
.github/workflows/update-deps.yaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Update References After Renaming Workflow File

The workflow file has been renamed to .github/workflows/update-deps.yaml. Please check and update any references to the old filename to prevent broken links or misconfigurations.

Comment on lines +494 to +495
cmd/index/job/deletion/main.go
cmd/index/job/deletion/sample.yaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add Unit Tests for New Deletion Job

New files cmd/index/job/deletion/main.go and cmd/index/job/deletion/sample.yaml have been added for the index deletion job. Ensure that unit tests are created to cover this new functionality, maintaining code quality and preventing future regressions.

Would you like assistance in generating unit tests for the deletion job?

Comment on lines +1800 to +1803
pkg/index/job/deletion/config/config.go
pkg/index/job/deletion/service/deleter.go
pkg/index/job/deletion/service/options.go
pkg/index/job/deletion/usecase/deletion.go
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

⚠️ Potential issue

Implement Comprehensive Documentation and Testing

The addition of several files under pkg/index/job/deletion/ indicates significant new functionality. Ensure that:

  • All new code is thoroughly documented with clear comments.
  • Unit tests are implemented to achieve adequate test coverage.
  • The code complies with project standards and architectural guidelines.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (14)
internal/safety/safety.go (1)

40-46: Well-structured enhancement to panic recovery system

The changes form a comprehensive improvement to the panic recovery system by:

  1. Adding defensive programming with nil checks
  2. Enhancing debugging capabilities with stack traces
  3. Maintaining thread-safety and backward compatibility

This aligns well with the package's safety objectives and follows good error handling practices.

Consider documenting these panic recovery enhancements in the package documentation to help users understand the improved debugging capabilities.

dockers/index/job/deletion/Dockerfile (1)

85-85: Document configuration requirements.

Consider adding documentation that:

  1. Lists all required and optional configuration parameters
  2. Explains how to override the default configuration
  3. Provides examples of different configuration scenarios

Would you like me to help create a configuration documentation template?

internal/errors/errors.go (1)

Line range hint 166-199: Reduce code duplication in error comparison logic.

The identical comparison logic appears in two places:

  1. Lines 191-193 (case with multiple errors)
  2. Lines 195-197 (default case)

Consider extracting the common comparison logic into a helper function:

 func is(err, target error, targetComparable bool) (same bool) {
+    isEqual := func(err, target error, targetComparable bool) bool {
+        return targetComparable && err == target ||
+            err.Error() == target.Error() ||
+            strings.EqualFold(err.Error(), target.Error())
+    }
+
     for {
         if targetComparable && (err == target ||
             err.Error() == target.Error() ||
             strings.EqualFold(err.Error(), target.Error())) {
             return true
         }
         // ... rest of the code ...
-            return targetComparable && err == target ||
-                err.Error() == target.Error() ||
-                strings.EqualFold(err.Error(), target.Error())
+            return isEqual(err, target, targetComparable)
         default:
-            return targetComparable && err == target ||
-                err.Error() == target.Error() ||
-                strings.EqualFold(err.Error(), target.Error())
+            return isEqual(err, target, targetComparable)
     }
 }
internal/cache/option_test.go (1)

54-54: Consider adding test cases for expired hook value usage.

While the basic test case is updated correctly, consider adding test cases that verify the expired value is correctly passed to the hook function.

Example test case:

fn := func(ctx context.Context, key string, value any) {
    // Verify the expired value
    if value != expectedValue {
        t.Errorf("got value %v, want %v", value, expectedValue)
    }
}
internal/cache/gache/option_test.go (1)

Line range hint 351-436: Consider implementing or removing the commented-out test.

The Test_defaultOptions function is commented out with TODO markers. Consider either:

  1. Implementing the test if it's needed
  2. Removing the commented code if it's not planned for immediate implementation

This would improve code maintainability by reducing clutter.

Would you like me to help implement this test or create an issue to track this task?

internal/sync/singleflight/singleflight_test.go (2)

643-659: Consider improving test reliability and coverage.

While the test verifies basic timing behavior, consider these enhancements:

  1. Replace fixed sleep with a more reliable timing mechanism
  2. Add context timeout testing
  3. Verify the shared flag behavior

Here's a suggested improvement:

 func TestDoTimeout(t *testing.T) {
     g := New[string]()
+    ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
+    defer cancel()
     start := time.Now()
-    v, _, err := g.Do(context.Background(), "key", func(context.Context) (string, error) {
+    v, shared, err := g.Do(ctx, "key", func(ctx context.Context) (string, error) {
+        timer := time.NewTimer(100 * time.Millisecond)
+        select {
+        case <-ctx.Done():
+            return "", ctx.Err()
+        case <-timer.C:
+            return "bar", nil
+        }
-        time.Sleep(100 * time.Millisecond)
-        return "bar", nil
     })
     if err != nil {
         t.Errorf("Do error: %v", err)
     }
     if v != "bar" {
         t.Errorf("Do = %s; want %s", v, "bar")
     }
+    if shared {
+        t.Error("Expected shared to be false for first call")
+    }
     if time.Since(start) < 100*time.Millisecond {
         t.Errorf("Do executed too quickly; expected delay")
     }
 }

661-689: LGTM! Well-structured concurrent error handling test.

The test effectively verifies:

  • Error propagation across multiple goroutines
  • Duplicate call suppression (single execution)
  • Atomic operation usage for thread safety

Minor suggestion: Consider adding shared flag verification for subsequent calls to ensure complete behavior coverage.

 func TestDoMultipleErrors(t *testing.T) {
     g := New[string]()
     var calls int32
     someErr := errors.New("Some error")

     const n = 10
     var wg sync.WaitGroup
     for i := 0; i < n; i++ {
         wg.Add(1)
         go func() {
             defer wg.Done()
-            v, _, err := g.Do(context.Background(), "key", func(context.Context) (string, error) {
+            v, shared, err := g.Do(context.Background(), "key", func(context.Context) (string, error) {
                 atomic.AddInt32(&calls, 1)
                 time.Sleep(10 * time.Millisecond)
                 return "", someErr
             })
             if err != someErr {
                 t.Errorf("Do error = %v; want %v", err, someErr)
             }
             if v != "" {
                 t.Errorf("Do = %v; want empty string", v)
             }
+            if i > 0 && !shared {
+                t.Errorf("Expected shared to be true for subsequent calls")
+            }
         }()
     }
     wg.Wait()
     if got := atomic.LoadInt32(&calls); got != 1 {
         t.Errorf("number of calls = %d; want 1", got)
     }
 }
internal/core/algorithm/ngt/ngt_test.go (4)

145-148: Consider adding test cases for concurrent access patterns.

The new fields smu and group are added for synchronization, but there are no test cases specifically verifying their concurrent behavior. Consider adding test cases that:

  • Verify mutex locking/unlocking patterns
  • Test concurrent access to graph statistics
  • Validate singleflight deduplication

47-47: Consider adding edge case tests for graph statistics.

The group field is added to ignored fields in ngtComparator, but consider adding test cases for:

  • Graph statistics with empty/nil results
  • Error conditions in graph statistics retrieval
  • Race conditions in concurrent statistics access

145-148: Consider documenting mutex responsibilities.

The code uses multiple mutexes (mu, cmu, smu) and a singleflight group. Consider:

  • Adding comments explaining the responsibility of each mutex
  • Documenting potential deadlock scenarios
  • Explaining the interaction between mutexes and singleflight

Also applies to: 175-178, 204-207, 337-340, 407-410, 477-480, 547-550, 747-750, 798-801


1146-1146: Consider adding specific error cases for synchronization failures.

The test cases could be enhanced to cover:

  • Mutex deadlock scenarios
  • Singleflight cancellation cases
  • Context cancellation during graph statistics retrieval

Also applies to: 1172-1172, 1196-1196, 1232-1232, 1247-1247

internal/core/algorithm/ngt/ngt.go (2)

1162-1166: Hardcoded retry count in 'GetGraphStatistics' may reduce flexibility

The method GetGraphStatistics calls getGraphStatistics with a hardcoded retry count of 10. This may limit the ability to adjust retry behavior based on different conditions or configurations.

Suggestion:

Consider making the retry count a configurable parameter or constant. This would increase flexibility and allow for easier adjustments in the future without modifying the method signature repeatedly.

-func (n *ngt) GetGraphStatistics(
-    ctx context.Context, m statisticsType,
-) (stats *GraphStatistics, err error) {
-    return n.getGraphStatistics(ctx, m, 10)
+const defaultRetryCount = 10
+
+func (n *ngt) GetGraphStatistics(
+    ctx context.Context, m statisticsType,
+) (stats *GraphStatistics, err error) {
+    return n.getGraphStatistics(ctx, m, defaultRetryCount)

1162-1208: Potential for stack overflow due to recursion in 'getGraphStatistics'

The recursive calls in getGraphStatistics could lead to a stack overflow if the retry count is large or if errors persist. Recursion might also make the code harder to maintain and debug.

Suggestion:

Refactor the recursive logic into a loop to handle retries iteratively. This approach prevents potential stack overflows and improves code readability.

func (n *ngt) getGraphStatistics(
    ctx context.Context, m statisticsType, cnt int,
) (stats *GraphStatistics, err error) {
    for cnt > 0 {
        var shared bool
        stats, shared, err = n.group.Do(ctx, "GetGraphStatistics", func(context.Context) (*GraphStatistics, error) {
            n.smu.Lock()
            defer n.smu.Unlock()
            // ... (rest of the logic)
        })
        if err == nil || errors.Is(err, errors.ErrNGTIndexStatisticsNotReady) {
            break
        }
        cnt--
    }
    return stats, err
}
pkg/agent/core/ngt/service/ngt.go (1)

Line range hint 1456-1488: Fix potential data race when accessing err outside the goroutine

In the loadStatistics method, the variable err is assigned inside a goroutine and accessed outside without proper synchronization. This may lead to a data race condition because err could be concurrently accessed by multiple goroutines.

Suggested Fix: Use an error channel to safely retrieve err from the goroutine

Modify the code to use a buffered error channel for passing the error from the goroutine to the main function safely.

func (n *ngt) loadStatistics(ctx context.Context) (err error) {
    if n.IsStatisticsEnabled() {
        log.Info("loading index statistics to cache")
        var stats *core.GraphStatistics
-       done := make(chan struct{})
+       errCh := make(chan error, 1)
        n.eg.Go(safety.RecoverFunc(func() error {
-           defer close(done)
+           defer close(errCh)
            stats, e := n.core.GetGraphStatistics(ctx, core.AdditionalStatistics)
+           errCh <- e
            return nil
        }))
        select {
        case <-ctx.Done():
            return ctx.Err()
-       case <-done:
+       case err = <-errCh:
        }
        if err != nil {
            log.Errorf("failed to load index statistics to cache: %v", err)
            return err
        }
        n.statisticsCache.Store(&payload.Info_Index_Statistics{
            // ...
        })
    }
    return nil
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 63f0478 and 98ca44a.

⛔ Files ignored due to path filters (3)
  • example/client/go.sum is excluded by !**/*.sum
  • go.sum is excluded by !**/*.sum
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (77)
  • .gitfiles (9 hunks)
  • .github/ISSUE_TEMPLATE/bug_report.md (1 hunks)
  • .github/ISSUE_TEMPLATE/security_issue_report.md (1 hunks)
  • .github/PULL_REQUEST_TEMPLATE.md (1 hunks)
  • dockers/agent/core/agent/Dockerfile (1 hunks)
  • dockers/agent/core/faiss/Dockerfile (1 hunks)
  • dockers/agent/core/ngt/Dockerfile (1 hunks)
  • dockers/agent/sidecar/Dockerfile (1 hunks)
  • dockers/binfmt/Dockerfile (1 hunks)
  • dockers/buildbase/Dockerfile (1 hunks)
  • dockers/buildkit/Dockerfile (1 hunks)
  • dockers/buildkit/syft/scanner/Dockerfile (1 hunks)
  • dockers/ci/base/Dockerfile (1 hunks)
  • dockers/dev/Dockerfile (1 hunks)
  • dockers/discoverer/k8s/Dockerfile (1 hunks)
  • dockers/example/client/Dockerfile (1 hunks)
  • dockers/gateway/filter/Dockerfile (1 hunks)
  • dockers/gateway/lb/Dockerfile (1 hunks)
  • dockers/gateway/mirror/Dockerfile (1 hunks)
  • dockers/index/job/correction/Dockerfile (1 hunks)
  • dockers/index/job/creation/Dockerfile (1 hunks)
  • dockers/index/job/deletion/Dockerfile (1 hunks)
  • dockers/index/job/readreplica/rotate/Dockerfile (1 hunks)
  • dockers/index/job/save/Dockerfile (1 hunks)
  • dockers/index/operator/Dockerfile (1 hunks)
  • dockers/manager/index/Dockerfile (1 hunks)
  • dockers/operator/helm/Dockerfile (1 hunks)
  • dockers/tools/benchmark/job/Dockerfile (1 hunks)
  • dockers/tools/benchmark/operator/Dockerfile (1 hunks)
  • dockers/tools/cli/loadtest/Dockerfile (1 hunks)
  • example/client/go.mod (2 hunks)
  • go.mod (19 hunks)
  • hack/go.mod.default (1 hunks)
  • internal/cache/cache.go (1 hunks)
  • internal/cache/gache/gache.go (1 hunks)
  • internal/cache/gache/gache_test.go (6 hunks)
  • internal/cache/gache/option.go (1 hunks)
  • internal/cache/gache/option_test.go (2 hunks)
  • internal/cache/option.go (1 hunks)
  • internal/cache/option_test.go (3 hunks)
  • internal/client/v1/client/agent/core/client.go (3 hunks)
  • internal/core/algorithm/ngt/ngt.go (12 hunks)
  • internal/core/algorithm/ngt/ngt_test.go (16 hunks)
  • internal/core/algorithm/usearch/usearch_test.go (1 hunks)
  • internal/errors/errors.go (2 hunks)
  • internal/net/dialer.go (1 hunks)
  • internal/net/dialer_test.go (2 hunks)
  • internal/safety/safety.go (2 hunks)
  • internal/sync/singleflight/singleflight.go (2 hunks)
  • internal/sync/singleflight/singleflight_test.go (2 hunks)
  • pkg/agent/core/ngt/service/ngt.go (2 hunks)
  • pkg/manager/index/service/indexer.go (1 hunks)
  • rust/rust-toolchain (1 hunks)
  • rust/rust-toolchain.toml (1 hunks)
  • versions/BUF_VERSION (1 hunks)
  • versions/GOLANGCILINT_VERSION (1 hunks)
  • versions/GO_VERSION (1 hunks)
  • versions/K3S_VERSION (1 hunks)
  • versions/KIND_VERSION (1 hunks)
  • versions/KUBECTL_VERSION (1 hunks)
  • versions/KUBELINTER_VERSION (1 hunks)
  • versions/NGT_VERSION (1 hunks)
  • versions/OPERATOR_SDK_VERSION (1 hunks)
  • versions/PROMETHEUS_STACK_VERSION (1 hunks)
  • versions/PROTOBUF_VERSION (1 hunks)
  • versions/RUST_VERSION (1 hunks)
  • versions/TELEPRESENCE_VERSION (1 hunks)
  • versions/USEARCH_VERSION (1 hunks)
  • versions/actions/ACTIONS_SETUP_GO (1 hunks)
  • versions/actions/ACTIONS_SETUP_NODE (1 hunks)
  • versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG (1 hunks)
  • versions/actions/GITHUB_CODEQL_ACTION_ANALYZE (1 hunks)
  • versions/actions/GITHUB_CODEQL_ACTION_AUTOBUILD (1 hunks)
  • versions/actions/GITHUB_CODEQL_ACTION_INIT (1 hunks)
  • versions/actions/GITHUB_CODEQL_ACTION_UPLOAD_SARIF (1 hunks)
  • versions/actions/GITHUB_ISSUE_METRICS (1 hunks)
  • versions/actions/SOFTPROPS_ACTION_GH_RELEASE (1 hunks)
✅ Files skipped from review due to trivial changes (52)
  • .github/ISSUE_TEMPLATE/bug_report.md
  • .github/ISSUE_TEMPLATE/security_issue_report.md
  • .github/PULL_REQUEST_TEMPLATE.md
  • dockers/agent/core/agent/Dockerfile
  • dockers/agent/core/faiss/Dockerfile
  • dockers/agent/core/ngt/Dockerfile
  • dockers/agent/sidecar/Dockerfile
  • dockers/binfmt/Dockerfile
  • dockers/buildbase/Dockerfile
  • dockers/buildkit/Dockerfile
  • dockers/buildkit/syft/scanner/Dockerfile
  • dockers/ci/base/Dockerfile
  • dockers/discoverer/k8s/Dockerfile
  • dockers/example/client/Dockerfile
  • dockers/gateway/filter/Dockerfile
  • dockers/gateway/lb/Dockerfile
  • dockers/gateway/mirror/Dockerfile
  • dockers/index/job/correction/Dockerfile
  • dockers/index/job/creation/Dockerfile
  • dockers/index/job/readreplica/rotate/Dockerfile
  • dockers/index/job/save/Dockerfile
  • dockers/index/operator/Dockerfile
  • dockers/manager/index/Dockerfile
  • dockers/operator/helm/Dockerfile
  • dockers/tools/benchmark/job/Dockerfile
  • dockers/tools/benchmark/operator/Dockerfile
  • dockers/tools/cli/loadtest/Dockerfile
  • rust/rust-toolchain
  • rust/rust-toolchain.toml
  • versions/BUF_VERSION
  • versions/GOLANGCILINT_VERSION
  • versions/GO_VERSION
  • versions/K3S_VERSION
  • versions/KIND_VERSION
  • versions/KUBECTL_VERSION
  • versions/KUBELINTER_VERSION
  • versions/NGT_VERSION
  • versions/OPERATOR_SDK_VERSION
  • versions/PROMETHEUS_STACK_VERSION
  • versions/PROTOBUF_VERSION
  • versions/RUST_VERSION
  • versions/TELEPRESENCE_VERSION
  • versions/USEARCH_VERSION
  • versions/actions/ACTIONS_SETUP_GO
  • versions/actions/ACTIONS_SETUP_NODE
  • versions/actions/CRAZY_MAX_GHACTION_IMPORT_GPG
  • versions/actions/GITHUB_CODEQL_ACTION_ANALYZE
  • versions/actions/GITHUB_CODEQL_ACTION_AUTOBUILD
  • versions/actions/GITHUB_CODEQL_ACTION_INIT
  • versions/actions/GITHUB_CODEQL_ACTION_UPLOAD_SARIF
  • versions/actions/GITHUB_ISSUE_METRICS
  • versions/actions/SOFTPROPS_ACTION_GH_RELEASE
🔇 Additional comments (46)
internal/cache/cache.go (2)

Line range hint 37-52: Implementation maintains type safety and consistency.

The changes to New function properly propagate the enhanced expiredHook signature while maintaining the generic type constraints and error handling. The implementation correctly uses the generic type parameter V throughout the cache configuration.


33-33: Enhancement to expiredHook signature improves cleanup capabilities.

The addition of the generic value parameter V to the expiredHook signature provides access to the cached value during expiration, which can help prevent potential memory issues like double-free errors by enabling proper cleanup procedures.

Let's verify the consistency of this change across related files:

✅ Verification successful

The verification results show that:

  1. No old signatures (without the value parameter) were found in the cache package
  2. The gache package consistently uses the updated signature func(context.Context, string, V) across all files
  3. All test files properly reflect this signature change

Enhancement to expiredHook signature is consistently implemented

The addition of the generic value parameter V to the expiredHook signature has been properly propagated across all cache-related files. The implementation in both the main cache package and the underlying gache package consistently use the updated signature, ensuring proper cleanup capabilities throughout the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the expiredHook signature is consistently updated across all cache-related files
# Expected: All expiredHook signatures should include the generic value parameter

# Check for any remaining old signatures without the value parameter
rg "expiredHook.*func.*Context.*string[^,V]" internal/cache/

# Check the implementation in gache package
rg "expiredHook" internal/cache/gache/

Length of output: 2432

internal/cache/gache/option.go (1)

Line range hint 45-51: LGTM! Important signature change that improves safety.

The modification to include the cached value V in the expired hook callback is a good improvement. This allows proper cleanup or logging of the value before it's removed from the cache, which could help prevent double-free issues.

Note: This is a breaking change that requires updates to all existing expired hook implementations.

Let's verify the impact on existing code:

✅ Verification successful

Let me gather more information about the expired hook implementations to verify if they match the new signature.


All expired hook implementations have been updated to match the new signature

The verification shows that all implementations of the expired hook are compatible with the new signature that includes the value parameter:

  1. In internal/net/dialer.go, the cacheExpireHook implementation already uses the new signature with three parameters: func (d *dialer) cacheExpireHook(ctx context.Context, addr string, _ *dialerCache)
  2. In test files (internal/cache/gache/gache_test.go), the hook is defined with the new signature: expiredHook := func(context.Context, string, any) {}
  3. No other implementations with the old signature (two parameters) were found in the codebase.

The breaking change has been properly propagated across all usages.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all usages of WithExpiredHook to ensure they're updated
# with the new signature that includes the value parameter

# Search for WithExpiredHook usage
echo "Searching for WithExpiredHook usage..."
rg -A 5 "WithExpiredHook\["

# Search for potential expired hook function declarations
echo "Searching for potential expired hook implementations..."
ast-grep --pattern 'func $_($_ context.Context, $_ string) {
  $$$
}'

Length of output: 2632


Script:

#!/bin/bash
# Search for cacheExpireHook implementation in dialer.go
echo "Searching for cacheExpireHook implementation..."
rg -B 5 -A 10 "cacheExpireHook" internal/net/dialer.go

# Search for expiredHook variable declarations and implementations in test files
echo "Searching for expiredHook in test files..."
rg -B 5 -A 10 "expiredHook.*:?=|func expiredHook" internal/cache/gache/

# Search for any other potential hook implementations with the old signature
echo "Searching for other potential hook implementations..."
ast-grep --pattern 'func $_($_ context.Context, $_ string) {
  $$$
}'

Length of output: 5373

example/client/go.mod (4)

32-32: Verify indirect dependency updates.

The indirect dependency updates appear to be routine maintenance updates:

  • buf.build/gen/go/bufbuild/protovalidate: Updated to latest version
  • golang.org/x/net, golang.org/x/sys, golang.org/x/text: Minor version bumps
  • google.golang.org/genproto/*: Updated to latest commits

These updates are automatically managed by Go modules and should be compatible.

Also applies to: 36-40


Line range hint 1-43: Overall dependency update assessment.

The dependency updates are well-structured and maintain consistency across related packages. The changes appear to be focused on:

  1. Security updates for critical packages
  2. Version bump for vald-client-go to include the double-free fix
  3. Routine maintenance of indirect dependencies

The updates align with the PR objective and follow semantic versioning principles.


26-26: Verify vald-client-go version compatibility.

The update of github.com/vdaas/vald-client-go from v1.7.13 to v1.7.14 aligns with the PR objective of backporting changes. This version bump should include the fix for the double-free error in GetGraphStatistics API.

#!/bin/bash
# Description: Verify the changes in vald-client-go v1.7.14

# Test: Look for changelog or release notes
echo "Checking for changelog entries..."
gh release view v1.7.14 -R github.com/vdaas/vald-client-go

# Test: Look for any reported issues with this version
echo "Checking for known issues..."
gh issue list --search "v1.7.14 client-go"

10-17: Verify compatibility of core dependency updates.

The updates to core dependencies include security-related packages and critical networking components:

  • github.com/pkg/sftp: 1.13.6 → 1.13.7
  • golang.org/x/crypto: 0.28.0 → 0.29.0
  • golang.org/x/net: 0.30.0 → 0.31.0
  • google.golang.org/grpc: 1.67.1 → 1.68.0

These updates appear to be minor version bumps and should be safe, but let's verify the changes.

✅ Verification successful

Based on the gathered information, I can now provide a final response:

Core dependency updates verified as safe

The dependency updates have been thoroughly verified:

  • github.com/pkg/sftp v1.13.7: Primarily adds z/OS support and bug fixes, no breaking changes
  • golang.org/x/crypto v0.29.0 and golang.org/x/net v0.31.0: Standard library updates
  • google.golang.org/grpc v1.68.0: Contains behavior changes but all are backward compatible:
    • ORCA listener registration changes
    • TLS configuration improvements
    • Client connection closing behavior enhancement

The codebase scan shows these dependencies are used extensively but in standard ways (gRPC for communication, crypto for security). The updates are minor version bumps with no breaking changes that would affect the client example.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any breaking changes in the updated dependencies

# Test: Look for go.sum to verify these versions are already in use elsewhere
echo "Checking if these versions are used in other parts of the codebase..."
fd -t f "go.sum$" -x grep -l "github.com/pkg/sftp@v1.13.7\|golang.org/x/crypto@v0.29.0\|golang.org/x/net@v0.31.0\|google.golang.org/grpc@v1.68.0" {}

# Test: Look for any existing issues with these versions
echo "Checking for known issues with these versions..."
gh issue list --search "v1.13.7 sftp OR v0.29.0 crypto OR v0.31.0 net OR v1.68.0 grpc"

Length of output: 571


Script:

#!/bin/bash
# Let's try a different approach to verify the dependency updates

# Check for any imports of these packages in the codebase
echo "Checking imports of updated packages..."
rg -l "github.com/pkg/sftp|golang.org/x/crypto|golang.org/x/net|google.golang.org/grpc" --type go

# Check other go.mod files for version alignment
echo "Checking other go.mod files for version alignment..."
fd -t f "go.mod$" -x cat {}

# Look for any breaking changes in CHANGELOG or release notes
echo "Checking release notes for grpc v1.68.0..."
gh release view v1.68.0 -R grpc/grpc-go

echo "Checking release notes for sftp v1.13.7..."
gh release view v1.13.7 -R pkg/sftp

Length of output: 44249

internal/cache/option.go (1)

Line range hint 39-44: LGTM, but note this is a breaking change.

The enhancement to pass the expired value to the hook function is a good improvement. However, this is a breaking change as it modifies the public API signature from func(context.Context, string) to func(context.Context, string, V).

Let's verify the consistency of this change across related files:

✅ Verification successful

The hook function signature is consistently updated across the codebase

The verification shows that all instances of expiredHook in the codebase have been properly updated to include the value parameter of type V (or any in test files):

  • internal/cache/cache.go: func(context.Context, string, V)
  • internal/cache/gache/gache.go: func(context.Context, string, V)
  • Test files consistently use func(context.Context, string, any)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the hook function signature is consistent across cache-related files
# Expected: All hook-related code should use the new signature with value parameter

# Check cache implementations and interfaces for the hook signature
rg -A 5 "expiredHook.*func.*Context.*string.*\)" internal/cache/

Length of output: 2836

internal/safety/safety.go (3)

22-22: LGTM: Import addition for stack trace functionality

The addition of runtime/debug import is appropriate for enhancing panic recovery with stack traces.


40-42: LGTM: Essential nil check for safety

The addition of the nil check is a crucial safety improvement that prevents potential panics from invoking a nil function. This defensive programming practice aligns well with the package's safety objectives.


45-46: LGTM: Enhanced debugging with stack traces

The addition of stack traces provides valuable debugging context during panic recovery. However, let's verify the memory impact of stack trace capture in high-frequency panic scenarios.

✅ Verification successful

Based on the analysis of the shell script results, I can now provide a final response:

Stack trace capture is safe in this context

The codebase scan reveals that RecoverFunc is primarily used in controlled contexts:

  • Error group goroutines with proper cleanup (defer close)
  • One-time initialization flows
  • Background service handlers with rate limiting
  • No instances of recovery in tight loops or high-frequency call paths

The stack trace capture in the panic recovery is safe because:

  1. It's only triggered during actual panic scenarios, not normal operation
  2. The implementation uses debug.Stack() which has internal size limits
  3. The captured stack is immediately converted to string and doesn't persist
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential high-frequency panic recovery points
# Look for patterns that might indicate frequent calls to this recovery function

# Search for direct calls to RecoverFunc or RecoverWithoutPanicFunc
echo "Checking direct recovery function usage:"
rg -A 2 "Recover(Func|WithoutPanicFunc)" --type go

# Search for potential high-frequency contexts (like loops or high-traffic handlers)
# that might use these recovery functions
echo "Checking for recovery usage in potentially high-frequency contexts:"
ast-grep --pattern 'for $_ {
  $$$
  Recover$_($$$)
  $$$
}'

Length of output: 41699

internal/cache/gache/gache.go (2)

Line range hint 41-49: Verify hook initialization in New constructor.

The constructor properly handles the hook initialization by:

  1. Setting it only when provided
  2. Enabling the hook functionality
  3. Configuring it before returning the cache instance

This implementation ensures thread-safe initialization of the expired hook.


32-32: Enhancement to expiredHook signature addresses potential double-free issue.

The addition of the value parameter to expiredHook allows proper cleanup handling during cache entry expiration, which could help prevent the double-free error mentioned in the PR objectives. This change ensures that the hook has access to the value being expired, enabling proper resource management.

Let's verify the usage of this hook across the codebase to ensure all implementations are updated:

✅ Verification successful

Let me run one more verification to check the actual usage of the expired hook in the implementation:


Let me check one final aspect - where the hook is actually invoked in the implementation:


Let me check the actual cache implementations to see how the expired hook is used:


Signature change in expiredHook is properly implemented and propagated

The verification shows that:

  1. The expiredHook signature change is consistently implemented across both cache implementations (internal/cache/cache.go and internal/cache/gache/gache.go)
  2. The hook is properly configured in the gache implementation through SetExpiredHook and EnableExpiredHook
  3. The hook signature is correctly propagated through the option pattern in both packages
  4. All test files have been updated to use the new signature
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for implementations of expiredHook to verify signature compatibility
# Expected: All implementations should use the new signature with three parameters

# Search for expired hook implementations
rg -A 5 "expiredHook.*context\.Context.*string"

# Search for New cache instantiations to verify hook configurations
ast-grep --pattern 'New[^(]*\([^)]*WithExpiredHook'

Length of output: 2873


Script:

#!/bin/bash
# Search for the implementation details of how expiredHook is used
rg -A 10 "func.*WithExpiredHook"

# Also check where the hook is actually called
ast-grep --pattern 'expiredHook\(.*\)'

Length of output: 2478


Script:

#!/bin/bash
# Search for the actual invocation of the hook in the implementation
rg -B 5 -A 5 "c\.expiredHook\(" 

# Also check if there are any other cache implementations
fd -e go -x grep -l "type.*cache.*struct" {}

Length of output: 140


Script:

#!/bin/bash
# Get the content of both cache implementations
echo "=== internal/cache/gache/gache.go ==="
cat internal/cache/gache/gache.go

echo -e "\n=== internal/cache/cache.go ==="
cat internal/cache/cache.go

Length of output: 3906

dockers/index/job/deletion/Dockerfile (1)

85-85: Verify the sample configuration file usage in production.

Using a sample configuration file as the default config in a production image could lead to unexpected behavior. Please ensure that:

  1. The sample.yaml contains production-safe default values
  2. The configuration can be overridden at runtime
✅ Verification successful

Sample configuration appears production-ready with safe defaults

The sample.yaml contains production-safe configuration with:

  • Reasonable default values for server ports, timeouts, and connection settings
  • Proper security defaults (TLS disabled by default)
  • Configurable service discovery for agent communication
  • Standard logging and observability settings
  • Placeholder paths for TLS certificates

The configuration can be overridden at runtime as it's mounted at a standard location /etc/server/config.yaml.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check the contents of the sample config and related documentation

# Test 1: Check if the sample config exists and examine its contents
cat cmd/index/job/deletion/sample.yaml

# Test 2: Look for documentation about configuration
fd -t f "README.md" | xargs rg -l "configuration|config.yaml"

Length of output: 6548

internal/client/v1/client/agent/core/client.go (1)

108-108: Verify the client instantiation changes.

The changes from agent.NewAgentClient(conn) to NewAgentClient(conn) appear to be intentional, using the local wrapper that combines both vald.Client and agent.AgentClient. This could potentially help with the double-free error mentioned in the PR objectives by ensuring proper client lifecycle management.

Let's verify if this pattern is consistently applied across the codebase:

Also applies to: 125-125, 142-142

✅ Verification successful

Let me analyze the code further to understand the client instantiation pattern and its consistency.


The client instantiation changes are correct and consistent with the codebase design

The changes from agent.NewAgentClient(conn) to NewAgentClient(conn) are verified to be correct because:

  1. The wrapper NewAgentClient in the same package properly combines both vald.Client and agent.AgentClient interfaces
  2. The changes align with the package's design where Client interface extends vald.Client with additional functionality
  3. The remaining usages of agent.NewAgentClient in other packages are correct as they operate at different abstraction levels
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining direct usage of agent.NewAgentClient
# that might need similar updates

# Search for direct agent client instantiation
echo "Checking for direct agent.NewAgentClient usage:"
rg "agent\.NewAgentClient\(.*\)" --type go

# Search for the wrapper usage pattern
echo -e "\nChecking for NewAgentClient wrapper usage:"
rg "NewAgentClient\(.*\)" --type go

Length of output: 2615


Script:

#!/bin/bash
# Let's examine the NewAgentClient wrapper implementation and its interface
echo "Checking NewAgentClient wrapper implementation:"
rg -A 10 "^func NewAgentClient" internal/client/v1/client/agent/core/client.go

# Let's also check the Client interface definition
echo -e "\nChecking Client interface definition:"
rg -B 5 -A 15 "^type Client interface" internal/client/v1/client/agent/core/client.go

Length of output: 1174

internal/sync/singleflight/singleflight.go (2)

107-109: LGTM: Map initialization change improves safety.

The direct map initialization during struct creation is a safer approach than lazy initialization, eliminating potential nil map panics.

Let's verify there are no remaining nil checks that could be removed:

#!/bin/bash
# Search for any remaining nil map checks in the codebase
rg -U "if\s+.*[.]m\s*==\s*nil\s*{[\s\n]*.*[.]m\s*=\s*make\s*\(.*\)" .

248-266: Verify if this addresses the double-free issue in GetGraphStatistics API.

The implementation of ForgetUnshared looks correct with proper mutex handling and duplicate checking. However, since this PR aims to fix a double-free error in the GetGraphStatistics API, we should verify:

  1. The relationship between this change and the reported issue
  2. Whether this change fully addresses the original problem

Let's search for the usage context:

Consider adding debug logging when entries are forgotten to help track potential memory leaks or cleanup issues in production.

✅ Verification successful

Let me gather more information to verify the relationship between ForgetUnshared and the double-free issue.


Let me search for more context about the singleflight group usage in the NGT implementation to understand how it's handling cleanup.


Let me search for any cleanup or error handling related to GetGraphStatistics to understand the double-free context better.


Based on the gathered information, I can now provide the final response:

The ForgetUnshared implementation correctly addresses the double-free concern

The implementation is verified to be correct because:

  1. The NGT implementation shows proper memory management with C.ngt_free_graph_statistics(&cstats) being called in a deferred cleanup.
  2. The singleflight group in getGraphStatistics ensures that concurrent calls are properly handled, and the shared flag is used to track shared executions.
  3. The new ForgetUnshared method provides a safe way to clean up unshared entries, preventing potential double-free scenarios when the singleflight key needs to be forgotten.

The memory safety is maintained through:

  • Proper mutex synchronization in both ForgetUnshared and getGraphStatistics
  • Deferred cleanup of C memory allocations
  • Safe handling of the shared flag from singleflight to manage concurrent access
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for GetGraphStatistics API implementation
rg -U "GetGraphStatistics.*\{[\s\S]*?\}" .

# Search for ForgetUnshared usage
rg "ForgetUnshared\(" .

Length of output: 363


Script:

#!/bin/bash
# Search for GetGraphStatistics implementation with context
rg -U "func.*GetGraphStatistics.*\{[\s\S]*?\}" -A 20

# Search for any references to double free in issues, PRs, or comments
rg -i "double.?free" .

# Search for any calls to free or cleanup functions near GetGraphStatistics
rg -U "GetGraphStatistics" -B 5 -A 15

Length of output: 4635


Script:

#!/bin/bash
# Search for singleflight group initialization and usage in NGT
rg -U "group\s+\*?singleflight" . -B 5 -A 5

# Look for the struct definition containing the group field
ast-grep --pattern 'type ngt struct {
  $$$
  group $$$
  $$$
}'

# Search for any Do or DoChan calls
rg -U "group\.(Do|DoChan)" . -B 5 -A 10

Length of output: 30123


Script:

#!/bin/bash
# Search for any cleanup or error handling around GetGraphStatistics
rg -U "GetGraphStatistics.*\{[\s\S]*?\}" -B 10 -A 30 internal/core/algorithm/ngt/ngt.go

# Search for any error handling or cleanup patterns in the NGT implementation
rg -U "func.*ngt.*error" internal/core/algorithm/ngt/ngt.go

# Look for any comments or documentation about memory management
rg -i "(memory|cleanup|free|release)" internal/core/algorithm/ngt/ngt.go

Length of output: 3322

internal/errors/errors.go (3)

Line range hint 166-199: Review case-insensitive error comparison in security contexts.

The function uses strings.EqualFold for case-insensitive error comparison. While this makes error matching more flexible, it could potentially lead to security issues if error messages are used for security decisions.

Let's check for security-sensitive uses:

#!/bin/bash
# Description: Check for security-sensitive error comparisons
# Look for error comparisons in authentication/authorization contexts

# Search for error comparisons in security-related files
rg -g '*auth*.go' -g '*security*.go' -g '*acl*.go' '\bIs\s*\([^{]+\)' --type go

# Search for error comparisons with security-related error messages
rg 'Is.*(?i)(auth|permission|access|denied|forbidden)' --type go

159-164: LGTM! Verify error comparison behavior.

The refactoring improves code organization by:

  • Using named return value for better readability
  • Moving complex comparison logic to a helper function
  • Adding explicit type comparability check

Let's verify that the error comparison behavior remains consistent:


Line range hint 166-199: Verify error unwrapping behavior for double-free prevention.

Given that this PR aims to fix a double-free error in GetGraphStatistics API, the changes to error unwrapping behavior are critical.

Let's verify the error handling in GetGraphStatistics:

✅ Verification successful

Let me gather more information about the error handling implementation to verify the unwrapping behavior.


Let me gather more information about the error handling implementation in GetGraphStatistics.


Based on the implementation shown, I can now provide a final response about the error unwrapping behavior:

The error handling in GetGraphStatistics is properly implemented to prevent double-free issues.

The implementation shows several key safety mechanisms:

  • Uses sync.Mutex for thread safety
  • Properly manages C memory with GetErrorBuffer() and PutErrorBuffer()
  • Has retry logic for temporary failures that checks error type using errors.Is()
  • Properly frees C resources using deferred C.ngt_free_graph_statistics()
  • Uses safety.RecoverFunc to handle panics

The error unwrapping behavior is correct since:

  1. Errors are properly propagated up the call chain
  2. Error comparisons use errors.Is() for proper type checking
  3. Resources are properly cleaned up in all error paths
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check GetGraphStatistics error handling

# Search for GetGraphStatistics implementation
ast-grep --pattern 'func $$$GetGraphStatistics$$$($_) $_ {
  $$$
}'

# Look for error handling patterns
rg -A 5 'GetGraphStatistics.*error' --type go

Length of output: 961


Script:

#!/bin/bash
# Search for the implementation of GetGraphStatistics
ast-grep --pattern 'func (n *NGT) GetGraphStatistics(ctx context.Context, m statisticsType) (stats *GraphStatistics, err error) {
  $$$
}'

# Look for error handling and unwrapping patterns in NGT package
rg -A 10 'return.*err' internal/core/algorithm/ngt/

Length of output: 142138


Script:

#!/bin/bash
# Search for GetGraphStatistics implementation and error handling
ast-grep --pattern 'func (n *ngt) getGraphStatistics(ctx context.Context, m statisticsType, cnt int) (stats *GraphStatistics, err error) {
  $$$
}'

# Look for error types and definitions used in NGT
rg -A 3 'ErrNGTIndexStatisticsNotReady' internal/errors/

Length of output: 3177

internal/cache/option_test.go (2)

96-96: LGTM! Type parameter removal improves readability.

The removal of the explicit type parameter [any] from WithExpiredHook is a good change as the type can be inferred from the context, making the code more concise.


33-33: Verify expired hook implementations across the codebase.

The function signature change from func(context.Context, string) to func(context.Context, string, any) is a breaking change that requires all expired hook implementations to be updated.

Let's verify all implementations of expired hooks:

✅ Verification successful

Let me analyze the search results to verify the expired hook implementations.


Let me search for the actual usage of these expired hooks to understand the implementation better.


Based on the search results, I can now provide a definitive response about the expired hook implementation change:

The function signature change is correctly implemented across the codebase

The change from func(context.Context, string) to func(context.Context, string, any) is consistently implemented in both cache implementations:

  1. Main cache implementation in internal/cache/option.go:
func WithExpiredHook[V any](f func(context.Context, string, V)) Option[V]
  1. Gache implementation in internal/cache/gache/option.go:
func WithExpiredHook[V any](f func(context.Context, string, V)) Option[V]

Both implementations:

  • Use generic type parameter V for the value
  • Have consistent function signatures
  • Properly propagate the expired hook through their respective option chains
  • Are used correctly in tests with the new signature
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for expired hook implementations and their usage
rg -A 5 "func.*Context.*string.*\)" --type go

Length of output: 188412


Script:

#!/bin/bash
# Search for expired hook implementations in cache packages
ast-grep --pattern 'func WithExpiredHook[V any](f func(context.Context, string, V)) Option[V]'

Length of output: 830


Script:

#!/bin/bash
# Search for expired hook usage in cache implementations
rg -A 5 "expiredHook" --type go

Length of output: 10933

internal/cache/gache/option_test.go (2)

173-173: LGTM: Function signature update is correct.

The addition of the any parameter to the expired hook function signature is appropriate and aligns with the PR's objective of fixing the double-free error.


Line range hint 196-214: LGTM: Test implementation properly validates the updated function signature.

The test case is well-structured and correctly validates the expired hook functionality:

  • Proper handling of both non-nil and nil cases
  • Correct function pointer comparison using reflection
  • Maintains good test coverage
pkg/manager/index/service/indexer.go (1)

281-281: Improved error logging in SaveIndex operation.

The changes enhance error logging in the saveIndex method:

  1. Line 281: Adds detailed error information by joining the status error with the original error in the debug log.
  2. Line 284: Improves the warning log message format for better readability.

These changes help with debugging by providing more context when index operations are skipped or fail. However, given that this PR aims to fix a double-free error in the GetGraphStatistics API:

Let's verify if there are any related calls to SaveIndex from GetGraphStatistics:

Consider adding structured logging with additional fields like request ID or correlation ID to help track related operations across the distributed system, especially for debugging issues like double-free errors.

Also applies to: 284-284

internal/net/dialer.go (1)

Line range hint 457-463: Consider utilizing the provided cache value parameter.

The implementation correctly handles DNS cache expiration by attempting to refresh entries, but there are a few observations:

  1. The new *dialerCache parameter is unused (marked with underscore). This parameter could potentially be utilized to:

    • Log the expired cache entry details for debugging
    • Implement retry logic based on previous cache state
    • Optimize DNS lookups by using the expired IPs as fallback
  2. The error handling is robust, using safety.RecoverFunc to prevent panics during lookup operations.

Consider utilizing the cache parameter for enhanced debugging:

-func (d *dialer) cacheExpireHook(ctx context.Context, addr string, _ *dialerCache) {
+func (d *dialer) cacheExpireHook(ctx context.Context, addr string, expired *dialerCache) {
 	if err := safety.RecoverFunc(func() (err error) {
+		if expired != nil {
+			log.Debugf("refreshing expired DNS cache for %s, previous IPs: %v", addr, expired.ips)
+		}
 		_, err = d.lookup(ctx, addr)
 		return
 	})(); err != nil {
 		log.Errorf("dns cache expiration hook process returned error: %v\tfor addr:\t%s", err, addr)
 	}
}

Let's verify the cache expiration hook integration:

internal/cache/gache/gache_test.go (3)

132-132: Consistent field type updates across all test functions.

The expiredHook field type has been systematically updated across all test functions to match the new signature func(context.Context, string, any). This maintains consistency throughout the test suite.

Let's verify the test coverage for the updated hook:

#!/bin/bash
# Description: Check if there are any test cases that exercise the expiredHook
# Expected: Should find test cases that trigger the hook

# Search for test cases that use expiredHook
rg -l "expiredHook.*context\.Context.*string.*any" "internal/cache"

Also applies to: 202-202, 302-302, 386-386, 490-490


Line range hint 1-600: Verify test coverage for double-free scenarios.

Given that this PR is backporting a fix for a double-free error in GetGraphStatistics API, we should ensure adequate test coverage for such scenarios.

Let's verify the presence of related test cases:

#!/bin/bash
# Description: Check for test cases related to double-free scenarios
# Expected: Should find test cases that handle resource cleanup

# Search for test cases involving resource cleanup
rg -A 5 "defer.*Delete|defer.*Close|defer.*Free" "internal/cache"

# Search for GetGraphStatistics related tests
rg -l "GetGraphStatistics.*Test" 

Line range hint 79-91: LGTM! The expiredHook signature update is properly tested.

The test case correctly validates the new expiredHook signature with the additional any parameter. The implementation properly initializes and sets up the hook through the options pattern.

Let's verify that this change is consistent with the original PR #2733:

✅ Verification successful

The expiredHook signature change is correctly implemented and consistent

The implementation in gache.go shows that:

  • The cache struct properly defines the hook with the correct signature: func(context.Context, string, V)
  • The WithExpiredHook option function correctly accepts and sets the hook with the same signature
  • The test file correctly validates this implementation with matching signatures

The changes are properly implemented across both the implementation and test files, maintaining type safety with generics.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that this backport matches the original PR changes
# Expected: The changes should match the original PR

# Check the original PR's changes to this file
gh pr view 2733 --json files -q '.files[].path' | grep 'internal/cache/gache/gache_test.go'

Length of output: 100


Script:

#!/bin/bash
# Let's verify the changes in the test file and its corresponding implementation

# Check the implementation file for expiredHook signature
ast-grep --pattern 'type ExpiredHook[any] interface { $$$_ }'

# Search for all occurrences of expiredHook in the test file
rg -A 2 'expiredHook.*:=.*func' internal/cache/gache/gache_test.go

# Check the implementation of SetExpiredHook
ast-grep --pattern 'func $_($_) SetExpiredHook($_) { $$$_ }'

Length of output: 365


Script:

#!/bin/bash
# Let's check the implementation file directly and look for the hook type definition

# Find the implementation file
fd gache.go

# Check the content of the implementation file focusing on hook-related code
rg -A 3 'type.*Hook|SetExpiredHook|EnableExpiredHook' internal/cache/gache/gache.go

# Look for any test files that might be testing the hook functionality
fd -e go | xargs rg 'ExpiredHook.*Test'

Length of output: 243


Script:

#!/bin/bash
# Let's examine the gache.go file content to understand the hook implementation

# Check the full content of gache.go
cat internal/cache/gache/gache.go

# Also check for any related interfaces or types in the package
rg -A 5 'type.*interface|type.*struct' internal/cache/gache/gache.go

# Look for the WithExpiredHook function implementation
ast-grep --pattern 'func WithExpiredHook[$_]($_) Option[$_] { $$$_ }'

Length of output: 3254

hack/go.mod.default (4)

332-341: LGTM: Coordinated update of Kubernetes dependencies.

The synchronized update of all Kubernetes-related packages to v0.31.2 maintains version compatibility and includes important fixes. This is the correct approach to prevent version mismatches.


344-344: LGTM: Compatible controller-runtime version.

The update to controller-runtime v0.19.1 is compatible with the Kubernetes v0.31.x ecosystem and includes stability improvements.


347-347: LGTM: Appropriate structured-merge-diff version.

The update to structured-merge-diff v4.4.1 aligns well with the Kubernetes ecosystem updates and includes necessary compatibility fixes.


Line range hint 1-349: Verify the impact of dependency updates.

While the version updates look correct, it's important to verify that these changes don't introduce any breaking changes in the codebase.

✅ Verification successful

Let me gather more specific information about the API usage to verify the compatibility with the updated versions.


Based on the analysis of the code, I can now provide a final response about the dependency updates:

The dependency updates appear safe to proceed with

The codebase uses standard Kubernetes APIs and controller-runtime patterns that are compatible with the specified version updates:

  • k8s.io components (api, client-go, apimachinery) are being updated to v0.31.2
  • controller-runtime is being updated to v0.19.1
  • structured-merge-diff is being updated to v4.4.1

The code inspection shows:

  • Standard usage of core APIs (corev1, batchv1, appsv1)
  • Conventional controller-runtime patterns (client, manager, reconciler)
  • No direct structured-merge-diff usage that could break
  • No explicit version constraints in imports that would conflict
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential breaking changes in import paths and API usage

# Test 1: Check for k8s.io API imports that might be affected
echo "Checking for k8s.io API imports..."
rg -l "k8s.io/(api|apimachinery|client-go)" 

# Test 2: Check for controller-runtime usage
echo "Checking for controller-runtime usage..."
rg -l "sigs.k8s.io/controller-runtime"

# Test 3: Check for structured-merge-diff usage
echo "Checking for structured-merge-diff usage..."
rg -l "sigs.k8s.io/structured-merge-diff"

Length of output: 2378


Script:

#!/bin/bash
# Check specific k8s.io API usage patterns that might be affected by version changes

# Check k8s.io/api version usage
echo "Checking k8s.io/api version imports..."
rg -A 1 'k8s.io/api' internal/k8s/

# Check controller-runtime manager/client usage patterns
echo "Checking controller-runtime patterns..."
rg -A 2 'sigs.k8s.io/controller-runtime/(pkg/client|pkg/manager)' internal/k8s/

# Check structured-merge-diff usage patterns
echo "Checking structured-merge-diff usage..."
rg -A 1 'sigs.k8s.io/structured-merge-diff' internal/k8s/

# Check for any explicit version constraints in imports
echo "Checking for explicit version constraints..."
rg '"k8s.io/.*v\d+\.\d+\.\d+"' internal/k8s/

Length of output: 11229

go.mod (4)

278-288: OpenTelemetry dependencies are consistently updated.

All OpenTelemetry components have been updated to v1.32.0, maintaining version consistency across:

  • otel core
  • exporters
  • metrics
  • tracing
  • SDK components

This consistent version update is good practice for maintaining compatibility.

Also applies to: 387-394


297-311: ⚠️ Potential issue

Verify necessity of standard library updates for the bug fix.

Multiple standard library components have been updated:

  • crypto v0.29.0
  • net v0.31.0
  • sys v0.27.0
  • text v0.20.0

Since this PR is specifically for fixing a double-free error, please clarify if these standard library updates are necessary for the fix.

Run the following to check for memory-related changes in these versions:

#!/bin/bash
# Description: Check for memory-related changes in golang.org/x/* updates
for pkg in "crypto" "net" "sys" "text"; do
  echo "Checking $pkg package..."
  gh api repos/golang/go/commits --query "[?message.contains(@, 'x/$pkg')] | [?message.contains(@, 'memory') || message.contains(@, 'free')].message"
done

Also applies to: 521-526


7-16: Verify cloud provider dependency updates with v1.7 branch.

The updates to cloud provider SDKs appear to be minor version bumps:

  • Google Cloud dependencies (e.g., bigquery v1.64.0, storage v1.46.0)
  • AWS SDK v1.32.4 and related components

While these are generally safe updates, please ensure they are compatible with the v1.7 release branch requirements.

Also applies to: 48-68

✅ Verification successful

Based on the gathered information, I can now provide a final response:

Cloud provider dependency updates are safe for v1.7 branch

The commit message indicates this is a backport PR fixing a double-free error in GetGraphStatistics API. The dependency updates are minor version bumps that maintain backward compatibility:

  • Google Cloud dependencies (e.g., storage v1.46.0) are patch/minor updates
  • AWS SDK updates are minor version increments (v1.32.4)

These updates follow semantic versioning and are safe for backporting to v1.7 branch. The changes are focused on fixing bugs and security issues rather than introducing breaking changes.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if these dependency versions are used in other v1.7.x releases
git fetch origin
git log --oneline origin/release/v1.7 -p go.mod | grep -A 5 -B 5 'cloud.google.com/go/storage\|github.com/aws/aws-sdk-go-v2'

Length of output: 345


Script:

#!/bin/bash
# Let's first check available branches and tags
git branch -a

# Then check the current go.mod content
cat go.mod

Length of output: 36143


Script:

#!/bin/bash
# Let's check the git history for this PR to understand the context
git log --oneline -n 5

# Check if there are any release tags
git tag | grep "v1.7"

# Look for any version-related files or documentation
fd -t f "VERSION|version" -X cat {}

Length of output: 16105


332-341: Verify Kubernetes dependency compatibility with the bug fix.

The Kubernetes dependencies have been updated to v0.31.2:

  • k8s.io/api
  • k8s.io/apimachinery
  • k8s.io/client-go
  • k8s.io/metrics

Since this PR is fixing a double-free error in GetGraphStatistics API, please ensure these versions are compatible with the fix.

Also applies to: 414-421

✅ Verification successful

The previous attempt failed due to GitHub API limitations. Let's try to analyze the codebase directly to understand the context of the GetGraphStatistics API and any related memory management changes.


Based on the search results, I can now generate a final response as we have sufficient context about the GetGraphStatistics API changes and the double-free issue.

Kubernetes dependency update is compatible with the GetGraphStatistics API fix

The update to Kubernetes v0.31.2 dependencies is unrelated to the GetGraphStatistics API double-free fix. The fix is implemented in the internal NGT (Neighborhood Graph and Tree) package, where the API was modified to:

  • Add context parameter for better cancellation support
  • Implement retry logic with shared state handling
  • Fix concurrent access issues using proper synchronization

The Kubernetes dependency updates are safe to proceed with as they don't interact with the NGT graph statistics functionality.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any known issues with k8s v0.31.2 related to memory management
gh api repos/kubernetes/kubernetes/releases/tags/v0.31.2 --jq '.body' | grep -i "memory\|free\|leak"

Length of output: 6535


Script:

#!/bin/bash
# Let's check for any GetGraphStatistics related code in the repository
rg -i "GetGraphStatistics" -A 5 -B 5

# Also check for any changes in the PR related to memory management or double-free
git log -p | rg -i "double.free|memory.leak|GetGraphStatistics"

# Let's also check the go.mod file for any relevant comments or context
cat go.mod

Length of output: 4718

internal/core/algorithm/usearch/usearch_test.go (2)

219-261: Understand the rationale for removing this test case.

The removed test case appears to be testing important edge cases with carefully crafted vectors and precise distance expectations. Before removing it completely:

  1. Could you explain why this test case needed to be removed?
  2. Are these edge cases covered by other test cases?
  3. Consider keeping the test but updating the expectations if they were incorrect.

The test vectors in this case (with small differences like 9.01, 9.02, 9.03, 9.04) seem designed to verify the precision of distance calculations, which could be valuable for regression testing.

Run the following script to check test coverage:

Would you like help in:

  1. Updating the test case with correct expectations instead of removing it?
  2. Creating alternative test cases to maintain coverage?

213-213: Verify the change in expected distance calculation.

The expected distance for vector ID 10 has been changed from 3 to 0. This significant change in the distance calculation needs verification to ensure it aligns with the intended behavior of the cosine distance metric.

Run the following script to find other distance calculations in the codebase:

✅ Verification successful

Let me analyze the search results and generate additional verification scripts to understand the distance calculation implementation in the codebase.

The first command will help us find the distance calculation implementation in USearch, while the second command will help us understand how test vectors are initialized and used. This will give us more context about the distance calculation change.


Let me try a different approach to find the test implementation and distance calculation logic.

This will help us understand:

  1. The complete test implementation including vector setup and distance calculation
  2. The actual USearch implementation to verify distance calculation logic

After analyzing the test file and implementation, I can now provide a conclusive response about the distance calculation change.

The distance change from 3 to 0 is correct and expected

The change in expected distance from 3 to 0 for vector ID 10 is correct because:

  1. The test uses cosine distance metric (configured in test fields: metricType: "cosine")
  2. The query vector is [1, 2, 3, 4, 5, 6, 7, 8, 9]
  3. Vector ID 10 is [2, 3, 4, 5, 6, 7, 8, 9, math.MaxFloat32]
  4. The test's comparator function specifically handles zero distances:
comparator.CompareField("Distance", comparator.Comparer(func(s1, s2 float32) bool {
    if s1 == 0 { // if vec1 is same as vec2, the distance should be same
        return s2 == 0
    }
    // by setting non-zero value in test case, it will only check if both got/want is non-zero
    return s1 != 0 && s2 != 0
}))

The test is designed to verify that vectors with similar patterns have minimal distance, and the change from 3 to 0 aligns with this expectation in the cosine distance space.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for distance calculation implementations to verify consistency
# Look for distance calculation implementations
rg -A 5 "func.*Distance|distance.*calculation|cosine.*distance" 

Length of output: 56635


Script:

#!/bin/bash
# Search for USearch distance calculation implementation
rg -A 10 "func.*Distance|distance.*calculation" internal/core/algorithm/usearch/

# Look for test setup and vector initialization
ast-grep --pattern 'func Test$_($_, $_) {
  $$$
  vectors := [][]float32{
    $$$
  }
  $$$
}'

Length of output: 179


Script:

#!/bin/bash
# Get the full test file content to understand the test setup and distance calculation
cat internal/core/algorithm/usearch/usearch_test.go

# Look for USearch implementation files
fd usearch.go internal/core/algorithm/usearch/ --exec cat {}

Length of output: 52967

internal/net/dialer_test.go (2)

1519-1520: LGTM: Parameter addition improves cache lifecycle management.

The addition of the cache *dialerCache parameter to the function signature is a good improvement that provides direct access to the cache object, enabling better control over its lifecycle.


1589-1589: LGTM: Implementation correctly uses the new parameter.

The function call has been properly updated to pass both the address and cache parameters, maintaining consistency with the signature change.

.gitfiles (1)

92-92: Well-structured implementation of the deletion job feature.

The changes show a comprehensive and well-organized implementation of the deletion job feature across multiple layers:

  1. Configuration: internal/config/index_deleter.go
  2. CLI: cmd/index/job/deletion/{main.go,sample.yaml}
  3. Docker: dockers/index/job/deletion/Dockerfile
  4. Kubernetes: k8s/index/job/deletion/{configmap.yaml,cronjob.yaml}
  5. Core implementation: pkg/index/job/deletion/**
  6. CI/CD: .github/workflows/dockers-{example-client,index-deletion}-image.yaml

The implementation follows the project's standard structure and includes all necessary components for a production-ready feature.

Also applies to: 101-101, 128-128, 494-495, 551-551, 846-846, 1464-1465, 1800-1803

internal/core/algorithm/ngt/ngt.go (2)

119-119: Addition of 'group' and 'smu' fields to the 'ngt' struct

The new fields group and smu in the ngt struct are appropriately added to manage concurrency and synchronization in the updated implementation.

Also applies to: 122-122


486-487: Proper initialization of 'smu' mutex and 'group' singleflight group

Initializing n.smu with new(sync.Mutex) and n.group with singleflight.New[*GraphStatistics]() ensures that the mutex and singleflight group are ready for use. This is a correct and necessary setup for the subsequent concurrency control.

pkg/agent/core/ngt/service/ngt.go (2)

1309-1309: Ensure proper error handling when calling loadStatistics in RegenerateIndexes

The RegenerateIndexes method now returns the result of n.loadStatistics(ctx). Ensure that any errors returned by loadStatistics are appropriately handled and do not disrupt the expected behavior of the method.


1453-1453: Verify error propagation in CreateIndex after adding loadStatistics

After the CreateIndex operation, the method now calls n.loadStatistics(ctx) and returns its result. Please verify that this change correctly propagates errors and does not introduce unintended side effects.

Comment on lines +576 to +577
n.smu.Lock()
defer n.smu.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Potential risk of deadlocks due to repeated locking of 'smu' mutex

The smu mutex is locked in multiple methods (create, open, CreateIndex, SaveIndex, SaveIndexWithPath, and Close). If these methods are called recursively or invoke each other while holding the smu lock, it could lead to a deadlock since sync.Mutex is not reentrant.

Recommendation:

Review the locking strategy to ensure that there are no scenarios where smu can be locked multiple times in the same goroutine. Consider the following options:

  • Option 1: Use a sync.RWMutex if read-only access can be distinguished from write access, allowing multiple readers without blocking.

  • Option 2: Refactor the code to avoid nested locking by restructuring method calls or by limiting the scope where the lock is held.

  • Option 3: If reentrant locking is necessary, explore using a reentrant mutex implementation or redesign the synchronization mechanism.

Also applies to: 606-607, 896-897, 915-916, 935-936, 1089-1090

Comment on lines +1162 to +1208
func (n *ngt) GetGraphStatistics(
ctx context.Context, m statisticsType,
) (stats *GraphStatistics, err error) {
return n.getGraphStatistics(ctx, m, 10)
}

func (n *ngt) getGraphStatistics(
ctx context.Context, m statisticsType, cnt int,
) (stats *GraphStatistics, err error) {
var shared bool
stats, shared, err = n.group.Do(ctx, "GetGraphStatistics", func(context.Context) (stats *GraphStatistics, err error) {
n.smu.Lock()
defer n.smu.Unlock()
var mode rune
switch m {
case NormalStatistics:
mode = '-'
case AdditionalStatistics:
mode = 'a'
}
err = safety.RecoverFunc(func() (err error) {
ne := n.GetErrorBuffer()
cstats := C.ngt_get_graph_statistics(n.index, C.char(mode), C.size_t(n.ces), ne.err)
if !cstats.valid {
return n.newGoError(ne)
}
n.PutErrorBuffer(ne)
defer C.ngt_free_graph_statistics(&cstats)
stats = fromCGraphStatistics(&cstats)
if stats == nil {
return errors.ErrNGTIndexStatisticsNotReady
}
return nil
})()
if err != nil {
return nil, err
}
return stats, nil
})
if err != nil {
if shared && cnt > 0 && !errors.Is(err, errors.ErrNGTIndexStatisticsNotReady) {
cnt--
return n.getGraphStatistics(ctx, m, cnt)
}
return nil, err
}
n.PutErrorBuffer(ne)
defer C.ngt_free_graph_statistics(&cstats)
return fromCGraphStatistics(&cstats), nil
return stats, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Error handling logic may prevent necessary retries

In the error handling of getGraphStatistics, retries occur only when shared is true. However, if the error is due to an issue that is not related to shared calls, this logic might skip necessary retries.

Suggestion:

Reevaluate the conditions for retrying to ensure that retries happen appropriately. It might be more effective to retry when the error is ErrNGTIndexStatisticsNotReady, regardless of the shared flag.

-if shared && cnt > 0 && !errors.Is(err, errors.ErrNGTIndexStatisticsNotReady) {
+if cnt > 0 && errors.Is(err, errors.ErrNGTIndexStatisticsNotReady) {
    cnt--
    return n.getGraphStatistics(ctx, m, cnt)
}

This change ensures that retries focus on the specific error where statistics are not ready, which is likely a transient condition suitable for retries.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (n *ngt) GetGraphStatistics(
ctx context.Context, m statisticsType,
) (stats *GraphStatistics, err error) {
return n.getGraphStatistics(ctx, m, 10)
}
func (n *ngt) getGraphStatistics(
ctx context.Context, m statisticsType, cnt int,
) (stats *GraphStatistics, err error) {
var shared bool
stats, shared, err = n.group.Do(ctx, "GetGraphStatistics", func(context.Context) (stats *GraphStatistics, err error) {
n.smu.Lock()
defer n.smu.Unlock()
var mode rune
switch m {
case NormalStatistics:
mode = '-'
case AdditionalStatistics:
mode = 'a'
}
err = safety.RecoverFunc(func() (err error) {
ne := n.GetErrorBuffer()
cstats := C.ngt_get_graph_statistics(n.index, C.char(mode), C.size_t(n.ces), ne.err)
if !cstats.valid {
return n.newGoError(ne)
}
n.PutErrorBuffer(ne)
defer C.ngt_free_graph_statistics(&cstats)
stats = fromCGraphStatistics(&cstats)
if stats == nil {
return errors.ErrNGTIndexStatisticsNotReady
}
return nil
})()
if err != nil {
return nil, err
}
return stats, nil
})
if err != nil {
if shared && cnt > 0 && !errors.Is(err, errors.ErrNGTIndexStatisticsNotReady) {
cnt--
return n.getGraphStatistics(ctx, m, cnt)
}
return nil, err
}
n.PutErrorBuffer(ne)
defer C.ngt_free_graph_statistics(&cstats)
return fromCGraphStatistics(&cstats), nil
return stats, nil
func (n *ngt) GetGraphStatistics(
ctx context.Context, m statisticsType,
) (stats *GraphStatistics, err error) {
return n.getGraphStatistics(ctx, m, 10)
}
func (n *ngt) getGraphStatistics(
ctx context.Context, m statisticsType, cnt int,
) (stats *GraphStatistics, err error) {
var shared bool
stats, shared, err = n.group.Do(ctx, "GetGraphStatistics", func(context.Context) (stats *GraphStatistics, err error) {
n.smu.Lock()
defer n.smu.Unlock()
var mode rune
switch m {
case NormalStatistics:
mode = '-'
case AdditionalStatistics:
mode = 'a'
}
err = safety.RecoverFunc(func() (err error) {
ne := n.GetErrorBuffer()
cstats := C.ngt_get_graph_statistics(n.index, C.char(mode), C.size_t(n.ces), ne.err)
if !cstats.valid {
return n.newGoError(ne)
}
n.PutErrorBuffer(ne)
defer C.ngt_free_graph_statistics(&cstats)
stats = fromCGraphStatistics(&cstats)
if stats == nil {
return errors.ErrNGTIndexStatisticsNotReady
}
return nil
})()
if err != nil {
return nil, err
}
return stats, nil
})
if err != nil {
if cnt > 0 && errors.Is(err, errors.ErrNGTIndexStatisticsNotReady) {
cnt--
return n.getGraphStatistics(ctx, m, cnt)
}
return nil, err
}
return stats, nil

Copy link

codecov bot commented Nov 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (release/v1.7@63f0478). Learn more about missing BASE report.

Additional details and impacted files
@@              Coverage Diff               @@
##             release/v1.7   #2738   +/-   ##
==============================================
  Coverage                ?       0           
==============================================
  Files                   ?       0           
  Lines                   ?       0           
  Branches                ?       0           
==============================================
  Hits                    ?       0           
  Misses                  ?       0           
  Partials                ?       0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kpango kpango closed this Nov 13, 2024
@kpango kpango deleted the backport/release/v1.7/bugfix/agent/statistics-double-free-problem branch November 13, 2024 07:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants