feat(package): Enable replica set for the MongoDB results cache and configure it when starting the package. #632

junhaoliao · 2024-12-11T14:38:39Z

Description

Enable replica set for the MongoDB results cache.
Configure replica set in the indices building script.
Change the results cache's Docker network mode to host and specify --bind-ip and --port when starting the mongo image.

Validation performed

Search query performance compassion

With & Without the replica set enabled, performed a search query in the WebUI with a screen recording of 30fps, and extract below times:

T0 (relative 0 second): the query was started.
T1: the first batch of results were available and displayed in the browser.
T2: the total number of results are available (either from the results-metadata or summed from the aggregated time buckets) and displayed in the browser.
T3: the job is marked as done and displayed in the browser.

	T1	T2	T3
Trial 1 (with replica-set)	0.7	1.266667	2.1
Trial 2 (with replica-set)	0.633333	1.433333	2.166667
Trial 3 (with replica-set)	0.433333	1.4	2.1
Trial 4 (w/o replica-set)	0.466667	1.433333	2.1
Trial 5 (w/o replica-set)	0.5	1.433333	2.066667
Trial 6 (w/o replica-set)	0.6	1.433333	2.2

The speed performance does not seem to significantly differ with and without replica-set enabled.

Result cache host switching

Previous attempts to enable replica set failed because there were host reachability issues observed. Below validations ensure the clp-config.yml/results_cache host and port are configurable without affecting the result cache's avaialbility.

build/clp-package/lib/python3/site-packages/clp_py_utils/create-results-cache-indices.py

Built the clp-package.

clp-package/sbin/start-clp.sh with the default config and observed below logs:

2024-12-25T22:45:58.892 INFO [start_clp] Creating results_cache indices...
2024-12-26 03:45:59,548 [DEBUG] Replica set initialization requested for localhost:27017
2024-12-26 03:45:59,551 [DEBUG] Replica set has not been previously initialized.
2024-12-26 03:45:59,551 [DEBUG] Initializing replica set at localhost:27017
2024-12-26 03:45:59,693 [DEBUG] Replica set initialized successfully.
2024-12-25T22:46:00.517 INFO [start_clp] Created results_cache indices.

Changed the result_cache's host and port config in clp-package/etc/clp-config.yml to be localhost:27018. e.g.,

results_cache:
  host: "localhost"
  port: 27018
  db_name: "clp-query-results"
  stream_collection_name: "stream-files"

clp-package/sbin/start-clp.sh and observed below logs:

2024-12-25T22:50:29.309 INFO [start_clp] Creating results_cache indices...
2024-12-26 03:50:29,761 [DEBUG] Replica set initialization requested for localhost:27018
2024-12-26 03:50:30,266 [DEBUG] Replica set is already initialized but reports invalid config.
2024-12-26 03:50:30,266 [DEBUG] Initializing replica set at localhost:27018
2024-12-26 03:50:30,269 [DEBUG] Replica set initialized successfully.
2024-12-25T22:50:31.042 INFO [start_clp] Created results_cache indices.

Used ip addr to find another address that belongs to the clp-package host (i got 172.22.78.194). Changed the result_cache's host and port config in clp-package/etc/clp-config.yml to be 172.22.78.194:27018. clp-package/sbin/start-clp.sh and observed below logs:

2024-12-25T22:54:43.400 INFO [start_clp] Creating results_cache indices...
2024-12-26 03:54:43,919 [DEBUG] Replica set initialization requested for 172.22.78.194:27018
2024-12-26 03:54:44,422 [DEBUG] Replica set is already initialized but reports invalid config.
2024-12-26 03:54:44,422 [DEBUG] Initializing replica set at 172.22.78.194:27018
2024-12-26 03:54:44,427 [DEBUG] Replica set initialized successfully.
2024-12-25T22:54:45.239 INFO [start_clp] Created results_cache indices.

Changed the result_cache's host and port config in clp-package/etc/clp-config.yml to be localhost:27017. clp-package/sbin/start-clp.sh and observed below logs:

2024-12-25T22:57:01.151 INFO [start_clp] Creating results_cache indices...
2024-12-26 03:57:01,619 [DEBUG] Replica set initialization requested for localhost:27017
2024-12-26 03:57:02,122 [DEBUG] Replica set is already initialized but reports invalid config.
2024-12-26 03:57:02,122 [DEBUG] Initializing replica set at localhost:27017
2024-12-26 03:57:02,125 [DEBUG] Replica set initialized successfully.
2024-12-25T22:57:02.894 INFO [start_clp] Created results_cache indices.

To see the specific tasks where the Asana app for GitHub is being used, see below:
- https://app.asana.com/0/0/1208964046828607

Summary by CodeRabbit

Configuration Changes
- Updated MongoDB configuration to support replica sets
- Removed explicit IP binding settings
- Added replica set name configuration
MongoDB Management
- Introduced new functions for managing MongoDB replica set initialization and status checking
Deployment Updates
- Modified container startup scripts for results cache and reducer
- Updated network configuration settings for containers

…en starting the package.

coderabbitai · 2024-12-11T14:38:47Z

Warning

Rate limit exceeded

@junhaoliao has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 17 minutes and 17 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 69ea477 and 7dc8c55.

📒 Files selected for processing (2)

components/clp-package-utils/clp_package_utils/scripts/start_clp.py (3 hunks)
components/clp-py-utils/clp_py_utils/initialize-results-cache.py (1 hunks)

Walkthrough

The pull request introduces modifications to support MongoDB replica set initialization in a Python utility script and updates the MongoDB configuration file. The changes enable automatic replica set configuration during the index creation process for a stream collection. The script now checks the replica set status and initializes it if not already configured, ensuring proper MongoDB replication setup.

Changes

File	Change Summary
`components/clp-py-utils/clp_py_utils/create-results-cache-indices.py`	Added functions `check_replica_set_status()`, `init_replica_set_for_oplog()`, and `init_replica_set_for_oplog_if_needed()` for MongoDB replica set management. Updated `main()` to call `init_replica_set_for_oplog_if_needed()` after establishing a connection. Imported `OperationFailure` and `urlparse`.
`components/package-template/src/etc/mongo/mongod.conf`	Removed `net.bindIp` parameter and added `replication.replSetName` set to "rs0" to enable replica set support.
`components/clp-package-utils/clp_package_utils/scripts/start_clp.py`	Modified `start_results_cache` and `start_reducer` functions to directly append network settings (`--bind_ip` and `--port`) in the MongoDB container start command. Added `--network` option for the reducer container.

Sequence Diagram

sequenceDiagram
    participant Script as Create Results Cache Indices Script
    participant MongoDB as MongoDB Client
    
    Script->>MongoDB: Establish Connection
    Script->>MongoDB: Check Replica Set Status
    alt Replica Set Not Initialized
        Script->>MongoDB: Initialize Replica Set
        MongoDB-->>Script: Replica Set Configured
    else Replica Set Already Initialized
        Script-->>MongoDB: Continue Processing
    end
    Script->>MongoDB: Create Indexes

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

junhaoliao · 2024-12-26T03:10:40Z

An issue is found when in clp-config.yml I configure a non-27017 port on host localhost for the replica set:

pymongo.errors.OperationFailure: No host described in new configuration with {version: 36921, term: -1} for replica set rs0 maps to this node, full error: {'ok': 0.0, 'errmsg': 'No host described in new configuration with {version: 36921, term: -1} for replica set rs0 maps to this node', 'code': 74, 'codeName': 'NodeNotFound'}

However, such issue is only observed on localhost hosts. If the host is not localhost / 127.0.0.1 (e.g., my WSL's virtual eth0 interface gets assigned 172.22.78.194), with any port the replica set initialization would succeed and there is no issue connecting to the clp-configured mongo address.

By looking into the community Mongo server source code, it was found they have two ways to check whether a node is up before the server configures itself as a replica node: isSelfFastPath and isSelfSlowPath.

isSelfFastPath is purely logic based as it tries to check identity of the server's address with a given config's host and port. It first checks if the port matches then it checks for host.
isSelfSlowPath handles the case that the fast path cannot find a match of the server's address in any of the configs. It tries to make a connection with the given config host and port and issue a command to check for identity.

Why `localhost`: `any non-27017 port` fails

Since our mongo server instance runs within a Docker container with only the docker's localhost's port 27017 mapped to the Docker host, if we request replica set to be configured on localhost:27017, the fast path check should succeed; if we request for localhost: any non-27017 port, the fast path wouldn't find a match because the server is started with 27017, and the slow path wouldn't find a match because in the Docker container's own network, the Docker host's port (configured by clp-config and used in the replica set config) is not reachable.

Why `any non-localhost host`: `any port` works (e.g., `172.22.78.194:27018`)

The fast path would fail but the slow path would succeed as Docker knows non-localhost is outside of its network and Docker routes the connection to the Docker host.

Proposed solution

Create a docker network and put all services (docker containers) that use Mongo into the same network.
docker run the result cache with --network "host" so that the mongo Docker container sees / exposes everything on the Docker host.

Considering existing CLP utilities, Option 2. seems the quickest way. @haiqi96 may comment on the impacts on CLP cloud.

junhaoliao · 2024-12-26T03:16:47Z

You may also see from the {version: 36921, term: -1} in the error message that they may have some uninitialized memory issue, lol

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

components/clp-py-utils/clp_py_utils/create-results-cache-indices.py (2)

19-33: Consider parametrising the replica set host and port.

Although this initialisation flow works well for the default port (27017) on localhost, it may not be sufficient if the user wants to set up a replica set on a non-standard port or host. Passing in the host and port as parameters will improve flexibility and will align with the PR’s objective of making replica set configuration more adaptable.

47-49: Avoid creating two separate MongoClient connections.

Re-using the same MongoClient after invoking initialize_replica_set can be more efficient and simpler. If you do require a new client connection for index creation, consider adding a short comment explaining why a fresh connection is needed, so it is clear to maintainers.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6dd8fc1 and fb4516e.

📒 Files selected for processing (2)

components/clp-py-utils/clp_py_utils/create-results-cache-indices.py (3 hunks)
components/package-template/src/etc/mongo/mongod.conf (1 hunks)

🔇 Additional comments (2)

components/clp-py-utils/clp_py_utils/create-results-cache-indices.py (1)

6-6: Good use of the OperationFailure import.

This import from pymongo.errors is essential for handling replica set status checks and ensures that any non-initialised replica set is properly caught and configured.

components/package-template/src/etc/mongo/mongod.conf (1)

3-4: Replica set configuration approved.

Enabling the replica set by specifying replSetName is a straightforward and effective strategy. If you later find a need to run multiple replica sets on the same machine, you may consider making rs0 a configurable parameter in the environment or in a generated config file.

…arting `mongo`.

haiqi96 · 2025-01-09T15:55:44Z

Quick question: for Result cache host switching validation, I assumed you also ensured the search in webui is still working with different IPs and ports?

haiqi96 · 2025-01-09T16:48:22Z

Discussed with Junhao offline and clarified the two points:

The current replica set only adds one member, so it doesn't actually provide redundancy.
The purpose of replica set is not support redundancy, but to enable oplog that supports watch(), which replaces polling to check updates in the collection.

Imo, we should make these two points clear in the code. We have agreed on renaming some replica related functions and adding docstrings so developer who only reads code can also have the context above.

haiqi96 · 2025-01-09T16:49:15Z

components/clp-package-utils/clp_package_utils/scripts/start_clp.py

@@ -491,6 +491,7 @@ def start_results_cache(instance_id: str, clp_config: CLPConfig, conf_dir: pathl
    cmd = [
        "docker", "run",
        "-d",
+        "--network", "host",


Just curious, do you know exactly why we need --network host? is there any IP other than results_cache.port that will be accessed?

Using --network host with MongoDB in Docker ensures proper replica set configuration because Docker's network containerization interferes with MongoDB's isSelf checks when using non-localhost-&-27017 ports. By exposing the host network to the MongoDB container, we can avoid this issue.

More details of the investigation can be found at #632 (comment) Let me know if I can elaborate on any part

junhaoliao · 2025-01-11T03:13:37Z

components/clp-py-utils/clp_py_utils/create-results-cache-indices.py

shall we also consider renaming this script as initialize-results-cache.py to better reflect its updated purpose?

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

components/clp-py-utils/clp_py_utils/create-results-cache-indices.py (3)
1-1: Consider renaming the script to better reflect its purpose.

As suggested in the previous review, renaming this script to initialize-results-cache.py would better represent its expanded functionality, which now includes both index creation and replica set initialization.

20-48: Well-structured replica set status checking with comprehensive error handling.

The function properly handles different MongoDB error codes and provides appropriate logging for each case:

Code 94: Not yet initialized

Code 93: Invalid replica set configuration

Other codes: Re-raised for proper error propagation

Consider adding docstring documentation:
 def check_replica_set_status(client, netloc):
+    """
+    Check the status of MongoDB replica set.
+    
+    Args:
+        client: MongoDB client instance
+        netloc: Network location string (host:port)
+    
+    Returns:
+        tuple: (is_already_initialized, should_configure_replica_set)
+        
+    Raises:
+        OperationFailure: For unexpected MongoDB operation errors
+    """
51-69: Robust replica set configuration implementation.

The function handles both initial configuration and reconfiguration scenarios appropriately. The use of force=True for reconfiguration is a good choice to avoid version management complexity.

Consider adding a warning log before force reconfiguration:
     if is_already_initialized:
+        logger.warning("Force reconfiguring existing replica set")
         # Use `force=True` so that we do not have to increment the version
         client.admin.command("replSetReconfig", config, force=True)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fb4516e and 66e6b93.

📒 Files selected for processing (3)

components/clp-package-utils/clp_package_utils/scripts/start_clp.py (2 hunks)
components/clp-py-utils/clp_py_utils/create-results-cache-indices.py (3 hunks)
components/package-template/src/etc/mongo/mongod.conf (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (2)

GitHub Check: lint-check (ubuntu-latest)
GitHub Check: lint-check (macos-latest)

🔇 Additional comments (4)

components/clp-package-utils/clp_package_utils/scripts/start_clp.py (2)

496-496: Network configuration change aligns with MongoDB replica set requirements.

The addition of --network host is necessary for proper replica set initialization in Docker, as explained in the previous discussion. This allows MongoDB to perform its isSelf checks correctly.

508-511: Direct port and IP binding replaces Docker port mapping.

The change from Docker port mapping to direct MongoDB configuration using --bind_ip and --port is more suitable for replica set configuration. This approach:

Avoids network translation issues that could interfere with replica set communication

Provides clearer network configuration that matches the replica set configuration
components/package-template/src/etc/mongo/mongod.conf (1)

1-2: Replica set configuration is properly defined.

The configuration correctly sets up the replica set name as "rs0", which matches the configuration used in the Python initialization code.

Let's verify that this replica set name is consistently used across the codebase:

✅ Verification successful

Replica set configuration is consistent across the codebase

The replica set name "rs0" is correctly configured in mongod.conf and properly referenced in the initialization script. The configuration follows MongoDB best practices with explicit host specifications.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Search for "rs0" references to ensure consistency
rg "rs0" --type yaml --type python
Length of output: 65

Script:
#!/bin/bash
# Search for "rs0" across all files
rg "rs0"

# Find MongoDB-related Python files
fd -e py -e yaml -e conf -e config | rg -i "mongo"
Length of output: 279

Script:
#!/bin/bash
# Check the context of rs0 usage in the indices script
rg -A 5 -B 5 "rs0" components/clp-py-utils/clp_py_utils/create-results-cache-indices.py
Length of output: 462
components/clp-py-utils/clp_py_utils/create-results-cache-indices.py (1)

97-98: Ensure replica set is configured before creating indices.

The code correctly uses directConnection=True for the initial connection to avoid replica set connection attempts before configuration. However, there's a potential race condition between the two client connections.

Consider adding a retry mechanism for the second connection:
         with MongoClient(results_cache_uri, directConnection=True) as results_cache_client:
             configure_replica_set_if_needed(results_cache_client, results_cache_uri)
+            
+        # Wait for replica set to become available
+        max_retries = 30
+        retry_interval = 1
+        for attempt in range(max_retries):
+            try:
+                with MongoClient(results_cache_uri, serverSelectionTimeoutMS=1000) as client:
+                    client.admin.command('ping')
+                    break
+            except Exception as e:
+                if attempt == max_retries - 1:
+                    raise
+                time.sleep(retry_interval)
✅ Verification successful

Add retry mechanism for MongoDB replica set initialization

The review comment is correct. After configuring a replica set, MongoDB needs time to elect a primary and stabilize. The suggested retry mechanism will help handle the transition period properly and prevent connection failures.
🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found

Script:
#!/bin/bash
# Search for configure_replica_set_if_needed implementation
ast-grep --pattern 'def configure_replica_set_if_needed($$$):'

# Look for MongoDB connection patterns and retry mechanisms
rg -A 10 "MongoClient\(" --type py

# Search for retry-related patterns
rg "retry|attempt|max_retries" --type py
Length of output: 6544

Script:
#!/bin/bash
# Search for configure_replica_set_if_needed implementation with content
rg -A 10 "configure_replica_set_if_needed" --type py
Length of output: 2583

…improve readability.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

components/clp-py-utils/clp_py_utils/create-results-cache-indices.py (3)

20-64: Consider adding more detailed logging for replica set status changes.

The logging could be more consistent. Consider adding INFO level logs for important state changes, while keeping DEBUG level for detailed operations.

-            logger.debug("Replica set is already initialized at %s", existing_netloc)
+            logger.info("Replica set is already initialized at %s", existing_netloc)

94-115: Add validation for MongoDB URI components.

The URI validation could be more thorough to prevent issues with invalid configurations.

     parsed_uri = urlparse(uri)
     netloc = parsed_uri.netloc
-    if 0 == len(netloc):
+    if not netloc:
         raise ValueError("Invalid URI: %s", uri)
+    
+    # Validate URI components
+    if not parsed_uri.hostname:
+        raise ValueError("Missing hostname in URI: %s" % uri)
+    if not parsed_uri.port:
+        raise ValueError("Missing port in URI: %s" % uri)

129-131: Consider consolidating MongoDB connections and improving error handling.

The code uses two separate client connections: one for replica set initialization and another for index creation. This could be confusing and potentially inefficient.

-        with MongoClient(results_cache_uri, directConnection=True) as results_cache_client:
-            init_replica_set_for_oplog_if_needed(results_cache_client, results_cache_uri)
-
-        with MongoClient(results_cache_uri) as results_cache_client:
+        # Initialize with direct connection for replica set setup
+        results_cache_client = MongoClient(results_cache_uri, directConnection=True)
+        try:
+            init_replica_set_for_oplog_if_needed(results_cache_client, results_cache_uri)
+            # Reconnect to use replica set
+            results_cache_client = MongoClient(results_cache_uri)
             stream_collection = results_cache_client.get_default_database()[stream_collection_name]
+        except OperationFailure as e:
+            logger.error("Failed to initialize replica set: %s", str(e))
+            raise
+        finally:
+            results_cache_client.close()

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 66e6b93 and 69ea477.

📒 Files selected for processing (1)

components/clp-py-utils/clp_py_utils/create-results-cache-indices.py (3 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (2)

GitHub Check: lint-check (ubuntu-latest)
GitHub Check: lint-check (macos-latest)

🔇 Additional comments (3)

components/clp-py-utils/clp_py_utils/create-results-cache-indices.py (3)

4-7: LGTM! The new imports support the replica set functionality.

The added imports are appropriate for URI parsing and MongoDB error handling.

1-1: Consider renaming the script to better reflect its purpose.

As suggested in previous reviews, the script now handles both index creation and replica set initialization. Consider renaming it to initialize-results-cache.py to better reflect its expanded purpose.

66-92: 🛠️ Refactor suggestion

Add safety checks and clarify single-node nature in function name.

The function name doesn't indicate it's specifically for single-node setup

Using force=True in replSetReconfig could be dangerous without additional checks
-def init_replica_set_for_oplog(client: MongoClient, netloc: str, force: bool):
+def init_single_node_replica_set(client: MongoClient, netloc: str, force: bool):
     """
-    Initializes or reconfigures a single-node MongoDB replica set for enabling the oplog.
+    Initializes or reconfigures a single-node MongoDB replica set.
+    
+    WARNING: This function is specifically designed for single-node replica sets.
+    Using force=True in reconfiguration can be dangerous in multi-node setups.

     :param client:
     :param netloc: The network location of the MongoDB instance to configure
     as a replica set member.
     :param force: Whether this is a reconfiguration operation.
     """
+    if not netloc:
+        raise ValueError("netloc cannot be empty")
Likely invalid or redundant comment.

junhaoliao · 2025-01-11T18:31:07Z

@haiqi96 I added docstrings for the newly added functions in the init script, and renamed the methods & parameters & the init script name, which hopefully has clarified the purpose of those methods. Please help taking a look again.

components/clp-py-utils/clp_py_utils/initialize-results-cache.py

haiqi96

Left a few comments on the docstrings but in general the change looks reasonable to me.

haiqi96 · 2025-01-13T15:36:41Z

components/clp-py-utils/clp_py_utils/initialize-results-cache.py

+def check_replica_set_status(client: MongoClient, netloc: str) -> tuple[bool, bool]:
+    """
+    Checks the current replica set status of the MongoDB server and determines whether it needs to
+    be configured or reconfigured.


components/clp-py-utils/clp_py_utils/initialize-results-cache.py

Co-authored-by: haiqi96 <14502009+haiqi96@users.noreply.github.com>

components/clp-py-utils/clp_py_utils/initialize-results-cache.py

haiqi96

Looks good to me, assuming it is tested and verified.

Enable MongoDB replica set for the results cache and initialize it wh…

fb4516e

…en starting the package.

junhaoliao changed the title ~~WIP - Enable MongoDB replica set for the results cache and initialize it when starting the package.~~ feat(package): Enable MongoDB replica set for the results cache and initialize it when starting the package. Dec 26, 2024

junhaoliao changed the title ~~feat(package): Enable MongoDB replica set for the results cache and initialize it when starting the package.~~ feat(package): Enable MongoDB replica set for the results cache and configure it when starting the package. Dec 26, 2024

junhaoliao marked this pull request as ready for review December 26, 2024 03:59

junhaoliao requested a review from haiqi96 December 26, 2024 03:59

junhaoliao changed the title ~~feat(package): Enable MongoDB replica set for the results cache and configure it when starting the package.~~ feat(package): Enable replica set for the MongoDB results cache and configure it when starting the package. Dec 26, 2024

coderabbitai bot reviewed Dec 26, 2024

View reviewed changes

junhaoliao added 3 commits December 25, 2024 23:02

Enable host network mode and specify --bind_ip and --port when st…

4eaa025

…arting `mongo`.

Add new replica set configuration logic.

aa859e4

Add back empty line.

7298026

haiqi96 reviewed Jan 9, 2025

View reviewed changes

junhaoliao commented Jan 11, 2025

View reviewed changes

Merge branch 'main' into results-cache-replica-set

66e6b93

coderabbitai bot reviewed Jan 11, 2025

View reviewed changes

Add docs; Rename functions & parameters to clarify their purpose and …

69ea477

…improve readability.

coderabbitai bot reviewed Jan 11, 2025

View reviewed changes

junhaoliao added 2 commits January 11, 2025 13:27

Rename results cache initialization script.

ab31bee

Docs nit.

f5ee187

junhaoliao requested a review from haiqi96 January 11, 2025 18:29

haiqi96 reviewed Jan 13, 2025

View reviewed changes

components/clp-py-utils/clp_py_utils/initialize-results-cache.py Outdated Show resolved Hide resolved

haiqi96 reviewed Jan 13, 2025

View reviewed changes

Docs - Apply suggestions from code review

d1a6a9c

Co-authored-by: haiqi96 <14502009+haiqi96@users.noreply.github.com>

junhaoliao commented Jan 18, 2025

View reviewed changes

components/clp-py-utils/clp_py_utils/initialize-results-cache.py Outdated Show resolved Hide resolved

Put "or reconfigured" into parentheses in docs.

46a6a56

Docs - Apply suggestions from code review

7dc8c55

junhaoliao requested a review from haiqi96 January 19, 2025 21:06

haiqi96 approved these changes Jan 19, 2025

View reviewed changes

junhaoliao merged commit 5857bcf into y-scope:main Jan 20, 2025
5 checks passed

junhaoliao mentioned this pull request Jan 24, 2025

CLP package cannot start on Ubuntu Focal due to initialize-results-cache.py. #693

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(package): Enable replica set for the MongoDB results cache and configure it when starting the package. #632

feat(package): Enable replica set for the MongoDB results cache and configure it when starting the package. #632

junhaoliao commented Dec 11, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 11, 2024 •

edited

Loading

Rate limit exceeded

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

junhaoliao commented Dec 26, 2024 •

edited

Loading

junhaoliao commented Dec 26, 2024

coderabbitai bot left a comment

haiqi96 commented Jan 9, 2025

haiqi96 commented Jan 9, 2025

haiqi96 Jan 9, 2025

junhaoliao Jan 9, 2025

junhaoliao Jan 11, 2025

coderabbitai bot left a comment

coderabbitai bot left a comment

junhaoliao commented Jan 11, 2025

haiqi96 left a comment

haiqi96 Jan 13, 2025

haiqi96 left a comment

feat(package): Enable replica set for the MongoDB results cache and configure it when starting the package. #632

feat(package): Enable replica set for the MongoDB results cache and configure it when starting the package. #632

Conversation

junhaoliao commented Dec 11, 2024 • edited by coderabbitai bot Loading

Description

Validation performed

Search query performance compassion

Result cache host switching

Summary by CodeRabbit

coderabbitai bot commented Dec 11, 2024 • edited Loading

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

junhaoliao commented Dec 26, 2024 • edited Loading

Why localhost: any non-27017 port fails

Why any non-localhost host: any port works (e.g., 172.22.78.194:27018)

Proposed solution

junhaoliao commented Dec 26, 2024

coderabbitai bot left a comment

Choose a reason for hiding this comment

haiqi96 commented Jan 9, 2025

haiqi96 commented Jan 9, 2025

haiqi96 Jan 9, 2025

Choose a reason for hiding this comment

junhaoliao Jan 9, 2025

Choose a reason for hiding this comment

junhaoliao Jan 11, 2025

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

junhaoliao commented Jan 11, 2025

haiqi96 left a comment

Choose a reason for hiding this comment

haiqi96 Jan 13, 2025

Choose a reason for hiding this comment

haiqi96 left a comment

Choose a reason for hiding this comment

junhaoliao commented Dec 11, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 11, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

junhaoliao commented Dec 26, 2024 •

edited

Loading

Why `localhost`: `any non-27017 port` fails

Why `any non-localhost host`: `any port` works (e.g., `172.22.78.194:27018`)