-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
metricbeat/module/mongodb: Improve logic to calculate oplog info and window #42224
base: main
Are you sure you want to change the base?
Conversation
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
|
|
So, to test the consumption of cpu and memory resources over the changes made to calculate oplog info/ window: Let's consider 3 cases that we want to compare:
We will use the following script to track docker stats i.e., cpu and memory usage over time for the mongodb replicaset setup where 3 nodes are set: #!/bin/bash
COMPOSE_PROJECT_DIR="$1"
OUTPUT_FILE="$2"
INTERVAL=1
# Set default values
if [ -z "$COMPOSE_PROJECT_DIR" ]; then
COMPOSE_PROJECT_DIR="."
fi
if [ -z "$OUTPUT_FILE" ]; then
OUTPUT_FILE="cpu_memory_usage.log"
fi
cd "$COMPOSE_PROJECT_DIR"
# Clear previous logs
> "$OUTPUT_FILE"
while true; do
echo "----------------------------------------"
echo "Docker Container Usage - $(date)"
echo "----------------------------------------"
docker compose ps -q | while read -r container_id; do
container_name=$(docker inspect --format '{{.Name}}' "$container_id" | sed 's/\///')
stats=$(docker stats --no-stream --format "{{.CPUPerc}}\t{{.MemPerc}}" "$container_id")
cpu_usage=$(echo "$stats" | cut -f1)
mem_usage=$(echo "$stats" | cut -f2)
timestamp=$(date +%s)
echo "$timestamp,$container_name,$cpu_usage,$mem_usage" >> "$OUTPUT_FILE"
done
sleep $INTERVAL
done We will use a custom benchmarking suite: func runBenchmarkModeXXX(client *mongo.Client) {
start := time.Now()
iterations := 10000
workers := 10
var wg sync.WaitGroup
errChan := make(chan error, iterations)
batchSize := iterations / workers
for w := 0; w < workers; w++ {
wg.Add(1)
go func() {
defer wg.Add(-1)
for i := 0; i < batchSize; i++ {
if _, err := getReplicationInfoXXX(client); err != nil {
errChan <- fmt.Errorf("iteration failed: %w", err)
}
}
}()
}
go func() {
wg.Wait()
close(errChan)
}()
// Process errors
for err := range errChan {
log.Printf("Error: %v", err)
}
duration := time.Since(start)
fmt.Printf("Benchmark Results:\n")
fmt.Printf("Total iterations: %d\n", iterations)
fmt.Printf("Total time: %v\n", duration)
fmt.Printf("Average time per operation: %v\n", duration/time.Duration(iterations))
fmt.Printf("Operations per second: %.2f\n", float64(iterations)/duration.Seconds())
} To do the benchmarking and observe the usage. I have taken out the logic from all 3 cases and made them a standalone program so that we benchmark the change in isolation. Notice that cpu peaks at 800%+ and memory peaks at 80%+ which is bad and hence we received so many issues around it that claimed that the this calculation is causing memory spikes. This greatly improved but only the memory consumption. The introduction of aggregation pipeline did a good job. Now memory peaks at 1.6% and cpu still peaks at around 800%. And hence, we were not getting issues reported for memory but now we were getting issues reported for CPU spikes. Case 3: Here, the CPU peaks at 50% that too for a short while and memory peaks at 0.36%, a considerable improvement for both. Please also note that, for Case 1 and Case 2 benchmark did not even complete after a minute in my setup but Case 3 took <10s to do the benchmarking and it reported the final stats. From this, we can surely that Case 3 (current PR changes) does massive improvement in calculating the oplog window. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just one small comment.
Also I'm not sure how i feel about such a verbose code comments (on one hand they are useful, on another hand - they can get out of sync with code very easily in the future which will give a negative effect)
// only need the timestamp (ts) field. FindOne() is used to retrieve a single | ||
// document from the collection (limit: 1). | ||
|
||
ctx := context.TODO() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a proper context with timeout ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, we don't how much time it can take, it varies cluster to cluster. Although, I asked our users with big production cluster to use this and for them it took just a few seconds whereas the previous one was taking 10 mins sometimes too.
So, that's why did not put a timeout here as I do not have a number. It will vary user to user and how big is their oplog.
// https://www.mongodb.com/docs/manual/reference/method/db.collection.stats/#mongodb-method-db.collection.stats | ||
// or use this: https://github.com/percona/mongodb_exporter/blob/95d1865e34940d0d610bb1fbff9745bc66ddbc73/exporter/collstats_collector.go#L100 | ||
res := db.RunCommand(context.Background(), bson.D{ | ||
{Key: "collStats", Value: oplogCol}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would work for v6.2 onwards as well ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or do we need a version based implementation of db.collection.stats() here as mentioned in your comment ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it works. Mongosh (mongosh uses the very same thing for oplog window).
@shmsr : since we are updating the mongo driver here, do we need to test other functionalities of the module to ensure the latest driver has no deprecated features that we might be using ? |
Although no. I did test this with MongoDB 4.x as well which is pretty old and it worked. But yes, as we know driver is not the issue for this particular problem we can revert it back too. I updated this because the driver was too old, and 1.17 was latest of 1.x (latest is 2.x). But for sure we cant revert the driver update and take it up as a separate task where we test everything and then upgrade. Wdyt? |
I think that would be a better approach. If this perf gain is not dependent on driver, we can remove that change and have a failsafe approach fopr the whole module. |
Proposed commit message
This change tries to even improve the implementation done here that was made some time back. The previous changes did help a lot but this PR aims to further improve as several users have been reporting high CPU usage. Now we are following some recommended ways used by MongoDB themselves to calculate the oplog window (i.e., lastTs - firstTs of the log). The change now again leverages the natural order of the log along with using Limit (to restrict to just one doc) and Projection. Only expensive process right now when we do sort the the log in reverse i.e.,
$natural: -1
. We also have ugpraded the client library to further reduce any issues in the client side related to query, etc.Please see for a detailed comparison: #42224 (comment). Also, please read the inline comments in the code itself to understand the implemented logic; as I've documented it properly for future reference.
Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.Disruptive User Impact
Author's Checklist
How to test this PR locally
Related issues
getOpTimestamp
inreplstatus
to fix sort and temp files generation issue #37688