Benchmark performance for xAPI over Redis bus to Ralph #203

Closed
Tracked by #195
bmtcril opened this issue Mar 7, 2024 · 4 comments

bmtcril commented Mar 7, 2024

Tests for various Redis bus configurations.

Remaining tests:

  • Multiple workers with batching

bmtcril commented Apr 4, 2024

Test 1 (74ef02) - 1k rows, no batching

Test system configuration:

  • Tutor version: tutor, version 17.0.2-nightly
  • Aspects version: 0.91.0
  • Environment specifications:
  • Relevant settings
    • RUN_CLICKHOUSE: true
    • RUN_KAFKA_SERVER: false
    • RUN_RALPH: true
    • RUN_SUPERSET: true
    • RUN_VECTOR: false
    • EVENT_ROUTING_BACKEND_BATCHING_ENABLED: False
    • EVENT_BUS_BACKEND: redis

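For reproducibility, the settings above can presumably be applied with `tutor config save`; the following is a sketch only, assuming each value is exposed as a Tutor configuration key under exactly these names (some may instead be Django settings applied through a plugin patch):

```sh
# Hypothetical sketch of the Test 1 (no batching) configuration.
# Assumes these keys are exposed as Tutor config values under these exact names.
tutor config save \
  --set RUN_CLICKHOUSE=true \
  --set RUN_KAFKA_SERVER=false \
  --set RUN_RALPH=true \
  --set RUN_SUPERSET=true \
  --set RUN_VECTOR=false \
  --set EVENT_ROUTING_BACKEND_BATCHING_ENABLED=False \
  --set EVENT_BUS_BACKEND=redis

# Restart services so the new settings take effect.
tutor local restart
```
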
Load generation specifications:

  • Tool - platform-plugin-aspects load_test_tracking_events management command
  • Exact scripts
    • tutor local run lms ./manage.py lms monitor_load_test_tracking --sleep_time 5 --backend redis_bus
    • tutor local run cms ./manage.py cms load_test_tracking_events --num_events 1000 --sleep_time 0 --tags redis 1k local novector nobatch

Data captured for results:

  • Length of run
    • Event generation: 0:00:27.412890
    • Monitoring / total run length: 0:07:05.685837
  • Sleep time: 0
  • Events: 1000
  • Raw stats attached: 74ef02_stats.txt

Findings:

  • The consumer could not keep up; ClickHouse was 390 seconds behind at the end of the 420-second run
  • The Redis stream queue grew to 941 pending events before generation stopped, and then took 400 seconds to catch up
  • Inserted rows per second: 2.4–2.6
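
As a rough consistency check: at the observed ~2.5 inserted rows per second, draining the 941 pending events should take roughly 375 seconds, which lines up with the ~400 second catch-up reported above.

```sh
# Back-of-the-envelope catch-up time implied by the observed insert rate
python3 -c "print(941 / 2.5)"   # ≈ 376 seconds
```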

bmtcril commented Apr 4, 2024

Test 2 (98daac) - 1k rows, batch size 10

Test system configuration:

  • Tutor version: tutor, version 17.0.2-nightly
  • Aspects version: 0.91.0
  • Environment specifications:
  • Relevant settings
    • RUN_CLICKHOUSE: true
    • RUN_KAFKA_SERVER: false
    • RUN_RALPH: true
    • RUN_SUPERSET: true
    • RUN_VECTOR: false
    • EVENT_BUS_BACKEND: redis
    • EVENT_ROUTING_BACKEND_BATCH_SIZE: 10
    • EVENT_ROUTING_BACKEND_BATCHING_ENABLED: True

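Relative to Test 1, only the batching settings change; again a sketch, assuming these are exposed as Tutor configuration keys of the same names:

```sh
# Hypothetical sketch: enable batching with a batch size of 10 for Test 2.
tutor config save \
  --set EVENT_ROUTING_BACKEND_BATCHING_ENABLED=True \
  --set EVENT_ROUTING_BACKEND_BATCH_SIZE=10
tutor local restart
```
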
Load generation specifications:

  • Tool - platform-plugin-aspects load_test_tracking_events management command
  • Exact scripts
    • tutor local run lms ./manage.py lms monitor_load_test_tracking --sleep_time 5 --backend redis_bus
    • tutor local run cms ./manage.py cms load_test_tracking_events --num_events 1000 --sleep_time 0 --tags redis 1k local novector batch10

Data captured for results:

  • Length of run
    • Event generation: 0:00:57.751662
    • Monitoring / total run length: 0:01:09.289254
  • Sleep time: 0
  • Events: 1000
  • Raw stats attached: 98daac_stats.txt

Findings:

  • The consumer was able to keep up; ClickHouse was never more than 1 second behind on a 55-second run
  • Maximum Redis stream queue size was 2
  • Inserted rows per second grew to 24 by the end of the test, in line with the sustained 10k-row test below
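
As a rough consistency check: 1,000 events generated in ~58 seconds is about 17 events per second, comfortably below the ~24 inserted rows per second reached by the end of the test, which is consistent with the queue never exceeding 2.

```sh
# Approximate event generation rate for Test 2
python3 -c "print(1000 / 57.75)"   # ≈ 17.3 events/second
```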

bmtcril commented Apr 4, 2024

Test 3 (64dec5) - 10k rows, batch size 100

Test system configuration:

  • Tutor version: tutor, version 17.0.2-nightly
  • Aspects version: 0.91.0
  • Environment specifications:
  • Relevant settings
    • RUN_CLICKHOUSE: true
    • RUN_KAFKA_SERVER: false
    • RUN_RALPH: true
    • RUN_SUPERSET: true
    • RUN_VECTOR: false
    • EVENT_BUS_BACKEND: redis
    • EVENT_ROUTING_BACKEND_BATCH_SIZE: 100
    • EVENT_ROUTING_BACKEND_BATCHING_ENABLED: True

Load generation specifications:

  • Tool - platform-plugin-aspects load_test_tracking_events management command
  • Exact scripts
    • tutor local run lms ./manage.py lms monitor_load_test_tracking --sleep_time 5 --backend redis_bus
    • tutor local run cms ./manage.py cms load_test_tracking_events --num_events 10000 --sleep_time 0 --tags redis 10k local novector batch100

Data captured for results:

  • Length of run
    • Event generation: 0:03:02.900450
    • Monitoring / total run length: 0:03:17.300679
  • Sleep time: 0
  • Events: 10,000
  • Raw stats attached: 64dec5_stats.txt

Findings:

  • The consumer was able to keep up; ClickHouse was never more than 5 seconds behind, usually 2-4 seconds
  • Maximum Redis stream queue size was 83, which is to be expected with a batch size of 100
  • Inserted rows per second ranged between 40 and 60, with a high of 80, roughly in line with the ~55 events per second being generated
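
As a rough consistency check: 10,000 events generated in ~183 seconds works out to roughly 55 events per second, matching the generation rate cited above.

```sh
# Approximate event generation rate for Test 3
python3 -c "print(10000 / 182.9)"   # ≈ 54.7 events/second
```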

bmtcril commented Aug 6, 2024

What we've found in #202 and earlier tests is that insert performance exceeds our ability to generate events up to that ~55/sec line. Once we have more production information from partners, we can determine whether we should test to a higher threshold, but for now I'm closing these tasks out.

bmtcril closed this as completed Aug 6, 2024