madesroches/micromegas

Micromegas - Scalable Observability

Crates.io Apache licensed Build Status

Rust API documentation

Python API

Grafana plugin

Design presentation

Unreal observability

Objectives

  • Unified observability: logs, metrics and traces in the same database.

  • Spend less time reproducing problems

    • Collect enough data to understand how to correct the problems.

    • Quantify the frequency and severity of the issues instead of debugging the first one you can reproduce.

  • Achieve better quality: monitor & catch problems before they get noticed by users.

Design Strategies

Low overhead instrumentation

About 20 ns of overhead per event in the calling thread. One additional thread prepares the data and uploads it to the server.
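
The split between the calling thread and the upload thread can be sketched as follows. This is a minimal illustration, not the micromegas implementation: `Event`, `EventSink`, `record` and `shutdown` are hypothetical names, the "upload" is a stand-in that only collects batches, and a standard-library channel will not reach the 20 ns figure quoted above. It only shows the shape of the design: the instrumented thread hands off a small record and returns, and a single worker thread does the batching.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical event record; the real event layout is not shown in this README.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    name: &'static str,
    timestamp_ns: u64,
}

/// Hypothetical sink that hands events off to one background thread.
struct EventSink {
    sender: mpsc::Sender<Event>,
    worker: thread::JoinHandle<Vec<Vec<Event>>>,
}

impl EventSink {
    /// Spawns the single worker thread that batches events.
    fn new(batch_size: usize) -> Self {
        let (sender, receiver) = mpsc::channel::<Event>();
        let worker = thread::spawn(move || {
            let mut uploaded: Vec<Vec<Event>> = Vec::new();
            let mut batch: Vec<Event> = Vec::new();
            // recv() returns Err once the sender is dropped, which ends the loop.
            while let Ok(event) = receiver.recv() {
                batch.push(event);
                if batch.len() >= batch_size {
                    // Stand-in for serialization and upload to the server.
                    uploaded.push(std::mem::take(&mut batch));
                }
            }
            // Flush whatever is left when the process shuts down.
            if !batch.is_empty() {
                uploaded.push(batch);
            }
            uploaded
        });
        Self { sender, worker }
    }

    /// Called from the instrumented thread: enqueue and return immediately.
    fn record(&self, name: &'static str, timestamp_ns: u64) {
        // A send error only means the worker is gone; the event is dropped.
        let _ = self.sender.send(Event { name, timestamp_ns });
    }

    /// Closes the channel and returns the batches the worker "uploaded".
    fn shutdown(self) -> Vec<Vec<Event>> {
        let EventSink { sender, worker } = self;
        drop(sender);
        worker.join().expect("worker thread panicked")
    }
}
```

With `EventSink::new(2)`, three calls to `record` followed by `shutdown` yield one full batch of two events and one final batch holding the remaining event.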

High frequency of events

Up to 100,000 events per second for a single instrumented process.

Scalability of ingestion service

The backend scales to accept data from millions of concurrent instrumented processes.

Tail sampling & ETL on demand

To keep costs down, most payloads will remain unprocessed until they expire.
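
One way to read "ETL on demand" is sketched below, under stated assumptions: `RawBlock`, `LazyStore` and their methods are hypothetical names, decoding is reduced to a UTF-8 conversion, and time is a plain counter in seconds. Ingestion stores the payload without parsing it, expiry drops old payloads untouched, and decoding only happens for the blocks a query actually asks for.

```rust
/// Hypothetical raw payload as received from an instrumented process.
struct RawBlock {
    process_id: u32,
    ingested_at_s: u64,
    payload: Vec<u8>,
}

/// Hypothetical store that keeps payloads opaque until a query needs them.
struct LazyStore {
    blocks: Vec<RawBlock>,
    retention_s: u64,
    /// Number of payloads that were actually decoded, to make the cost visible.
    decoded_count: usize,
}

impl LazyStore {
    fn new(retention_s: u64) -> Self {
        Self { blocks: Vec::new(), retention_s, decoded_count: 0 }
    }

    /// Ingestion is cheap: the payload is stored as-is, with no parsing.
    fn ingest(&mut self, process_id: u32, now_s: u64, payload: Vec<u8>) {
        self.blocks.push(RawBlock { process_id, ingested_at_s: now_s, payload });
    }

    /// Drops blocks older than the retention period without decoding them.
    fn expire(&mut self, now_s: u64) {
        let retention_s = self.retention_s;
        self.blocks
            .retain(|b| now_s.saturating_sub(b.ingested_at_s) < retention_s);
    }

    /// Decodes only the blocks of the requested process.
    fn query_process(&mut self, process_id: u32) -> Vec<String> {
        let mut decoded = Vec::new();
        for block in &self.blocks {
            if block.process_id == process_id {
                // Stand-in for the real extract/transform step.
                decoded.push(String::from_utf8_lossy(&block.payload).into_owned());
                self.decoded_count += 1;
            }
        }
        decoded
    }
}
```

In this sketch the processing cost is paid per query rather than per ingested payload, so a payload that nobody queries before it expires costs only its storage.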

Query using SQL

Status

Soon

  • Migration of Python API to use FlightSQL

December 2024

November 2024

Released version 0.2.1

  • FlightSQL support
  • Measures and log entries can now be tagged with properties
    • Not yet available in SQL queries

October 2024

Released version 0.2.0

September 2024

Released version 0.1.9

  • Updating global views every second
  • Caching metadata (processes, streams & blocks) in the lakehouse and allowing SQL queries on it

August 2024

Released version 0.1.7

  • New global materialized views for logs & metrics of all processes
  • New daemon service to keep the views updated as data is ingested
  • New analytics API based on SQL, powered by Apache DataFusion

July 2024

Released version 0.1.5

Unreal

  • Better reliability: failed HTTP requests are retried
  • Spike detection

Maintenance

  • Delete old blocks, streams & processes using a cron task

June 2024

Released version 0.1.4

Good enough for dogfooding :)

Unreal

  • Metrics publisher
  • FName scopes

Analytics

  • Metric queries
  • Convert CPU traces to Perfetto format

May 2024

Released version 0.1.3

Better Unreal Engine instrumentation

  • New protocol
  • HTTP request callbacks no longer bound to the main thread
  • Custom authentication of requests

Analytics

  • Query process metadata
  • Query spans of a thread

April 2024

Telemetry ingestion from Rust & Unreal is working :)

Released version 0.1.1

Not actually useful yet, I need to bring back the analytics service to a working state.

January 2024

Starting anew. I'm extracting the tracing/telemetry/analytics code from https://github.com/legion-labs/legion to jumpstart the new project. If you are interested in collaborating, please reach out.