OpMon for drunc #214

Open
PawelPlesniak opened this issue Aug 19, 2024 · 0 comments

Here is a minimal list of metrics we need from drunc. It corresponds to the overall quantities reported by the main dashboard.

  • State of the system (FSM state)
  • Global error state
  • Run time
  • Run number
  • Configuration used
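
As a rough illustration, the minimal set above could be gathered into a single record per publication cycle. This is a hypothetical sketch only: `RunStatusMetrics`, its field names, and deriving the run time from a start timestamp are assumptions, not part of drunc or the OpMon interface, and the actual serialization and publishing step is left out.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class RunStatusMetrics:
    # Hypothetical record; field names are illustrative, not drunc's schema.
    fsm_state: str                   # state of the system (FSM state)
    in_error: bool                   # global error state
    run_number: Optional[int]        # None when no run is ongoing
    run_start_time: Optional[float]  # Unix timestamp of run start, None if not running
    configuration: str               # identifier of the configuration in use

    def run_time(self, now: float) -> float:
        """Seconds elapsed since the run started; 0.0 when no run is ongoing."""
        if self.run_start_time is None:
            return 0.0
        return max(0.0, now - self.run_start_time)

    def to_record(self, now: float) -> dict:
        """Flatten into a plain dict, adding the derived run time."""
        record = asdict(self)
        record["run_time_s"] = self.run_time(now)
        return record
```

Deriving the run time at publication time, rather than storing it, keeps the record consistent with the run start timestamp without needing a separate timer.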

Once drunc has a complete interface with OpMon, every drunc instance should publish metrics related to its own operations. The following naive list is only a suggestion, to be elaborated and expanded by experts:

  • Status (of the controllers, as opposed to the FSM state)
  • Number of messages exchanged, including their success rate, message sizes, etc.
  • Number of applications alive (both DAQ apps and drunc apps)
  • Number of rebooted applications (a very long-term concept, available once some recovery capability is in place)
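
A hypothetical sketch of how a drunc instance might accumulate these per-instance quantities, in particular the message success rate and size statistics. `MessageStats`, `InstanceMetrics`, and all field names are assumptions for illustration; nothing here reflects an existing drunc or OpMon API.

```python
from dataclasses import dataclass, field


@dataclass
class MessageStats:
    # Hypothetical counters for messages exchanged by one drunc instance.
    count: int = 0
    failed: int = 0
    total_bytes: int = 0
    max_bytes: int = 0

    def record(self, size_bytes: int, ok: bool) -> None:
        """Account for one message of the given size and outcome."""
        self.count += 1
        self.total_bytes += size_bytes
        self.max_bytes = max(self.max_bytes, size_bytes)
        if not ok:
            self.failed += 1

    @property
    def success_rate(self) -> float:
        # Defined as 1.0 before any message is exchanged, avoiding division by zero.
        if self.count == 0:
            return 1.0
        return (self.count - self.failed) / self.count

    @property
    def mean_bytes(self) -> float:
        return self.total_bytes / self.count if self.count else 0.0


@dataclass
class InstanceMetrics:
    controller_status: str  # controller status, distinct from the FSM state
    messages: MessageStats = field(default_factory=MessageStats)
    alive_daq_apps: int = 0
    alive_drunc_apps: int = 0
    rebooted_apps: int = 0  # only meaningful once recovery capability exists

    @property
    def alive_apps(self) -> int:
        return self.alive_daq_apps + self.alive_drunc_apps
```

Keeping raw counters (count, failures, bytes) and deriving rates from them means the published values stay consistent, and the rates could equally be computed downstream from the counters.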