wb | bench storage #5704

Merged
merged 10 commits into from
Mar 15, 2024


@fmaste commented Mar 5, 2024

Description

  • Remove Cardano World QA profiles
    • The QA-class machines of the Cardano World (CW) Nomad cluster are gone, so support for them has been removed, along with the defaults that were added to avoid deployment errors while both clusters (CW and the P&T-exclusive one) were available.
    • The changes were made with support for sporadic clusters in mind, to benchmark different hardware configurations.
  • Add ssh utilities for perf Nomad client nodes
    • Commands were renamed to resolve the confusion caused by using Nomad "client" and Nomad "node" interchangeably. The Nomad CLI uses nomad node commands to query the agents, but we use "node" for a Cardano cluster node and "client" for a Nomad agent running in client mode (that is, a client used to run Cardano nodes).
    • Subcommands added/renamed: wb nomad clients ready|machines|ssh
    • wb nomad clients ready: Returns a JSON array with the "id", "name", "datacenter" and "ip" of all available (status=ready) SRE-managed, P&T-exclusive Nomad clients.
    • wb nomad clients machines: Like ready, but adds EC2 machine info for fingerprinting, used to ensure cloud runs are reproducible.
    • wb nomad clients ssh (all|producers|CLIENT_NAME): Self-explanatory; producers excludes the explorer node.
  • Check available storage space before starting a cloud run
    • We defined the minimum free space we want available for the "value" profile.
  • Replace idle-nomadperf with fast-nomadperf variants
    • A profile that tests the whole cycle.
  • Better error reporting for incompatible genesis file formats
    • With genesis files of around 500 MB in an incompatible format, the node fails only some time after starting. The workbench assumed it had started without errors and later showed a misleading message when the healthcheck failed.
    • The time supervisord needs before considering a node "running" now depends on the number of UTxOs generated.
  • Fix cloud runs that were not fetching the entrypoints' stdout and stderr
    • This only happened with cloud runs that had finished without problems.
  • Add latency service and profile for the 52 perf nodes
    • A profile where every node pings its producers multiple times with different packet sizes.
    • The output format is not defined, and the results are not parsed automatically.
  • Use the regions defined in the profile with no assumptions.
    • Removed validateNodeSpecs.
    • Only differentiate between the "loopback" region and all others.
  • Add support for "perf-ssd"
    • Using the Nomad namespace and class abstractions, we can support both types of machines.
    • The profile now includes settings that were previously part of the workbench, such as the Nomad address, the machine types, and whether logs can be downloaded using ssh.
    • Made NOMAD_CLIENTS (pinning) work with any profile
    • Commands like wb clients now use the profile to build the Nomad server queries.
  • Small changes
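
For the wb nomad clients ready subcommand, the filtering it describes can be sketched in Python. This is not the workbench's implementation (the PR does not show it); it assumes the input is the decoded JSON list returned by Nomad's /v1/nodes HTTP endpoint, whose node stubs carry ID, Name, Datacenter, Address and Status keys. The function name ready_clients and the sample data are hypothetical.

```python
import json
from typing import Any


def ready_clients(nodes: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Keep only Nomad clients whose Status is "ready" and project the
    fields reported by `wb nomad clients ready`.

    Assumption: `nodes` is the decoded JSON list from Nomad's /v1/nodes
    endpoint (node stubs with ID, Name, Datacenter, Address, Status).
    """
    return [
        {
            "id": n["ID"],
            "name": n["Name"],
            "datacenter": n["Datacenter"],
            "ip": n["Address"],
        }
        for n in nodes
        if n.get("Status") == "ready"
    ]


if __name__ == "__main__":
    # Hypothetical sample input; real data would come from the Nomad server.
    sample = [
        {"ID": "a1", "Name": "client-1", "Datacenter": "eu-central-1",
         "Address": "10.0.0.1", "Status": "ready"},
        {"ID": "b2", "Name": "client-2", "Datacenter": "us-east-2",
         "Address": "10.0.0.2", "Status": "down"},
    ]
    print(json.dumps(ready_clients(sample)))
```

Run as a script, it prints a one-element array containing only client-1, since client-2 is not ready.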
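
The storage check done before a cloud run boils down to comparing free space against a minimum. The following is a minimal sketch that assumes a local path and a placeholder threshold. The PR does not give the actual minimum for the "value" profile or say which machines the workbench checks. MIN_FREE_BYTES and has_enough_space are hypothetical names.

```python
import shutil
import sys

# Hypothetical threshold: the real minimum for the "value" profile is
# defined in the workbench and is not reproduced here.
MIN_FREE_BYTES = 50 * 1024**3  # 50 GiB, illustrative only


def has_enough_space(path: str, min_free_bytes: int = MIN_FREE_BYTES) -> bool:
    """Return True if the filesystem holding `path` has at least
    `min_free_bytes` bytes available."""
    return shutil.disk_usage(path).free >= min_free_bytes


if __name__ == "__main__":
    # Abort early instead of failing halfway through a run.
    if not has_enough_space("."):
        sys.exit("Not enough free space to start the cloud run")
```

Failing before deployment avoids discovering a full disk only after the nodes have started writing.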
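
supervisord's startsecs option sets how many seconds a program must stay up before supervisord moves it from STARTING to RUNNING. This matches the "running" wait described under the genesis-format item, though the PR does not say how the workbench sets it. The base value, the one-second-per-100,000-UTxOs rate and the helper names below are illustrative assumptions, not the workbench's formula.

```python
import configparser
import io


def startsecs_for(utxo_count: int, base_secs: int = 5,
                  utxos_per_extra_sec: int = 100_000) -> int:
    """Hypothetical rule: a fixed base plus one extra second per
    `utxos_per_extra_sec` generated UTxOs, rounded up."""
    extra = -(-utxo_count // utxos_per_extra_sec)  # ceiling division
    return base_secs + extra


def program_section(name: str, command: str, utxo_count: int) -> str:
    """Render a supervisord [program:x] section whose `startsecs`
    grows with the number of generated UTxOs."""
    # interpolation=None so '%' in commands is written verbatim.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg[f"program:{name}"] = {
        "command": command,
        "startsecs": str(startsecs_for(utxo_count)),
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(program_section("node-0", "./start.sh", 1_000_000))
```

With 1,000,000 UTxOs this rule yields startsecs = 15, so a node that crashes while loading a large, malformed genesis file never reaches RUNNING.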
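
NOMAD_CLIENTS pinning amounts to constraining each Cardano node's task group to a specific Nomad client. The sketch below assumes Nomad's JSON job specification (Constraints entries with LTarget, RTarget and Operand) and the ${node.unique.name} node attribute. The PR does not describe how the workbench parses NOMAD_CLIENTS or maps client names to groups, so the one-client-per-group mapping and the helper names are hypothetical.

```python
def pin_constraint(client_name: str) -> dict[str, str]:
    """Nomad JSON job-spec constraint allowing placement only on the
    client whose unique name equals `client_name`."""
    return {
        "LTarget": "${node.unique.name}",
        "RTarget": client_name,
        "Operand": "=",
    }


def pin_groups(groups: list[dict], clients: list[str]) -> list[dict]:
    """Pin the i-th task group to the i-th client name.

    Hypothetical mapping; the workbench's own NOMAD_CLIENTS handling
    may differ. Input groups are copied, not modified in place.
    """
    if len(clients) != len(groups):
        raise ValueError("need exactly one client name per task group")
    pinned = []
    for group, client in zip(groups, clients):
        g = dict(group)
        # Keep any existing constraints and append the pinning one.
        g["Constraints"] = list(g.get("Constraints") or []) + [pin_constraint(client)]
        pinned.append(g)
    return pinned
```

Because the constraint is attached per group rather than baked into a profile, the same pinning step can be applied to any profile's job.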

Checklist

  • Commit sequence broadly makes sense and commits have useful messages
  • New tests are added if needed and existing tests are updated. These may include:
    • golden tests
    • property tests
    • roundtrip tests
    • integration tests
      See Running tests for more details
  • Any changes are noted in the CHANGELOG.md for affected package
  • The version bounds in .cabal files are updated
  • CI passes. See note on CI. The following CI checks are required:
    • Code is linted with hlint. See .github/workflows/check-hlint.yml to get the hlint version
    • Code is formatted with stylish-haskell. See .github/workflows/stylish-haskell.yml to get the stylish-haskell version
    • Code builds on Linux, MacOS and Windows for ghc-8.10.7 and ghc-9.2.7
  • Self-reviewed the diff

Note on CI

If your PR is from a fork, the necessary CI jobs won't trigger automatically for security reasons.
You will need to get someone with write privileges to trigger them. Please contact IOG node developers to do this for you.

@fmaste force-pushed the bench-storage branch 11 times, most recently from cc628bb to a2dd838 (March 9, 2024 14:09)
@fmaste changed the title from "Bench storage" to "wb | Bench storage" (Mar 9, 2024)
@fmaste changed the title from "wb | Bench storage" to "wb | bench storage" (Mar 9, 2024)
@fmaste marked this pull request as ready for review (March 11, 2024 15:18)
@fmaste requested review from a team as code owners (March 11, 2024 15:18)
@fmaste force-pushed the bench-storage branch 12 times, most recently from 8269957 to 2343d54 (March 14, 2024 17:15)

@mgmeier left a comment


Excellent, @fmaste, thank you very much.

@fmaste added this pull request to the merge queue (Mar 15, 2024)
@github-merge-queue bot removed this pull request from the merge queue due to failed status checks (Mar 15, 2024)
@fmaste enabled auto-merge (March 15, 2024 16:13)
@fmaste added this pull request to the merge queue (Mar 15, 2024)
@github-merge-queue bot removed this pull request from the merge queue due to no response for status checks (Mar 15, 2024)
@fmaste added this pull request to the merge queue (Mar 15, 2024)
Merged via the queue into master with commit e72aee4 (Mar 15, 2024)
22 checks passed
@fmaste deleted the bench-storage branch (March 15, 2024 20:45)