Feature: further documentation on xdmod_plugin usage, core_hours and service units #626

Open
ds-04 opened this issue Oct 22, 2024 · 3 comments
Labels
documentation documentation to be updated enhancement Improvement to existing feature

Comments


ds-04 commented Oct 22, 2024

Description

It would be very useful to have further documentation on the xdmod_plugin, core hours (displayed in ColdFront on the allocation), and service units.

At the moment I'm testing ColdFront using two systems (VM1: ColdFront plus Slurm master/client; VM2: XDMoD).

  • I've run some jobs through a test Slurm instance using the account created by an allocation (and updated within Slurm).
  • These jobs have been shredded and ingested in XDMoD, and I can see the graphs there.
  • I'm executing the xdmod_plugin (for total_cpu_hours) but am not seeing anything happen in the CF dashboard.
  • At the moment, how the core_hours shown on an allocation gets updated isn't clear to me.
  • I can't see anything in the documentation about service units within CF. Is this something sites have developed completely themselves?

Component

Other

Additional information

No response

@ds-04 ds-04 added documentation documentation to be updated enhancement Improvement to existing feature labels Oct 22, 2024

ds-04 commented Oct 24, 2024

For coldfront xdmod_usage to work (e.g. coldfront xdmod_usage -sx -m total_cpu_hours), it appears that a Resource Attribute Type needs to be added in the CF Django admin:

Name            Attribute type name
xdmod_resource  Text

Then the Resource (in this case a Slurm cluster) needs a Resource attribute of this type added:

Resource attribute type  Value
xdmod_resource           slurm_cluster_as_named_in_slurm

The syncer now appears to be connecting. In my case I had to deal with a self-signed test certificate on the test XDMoD instance I'm running (via Python requests). The graphs in CF are not updated yet because the syncer can't find the data on the XDMoD side, but I'm working on that.
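
For anyone hitting the same self-signed certificate problem, here is a minimal sketch of one way to handle it, assuming the syncer relies on the default CA bundle that requests takes from certifi. The helper names (append_cert_to_bundle, trust_in_certifi) are hypothetical and not part of ColdFront; run it with the Python interpreter of the ColdFront venv so that certifi.where() points at the right bundle.

```python
from pathlib import Path


def append_cert_to_bundle(cert_path, bundle_path):
    """Append a PEM certificate to a CA bundle unless it is already present.

    Returns True if the bundle was modified, False if the certificate
    was already in it.
    """
    cert_pem = Path(cert_path).read_text().strip()
    if "-----BEGIN CERTIFICATE-----" not in cert_pem:
        raise ValueError(f"{cert_path} does not look like a PEM certificate")

    bundle = Path(bundle_path)
    existing = bundle.read_text()
    if cert_pem in existing:
        return False  # already trusted, nothing to do

    # Keep a newline between the previous entry and the new one.
    separator = "" if existing.endswith("\n") else "\n"
    with bundle.open("a") as fh:
        fh.write(f"{separator}{cert_pem}\n")
    return True


def trust_in_certifi(cert_path):
    """Append cert_path to the certifi bundle of the current environment."""
    # Imported lazily: this must be the certifi installed in the ColdFront venv.
    import certifi

    return append_cert_to_bundle(cert_path, certifi.where())
```

Calling trust_in_certifi with the path to the exported XDMoD certificate would then update the bundle in place. Reinstalling or upgrading certifi replaces the bundle, so the step has to be repeated afterwards; pointing the REQUESTS_CA_BUNDLE environment variable at a custom bundle is an alternative that requests also honours.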


ds-04 commented Oct 25, 2024

Further notes:

  • In a testing environment, the XDMoD self-signed certificate needs to be appended to the certifi CA bundle inside the ColdFront venv, so that the coldfront xdmod_usage syncer can trust the XDMoD URL.
  • XDMoD must be using the Slurm account: you must set "pi_column": "account_name" in resources.json (see https://open.xdmod.org/11.0/configuration.html).
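
A quick way to check the pi_column note is to scan resources.json for entries that lack the setting. This is only a sketch: it assumes resources.json is a JSON array of objects with a "resource" key, and the function name and the two cluster names in the example are made up for illustration.

```python
import json


def resources_without_account_pi(resources_json_text, expected="account_name"):
    """Return the names of resources whose pi_column is not `expected`."""
    resources = json.loads(resources_json_text)
    return [
        entry.get("resource", "<unnamed>")
        for entry in resources
        if entry.get("pi_column") != expected
    ]


# Hypothetical file content: one resource is configured, one is not.
example = """
[
  {"resource": "testcluster", "resource_type": "HPC", "name": "Test cluster",
   "pi_column": "account_name"},
  {"resource": "othercluster", "resource_type": "HPC", "name": "Other cluster"}
]
"""

print(resources_without_account_pi(example))  # prints ['othercluster']
```

Any resource listed by the check would need "pi_column": "account_name" added to its entry.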


ds-04 commented Nov 7, 2024

Created PR #633 to add a setup section to the xdmod plugin README.md
