Wanyun su/threading child look up #325

Merged
merged 5 commits into from
Dec 11, 2024

wanyunSu
Contributor

@wanyunSu wanyunSu commented Dec 9, 2024

Threads the children lookup, using a progress_bar flag for rich.progress.

@wanyunSu wanyunSu requested a review from plasorak December 9, 2024 15:52
@plasorak
Collaborator

I am getting a lot of these errors when I try to run integration tests:

Exception in thread Thread-1 (process_application):
Traceback (most recent call last):
  File "/cvmfs/dunedaq.opensciencegrid.org/spack/externals/ext-v2.1/spack-0.22.0/opt/spack/linux-almalinux9-x86_64/gcc-12.1.0/python-3.10.10-gcsatsf5lmzrhmprzux7uv67w2omc7e3/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/cvmfs/dunedaq.opensciencegrid.org/spack/externals/ext-v2.1/spack-0.22.0/opt/spack/linux-almalinux9-x86_64/gcc-12.1.0/python-3.10.10-gcsatsf5lmzrhmprzux7uv67w2omc7e3/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/nfs/home/plasorak/NFD_DEV_241210_A9/drunc/src/drunc/controller/configuration.py", line 104, in process_application
    new_node = ChildNode.get_child(
  File "/nfs/home/plasorak/NFD_DEV_241210_A9/drunc/src/drunc/controller/children_interface/child_node.py", line 139, in get_child
    return RESTAPIChildNode(
  File "/nfs/home/plasorak/NFD_DEV_241210_A9/drunc/src/drunc/controller/children_interface/rest_api_child.py", line 385, in __init__
    self.response_listener.register(self.name, self.commander)
  File "/nfs/home/plasorak/NFD_DEV_241210_A9/drunc/src/drunc/controller/children_interface/rest_api_child.py", line 140, in register
    if app in cls.handlers:
AttributeError: type object 'ResponseListener' has no attribute 'handlers'
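
For illustration only, here is a minimal sketch of how an error of this shape can arise and one possible way to guard against it. It is not the drunc code: `LazyListener`, `SafeListener`, `start()` and the handler objects are hypothetical, and the assumption that the registry is created lazily is a guess, not something the traceback confirms.

```python
import threading


class LazyListener:
    """Hypothetical listener whose registry only exists after start() has run."""

    @classmethod
    def start(cls):
        cls.handlers = {}

    @classmethod
    def register(cls, app, handler):
        # Raises AttributeError if a thread gets here before start() was called.
        if app in cls.handlers:
            raise ValueError(f"{app} is already registered")
        cls.handlers[app] = handler


class SafeListener:
    """Hypothetical variant: the registry exists at class creation and is locked."""

    handlers = {}
    _lock = threading.Lock()

    @classmethod
    def register(cls, app, handler):
        # The lock makes the check-then-insert step atomic across threads.
        with cls._lock:
            if app in cls.handlers:
                raise ValueError(f"{app} is already registered")
            cls.handlers[app] = handler


# Registering before start() reproduces the same kind of AttributeError.
try:
    LazyListener.register("mlt", object())
    lazy_error = None
except AttributeError as exc:
    lazy_error = str(exc)

# Registering from several threads works with the class-level registry.
threads = [
    threading.Thread(target=SafeListener.register, args=(name, object()))
    for name in ("mlt", "tc-maker-1")
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Whether the actual fix in this branch looks anything like `SafeListener` is not stated in the conversation.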

Collaborator

@plasorak plasorak left a comment

This works as expected now. If I "manually" prevent two applications on the same segment (say the mlt and tc-maker-1) from starting, by hardcoding the process manager driver to ignore them rather than by OKS-disabling them, I get:

  • with develop, the system starts shortly after 2 min
  • with this branch, the system starts shortly after 1 min.

This is the expected behaviour: the root controller gives the trigger controller a 2 min startup timeout (the root controller has children and grandchildren, so 1 min + 1 min). That assumes all the children of the trigger controller start within 1 min; with a sequential lookup, however, the timeout would have to be 1 min x the number of children. With the threaded lookup, many applications on the same segment can fail to start without the wait growing beyond that.
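
The timing argument above can be sketched with a toy model. This is not the drunc implementation: `lookup_child`, `PER_CHILD_TIMEOUT`, and the `other-app` child are hypothetical, and the 1 min timeout is scaled down to a fraction of a second so the example runs quickly.

```python
import threading
import time

PER_CHILD_TIMEOUT = 0.2  # stands in for the 1 min per-child timeout (assumption)


def lookup_child(name, unresponsive, results, lock):
    """Hypothetical lookup: an unresponsive child blocks for the full timeout."""
    if name in unresponsive:
        time.sleep(PER_CHILD_TIMEOUT)
        node = None
    else:
        node = f"node:{name}"
    with lock:
        results[name] = node


def sequential_lookup(children, unresponsive):
    # Each unresponsive child adds a full timeout to the total wait.
    results, lock = {}, threading.Lock()
    for name in children:
        lookup_child(name, unresponsive, results, lock)
    return results


def threaded_lookup(children, unresponsive):
    # All lookups wait concurrently, so the total wait is about one timeout.
    results, lock = {}, threading.Lock()
    threads = [
        threading.Thread(target=lookup_child, args=(name, unresponsive, results, lock))
        for name in children
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def timed(func, *args):
    start = time.monotonic()
    out = func(*args)
    return out, time.monotonic() - start


children = ["mlt", "tc-maker-1", "other-app"]
unresponsive = {"mlt", "tc-maker-1"}

seq_results, seq_elapsed = timed(sequential_lookup, children, unresponsive)
thr_results, thr_elapsed = timed(threaded_lookup, children, unresponsive)
```

In this model the sequential wait grows with the number of unresponsive children, while the threaded wait stays close to a single timeout, which matches the roughly 2 min versus 1 min startup times reported above.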

@plasorak plasorak merged commit 87571b9 into develop Dec 11, 2024
1 check passed
@plasorak plasorak deleted the wanyunSu/ThreadingChildLookUp branch December 11, 2024 11:21