Problem observed in ru-controller when running minimum_system_quick_test #268

Closed
bieryAtFnal opened this issue Oct 16, 2024 · 2 comments · Fixed by #273

Comments

@bieryAtFnal
Contributor

           INFO     core.py:182     FSM:    Post transition:                                                                                          
           INFO     controller.py:142       Controller:     'ru-det-conn-0@131.225.193.20:5501' (type ControlType.REST_API)                           
           INFO     rest_api_child.py:509   ru-det-conn-0-rest-api-child:   Ignoring command 'take_control' sent to 'ru-det-conn-0'                   
           INFO     broadcast_sender.py:65  Broadcast:      ready                                                                                     
           INFO     controller.py:57        controller_cli: 'ru-controller' was started on '5500'                                                     
           INFO     controller.py:280       Controller:     Registering ru-controller to the connectivity service at grpc://131.225.193.20:5500       
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /home/nfs/biery/dunedaq/16OctFDDevPostMergeTest/.venv/lib/python3.10/site-pa │
│ ckages/drunc/broadcast/server/decorators.py:30 in wrap                       │
│                                                                              │
│    27 │   │                                                                  │
│    28 │   │   try:                                                           │
│    29 │   │   │   log.debug('Executing wrapped function')                    │
│ ❱  30 │   │   │   ret = cmd(obj, request) # we strip the context here, no ne │
│    31 │   │   except Exception as e:                                         │
│    32 │   │   │   stack = traceback.format_exc().split("\n")                 │
│    33                                                                        │
│                                                                              │
│ /home/nfs/biery/dunedaq/16OctFDDevPostMergeTest/.venv/lib/python3.10/site-pa │
│ ckages/drunc/authoriser/decorators.py:34 in check_token                      │
│                                                                              │
│   31 │   │   │   │   #     drunc_system = obj.name,                          │
│   32 │   │   │   │   # )                                                     │
│   33 │   │   │   log.debug('Executing wrapped function')                     │
│ ❱ 34 │   │   │   ret = cmd(obj, request)                                     │
│   35 │   │   │   log.debug('Exiting')                                        │
│   36 │   │   │   return ret                                                  │
│   37 │   │   return check_token                                              │
│                                                                              │
│ /home/nfs/biery/dunedaq/16OctFDDevPostMergeTest/.venv/lib/python3.10/site-pa │
│ ckages/drunc/controller/decorators.py:11 in wrap                             │
│                                                                              │
│    8 │   │   if not obj.actor.token_is_current_actor(request.token):         │
│    9 │   │   │   from druncschema.request_response_pb2 import Response, Resp │
│   10 │   │   │   from druncschema.generic_pb2 import PlainText               │
│ ❱ 11 │   │   │   return Response(                                            │
│   12 │   │   │   │   name = obj.name,                                        │
│   13 │   │   │   │   token = request.token,                                  │
│   14 │   │   │   │   data = PlainText(                                       │
╰──────────────────────────────────────────────────────────────────────────────╯
TypeError: Parameter to MergeFrom() must be instance of same class: expected 
<class 'google.protobuf.any_pb2.Any'> got <class 
'druncschema.generic_pb2.PlainText'>.
[2024-10-16 14:16:41 -0500] [2224161] [INFO] Handling signal: hup
[2024-10-16 14:16:41 -0500] [2224161] [INFO] Hang up: Master
Received 1
Requested termination
[2024-10-16 14:16:41 -0500] [2224161] [WARNING] Worker with pid 2224185 was terminated due to signal 1
[2024-10-16 14:16:41 -0500] [2225657] [INFO] Booting worker with pid: 2225657
[14:16:41] INFO     controller.py:315       Controller:     Unregistering from the connectivity service                                               
           INFO     controller.py:324       Controller:     Stopping children                                                                         
[2024-10-16 14:16:41 -0500] [2224161] [INFO] Handling signal: term
[2024-10-16 14:16:41 -0500] [2225657] [INFO] Worker exiting (pid: 2225657)
[2024-10-16 14:16:41 -0500] [2224161] [INFO] Shutting down: Master
           INFO     flask_manager.py:193    response-listener-flaskmanager-flaskmanager:    response-listener-flaskmanager-flaskmanager terminated    
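
For readers unfamiliar with this `TypeError`: the message suggests that a concrete message (`PlainText`) was passed directly to a field declared as `google.protobuf.Any`, which the Python protobuf runtime rejects unless the payload is packed first. Below is a minimal sketch of that pattern. It assumes the `protobuf` Python package is installed and does not use `druncschema`; `type_pb2.Option` is only a stand-in because it happens to have an `Any`-typed `value` field, and `StringValue` stands in for `PlainText`. The helper names are hypothetical, and this is not necessarily the change made in #273.

```python
from google.protobuf import any_pb2, type_pb2, wrappers_pb2


def build_option_unpacked(text):
    # Mirrors the failing pattern: a concrete message is handed to a field
    # whose declared type is google.protobuf.Any. This raises TypeError.
    return type_pb2.Option(name="demo", value=wrappers_pb2.StringValue(value=text))


def build_option_packed(text):
    # Pack the concrete message into an Any first, then assign it.
    payload = any_pb2.Any()
    payload.Pack(wrappers_pb2.StringValue(value=text))
    return type_pb2.Option(name="demo", value=payload)


def read_text(option):
    # Unpack() returns False if the Any holds a different message type.
    out = wrappers_pb2.StringValue()
    if not option.value.Unpack(out):
        raise ValueError("payload is not a StringValue")
    return out.value
```

Calling `build_option_unpacked("x")` fails with a `TypeError`, while `read_text(build_option_packed("x"))` round-trips the text.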

@bieryAtFnal
Contributor Author

I only see this occasionally, and I haven't been able to reproduce it on np04 computers in the last couple of hours, but it happens fairly reliably when I run daqsystemtest_integtest_bundle.sh -l 0 -N 5 --stop-on-fail on daq.fnal.gov.

@plasorak
Collaborator

I don't know if #273 fixed the issue (in fact, I don't know what did), but I just ran the develop branch of drunc on daq.fnal.gov with NFD_DEV_241023_A9 and got:

(dbt) daq:~/NFD_DEV_241023_A9 > daqsystemtest_integtest_bundle.sh -f 0 -l 0 -N 10 --stop-on-failure
...
===== Running minimal_system_quick_test.py
============================== 3 passed in 42.29s ==============================
===== Running minimal_system_quick_test.py
============================== 3 passed in 40.09s ==============================
===== Running minimal_system_quick_test.py
============================== 3 passed in 40.33s ==============================
===== Running minimal_system_quick_test.py
============================== 3 passed in 42.00s ==============================
===== Running minimal_system_quick_test.py
============================== 3 passed in 40.44s ==============================
===== Running minimal_system_quick_test.py
============================== 3 passed in 39.95s ==============================
===== Running minimal_system_quick_test.py
============================== 3 passed in 40.41s ==============================
===== Running minimal_system_quick_test.py
============================== 3 passed in 39.96s ==============================
===== Running minimal_system_quick_test.py
============================== 3 passed in 40.31s ==============================
===== Running minimal_system_quick_test.py
============================== 3 passed in 39.84s ==============================

So I'm going to assume this is indeed fixed. The branch plasorak/command_lock can be taken as a starting point if this turns out to be a problem again (this branch gives the same successful result when the command above is run).
