Could ConnectionInstanceNotFound in NetworkReceiverModel be reported as an ERS error? #48

Open
bieryAtFnal opened this issue Mar 1, 2023 · 4 comments
@bieryAtFnal
Contributor

I noticed that the error messages associated with this line were not being noticed by the log file checking in our integtests.

The reason was that the severity of the message is "LOG" based on the use of TLOG().

Could this line be changed to report an ers::error?

@eflumerf
Member

eflumerf commented Mar 1, 2023

Interesting...I recently changed that from an error to a log since it was occurring in places where there wasn't actually a problem...

(dbt) [eflumerf@ironvirt7 network]$ git show 19576e85
commit 19576e8565c9a9cbf78418112c2892dc224d0aa4
Author: Eric Flumerfelt <eflumerf@fnal.gov>
Date:   Thu Feb 9 08:12:08 2023 -0600

    Change error message to log in try_receive

diff --git a/include/iomanager/network/NetworkReceiverModel.hpp b/include/iomanager/network/NetworkReceiverModel.hpp
index 359534a..584818d 100644
--- a/include/iomanager/network/NetworkReceiverModel.hpp
+++ b/include/iomanager/network/NetworkReceiverModel.hpp
@@ -149,7 +149,7 @@ private:
     std::lock_guard<std::mutex> lk(m_receive_mutex);
     get_receiver(timeout);
     if (m_network_receiver_ptr == nullptr) {
-      ers::error(ConnectionInstanceNotFound(ERS_HERE, this->id().uid));
+      TLOG() << ConnectionInstanceNotFound(ERS_HERE, this->id().uid);
       return std::nullopt;
     }

@eflumerf
Member

eflumerf commented Mar 1, 2023

At a guess, I would say that it should be a log for "connect"-type endpoints (Senders and Publishers), and an error for "bind"-type...which means additional logic should be added somewhere to distinguish those cases in NetworkReceiverModel and NetworkSenderModel (Line 137 is currently a TLOG as well)...
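
The distinction suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: `EndpointKind`, `Severity`, `severity_for_missing_connection`, and `severity_name` are hypothetical names, not part of iomanager or ERS, and the sketch only picks a severity; it does not call `ers::error` or `TLOG()` itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical types for illustration only; not iomanager or ERS APIs.
enum class EndpointKind { Bind, Connect };
enum class Severity { Log, Error };

// Following the suggestion in this thread: a missing connection instance is
// only worth a log message for "connect"-type endpoints, but is treated as an
// error for "bind"-type endpoints.
constexpr Severity severity_for_missing_connection(EndpointKind kind)
{
  return kind == EndpointKind::Bind ? Severity::Error : Severity::Log;
}

// Human-readable label, matching the "LOG" / "ERROR" tags seen in the log files.
inline std::string severity_name(Severity s)
{
  return s == Severity::Error ? "ERROR" : "LOG";
}

// Small self-check showing the intended mapping.
inline void check_severity_mapping()
{
  assert(severity_for_missing_connection(EndpointKind::Bind) == Severity::Error);
  assert(severity_for_missing_connection(EndpointKind::Connect) == Severity::Log);
}
```

In the real code, the result would decide whether the `ConnectionInstanceNotFound` issue goes to `ers::error` or to `TLOG()`; how the model learns whether it is a bind or connect endpoint is left open here.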

@bieryAtFnal
Contributor Author

In the particular scenario that I saw, the problem occurred when I tried to create a receiver with an empty string for the connection ID. Of course, it's true that this was a bug in my code, but it would have been nice for the error to be caught by the integrationtest log checking.

Here is the TLOG output, in case it is of any use...

log_dqmrulocalhost0_4337.txt:2023-Mar-01 15:12:29,364 LOG [typename std::enable_if<dunedaq::serialization::is_serializable<MessageType>::value, std::optional<_Up> >::type dunedaq::iomanager::NetworkReceiverModel<Datatype>::try_read_network(const dunedaq::iomanager::Receiver::timeout_t&) [with MessageType = dunedaq::dfmessages::TRMonRequest; Datatype = dunedaq::dfmessages::TRMonRequest; typename std::enable_if<dunedaq::serialization::is_serializable<MessageType>::value, std::optional<_Up> >::type = std::optional<dunedaq::dfmessages::TRMonRequest>; dunedaq::iomanager::Receiver::timeout_t = std::chrono::duration<long int, std::ratio<1, 1000> >] at /cvmfs/dunedaq-development.opensciencegrid.org/nightly/N23-02-28/spack-0.18.1-gcc-12.1.0/spack-0.18.1/opt/spack/gcc-12.1.0/iomanager-N23-02-28-eqhipmafjmtaoxfrpm3v2uw5mzf3urql/include/iomanager/network/NetworkReceiverModel.hpp:152] 2023-Mar-01 15:12:29,364 ERROR [typename std::enable_if<dunedaq::serialization::is_serializable<MessageType>::value, std::optional<_Up> >::type dunedaq::iomanager::NetworkReceiverModel<Datatype>::try_read_network(const dunedaq::iomanager::Receiver::timeout_t&) [with MessageType = dunedaq::dfmessages::TRMonRequest; Datatype = dunedaq::dfmessages::TRMonRequest; typename std::enable_if<dunedaq::serialization::is_serializable<MessageType>::value, std::optional<_Up> >::type = std::optional<dunedaq::dfmessages::TRMonRequest>; dunedaq::iomanager::Receiver::timeout_t = std::chrono::duration<long int, std::ratio<1, 1000> >] at /cvmfs/dunedaq-development.opensciencegrid.org/nightly/N23-02-28/spack-0.18.1-gcc-12.1.0/spack-0.18.1/opt/spack/gcc-12.1.0/iomanager-N23-02-28-eqhipmafjmtaoxfrpm3v2uw5mzf3urql/include/iomanager/network/NetworkReceiverModel.hpp:152] Connection Instance not found for name

@ArturSztuc

@eflumerf & @bieryAtFnal I'm sorry for reviving this. We see these logs in the MLT and HSI applications when they are listening for TimeSync events but nothing is generating them:

2024-Nov-14 17:23:08,655 LOG [typename std::enable_if<dunedaq::serialization::is_serializable<MessageType>::value, std::optional<_Up> >::type dunedaq::iomanager::NetworkReceiverModel<Datatype>::try_read_network(const dunedaq::iomanager::Receiver::timeout_t&) [with MessageType = dunedaq::dfmessages::TimeSync; Datatype = dunedaq::dfmessages::TimeSync; typename std::enable_if<dunedaq::serialization::is_serializable<MessageType>::value, std::optional<_Up> >::type = std::optional<dunedaq::dfmessages::TimeSync>; dunedaq::iomanager::Receiver::timeout_t = std::chrono::duration<long int, std::ratio<1, 1000> >] at /cvmfs/dunedaq-development.opensciencegrid.org/nightly/NB_DEV_241114_A9/spack-0.22.0/opt/spack/linux-almalinux9-x86_64/gcc-12.1.0/iomanager-NB_DEV_241114_A9-7lmprkuiivlfqc5a33adkxo62k2uwxmr/include/iomanager/network/detail/NetworkReceiverModel.hxx:145] 2024-Nov-14 17:23:08,655 ERROR [typename std::enable_if<dunedaq::serialization::is_serializable<MessageType>::value, std::optional<_Up> >::type dunedaq::iomanager::NetworkReceiverModel<Datatype>::try_read_network(const dunedaq::iomanager::Receiver::timeout_t&) [with MessageType = dunedaq::dfmessages::TimeSync; Datatype = dunedaq::dfmessages::TimeSync; typename std::enable_if<dunedaq::serialization::is_serializable<MessageType>::value, std::optional<_Up> >::type = std::optional<dunedaq::dfmessages::TimeSync>; dunedaq::iomanager::Receiver::timeout_t = std::chrono::duration<long int, std::ratio<1, 1000> >] at /cvmfs/dunedaq-development.opensciencegrid.org/nightly/NB_DEV_241114_A9/spack-0.22.0/opt/spack/linux-almalinux9-x86_64/gcc-12.1.0/iomanager-NB_DEV_241114_A9-7lmprkuiivlfqc5a33adkxo62k2uwxmr/include/iomanager/network/detail/NetworkReceiverModel.hxx:145] Connection Instance not found for name time_sync_.*

The logs grow to tens of megabytes after just a few minutes. This can be replicated in v5 by setting every generate_timesync in DataHandlerConf to 0 and keeping timestamp_method in RandomTCMakerConf at its default (kTimeSync).

The network connection for TimeSync will be made, but nothing is sending the messages, so the if statement HERE is being triggered.

This is connected to the issue here: DUNE-DAQ/trigger#354, listed by @roland-sipos.

Is there an alternative way of registering a callback function/handler on a network connection that might not have any receivers?
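
Separately from the callback question, the log volume itself could be limited by throttling the repeated message. Below is a minimal sketch of that idea, assuming a hypothetical helper class `ReportThrottle` (not an existing iomanager, ERS, or TRACE facility) that allows at most one report per interval and counts the rest as suppressed.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper for illustration only; not an iomanager or ERS API.
// Lets a caller emit a repeated message at most once per `min_interval`.
class ReportThrottle
{
public:
  using clock = std::chrono::steady_clock;

  explicit ReportThrottle(clock::duration min_interval)
    : m_min_interval(min_interval)
  {
  }

  // Returns true if a message should be emitted at time `now`.
  // Otherwise the call is counted as suppressed and false is returned.
  bool should_report(clock::time_point now)
  {
    if (!m_has_reported || now - m_last_report >= m_min_interval) {
      m_last_report = now;
      m_has_reported = true;
      return true;
    }
    ++m_suppressed;
    return false;
  }

  // Number of calls that were suppressed so far.
  std::uint64_t suppressed() const { return m_suppressed; }

private:
  clock::duration m_min_interval;
  clock::time_point m_last_report{};
  bool m_has_reported = false;
  std::uint64_t m_suppressed = 0;
};
```

A caller would wrap the existing message in something like `if (throttle.should_report(ReportThrottle::clock::now())) { ... }`. This only reduces log size; it does not address whether the "not found" condition should be reported at all when no sender exists.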
