Crash on RPi 4B + WiFi on Console #490

Closed
davefilip opened this issue Oct 21, 2024 · 32 comments

@davefilip

davefilip commented Oct 21, 2024

Don't see this crash with same build on 1B+Ethernet, 2B+Ethernet, 3B+Ethernet, 3B+WiFi, or 4B+Ethernet, but can reproduce fairly easily on 4B+WiFi.

It could be that it is just more easily reproducible on 4B+WiFi due to timing issues, as I usually don't test for more than about 10 mins on the other hardware configurations, and can usually reproduce it within 1 - 2 mins on 4B+WiFi.

I should note that when it does crash, it is not consistently on the same command, but usually on a command that touches the network (in this example getting an HTTP URL, but sometimes a ping, sometimes executing a command on another Circle node). All of these are commands that, even on the 4B+WiFi console, will work one or more times, but then eventually cause a crash.

If I stay off the console and send commands over the network (using a telnet server I've written), I have left it running > 15 hours and sent dozens of similar commands over the network (telnet) and not had a crash.

@davefilip
Author

CircleCrash

@davefilip
Author

kernel8-rpi4.lst.zip

@rsta2
Owner

rsta2 commented Oct 21, 2024 via email

@davefilip
Author

davefilip commented Oct 21, 2024 via email

@rsta2
Owner

rsta2 commented Oct 21, 2024 via email

@rsta2
Owner

rsta2 commented Oct 21, 2024 via email

@davefilip
Author

davefilip commented Oct 21, 2024 via email

@rsta2
Owner

rsta2 commented Oct 21, 2024 via email

@davefilip
Author

davefilip commented Oct 21, 2024 via email

@davefilip
Author

davefilip commented Oct 21, 2024 via email

@rsta2
Owner

rsta2 commented Oct 21, 2024

Dave, from the synchronous exception info and your kernel8-rpi4.lst I wasn't able to find the reason for the issue so far; it may be a side effect. I modified the sample mbedtls/06-webclient in the circle-stdlib project by adding a simple command interpreter, which reads a command (currently only "list" or "reboot") from a USB keyboard and displays the RPi revision list or reboots.

With this I found a problem in CMachineInfo, which can be fixed with this patch, but I'm afraid that's not the reason for the issue you have reported:

diff --git a/lib/machineinfo.cpp b/lib/machineinfo.cpp
index 744a150f..66fbe8ec 100644
--- a/lib/machineinfo.cpp
+++ b/lib/machineinfo.cpp
@@ -308,13 +308,13 @@ CMachineInfo::~CMachineInfo (void)
 {
        m_MachineModel = MachineModelUnknown;
 
+       if (s_pThis == this)
+       {
 #if RASPPI >= 4
-       delete m_pDTB;
-       m_pDTB = 0;
+               delete m_pDTB;
+               m_pDTB = 0;
 #endif
 
-       if (s_pThis == this)
-       {
                s_pThis = 0;
        }
 }

But perhaps you will try your program with this patch. With it I can enter "list" several times without a problem.
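
The ownership rule that the patch enforces can be illustrated in isolation. The following is a minimal sketch, not the Circle source: `Info`, `Blob` and `CountDeletesWhilePrimaryAlive` are hypothetical names, and it assumes that secondary instances borrow the pointer held by the first instance, which is what moving the `delete` inside the `s_pThis == this` check suggests. Without that check, the first temporary instance to go out of scope would free the object the primary instance still uses.

```cpp
// Hypothetical stand-in for the device tree blob object.
struct Blob
{
	static int s_nDeleted;		// counts destructor calls
	~Blob () { s_nDeleted++; }
};
int Blob::s_nDeleted = 0;

// Simplified model (assumption): the first instance owns the blob,
// later instances only borrow its pointer.
class Info
{
public:
	Info ()
	{
		if (s_pThis != nullptr)
		{
			m_pBlob = s_pThis->m_pBlob;	// borrow, do not own
			return;
		}

		m_pBlob = new Blob;
		s_pThis = this;
	}

	~Info ()
	{
		if (s_pThis == this)		// only the owning instance frees
		{
			delete m_pBlob;
			m_pBlob = nullptr;
			s_pThis = nullptr;
		}
	}

private:
	Blob *m_pBlob = nullptr;
	static Info *s_pThis;
};
Info *Info::s_pThis = nullptr;

// Creates and destroys nTemps secondary instances while the primary
// instance is alive and returns how many blobs were deleted meanwhile.
int CountDeletesWhilePrimaryAlive (int nTemps)
{
	Blob::s_nDeleted = 0;
	Info Primary;
	for (int i = 0; i < nTemps; i++)
	{
		Info Temp;			// destroyed at the end of each iteration
	}
	return Blob::s_nDeleted;
}
```

With the guard in place, the temporaries leave the blob alone and it is deleted exactly once, when the primary instance is destroyed.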


For example, I currently have absolutely no interest in the RPi 5. To be honest, I think the RPi Foundation jumped the shark on that one, because for me, the idea of a single board computer like a RPi is something that is cheap, low power, low heat, and solidly reliable. Not adding more and more power at more cost so that it basically becomes a desktop computer. To me it is ridiculous that the RPi 5 now requires 3000 mA, and throws off so much heat that there are now liquid-cooled heat sinks for it!

I think the RPi 5 is good for very CPU-intensive applications. It has much more CPU power than the RPi 4, so audio synthesizers or AI apps, for example, can profit from it. But yes, for many other apps the older RPi models should be sufficient. I have a small heat sink on my RPi 5, but usually use it without a fan. I don't think that it draws 3 A all the time. That much is only required when you have some USB devices connected.

@davefilip
Author

davefilip commented Oct 21, 2024 via email

@rsta2
Owner

rsta2 commented Oct 22, 2024 via email

@davefilip
Author

davefilip commented Oct 22, 2024 via email

@rsta2
Owner

rsta2 commented Oct 22, 2024

Dave, so what to do? When there were similarly difficult (possibly side effect) problems in the past, it was possible to reproduce the issue in a small program, which I was able to debug here locally. We did this years ago, when there was a problem with the spin lock (EnterCritical()) implementation, which was also difficult to find. As it seems, this is not possible in our case. I'm afraid the reason cannot be found without JTAG debugging.

Would it be possible for you to send your kernel8-rpi4.elf file to me (e.g. by email, my address is on GitHub), so that I can debug it? I need the .elf file for GDB, not the .img file. I don't know whether it is difficult to configure it so that I am able to enter some commands to trigger the exception. I think I can debug this without the sources. At the moment I do not have another idea.

Thanks,

Rene

@davefilip
Author

davefilip commented Oct 22, 2024 via email

@rsta2
Owner

rsta2 commented Oct 22, 2024 via email

@rsta2
Owner

rsta2 commented Oct 31, 2024

Dave, I hope you enjoyed your trip to NYC and are back again in the meantime. We spoke about the possibility of JTAG debugging your application here on my machine to be able to solve this issue. I want to ask for your status regarding this. I would need your kernel8-rpi4.elf file and maybe a few (?) configuration files.

Thanks,

Rene

@davefilip
Author

davefilip commented Oct 31, 2024 via email

@rsta2
Owner

rsta2 commented Oct 31, 2024 via email

@davefilip
Author

davefilip commented Nov 3, 2024 via email

@rsta2
Owner

rsta2 commented Nov 4, 2024

Dave, thanks for the update and for making the build process cleaner. I'm looking forward to getting access to your program and I hope I can find the reason for the synchronous exception.

Rene

@davefilip
Author

davefilip commented Nov 5, 2024 via email

@rsta2
Owner

rsta2 commented Nov 5, 2024 via email

@davefilip
Author

davefilip commented Nov 5, 2024 via email

@rsta2
Owner

rsta2 commented Nov 5, 2024 via email

@rsta2
Owner

rsta2 commented Nov 7, 2024

For the record: We were not able to solve this issue so far. While @davefilip can reproduce the exception, I was not able to do this on my local RPi 4B. Therefore I'm also not able to use JTAG debugging to find the reason.

@rsta2
Owner

rsta2 commented Dec 15, 2024

Dave, is this issue also fixed now?

@davefilip
Author

davefilip commented Dec 15, 2024 via email

@rsta2
Owner

rsta2 commented Dec 15, 2024 via email

@davefilip
Author

davefilip commented Dec 15, 2024 via email

@davefilip
Author

Resolved by increasing the kernel stack size.
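
The thread does not say how the stack size was changed or how the overflow was confirmed. In Circle the size is, as far as I know, controlled by the `KERNEL_STACK_SIZE` define in `include/circle/sysconfig.h`; treat that name as an assumption to verify against your Circle version. One common way to check whether a stack is close to overflowing is to paint it with a known pattern and later count how much of the pattern survives. The following is a minimal sketch of that technique on a simulated stack region (a plain array), with hypothetical helper names; it is not code from this project.

```cpp
#include <cstddef>
#include <cstdint>

// Pattern written to every word of the stack region before use.
constexpr uint32_t kPaint = 0xDEADBEEF;

// Fill the whole region with the paint pattern.
void PaintStack (uint32_t *pBase, size_t nWords)
{
	for (size_t i = 0; i < nWords; i++)
	{
		pBase[i] = kPaint;
	}
}

// The stack is assumed to grow downward, from the highest address
// toward pBase[0]. The words still painted, counted from the base
// upward, were never touched and give the remaining headroom.
size_t UnusedWords (const uint32_t *pBase, size_t nWords)
{
	size_t n = 0;
	while (n < nWords && pBase[n] == kPaint)
	{
		n++;
	}
	return n;
}

// Simulation only: overwrite the top nUsed words as a running
// program would do when pushing data onto the stack.
void SimulateUse (uint32_t *pBase, size_t nWords, size_t nUsed)
{
	for (size_t i = 0; i < nUsed && i < nWords; i++)
	{
		pBase[nWords - 1 - i] = 0;
	}
}
```

If the unused count ever reaches zero, the stack has very likely overflowed into adjacent memory, which would be consistent with a synchronous exception that appears on varying commands.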
