Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

failed assertion in common-channel.c:705: `!channel->sent_close' #321

Closed
rsflo opened this issue Aug 30, 2024 · 11 comments · Fixed by #326
Closed

failed assertion in common-channel.c:705: `!channel->sent_close' #321

rsflo opened this issue Aug 30, 2024 · 11 comments · Fixed by #326

Comments

@rsflo
Copy link

rsflo commented Aug 30, 2024

Hi,

dropbear is used on the server side to control an application that allows to transfers files.
Sporadically the session breaks entirely as part of an automated test that continuously fetches files from the server.

The mentioned assert is triggered in dropbear-2024.85 server on linux/x86_64 (linux-6.1.86 / glibc-2.38) and is no longer reproducible when commits a7ef149 and 8e6f73e are reverted (related to issue #85).
Issue first occurred after having updated to 2022.83.
2020.81 (and likely earlier versions) do not show this behavior.

Server is invoked using a systemd socket unit and the call
/usr/sbin/dropbear -vvvv -i -r /etc/dropbear/dropbear_rsa_host_key -w -W 1048576

stderr is redirected to a file:
dropbear-2024.85_assert.log

It seems a "channel" is tried to be used, which has just been closed.

The used build of dropbear is based on Yocto, which slightly patches the source (these changes are assumed to be not relevant).
As much as I understand the upstream default build options are used, localoptions.h contains
#define DEBUG_TRACE 5
only.

The remote client triggering the assert is based on libssh and intends to copy a couple of files. Most of the files are a few hundred bytes in size, largest is about 100K. It usually happens during a sequence of smaller files.
(Yet there is little information as to how this is implemented, I could dig into it if it helps to understand the general flow. I mostly have a proprietary CLI tool that allows to initiate the copy operation.)

@mkj
Copy link
Owner

mkj commented Sep 5, 2024

Thanks for the debug log. It looks like something is going wrong when a channel is opened around the same time as another is being closed (which should work fine). I'll try and reproduce it here to debug, if you have any details of the libssh call sequence that would help.

@mkj
Copy link
Owner

mkj commented Sep 6, 2024

if you have any details of the libssh call sequence that would help.

I've reproduced it here so don't need any more details.

@rsflo
Copy link
Author

rsflo commented Sep 9, 2024

I've reproduced it here so don't need any more details.

Great, thanks for looking into it!
Please let me know if you need anything from my side.

@attila-lendvai
Copy link

FTR, i'm also seeing this while using guix deply. it seems to be consistently reproducible, it breaks every time.

the guix bug report: https://issues.guix.gnu.org/73306

@mkj
Copy link
Owner

mkj commented Sep 30, 2024

I've figured what's going on and have a fix here, will push it after a bit of cleaning up later this week.

mkj added a commit that referenced this issue Oct 4, 2024
If check_close() ran prior to a server channel exec/shell
request, it would send a close immediately.
This fix changes it to exclude write_fd==FD_UNINIT from
being closed there.

When a channel was closed by the time shell/exec request
was received, then data sent hits an assertion.
This fixes #321 on Github.

The "pid == 0" check was initially added to avoid waiting
to close a channel when a process has never been launched
(which is correct), but that isn't correct in the case
of the closed-fd test.

Fixes: 8e6f73e ("- Remove "flushing" handling for exited processes)
@mkj
Copy link
Owner

mkj commented Oct 4, 2024

I think #326 should fix this, I haven't yet added the testcase for it.

@rsflo
Copy link
Author

rsflo commented Oct 4, 2024

I think #326 should fix this, I haven't yet added the testcase for it.

Good news! I'm going to apply this on top of 2024.85 next week and will retry it based on our use case.

@rsflo
Copy link
Author

rsflo commented Oct 7, 2024

I think #326 should fix this, I haven't yet added the testcase for it.

Good news! I'm going to apply this on top of 2024.85 next week and will retry it based on our use case.

The change appears to be good.
Ran the mentioned test again and manually stopped it after about 5400 iterations with no error.

mkj added a commit that referenced this issue Oct 21, 2024
This reproduces the problem reported in github #321

asyncssh is used to drive the connection for this test.
mkj added a commit that referenced this issue Oct 21, 2024
This reproduces the problem reported in github #321

asyncssh is used to drive the connection for this test.
@mkj mkj closed this as completed in #326 Oct 21, 2024
mkj added a commit that referenced this issue Oct 21, 2024
If check_close() ran prior to a server channel exec/shell
request, it would send a close immediately.
This fix changes it to exclude write_fd==FD_UNINIT from
being closed there.

When a channel was closed by the time shell/exec request
was received, then data sent hits an assertion.
This fixes #321 on Github.

The "pid == 0" check was initially added to avoid waiting
to close a channel when a process has never been launched
(which is correct), but that isn't correct in the case
of the closed-fd test.

Fixes: 8e6f73e ("- Remove "flushing" handling for exited processes)
mkj added a commit that referenced this issue Oct 21, 2024
This reproduces the problem reported in github #321

asyncssh is used to drive the connection for this test.
mkj added a commit that referenced this issue Oct 21, 2024
This reproduces the problem reported in github #321

asyncssh is used to drive the connection for this test.
@rsflo
Copy link
Author

rsflo commented Oct 23, 2024

@mkj I'm going to suggest this fix for Yocto, trying to also get it into its current LTS version, which is still using dropbear-2082.83. Is there anything wrong with applying it to that version?
The patch applies mostly clean on top of 2082.83 and our test case does not show any issues.

@mkj
Copy link
Owner

mkj commented Oct 24, 2024

@mkj I'm going to suggest this fix for Yocto, trying to also get it into its current LTS version, which is still using dropbear-2082.83. Is there anything wrong with applying it to that version?

It should be right to apply for 2082.83

@rsflo
Copy link
Author

rsflo commented Nov 11, 2024

@mkj I'm going to suggest this fix for Yocto, trying to also get it into its current LTS version, which is still using dropbear-2082.83. Is there anything wrong with applying it to that version?

It should be right to apply for 2082.83

Sorry for the typo, of course I meant 2022.83.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants