failed assertion in common-channel.c:705: `!channel->sent_close' #321

rsflo · 2024-08-30T09:09:09Z

Hi,

Dropbear is used on the server side to control an application that transfers files.
Sporadically, the session breaks entirely during an automated test that continuously fetches files from the server.

The assertion above is triggered in the dropbear-2024.85 server on linux/x86_64 (linux-6.1.86 / glibc-2.38) and is no longer reproducible when commits a7ef149 and 8e6f73e are reverted (related to issue #85).
The issue first occurred after updating to 2022.83.
2020.81 (and likely earlier versions) do not show this behavior.

The server is invoked from a systemd socket unit with the command:
/usr/sbin/dropbear -vvvv -i -r /etc/dropbear/dropbear_rsa_host_key -w -W 1048576

stderr is redirected to a file:
dropbear-2024.85_assert.log

It seems that a channel which has just been closed is being used again.

The dropbear build in use is based on Yocto, which slightly patches the source (these changes are assumed not to be relevant).
As far as I understand, the upstream default build options are used; localoptions.h contains only:
#define DEBUG_TRACE 5

The remote client triggering the assertion is based on libssh and copies several files. Most of the files are a few hundred bytes in size; the largest is about 100K. The failure usually happens during a sequence of smaller files.
(I have little information on how this is implemented, but I could dig into it if it helps to understand the general flow. Mostly I have a proprietary CLI tool that initiates the copy operation.)

mkj · 2024-09-05T21:31:32Z

Thanks for the debug log. It looks like something is going wrong when a channel is opened around the same time as another is being closed (which should work fine). I'll try to reproduce it here to debug; if you have any details of the libssh call sequence, that would help.

mkj · 2024-09-06T15:53:33Z

> if you have any details of the libssh call sequence, that would help.

I've reproduced it here, so I don't need any more details.

rsflo · 2024-09-09T06:35:19Z

> I've reproduced it here, so I don't need any more details.

Great, thanks for looking into it!
Please let me know if you need anything from my side.

attila-lendvai · 2024-09-30T08:47:42Z

FTR, I'm also seeing this while using guix deploy. It seems to be consistently reproducible; it breaks every time.

the guix bug report: https://issues.guix.gnu.org/73306

mkj · 2024-09-30T09:03:29Z

I've figured out what's going on and have a fix here; I'll push it after a bit of cleaning up later this week.

If check_close() ran prior to a server channel exec/shell request, it would send a close immediately. This fix changes it to exclude write_fd==FD_UNINIT from being closed there.

When a channel has been closed by the time the shell/exec request is received, any data sent afterwards hits an assertion. This fixes #321 on GitHub.

The "pid == 0" check was initially added to avoid waiting to close a channel when a process has never been launched (which is correct), but that isn't correct in the case of the closed-fd test.

Fixes: 8e6f73e ("- Remove "flushing" handling for exited processes")

mkj · 2024-10-04T15:21:26Z

I think #326 should fix this; I haven't added the test case for it yet.

rsflo · 2024-10-04T20:41:46Z

> I think #326 should fix this; I haven't added the test case for it yet.

Good news! I'm going to apply this on top of 2024.85 next week and retest it with our use case.

rsflo · 2024-10-07T19:33:29Z

>> I think #326 should fix this; I haven't added the test case for it yet.

> Good news! I'm going to apply this on top of 2024.85 next week and retest it with our use case.

The change appears to be good.
I ran the test again and stopped it manually after about 5400 iterations with no errors.

rsflo · 2024-10-23T16:18:24Z

@mkj I'm going to suggest this fix for Yocto, trying to also get it into its current LTS version, which is still using dropbear-2082.83. Is there anything wrong with applying it to that version?
The patch applies mostly clean on top of 2082.83 and our test case does not show any issues.

mkj · 2024-10-24T01:25:52Z

> @mkj I'm going to suggest this fix for Yocto, trying to also get it into its current LTS version, which is still using dropbear-2082.83. Is there anything wrong with applying it to that version?

It should be fine to apply to 2082.83.

rsflo · 2024-11-11T07:56:02Z

>> @mkj I'm going to suggest this fix for Yocto, trying to also get it into its current LTS version, which is still using dropbear-2082.83. Is there anything wrong with applying it to that version?

> It should be fine to apply to 2082.83.

Sorry for the typo, of course I meant 2022.83.

mkj added the bug and regression labels Sep 5, 2024

mkj mentioned this issue Oct 4, 2024

Don't close channels when a PID hasn't started #326

Merged

mkj added a commit that referenced this issue Oct 21, 2024

test: Test for concurrent channel open/close

9ba833a

This reproduces the problem reported in GitHub #321. asyncssh is used to drive the connection for this test.

mkj mentioned this issue Oct 21, 2024

test: Test for concurrent channel open/close #327

Merged

mkj added a commit that referenced this issue Oct 21, 2024

test: Test for concurrent channel open/close

432d5f3

mkj closed this as completed in #326 Oct 21, 2024

mkj added a commit that referenced this issue Oct 21, 2024

test: Test for concurrent channel open/close

3a93feb

mkj added a commit that referenced this issue Oct 21, 2024

test: Test for concurrent channel open/close

b2a2d9e
