Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

epoll_fd set CLOEXEC? #2369

Closed
kylemilz opened this issue Aug 31, 2023 · 15 comments
Closed

epoll_fd set CLOEXEC? #2369

kylemilz opened this issue Aug 31, 2023 · 15 comments
Assignees

Comments

@kylemilz
Copy link

Hi,

We are using Mongoose 7.11 on ARM using GCC, and epoll.

We have an application that does many exec(3). I noticed that after a while, the program runs out of file descriptors. I have tracked the descriptor leak down to the mongoose mgr->epoll_fd. It looks like it's being created without the EPOLL_CLOEXEC flag.

I don't think this is desired behavior, and indeed some of the other file descriptors created by mongoose are setting CLOEXEC, notably in mg_set_non_blocking_mode(). What do you think of this diff?

diff --git a/src/net.c b/src/net.c
index 45b60904..879f22ac 100644
--- a/src/net.c
+++ b/src/net.c
@@ -238,7 +238,7 @@ void mg_mgr_free(struct mg_mgr *mgr) {
 void mg_mgr_init(struct mg_mgr *mgr) {
   memset(mgr, 0, sizeof(*mgr));
 #if MG_ENABLE_EPOLL
-  if ((mgr->epoll_fd = epoll_create1(0)) < 0) MG_ERROR(("epoll: %d", errno));
+  if ((mgr->epoll_fd = epoll_create1(EPOLL_CLOEXEC)) < 0) MG_ERROR(("epoll: %d", errno));
 #else
   mgr->epoll_fd = -1;
 #endif
@scaprile
Copy link
Collaborator

Can you please honor the issue template and fill it ? "ARM" in my head means a microcontroller but I guess in yours perhaps includes Linux/WIndows/MacOS/other

@kylemilz
Copy link
Author

oops, yeah this is on Linux 5.15.

@scaprile
Copy link
Collaborator

We don't set O_CLOEXEC when we open(), we set FD_CLOEXEC here:

mongoose/mongoose.c

Lines 5742 to 5771 in d5b5cec

static void mg_set_non_blocking_mode(MG_SOCKET_TYPE fd) {
#if defined(MG_CUSTOM_NONBLOCK)
MG_CUSTOM_NONBLOCK(fd);
#elif MG_ARCH == MG_ARCH_WIN32 && MG_ENABLE_WINSOCK
unsigned long on = 1;
ioctlsocket(fd, FIONBIO, &on);
#elif MG_ENABLE_RL
unsigned long on = 1;
ioctlsocket(fd, FIONBIO, &on);
#elif MG_ENABLE_FREERTOS_TCP
const BaseType_t off = 0;
if (setsockopt(fd, 0, FREERTOS_SO_RCVTIMEO, &off, sizeof(off)) != 0) (void) 0;
if (setsockopt(fd, 0, FREERTOS_SO_SNDTIMEO, &off, sizeof(off)) != 0) (void) 0;
#elif MG_ENABLE_LWIP
lwip_fcntl(fd, F_SETFL, O_NONBLOCK);
#elif MG_ARCH == MG_ARCH_AZURERTOS
fcntl(fd, F_SETFL, O_NONBLOCK);
#elif MG_ARCH == MG_ARCH_TIRTOS
int val = 0;
setsockopt(fd, SOL_SOCKET, SO_BLOCKING, &val, sizeof(val));
// SPRU524J section 3.3.3 page 63, SO_SNDLOWAT
int sz = sizeof(val);
getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &val, &sz);
val /= 2; // set send low-water mark at half send buffer size
setsockopt(fd, SOL_SOCKET, SO_SNDLOWAT, &val, sizeof(val));
#else
fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK); // Non-blocking mode
fcntl(fd, F_SETFD, FD_CLOEXEC); // Set close-on-exec
#endif
}

What you post is executed only once when calling mg_mgr_init():

mongoose/mongoose.c

Lines 3806 to 3812 in d5b5cec

void mg_mgr_init(struct mg_mgr *mgr) {
memset(mgr, 0, sizeof(*mgr));
#if MG_ENABLE_EPOLL
if ((mgr->epoll_fd = epoll_create1(0)) < 0) MG_ERROR(("epoll: %d", errno));
#else
mgr->epoll_fd = -1;
#endif

and then for EPOLL, we do this:

mongoose/mongoose.c

Lines 5880 to 5891 in d5b5cec

static void close_conn(struct mg_connection *c) {
if (FD(c) != MG_INVALID_SOCKET) {
#if MG_ENABLE_EPOLL
epoll_ctl(c->mgr->epoll_fd, EPOLL_CTL_DEL, FD(c), NULL);
#endif
closesocket(FD(c));
#if MG_ENABLE_FREERTOS_TCP
FreeRTOS_FD_CLR(c->fd, c->mgr->ss, eSELECT_ALL);
#endif
}
mg_close_conn(c);
}

You know, the issue template exists for a reason, to save yours and our time, so please tell us exactly what are you doing.

WRT

an application that does many exec(3)

my best guess would be:
From the docs

NOTE: Since Mongoose's core is not protected against concurrent accesses, make sure that all mg_* API functions are called from the same thread or RTOS task.

If you absolutely need multi-threading, take the time to go through the user guide (linked above) and tutorials, there is one that shows you how to do multi-threading.

@kylemilz
Copy link
Author

kylemilz commented Aug 31, 2023

sorry, your reply does not make much sense to me, let me attempt to clarify.

given the following test program, which is taken from the Minimal HTTP server example:

#include <err.h>
#include "mongoose.h"


static void
fn(struct mg_connection *c, int ev, void *ev_data, void *fn_data)
{
        if (ev == MG_EV_HTTP_MSG) {
                struct mg_http_message *hm = ev_data;
                if (mg_http_match_uri(hm, "/api/hello")) {
                        mg_http_reply(c, 200, "", "{%m:%d}\n", MG_ESC("status"), 1);
                }
                else {
                        struct mg_http_serve_opts opts = { .root_dir = "." };
                        mg_http_serve_dir(c, hm, &opts);
                }
        }
}

int
main(int argc, char *argv[])
{
        struct mg_mgr mgr;

        mg_mgr_init(&mgr);
        mg_http_listen(&mgr, "http://0.0.0.0:8000", fn, &mgr);

        while (1) {
                mg_mgr_poll(&mgr, 1000);
                if (execl(argv[0], argv[0], NULL) == -1)
                        err(1, "execl(%s)", argv[0]);
        }

        mg_mgr_free(&mgr);

        return 0;
}

i have made a simple modification: every time the while (1) loop runs, we exec(3) a new program. this example is contrived, and not exactly what we are doing, but illustrates the problem in a minimal way.

compile and run via:

~/mongoose$ cc -o exec_test test.c mongoose.c
~/mongoose$ ./exec_test

Now the easiest way to see the problem is to introspect the amount of open file descriptors, one way is via /proc (with ./exec_test running):

~/mongoose$ ps | grep exec_test
3836 zadmin    1680 S    ./exec_test
~/mongoose$ ls -l /proc/3836/fd | wc -l
25
~/mongoose$ ls -l /proc/3836/fd | wc -l
27
~/mongoose$ ls -l /proc/3836/fd | wc -l
31

if i then instrument the mongoose codebase with the following diff:

diff --git a/mongoose.c b/mongoose.c
index 9edde71d..9340e7e4 100644
--- a/mongoose.c
+++ b/mongoose.c
@@ -3807,6 +3807,7 @@ void mg_mgr_init(struct mg_mgr *mgr) {
   memset(mgr, 0, sizeof(*mgr));
 #if MG_ENABLE_EPOLL
   if ((mgr->epoll_fd = epoll_create1(0)) < 0) MG_ERROR(("epoll: %d", errno));
+  fprintf(stderr, ">>> create epoll_fd %d\n", mgr->epoll_fd);
 #else
   mgr->epoll_fd = -1;
 #endif
@@ -5767,6 +5768,7 @@ static void mg_set_non_blocking_mode(MG_SOCKET_TYPE fd) {
 #else
   fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);  // Non-blocking mode
   fcntl(fd, F_SETFD, FD_CLOEXEC);                          // Set close-on-exec
+  fprintf(stderr, ">>> set FD_CLOEXEC on %d\n", fd);
 #endif
 }

you can see the output of exec_test is:

sitecntrl:~/mongoose$ ./exec_test
>>> create epoll_fd 3
>>> set FD_CLOEXEC on 4

you can clearly see the fd 3 does not have CLOEXEC set, but 4 does. so i think that what we want is 3 also having CLOEXEC set.

no multithreading here.

@scaprile
Copy link
Collaborator

scaprile commented Sep 2, 2023

I fail to grasp why you might want to do something like that. I don't see a usage model like that in any of our examples nor the user guide, I wouldn't call yours "a simple modification".
In my limited view, that of a microcontroller-oriented old dog, I can only see that every time you replace yourself by a clone of yourself, you run the initialization sequence again without freeing any of the allocated resources. You will not only run out of file descriptors, but you will also run out of memory.
I've assigned this issue to our master mind, let's see what he thinks.

@cpq
Copy link
Member

cpq commented Sep 4, 2023

Thanks @scaprile

@kylemilz , the change looks reasonable. We missed the fact that the epoll descriptors are also inherited.

By the way, we also don't set CLOSEXEC on the static UI files opened by the HTTP server - a standard fopen without e
flag is used. So I guess you can see static UI files could be inherited by a child process.

Quick question. Are your child processes not getting closed?
My understanding is that if child exits, all descriptors are released by the OS. Thus if mongoose (the parent) closes epoll fd in the normal cause of action, and if children get exited, then no resource exhaustion should be observed. Basically, I don't anticipate a parent (mongoose) leaking descriptors, even in the current scenario.
What is you observation?

@cpq
Copy link
Member

cpq commented Sep 5, 2023

@kylemilz appreciate your BSD style in the test program, btw.

#2371 should fix the issue

@gvanem
Copy link
Contributor

gvanem commented Sep 9, 2023

This has broken the Windows version. Dr. Watson kicks in. I fixed it by:

--- a/src/fs_posix.c 2023-09-08 07:15:48
+++ b/src/fs_posix.c 2023-09-09 09:06:12
@@ -166,13 +166,14 @@
 }

 static void *p_open(const char *path, int flags) {
-  const char *mode = flags == MG_FS_READ ? "rbe" : "a+be";  // e for CLOEXEC
 #if MG_ARCH == MG_ARCH_WIN32
+  const char *mode = flags == MG_FS_READ ? "rb" : "a+b";
   wchar_t b1[MG_PATH_MAX], b2[10];
   MultiByteToWideChar(CP_UTF8, 0, path, -1, b1, sizeof(b1) / sizeof(b1[0]));
   MultiByteToWideChar(CP_UTF8, 0, mode, -1, b2, sizeof(b2) / sizeof(b2[0]));
   return (void *) _wfopen(b1, b2);
 #else
+  const char *mode = flags == MG_FS_READ ? "rbe" : "a+be";  // e for CLOEXEC
   return (void *) fopen(path, mode);
 #endif
 }

@cpq
Copy link
Member

cpq commented Sep 9, 2023

Oops. Our CI did not complain about windows tests.
I was under impression that the C lib fopen should ignore unknown flags.

@gvanem could you elaborate a bit more what exactly do you observe? An antivirus pops up?

@gvanem
Copy link
Contributor

gvanem commented Sep 9, 2023

I was under impression that the C lib fopen should ignore unknown flags.

Not here.

An antivirus pops up?

No. My JIT-debugger (WinDbg) jumps in with this call-stack when running e.g. huge-response.exe:

ucrtbase!invoke_watson+0x18
ucrtbase!_invalid_parameter+0x12d
ucrtbase!invalid_parameter_noinfo+0x9
ucrtbase!__acrt_stdio_parse_mode<wchar_t>+0x3aaf4
ucrtbase!common_openfile<wchar_t>+0x22
ucrtbase!common_fsopen<wchar_t>+0x70
huge_response!p_open(char * path = <Value unavailable error>, int flags = 0n1)+0x7b
huge_response!mg_fs_open(struct mg_fs * fs = 0x00007ff7`c5538640, char * path = 0x00000001`000ff3d0 "web_root/index.html", int flags = 0n1)+0x3b
huge_response!mg_http_serve_file(struct mg_connection * c = 0x00000175`57f979f0, struct mg_http_message * hm = 0x00000001`000ff690, char * path = 0x00000001`000ff3d0 "web_root/index.html", struct mg_http_serve_opts * opts = 0x00000001`000ff5d0)+0x104
huge_response!mg_http_serve_dir(struct mg_connection * c = 0x00000175`57f979f0, struct mg_http_message * hm = 0x00000001`000ff690, struct mg_http_serve_opts * opts = 0x00000001`000ff5d0)+0x1ec
huge_response!fn(struct mg_connection * c = 0x00000175`57f979f0, int ev = <Value unavailable error>, void * ev_data = 0x00000001`000ff690, void * fn_data = 0x00007ff7`c5538028)+0x275
huge_response!mg_call(struct mg_connection * c = 0x00000175`57f979f0, int ev = 0n10, void * ev_data = 0x00000001`000ff690)+0x2c
huge_response!http_cb(struct mg_connection * c = 0x00000175`57f979f0, int ev = 0n7, void * evd = 0x00000001`000ff8d0, void * fnd = 0x00000000`00000077)+0x281
huge_response!mg_call(struct mg_connection * c = 0x00000175`57f979f0, int ev = 0n7, void * ev_data = 0x00000001`000ff8d0)+0x49
huge_response!iolog(struct mg_connection * c = 0x00000175`57f979f0, char * buf = 0x00000175`57d4d6b0 "GET / HTTP/1.1..Host: localhost:8000..Connection: keep-alive..sec-ch-ua: "Chromium";v="116", "Not)A;Brand";v="24", "Google Chrome";v="116"..sec-ch-ua-mobile: ?0..sec-ch-ua-platform: "Windows"..DNT: 1..Upgrade-Insecure-Requests: 1..User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36..Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7..Sec-Fetch-Site: none..Sec-Fetch-Mode: navigate..Sec-Fetch-User: ?1..Sec-Fetch-Dest: document..Accept-Encoding: gzip, deflate, br..Accept-Language: en-GB,en;q=0.9,nb;q=0.8..If-None-Match: "1656405575.2251"....", long n = 0n709, bool r = true)+0x12f
huge_response!mg_mgr_poll(struct mg_mgr * mgr = 0x00000001`000ff990, int ms = <Value unavailable error>)+0x255
huge_response!main(void)+0xaf
huge_response!invoke_main(void)+0x22
huge_response!__scrt_common_main_seh(void)+0x10c
KERNEL32!BaseThreadInitThunk+0x14
ntdll!RtlUserThreadStart+0x21

Plain and simply an illegal fopen() mode.

@scaprile
Copy link
Collaborator

scaprile commented Sep 9, 2023

We only test examples for build, and that one is not in the main list so I'm not sure we actually test it. Plus, we are not building with every combination, we either build with POLL or EPOLL. However, we should catch that in the unit tests.
MG_ARCH_UNIX:

mongoose/mongoose.h

Lines 353 to 357 in bd59185

#if !defined(MG_ENABLE_EPOLL) && defined(__linux__)
#define MG_ENABLE_EPOLL 1
#elif !defined(MG_ENABLE_POLL)
#define MG_ENABLE_POLL 1
#endif

So Linux defaults to EPOLL and MacOS defaults to poll()
MG_ARCH_WIN32 has no macros so it defaults to select()

@kylemilz
Copy link
Author

I fail to grasp why you might want to do something like that. I don't see a usage model like that in any of our examples nor the user guide, I wouldn't call yours "a simple modification".

Our use case is quite valid. What we have is a configuration update coming from http, ie we have a user dashboard where they can set some values, then these are sent to mongoose. Upon validation of the configuration, it gets written to a database and then we use exec() to reinitialize the whole process, which reads the new configuration from the database. This is a very simple way to handle configuration changes, and makes sure you are always using the latest configuration. It's a common pattern in Unix.

What happened is we have tests that write many permutations of configuration, and we noticed after a while the tests would fail. The tests failed because the configuration database could not be opened, because the process had too many open file descriptors. That lead us to discover this leak.

In my limited view, that of a microcontroller-oriented old dog, I can only see that every time you replace yourself by a clone of yourself, you run the initialization sequence again without freeing any of the allocated resources. You will not only run out of file descriptors, but you will also run out of memory.

Unix process management is kind of weird. IIRC all other resources except file descriptors get free'd across an exec(). I don't really know why as there's no way to retain the file descriptor number.

@kylemilz
Copy link
Author

Quick question. Are your child processes not getting closed?

We are not doing a fork() then exec() for this use case, so we have no child processes.

My understanding is that if child exits, all descriptors are released by the OS. Thus if mongoose (the parent) closes epoll fd in the normal cause of action, and if children get exited, then no resource exhaustion should be observed. Basically, I don't anticipate a parent (mongoose) leaking descriptors, even in the current scenario.
What is you observation?

I believe your analysis above is correct, but we do not have that situation here.

@kylemilz
Copy link
Author

#2371 should fix the issue

Thanks, presuming this fix is in 7.12 i'll give it a try when it's released.

@scaprile
Copy link
Collaborator

Yes, it will be included in 7.12

This was referenced Sep 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants