Leaks networks on every run #1407

Open
elafontaine opened this issue Oct 30, 2024 · 10 comments
Labels
elaborate Further elaboration is needed

Comments

@elafontaine

Minimal .gitlab-ci.yml illustrating the issue

---
docker_build:
  stage: package
  image: docker:latest
  services:
    - docker:dind
  script:
    - echo "blablabla"

Expected behavior
After a run, the network that was created so the service and job containers could talk to each other should be removed.

Host information
macOS
gitlab-ci-local 4.55.0

Containerd binary
docker

Additional context
https://github.com/firecow/gitlab-ci-local/blob/master/src/job.ts#L543 < not tracked for cleanup

firecow added the bug label (Something isn't working) Oct 31, 2024
@firecow
Owner

firecow commented Oct 31, 2024

I cannot reproduce

[two images]

Plus the code you are referencing does in fact illustrate that a serviceNetworkId is stored and used in the cleanup function.

firecow added the elaborate label (Further elaboration is needed) and removed the bug label (Something isn't working) Oct 31, 2024
@elafontaine
Author

Hi @firecow ,

This is my current "docker network ls" output (with some company-specific entries redacted):

➜   docker network ls
NETWORK ID     NAME                     DRIVER    SCOPE
51fbb22039db   bridge                   bridge    local
9fe84a65a347   docker_gwbridge          bridge    local
6df0aef1c8a5   gitlab-ci-local-130397   bridge    local
6e78377fe61e   gitlab-ci-local-200711   bridge    local
29b6022248e3   gitlab-ci-local-201744   bridge    local
176f9fc46cc9   gitlab-ci-local-235698   bridge    local
dd8619c29826   gitlab-ci-local-284263   bridge    local
d9cc612fdb5a   gitlab-ci-local-351190   bridge    local
884c9c02eee9   gitlab-ci-local-371592   bridge    local
d147c413d3f5   gitlab-ci-local-375682   bridge    local
1f7e90481cfc   gitlab-ci-local-501394   bridge    local
cdbf32f7f9e6   gitlab-ci-local-535650   bridge    local
1b4057b7b5f9   gitlab-ci-local-558862   bridge    local
b6b57e9795c8   gitlab-ci-local-574073   bridge    local
7e34c53c5bff   gitlab-ci-local-579972   bridge    local
ccd262ce6df9   gitlab-ci-local-654062   bridge    local
02cb192c820a   gitlab-ci-local-668695   bridge    local
e866a4a3540a   gitlab-ci-local-714030   bridge    local
23964309e2f7   gitlab-ci-local-738116   bridge    local
54d988391d24   gitlab-ci-local-768931   bridge    local
f8da1545297a   host                     host      local
y18vrgxhbg68   ingress                  overlay   swarm

In theory I do see the network being cleaned up afterwards, but in practice something is probably going wrong. My guess is that I should be seeing a message somewhere based on https://github.com/firecow/gitlab-ci-local/blob/master/src/job.ts#L593, or maybe the assert on the containers is what caused the network cleanup to be skipped? I believe the latter may be the case, considering those asserts are within the catch. I'm not familiar with JavaScript "assert" and its best practices; however, I fail to understand how it wouldn't be an "Error" instance...
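
The hypothesis above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not gitlab-ci-local's actual code: Step, assertIsError, cleanupFragile, cleanupResilient, and makeDemoSteps are all hypothetical names. It shows that a check which throws inside a catch block aborts the remaining cleanup steps when the caught value is not an Error, while a handler that tolerates any thrown value still reaches the network removal step. Whether this is what happens in job.ts is not established in this thread.

```typescript
// Hypothetical sketch; none of these names come from gitlab-ci-local.
type Step = { name: string; run: () => void };

// Stand-in for an assert inside a catch block:
// throws when the caught value is not an Error instance.
function assertIsError(e: unknown): asserts e is Error {
    if (!(e instanceof Error)) {
        throw new TypeError(`caught value is not an Error: ${String(e)}`);
    }
}

// Fragile: if the check inside the catch throws, the loop is abandoned
// and every later step (e.g. network removal) is skipped.
function cleanupFragile(steps: Step[], log: string[]): void {
    for (const step of steps) {
        try {
            step.run();
        } catch (e) {
            assertIsError(e);
            log.push(`${step.name} failed: ${e.message}`);
        }
    }
}

// Resilient: any thrown value is recorded and the next step still runs.
function cleanupResilient(steps: Step[], log: string[]): void {
    for (const step of steps) {
        try {
            step.run();
        } catch (e) {
            const reason = e instanceof Error ? e.message : String(e);
            log.push(`${step.name} failed: ${reason}`);
        }
    }
}

// Demo steps: container removal throws a non-Error value,
// network removal records that it ran.
function makeDemoSteps(removed: string[]): Step[] {
    return [
        { name: "remove containers", run: () => { throw "exit code 1"; } },
        { name: "remove network", run: () => { removed.push("network"); } },
    ];
}
```

With makeDemoSteps, cleanupFragile throws a TypeError before the network step runs, while cleanupResilient logs the container failure and still reaches the network step.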

@ANGkeith
Collaborator

ANGkeith commented Nov 5, 2024

Likely these are leaked from the test suites; you can replicate it by running npm run test and after a few seconds

@elafontaine
Author

I never ran the test suite of gitlab-ci-local. I run gitlab-ci-local from the brew installation.

These are probably leaked from my job that runs Docker-in-Docker and failed to complete. The failure probably triggered some other failures (containers not being removed, and the rest of the cleanup being skipped?)

@ANGkeith
Collaborator

ANGkeith commented Nov 7, 2024

hmm, ic, not sure then.. i dont really run docker-in-docker pipeline

hopefully, it's something that is replicable

@elafontaine
Author

I just had the problem with a job that just runs out of a container... no issue that I know of...

For those having the same issue, here is what I ran:

for network in $(docker network ls); do if [[ "$network" == *"gitlab"* ]]; then echo "$network"; docker network rm "$network"; fi; done
gitlab-ci-local-9409
gitlab-ci-local-9409
gitlab-ci-local-95666
gitlab-ci-local-95666
gitlab-ci-local-130397
gitlab-ci-local-130397
gitlab-ci-local-200711
gitlab-ci-local-200711
gitlab-ci-local-201744
gitlab-ci-local-201744
gitlab-ci-local-235698
gitlab-ci-local-235698
gitlab-ci-local-284263
gitlab-ci-local-284263
gitlab-ci-local-351190
gitlab-ci-local-351190
gitlab-ci-local-371592
gitlab-ci-local-371592
gitlab-ci-local-375682
gitlab-ci-local-375682
gitlab-ci-local-451685
gitlab-ci-local-451685
gitlab-ci-local-501394
gitlab-ci-local-501394
gitlab-ci-local-509319
gitlab-ci-local-509319
gitlab-ci-local-535650
gitlab-ci-local-535650
gitlab-ci-local-536928
gitlab-ci-local-536928
gitlab-ci-local-558862
gitlab-ci-local-558862
gitlab-ci-local-562280
gitlab-ci-local-562280
gitlab-ci-local-574073
gitlab-ci-local-574073
gitlab-ci-local-579972
gitlab-ci-local-579972
gitlab-ci-local-654062
gitlab-ci-local-654062
gitlab-ci-local-668695
gitlab-ci-local-668695
gitlab-ci-local-700167
gitlab-ci-local-700167
gitlab-ci-local-714030
gitlab-ci-local-714030
gitlab-ci-local-738116
gitlab-ci-local-738116
gitlab-ci-local-768931
gitlab-ci-local-768931
gitlab-ci-local-788165
gitlab-ci-local-788165
gitlab-ci-local-859507
gitlab-ci-local-859507
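
The loop above walks over every whitespace-separated word of the docker network ls output and matches on "gitlab". A narrower selection could be sketched as follows. This is a minimal TypeScript illustration: leakedNetworkNames and removalCommand are hypothetical helpers, and the gitlab-ci-local- prefix is an assumption based on the network names shown in this thread. The sketch only builds the argument list; actually running it (for example with Node's child_process) is left to the caller.

```typescript
// Hypothetical helpers; the prefix is an assumption based on the names in this thread.
const PREFIX = "gitlab-ci-local-";

// Takes the second whitespace-separated token of each line of
// `docker network ls` table output (the NAME column on data rows)
// and keeps only tokens that start with the prefix.
// The header row yields "ID" as its second token, so it is filtered out too.
function leakedNetworkNames(lsOutput: string, prefix: string = PREFIX): string[] {
    return lsOutput
        .split("\n")
        .map((line) => line.trim().split(/\s+/)[1])
        .filter((name): name is string => name !== undefined && name.startsWith(prefix));
}

// Builds one `docker network rm` invocation covering all names,
// or null when there is nothing to remove.
function removalCommand(names: string[]): string[] | null {
    return names.length > 0 ? ["docker", "network", "rm", ...names] : null;
}
```

Matching on the full prefix in the NAME column avoids touching an unrelated network whose name merely contains "gitlab".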

@elafontaine
Author

elafontaine commented Nov 8, 2024

I've got it again this morning :) I'm pretty sure it accumulates on job failure...

I'm currently trying to debug a job we have defined that starts a mockserver service container, starts our webcomponent, and then runs tests against the webcomponent. I got a failure in my tests, which fails the job, but there is nothing special about it...

I ran that job at least 50 times yesterday...

@elafontaine
Author

Got it again yesterday, and I had it today as well. I will try to notice which job leaves networks behind... the problem I have is that my workflow is dependent on gitlab-ci-local to run anything 😅 (we're bought into the concept that everything needs to be runnable both in the CI and locally).

However, the jobs I've been running were simple: one just starts a service for mockserver and the other is a shell job (no relation to docker)...

At this point, I'm pretty sure the network isn't cleaned up when a job that has a service fails... I don't know how I could dig up more information for this ticket. If you have an idea, please let me know.

@elafontaine
Author

Ok, I can now say for sure that the leak happens on successful runs as well...

The job that leaks uses a service with an alias... I have yet to determine what causes the leak... is it the service container not shutting down fast enough?

@lordB8r

lordB8r commented Dec 17, 2024

I get similar errors after an unspecified number of runs.

Error: Command failed with exit code 1: docker network create gitlab-ci-local-261361
Error response from daemon: all predefined address pools have been fully subnetted
    at makeError (/opt/homebrew/Cellar/gitlab-ci-local/4.56.0/libexec/lib/node_modules/gitlab-ci-local/node_modules/execa/lib/error.js:60:11)
    at handlePromise (/opt/homebrew/Cellar/gitlab-ci-local/4.56.0/libexec/lib/node_modules/gitlab-ci-local/node_modules/execa/index.js:118:26)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Job.createDockerNetwork (file:///opt/homebrew/Cellar/gitlab-ci-local/4.56.0/libexec/lib/node_modules/gitlab-ci-local/src/job.js:1053:39)
    at async Job.start (file:///opt/homebrew/Cellar/gitlab-ci-local/4.56.0/libexec/lib/node_modules/gitlab-ci-local/src/job.js:442:13)
    at async /opt/homebrew/Cellar/gitlab-ci-local/4.56.0/libexec/lib/node_modules/gitlab-ci-local/node_modules/p-map/index.js:57:22

I attempted to increase the number in the pool, but that's only a band-aid. The real fix would be killing the networks when docker kills the containers.

I run docker network prune to remove all unused networks and just start over.
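
Why a larger pool only delays the error can be shown with a little arithmetic. Docker's daemon configuration has a default-address-pools setting in which each pool has a base CIDR and a size (the prefix length of each network carved from it), so one pool yields 2^(size - base prefix) networks. The sketch below is illustrative TypeScript: poolCapacity is a hypothetical helper and the pool values are made-up examples, not settings taken from this thread.

```typescript
// Hypothetical helper: number of networks a single address pool can provide.
// base is a CIDR such as "10.10.0.0/16"; size is the prefix length of each carved network.
function poolCapacity(base: string, size: number): number {
    const prefix = Number(base.split("/")[1]);
    const valid =
        Number.isInteger(prefix) && Number.isInteger(size) &&
        prefix >= 0 && size <= 32 && size >= prefix;
    if (!valid) {
        throw new RangeError(`invalid pool: base=${base}, size=${size}`);
    }
    return Math.pow(2, size - prefix);
}

// Example values only: a /16 pool split into /24 networks.
const exampleCapacity = poolCapacity("10.10.0.0/16", 24);
```

Here exampleCapacity is 256. If that were the only pool and every run leaked one network, the pool would still run out after 256 leaked networks, so removing the leaked networks remains necessary.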
