After deployment of ExApp "Test Deploy" (nc_app_test-deploy) returns/shows: "Heartbeat check failed" and "Healtchecking" #300

Open
architectonio opened this issue Jun 7, 2024 · 75 comments
Labels
daemon deploy Related ExApp deployment docker Docker Engine API

Comments

@architectonio

architectonio commented Jun 7, 2024

Describe the bug

After deploying the ExApp "Test Deploy", the Nextcloud External Apps admin interface shows an infinite "Healthchecking" loop as well as "Heartbeat check failed".

Steps/Code to Reproduce

Deploy the "Test Deploy" on NextCloud

Expected Results

Deployed without any issue

Actual Results

The Nextcloud External Apps admin interface shows an infinite "Healthchecking" loop as well as "Heartbeat check failed".

Setup configuration

Software

  • Debian Bookworm
  • NextCloud 29.0.0, 29.0.1 and now 29.0.2
  • Latest Apache HTTPD Server
  • PHP 8.2 (FPM)

Hardware

  • Intel 28 Cores + 128 GB RAM
  • NVidia RTX A4000 (16 GB VRAM)

result of: docker logs nc_app_test-deploy
Started
INFO: Started server process [1]
INFO: Waiting for application startup.
TRACE: ASGI [1] Started scope={'type': 'lifespan', 'asgi': {'version': '3.0', 'spec_version': '2.0'}, 'state': {}}
TRACE: ASGI [1] Receive {'type': 'lifespan.startup'}
TRACE: ASGI [1] Send {'type': 'lifespan.startup.complete'}
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:23000 (Press CTRL+C to quit)
INFO: Shutting down
INFO: Waiting for application shutdown.
TRACE: ASGI [1] Receive {'type': 'lifespan.shutdown'}
TRACE: ASGI [1] Send {'type': 'lifespan.shutdown.complete'}
TRACE: ASGI [1] Completed
INFO: Application shutdown complete.
INFO: Finished server process [1]
Started
INFO: Started server process [1]
INFO: Waiting for application startup.
TRACE: ASGI [1] Started scope={'type': 'lifespan', 'asgi': {'version': '3.0', 'spec_version': '2.0'}, 'state': {}}
TRACE: ASGI [1] Receive {'type': 'lifespan.startup'}
TRACE: ASGI [1] Send {'type': 'lifespan.startup.complete'}
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:23000 (Press CTRL+C to quit)

result of: docker volume inspect nc_app_test-deploy_data
[
    {
        "CreatedAt": "2024-05-28T10:36:06+02:00",
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/var/lib/docker/volumes/nc_app_test-deploy_data/_data",
        "Name": "nc_app_test-deploy_data",
        "Options": null,
        "Scope": "local"
    }
]

@kyteinsky
Collaborator

Please provide the following in addition to the above:

  1. The relevant section of server logs, last 10 minutes or the error entries (found at data/nextcloud.log inside nextcloud's directory)
  2. The docker socket proxy's container logs
  3. A screenshot of the Test Deploy page with the Developers console open. The Dev console can be opened by pressing F12 or Ctrl+Shift+I in the browser.

@architectonio
Author

  1. The Nextcloud log file is a little complicated. I have several clients (mobile, Windows, Linux...) connected to my Nextcloud instance, and since it logs everything, the last 10 minutes (even just 5) would be a huge file, which I would need to clean of sensitive information such as user IDs.

  2. I attached the docker socket proxy log.

  3. Where can I find that page?

@kyteinsky
Collaborator

  1. grep app_api data/nextcloud.log should be good enough
  2. I can't see it for some reason although I received an email. Did you delete it by chance?
  3. i. Go to /index.php/settings/admin/app_api
    ii. Click on "Test deploy" inside a dropdown menu for the docker socket proxy's daemon
    iii. Press F12
    iv. Click on "Start Deploy test"
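
If the filtered log still has to be cleaned before sharing, the grep step can be combined with a small redaction pass. This is only a sketch: it assumes the usual Nextcloud log layout of one JSON object per line with `user` and `remoteAddr` fields, and the function name and the list of masked fields are illustrative choices, not something AppAPI provides.

```python
import json

# Fields masked before sharing. This selection is an assumption; extend as needed.
REDACT_FIELDS = ("user", "remoteAddr")


def filter_app_api(lines, needle="app_api"):
    """Yield redacted JSON log lines that mention `needle` (similar to grep)."""
    for line in lines:
        if needle not in line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue
        for field in REDACT_FIELDS:
            if field in entry:
                entry[field] = "REDACTED"
        yield json.dumps(entry)
```

To use it, open data/nextcloud.log, pass the file object to `filter_app_api`, and write the yielded lines to a new file. The `message` field is left untouched, so it may still contain names or paths that need a manual look.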

@architectonio
Author

The resulting Log File is about 60 MB (59944187 Jun 7 14:29 nc_appapi.log) and as I said it contains sensitive information. I am going to replace such sensitive information with dummy/fake information and then upload the log file

@architectonio
Author

Please find the log file attached.

@architectonio
Author

What exactly do you need from the Developer Console?
There are a lot of tabs/screens.....

@architectonio
Author

The screenshot, hoping it contains the information you need

@kyteinsky
Collaborator

It looks like the attachments are missing for the log file and the screenshot.

What exactly do you need from the Developer Console?

Sorry for the confusion. I'm looking for errors in the "Console" tab or the "Network" tab.

@architectonio
Author

architectonio commented Jun 7, 2024

Maybe GitHub doesn't accept .png and .gz files?

@kyteinsky
Collaborator

Seems to work for me. You can give it one more try by dragging the file into the text box, or upload the file elsewhere and post a link to it.

@architectonio
Author

I uploaded the files again... both in a zip file

@architectonio
Author

part 1/3

@architectonio
Author

part 2/3

@architectonio
Author

part 1/3

@kyteinsky
Collaborator

@architectonio Nice that you managed to get context chat running. Did you check if this was solved as well?

For the attachments, I only see text messages on this side (part 1/3, ...). If the issue still persists, you can upload the logs and screenshots to pastebin and imgur and paste the links here.

@architectonio
Author

@kyteinsky
I just created a zip file to download from my server.
How can I send you the link (privately)?

@architectonio
Author

@architectonio Nice that you managed to get context chat running. Did you check if this was solved as well?

It seems to work; however, when I ask a question in the Nextcloud Context Chat,
after a while I get "Context Chat task for NextCloud Assistant has failed".

@architectonio
Author

The same happens when trying to generate an image like "Draw a red Rose in a brown pot" with "NC Assistant Generate Image".
"Assistant has failed" after about a minute.

@kyteinsky
Collaborator

Sure, send it over at kyteinsky@gmail.com, I'll attach the files here. The issues might indicate that the app_api setup is not correct. Did it work before with image generation?

@architectonio
Author

Check your mailbox!

@architectonio
Author

@kyteinsky Any finding in the log file?

@kyteinsky
Collaborator

Sorry again for the late reply.

The only relevant line in the log was:

... Error during request to ExApp context_chat_backend: cURL error 28: Failed to connect to context_chat_backend port 23001 after 134424 ms: Couldn't connect to server (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for http://context_chat_backend:23001/loadSources ...  

The "134424 ms" part is interesting. Can you check the PHP timeout in your php.ini config and increase it if it is too low? A good value could be 1800 to 3000 seconds.

Also, I'd like to see the Test deploy modal. Please follow these steps to run a test deployment using AppAPI:

  1. Go to "/index.php/settings/admin/app_api"
  2. Create a deploy daemon if not done already (verify the connection here itself)
  3. Click on "Test deploy" in the actions menu
  4. Click on "Start deploy test"
  5. Send a screenshot of the browser or modal when done.

@kyteinsky
Collaborator

Click on "Test deploy" in the actions menu

@architectonio
Author

No worries. I have already tried with Test Deploy.
The screenshots are below.

screen-2024-06-14-09-12-50

screen-2024-06-14-09-13-21

@architectonio
Author

And here are the ExApp logs:

Started
INFO: Started server process [1]
INFO: Waiting for application startup.
TRACE: ASGI [1] Started scope={'type': 'lifespan', 'asgi': {'version': '3.0', 'spec_version': '2.0'}, 'state': {}}
TRACE: ASGI [1] Receive {'type': 'lifespan.startup'}
TRACE: ASGI [1] Send {'type': 'lifespan.startup.complete'}
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:23000 (Press CTRL+C to quit)
INFO: Shutting down
INFO: Waiting for application shutdown.
TRACE: ASGI [1] Receive {'type': 'lifespan.shutdown'}
TRACE: ASGI [1] Send {'type': 'lifespan.shutdown.complete'}
TRACE: ASGI [1] Completed
INFO: Application shutdown complete.
INFO: Finished server process [1]
Started
INFO: Started server process [1]
INFO: Waiting for application startup.
TRACE: ASGI [1] Started scope={'type': 'lifespan', 'asgi': {'version': '3.0', 'spec_version': '2.0'}, 'state': {}}
TRACE: ASGI [1] Receive {'type': 'lifespan.startup'}
TRACE: ASGI [1] Send {'type': 'lifespan.startup.complete'}
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:23000 (Press CTRL+C to quit)

@kyteinsky
Collaborator

Could be a network issue. Can you click on the deploy daemon (not on the 3 dots) and then on "Verify connection" ?

@architectonio
Author

"Daemon connection successful"

@architectonio
Author

This is what "docker ps" returns:
CONTAINER ID   IMAGE                                               COMMAND                CREATED       STATUS                PORTS                                         NAMES
8ef91dc5a85c   ghcr.io/cloud-py-api/test-deploy-cuda:release       "python3 main.py"      6 days ago    Up 3 days (healthy)                                                 nc_app_test-deploy
6c441efca423   ghcr.io/nextcloud/context_chat_backend:2.1.1        "python3 main.py"      6 days ago    Up 3 days                                                           nc_app_context_chat_backend
4d936805fe6e   localai/localai:master-aio-gpu-nvidia-cuda-12       "/aio/entrypoint.sh"   12 days ago   Up 3 days (healthy)   0.0.0.0:28890->8080/tcp, :::28890->8080/tcp   local-ai
3d02fb1d6b04   ghcr.io/cloud-py-api/nextcloud-appapi-dsp:release   "/bin/bash start.sh"   2 weeks ago   Up 3 days (healthy)   0.0.0.0:2375->2375/tcp, :::2375->2375/tcp     nextcloud-appapi-dsp

@architectonio
Author

By the way, the reason why I couldn't attach any file in the comment seems to be related to Firefox.
I tried with Chromium and it works.......strange

@architectonio
Author

VerifyConnection button works in your case only for the reason that you specified "/var/run/docker.sock" in Host

I suggest to remove this daemon, deploy this container https://github.com/cloud-py-api/docker-socket-proxy and create daemon with Host: nextcloud-appapi-dsp:2375 after that.

After that "VerifyConnection" button will try to connect to nextcloud-appapi-dsp:2375 which will fail, I guess...

Something is resolving all those DNS names in your system to 84.170.215.125 - you need to find what is that.

OK, I'll do as you suggested, but on Sunday in the late afternoon. Now I have to travel a little....

@architectonio
Author

Something is resolving all those DNS names in your system to 84.170.215.125 - you need to find what is that.

This is the public IP address I got (for today) from my ISP, which my domain points to.

@bigcat88
Member

This is the public IP address I got (for today) from my ISP, which my domain points to.

I understand that, but those DNS names should resolve to local addresses (Docker should do that) and not to your public address.
They can resolve to your public address only when "network=host" is set, which should not be done for this type of setup (when Nextcloud and ExApps are on the same host).
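
A quick way to see which kind of address a name resolves to is to ask the system resolver and classify the result. The sketch below is a hypothetical helper (the function names are made up for this example), and it only reports what the machine it runs on sees, so it should be run from wherever Nextcloud itself runs.

```python
import ipaddress
import socket


def classify_address(addr):
    """Return 'local' for private or loopback addresses, otherwise 'public'."""
    ip = ipaddress.ip_address(addr.split("%")[0])  # drop any IPv6 scope id
    return "local" if (ip.is_private or ip.is_loopback) else "public"


def resolve_and_classify(hostname):
    """Resolve `hostname` with the system resolver and classify each address."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    return {addr: classify_address(addr) for addr in addresses}
```

For example, `resolve_and_classify("nextcloud-appapi-dsp")` would be expected to report a local address in a working setup. In this thread the same name resolved to the public addresses 84.170.215.125 and 93.224.198.141, which `classify_address` labels as public.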

@architectonio
Author

architectonio commented Jun 16, 2024

VerifyConnection button works in your case only for the reason that you specified "/var/run/docker.sock" in Host

I suggest to remove this daemon, deploy this container https://github.com/cloud-py-api/docker-socket-proxy and create daemon with Host: nextcloud-appapi-dsp:2375 after that.

After that "VerifyConnection" button will try to connect to nextcloud-appapi-dsp:2375 which will fail, I guess...

Something is resolving all those DNS names in your system to 84.170.215.125 - you need to find what is that.

I created the network "nextcloud-aio"
and this is what a "docker inspect" gives back

docker network inspect nextcloud-aio

[
    {
        "Name": "nextcloud-aio",
        "Id": "5d26f704ae6c26bd2eb55e8b2389b040d36c19caec5e392e47732d9f795c9e64",
        "Created": "2024-06-14T10:08:31.757855622+02:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.19.0.0/16",
                    "Gateway": "172.19.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {},
        "Options": {},
        "Labels": {}
    }
]


I removed the daemon and redeployed as "nextcloud-appapi-dsp:2375".

"docker ps" shows it up and running

3d02fb1d6b04   ghcr.io/cloud-py-api/nextcloud-appapi-dsp:release   "/bin/bash start.sh"     2 weeks ago   Up 18 minutes (healthy)   0.0.0.0:2375->2375/tcp, :::2375->2375/tcp     nextcloud-appapi-dsp

A "nmap nextcloud-appapi-dsp -p2375" gives back:

PORT     STATE    SERVICE
2375/tcp filtered docker

And a ping shows my (currently) public IP Address
"ping nextcloud-appapi-dsp"

PING nextcloud-appapi-dsp.architectonio.net (93.224.198.141) 56(84) bytes of data.
64 bytes from p5de0c68d.dip0.t-ipconnect.de (93.224.198.141): icmp_seq=1 ttl=63 time=1.29 ms
64 bytes from p5de0c68d.dip0.t-ipconnect.de (93.224.198.141): icmp_seq=2 ttl=63 time=1.86 ms
^C64 bytes from 93.224.198.141: icmp_seq=3 ttl=63 time=1.28 ms

And the Nextcloud ExApp dashboard shows: "All ExApps are up-to-date. Default Deploy daemon is not accessible"

I do not know if something is wrong with my server network configuration; however, everything else runs smoothly, without any noticeable issue.

@architectonio
Author

What I also noticed is that both the "Test Deploy" and "Context Chat Backend" containers have no port exposed, while all other containers expose a port.
Is that OK?

docker ps
05fb1eb96a0b   ghcr.io/nextcloud/context_chat_backend:2.1.1        "python3 main.py"        About an hour ago   Up 17 seconds                                                                    nc_app_context_chat_backend
c35572f9cd25   ghcr.io/cloud-py-api/test-deploy-cuda:release       "python3 main.py"        11 hours ago        Up 18 seconds (healthy)                                                          nc_app_test-deploy
c9ded5aa33ae   ghcr.io/cloud-py-api/nextcloud-appapi-dsp:release   "/bin/bash start.sh"     12 hours ago        Up 11 seconds (healthy)            0.0.0.0:2375->2375/tcp, :::2375->2375/tcp     nextcloud-appapi-dsp
4d936805fe6e   localai/localai:master-aio-gpu-nvidia-cuda-12       "/aio/entrypoint.sh"     2 weeks ago         Up 14 seconds (health: starting)   0.0.0.0:28890->8080/tcp, :::28890->8080/tcp   local-ai
1b3b3efb8d7f   collabora/code                                      "/start-collabora-on…"   2 weeks ago         Up 18 seconds                      0.0.0.0:9980->9980/tcp, :::9980->9980/tcp     collabora-code

Another point I do not really understand is the "nextcloud-aio" network.
I created it and attached it to the "Docker Socket Proxy" container; however, it is not clear to me why I cannot use another Docker bridge network, since my Nextcloud installation isn't an AIO but a bare installation, and all other containers, including COLLABORA-CODE and LOCAL-AI, work very well.

@architectonio
Author

Any news on this?

@ericmail84

I guess this is a reason: "NetworkMode": "bridge"

Ok, we need to move those Note about bridge from here: https://cloud-py-api.github.io/app_api/DeployConfigurations.html

image to somewhere else to be more visible...

Am I reading this correctly that the docker socket proxy cannot be on a remote host? Test deploy seems to fail for me, much as in the original post here, because it cannot resolve the name http://test-deploy:23000

@architectonio
Author

I don't know if DSP must run on the local host, however my DSP runs on the local host and I tried with "host", "bridge" and also "nextcloud-aio". The issue remains the same, "Test Deploy" and "Context Chat Backend" are deployed but not reachable.
I also noticed that "Context Chat Backend" restarts every few seconds (by observing the results of "watch -n 0.5 docker ps" ).

@andrey18106 andrey18106 added daemon docker Docker Engine API deploy Related ExApp deployment labels Jul 2, 2024
@andrey18106
Collaborator

@architectonio

Another point I do not really understand is the "nextcloud-aio" network.

It was mentioned on the assumption that you are using Nextcloud AIO, which creates this custom network for the AIO containers.

I don't know if DSP must run on the local host

The purpose of the DSP is to give AppAPI secure access to Docker over the network; it can be local or remote.

I also noticed that "Context Chat Backend" restarts every few seconds (by observing the results of "watch -n 0.5 docker ps" ).

Are there any logs or errors that can give us a hint? Are there any related errors in the Docker system logs (journalctl -u docker) or from the Context Chat Backend container?

For now I can't say more than what was said before on how to investigate networking issues. Since the daemon connection is fine and the deployment works, the issue is only in the communication between the ExApp and NC, which is likely down to the specifics of a certain system setup. I'll get back to you as soon as I find something.

@ericmail84

ericmail84 commented Jul 2, 2024 via email

@architectonio
Author

@andrey18106
Thank you for your reply, I mentioned earlier that I do not use and never have used NextCloud AIO. My installation is on the Server (Debian, with MariaDB, Apache, PHP and so on).

Are there any logs or errors that can give us a hint? Are there any related errors in the Docker system logs (journalctl -u docker) or from the Context Chat Backend container?

As I wrote before, docker works perfectly, with no network or other issue.
I currently have about ten applications on docker including LocalAI (CUDA), Collabora CODE, Home Assistant, Libretranslate, SearxNG, and so on.

A "docker ps" gives back:
CONTAINER ID   IMAGE                                               COMMAND                CREATED          STATUS                    PORTS                                         NAMES
123456789012   ghcr.io/cloud-py-api/test-deploy-cuda:release       "python3 main.py"      45 seconds ago   Up 43 seconds (healthy)                                                 nc_app_test-deploy
1234567890ab   localai/localai:master-aio-gpu-nvidia-cuda-12       "/aio/entrypoint.sh"   4 days ago       Up 4 days (healthy)       0.0.0.0:28890->8080/tcp, :::28890->8080/tcp   local-ai
1234567890cd   ghcr.io/cloud-py-api/nextcloud-appapi-dsp:release   "/bin/bash start.sh"   4 days ago       Up 4 days (healthy)       0.0.0.0:2375->2375/tcp, :::2375->2375/tcp     nextcloud-appapi-dsp
...................................

Notice that the Test Deploy has no network or port

Here is the "docker network list" output:
NETWORK ID     NAME                    DRIVER   SCOPE
a9421922f6d8   bridge                  bridge   local
371bcd681096   host                    host     local
c92d51732e32   localai-webui_default   bridge   local
5d26f704ae6c   nextcloud-aio           bridge   local
62ccc5fcbaa2   none                    null     local

@architectonio
Author

architectonio commented Jul 2, 2024

This is the running DSP (I removed Test-Deploy and Context_Chat Backend) which is reachable

ExApps installed: 0
Name: docker_socket_proxy
Protocol: http
Host: 127.0.0.1:2375
Deploy config
Docker network: bridge
Nextcloud URL: https://nextcloud.mydomain.net
HaProxy password: 12345678
GPUs support: true
Compute device: CUDA (NVIDIA

@ericmail84

I think they suggested not to use bridge because bridge won't look things up by container name. I set mine to master_default, but no difference in the behavior.

@architectonio
Author

I tried a lot of possible combinations, all without success.....I guess I invested at least 40 hours in testing.
I have now deleted everything and will wait until Nextcloud releases documentation that explains clearly what to do and actually works.

@gitwittidbit

Same issue here. I have NC running in a VM (direct install, no docker). And I have docker for running this ExApp stuff.

The daemon connection test is successful but the deployment test fails after a long time during the heartbeat check.

No idea what else to try.

@architectonio
Author

Same issue here. I have NC running in a VM (direct install, no docker). And I have docker for running this ExApp stuff.

The daemon connection test is successful but the deployment test fails after a long time during the heartbeat check.

No idea what else to try.

I had exactly the same issue.

@bigcat88
Member

Same issue here. I have NC running in a VM (direct install, no docker). And I have docker for running this ExApp stuff.

The daemon connection test is successful but the deployment test fails after a long time during the heartbeat check.

No idea what else to try.

Please create a separate issue describing your configuration.
Without NC logs, container info/logs and information about the setup we can't do much.

I had exactly the same issue.

Have you tried the latest version 3.1.0 (where we fixed a critical bug with APCu), and did the heartbeat still not work?

If you tried and it still didn't work, one option, if you are able to, is to give us VPN access to the test environment where it fails, and we'll try to figure out what the reason might be.

But with version 3.1.0 everything has already worked for most people, I hope that we can help you too.

@architectonio
Author

I had exactly the same issue.

Have you tried the latest version 3.1.0 (where we fixed a critical bug with APCu), and did the heartbeat still not work?

If you tried and it still didn't work, one option, if you are able to, is to give us VPN access to the test environment where it fails, and we'll try to figure out what the reason might be.

But with version 3.1.0 everything has already worked for most people, I hope that we can help you too.

Yes I have tried with the latest version 3.1.0, however I just used the already deployed Docker Socket Proxy and I do not know if it affects in any way the AppAPI.
Tomorrow I am going to re-deploy the DSP and watch what happens.

Would it be worth deleting Nextcloud Assistant, Nextcloud Assistant Context Chat and AppAPI and then reinstalling them?

@gitwittidbit

But with version 3.1.0 everything has already worked for most people, I hope that we can help you too.

Yes, I'm on 3.1.0 (I also updated NC to 29.0.8).

One thing I noticed is that the nextcloud-appapi-dsp container becomes unhealthy relatively quickly. Not sure, if this has anything to do with the issue? (I downloaded the most recent image and updated the container but it still becomes unhealthy a minute after starting or so)

@bigcat88
Member

bigcat88 commented Aug 21, 2024

You have the address where the docker-socket-proxy listens.

Look in the DB which port is assigned to the test-deploy application in the oc_ex_apps table.

Try to do curl 'http://{docker-socket-proxy-address}:{test-deploy-port}/heartbeat' from the Nextcloud instance.

If you use https you need to add authentication to the request with -u app_api_haproxy_user:{your_haproxy_password}

This is literally what AppAPI does on heartbeat.

To avoid making this issue longer (it is already 65+ messages), please create a separate issue with your configs posted.
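
The same manual check can be scripted. The sketch below only mirrors the curl command above using Python's standard library; it is not AppAPI's own code, and the helper names, the default timeout and the return shape are assumptions made for illustration. The port has to be the one found in the oc_ex_apps table.

```python
import base64
import urllib.error
import urllib.request


def heartbeat_url(dsp_host, exapp_port, scheme="http"):
    """Build the heartbeat URL from the proxy address and the ExApp port."""
    return f"{scheme}://{dsp_host}:{exapp_port}/heartbeat"


def basic_auth_header(password, user="app_api_haproxy_user"):
    """Build the HTTP Basic auth header value, equivalent to curl's -u option."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"


def check_heartbeat(dsp_host, exapp_port, password=None, scheme="http", timeout=10):
    """Return (status, body); status is None if no HTTP response was received."""
    request = urllib.request.Request(heartbeat_url(dsp_host, exapp_port, scheme))
    if password is not None:
        request.add_header("Authorization", basic_auth_header(password))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode(errors="replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 401 or 403.
        return err.code, err.read().decode(errors="replace")
    except OSError as err:
        # Connection refused, DNS failure, timeout and similar problems.
        return None, str(err)
```

A status of `None` means the request never got an HTTP answer at all, which points to a network or bind-address problem rather than to authentication.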

@architectonio
Author

You have the address where the docker-socket-proxy listens.

Look in the DB which port is assigned to the test-deploy application in the oc_ex_apps table.

Try to do curl 'http://{docker-socket-proxy-address}:{test-deploy-port}/heartbeat' from the Nextcloud instance.

If you use https you need to add authentication to the request with -u app_api_haproxy_user:{your_haproxy_password}

This is literally what AppAPI does on heartbeat.

To avoid making this issue longer (it is already 65+ messages), please create a separate issue with your configs posted.

This is what I get with https: curl https://127.0.0.1:2375/ -u app_api_haproxyuser:mytestpassword
curl: (35) OpenSSL/3.0.13: error:0A00010B:SSL routines::wrong version number

And this with http: curl http://127.0.0.1:2375/ -u app_api_haproxyuser:mytestpassword

401 Unauthorized

You need a valid user and password to access this content.


The DSP was deployed in this way:
docker run -v /var/run/docker.sock:/var/run/docker.sock -e NC_HAPROXY_PASSWORD="mytestpassword" --restart always --name nextcloud-appapi-dsp -h nextcloud-appapi-dsp --net nextcloud-aio -p 2375:2375 --privileged -d ghcr.io/cloud-py-api/nextcloud-appapi-dsp:release

A docker ps shows:
CONTAINER ID   IMAGE                                               COMMAND                CREATED         STATUS                   PORTS                                       NAMES
c011616e7873   ghcr.io/cloud-py-api/nextcloud-appapi-dsp:release   "/bin/bash start.sh"   6 minutes ago   Up 6 minutes (healthy)   0.0.0.0:2375->2375/tcp, :::2375->2375/tcp   nextcloud-appapi-dsp

@andrey18106
Collaborator

@architectonio Please correct the user name to app_api_haproxy_user and try again. Note: for an HTTPS Docker Socket Proxy you can't use the 127.0.0.1 host. In your case the https error additionally means that the Docker Socket Proxy wasn't set up with SSL enabled (SSL is enabled if /certs/cert.pem is mounted in the container during startup).

@architectonio
Author

curl http://127.0.0.1:2375/ -u app_api_haproxy_user:mytestpassword

403 Forbidden

Request forbidden by administrative rules.

@andrey18106
Collaborator

curl http://127.0.0.1:2375/ -u app_api_haproxy_user:mytestpassword

403 Forbidden

Request forbidden by administrative rules.

Your request contains no route, which is not allowed, so the 403 response is correct, and it shows that authentication passed.
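
The two responses seen so far can be summarised in a few lines. This is only a reading aid based on what was observed in this thread (401 with the wrong user name, 403 for a request without a route); the function is hypothetical and does not describe every response the proxy can return.

```python
def interpret_dsp_status(status):
    """Map an HTTP status seen in this thread to its likely meaning."""
    if status == 401:
        return "authentication failed: check the user name and password"
    if status == 403:
        return "authentication passed, but the requested route is not allowed"
    if status is not None and 200 <= status < 300:
        return "request accepted"
    return "unexpected response: inspect the proxy logs"
```

So a 403 on the bare `/` URL says nothing about whether the ExApp heartbeat itself is reachable; that still needs the `/heartbeat` request against the ExApp port.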

@architectonio
Author

I assume this means that the AppAPI and everything that is deployed should work...

@architectonio
Author

Unfortunately the issue persists.
Both the "Context Chat Backend" and "Test Deploy" apps are stuck at "Healthchecking".

A docker ps shows:
CONTAINER ID   IMAGE                                           COMMAND             CREATED              STATUS                   PORTS   NAMES
110c19421d84   ghcr.io/nextcloud/context_chat_backend:2.2.1    "python3 main.py"   About a minute ago   Up About a minute                nc_app_context_chat_backend
cf74179d5b31   ghcr.io/cloud-py-api/test-deploy:release-cuda   "python3 main.py"   8 minutes ago        Up 7 minutes (healthy)           nc_app_test-deploy

And the Nextcloud "Your apps" dashboard shows both apps in a "Healthchecking" loop.

@kyteinsky
Collaborator

Hi, I was testing with a Proxmox setup where your issue was reproducible. The Docker socket proxy was reachable and the deployment succeeded, but the heartbeat check was stuck. It happens because "http" deployments, as in your case @architectonio (if you're around), bind to the localhost (127.0.0.1) address (for security reasons) when the network setup would want them bound to another address.
With my setup, lsof -i -P looked like this:

haproxy   225156     root    5u  IPv4 1336471      0t0  TCP pve.pmox.local:2375 (LISTEN)
python3   225470     root   11u  IPv4 1335149      0t0  TCP localhost.localdomain:23000 (LISTEN)

The haproxy is bound to pve.pmox.local correctly but the ex-app's container with the python3 process is not. To solve this, during the creation of the deploy daemon, in Add Additional Option, set OVERRIDE_APP_HOST to the desired IP you want it to listen at. In this example, it would be the IP pve.pmox.local resolves to. See https://nextcloud.github.io/app_api/CreationOfDeployDaemon.html#additional-options
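
To confirm from the Nextcloud host whether the ExApp port is reachable on a given address, a plain TCP connect is enough. The helper below is a hypothetical sketch, not part of AppAPI; port 23000 is taken from the lsof output above, and the addresses to test depend on your own setup.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out, or the name did not resolve.
        return False
```

With the ExApp bound to localhost, `can_connect("127.0.0.1", 23000)` run on the Docker host would be expected to succeed, while the same call against the address the daemon host resolves to would fail. After setting OVERRIDE_APP_HOST to that address, the second call should succeed as well.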

Thanks to @bigcat88 for the help!
