Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: Refactor Dockerfile & entrypoint #8923

Open
wants to merge 28 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 27 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
64be0ae
Refactor formatting & docs
upbqdn Oct 2, 2024
a248b14
Refactor the `runtime` stage in Dockerfile
upbqdn Oct 2, 2024
ca1620c
Remove unused code from `entrypoint.sh`
upbqdn Oct 9, 2024
ea8b119
Simplify `entrypoint.sh` setup
upbqdn Oct 9, 2024
7cba8cf
Revise docs & formatting
upbqdn Oct 9, 2024
56c65e6
Adjust default values for env vars
upbqdn Oct 9, 2024
be38132
Bump Rust v from 1.79 to 1.81 in Dockerfile
upbqdn Oct 9, 2024
0492b7a
Refactor `entrypoint.sh`
upbqdn Oct 10, 2024
6595740
Refactor `Dockerfile`
upbqdn Oct 10, 2024
22dc738
Add TODOs for monitoring stage to Dockerfile
upbqdn Oct 10, 2024
962d4d3
Merge branch 'main' into docker-refactor
upbqdn Oct 10, 2024
621754b
Refactor `Dockerfile`
upbqdn Oct 10, 2024
6b68592
Add TODOs for monitoring stage to Dockerfile
upbqdn Oct 10, 2024
c788ccc
Merge branch 'docker-refactor' of github.com:ZcashFoundation/zebra in…
upbqdn Oct 10, 2024
38837e4
Fix a typo
upbqdn Oct 10, 2024
b58a602
Merge branch 'main' into docker-refactor
upbqdn Oct 10, 2024
2ada296
Allow running `zebrad` in test mode
upbqdn Oct 11, 2024
99cd18f
Merge branch 'docker-refactor' of github.com:ZcashFoundation/zebra in…
upbqdn Oct 11, 2024
c718609
Merge branch 'main' into docker-refactor
upbqdn Oct 11, 2024
69b03d4
Allow custom config for `zebrad` in test mode
upbqdn Oct 11, 2024
6932d9a
Remove `curl` from the `runtime` Docker image
upbqdn Oct 11, 2024
6fe460d
Remove redundant echos
upbqdn Oct 11, 2024
c5010b8
Remove a malfunctioning CD test
upbqdn Oct 12, 2024
e05df78
Remove a redundant CI test
upbqdn Oct 12, 2024
e9f0479
Remove all packages from the `runtime` stage
upbqdn Oct 12, 2024
7422ecf
Merge branch 'main' into docker-refactor
upbqdn Oct 14, 2024
4fa064c
Docs cosmetics
upbqdn Oct 14, 2024
afeb05f
Merge branch 'main' into docker-refactor
upbqdn Oct 18, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 0 additions & 12 deletions .github/workflows/cd-deploy-nodes-gcp.yml
Original file line number Diff line number Diff line change
Expand Up @@ -163,18 +163,6 @@ jobs:
test_variables: '-e NETWORK'
network: 'Testnet'

# Test that Zebra works using $ZEBRA_CONF_PATH config
test-zebra-conf-path:
name: Test CD custom Docker config file
needs: build
uses: ./.github/workflows/sub-test-zebra-config.yml
with:
test_id: 'custom-conf'
docker_image: ${{ vars.GAR_BASE }}/zebrad@${{ needs.build.outputs.image_digest }}
grep_patterns: '-e "loaded zebrad config.*config_path.*=.*v1.0.0-rc.2.toml"'
test_variables: '-e NETWORK -e ZEBRA_CONF_PATH="zebrad/tests/common/configs/v1.0.0-rc.2.toml"'
network: ${{ inputs.network || vars.ZCASH_NETWORK }}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although very simple, the objective of this test is to confirm that the entrypoint is able to handle a custom configuration path ($ZEBRA_CONF_PATH), run with it and confirm that path is being used.

This could be extended to validate is running with a mounted file using --mount type=bind,source="$(pwd)"/target,target=/app as part of the test_variables, but that's out of scope.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test was using a custom config file set in test_variables.
However, the file was not included in the Docker image, and the
entrypoint script created a new, default one under the original file's
path. Zebra then loaded this new file, and the test passed because the
pattern in grep_patterns matched Zebra's output containing the
original path, even though the config file was different.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test fails in this PR due to the fixes in the entrypoint.

Copy link
Member

@gustavovalverde gustavovalverde Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest commenting this and adding a TODO in top of it, or creating an issue. Just so we don't forget later on.

Either of them I'd suggest indicating something like:

We need to create a test that validates we can mount a configuration file to a different path that the default used by ZEBRA_CONF_PATH, and that Zebra runs using this new file and path.

# Finds a `tip` cached state disk for zebra from the main branch
#
# Passes the disk name to subsequent jobs using `cached_disk_name` output
Expand Down
13 changes: 0 additions & 13 deletions .github/workflows/sub-ci-unit-tests-docker.yml
Original file line number Diff line number Diff line change
Expand Up @@ -151,19 +151,6 @@ jobs:
test_variables: '-e NETWORK'
network: 'Mainnet'

# Test reconfiguring the the docker image for tesnet.
test-configuration-file-testnet:
name: Test CI testnet Docker config file
# Make sure Zebra can sync the genesis block on testnet
uses: ./.github/workflows/sub-test-zebra-config.yml
with:
test_id: 'testnet-conf'
docker_image: ${{ vars.GAR_BASE }}/${{ vars.CI_IMAGE_NAME }}@${{ inputs.image_digest }}
grep_patterns: '-e "net.*=.*Test.*estimated progress to chain tip.*Genesis" -e "net.*=.*Test.*estimated progress to chain tip.*BeforeOverwinter"'
# TODO: improve the entrypoint to avoid using `ENTRYPOINT_FEATURES=""`
test_variables: '-e NETWORK -e ZEBRA_CONF_PATH="/etc/zebrad/zebrad.toml" -e ENTRYPOINT_FEATURES=""'
network: 'Testnet'

Comment on lines -154 to -166
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was created to confirm that any change we do in CI or in Docker doesn't affect the ability to read the proper $NETWORK environment variable. As it had happened before that some changes breaks this behavior, and then the tests are running in Mainnet instead of Testnet, but we realized too late or had to wait for some tests to run to confirm it.

Copy link
Member Author

@upbqdn upbqdn Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is failing with this PR, similarly to the other one. I have a better approach in mind, which I didn't do in this PR. Let's add it back in a subsequent PR?

Moreover, there seem to be some parts that are bugs or hard to understand. For example, I couldn't figure out what -e NETWORK in test_variables is supposed to do. Also, setting ENTRYPOINT_FEATURES to the empty string to enable the test in the entrypoint makes it very hard to follow the execution path in the whole pipeline.

Copy link
Member

@gustavovalverde gustavovalverde Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-e NETWORK tells docker to use whatever value the $NETWORK env variable is set to, to override any default that was set at build time, or in the Dockerfile. This happens here:

env:
NETWORK: '${{ inputs.network }}'
and makes Zebra run with a Testnet configuration, and the test validates that Zebra is correctly running with it.

This has saved me (and others) from making mistakes multiple times while making CI refactors, so this is very important under those circumstances.

I do agree that setting the ENTRYPOINT_FEATURES to an empty string is a dirty hack to make this work, but that's a tech debt that wouldn't justify removing the whole test. In any case, we can remove the use of the ENTRYPOINT_FEATURES variables, while keeping this test behavior.

I'd suggest commenting this and adding a TODO in top of it, or creating an issue, instead of removing the test. Just so we don't forget later on, as this is an important validation.

# Test that Zebra works using $ZEBRA_CONF_PATH config
test-zebra-conf-path:
name: Test CI custom Docker config file
Expand Down
109 changes: 45 additions & 64 deletions docker/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -3,11 +3,12 @@

# If you want to include a file in the Docker image, add it to .dockerignore.
#
# We are using 4 stages:
# We are using 4 (TODO: 5) stages:
# - deps: install build dependencies and sets the needed variables
# - tests: builds tests binaries
# - release: builds release binaries
# - runtime: runs the release binaries
# - TODO: Add a `monitoring` stage
#
# We first set default values for build arguments used across the stages.
# Each stage must define the build arguments (ARGs) it uses.
Expand All @@ -21,7 +22,8 @@ ARG TEST_FEATURES="lightwalletd-grpc-tests zebra-checkpoints"
ARG EXPERIMENTAL_FEATURES=""

ARG APP_HOME="/opt/zebrad"
ARG RUST_VERSION=1.79.0
ARG RUST_VERSION=1.81.0

# In this stage we download all system requirements to build the project
#
# It also captures all the build arguments to be used as environment variables.
Expand Down Expand Up @@ -67,10 +69,9 @@ ENV SHORT_SHA=${SHORT_SHA:-}

ENV CARGO_HOME="${APP_HOME}/.cargo/"

# Copy the entrypoint script to be used on both images
COPY ./docker/entrypoint.sh /etc/zebrad/entrypoint.sh
COPY ./docker/entrypoint.sh /usr/local/bin/entrypoint.sh

# In this stage we build tests (without running then)
# This stage builds tests without running them.
#
# We also download needed dependencies for tests to work, from other images.
# An entrypoint.sh is only available in this step for easier test handling with variables.
Expand Down Expand Up @@ -116,9 +117,9 @@ RUN --mount=type=bind,source=zebrad,target=zebrad \
--mount=type=cache,target=${APP_HOME}/target/ \
--mount=type=cache,target=/usr/local/cargo/git/db \
--mount=type=cache,target=/usr/local/cargo/registry/ \
cargo test --locked --release --features "${ENTRYPOINT_FEATURES}" --workspace --no-run && \
cp ${APP_HOME}/target/release/zebrad /usr/local/bin && \
cp ${APP_HOME}/target/release/zebra-checkpoints /usr/local/bin
cargo test --locked --release --features "${ENTRYPOINT_FEATURES}" --workspace --no-run && \
cp ${APP_HOME}/target/release/zebrad /usr/local/bin && \
cp ${APP_HOME}/target/release/zebra-checkpoints /usr/local/bin

# Copy the lightwalletd binary and source files to be able to run tests
COPY --from=electriccoinco/lightwalletd:latest /usr/local/bin/lightwalletd /usr/local/bin/
Expand All @@ -130,8 +131,8 @@ ENV ENTRYPOINT_FEATURES=${ENTRYPOINT_FEATURES}
ARG EXPERIMENTAL_FEATURES="journald prometheus filter-reload"
ENV ENTRYPOINT_FEATURES_EXPERIMENTAL="${ENTRYPOINT_FEATURES} ${EXPERIMENTAL_FEATURES}"

# By default, runs the entrypoint tests specified by the environmental variables (if any are set)
ENTRYPOINT [ "/etc/zebrad/entrypoint.sh" ]
# Run tests specified by the env vars
ENTRYPOINT [ "entrypoint.sh", "test" ]

# In this stage we build a release (generate the zebrad binary)
#
Expand Down Expand Up @@ -161,68 +162,48 @@ RUN --mount=type=bind,source=tower-batch-control,target=tower-batch-control \
--mount=type=cache,target=${APP_HOME}/target/ \
--mount=type=cache,target=/usr/local/cargo/git/db \
--mount=type=cache,target=/usr/local/cargo/registry/ \
cargo build --locked --release --features "${FEATURES}" --package zebrad --bin zebrad && \
cp ${APP_HOME}/target/release/zebrad /usr/local/bin
cargo build --locked --release --features "${FEATURES}" --package zebrad --bin zebrad && \
cp ${APP_HOME}/target/release/zebrad /usr/local/bin

# This stage is only used when deploying nodes or when only the resulting zebrad binary is needed
#
# To save space, this step starts from scratch using debian, and only adds the resulting
# binary from the `release` stage
# This step starts from scratch using Debian and only adds the resulting binary
# from the `release` stage.
FROM debian:bookworm-slim AS runtime

# Set the default path for the zebrad binary
ARG APP_HOME
ENV APP_HOME=${APP_HOME}
WORKDIR ${APP_HOME}
Comment on lines -174 to -176
Copy link
Member

@gustavovalverde gustavovalverde Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should set a WORKDIR otherwise the user will end up in the / directory with no permissions, which they might require for testing/personal purposes.

The ARG + ENV combination also allows the user to set a custom directory, in case their host permissions does not allow the one we've chosen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Member Author

@upbqdn upbqdn Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System users should have no home dir and should not even be able to log in to the machine. Our users should use -u root when they're logging to the machine. The logic I had in mind is very simple and minimal:

  • The whole runtime target has its entrypoint executed under the non-privileged zebra system user which has no home dir and no login (just as in a regular Linux env).
  • The Dockerfile sets up the minimal requirements for the zebra user to execute the entrypoint.
  • When our user wants to do some tweaks, they explicitly use -u root, which they can always do, and which gives them the clarity that they have full privileges.

I wanted to describe this in our docs in a subsequent PR. Is it OK if we do it like that?


RUN apt-get update && \
apt-get install -y --no-install-recommends \
ca-certificates \
curl \
rocksdb-tools \
gosu \
&& rm -rf /var/lib/apt/lists/* /tmp/*
COPY --from=release /usr/local/bin/zebrad /usr/local/bin/
COPY --from=release /usr/local/bin/entrypoint.sh /usr/local/bin/

# Create a non-privileged user that the app will run under.
# Running as root inside the container is running as root in the Docker host
# If an attacker manages to break out of the container, they will have root access to the host
# See https://docs.docker.com/go/dockerfile-user-best-practices/
ARG USER=zebra
# Create a non-privileged user for running `zebrad`.
#
# ## Security
#
# If an attacker manages to break out of the container, they will have root
# access to the host. [1][1]
#
# [1]: https://docs.docker.com/go/dockerfile-user-best-practices/
ARG USER="zebra"
ENV USER=${USER}
ARG UID=10001
ENV UID=${UID}
ARG GID=10001
ENV GID=${GID}

RUN addgroup --system --gid ${GID} ${USER} \
&& adduser \
--system \
--disabled-login \
--shell /bin/bash \
--home ${APP_HOME} \
--uid "${UID}" \
--gid "${GID}" \
${USER}
Comment on lines -192 to -205
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be instances where a user would like to (re-)build the image with a custom UID:GUID, as they might require to mount files from their host, which will be incompatible with the UID:GUID of the container user, and it's also a good Docker practice to specify the UID for the default user for other edge-cases.

Unless there's a good reason to remove this, I'd suggest to keep it.

The PR that added this has some references to the reasoning behind it: #8803 (comment)

Copy link
Member Author

@upbqdn upbqdn Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we're creating a system user. Those users should have UIDs in [1, 999].

In Debian, the docs for adduser say that the default dynamic range for system UIDs is defined by [FIRST_SYSTEM_UID, LAST_SYSTEM_UID], which is [100, 999] in etc/adduser.conf: https://manpages.debian.org/bullseye/adduser/adduser.8.en.html

Moreover, the Linux spec says that system UIDs in [100, 499] should be reserved for dynamic allocation: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/uidrange.html

I'm surprised the Docker docs don't mention this.

adduser automatically checks if a UID is available before assigning one from the right range. Since I wasn't sure what UID to pick, and no user brought it up, I removed it to avoid scope creep. I'd prefer to address this issue according to the spec once a user brings it up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this is not clear in Docker's documentation. But some of their tools, docker init for example, output the recommended approach, which is the following (with alpine linux):

ARG UID=10001
RUN adduser \
    --disabled-password \
    --gecos "" \
    --home "/nonexistent" \
    --shell "/sbin/nologin" \
    --no-create-home \
    --uid "${UID}" \
    appuser
USER appuser

In any case, to match their recommendation in debian, this would change to:

RUN addgroup --system --gid ${GID} ${USER} \
    && useradd \
    --uid "${UID}" \
    --no-create-home \
    --home-dir /nonexistent \
    --shell /usr/sbin/nologin \
    --comment "" \
    appuser

They main reasons to keep this instruction are:

  • Host and Container Alignment: When you mount host directories into the container, file permissions are based on UID and GID. If the container's user ID match those on the host, you avoid permission issues.

    • I've experienced this myself when I was testing the docker-compose and mounting a cached state from my host into the container, which had different UID:GID, and the container was not able to write to it.
  • Predictable IDs: By specifying the UID and GID, you ensure that the user inside the container has the same IDs across different builds, deployments, and host systems.

Additionally, this is very common in Docker, and plenty of projects use this approach to deal with these and other use-cases: https://github.com/search?q=%22useradd%22+%22--uid%22++language%3ADockerfile&type=code


# Config settings for zebrad
ARG FEATURES
ENV FEATURES=${FEATURES}
Comment on lines -208 to -209
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the runtime stage is built with custom FEATURES this will be propagated by default as an ENV variable to the entrypoint.sh, if we remove this instructions, then we need to always specify the FEATURES environment variable when running the image, otherwise it will be empty.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but that var is not used in any meaningful way in entrypoint.sh, so I removed it. I would like to refactor the way we configure Zebra inside Docker in follow-up PRs.

Copy link
Member

@gustavovalverde gustavovalverde Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used here:
https://github.com/ZcashFoundation/zebra/pull/8923/files#diff-4f5cabe26761257a4d685a6edc7a43e0fe0f78762f50eeb48530f2bd3b3ee7caR81
and here:
https://github.com/ZcashFoundation/zebra/pull/8923/files#diff-4f5cabe26761257a4d685a6edc7a43e0fe0f78762f50eeb48530f2bd3b3ee7caR100

From our documentation, if we suggest:

docker build -f ./docker/Dockerfile --target runtime --build-arg FEATURES='default-release-binaries prometheus' --tag local/zebra.mining:latest .

The entrypoint will evaluate the $FEATURES var and it will be empty, as it was never defined as a variable. This would be confusing for the user as they built it with --build-arg FEATURES='default-release-binaries prometheus' but the configuration file is not adding the corresponding section, as that argument was not passed as an Environment variable (at build time) to the container.


# Path and name of the config file
# These are set to a default value when not defined in the environment
ENV ZEBRA_CONF_DIR=${ZEBRA_CONF_DIR:-/etc/zebrad}
ENV ZEBRA_CONF_FILE=${ZEBRA_CONF_FILE:-zebrad.toml}
RUN adduser --system ${USER}

RUN mkdir -p ${ZEBRA_CONF_DIR} && chown ${UID}:${UID} ${ZEBRA_CONF_DIR} \
&& chown ${UID}:${UID} ${APP_HOME}
Comment on lines -216 to -217
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a HOME directory for the application is a good practice, it also starts the container in an empty directory the users can use as they see fit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System users should have no home dir. Our users should go with -u root. Another approach would be to execute the entrypoint under root, and then run zebrad under the zebra system user. Our users could then login implicitly as root without -u root, but that adds a bit of extra complexity to the entrypoint, which I wanted to keep simple for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're not allowing the user to log in with the user running Zebra, then we should document the approach they can use to get a bash/sh terminal in the container to troubleshoot in it.

ARG ZEBRA_CONF_DIR="/etc/zebrad"
ENV ZEBRA_CONF_DIR=${ZEBRA_CONF_DIR}
RUN mkdir -p ${ZEBRA_CONF_DIR} && chown ${USER} ${ZEBRA_CONF_DIR}

COPY --from=release /usr/local/bin/zebrad /usr/local/bin
COPY --from=release /etc/zebrad/entrypoint.sh /etc/zebrad
# TODO: Shorten `ZEBRA_CACHED_STATE_DIR` to `ZEBRA_CACHE_DIR`
ARG ZEBRA_CACHED_STATE_DIR="/var/cache/zebrad"
ENV ZEBRA_CACHED_STATE_DIR=${ZEBRA_CACHED_STATE_DIR}
RUN mkdir -p ${ZEBRA_CACHED_STATE_DIR} && chown ${USER} ${ZEBRA_CACHED_STATE_DIR}

# Expose configured ports
EXPOSE 8233 18233
gustavovalverde marked this conversation as resolved.
Show resolved Hide resolved
USER $USER

# Update the config file based on the Docker run variables,
# and launch zebrad with it
ENTRYPOINT [ "/etc/zebrad/entrypoint.sh" ]
ENTRYPOINT [ "entrypoint.sh" ]
CMD ["zebrad"]

# TODO: Add a `monitoring` stage
#
# This stage will be based on `runtime`, and initially:
#
# - run `zebrad` on Testnet
# - with mining enabled using S-nomp and `nheqminer`.
#
# We can add further functionality to this stage for further purposes.
Loading
Loading