Today I spent a couple of hours helping a Data Engineering coworker track down a nasty bug that was hiding in one of our company workloads deployed with Docker.

TLDR

A Docker deployment bug caused files inside the container not to update despite successful builds. It stemmed from a VOLUME directive inherited from the base image, which created an anonymous volume that kept serving stale files across container recreations. Solution: add docker compose down -v to the deployment pipeline before docker compose up.

The scenario

We have a GitLab repository that hosts a dbt project (it’s not important what dbt is, but if you’re interested, it’s a data tool that data engineers use, and I don’t know anything about it), with the following Dockerfile:

FROM ghcr.io/dbt-labs/dbt-core:1.4.1 as build

# ...
# some RUN commands to install dependencies and whatnot, not relevant for our scenario
# ...

WORKDIR /usr/app
COPY ourproject/ .

# ...
# some other stuff copied over
# ...

RUN dbt deps
RUN dbt debug -t live

Seems simple enough to understand, even to somebody like me who has never done data stuff in their life.

Here’s the docker-compose.yml file that spins this stuff up:

services:
  app:
    image: ourfancyregistry.lan/dbtstuff:latest
    restart: "no"
    entrypoint:
      - /bin/bash
      - -c
      - dbt run -t live

Here’s the Jenkinsfile pipeline that builds and deploys this stuff:

node("linux-docker") {
  def gitData = checkout([$class: 'GitSCM', branches: [[name: '*/live']], extensions: [], userRemoteConfigs: [[credentialsId: 'JenkinsGitLab', url: 'https://ourfancygitlab.lan/xyz/dbtstuff']]])
  def commitHash = gitData["GIT_COMMIT"].substring(0, 7)

  stage("Build") {
    docker.withRegistry("https://ourfancyregistry.lan") {
      def dockerImage = docker.build("dbtstuff:${commitHash}", ".")
      dockerImage.push()
      dockerImage.push("latest")
    }
  }

  stage("Deploy") {
    withCredentials([sshUserPrivateKey(credentialsId: 'REDACTED', keyFileVariable: 'sshIdentityFile', passphraseVariable: 'sshPassphrase', usernameVariable: 'sshUser')]) {
      def remote = [:]
      remote.name = "ourfancyserver.lan"
      // ...
      // some other ssh stuff not needed for this post
      // ...
      sshPut remote: remote, from: "./docker-compose.yml", into: "/srv/docker/dbtstuff/docker-compose.yml"
      sshCommand remote: remote, command: "cd /srv/docker/dbtstuff && docker compose up -d --pull always"
    }
  }
}

Again, a pretty simple build pipeline: it builds the image on one of our Docker build machines and then pushes it to our internal registry, both with the “latest” tag and with a tag representing the commit hash.
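
For reference, outside of Jenkins the Build stage boils down to roughly the following commands on the build machine (a sketch; the registry and image names come from the pipeline above, and abc1234 is just a placeholder for the short commit hash):

# build the image and tag it with the short commit hash (abc1234 is a placeholder)
docker build -t ourfancyregistry.lan/dbtstuff:abc1234 .

# tag it as "latest" too, then push both tags to the internal registry
docker tag ourfancyregistry.lan/dbtstuff:abc1234 ourfancyregistry.lan/dbtstuff:latest
docker push ourfancyregistry.lan/dbtstuff:abc1234
docker push ourfancyregistry.lan/dbtstuff:latest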

The bug

Whenever my coworker pushed some changes to the GitLab repository and triggered the manual build pipeline in Jenkins, Jenkins would do its thing and deploy everything to the remote Docker host without errors.

The logs would show:

Container dbtstuff-app-1 Recreate

However, somehow the files inside the container were not updated with the new ones from the latest image version.

The debugging process

We tried adding a new file to the GitLab repository and then doing a cat on it during the Docker image build:

FROM ghcr.io/dbt-labs/dbt-core:1.4.1 as build

# ...
# some RUN commands to install dependencies and whatnot, not relevant for our scenario
# ...

WORKDIR /usr/app
COPY ourproject/ .

# debug
COPY newfile.txt .
RUN cat newfile.txt

# ...
# some RUN commands to install dependencies and whatnot, not relevant for our scenario
# ...

RUN dbt deps
RUN dbt debug -t live

As expected, the file was copied correctly and the cat command printed its contents, since the file was there during the build. But somehow it did not end up in the deployed container: running a docker compose exec -it app ls -lah on the target machine did not show it at all.
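
One way to tell whether the problem lies in the image or in the running container is to list the same directory in both and compare (a sketch, using the image name from our compose file and the /usr/app path from the Dockerfile):

# what the running container sees
docker compose exec app ls -lah /usr/app

# what the freshly pulled image actually contains, in a throwaway container
docker run --rm --entrypoint ls ourfancyregistry.lan/dbtstuff:latest -lah /usr/app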

Much to our surprise, when we tried bringing the compose project up again by issuing a docker compose up -d --pull always, to see if somehow the new image was not being pulled from our registry, we always got:

[+] Pulling 1/1
✔ app Pulled

suspiciously instantaneously… it was not actually downloading anything, since it had already done so when Jenkins told it to.

What surprised us is that when issuing a docker compose down followed by a docker compose up -d --pull always, everything fell into place: running a docker compose exec -it app ls -lah then showed that our test file was there, as expected.
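
For clarity, this is the sequence that finally showed the new file, run on the target machine in /srv/docker/dbtstuff:

docker compose down                    # remove the container (its anonymous volume is left behind, but no longer reused)
docker compose up -d --pull always     # pull the latest image and start a brand new container
docker compose exec -it app ls -lah    # the test file is now there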

What could it be that made Docker somehow use the wrong image? Was there a bug in Docker Compose or in the Docker Engine itself? Did we do something wrong on our side?

Anonymous volumes and the VOLUME directive

In the Dockerfile reference guide there’s a directive called VOLUME. This directive instructs the Docker engine to create an anonymous volume on the host and mount it at the specified directory inside the container.
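
As a minimal illustration (not our actual Dockerfile; the alpine base and the /data path are made up for the example), a VOLUME directive looks like this:

FROM alpine:3.18

# this file is baked into the image at build time...
RUN mkdir -p /data && echo "hello" > /data/greeting.txt

# ...but this tells Docker to mount an anonymous volume over /data
# in every container started from this image, or from images built FROM it
VOLUME /data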

Why am I talking about volumes here? Our docker-compose.yml did not have any volumes mapped, and our Dockerfile did not have any VOLUME directives specified in it, so what’s this all about?

Well, when you create a Dockerfile, you usually reference a base image as a starting point with the FROM directive, in our case FROM ghcr.io/dbt-labs/dbt-core:1.4.1. That base image was in turn built from some Dockerfile that starts from another image (or FROM scratch in some special cases), and that Dockerfile can contain directives that images built on top of it will inherit. In our case, the dbt-core Dockerfile contained the following directive (you can check it in the dbt-labs repository):

VOLUME /usr/app

This directive is inherited by our Dockerfile, which starts from the base image produced by that dbt Dockerfile, and thus it also ends up being applied by Docker when starting a new container from our image!
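
You can verify this without starting any container by inspecting the image configuration, which lists every volume the image declares, including inherited ones (a sketch, using the base image tag from above):

# should print something like {"/usr/app":{}}
docker image inspect --format '{{ json .Config.Volumes }}' ghcr.io/dbt-labs/dbt-core:1.4.1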

This directive effectively mounts an anonymous volume (remember, we did not mount any volume in our docker-compose.yml file) the first time the container starts, and that volume is kept and reused for every subsequent run of the same container until the container is destroyed, for example by running a docker compose down.

We can see that it is mounted as an anonymous volume by issuing the docker inspect dbtstuff-app-1 command and looking at the JSON manifest for the container:

// ...
"Mounts": [
    {
        "Type": "volume",
        "Name": "b6536a91b06811db8da8b446e947bf8c69e6aef0247d9476271fb2d23ee07687",
        "Source": "/var/lib/docker/volumes/b6536a91b06811db8da8b446e947bf8c69e6aef0247d9476271fb2d23ee07687/_data",
        "Destination": "/usr/app",
        "Driver": "local",
        "Mode": "",
        "RW": true,
        "Propagation": ""
    }
],
// ...

As you can see, the /usr/app folder of the container has been mounted by Docker as an anonymous volume (hence the hash-like name) and is persisted across restarts and recreations of the container. This means that a docker compose up -d --pull always will pull the new image, but will not overwrite anything located inside the volume-mounted folders, thus preventing us from seeing changes that are built into the image itself.
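
If you only care about the mounts, you can also ask docker inspect for just that field instead of scrolling through the whole manifest:

# prints only the Mounts array shown above
docker inspect --format '{{ json .Mounts }}' dbtstuff-app-1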

The thing that got me fooled

If you search the docker inspect dbtstuff-app-1 output, you will see an Image property containing the SHA256 hash of the image the container is running (or was created from, if the container is stopped). If you do a docker compose up -d --pull always, you will see that this SHA256 changes, tricking you into thinking the container was fully recreated when effectively it was not! Or rather, it was recreated, but Compose reattached the same anonymous volume, so nothing changed as far as volumes are concerned.

// before pulling the image
"Image": "sha256:4849b591445c29d3a1bed5b616cb90205a6b02e938ef32a15d1de949cf422bd1",

// after pulling the image
"Image": "sha256:14999a5b0ffac8874d152fdd610c670c4881ec96e09a152b510ce395b6ea6533",

The solution

You can simply add a docker compose down command to your deploy pipeline, right before the docker compose up command, and everything will work fine. You can even go as far as adding the -v flag (i.e. docker compose down -v) to get rid of anonymous volumes so that they do not pile up on your Docker host, but be careful: if you’re using named volumes in your docker-compose.yml file, they will be destroyed too!
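
In our case that boils down to changing the last sshCommand of the Deploy stage to something like this (a sketch based on the Jenkinsfile above; drop the -v if you rely on named volumes declared in the compose file):

sshCommand remote: remote, command: "cd /srv/docker/dbtstuff && docker compose down -v && docker compose up -d --pull always"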

Conclusion

In newer versions of the dbt-core image we used here, it seems they got rid of that VOLUME directive in the Dockerfile, but for a few reasons we cannot upgrade to the newest one yet, so we ended up solving this problem by adding the docker compose down -v command to the Jenkinsfile right before deploying everything.