Docker for researchers

Images generated using ImageGen3

Docker’s blue whale logo is a familiar sight we come across frequently on the web. Yet even though many of us recognize it, most of us don’t use the technology when sharing the source code behind our research. My goal here is to convince you to use Docker and to provide an easy-to-follow tutorial you can use to package your code. First, let’s understand the major issues with reproducing results from code.

Major Issues with Code Reproducibility

  1. Dependencies, Dependencies, Dependencies - To run your code, you’ll likely rely on numerous Python libraries and their dependencies. As you may know, different versions of these libraries can behave differently. For instance, if you developed a piece of code five years ago, you might need the exact versions of the libraries that were available at that time.

    So, how can we address this issue?

    It’s simple: create a requirements.txt file specifying the exact versions of the libraries you used (see the sketch after this list).

    But how many of us actually do this? Even if you manage to create such a file, there’s still the possibility that pip may struggle to find the required versions if they are outdated.

  2. Laziness - This can also be considered a significant issue. Setting up an environment to run a new set of codes downloaded from GitHub can be quite demanding. As a result, many researchers may hesitate to attempt it. Instead, they might just read the results published in journals and feel satisfied without verifying or reproducing the findings themselves.

  3. OS Compatibility - One of the challenges in code reproducibility arises from differences in operating systems. Code that runs smoothly on one OS may encounter issues on another due to variations in system libraries, file paths, and default settings. For example, a piece of software that works perfectly on Windows might crash or behave unexpectedly on Linux.

    This is where Docker shines. By containerizing your applications, Docker creates an environment that encapsulates all necessary dependencies and system configurations. This means that regardless of the host operating system—be it Windows, macOS, or Linux—your code will run consistently as long as Docker is installed.

    Using Docker, researchers can create a uniform environment for their work, eliminating the guesswork that comes with OS differences. With a simple command, anyone can pull your Docker image and spin up an identical environment to run your code, ensuring that they will observe the same results as you did. This minimizes the “it works on my machine” problem and significantly enhances the reproducibility of computational research.
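
As a quick sketch of the dependency-pinning idea from point 1 (the package names and versions shown in the comments are purely illustrative), you can freeze your current environment and let others recreate it:

pip freeze > requirements.txt     # capture the exact versions installed in your environment
cat requirements.txt              # entries are pinned, e.g. numpy==1.19.5, pandas==1.1.5 (illustrative)
pip install -r requirements.txt   # anyone can recreate the same set of packages later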

Key Concepts

  • Image: A read-only template with instructions to create a container.
  • Container: A runnable instance of an image.
  • Dockerfile: A text file with instructions to build a Docker image.
  • Docker Hub: A public registry that stores Docker images.
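
To see how these pieces fit together, here is a minimal walkthrough (python:3.8 is used only as an example image):

docker pull python:3.8           # download an image from Docker Hub
docker run -it python:3.8 bash   # start a container, i.e. a running instance of that image
docker ps -a                     # the new container now appears in your container list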

Common Docker Commands

Docker Version & Info

docker --version         # Show the Docker version installed
docker info              # Display system-wide information about Docker

Managing Images

docker pull <image>      # Download an image from Docker Hub
docker images            # List all images
docker rmi <image>       # Remove an image
docker build -t <name> . # Build an image from a Dockerfile in the current directory

Managing Containers

docker run <image>                   # Run a container from an image
docker run -d <image>                # Run a container in detached mode (background)
docker run -it <image> bash          # Run a container interactively with a bash shell
docker stop <container>              # Stop a running container
docker start <container>             # Start a stopped container
docker restart <container>           # Restart a container
docker rm <container>                # Remove a stopped container
docker ps                            # List running containers
docker ps -a                         # List all containers (running and stopped)

Managing Volumes

docker volume create <volume_name>   # Create a volume
docker volume ls                     # List all volumes
docker volume rm <volume_name>       # Remove a volume
docker run -v <volume_name>:/path/in/container <image> # Attach volume to container
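
For research data, it is often convenient to bind-mount a host directory instead of a named volume; a minimal sketch (the paths and image below are just examples):

docker run -it -v "$(pwd)/data:/app/data" python:3.8 bash   # the host's ./data directory appears inside the container at /app/data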

Networks

docker network ls                    # List networks
docker network create <network_name> # Create a network
docker network connect <network_name> <container> # Connect a container to a network
docker network disconnect <network_name> <container> # Disconnect a container from a network
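
As an illustration (the network name, container name, and password below are made up), containers attached to the same user-defined network can reach each other by container name:

docker network create research-net                                               # "research-net" is just an example name
docker run -d --name db --network research-net -e POSTGRES_PASSWORD=example postgres
docker run -it --network research-net python:3.8 bash                            # inside this shell, the database is reachable at hostname "db"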

Dockerfile Basics

A Dockerfile contains instructions to build an image. Here’s an example:

# Start with a base image
FROM python:3.8

# Set working directory
WORKDIR /app

# Copy files to the container
COPY . /app

# Install dependencies
RUN pip install -r requirements.txt

# Run the application
CMD ["python", "app.py"]

Build and Run an Image

docker build -t myapp .       # Build Docker image from Dockerfile
docker run -p 8080:80 myapp   # Run container and map ports (host:container)
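
To actually share your environment with other researchers, you can push the image to Docker Hub so that anyone can pull and run it; a sketch, with <username> standing in for your Docker Hub account:

docker login                           # authenticate with your Docker Hub account
docker tag myapp <username>/myapp:1.0  # tag the local image under your Docker Hub namespace
docker push <username>/myapp:1.0       # upload the image to Docker Hub
docker pull <username>/myapp:1.0       # others can now download and run the exact same image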

Docker Compose

Docker Compose allows you to define and manage multi-container Docker applications. Here’s an example docker-compose.yml:

version: '3'
services:
  web:
    image: nginx
    ports:
      - "8080:80"
  db:
    image: postgres
    environment:
      POSTGRES_USER: example
      POSTGRES_PASSWORD: example

Docker Compose Commands:

docker-compose up       # Start all services in the docker-compose.yml
docker-compose down     # Stop and remove services
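
A few further variations that often come in handy (the service name refers to the example file above):

docker-compose up -d      # start all services in the background (detached)
docker-compose logs web   # view the logs of a single service
docker-compose down -v    # stop services and remove their volumes as well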


