I'm a big fan of Amazon ECS- it's a remarkably simple containerization service that integrates with the rest of AWS relatively easily. While I'd generally prefer to run with Fargate, there are times when I need a bit more control over the underlying machines- especially when it comes to tasks using GPUs.
I'm also a big fan of the Debian/Ubuntu universe of Linux distros. While I appreciate that AWS offers their own ECS Optimized AMIs, I prefer to keep all machines on the same OS if possible and already have a build system in place for Ubuntu based machines. If this isn't a requirement for you then you can move ahead with the AWS ECS AMIs and skip some of this work.
When getting my base images to work with ECS I had a few strict goals-
- Create Ubuntu based images matching the rest of our infrastructure.
- Utilize ECS Task Based Networking, where each task gets its own elastic network interface.
- Download images from private Docker Hub repositories.
- Secure the instance role and metadata, so tasks can't break out and steal credentials from the ECS instances.
- Only use the nvidia runtime for containers that need it- avoid the common ECS GPU hack of hardcoding the nvidia runtime for all containers.
- Treat GPUs as first class resources, taking advantage of the ECS Agent's ability to schedule and pin tasks to specific GPUs.
There are a few other tutorials out there for getting GPUs working with ECS, but most of them do so by making the nvidia runtime the default for every container that runs. I wanted to avoid this method because it loses out on the best ECS GPU features. When set up properly AWS ECS is pretty smart about how it uses GPUs, doing things such as pinning tasks to specific devices.
Out of Scope
The primary output here is going to be a userdata script that runs on instance startup. For this post I'm skipping the following-
- Installing nvidia drivers. While this could be done in the userdata script, having to install them every time a new instance comes online will slow your cluster down.
- Creating AWS resources such as Launch Templates, Autoscaling Groups, and Capacity Providers. For this I recommend using Terraform or Cloudformation.
There are a lot of great resources out there on these topics, so my focus here is going to be on the less documented aspects of things.
The ECS Agent Container
The AWS container team has a great little agent that takes actions on behalf of the cluster, including registering the instances themselves. The easiest way to run this agent is to launch it as a container.
AWS provides a few example scripts to set as userdata for the instance to get it started, including one for Ubuntu that serves as a great starting point.
#!/bin/bash
# Install Docker
apt-get update -y && apt-get install -y docker.io
# Preseed iptables-persistent so its install doesn't prompt for input
echo iptables-persistent iptables-persistent/autosave_v4 boolean true | debconf-set-selections
echo iptables-persistent iptables-persistent/autosave_v6 boolean true | debconf-set-selections
apt-get -y install iptables-persistent
# Set iptables rules
echo 'net.ipv4.conf.all.route_localnet = 1' >> /etc/sysctl.conf
sysctl -p /etc/sysctl.conf
iptables -t nat -A PREROUTING -p tcp -d 169.254.170.2 --dport 80 -j DNAT --to-destination 127.0.0.1:51679
iptables -t nat -A OUTPUT -d 169.254.170.2 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 51679
# Write iptables rules to persist after reboot
iptables-save > /etc/iptables/rules.v4
# Create directories for ECS agent
mkdir -p /var/log/ecs /var/lib/ecs/data /etc/ecs
# Write ECS config file
cat << EOF > /etc/ecs/ecs.config
ECS_DATADIR=/data
ECS_ENABLE_TASK_IAM_ROLE=true
ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST=true
ECS_LOGFILE=/log/ecs-agent.log
ECS_AVAILABLE_LOGGING_DRIVERS=["json-file","awslogs"]
ECS_LOGLEVEL=info
ECS_CLUSTER=default
EOF
# Write systemd unit file
cat << EOF > /etc/systemd/system/docker-container@ecs-agent.service
[Unit]
Description=Docker Container %I
Requires=docker.service
After=docker.service
[Service]
Restart=always
ExecStartPre=-/usr/bin/docker rm -f %i
ExecStart=/usr/bin/docker run --name %i \
--restart=on-failure:10 \
--volume=/var/run:/var/run \
--volume=/var/log/ecs/:/log \
--volume=/var/lib/ecs/data:/data \
--volume=/etc/ecs:/etc/ecs \
--net=host \
--env-file=/etc/ecs/ecs.config \
amazon/amazon-ecs-agent:latest
ExecStop=/usr/bin/docker stop %i
[Install]
WantedBy=default.target
EOF
systemctl enable docker-container@ecs-agent.service
systemctl start docker-container@ecs-agent.service
This script has a few components-
- Initial system configuration, such as installing specific packages and configuring the firewall.
- Creating the ECS configuration file.
- Creating the systemd service file- most importantly defining the start command that launches the docker container.
- The systemd enable and start commands to make sure the container agent is running and will start again on reboot.
Customizing the agent generally involves two kinds of changes- modifying the ECS configuration and modifying the docker start command for the container.
Task Based ENIs
In AWS lingo an ENI (Elastic Network Interface) is a virtual network interface that can be attached to services and tasks. Each ENI maps to an IP address (public or private depending on the subnet settings). ECS has the ability to assign an ENI to each task, so tasks get their own network connection. This gives us a lot of flexibility- at a minimum we don't have to worry about conflicting ports on the host machine, and we can assign tasks their own security groups.
The easiest part of this is going to be updating our configuration to set the ECS_ENABLE_TASK_ENI
option.
cat << EOF > /etc/ecs/ecs.config
## -- other config ##
ECS_ENABLE_TASK_ENI=true
EOF
Of course if it was that simple it wouldn't need a blog post. At this point if you attempt to run the agent you're going to see this error-
Unable to initialize Task ENI dependencies: agent is not started with an init system
To manage ENIs the agent needs to run with its own init process so it can launch and manage processes directly, rather than relying on docker.
You can use the --init flag to indicate that an init process should be used as the PID 1 in the container. Specifying an init process ensures the usual responsibilities of an init system, such as reaping zombie processes, are performed inside the created container.
https://docs.docker.com/engine/reference/run/#specify-an-init-process
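If you want to see what the flag actually does, a quick throwaway container makes it visible (this assumes you can pull the busybox image)- with --init, PID 1 inside the container should show up as docker-init rather than the command itself.
# docker injects a small init binary as PID 1 when --init is set
docker run --rm --init busybox ps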
Besides setting the init flag it turns out we also need to share more of the underlying filesystem with the container so that it can gather all the information it needs to manage the network devices. We also need to grant additional capabilities to the container. Our newly revised file now looks like this-
[Service]
Restart=always
ExecStartPre=-/usr/bin/docker rm -f %i
ExecStart=/usr/bin/docker run --name %i \
--init \
--restart=on-failure:10 \
--volume=/var/run:/var/run \
--volume=/var/log/ecs/:/log \
--volume=/var/lib/ecs/data:/data \
--volume=/etc/ecs:/etc/ecs \
--volume=/sbin:/host/sbin \
--volume=/lib:/lib \
--volume=/lib64:/lib64 \
--volume=/usr/lib:/usr/lib \
--volume=/usr/lib64:/usr/lib64 \
--volume=/proc:/host/proc \
--volume=/sys/fs/cgroup:/sys/fs/cgroup \
--net=host \
--env-file=/etc/ecs/ecs.config \
--cap-add=sys_admin \
--cap-add=net_admin \
amazon/amazon-ecs-agent:latest
ExecStop=/usr/bin/docker stop %i
With these changes we can now run task based ENIs!
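To take advantage of this on the task side, the task definition needs the awsvpc network mode and ECS needs to be told which subnets and security groups to attach to each task's ENI. A rough sketch with the AWS CLI- the family name, image, subnet, and security group here are all placeholders.
# Register a task definition that requests task networking
aws ecs register-task-definition \
  --family awsvpc-example \
  --network-mode awsvpc \
  --container-definitions '[{"name":"web","image":"nginx","memory":256,"essential":true}]'
# Run it, telling ECS which subnets and security groups the task ENI should use
aws ecs run-task \
  --cluster default \
  --task-definition awsvpc-example \
  --network-configuration 'awsvpcConfiguration={subnets=[subnet-0123456789abcdef0],securityGroups=[sg-0123456789abcdef0]}'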
First Attempt at GPUs
The first steps of getting GPUs working are very similar to getting Task based ENIs working- we alter the configuration and add some more docker flags to attach the appropriate volumes and devices.
First we need to enable GPU support and set the runtime to nvidia (which is the current default, making this setting a bit redundant).
cat << EOF > /etc/ecs/ecs.config
## -- other config ##
ECS_ENABLE_GPU_SUPPORT=true
ECS_NVIDIA_RUNTIME=nvidia
EOF
Modifying the agent launch command is a little more complicated in this case because we need to pass each GPU through as a device and different instance types have different numbers of GPUs (anywhere from zero to sixteen).
To make this work we need to write a little bash to dynamically generate the arguments. The devices are zero indexed with a maximum of sixteen, so we start at /dev/nvidia0 and work our way up to /dev/nvidia15, testing that each one exists before adding it to our config. We'll also need to add a few other nvidia related resources in.
# Build a --device flag for each GPU device that exists on this instance.
DEVICES=""
for DEVICE_INDEX in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
do
DEVICE_PATH="/dev/nvidia${DEVICE_INDEX}"
if [ -e "$DEVICE_PATH" ]; then
DEVICES="${DEVICES} --device ${DEVICE_PATH}:${DEVICE_PATH} "
fi
done
DEVICE_MOUNTS=`printf "$DEVICES"`
# Write systemd unit file for ECS Agent
cat << EOF > /etc/systemd/system/docker-container@ecs-agent.service
# Truncating top of config
ExecStart=/usr/bin/docker run --name %i \
--init \
--restart=on-failure:10 \
--volume=/var/run:/var/run \
--volume=/var/log/ecs/:/log \
--volume=/var/lib/ecs/data:/data \
--volume=/var/lib/nvidia-docker/volumes/nvidia_driver/latest:/usr/local/nvidia \
--device /dev/nvidiactl:/dev/nvidiactl \
$DEVICE_MOUNTS \
--device /dev/nvidia-uvm:/dev/nvidia-uvm \
amazon/amazon-ecs-agent:latest
If any of these volumes don't exist docker will fail to start the container.
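If you want the same userdata to also work on instances without GPUs, one option is to only build the nvidia specific flags when the hardware is actually present. A rough sketch of that guard- GPU_ARGS is my own placeholder variable, referenced in the unit file in place of the hardcoded nvidia flags.
# Only pass the nvidia volume and devices to the agent when a GPU actually exists
GPU_ARGS=""
if [ -e /dev/nvidiactl ]; then
  GPU_ARGS="--volume=/var/lib/nvidia-docker/volumes/nvidia_driver/latest:/usr/local/nvidia"
  GPU_ARGS="${GPU_ARGS} --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm"
  GPU_ARGS="${GPU_ARGS} ${DEVICE_MOUNTS}"
fi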
GPU Info File
If you've done the above you should now be hitting this error-
Config for GPU support is enabled, but GPU information is not found; continuing without it
Unfortunately for us this isn't documented anywhere- but AWS has released the agent under an open source license, so we can read the code directly! Diving into the source code we can see that the agent is looking for a file at /var/lib/ecs/gpu/nvidia-gpu-info.json. We can get the rough structure from the source code of the agent, and I went ahead and verified that against an official Amazon Linux ECS AMI.
The structure turns out to be really simple- a JSON object with the driver version as one value and an array of GPU IDs as the other. These GPU IDs are the actual UUID for each GPU, rather than the indexed device number.
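For reference, on a machine with two GPUs the finished file ends up looking something like this (the driver version and UUIDs below are made up placeholders)-
{"DriverVersion":"450.80.02","GPUIDs":["GPU-7a1b2c3d-1111-2222-3333-444455556666","GPU-7a1b2c3d-7777-8888-9999-aaaabbbbcccc"]}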
Using nvidia-smi
we can retrieve the GPU IDs and nvidia driver version and feed them to the agent. The jq
utility is used to make the json list from the GPU ID array without getting trapped in string building hell. This portion of the script should run before the ECS agent is launched, so we'll end up putting it at the top of the userdata script.
Originally when this was written we used the "cut" command to get the column of data. Unfortunately nvidia-smi changed its GPU identifier formatting with the A100 cards, so we've had to make our method here a bit more robust.
# install jq
apt-get install -y jq
# Register GPUs with ECS
DRIVER_VERSION=$(modinfo nvidia --field version)
IFS=$'\n'
IDS=()
for x in `nvidia-smi -L`; do
IDS+=("$(echo "$x" | perl -ne 'print "$1\n" if /UUID: ([^)]+)/')")
done
ID_JSON=$(printf '%s\n' "${IDS[@]}" | jq -R . | jq -s -c .)
echo "{\"DriverVersion\":\"$DRIVER_VERSION\",\"GPUIDs\":$ID_JSON}" > /var/lib/ecs/gpu/nvidia-gpu-info.json
Finally we need to update the docker container start arguments to mount this file into the ecs agent container so the agent can actually read it.
--volume=/var/lib/ecs/gpu:/var/lib/ecs/gpu \
At this point the agent will launch, register the GPUs, and start assigning tasks! Of course those tasks won't actually launch, but we're that much closer!
Don't see the GPUs as a registered resource in the AWS Console? Don't worry, that's a bug with the console and not with your agent.
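On the task definition side, a container asks for GPUs with a resourceRequirements entry and the agent handles assigning and pinning specific devices to it. A minimal sketch- the family name, image, and sizes are placeholders.
# Each container that needs a GPU declares it as a resource requirement
aws ecs register-task-definition \
  --family gpu-example \
  --requires-compatibilities EC2 \
  --container-definitions '[{"name":"trainer","image":"nvidia/cuda:11.0-base","memory":2048,"essential":true,"resourceRequirements":[{"type":"GPU","value":"1"}]}]'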
NVIDIA Runtime
We really shouldn't need this, as docker 19.03 added native support for GPUs. The ECS Agent still sets the nvidia runtime when launching GPU containers though, so we still need to install the docker nvidia runtime. Fortunately this package is easy to install and will get rid of any runtime errors.
# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
apt-get update -y
apt-get install -y nvidia-docker2
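Before involving ECS at all it's worth a quick sanity check that docker can see the GPUs through the runtime- something like the classic nvidia-docker smoke test (the exact CUDA image tag is just an example and changes over time).
# If the runtime is wired up correctly this prints the same GPU table you see on the host
docker run --rm --runtime=nvidia nvidia/cuda:11.0-base nvidia-smi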
At this point the tasks should successfully launch, putting our Ubuntu ECS machines in parity with the official AWS ECS AMIs. This issue is being tracked on the AWS Containers Roadmap so this step will likely be removed in the future.
Private Registry Support with AWS Secrets Manager
AWS supports private registry login for both Fargate and EC2 based ECS clusters, but in slightly different ways- Fargate uses a secret in the AWS Secrets Manager while ECS expects you to set values in the ECS Agent's configuration. To keep credential management simple I want to use one secret for both, and have the instances load it at launch.
To pull the secret out of AWS we're going to use the secretcli utility to populate a variable that we then inject into our configuration. We will also need to alter our IAM Role for the machines to give them access to this secret.
Fargate uses a secret with a json object that has a "username" and "password" field containing Docker Hub credentials. This object can be reused for the ECS Agent config as well, but it will need to be altered slightly.
# Get Docker Hub credentials
pip3 install secretcli
DOCKER_HUB_CREDENTIALS=$(secretcli download docker_hub_readonly -r us-west-2)
cat << EOF > /etc/ecs/ecs.config
## -- other config ##
ECS_ENGINE_AUTH_TYPE=docker
ECS_ENGINE_AUTH_DATA={"https://index.docker.io/v1/":$DOCKER_HUB_CREDENTIALS}
EOF
With that addition our instances will be able to log into Docker Hub without having to hardcode credentials into our startup scripts.
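If the shared secret doesn't exist yet, it can be created up front with the same structure Fargate expects- a JSON object with username and password fields. A sketch with the AWS CLI (the credentials are placeholders, and depending on how secretcli expects secrets to be stored you may prefer uploading it with secretcli itself)-
# Create the shared read-only Docker Hub secret in the region the instances read from
aws secretsmanager create-secret \
  --name docker_hub_readonly \
  --region us-west-2 \
  --secret-string '{"username":"REPLACE_WITH_USERNAME","password":"REPLACE_WITH_TOKEN"}'
The instance IAM role also needs permission to call secretsmanager:GetSecretValue on this secret.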
Tightening up Security
The ECS Agent has some additional settings which should be considered-
- ECS_UPDATES_ENABLED lets you tell running instances to update their agent, so you can stay up to date without having to cycle machines. Unfortunately it doesn't work on Ubuntu right now, so this should be unset or set to false. That being said you can still update by pulling the latest amazon/amazon-ecs-agent:latest image- you just can't trigger it from the console.
- ECS_DISABLE_PRIVILEGED should be set to true unless you explicitly have a reason for giving containers privileged access. Running containers in privileged mode opens up a lot more exploits.
- ECS_AWSVPC_BLOCK_IMDS prevents tasks from being able to access the instance metadata- which includes its credentials and IAM Role.
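In ecs.config form those three settings end up as follows (the same values appear in the full script at the end of this post)-
ECS_UPDATES_ENABLED=false
ECS_DISABLE_PRIVILEGED=true
ECS_AWSVPC_BLOCK_IMDS=true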
A100/p4d.24xlarge Update
Are you suddenly getting a CUDA_ERROR_SYSTEM_NOT_READY
error when attempting to launch tasks?
The new A100 cards from nvidia are impressive, and part of what makes them so fast is the NVSwitch fabric that connects the GPUs together. This new system requires its own service, the nvidia fabric manager, to be running before the GPUs can be properly utilized. This is likely installed already from when you installed the drivers, but it still needs to be enabled.
systemctl enable nvidia-fabricmanager.service
systemctl start nvidia-fabricmanager.service
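A quick sanity check before letting the instance accept tasks-
# The service should report active, and nvidia-smi should list all of the GPUs
systemctl is-active nvidia-fabricmanager.service
nvidia-smi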
Wrapping it all up
At this point we've got it all-
- Ubuntu
- First Class GPU support
- Private Registry Access
- Tightened Security
It's worth reviewing the full list of agent settings and customizing them to your needs- if you are using more than one cluster you'll need to set ECS_CLUSTER, and there are also useful settings for spot instances. The entire script is below, along with a few of my preferred settings.
If you have any questions feel free to ask (@tedivm on twitter is the best place for a quick response), and don't forget to check out my other projects!
#!/bin/bash
# Create directories for ECS agent
mkdir -p /var/log/ecs /var/lib/ecs/{data,gpu} /etc/ecs
# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# Install nvidia docker runtime
apt-get update -y
apt-get install -y nvidia-docker2
# Install jq to build JSON
apt-get install -y jq
# Build array of GPU IDs
DRIVER_VERSION=$(modinfo nvidia --field version)
IFS=$'\n'
IDS=()
for x in `nvidia-smi -L`; do
IDS+=("$(echo "$x" | perl -ne 'print "$1\n" if /UUID: ([^)]+)/')")
done
# Convert GPU IDs to JSON Array
ID_JSON=$(printf '%s\n' "${IDS[@]}" | jq -R . | jq -s -c .)
# Create JSON GPU Object and populate nvidia-gpu-info.json
echo "{\"DriverVersion\":\"${DRIVER_VERSION}\",\"GPUIDs\":${ID_JSON}}" > /var/lib/ecs/gpu/nvidia-gpu-info.json
# Create list of GPU devices
DEVICES=""
for DEVICE_INDEX in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
do
DEVICE_PATH="/dev/nvidia${DEVICE_INDEX}"
if [ -e "$DEVICE_PATH" ]; then
DEVICES="${DEVICES} --device ${DEVICE_PATH}:${DEVICE_PATH} "
fi
done
DEVICE_MOUNTS=`printf "$DEVICES"`
# Set iptables rules needed to enable Task IAM Roles
echo 'net.ipv4.conf.all.route_localnet = 1' >> /etc/sysctl.conf
sysctl -p /etc/sysctl.conf
iptables -t nat -A PREROUTING -p tcp -d 169.254.170.2 --dport 80 -j DNAT --to-destination 127.0.0.1:51679
iptables -t nat -A OUTPUT -d 169.254.170.2 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 51679
# Write iptables rules to persist after reboot
iptables-save > /etc/iptables/rules.v4
# Get Docker Hub credentials
pip3 install secretcli
DOCKER_HUB_CREDENTIALS=$(secretcli download docker_hub_readonly -r us-west-2)
# Add ECS Config
cat << EOF > /etc/ecs/ecs.config
ECS_CLUSTER=MY_CLUSTER_NAME
ECS_DATADIR=/data
ECS_ENABLE_TASK_IAM_ROLE=true
ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST=true
ECS_LOGFILE=/log/ecs-agent.log
ECS_AVAILABLE_LOGGING_DRIVERS=["syslog", "json-file", "journald", "awslogs"]
ECS_LOGLEVEL=info
ECS_UPDATES_ENABLED=false
ECS_DISABLE_PRIVILEGED=true
ECS_AWSVPC_BLOCK_IMDS=true
ECS_ENABLE_TASK_ENI=true
ECS_CONTAINER_INSTANCE_PROPAGATE_TAGS_FROM=ec2_instance
ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE=true
ECS_ENABLE_SPOT_INSTANCE_DRAINING=true
ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=1h
ECS_ENGINE_AUTH_TYPE=docker
ECS_ENGINE_AUTH_DATA={"https://index.docker.io/v1/":$DOCKER_HUB_CREDENTIALS}
ECS_ENABLE_GPU_SUPPORT=true
ECS_NVIDIA_RUNTIME=nvidia
EOF
# Write systemd unit file for ECS Agent
cat << EOF > /etc/systemd/system/docker-container@ecs-agent.service
[Unit]
Description=Docker Container %I
Requires=docker.service
After=docker.service
[Service]
Restart=always
ExecStartPre=-/usr/bin/docker rm -f %i
ExecStart=/usr/bin/docker run --name %i \
--init \
--restart=on-failure:10 \
--volume=/var/run:/var/run \
--volume=/var/log/ecs/:/log \
--volume=/var/lib/ecs/data:/data \
--volume=/etc/ecs:/etc/ecs \
--volume=/sbin:/host/sbin \
--volume=/lib:/lib \
--volume=/lib64:/lib64 \
--volume=/usr/lib:/usr/lib \
--volume=/usr/lib64:/usr/lib64 \
--volume=/proc:/host/proc \
--volume=/sys/fs/cgroup:/sys/fs/cgroup \
--net=host \
--env-file=/etc/ecs/ecs.config \
--cap-add=sys_admin \
--cap-add=net_admin \
--volume=/var/lib/nvidia-docker/volumes/nvidia_driver/latest:/usr/local/nvidia \
--device /dev/nvidiactl:/dev/nvidiactl \
${DEVICE_MOUNTS} \
--device /dev/nvidia-uvm:/dev/nvidia-uvm \
--volume=/var/lib/ecs/gpu:/var/lib/ecs/gpu \
amazon/amazon-ecs-agent:latest
ExecStop=/usr/bin/docker stop %i
[Install]
WantedBy=default.target
EOF
# Reload daemon files
/bin/systemctl daemon-reload
# Enabling ECS Agent
systemctl enable docker-container@ecs-agent.service
systemctl start docker-container@ecs-agent.service