Hosting and distributing Debian packages with aptly and S3

This article describes Thread's current setup for hosting and distributing Debian packages. We'll first explain why we ended up with this setup, then provide the steps necessary to replicate it, and finish with a high-level cost overview.

Context & Goals

We currently distribute most of our Python applications to production as Debian packages. This is native to our distribution of choice (Debian), and offers clear advantages such as dependency resolution between packages and integration with systemd.

We previously used a hosted service called Aptnik to distribute private packages. This worked well for some years, but unfortunately it was discontinued earlier this year, leading us to evaluate alternatives.

Our application delivery process is rather simple [1]:

  1. On successful CI build, we upload the package(s) to the apt repository.
  2. We release by ssh-ing into production machines and installing the latest version (sketched below).
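
For illustration, the release step amounts to little more than the following (host and package names are hypothetical placeholders):

release step (illustrative)
# Refresh the apt cache and pull the newly published version onto a machine.
ssh deploy@prod-host-01 "sudo apt-get update && sudo apt-get install -y our-application"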

As such, the requirements were mainly:

  1. Provide a simple & safe mechanism for uploading new package versions.
  2. Securely & reliably serve the packages to our infrastructure (bare metal and cloud VMs).

Our first approach was to look at other fully managed products such as PackageCloud or Cloudsmith to minimise the required engineering effort and operational overhead. Our trial of PackageCloud indicated that their upload API was eventually consistent (leading to unreliable releases) and that it would be too costly [2] for our use case. We decided to look at hosting our packages on our own infrastructure and settled on using Aptly (an open-source application built to maintain local apt repositories) and serving packages through Amazon S3.

The setup

We'll detail here how we are running the aptly service and how we interact with it from our machines.

You'll need a valid aptly configuration file and GPG key, an Ansible-controlled machine to run the aptly API, and an S3 bucket.
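
For reference, a minimal aptly.conf can look roughly like the following. The endpoint name matches the bucket so that it lines up with the publish calls shown later, and the AWS credentials can either be set on the endpoint or come from the usual AWS environment; values are placeholders and the aptly documentation lists the full set of options.

aptly.conf (illustrative)
{
  "rootDir": "/var/lib/aptly",
  "architectures": ["amd64"],
  "S3PublishEndpoints": {
    "${S3_BUCKET}": {
      "region": "eu-west-1",
      "bucket": "${S3_BUCKET}",
      "acl": "private"
    }
  }
}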

Running Aptly

As with most of our infrastructure, everything is set up through Ansible and configured to run under systemd:

Ansible role
---
# Group and user to run under
- name: Ensure "aptly" group exists
  group:
    name: aptly
    state: present

- name: Add "aptly" user
  user:
    name: aptly
    group: aptly

- name: Import aptly repository key
  apt_key:
    id: ED75B5A4483DA07C
    keyserver: hkp://p80.pool.sks-keyservers.net:80
    state: present

- name: Add aptly repository
  apt_repository:
    repo: deb http://repo.aptly.info/ squeeze main

- name: Install required packages
  apt:
    pkg: "{{ item }}"
    update_cache: yes
  with_items:
    # See: https://www.aptly.info/doc/feature/pgp-providers/
    - gnupg1
    - gpgv1
    - aptly

- name: /var/lib/aptly
  file:
    path: /var/lib/aptly
    state: directory
    group: aptly
    owner: aptly

- name: Copy public key
  copy:
    src: ../files/key.pub
    dest: /var/lib/aptly/key.pub
    group: aptly
    owner: aptly

- name: Copy secret key
  copy:
    src: ../files/key.sec
    dest: /var/lib/aptly/key.sec
    group: aptly
    owner: aptly

# This needs to use gpg1 so the correct keyring is used and aptly can pick up
# on the keys later on.
- name: Import public key to gpg
  command: gpg1 --import /var/lib/aptly/key.pub
  become: yes
  become_user: aptly

- name: Import secret key to gpg
  command: gpg1 --import /var/lib/aptly/key.sec
  become: yes
  become_user: aptly
  # Ignore 'already in secret keyring' error
  ignore_errors: yes

- name: /etc/aptly.conf
  template:
    src: aptly.conf
    dest: /etc/aptly.conf
    mode: "0644"
    group: aptly
    owner: aptly

- name: aptly.service
  template:
    src: aptly.service
    dest: /etc/systemd/system/aptly.service
    mode: "0644"
  notify:
    - restart aptly

- name: cleanup.sh
  template:
    src: cleanup.sh
    dest: /var/lib/aptly/cleanup.sh
    mode: "0755"

- name: aptly-cleanup.service
  template:
    src: "{{ item }}"
    dest: "/etc/systemd/system/{{ item }}"
    mode: "0644"
  with_items:
    - aptly-cleanup.service
    - aptly-cleanup.timer

- name: certbot for ${YOUR_APTLY_DOMAIN} certificate
  include_role:
    name: geerlingguy-certbot
  vars:
    certbot_create_standalone_stop_services:
      - nginx
    certbot_auto_renew_options: --quiet --no-self-upgrade --pre-hook "systemctl stop nginx" --post-hook "systemctl start nginx"
    certbot_certs:
      - domains:
          - ${YOUR_APTLY_DOMAIN}

- name: /etc/nginx/aptly.htpasswd
  copy:
    src: ../files/aptly.htpasswd
    dest: /etc/nginx/aptly.htpasswd
    mode: "0644"

- name: /etc/nginx/sites-enabled/aptly
  template:
    src: nginx.conf
    dest: /etc/nginx/sites-enabled/aptly
    mode: "0644"
  notify:
    - restart nginx

- name: running
  service:
    name: aptly
    state: started
    enabled: yes

- name: enable cleanup timer
  service:
    name: aptly-cleanup.timer
    state: started
    enabled: yes

aptly.service unit file
[Unit]
Description=Aptly API
ConditionPathExists=/etc/aptly.conf

[Service]
Type=simple
WorkingDirectory=/var/lib/aptly
ExecStart=/usr/bin/aptly api serve -listen "localhost:{{ aptly_api_port }}" -no-lock
Restart=always
SyslogIdentifier=aptly
User=aptly

[Install]
WantedBy=multi-user.target
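
The cleanup units referenced in the role are small; as a rough illustration (not our exact files), a daily timer plus a oneshot service along these lines is enough, with cleanup.sh wrapping something like aptly db cleanup to prune unreferenced packages:

aptly-cleanup.timer (illustrative)
[Unit]
Description=Periodic aptly cleanup

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target

aptly-cleanup.service (illustrative)
[Unit]
Description=Aptly cleanup

[Service]
Type=oneshot
User=aptly
WorkingDirectory=/var/lib/aptly
ExecStart=/var/lib/aptly/cleanup.sh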

All the steps should be fairly self-explanatory and reproducible outside of Ansible, but we'll expand on the details of the more custom parts of the setup.
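
One of those custom parts is the nginx site template: essentially a TLS-terminating reverse proxy that puts basic auth in front of the aptly API. Below is a minimal sketch, assuming certbot-issued certificates and that the signing key is exposed at /gpgkey; the domain and paths are placeholders to adapt.

nginx.conf (illustrative)
server {
    listen 443 ssl;
    server_name ${YOUR_APTLY_DOMAIN};

    ssl_certificate /etc/letsencrypt/live/${YOUR_APTLY_DOMAIN}/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/${YOUR_APTLY_DOMAIN}/privkey.pem;

    # Package uploads can be a few hundred MB, so raise the default body size limit.
    client_max_body_size 512m;

    # Serve the repository signing public key without auth so machines can fetch it.
    location = /gpgkey {
        alias /var/lib/aptly/key.pub;
    }

    # Everything else is the aptly API, protected by basic auth.
    location / {
        auth_basic "aptly";
        auth_basic_user_file /etc/nginx/aptly.htpasswd;
        proxy_pass http://localhost:{{ aptly_api_port }};
    }
}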

Uploading packages

Now that we have a running aptly API, we can start uploading packages. We use the following script after building the debs and placing all the packages in a single directory.

upload-debs.sh
#!/usr/bin/env bash

set -xeu

APTLY_URL="https://${BASIC_AUTH}@${YOUR_APTLY_DOMAIN}"
APTLY_STAGE_DIRECTORY="${STAGE_DIRECTORY}"
APTLY_CURL_FLAGS="--include --fail"

for f in *.deb; do
    # Upload the file to staging area of aptly. This does not publish the package.
    curl ${APTLY_CURL_FLAGS} --form "file=@${f}" --request POST "${APTLY_URL}/api/files/${APTLY_STAGE_DIRECTORY}"
done

# Tell aptly to include all the staged files from the stage directory
# into the repo. This will include any file put there so make sure no
# other process stages files there to avoid conflicts with other CI processes.
curl ${APTLY_CURL_FLAGS} --request POST "${APTLY_URL}/api/repos/${APTLY_REPO}/file/${APTLY_STAGE_DIRECTORY}"

# Tell aptly to publish the repo to S3. This will lock so only one CI process owns the operation.
curl ${APTLY_CURL_FLAGS} --request PUT "${APTLY_URL}/api/publish/s3:${S3_BUCKET}:${PATH_PREFIX}/" \
     --header 'Content-Type: application/json' \
     --data '{}'
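
One caveat: the final PUT updates an already-published repository, so the very first publication has to be created once beforehand. A rough sketch of that one-off call is below, with the repository name and distribution as placeholders; depending on your key setup you may also need to pass Signing options.

# One-off: create the initial S3 publication that the script above then updates.
curl --include --fail --request POST "${APTLY_URL}/api/publish/s3:${S3_BUCKET}:${PATH_PREFIX}" \
     --header 'Content-Type: application/json' \
     --data '{"SourceKind": "local", "Sources": [{"Name": "your-repo-name"}], "Distribution": "any"}'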

The only tricky part is using a separate stage directory for each set of related packages. This is necessary to avoid race conditions when multiple CI processes upload simultaneously.
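
One simple way to get that separation is to derive the stage directory from a per-build identifier before calling the upload script, for example (CI_PIPELINE_ID is a hypothetical variable; use whatever your CI system exposes):

# Give each CI run its own staging directory so concurrent uploads can't clash.
export STAGE_DIRECTORY="upload-${CI_PIPELINE_ID}"
./upload-debs.sh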

This is more than one step, but it is reliable and ensures that releasing dependent packages is done in a single, consistent operation: once the above script exits successfully, we know that all the new versions will be available to our machines.

This was important, as some hosted solutions don't provide a consistent upload endpoint and instead release on a timer, leading to inconsistent and delayed releases. In practice this showed up as not all the uploaded packages being available when we updated running machines, which led to incompatible versions and missed releases.

Downloading packages

Aptly itself is used to manage the repository (hierarchy, manifests, signing, etc.) and doesn't serve packages by default. We've chosen to expose our repository through S3 and download packages directly from there to benefit from AWS access control and reliability. On machines, we use apt-transport-s3.

The only thing required to get this to work was adding the following tasks to our production machines' Ansible role:

Package downloader Ansible role
- name: Thread Aptly GPG key
  get_url:
    url: https://${YOUR_APTLY_DOMAIN}/gpgkey
    dest: /etc/apt/trusted.gpg.d/thread.aptly.gpg.asc
    mode: "0644"

- name: apt-transport-s3
  apt:
    pkg: "{{ item }}"
    update_cache: yes
    default_release: "{{ debian_release }}"
    cache_valid_time: 43200
  with_items:
    - apt-transport-s3

- name: sources.list
  template:
    src: sources.list
    dest: /etc/apt/sources.list
    mode: "0644"
  notify:
    - update APT cache

- name: /etc/apt/s3auth.conf
  template:
    src: s3auth.conf
    dest: /etc/apt/s3auth.conf
    mode: "0644"
Where s3auth.conf contains your S3 credentials and sources.list contains the following line: deb s3://${BUCKET_NAME}.s3.amazonaws.com/${PATH_PREFIX}/ any main.
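
For reference, apt-transport-s3 expects s3auth.conf to be a small key/value file along these lines; the exact keys can vary between versions, so check the documentation of the version you install, and the values here are placeholders:

s3auth.conf (illustrative)
AccessKeyId = AKIAXXXXXXXXXXXXXXXX
SecretAccessKey = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Token = ''

These credentials only need read access to the bucket; a minimal IAM policy for them would look roughly like this (bucket name is a placeholder):

read-only bucket policy (illustrative)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::${BUCKET_NAME}"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::${BUCKET_NAME}/*"
    }
  ]
}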

Final cost estimate

As mentioned above, one of the main reasons to go with a self-managed solution was to keep costs under control. So how are we doing now?

Our current costs can be broken into:

Hosting cost:

Storage cost:

Bandwidth cost:

This cost is entirely incurred due to downloading packages onto our bare metal machines, which are not running in AWS. Our EC2 instances are within the same region as the S3 bucket, incurring no bandwidth cost.

If we wanted to optimise our costs further, we could explore publishing the aptly repository to locations closer to our bare metal machines [3]. That being said, while hosting on S3 does incur some cost, there are other non-trivial benefits. The main one for us is reliability: our application delivery is separate from our main infrastructure and should not be a blocker in a recovery scenario.


This setup has required little to no maintenance past the initial implementation and has proven easy to use for the engineering team. It provides significantly cheaper bandwidth than what similar plans on hosted services would offer, as well as stronger consistency guarantees. We've now been running it for almost 5 months and overall this has been well worth the extra effort.


[1]

There are other considerations which add complexity when releasing software, but these are fairly orthogonal to the problem at hand.

[2]

The main cost driver is bandwidth charges: our software packages can get quite large (on the order of 200-300MB), and with our current release frequency (around 10-15 per day) this would have forced us into tiers above $700 per month (the highest advertised tier for PackageCloud).

[3]

Aptly supports publishing to the filesystem, at which point we could serve the repository through nginx alongside the API.