Hosting and distributing Debian packages with aptly and S3
This article describes Thread’s current setup for hosting and distributing Debian packages. We’ll first explain why we ended up with this setup, then provide the steps necessary to replicate it, along with a high-level cost overview.
Context & Goals
We currently distribute most of our Python applications to production as Debian packages. This is native to our distribution of choice (Debian), and offers clear advantages such as dependency resolution between packages and integration with systemd.
We previously used a hosted service called Aptnik to distribute private packages. It worked well for several years but was unfortunately discontinued earlier this year, leading us to evaluate alternatives.
Our application delivery process is rather simple [1]:
- On successful CI build, we upload the package(s) to the apt repository.
- We release by ssh-ing into production machines and installing the latest version.
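A release is then little more than an apt upgrade over SSH. A minimal sketch, with placeholder host and package names:

```bash
# Hypothetical release step: upgrade one of our services on a production host
# (host and package names are placeholders, not our actual ones).
ssh prod-host-01 'sudo apt-get update && sudo apt-get install -y --only-upgrade my-service'
```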
As such, the requirements were mainly:
- Provide a simple & safe mechanism for uploading new package versions.
- Securely & reliably serve the packages to our infrastructure (bare metal and cloud VMs).
Our first approach was to look at other fully managed products such as PackageCloud or Cloudsmith to minimise the required engineering effort and operational overhead. Our trial of PackageCloud indicated that their upload API was eventually consistent (leading to unreliable releases) and that this would be too costly [2] for our use case. We decided to look at hosting our packages on our own infrastructure and settled on using Aptly (an open source application built to maintain local apt repositories) and serving packages through Amazon S3.
The setup
We'll detail here how we are running the aptly service and how we interact with it from our machines.
You'll need a valid aptly configuration file and GPG key, as well as an Ansible-controlled machine to run the aptly API, and an S3 bucket.
Running Aptly
As for most of our infrastructure, everything is set up through Ansible and configured to run through systemd:
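For readers not using Ansible, the role boils down to roughly the following manual steps. This is a sketch, not the actual role: the paths, the dedicated `aptly` system user and the listen port are all assumptions.

```bash
# Install aptly, drop in the configuration and signing key, then run the
# HTTP API as a systemd service bound to localhost (nginx fronts it below).
sudo apt-get install -y aptly
sudo install -o aptly -g aptly -m 0600 aptly.conf /etc/aptly.conf
sudo -u aptly gpg --import private-signing-key.asc

sudo tee /etc/systemd/system/aptly-api.service > /dev/null <<'EOF'
[Unit]
Description=aptly HTTP API
After=network.target

[Service]
User=aptly
ExecStart=/usr/bin/aptly api serve -listen=127.0.0.1:8080 -config=/etc/aptly.conf
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now aptly-api
```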
All the steps should be fairly self-explanatory and reproducible outside of Ansible, but we'll expand on the details of the more custom parts of the setup.
- The aptly API is exposed behind an nginx proxy and secured through HTTPS (using Let's Encrypt) and basic authentication. This provides some level of security when accessing it from our CI provider, which is where packages are uploaded from:
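A minimal sketch of what such an nginx vhost can look like (the hostname, certificate paths and htpasswd location are assumptions, not our exact configuration):

```bash
# Create the basic auth credentials used by CI (htpasswd is in apache2-utils).
sudo htpasswd -c /etc/nginx/aptly.htpasswd ci

sudo tee /etc/nginx/sites-available/aptly > /dev/null <<'EOF'
server {
    listen 443 ssl;
    server_name aptly.example.com;

    # Certificates issued by Let's Encrypt (e.g. via certbot).
    ssl_certificate     /etc/letsencrypt/live/aptly.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/aptly.example.com/privkey.pem;

    auth_basic           "aptly";
    auth_basic_user_file /etc/nginx/aptly.htpasswd;

    # Package uploads can be a few hundred MB, so raise the body size limit.
    client_max_body_size 512m;

    location / {
        proxy_pass http://127.0.0.1:8080;
    }
}
EOF

sudo ln -sf /etc/nginx/sites-available/aptly /etc/nginx/sites-enabled/aptly
sudo systemctl reload nginx
```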
- As we release quite often, we set up the following script to routinely drop old versions of our packages and keep storage needs under control. This gives us enough versions to promote older packages for short-term rollbacks. For longer-term rollbacks we can rebuild from scratch or recover older packages from backups:
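A sketch of the idea (not our exact script: the repo name, publish endpoint and retention count are assumptions, and the version sorting is simplified):

```bash
#!/usr/bin/env bash
set -euo pipefail

REPO="main"             # assumed local repo name
PUBLISH="s3:apt-repo:"  # assumed publish endpoint as configured in aptly.conf
KEEP=10                 # number of versions to retain per package

# `aptly repo search` prints package refs such as "my-service_1.2.3_amd64".
# Sort name-first / version-descending and drop everything past the KEEP newest.
aptly repo search "$REPO" \
  | sort -t_ -k1,1 -k2,2rV \
  | awk -F_ -v keep="$KEEP" '{ if (++seen[$1] > keep) print }' \
  | while read -r ref; do
      aptly repo remove "$REPO" "$ref"
    done

# Drop now-unreferenced files from aptly's pool and refresh the published
# repository so the removals propagate to S3.
aptly db cleanup
aptly publish update any "$PUBLISH"
```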
- Omitted from the Ansible role are the monitoring & backup configurations used for added reliability.
Uploading packages
Now that we have a running aptly API, we can start uploading packages. We use the following script after building the debs and storing all the packages in a single directory:
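Below is a hedged sketch of that flow against aptly's HTTP API; the URL, credentials, repo name and publish prefix are all assumptions rather than our actual values.

```bash
#!/usr/bin/env bash
set -euo pipefail

APTLY_URL="https://aptly.example.com"   # hypothetical API endpoint behind nginx
APTLY_AUTH="ci:${APTLY_PASSWORD}"       # basic auth credentials from CI secrets
REPO="main"                             # assumed local repo name
PREFIX="s3:apt-repo:"                   # assumed publish prefix (see aptly's API
                                        # docs for how prefixes are escaped in URLs)
STAGE="build-${CI_BUILD_ID:-$$}"        # unique staging dir avoids races between CI runs

# 1. Upload every built package into the per-build staging directory.
for deb in dist/*.deb; do
  curl -fsS -u "$APTLY_AUTH" -X POST -F "file=@${deb}" \
    "${APTLY_URL}/api/files/${STAGE}"
done

# 2. Import the staged files into the local repo in a single call.
curl -fsS -u "$APTLY_AUTH" -X POST \
  "${APTLY_URL}/api/repos/${REPO}/file/${STAGE}?forceReplace=1"

# 3. Re-publish the "any" distribution; once this returns, all new versions
#    become visible to our machines at the same time.
curl -fsS -u "$APTLY_AUTH" -X PUT \
  -H 'Content-Type: application/json' -d '{"ForceOverwrite": true}' \
  "${APTLY_URL}/api/publish/${PREFIX}/any"
```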
The only tricky part is using a separate stage directory for each set of related packages. This is necessary to avoid race conditions when multiple CI processes upload simultaneously.
This is more than one step, but it is reliable and ensures that releasing dependent packages happens as a single, consistent operation: once the above script exits successfully, we know that all the new versions will be available to our machines.
This was important, as some hosted solutions did not provide a consistent upload endpoint and instead released on a timer, leading to inconsistent and delayed releases. In practice this showed up as not all uploaded packages being available when we updated running machines, which led to incompatible versions and missed releases.
Downloading packages
Aptly itself is used to manage the repository (hierarchy, manifests, signing, etc.) and doesn't serve packages by default. We've chosen to expose our repository through S3 and download packages directly from there to benefit from AWS access control and reliability. On machines, we use apt-transport-s3.
The only thing required to get this to work was adding the following rules to our production machines' Ansible role:
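Roughly, those rules amount to the following manual steps (the local file names and the drop-in path are assumptions; the two files themselves are described just below):

```bash
# Install the S3 transport, put the credentials and repository definition in
# place, then refresh the package index.
sudo apt-get install -y apt-transport-s3
sudo install -m 0600 s3auth.conf /etc/apt/s3auth.conf
sudo install -m 0644 sources.list /etc/apt/sources.list.d/aptly.list
sudo apt-get update
```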
Where s3auth.conf contains your S3 credentials and sources.list contains the following line:

deb s3://${BUCKET_NAME}.s3.amazonaws.com/${PATH_PREFIX}/ any main
Final cost estimate
As mentioned above, one of the main reasons to go with a self-managed solution was to keep costs under control. So how are we doing now?
Our current costs can be broken into:
Hosting cost:
- We are hosting aptly on one of our existing VMs alongside other workloads which doesn't incur any extra cost for us.
- Looking at resource usage, we could easily run this on a t3.small EC2 instance for less than $17 a month.
Storage cost:
- Our current aptly directory is around 15GB.
- Our VM similarly doesn't incur extra cost as we had enough free disk space to spare.
- This comes out at less than $1 in monthly S3 cost.
- If we were to host aptly on an EC2 instance, this would also come out at less than $1 per month for the required EBS volume.
Bandwidth cost:
- On average, we incur $4.50 per day for egress bandwidth from our bucket.
- This comes out at ~$140 per month, which corresponds to around 1.5TB of egress at S3's standard egress pricing of roughly $0.09/GB.
This cost is entirely incurred due to downloading packages onto our bare metal machines, which are not running in AWS. Our EC2 instances are within the same region as the S3 bucket, incurring no bandwidth cost.
If we wanted to optimise our costs further, we could explore publishing the aptly repository to locations closer to our bare metal machines [3]. That being said, while hosting on S3 does incur some cost, there are other non-trivial benefits. The main one for us is reliability: our application delivery is split from our main infrastructure and should not be a blocker in a recovery scenario.
This setup has required little to no maintenance past the initial implementation and has proven easy to use for the engineering team. It provides significantly cheaper bandwidth than what similar plans on hosted services would offer, as well as stronger consistency guarantees. We've now been running it for almost 5 months and overall this has been well worth the extra effort.
1. There are other considerations which add complexity when releasing software, but these are fairly orthogonal to the problem at hand.
2. The main cost driver is bandwidth charges: our software packages can get quite large (in the order of 200-300MB) and, with our current release frequency of around 10-15 releases per day, this would have forced us into tiers above $700 per month (the highest advertised tier for PackageCloud).
3. Aptly supports publishing to the filesystem, at which point we could serve the repository through nginx alongside the API.