<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<title>lirsacc</title>
	<link href="https://lirsac.me/atom.xml" rel="self" type="application/atom+xml"/>
	<link href="https://lirsac.me"/>
	<generator uri="https://www.getzola.org/">Zola</generator>
	<updated>2021-04-19T00:00:00+00:00</updated>
	<id>https://lirsac.me/atom.xml</id>
	<entry xml:lang="en">
		<title>How to DOS a Django application with bad GET requests.</title>
		<published>2021-04-19T00:00:00+00:00</published>
		<updated>2021-04-19T00:00:00+00:00</updated>
		<link href="https://lirsac.me/blog/gunicorn-dos-vector/" type="text/html"/>
		<id>https://lirsac.me/blog/gunicorn-dos-vector/</id>
		<content type="html">&lt;aside&gt;
&lt;p&gt;This article was originally published on &lt;a href=&quot;https:&#x2F;&#x2F;thread.engineering&#x2F;2020-11-23-gunicorn-dos-vector&#x2F;&quot;&gt;Thread&#x27;s engineering blog&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;&#x2F;aside&gt;
&lt;p&gt;Like any complex production service, Thread experiences outages and issues in production that affect our users from time to time. In order to improve our service quality we run post-mortems of these incidents to identify ways to fix the root causes and improve service reliability moving forward. This post summarises one such analysis.&lt;&#x2F;p&gt;
&lt;p&gt;Typical Django deployments geared for production usually consist of a WSGI server such as Gunicorn running your app, and a reverse proxy such as NGINX in front of it. The latter tends to scale better and is less susceptible to common Denial Of Service attacks, which is one of the &lt;a href=&quot;https:&#x2F;&#x2F;docs.gunicorn.org&#x2F;en&#x2F;latest&#x2F;deploy.html&quot;&gt;main reasons&lt;&#x2F;a&gt; not to expose your WSGI server directly.&lt;&#x2F;p&gt;
&lt;p&gt;Because WSGI servers like Gunicorn aren&#x27;t built to protect from Denial Of Service attacks, it&#x27;s easy to trigger DoS conditions against a Django application with even slightly malformed requests. One example is a request with a body smaller than the advertised &lt;code&gt;Content-Length&lt;&#x2F;code&gt;: if the view tries to read the body, it will hang and block the current worker. &lt;a href=&quot;https:&#x2F;&#x2F;code.djangoproject.com&#x2F;ticket&#x2F;29800&quot;&gt;This issue&lt;&#x2F;a&gt; has more details. With NGINX, the &lt;a href=&quot;http:&#x2F;&#x2F;nginx.org&#x2F;en&#x2F;docs&#x2F;http&#x2F;ngx_http_proxy_module.html#proxy_request_buffering&quot;&gt;&lt;code&gt;proxy_request_buffering&lt;&#x2F;code&gt;&lt;&#x2F;a&gt; setting is what protects us from this before it hits the backend.&lt;&#x2F;p&gt;
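&lt;p&gt;To make the failure mode concrete, here is a rough way to reproduce it against a Gunicorn server exposed directly, with no reverse proxy in front (the endpoint and port are illustrative): advertise a body that is never sent, and keep the connection open.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;# Illustrative reproduction against a bare Gunicorn server on localhost:8000.
# The request advertises a 1000 byte body but never sends it; if the view
# reads the body, the worker handling the request blocks on read() for as
# long as the connection stays open.
( printf &#x27;GET &#x2F;some-view&#x2F; HTTP&#x2F;1.1\r\nHost: localhost\r\nContent-Length: 1000\r\n\r\n&#x27;; sleep 60 ) \
  | nc localhost 8000
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;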
&lt;p&gt;This specific situation should not have been a problem for us given that we run behind a properly configured NGINX instance; so we were pretty surprised when we tracked down a drop in Gunicorn capacity to workers being stuck on a &lt;code&gt;read()&lt;&#x2F;code&gt; call and waiting for a &lt;code&gt;GET&lt;&#x2F;code&gt; request&#x27;s body to be available &lt;sup class=&quot;footnote-reference&quot;&gt;&lt;a href=&quot;#1&quot;&gt;1&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;. &lt;&#x2F;p&gt;
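&lt;p&gt;One way to confirm this state is to dump the Python stacks of the suspect workers; &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;benfred&#x2F;py-spy&quot;&gt;py-spy&lt;&#x2F;a&gt; is one tool that makes this easy (shown here purely as an illustration):&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;# Dump the Python stack of every Gunicorn process; a worker hit by this
# issue shows a frame waiting on a read() of the request body.
pgrep -f gunicorn | while read -r pid; do
  sudo py-spy dump --pid &amp;quot;$pid&amp;quot;
done
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;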
&lt;p&gt;We tracked the original request coming into our infrastructure and, at the point of hitting NGINX, this request was correct (i.e. the &lt;code&gt;Content-Length&lt;&#x2F;code&gt; header matched its body). This pointed to the real cause: we&#x27;d recently introduced a Node.js based server to handle server-side rendering of our React frontend. The architecture is as follows:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;We run multiple NGINX nodes exposed to the internet. These accept requests coming into our infrastructure, distribute to various backend servers and decide whether a request should go through the Node.js SSR server or can go directly to the Gunicorn server (e.g. GraphQL API requests).&lt;&#x2F;li&gt;
&lt;li&gt;Each backend server runs a Node.js process and a Gunicorn process. The Node.js process fronts Gunicorn, running the single page code when it can and forwarding what would have been AJAX requests, or proxying any requests it cannot handle directly as a fallback.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Requests from Node.js to Gunicorn do not go through NGINX for 2 main reasons:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;This could introduce a routing loop if NGINX decides to redirect the request back to a Node server.&lt;&#x2F;li&gt;
&lt;li&gt;It ensures minimal latency between the 2 backend components by keeping them local to a single host.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;The requests causing our Gunicorn workers to lock up were coming from the Node.js server because we were stripping the bodies from &lt;code&gt;GET&lt;&#x2F;code&gt; requests, &lt;em&gt;corrupting&lt;&#x2F;em&gt; them along the way and creating what was essentially a self-induced Denial of Service. We do this because the Fetch API &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;whatwg&#x2F;fetch&#x2F;issues&#x2F;551&quot;&gt;does not allow&lt;&#x2F;a&gt; &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;whatwg&#x2F;fetch&#x2F;issues&#x2F;83&quot;&gt;sending GET requests with a body&lt;&#x2F;a&gt; &lt;sup class=&quot;footnote-reference&quot;&gt;&lt;a href=&quot;#2&quot;&gt;2&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;. As we never rely on this ourselves, we missed stripping the relevant headers and this was never caught in normal operations (in fact it took some malicious requests &lt;sup class=&quot;footnote-reference&quot;&gt;&lt;a href=&quot;#1&quot;&gt;1&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; to trigger this failure mode).&lt;&#x2F;p&gt;
&lt;p&gt;The fix for this was simple: &lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;The root cause was fixed by making sure headers are still valid when modifying proxied requests.&lt;&#x2F;li&gt;
&lt;li&gt;A contributing factor that was highlighted by this incident is that we were proxying some API requests through the Node.js server when they didn&#x27;t need to be. This was fixed by improving our routing at the NGINX layer.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;div class=&quot;footnote-definition&quot; id=&quot;1&quot;&gt;&lt;sup class=&quot;footnote-definition-label&quot;&gt;1&lt;&#x2F;sup&gt;
&lt;p&gt;Specifically, we were receiving malformed &lt;code&gt;GET&lt;&#x2F;code&gt; requests sent to our GraphQL endpoint. They had reproduced the request we make from our frontend, replaced some variables with SQL injection payloads and then sent them as &lt;code&gt;GET&lt;&#x2F;code&gt;. The &lt;a href=&quot;https:&#x2F;&#x2F;docs.graphene-python.org&#x2F;projects&#x2F;django&#x2F;en&#x2F;latest&#x2F;&quot;&gt;library&lt;&#x2F;a&gt; we use attempts to read the body of the request regardless of the HTTP method. As we send all our GraphQL requests over &lt;code&gt;POST&lt;&#x2F;code&gt;, and GraphQL over &lt;code&gt;GET&lt;&#x2F;code&gt; usually uses query parameters, we had not encountered this particular failure mode before.&lt;&#x2F;p&gt;
&lt;&#x2F;div&gt;
&lt;div class=&quot;footnote-definition&quot; id=&quot;2&quot;&gt;&lt;sup class=&quot;footnote-definition-label&quot;&gt;2&lt;&#x2F;sup&gt;
&lt;p&gt;Notably this is valid HTTP&#x2F;1.1 according to &lt;a href=&quot;https:&#x2F;&#x2F;tools.ietf.org&#x2F;html&#x2F;rfc7231&quot;&gt;RFC 7231&lt;&#x2F;a&gt;, although the semantics are undefined. This is &lt;a href=&quot;https:&#x2F;&#x2F;tools.ietf.org&#x2F;html&#x2F;rfc7540#section-8.1.3&quot;&gt;not valid in HTTP&#x2F;2&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;&#x2F;div&gt;
</content>
	</entry>
	<entry xml:lang="en">
		<title>Hosting and distributing Debian packages with aptly and S3</title>
		<published>2019-12-11T00:00:00+00:00</published>
		<updated>2019-12-11T00:00:00+00:00</updated>
		<link href="https://lirsac.me/blog/hosting-and-distributing-debian-packages-with-aptly-and-s3/" type="text/html"/>
		<id>https://lirsac.me/blog/hosting-and-distributing-debian-packages-with-aptly-and-s3/</id>
		<content type="html">&lt;aside&gt;
&lt;p&gt;This article was originally published on &lt;a href=&quot;https:&#x2F;&#x2F;thread.engineering&#x2F;2019-12-11-aptly-setup&#x2F;&quot;&gt;Thread&#x27;s engineering blog&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;&#x2F;aside&gt;
&lt;p&gt;This article describes Thread’s current setup for hosting and distributing Debian packages. We’ll first explain why we ended up with this setup, then provide the steps necessary to replicate it, as well as a high-level cost overview.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;&lt;h2 id=&quot;context-goals&quot;&gt;Context &amp;amp; Goals&lt;&#x2F;h2&gt;
&lt;p&gt;We currently distribute most of our Python applications to production as Debian packages. This is native to our distribution of choice (Debian), and offers clear advantages such as dependency resolution between packages and integration with &lt;code&gt;systemd&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;We previously used a hosted service called Aptnik to distribute private packages. This worked well for some years; unfortunately it was discontinued earlier this year, leading us to evaluate alternatives.&lt;&#x2F;p&gt;
&lt;p&gt;Our application delivery process is rather simple &lt;sup class=&quot;footnote-reference&quot;&gt;&lt;a href=&quot;#1&quot;&gt;1&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;On successful CI build, we upload the package(s) to the apt repository.&lt;&#x2F;li&gt;
&lt;li&gt;We release by ssh-ing into production machines and installing the latest version.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;As such the requirements were mainly:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Provide a simple &amp;amp; safe mechanism for uploading new package versions.&lt;&#x2F;li&gt;
&lt;li&gt;Securely &amp;amp; reliably serve the packages to our infrastructure (bare metal and cloud VMs).&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Our first approach was to look at other fully managed products such as &lt;a href=&quot;https:&#x2F;&#x2F;packagecloud.io&#x2F;&quot;&gt;PackageCloud&lt;&#x2F;a&gt; or &lt;a href=&quot;https:&#x2F;&#x2F;cloudsmith.io&#x2F;&quot;&gt;Cloudsmith&lt;&#x2F;a&gt; to minimise the required engineering effort and operational overhead. 
Our trial of PackageCloud indicated that their upload API was eventually consistent (leading to unreliable releases) and that this would be too costly &lt;sup class=&quot;footnote-reference&quot;&gt;&lt;a href=&quot;#2&quot;&gt;2&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; for our use case. We decided to look at hosting our packages on our own infrastructure and settled on using &lt;a href=&quot;https:&#x2F;&#x2F;www.aptly.info&#x2F;&quot;&gt;Aptly&lt;&#x2F;a&gt; (an open source application built to maintain local apt repositories) and serving packages through Amazon S3.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-setup&quot;&gt;The setup&lt;&#x2F;h2&gt;
&lt;p&gt;We&#x27;ll detail here how we are running the aptly service and how we interact with it from our machines.&lt;&#x2F;p&gt;
&lt;p&gt;You&#x27;ll need a valid &lt;a href=&quot;https:&#x2F;&#x2F;www.aptly.info&#x2F;doc&#x2F;configuration&#x2F;&quot;&gt;aptly configuration file&lt;&#x2F;a&gt; and a GPG key, as well as an &lt;a href=&quot;https:&#x2F;&#x2F;www.ansible.com&#x2F;&quot;&gt;Ansible&lt;&#x2F;a&gt;-controlled machine to run the aptly API and an &lt;a href=&quot;https:&#x2F;&#x2F;aws.amazon.com&#x2F;s3&#x2F;&quot;&gt;S3 bucket&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
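&lt;p&gt;If you don&#x27;t already have a signing key, one way to produce the &lt;code&gt;key.pub&lt;&#x2F;code&gt; and &lt;code&gt;key.sec&lt;&#x2F;code&gt; files referenced by the role below is with GnuPG 1 (&lt;code&gt;${KEY_ID}&lt;&#x2F;code&gt; is a placeholder for your key&#x27;s id):&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;# Generate a signing key for the repository (interactive prompts).
gpg1 --gen-key

# Export the public and secret parts; these are the files the role copies
# to &#x2F;var&#x2F;lib&#x2F;aptly and imports into the aptly user&#x27;s keyring.
gpg1 --export --armor &amp;quot;${KEY_ID}&amp;quot; &amp;gt; key.pub
gpg1 --export-secret-keys --armor &amp;quot;${KEY_ID}&amp;quot; &amp;gt; key.sec
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;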
&lt;h3 id=&quot;running-aptly&quot;&gt;Running Aptly&lt;&#x2F;h3&gt;
&lt;p&gt;As for most of our infrastructure, everything is set up through Ansible and configured to run through &lt;code&gt;systemd&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;details class=&quot;expandable-code&quot;&gt;
&lt;summary&gt;Ansible role&lt;&#x2F;summary&gt;
&lt;pre&gt;&lt;code&gt;---
# Group and user to run under
- name: Ensure &amp;quot;aptly&amp;quot; group exists
  group:
    name: aptly
    state: present

- name: Add &amp;quot;aptly&amp;quot; user
  user:
    name: aptly
    group: aptly

- name: Import aptly repository key
  apt_key:
    id=ED75B5A4483DA07C
    keyserver=hkp:&#x2F;&#x2F;p80.pool.sks-keyservers.net:80
    state=present

- name: Add aptly repository
  apt_repository:
    repo=&amp;quot;deb http:&#x2F;&#x2F;repo.aptly.info&#x2F; squeeze main&amp;quot;

- name: Install required packages
  apt:
    pkg={{ item }}
    update_cache=yes
  with_items:
    # See: https:&#x2F;&#x2F;www.aptly.info&#x2F;doc&#x2F;feature&#x2F;pgp-providers&#x2F;
    - gnupg1
    - gpgv1
    - aptly

- name: &#x2F;var&#x2F;lib&#x2F;aptly
  file:
    path=&#x2F;var&#x2F;lib&#x2F;aptly
    state=directory
    group=aptly
    owner=aptly

- name: Copy public key
  copy:
    src: ..&#x2F;files&#x2F;key.pub
    dest: &#x2F;var&#x2F;lib&#x2F;aptly&#x2F;key.pub
    group: aptly
    owner: aptly

- name: Copy secret key
  copy:
    src: ..&#x2F;files&#x2F;key.sec
    dest: &#x2F;var&#x2F;lib&#x2F;aptly&#x2F;key.sec
    group: aptly
    owner: aptly

# This needs to use gpg1 so the correct keyring is used and aptly can pick up
# on the keys later on.
- name: Import public key to gpg
  command: gpg1 --import &#x2F;var&#x2F;lib&#x2F;aptly&#x2F;key.pub
  become: yes
  become_user: aptly

- name: Import secret key to gpg
  command: gpg1 --import &#x2F;var&#x2F;lib&#x2F;aptly&#x2F;key.sec
  become: yes
  become_user: aptly
  # Ignore &#x27;already in secret keyring&#x27; error
  ignore_errors: yes

- name: &#x2F;etc&#x2F;aptly.conf
  template:
    src=aptly.conf
    dest=&#x2F;etc&#x2F;aptly.conf
    mode=644
    group=aptly
    owner=aptly

- name: aptly.service
  template:
    src=aptly.service
    dest=&#x2F;etc&#x2F;systemd&#x2F;system&#x2F;aptly.service
    mode=644
  notify:
    restart aptly

- name: cleanup.sh
  template:
    src=cleanup.sh
    dest=&#x2F;var&#x2F;lib&#x2F;aptly&#x2F;cleanup.sh
    mode=755

- name: aptly-cleanup.service
  template:
    src={{ item }}
    dest=&#x2F;etc&#x2F;systemd&#x2F;system&#x2F;{{ item }}
    mode=644
  with_items:
    - aptly-cleanup.service
    - aptly-cleanup.timer

- name: certbot for ${YOUR_APTLY_DOMAIN} certificate
  include_role:
    name: geerlingguy-certbot
  vars:
    certbot_create_standalone_stop_services:
    - nginx
    certbot_auto_renew_options: --quiet --no-self-upgrade --pre-hook &amp;quot;systemctl stop nginx&amp;quot; --post-hook &amp;quot;systemctl start nginx&amp;quot;
    certbot_certs:
    - domains:
      - ${YOUR_APTLY_DOMAIN}

- name: &#x2F;etc&#x2F;nginx&#x2F;aptly.htpasswd
  copy:
    src=..&#x2F;files&#x2F;aptly.htpasswd
    dest=&#x2F;etc&#x2F;nginx&#x2F;aptly.htpasswd
    mode=644

- name: &#x2F;etc&#x2F;nginx&#x2F;sites-enabled&#x2F;aptly
  template:
    src=nginx.conf
    dest=&#x2F;etc&#x2F;nginx&#x2F;sites-enabled&#x2F;aptly
    mode=644
  notify:
    restart nginx

- name: running
  service:
    name=aptly
    state=started
    enabled=yes

- name: enable cleanup timer
  service:
    name: aptly-cleanup.timer
    state: started
    enabled: yes
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;&#x2F;details&gt;
&lt;details class=&quot;expandable-code&quot;&gt;
&lt;summary&gt;aptly.service unit file&lt;&#x2F;summary&gt;
&lt;pre&gt;&lt;code&gt;[Unit]
Description=Aptly API
ConditionPathExists=&#x2F;etc&#x2F;aptly.conf

[Service]
Type=simple
WorkingDirectory=&#x2F;var&#x2F;lib&#x2F;aptly
ExecStart=&#x2F;usr&#x2F;bin&#x2F;aptly api serve -listen &amp;quot;localhost:{{ aptly_api_port }}&amp;quot; -no-lock
Restart=always
SyslogIdentifier=aptly
User=aptly

[Install]
WantedBy=multi-user.target
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;&#x2F;details&gt;
&lt;p&gt;All the steps should be fairly self explanatory and reproducible outside of Ansible; but we&#x27;ll expand on the details of the more custom parts of the setup.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The aptly API is exposed behind an nginx proxy and secured through HTTPS (using Let&#x27;s Encrypt) and basic authentication. This provides some level of security when accessing it from our CI provider which is where packages are uploaded from:&lt;&#x2F;p&gt;
&lt;details class=&quot;expandable-code&quot;&gt;
&lt;summary&gt;&#x2F;etc&#x2F;nginx&#x2F;sites-enabled&#x2F;aptly&lt;&#x2F;summary&gt;
&lt;pre&gt;&lt;code&gt;server {
  server_name     ${YOUR_APTLY_DOMAIN};

  listen          80;

  return          301 https:&#x2F;&#x2F;$server_name$request_uri;
}

server {
  listen          443 ssl;
  server_name     ${YOUR_APTLY_DOMAIN};

  # HTTPS certificates
  ssl_certificate     &#x2F;etc&#x2F;letsencrypt&#x2F;live&#x2F;${YOUR_APTLY_DOMAIN}&#x2F;fullchain.pem;
  ssl_certificate_key &#x2F;etc&#x2F;letsencrypt&#x2F;live&#x2F;${YOUR_APTLY_DOMAIN}&#x2F;privkey.pem;

  # We upload debs through this server hence the large size limit.
  client_max_body_size      500M;

  # Expose public key for clients
  location &#x2F;gpgkey {
    alias &#x2F;var&#x2F;lib&#x2F;aptly&#x2F;key.pub;
  }

  location &#x2F; {

    auth_basic              &amp;quot;Restricted&amp;quot;;
    auth_basic_user_file    &#x2F;etc&#x2F;nginx&#x2F;aptly.htpasswd;

    proxy_redirect          off;

    proxy_set_header        Host $host;
    proxy_set_header        X-Real-IP $remote_addr;
    proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header        X-Forwarded-Proto $scheme;

    proxy_pass              http:&#x2F;&#x2F;localhost:{{ aptly_api_port }}&#x2F;;
    proxy_read_timeout      300;

    proxy_redirect          default;
  }
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;&#x2F;details&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;
&lt;p&gt;As we release quite often, we set up the following script to routinely drop old versions of our packages and keep storage needs under control. This gives us enough versions to promote older packages for short-term rollbacks. For longer-term rollbacks we can rebuild from scratch or recover older packages from backups.&lt;&#x2F;p&gt;
&lt;details class=&quot;expandable-code&quot;&gt;
&lt;summary&gt;cleanup.sh&lt;&#x2F;summary&gt;
&lt;pre&gt;&lt;code&gt;#!&#x2F;usr&#x2F;bin&#x2F;env bash

# Cleanup task to ensure the repository does not grow too much.
# This uses the CLI and not the API and is meant to run on the machine
# hosting the aptly db.

set -eu -o pipefail

REPO=&amp;quot;${APTLY_REPO_NAME}&amp;quot;
MAX_VERSIONS=20
ENDPOINT=&amp;quot;s3:${S3_BUCKET}:${PATH_PREFIX}&#x2F;&amp;quot;

deleted=false

# Extract unique package ids currently known in the repo, these include the
# version number and architecture hence the sed + filter to list all unique
# packages by name.
packages=$(aptly repo search ${REPO} | sed -E &#x27;s&#x2F;_[0-9]+_all&#x2F;&#x2F;&#x27; | uniq)

for package in ${packages}; do
  echo &amp;quot;Processing ${package}&amp;quot;

  versions=$(aptly repo search ${REPO} &amp;quot;${package}&amp;quot; | sed -E &#x27;s&#x2F;[^0-9]&#x2F;&#x2F;g&#x27;)
  version_count=$(echo &amp;quot;${versions}&amp;quot; | wc -w)

  echo &amp;quot;- ${version_count} versions found&amp;quot;

  if [ &amp;quot;$version_count&amp;quot; -le &amp;quot;$MAX_VERSIONS&amp;quot; ]; then
    echo &amp;quot;- Not cleaning up ${package}&amp;quot;
  else
    echo &amp;quot;- Cleaning up ${package}&amp;quot;

    # There must be a better way to do this...
    highmark=$(for x in $versions; do echo &amp;quot;$x&amp;quot;; done | sort -V -r | tail -n +&amp;quot;${MAX_VERSIONS}&amp;quot; | head -1)

    # See https:&#x2F;&#x2F;www.aptly.info&#x2F;doc&#x2F;feature&#x2F;query&#x2F; for details on how the query works.
    aptly repo remove ${REPO} &amp;quot;${package} (&amp;lt;&amp;lt; ${highmark})&amp;quot;

    deleted=true
  fi
done

if [ &amp;quot;$deleted&amp;quot; = true ] ; then
  # Remove dangling references
  aptly db cleanup
  # Assuming the repo had been published already, this will just update the remote
  aptly publish update any &amp;quot;${ENDPOINT}&amp;quot;
fi
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;&#x2F;details&gt;
&lt;details class=&quot;expandable-code&quot;&gt;
&lt;summary&gt;aptly-cleanup.service&lt;&#x2F;summary&gt;
&lt;pre&gt;&lt;code&gt;[Unit]
Description=Cleanup old versions from Aptly repo
ConditionPathExists=&#x2F;var&#x2F;lib&#x2F;aptly&#x2F;cleanup.sh

[Service]
Type=oneshot
WorkingDirectory=&#x2F;var&#x2F;lib&#x2F;aptly
ExecStart=&#x2F;var&#x2F;lib&#x2F;aptly&#x2F;cleanup.sh
SyslogIdentifier=aptly
User=aptly

[Install]
WantedBy=multi-user.target
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;&#x2F;details&gt;
&lt;details class=&quot;expandable-code&quot;&gt;
&lt;summary&gt;aptly-cleanup.timer&lt;&#x2F;summary&gt;
&lt;pre&gt;&lt;code&gt;[Unit]
Description=Cleanup old versions from Aptly repo

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;&#x2F;details&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;
&lt;p&gt;Omitted from the Ansible role are monitoring &amp;amp; backup configurations used for added reliability.&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;uploading-packages&quot;&gt;Uploading packages&lt;&#x2F;h3&gt;
&lt;p&gt;Now that we have a running aptly API, we can start uploading packages. We use the following script after building the debs and storing all the packages in a single directory.&lt;&#x2F;p&gt;
&lt;details class=&quot;expandable-code&quot;&gt;
&lt;summary&gt;upload-debs.sh&lt;&#x2F;summary&gt;
&lt;pre&gt;&lt;code&gt;#!&#x2F;usr&#x2F;bin&#x2F;env bash

set -xeu

APTLY_URL=&amp;quot;https:&#x2F;&#x2F;${BASIC_AUTH}@${YOUR_APTLY_DOMAIN}&amp;quot;
APTLY_STAGE_DIRECTORY=&amp;quot;${STAGE_DIRECTORY}&amp;quot;
APTLY_CURL_FLAGS=&amp;quot;--include --fail&amp;quot;

for f in *deb; do
    # Upload the file to staging area of aptly. This does not publish the package.
    curl ${APTLY_CURL_FLAGS} --form &amp;quot;file=@${f}&amp;quot; --request POST &amp;quot;${APTLY_URL}&#x2F;api&#x2F;files&#x2F;${APTLY_STAGE_DIRECTORY}&amp;quot;
done

# Tell aptly to include all the staged files from the stage directory
# into the repo. This will include any file put there so make sure no
# other process stages files there to avoid conflicts with other CI processes.
curl ${APTLY_CURL_FLAGS} --request POST &amp;quot;${APTLY_URL}&#x2F;api&#x2F;repos&#x2F;${APTLY_REPO}&#x2F;file&#x2F;${APTLY_STAGE_DIRECTORY}&amp;quot;

# Tell aptly to publish the repo to S3. This will lock so only one CI process owns the operation.
curl ${APTLY_CURL_FLAGS} --request PUT &amp;quot;${APTLY_URL}&#x2F;api&#x2F;publish&#x2F;s3:${S3_BUCKET}:${PATH_PREFIX}&#x2F;&amp;quot; \
     --header &#x27;Content-Type: application&#x2F;json&#x27; \
     --data &#x27;{}&#x27;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;&#x2F;details&gt;
&lt;p&gt;The only tricky part is using a separate stage directory for each set of related packages. This is necessary to avoid race conditions when multiple CI processes upload simultaneously.&lt;&#x2F;p&gt;
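&lt;p&gt;A cheap way to get such a directory is to derive it from identifiers unique to the CI run, for example (the exact variables depend on your CI provider and are placeholders here):&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;# Build a stage directory name that is unique to this CI run so that
# concurrent builds never include each other&#x27;s packages. CI_BUILD_ID and
# GIT_SHA stand in for whatever your CI provider exposes.
STAGE_DIRECTORY=&amp;quot;ci-${CI_BUILD_ID}-${GIT_SHA}&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;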
&lt;p&gt;This is more than one step, but it is reliable and ensures that releasing dependent packages is done in a single, consistent operation: once the above script exits successfully, we know that all the new versions will be available to our machines.&lt;&#x2F;p&gt;
&lt;p&gt;This was important as some hosted solutions would not provide a consistent upload endpoint and instead released on a timer, leading to inconsistent and delayed releases. In practice this showed up as not all of the uploaded packages being available when we updated running machines, which led to incompatible versions and missed releases.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;downloading-packages&quot;&gt;Downloading packages&lt;&#x2F;h3&gt;
&lt;p&gt;Aptly itself is used to manage the repository (hierarchy, manifests, signing, etc.) and doesn&#x27;t serve packages by default. We&#x27;ve chosen to expose our repository through S3 and download packages directly from there to benefit from AWS access control and reliability. On machines, we use &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;MayaraCloud&#x2F;apt-transport-s3&quot;&gt;apt-transport-s3&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;The only thing required to get this to work was adding the following tasks to our production machines&#x27; Ansible role:&lt;&#x2F;p&gt;
&lt;details class=&quot;expandable-code&quot;&gt;
&lt;summary&gt;Package downloader Ansible role&lt;&#x2F;summary&gt;
&lt;pre&gt;&lt;code&gt;- name: Thread Aptly GPG key
  get_url:
    url: https:&#x2F;&#x2F;${YOUR_APTLY_DOMAIN}&#x2F;gpgkey
    dest: &#x2F;etc&#x2F;apt&#x2F;trusted.gpg.d&#x2F;thread.aptly.gpg.asc
    mode: &#x27;0644&#x27;

- name: apt-transport-s3
  apt:
    pkg={{ item }}
    update_cache=yes
    default_release={{ debian_release }}
    cache_valid_time=43200
  with_items:
    - apt-transport-s3

- name: sources.list
  template:
    src=sources.list
    dest=&#x2F;etc&#x2F;apt&#x2F;sources.list
    mode=0644
  notify:
    update APT cache

- name: &#x2F;etc&#x2F;apt&#x2F;s3auth.conf
  template:
    src=s3auth.conf
    dest=&#x2F;etc&#x2F;apt&#x2F;s3auth.conf
    mode=0644
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;&#x2F;details&gt;
&lt;p&gt;Where &lt;code&gt;s3auth.conf&lt;&#x2F;code&gt; contains your S3 credentials and &lt;code&gt;sources.list&lt;&#x2F;code&gt; contains the following line: &lt;code&gt;deb s3:&#x2F;&#x2F;${BUCKET_NAME}.s3.amazonaws.com&#x2F;${PATH_PREFIX}&#x2F; any main&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
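&lt;p&gt;With the key, the transport and the sources entry in place, a quick way to check that a machine can see the repository is to refresh the apt cache and query one of your packages (the package name is a placeholder):&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;# Refresh the package index from the S3-backed repository and check which
# versions of a package are now visible to apt.
sudo apt-get update
apt-cache policy your-package-name
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;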
&lt;h2 id=&quot;final-cost-estimate&quot;&gt;Final cost estimate&lt;&#x2F;h2&gt;
&lt;p&gt;As mentioned above, one of the main reasons to go with a self managed solution was to keep costs under control. So how are we doing now?&lt;&#x2F;p&gt;
&lt;p&gt;Our current costs can be broken into:&lt;&#x2F;p&gt;
&lt;p&gt;Hosting cost:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;We are hosting aptly on one of our existing VMs alongside other workloads which doesn&#x27;t incur any extra cost for us. &lt;&#x2F;li&gt;
&lt;li&gt;Looking at resource usage, we could easily run this on a &lt;code&gt;t3.small&lt;&#x2F;code&gt; EC2 instance for less than $17 a month.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Storage cost:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Our current aptly directory is around 15GB.&lt;&#x2F;li&gt;
&lt;li&gt;Our VM similarly doesn&#x27;t incur extra cost as we had enough free disk space to spare.&lt;&#x2F;li&gt;
&lt;li&gt;This comes out at less than $1 in monthly S3 cost.&lt;&#x2F;li&gt;
&lt;li&gt;If we were to host aptly on an EC2 instance, this would also come out at less than $1 per month for the required EBS volume.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Bandwidth cost:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;On average, we incur $4.5 per day for egress bandwidth from our bucket.&lt;&#x2F;li&gt;
&lt;li&gt;This comes out as ~$140 per month which is around 1.5TB of egress. &lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;This cost is entirely incurred due to downloading packages onto our bare metal machines, which are not running in AWS. Our EC2 instances are within the same region as the S3 bucket, incurring no bandwidth cost. &lt;&#x2F;p&gt;
&lt;p&gt;If we wanted to optimise our costs further, we could explore publishing the aptly repository to locations closer to our bare metal machines &lt;sup class=&quot;footnote-reference&quot;&gt;&lt;a href=&quot;#3&quot;&gt;3&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;. That being said, while hosting on S3 does incur some cost, there are other non-trivial benefits. The main one for us is reliability: our application delivery is split from our main infrastructure and should not be a blocker in a recovery scenario.&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;p&gt;This setup has required little to no maintenance past the initial implementation and has proven easy to use for the engineering team. It provides significantly cheaper bandwidth than what similar plans on hosted services would offer, as well as stronger consistency guarantees. We&#x27;ve now been running it for almost 5 months and overall this has been well worth the extra effort.&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;div class=&quot;footnote-definition&quot; id=&quot;1&quot;&gt;&lt;sup class=&quot;footnote-definition-label&quot;&gt;1&lt;&#x2F;sup&gt;
&lt;p&gt;There are other considerations which add complexity when releasing software, but these are fairly orthogonal to the problem at hand.&lt;&#x2F;p&gt;
&lt;&#x2F;div&gt;
&lt;div class=&quot;footnote-definition&quot; id=&quot;2&quot;&gt;&lt;sup class=&quot;footnote-definition-label&quot;&gt;2&lt;&#x2F;sup&gt;
&lt;p&gt;The main cost driver being bandwidth charges: our software packages can get quite large — in the order of 200-300MB — and with our current release frequency — around 10-15&#x2F;day — this would have forced us into tiers above $700 per month (the highest advertised tier for PackageCloud).&lt;&#x2F;p&gt;
&lt;&#x2F;div&gt;
&lt;div class=&quot;footnote-definition&quot; id=&quot;3&quot;&gt;&lt;sup class=&quot;footnote-definition-label&quot;&gt;3&lt;&#x2F;sup&gt;
&lt;p&gt;Aptly supports &lt;a href=&quot;https:&#x2F;&#x2F;www.aptly.info&#x2F;doc&#x2F;feature&#x2F;filesystem&#x2F;&quot;&gt;publishing to the filesystem&lt;&#x2F;a&gt; at which point we can serve the repository through nginx alongside the API.&lt;&#x2F;p&gt;
&lt;&#x2F;div&gt;
</content>
	</entry>
	<entry xml:lang="en">
		<title>Concurrency limits and kernel settings when running NGINX &amp; Gunicorn</title>
		<published>2017-12-23T00:00:00+00:00</published>
		<updated>2017-12-23T00:00:00+00:00</updated>
		<link href="https://lirsac.me/blog/concurrency-limits-and-kernel-settings-when-running-nginx-gunicorn/" type="text/html"/>
		<id>https://lirsac.me/blog/concurrency-limits-and-kernel-settings-when-running-nginx-gunicorn/</id>
		<content type="html">&lt;p&gt;A few weeks ago the team I work on at &lt;a href=&quot;https:&#x2F;&#x2F;tech.stylight.com&#x2F;&quot;&gt;Stylight&lt;&#x2F;a&gt; encountered an unexpected concurrency issue with one of our services. While this specific issue turned out to be simple, we didn&#x27;t find much information putting it all together online and thought our experience would be worth sharing.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;&lt;h2 id=&quot;the-problem&quot;&gt;The problem&lt;&#x2F;h2&gt;
&lt;p&gt;After going to production and coming under increased load, one of our web services used for financial reporting started dropping requests with &lt;code&gt;502 (Bad Gateway)&lt;&#x2F;code&gt; response codes alongside the following error message from NGINX:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;connect() to unix:&#x2F;run&#x2F;gunicorn.socket failed (11: Resource temporarily unavailable)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A quick 10s load test performed with &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;tsenart&#x2F;vegeta&quot;&gt;vegeta&lt;&#x2F;a&gt; confirmed that the problem started appearing around 20 req&#x2F;s, while both the NGINX and Gunicorn configurations were set up to handle much more than that and did so when run locally against the production database.&lt;&#x2F;p&gt;
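&lt;p&gt;For reference, a load test along these lines is a one-liner with vegeta (the endpoint and rate below are made up for illustration):&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;# Fire 50 req&#x2F;s at the service for 10s and print latency and status code
# statistics; once the socket backlog fills up, 502s show up in the report.
echo &amp;quot;GET http:&#x2F;&#x2F;localhost&#x2F;report&#x2F;&amp;quot; \
  | vegeta attack -rate=50 -duration=10s \
  | vegeta report
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;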
&lt;p&gt;Running the service in Docker locally, however, exhibited the same problem as production. After some head scratching, we were tipped off to the real problem by &lt;a href=&quot;http:&#x2F;&#x2F;docs.gunicorn.org&#x2F;en&#x2F;stable&#x2F;faq.html?highlight=somaxconn#how-can-i-increase-the-maximum-socket-backlog&quot;&gt;this paragraph&lt;&#x2F;a&gt; from Gunicorn&#x27;s documentation:&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How can I increase the maximum socket backlog?&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
Listening sockets have an associated queue of incoming connections that are waiting to be accepted. If you happen to have a stampede of clients that fill up this queue new connections will eventually start getting dropped.&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;Turns out the issue is quite simple: when using NGINX as a reverse proxy through a unix socket, the unix socket connection queue size (controlled by the &lt;code&gt;net.core.somaxconn&lt;&#x2F;code&gt; kernel setting on Linux machines) is the bottleneck regardless of NGINX&#x27;s and &#x2F; or the upstream&#x27;s configured capacity (in our case Gunicorn&#x27;s backlog queue size). In practice, NGINX hands requests over to the socket, and when the socket&#x27;s queue is full, new connections are refused, leading NGINX to drop subsequent requests with the status code &lt;code&gt;502 (Bad Gateway)&lt;&#x2F;code&gt;. The number of workers (at either the NGINX or Gunicorn level) doesn&#x27;t help as everything goes through the same socket.&lt;&#x2F;p&gt;
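&lt;p&gt;Both limits are easy to inspect on a running host (the socket path below is the one from the error message above):&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;# Kernel-level cap on the backlog of any listening socket.
sysctl net.core.somaxconn

# Listening unix sockets; the queue columns give an idea of how full the
# accept queue is relative to its configured size.
ss -xln | grep gunicorn.socket
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;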
&lt;p&gt;You can find code and instructions to reproduce the problem in a minimal way in &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;lirsacc&#x2F;nginx-gunicorn-somaxconn-reproduction&quot;&gt;this Github repository&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-solution&quot;&gt;The solution&lt;&#x2F;h2&gt;
&lt;p&gt;As we run our services in Docker on AWS&#x27;s infrastructure, we needed to figure out where the &lt;code&gt;net.core.somaxconn&lt;&#x2F;code&gt; setting was being limited. Turns out it is set to 128 by default in both docker containers and on Amazon&#x27;s default Ubuntu AMIs. It can be tweaked with the following commands:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;For Linux machines, run &lt;code&gt;sysctl -w net.core.somaxconn=...&lt;&#x2F;code&gt; (may require root access).&lt;&#x2F;li&gt;
&lt;li&gt;When running inside docker containers, you need to call &lt;code&gt;docker run&lt;&#x2F;code&gt; with the &lt;code&gt;--sysctl net.core.somaxconn=...&lt;&#x2F;code&gt; flag set correctly. Refer to the &lt;a href=&quot;https:&#x2F;&#x2F;docs.docker.com&#x2F;engine&#x2F;reference&#x2F;commandline&#x2F;run&#x2F;#configure-namespaced-kernel-parameters-sysctls-at-runtime&quot;&gt;relevant docker documentation&lt;&#x2F;a&gt; for more information.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
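&lt;p&gt;Both commands only apply until the next reboot or container restart. To make the change stick on a Linux host, and keeping in mind that the effective queue size is the smaller of &lt;code&gt;somaxconn&lt;&#x2F;code&gt; and the backlog requested by the application (Gunicorn defaults to 2048 and accepts an explicit &lt;code&gt;--backlog&lt;&#x2F;code&gt;), something along these lines works (values are illustrative):&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;# Persist the kernel setting across reboots.
echo &#x27;net.core.somaxconn = 1024&#x27; | sudo tee &#x2F;etc&#x2F;sysctl.d&#x2F;99-somaxconn.conf
sudo sysctl --system

# Ask Gunicorn for a matching backlog on its listening socket
# (myapp.wsgi is a placeholder for your WSGI module).
gunicorn --bind unix:&#x2F;run&#x2F;gunicorn.socket --backlog 1024 myapp.wsgi
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;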
&lt;p&gt;&lt;strong&gt;Disclaimer:&lt;&#x2F;strong&gt; The default setting of 128 queued connections per socket should work for most applications with fast transactions; the problem only affects very high concurrency servers and &#x2F; or applications which expect to wait on blocking I&#x2F;O and queue up a lot of requests. This was the case for us with reporting queries expected to potentially run for minutes, in which case being able to queue and delay some users was preferable to dropping their requests. A higher setting should not be a problem either; however, increasing the queue size could hide some downstream latency issues, and failing early may be better. As always, consider your specific use-case and whether this is an unavoidable problem to be solved or a symptom of a deeper issue (design, architecture, etc.).&lt;&#x2F;p&gt;
&lt;p&gt;In the end, while the issue turned out to be pretty simple and straightforward once we knew where to look, it served as a good reminder that, even when running on cloud infrastructure, understanding the underlying tech is as important as ever.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;further-reading&quot;&gt;Further reading&lt;&#x2F;h2&gt;
&lt;p&gt;Here are some interesting links to dive more into related topics:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Background on TCP socket: &lt;a href=&quot;http:&#x2F;&#x2F;veithen.github.io&#x2F;2014&#x2F;01&#x2F;01&#x2F;how-tcp-backlog-works-in-linux.html&quot;&gt;http:&#x2F;&#x2F;veithen.github.io&#x2F;2014&#x2F;01&#x2F;01&#x2F;how-tcp-backlog-works-in-linux.html&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Netflix write-ups on configuring Linux servers on AWS:
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.slideshare.net&#x2F;AmazonWebServices&#x2F;pfc306-performance-tuning-amazon-ec2-instances-aws-reinvent-2014&quot;&gt;https:&#x2F;&#x2F;www.slideshare.net&#x2F;AmazonWebServices&#x2F;pfc306-performance-tuning-amazon-ec2-instances-aws-reinvent-2014&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;http:&#x2F;&#x2F;www.brendangregg.com&#x2F;blog&#x2F;2015-03-03&#x2F;performance-tuning-linux-instances-on-ec2.html&quot;&gt;http:&#x2F;&#x2F;www.brendangregg.com&#x2F;blog&#x2F;2015-03-03&#x2F;performance-tuning-linux-instances-on-ec2.html&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;Some more write-ups on NGINX specific configuration:
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.linode.com&#x2F;docs&#x2F;web-servers&#x2F;nginx&#x2F;configure-nginx-for-optimized-performance&#x2F;&quot;&gt;https:&#x2F;&#x2F;www.linode.com&#x2F;docs&#x2F;web-servers&#x2F;nginx&#x2F;configure-nginx-for-optimized-performance&#x2F;&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;http:&#x2F;&#x2F;engineering.chartbeat.com&#x2F;2014&#x2F;01&#x2F;02&#x2F;part-1-lessons-learned-tuning-tcp-and-nginx-in-ec2&#x2F;&quot;&gt;http:&#x2F;&#x2F;engineering.chartbeat.com&#x2F;2014&#x2F;01&#x2F;02&#x2F;part-1-lessons-learned-tuning-tcp-and-nginx-in-ec2&#x2F;&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
	</entry>
	<entry xml:lang="en">
		<title>Using locally installed command line tools from NPM</title>
		<published>2015-03-31T00:00:00+00:00</published>
		<updated>2015-03-31T00:00:00+00:00</updated>
		<link href="https://lirsac.me/blog/using-local-npm-cli-tools/" type="text/html"/>
		<id>https://lirsac.me/blog/using-local-npm-cli-tools/</id>
		<content type="html">&lt;aside&gt;
&lt;h6 id=&quot;updated-august-18-2017&quot;&gt;Updated August 18, 2017&lt;&#x2F;h6&gt;
&lt;p&gt;As of &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;npm&#x2F;npm&#x2F;releases&#x2F;tag&#x2F;v5.2.0&quot;&gt;npm@5.2.0&lt;&#x2F;a&gt;, &lt;code&gt;npx&lt;&#x2F;code&gt; is installed by default and does essentially what this post describes. 
Yarn also has the &lt;a href=&quot;https:&#x2F;&#x2F;yarnpkg.com&#x2F;en&#x2F;docs&#x2F;cli&#x2F;run&quot;&gt;&lt;code&gt;run&lt;&#x2F;code&gt;&lt;&#x2F;a&gt; command which can serve the same purposes.&lt;&#x2F;p&gt;
&lt;&#x2F;aside&gt;
&lt;aside&gt;
&lt;h6 id=&quot;updated-may-10-2015&quot;&gt;Updated May 10, 2015&lt;&#x2F;h6&gt;
&lt;p&gt;Thanks to &lt;a href=&quot;http:&#x2F;&#x2F;stackoverflow.com&#x2F;a&#x2F;14524311&quot;&gt;this SO answer&lt;&#x2F;a&gt;, the completion is now improved and gets back to normal filenames after completing the command name. (Before it would just propose command names over and over again).&lt;&#x2F;p&gt;
&lt;&#x2F;aside&gt;
&lt;p&gt;I recently had the need to regularly run cli apps that were installed locally through npm. This didn&#x27;t use to happen very often and I would install cli tools globally, but recently I&#x27;ve had to work on a few projects using different versions of node, which would require reinstalling the packages for every version; not very practical.&lt;&#x2F;p&gt;
&lt;p&gt;Before, we would just run &lt;code&gt;.&#x2F;node_modules&#x2F;.bin&#x2F;package&lt;&#x2F;code&gt; (or &lt;code&gt;$(npm bin)&#x2F;package&lt;&#x2F;code&gt;) in the project directory. It does the job but is a bit tedious to type. Another solution, as proposed on &lt;a href=&quot;http:&#x2F;&#x2F;stackoverflow.com&#x2F;a&#x2F;15157360&quot;&gt;Stack Overflow&lt;&#x2F;a&gt;, is to prepend the &lt;code&gt;.&#x2F;node_modules&#x2F;.bin&lt;&#x2F;code&gt; location to your &lt;code&gt;PATH&lt;&#x2F;code&gt;, either on the fly as proposed or when you start working in the directory, like with Python virtualenv or Ruby gemsets. Incidentally, npm &lt;a href=&quot;http:&#x2F;&#x2F;blog.nodejs.org&#x2F;2011&#x2F;03&#x2F;23&#x2F;npm-1-0-global-vs-local-installation&quot;&gt;does this under the hood&lt;&#x2F;a&gt; when running the scripts in your &lt;code&gt;package.json&lt;&#x2F;code&gt;, so you can use your local packages in these, but prepending it yourself pollutes the session&#x27;s &lt;code&gt;PATH&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
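&lt;p&gt;For a one-off invocation, the on-the-fly variant of the &lt;code&gt;PATH&lt;&#x2F;code&gt; trick looks something like this (&lt;code&gt;gulp&lt;&#x2F;code&gt; is just an example of a locally installed tool):&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;# Run a locally installed CLI tool without touching the global install or
# the session&#x27;s PATH; $(npm bin) prints the project&#x27;s .&#x2F;node_modules&#x2F;.bin path.
PATH=&amp;quot;$(npm bin):$PATH&amp;quot; gulp build
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;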
&lt;p&gt;As an alternative which does not pollute the &lt;code&gt;PATH&lt;&#x2F;code&gt; and is quicker to type, we wrote the following &lt;code&gt;npl&lt;&#x2F;code&gt; bash function:&lt;&#x2F;p&gt;
&lt;div &gt;
    &lt;script src=&quot;https:&amp;#x2F;&amp;#x2F;gist.github.com&amp;#x2F;lirsacc&amp;#x2F;d189b11194f397ab794a.js&quot;&gt;&lt;&#x2F;script&gt;
&lt;&#x2F;div&gt;
&lt;p&gt;Add this to your &lt;code&gt;~&#x2F;.profile&lt;&#x2F;code&gt; or any file sourced in your shell and you can just use &lt;code&gt;npl package args...&lt;&#x2F;code&gt; from the project directory and it will pass &lt;code&gt;args...&lt;&#x2F;code&gt; to your cli tool (or fail with exit code 1 and warn you if you&#x27;re not in an npm project). Alias it as you want, &lt;code&gt;npl&lt;&#x2F;code&gt; is just very easy to type with my right hand on a French keyboard.&lt;&#x2F;p&gt;
&lt;p&gt;The first &lt;code&gt;_npl_completion&lt;&#x2F;code&gt; function is just here to add tab completion by simply listing the files in the &lt;code&gt;.bin&lt;&#x2F;code&gt; directory; the &lt;a href=&quot;http:&#x2F;&#x2F;tldp.org&#x2F;LDP&#x2F;abs&#x2F;html&#x2F;tabexpansion.html&quot;&gt;Advanced Bash Scripting Guide&lt;&#x2F;a&gt; is a good resource to get started on this.&lt;&#x2F;p&gt;
</content>
	</entry>
	<entry xml:lang="en">
		<title>Hosting a static site on Amazon S3 &amp; deploying it with Github &amp; Travis</title>
		<published>2014-05-10T00:00:00+00:00</published>
		<updated>2014-05-10T00:00:00+00:00</updated>
		<link href="https://lirsac.me/blog/hosting-a-static-site-on-amazon-s3-deploying-it-with-github-travis/" type="text/html"/>
		<id>https://lirsac.me/blog/hosting-a-static-site-on-amazon-s3-deploying-it-with-github-travis/</id>
		<content type="html">&lt;aside&gt;
&lt;h6 id=&quot;updated-25-may-2018&quot;&gt;Updated 25 May 2018&lt;&#x2F;h6&gt;
&lt;p&gt;This was written a while ago and is most likely not completely accurate. There are also many better options to do just this with less hassle. Notable changes since I wrote this:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;This site is now hosted on &lt;a href=&quot;https:&#x2F;&#x2F;www.netlify.com&#x2F;&quot;&gt;Netlify&lt;&#x2F;a&gt; which is super simple to setup at the cost of having all my eggs in one basket&lt;&#x2F;li&gt;
&lt;li&gt;Github recently &lt;a href=&quot;https:&#x2F;&#x2F;blog.github.com&#x2F;2018-05-01-github-pages-custom-domains-https&#x2F;&quot;&gt;announced&lt;&#x2F;a&gt; HTTPS support for Github Pages using custom domains.&lt;&#x2F;li&gt;
&lt;li&gt;This site is built with &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;Keats&#x2F;gutenberg&quot;&gt;Gutenberg&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;aside&gt;
&lt;p&gt;This site is built by &lt;a href=&quot;http:&#x2F;&#x2F;jekyllrb.com&#x2F;&quot;&gt;Jekyll&lt;&#x2F;a&gt;, hosted on &lt;a href=&quot;http:&#x2F;&#x2F;aws.amazon.com&#x2F;s3&#x2F;&quot; title=&quot;Amazon S3&quot;&gt;Amazon S3&lt;&#x2F;a&gt;, its source lives on &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;lirsacc&#x2F;lirsac.me&#x2F;&quot; title=&quot;Git repo of lirsac.me&quot;&gt;Github&lt;&#x2F;a&gt;, and it is deployed on push through &lt;a href=&quot;https:&#x2F;&#x2F;travis-ci.org&#x2F;lirsacc&#x2F;lirsac.me&quot;&gt;Travis CI&lt;&#x2F;a&gt; to emulate Github Pages&#x27; ease of use. In this post, I will try to explain all the steps required to set it all up.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;&lt;h2 id=&quot;amazon-setup&quot;&gt;Amazon setup&lt;&#x2F;h2&gt;
&lt;p&gt;I was pleasantly surprised to see that serving a static website out of an S3 bucket was extremely easy. The only annoying part was the payment verification, which takes some time and requires a working credit card; not really an issue for me, but it can be a dealbreaker and it definitely loses in usability compared to Github Pages.&lt;&#x2F;p&gt;
&lt;p&gt;If you don&#x27;t have an AWS account, go &lt;a href=&quot;http:&#x2F;&#x2F;aws.amazon.com&#x2F;&quot; title=&quot;AWS home page&quot;&gt;here&lt;&#x2F;a&gt; and click &lt;em&gt;Sign Up&lt;&#x2F;em&gt;; it will ask you for the usual (name, email, password) and, if I remember correctly, you then reach the billing step where you have to enter your credit card details and enter a code given to you through an automated phone call. The first year you have access to &lt;a href=&quot;http:&#x2F;&#x2F;aws.amazon.com&#x2F;free&#x2F;&quot;&gt;Amazon&#x27;s free tier&lt;&#x2F;a&gt;, which should be more than enough for most people.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;setting-up-the-buckets&quot;&gt;Setting up the buckets&lt;&#x2F;h3&gt;
&lt;p&gt;Once you have access to your &lt;a href=&quot;https:&#x2F;&#x2F;console.aws.amazon.com&#x2F;s3&#x2F;&quot;&gt;Amazon console&lt;&#x2F;a&gt;, create two buckets in the region of your choice, named after your domain (in my case &lt;em&gt;lirsac.me&lt;&#x2F;em&gt; for the zone apex domain and &lt;em&gt;www.lirsac.me&lt;&#x2F;em&gt; for the usual domain). This is just to make sure both domains are accessible, though you might not need it if you use a CDN service like Cloudfront or Cloudflare. Set up logging if you want, but if you don&#x27;t need access logs or already use some kind of web analytics, you can forget it and limit the S3 storage you will use. Once the bucket is created, select it and click on the &lt;em&gt;Properties&lt;&#x2F;em&gt; link in the top right corner; you can then set up static hosting like so:&lt;&#x2F;p&gt;
&lt;figure class=&quot;image &quot;&gt;
  &lt;a href=&quot;bucket-settings.png&quot;&gt;
    &lt;img src=&quot;bucket-settings.png&quot; alt=&quot;Bucket Settings&quot;&#x2F;&gt;
  &lt;&#x2F;a&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;You also need to make all the files in the bucket public. This can be done with a simple bucket policy. (In the bucket properties: &lt;em&gt;Permissions&lt;&#x2F;em&gt; &amp;gt; &lt;em&gt;Edit bucket policy&lt;&#x2F;em&gt;):&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;{
    &amp;quot;Version&amp;quot;: &amp;quot;2008-10-17&amp;quot;,
    &amp;quot;Statement&amp;quot;: [
        {
            &amp;quot;Sid&amp;quot;: &amp;quot;PublicReadForGetBucketObjects&amp;quot;,
            &amp;quot;Effect&amp;quot;: &amp;quot;Allow&amp;quot;,
            &amp;quot;Principal&amp;quot;: {
                &amp;quot;AWS&amp;quot;: &amp;quot;*&amp;quot;
            },
            &amp;quot;Action&amp;quot;: &amp;quot;s3:GetObject&amp;quot;,
            &amp;quot;Resource&amp;quot;: &amp;quot;arn:aws:s3:::lirsac.me&#x2F;*&amp;quot;
        }
    ]
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This allows GET requests for everything in the &lt;em&gt;lirsac.me&lt;&#x2F;em&gt; bucket. If you already have an index page, you should be able to access your bucket (apex or not) at the &lt;em&gt;Endpoint&lt;&#x2F;em&gt; url (&lt;del&gt;Try &lt;a href=&quot;http:&#x2F;&#x2F;lirsac.me.s3-website-eu-west-1.amazonaws.com&quot;&gt;this&lt;&#x2F;a&gt; for example&lt;&#x2F;del&gt;). As I don&#x27;t use any CDN, I set up the second non-apex bucket to redirect over to the first one:&lt;&#x2F;p&gt;
&lt;figure class=&quot;image small&quot;&gt;
  &lt;a href=&quot;bucket-redirect.png&quot;&gt;
    &lt;img src=&quot;bucket-redirect.png&quot; alt=&quot;Bucket redirection settings&quot;&#x2F;&gt;
  &lt;&#x2F;a&gt;
  &lt;div class=&quot;caption&quot;&gt;
    S3 bucket redirection settings
  &lt;&#x2F;div&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;That&#x27;s it for the S3 part, for more information the &lt;a href=&quot;http:&#x2F;&#x2F;docs.aws.amazon.com&#x2F;AmazonS3&#x2F;latest&#x2F;dev&#x2F;WebsiteHosting.html&quot;&gt;AWS documentation&lt;&#x2F;a&gt; is pretty extensive with examples for virtually anything. Next up is setting up your domain so that you can access your website with the correct urls.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;pointing-your-domain-to-the-bucket&quot;&gt;Pointing your domain to the bucket&lt;&#x2F;h3&gt;
&lt;p&gt;There are plenty of ways to point your domain to you S3 bucket, the simplest in my opinion was using Amazon&#x27;s own &lt;a href=&quot;http:&#x2F;&#x2F;aws.amazon.com&#x2F;route53&#x2F;&quot;&gt;Route 53&lt;&#x2F;a&gt;. To get started, create a hosted zone in the Route 53 console, it should appear like so:&lt;&#x2F;p&gt;
&lt;figure class=&quot;image &quot;&gt;
  &lt;a href=&quot;hosted-zone-overview.png&quot;&gt;
    &lt;img src=&quot;hosted-zone-overview.png&quot; &#x2F;&gt;
  &lt;&#x2F;a&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;Next, update your registrar with the DNS servers given in the &lt;em&gt;Delegation Set&lt;&#x2F;em&gt; section to allow Route 53 to manage your domain. To point your domain, just click &lt;em&gt;Create a Record Set&lt;&#x2F;em&gt;, select &lt;em&gt;A - IPv4&lt;&#x2F;em&gt;, then select the &lt;em&gt;Alias&lt;&#x2F;em&gt; checkbox and you should be prompted with the correct bucket address. In the end the config panel of the A record looks like this:&lt;&#x2F;p&gt;
&lt;figure class=&quot;image &quot;&gt;
  &lt;a href=&quot;a-record.png&quot;&gt;
    &lt;img src=&quot;a-record.png&quot; &#x2F;&gt;
  &lt;&#x2F;a&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;Do the same for both buckets and you&#x27;re all set. Just wait for the old records to expire and you should have access to your files on both the apex and non-apex domains.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;getting-deploy-on-push-to-work&quot;&gt;Getting deploy on push to work&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;s3-website&quot;&gt;s3_website&lt;&#x2F;h3&gt;
&lt;p&gt;To automate the upload to S3, there is a very useful command line tool called &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;laurilehmijoki&#x2F;s3_website&quot;&gt;s3_website&lt;&#x2F;a&gt; by Lauri Lehmijoki which is pretty simple to set up. It also automatically discovers the &lt;code&gt;_site&lt;&#x2F;code&gt; directory if you&#x27;re running a Jekyll site.
It can be installed as a Ruby gem, and config is stored in a &lt;code&gt;s3_website.yml&lt;&#x2F;code&gt; file (you can create a starter file by running &lt;code&gt;s3_website cfg create&lt;&#x2F;code&gt; in your project directory).&lt;&#x2F;p&gt;
&lt;p&gt;My config file is pretty minimal and looks like this at the moment:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;s3_id: &amp;lt;%= ENV[&#x27;S3_KEY&#x27;] %&amp;gt;
s3_secret: &amp;lt;%= ENV[&#x27;S3_SECRET&#x27;] %&amp;gt;
s3_bucket: lirsac.me
s3_endpoint: eu-west-1
max_age: 120
gzip: true
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;There are a lot more options available (Cloudfront cache invalidation, excluding and ignoring files, reduced redundancy... it&#x27;s pretty awesome if you want to automate S3 assets management), but those work for a simple blog. It just sets the credentials, bucket name and S3 region (see &lt;a href=&quot;http:&#x2F;&#x2F;docs.aws.amazon.com&#x2F;general&#x2F;latest&#x2F;gr&#x2F;rande.html#s3_region&quot; title=&quot;Available S3 Regions&quot;&gt;available values&lt;&#x2F;a&gt;), plus activates gzipping of all files. Note that the &lt;code&gt;s3_id&lt;&#x2F;code&gt; and &lt;code&gt;s3_secret&lt;&#x2F;code&gt; values are read from your environment variables, which will be useful to run this on Travis without exposing your credentials.&lt;&#x2F;p&gt;
&lt;p&gt;To push to S3, just run &lt;code&gt;s3_website push&lt;&#x2F;code&gt; or &lt;code&gt;s3_website push --headless&lt;&#x2F;code&gt; (delete files from S3 which are missing on the local site without prompt) from the project directory.&lt;&#x2F;p&gt;
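&lt;p&gt;Locally, the same push works as long as the two environment variables referenced in the config are set, which is handy for testing before wiring up CI (the values are placeholders):&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;# The config reads the credentials from the environment, so a manual deploy
# only needs these two variables set.
export S3_KEY=&amp;quot;your-access-key-id&amp;quot;
export S3_SECRET=&amp;quot;your-secret-access-key&amp;quot;
bundle exec s3_website push
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;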
&lt;h3 id=&quot;setting-up-travis&quot;&gt;Setting up Travis&lt;&#x2F;h3&gt;
&lt;p&gt;So we have now set up a static website on Amazon, reachable at your domain, to which you can push the Jekyll build updates via a simple command. What would be nice now is having all of this work as seamlessly as possible (like Github Pages for example) straight from your Github repo. For this I use Travis, a continuous integration system which builds from Github pushes. It is super simple to set up and is free for open source projects.&lt;&#x2F;p&gt;
&lt;h4 id=&quot;the-travis-yml-config-file&quot;&gt;The .travis.yml config file&lt;&#x2F;h4&gt;
&lt;p&gt;Once you log in with Github, you can check which repo you want to have Travis build. Once that is done, you need a &lt;code&gt;.travis.yml&lt;&#x2F;code&gt; config file in your project&#x27;s root. Mine looks like this:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;language: ruby
rvm:
  - 2.0.0
install:
  - npm install -g grunt-cli
  - npm install -g bower
  - bundle install
  - npm install
  - bower install
script:
  - grunt build
after_success: bundle exec s3_website push --headless
env:
  global:
  - secure: YzXcqU9FcOenY3BIOu1U1LjlWn5tLqpCIhoSA8GOOLPdD3m2bXAyGTau7ou49f&#x2F;CJ0PuVBNdQCs9pt405GzLhLcgnDy2YKGEB5h+slusX3u1k7SU3VPsDfJ2oWV8A&#x2F;cNiWELYf40hRiEduz8bEt2zI0ZkLRMsu+GM&#x2F;VU3WYHq4I=
  - secure: uNPUARiMIo&#x2F;Q4vkGSMcnRw0Si4qTkAIJAquvEu3cTtb6XHFGtxUVFThGtTFy0QNFylHnpMoWZ0aQi&#x2F;2l7iqU5l3CIzYm415S89Ga9T3&#x2F;5&#x2F;SOtVg+9EJhf58fYXb33nAjFTcx2kEUvXLz5eAKLD0ZngcKFTR&#x2F;SnoAe4y7pI6yxhQ=
branches:
  only:
    - master
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;First you define the project&#x27;s environment. As you can see, even when using Grunt, we define the project as a ruby project (apparently, &lt;a href=&quot;http:&#x2F;&#x2F;stackoverflow.com&#x2F;questions&#x2F;18456611&#x2F;is-it-possible-to-set-up-travis-to-run-tests-for-several-languages&quot;&gt;all environments have node&#x2F;npm installed&lt;&#x2F;a&gt;). Then you define all the steps your build will run (see &lt;a href=&quot;http:&#x2F;&#x2F;docs.travis-ci.com&#x2F;user&#x2F;build-lifecycle&#x2F;&quot;&gt;here&lt;&#x2F;a&gt; for more info on the different steps). &lt;code&gt;install&lt;&#x2F;code&gt; is pretty straightforward, we then have the &lt;code&gt;script: grunt build&lt;&#x2F;code&gt; step, and &lt;code&gt;after_success&lt;&#x2F;code&gt; is run when the script executed successfully. We could also define an &lt;code&gt;after_failure&lt;&#x2F;code&gt; step if you want more than the build failure email.&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;code&gt;env&lt;&#x2F;code&gt; section defines environment variables. The &lt;code&gt;secure&lt;&#x2F;code&gt; keyword means that it has been encrypted thanks to the travis command line tool. To encrypt the &lt;code&gt;S3_KEY&lt;&#x2F;code&gt; variable used in the s3_website config, just run the following commands (&lt;code&gt;--add&lt;&#x2F;code&gt; automatically adds it to the config file if it&#x27;s found):&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;gem install travis
travis encrypt S3_KEY=your_key --add
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The last section defines the branches on which you want to run the build, in my case I use branches for drafts and experiments, merging into master will then start a deploy (except if you append &lt;code&gt;[skip ci]&lt;&#x2F;code&gt; somewhere in your commit message).&lt;&#x2F;p&gt;
&lt;h4 id=&quot;aws-security-credentials&quot;&gt;AWS Security credentials&lt;&#x2F;h4&gt;
&lt;p&gt;AWS identity management makes it very easy to create multiple credentials on one account and give them limited rights, which is a good thing as no one should use root-level credentials on a public repo (even encrypted).
To create a contained identity which you can then give Travis or any external service, just go to the &lt;a href=&quot;https:&#x2F;&#x2F;console.aws.amazon.com&#x2F;iam&#x2F;home?#users&quot;&gt;Identity Management Console&lt;&#x2F;a&gt; and create a new user with the name of your choice. You are then prompted to download the key&#x2F;secret combination authenticating this user (those are the values you need to encrypt using the Travis CLI).&lt;&#x2F;p&gt;
&lt;figure class=&quot;image &quot;&gt;
  &lt;a href=&quot;aim-create.png&quot;&gt;
    &lt;img src=&quot;aim-create.png&quot; alt=&quot;Create an AIM user&quot;&#x2F;&gt;
  &lt;&#x2F;a&gt;
&lt;&#x2F;figure&gt;
&lt;figure class=&quot;image &quot;&gt;
  &lt;a href=&quot;aim-created.png&quot;&gt;
    &lt;img src=&quot;aim-created.png&quot; alt=&quot;Download their credentials file&quot;&#x2F;&gt;
  &lt;&#x2F;a&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;After that select the user in the top panel and go to the &lt;em&gt;Permissions&lt;&#x2F;em&gt; tab.&lt;&#x2F;p&gt;
&lt;figure class=&quot;image &quot;&gt;
  &lt;a href=&quot;aim-permissions.png&quot;&gt;
    &lt;img src=&quot;aim-permissions.png&quot; alt=&quot;Setup user&amp;#x27;s policy&quot;&#x2F;&gt;
  &lt;&#x2F;a&gt;
&lt;&#x2F;figure&gt;
&lt;p&gt;That&#x27;s where you will set up policies limiting the reach of this specific user. Click on &lt;em&gt;Attach Policy&lt;&#x2F;em&gt; and either use a predefined policy or adapt this one (it gives full S3 access to a specific bucket, &lt;em&gt;lirsac.me&lt;&#x2F;em&gt; here):&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;{
  &amp;quot;Version&amp;quot;: &amp;quot;2012-10-17&amp;quot;,
  &amp;quot;Statement&amp;quot;: [
    {
      &amp;quot;Effect&amp;quot;: &amp;quot;Allow&amp;quot;,
      &amp;quot;Action&amp;quot;: &amp;quot;s3:*&amp;quot;,
      &amp;quot;Resource&amp;quot;: [
        &amp;quot;arn:aws:s3:::lirsac.me&amp;quot;,
        &amp;quot;arn:aws:s3:::lirsac.me&#x2F;*&amp;quot;
      ]
    },
    {
      &amp;quot;Effect&amp;quot;: &amp;quot;Allow&amp;quot;,
      &amp;quot;Action&amp;quot;: &amp;quot;s3:ListAllMyBuckets&amp;quot;,
      &amp;quot;Resource&amp;quot;: &amp;quot;arn:aws:s3:::*&amp;quot;
    }
  ]
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h2 id=&quot;conclusions&quot;&gt;Conclusions&lt;&#x2F;h2&gt;
&lt;p&gt;In the end, the setup was pretty straightforward thanks to the respective documentation, albeit with a few mistakes on my part. But it still shouldn&#x27;t take more than an hour to set up, and it&#x27;s a one-time thing with very little added complexity. As for the comparison with Github Pages, I am not aware of any performance issues there, so if you run Jekyll the main advantage is the ability to run custom plugins. If you don&#x27;t run Jekyll however, any custom static site generator can work without playing with github branches, so the added control can be worth it if you have specific requirements.&lt;&#x2F;p&gt;
&lt;p&gt;Even if it started as an experiment, I have kept it in place as it is free for the moment. To take it further, I started experimenting a bit more as I think it would be interesting to handle versions, uploading a new site to the bucket with the commit sha for example, in order to make interactive mock-ups available to a client. I hope this guide is clear enough for anyone wanting to reproduce the setup.&lt;&#x2F;p&gt;
</content>
	</entry>
</feed>
