This post was originally published on July 17, 2017.
How we use Ansible extensively at ExpressVPN
Our development teams work independently; that is, each team owns its product for its full life cycle. This setup means our understanding of Ansible is pooled from many different teams across the company rather than held by a centralized group that manages Ansible.
A decentralized workforce gives our teams lots of flexibility and mobility but also puts pressure on individuals to know a lot about many tools.
To make it easier to share knowledge and use tools correctly, we’ve decided to standardize how we use Ansible for configuration management and server operations.
This post covers the lessons we’ve learned operating at our scale, reflections on the way we work, and how we manage Ansible in such a context.
Ansible documentation
Let’s get right into it! The documentation for Ansible leaves some things to be desired, especially when it comes to end-to-end documentation (like, how do you get from point A to point Z?).
Some questions we regularly encounter are: “How does variable precedence work?” and “How does Ansible Vault fit in?”
Each topic is documented very well on its own (in the Variables and Vault documentation, respectively), and the Ansible Variables page even has a very nice section devoted to precedence, but the intersection of the two gets only a brief mention. There are no links between the Variables documentation and the Vault documentation, which gives the impression that the onus is on the user to figure out how the two fit together.
So, today we’ll try to cover how Variables and Vaults intersect, along with some best practices.
What you can use Ansible Vault files for
In summary: The Vault documentation states that you can essentially encrypt anything within your Ansible folder into a Vault file, and Ansible will try to “cleverly” decrypt it whenever a play includes these files. Huh. Cool!
The documentation about Variables mentions nothing about Vault files at all, which is odd as Vault was designed for Variable files. So how do they fit together? It’s important to note that Vault files themselves have no special meaning for Variable processing or precedence, so there’s a lot of flexibility. But that flexibility potentially leaves you without enough information on how to use Vault properly.
Take this example of a simple Ansible folder:
.
├── group_vars
│   ├── all
│   ├── production
│   └── staging
├── ansible.cfg
├── inventory
└── playbook.yml
At first glance, this setup looks good; this would be a relatively common structure to produce if you were to read the documentation. An observer could potentially assume that the staging and production files in group_vars are Vaults, but that is not necessarily true, which in itself is a problem.
Now, the file “all” cannot be a Vault file, since you (hopefully) encrypted the staging and production Vault files with different passwords. But this layout also means that each environment’s group_vars file has to contain a mix of secrets and non-secrets, since you’re limited to one file per environment.
Because of this—and if you extrapolated a little after reading the intro to Vaults in the Ansible documentation—you probably created the production/staging vaults by copying the contents of “all” initially and then modifying them.
That means your “all” file might look like this:
database:
  username: default_user
  password: false
super_important_var_that_should_be_one: 1
And your production Vault file might look like this:
database:
  username: produser
  password: supersecretpasswordnoonecansee
super_important_var_that_should_be_one: 1
(Don’t worry, this isn’t our actual production password! We double-checked.)
The above is dangerous for reasons that may not be obvious. For example, you may miss changing a default for production, and/or your “all” file might even be named wrong and not included at all! (This is the root cause of the outage we had last week.)
Best practice: How to use Ansible Vault files safely
As the best practices page points out, turning a file into a Vault file obscures its contents, which comes with a big drawback: you cannot search for the Variables inside a Vault file without explicitly decrypting it. This means that anyone looking at your Ansible configuration has no idea what is inside these files unless they also know the Vault password (terrible for code reviews!). Hence, we recommend putting as few Variables as humanly possible inside Vault files. (In other words, only put secrets in the Vault files!)
So, let’s look at a structure that would make it easier not to shoot yourself in the foot:
.
├── group_vars
│   ├── all
│   │   └── vars.yml
│   ├── production
│   │   ├── vars.yml
│   │   └── vault.yml
│   └── staging
│       └── vault.yml
├── ansible.cfg
├── inventory
└── playbook.yml
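For this layout to take effect, the directory names under group_vars have to match group names in the inventory; Ansible then loads every file inside a group's directory for the hosts in that group. The original inventory file isn't shown, so the following is only a sketch in Ansible's YAML inventory format, with hypothetical host names (your inventory may well use the INI format instead):

```yaml
# inventory (sketch; host names are hypothetical placeholders)
all:
  children:
    production:              # picks up group_vars/production/*.yml
      hosts:
        db1.prod.example.com:
    staging:                 # picks up group_vars/staging/*.yml
      hosts:
        db1.staging.example.com:
```

Every host also belongs to the implicit "all" group, so group_vars/all/vars.yml applies everywhere and the environment directories override it.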
The best practices documentation also recommends using a “layer of indirection,” meaning that the Variables your playbooks reference live in plain vars files and are templated from the Variables in the Vault file. It also recommends prefixing your Vault Variables with “vault_”, meaning your all/vars.yml could look something like:
database:
  username: default_user
  password: "{{ vault_database_password }}"
super_important_var_that_should_be_one: 1
Your production/vars.yml looks something like this:
database:
  username: produser
And your production/vault.yml file should only contain this:
vault_database_password: supersecretpasswordnoonecansee
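One caveat about the production/vars.yml shown above is worth checking against your own setup. With Ansible's default hash_behaviour setting (replace), a dictionary defined at a more specific level replaces the whole dictionary from group_vars/all rather than merging into it. Under that default, a production file that only sets database.username would drop database.password, so production/vars.yml likely needs to restate the key. A minimal sketch, assuming the default setting:

```yaml
# group_vars/production/vars.yml
# Assumes the default hash_behaviour (replace): this dictionary replaces the
# one from group_vars/all/vars.yml wholesale, so every key that production
# needs is restated here, still pointing at the vaulted value.
database:
  username: produser
  password: "{{ vault_database_password }}"
```

The secret itself still lives only in production/vault.yml; the plain file merely references it, so the file remains safe to read and review.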
This revised structure has a couple of benefits. First of all, if you’re doing code reviews (please do!), it means your reviewers can see what you’ve changed, along with what you’ve added and removed in (almost all of) your config. With this structure, reviewers won’t just see a full file change on a Vault that needs to be manually decrypted, saved to disk, and diffed with the earlier version.
And, more importantly, Ansible will fail to even render the vars if the vault_database_password Variable is missing from the Vault, which will save you from at least a swath of issues you might encounter if you’re not keeping close tabs on your Vault files.
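Because Ansible templates Variables lazily, that failure only shows up when a task actually uses database.password. If you'd rather catch a missing secret at the very start of a run, one option is an explicit check. A minimal sketch, assuming a hypothetical pre-flight play placed at the top of playbook.yml:

```yaml
# playbook.yml (excerpt): hypothetical pre-flight play, not from the original setup
- name: Verify vaulted secrets before doing any real work
  hosts: all
  gather_facts: false
  tasks:
    - name: Check that the Vault supplies a non-empty database password
      ansible.builtin.assert:
        that:
          # default('') keeps the expression evaluable when the variable is undefined
          - vault_database_password | default('') | length > 0
        fail_msg: "vault_database_password is missing or empty for {{ inventory_hostname }}"
```

The check runs per host, so a host that sits outside both the production and staging groups, and therefore has no Vault file behind it, is flagged before any later task touches it.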
If you stick to this pattern, no matter if it’s a host group within an environment, a full environment that you’re setting Variables for, or even the “all” folder, your peers will never be confused about what is and is not within the Vault.
That’s all for now. We hope it’s been of some use to you!