When working with the AWS EC2 service programmatically I've repeatedly run into a simple problem: how can I get up-to-date metadata about the various instance types?
It turns out this simple problem does not have a simple solution. AWS offers the Bulk API, which has all the information about every EC2 instance offering in a single giant JSON file, but parsing it with Python 3.6 causes an out-of-memory error on machines with only 2 GB of RAM, and getting the desired data out of it is not a trivial task. The AWS Query API requires AWS credentials and specific IAM roles (and has almost no documentation), making it overly burdensome to use.
Despite that, I've built support for the AWS Bulk API into at least two projects. While contemplating doing it for a third, I decided it made more sense to simply build a better API for EC2 Instance Details, with a few goals in mind:
- Information about each instance type should be easy to access.
- The data should include hardware specs, prices, and regional availability.
- The data should be accessible to pretty much any programming language.
- The data should be reasonably up to date.
- The API should have high availability and decent security (SSL).
- Hosting this should not cost me a fortune, even if it gets popular.
In the end I built a "static" API hosted on GitHub Pages. Every six hours CircleCI kicks off a job to download and process the Bulk API data, generating two files (JSON and YAML) with a cleaned-up version of the instance data indexed by instance type. If the files differ from what is already stored in git, CircleCI commits the new files and pushes them back up to GitHub, so the API is never more than six hours behind the information available from AWS. Using GitHub Pages has some real benefits as well, with built-in SSL and the Fastly CDN. The whole system requires no direct hosting on my part, and will stay up to date without any intervention from me as long as AWS does not change the format of their giant JSON file. Since the whole thing is stored in git it also creates historical data as a matter of course, showing exactly when changes have occurred.
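To make the processing step more concrete, here is a minimal sketch of the two ideas involved: collapsing per-product records into one entry per instance type, and only rewriting the output file when its content has changed. This is not the project's actual code. The function names, the output field names, and the tiny sample record are hypothetical, and the input layout (a `products` map whose entries carry an `attributes` dict with keys such as `instanceType`, `vcpu`, `memory`, and `location`) is an assumption about the Bulk API format. Prices and the YAML output are left out to keep it short.

```python
import json
import tempfile
from pathlib import Path


def index_by_instance_type(offer):
    """Collapse per-SKU product records into one entry per instance type."""
    index = {}
    for product in offer.get("products", {}).values():
        attrs = product.get("attributes", {})
        instance_type = attrs.get("instanceType")
        if not instance_type:
            continue  # skip products that are not instances
        entry = index.setdefault(
            instance_type,
            {"vcpu": attrs.get("vcpu"), "memory": attrs.get("memory"), "regions": []},
        )
        location = attrs.get("location")
        if location and location not in entry["regions"]:
            entry["regions"].append(location)
    for entry in index.values():
        entry["regions"].sort()  # stable ordering keeps diffs small
    return index


def write_if_changed(path, data):
    """Write data as JSON; return True only if the file content changed."""
    rendered = json.dumps(data, indent=2, sort_keys=True) + "\n"
    path = Path(path)
    if path.exists() and path.read_text() == rendered:
        return False
    path.write_text(rendered)
    return True


# Hand-written sample standing in for the real, much larger file.
# The SKU keys and the third record are made up for illustration.
SAMPLE_OFFER = {
    "products": {
        "SKU1": {"attributes": {"instanceType": "t2.micro", "vcpu": "1",
                                "memory": "1 GiB", "location": "US East (N. Virginia)"}},
        "SKU2": {"attributes": {"instanceType": "t2.micro", "vcpu": "1",
                                "memory": "1 GiB", "location": "EU (Ireland)"}},
        "SKU3": {"attributes": {"group": "not-an-instance"}},
    }
}

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = Path(workdir) / "instances.json"
        index = index_by_instance_type(SAMPLE_OFFER)
        print(write_if_changed(target, index))  # first run writes the file
        print(write_if_changed(target, index))  # second run finds no change
```

Serializing with `sort_keys=True` and sorted region lists means the output is byte-for-byte identical whenever the underlying data is, so a scheduled job could use the return value of `write_if_changed` to decide whether a commit is needed at all.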
The whole project is, of course, available on GitHub. The API itself, with documentation, is on GitHub Pages.