Skip to content

tedious ramblings

The blog of Robert Hafner

Menu
  • Projects
  • Resume
  • Sponsor
  • Archives
  • About
Menu

ec2details, the missing EC2 Instance Metadata API

Posted on March 28, 2018April 3, 2018 by Robert Hafner

When working with the AWS EC2 service in a programmatic way I've repeatedly run into a simple problem- how can I get up to date metadata about the various instance types in a programmatic way?

It turns out this simple problem does not actually have a simple solution. AWS offers their Bulk API, which has all the information about every EC2 instance offering in a single giant JSON file, but parsing it with python3.6 will give an OOM error on machines with only 2gb of ram and actually getting the desired data out of it is not a trivial task. The AWS Query API requires AWS credentials and specific IAM roles (and has almost no documentation), making it overly burdensome to use.

Despite that I've built-in support for the AWS Bulk API into at least two projects. While contemplating doing it for the third I decided it made more sense to simply build a better API for EC2 Instance Details, with a few goals in mind-

  • Information about each instance type should be easy to access.
  • The data should include hardware specs, prices, and regional availability.
  • The data should be accessible to pretty much any programming language.
  • The data should be reasonably up to date.
  • The API should have high availability and decent security (SSL).
  • Hosting this should not cost me a fortune, even if it gets popular.

In the end I built a "static" API hosted on GitHub Pages. Every six hours CircleCI kicks off a job to download and process the Bulk API data, generating two files (JSON and YAML) with a cleaned up version of the instance data indexed by instance type. If the files are different from what is already stored in git then CircleCI commits the new files and pushes them back up to GitHub, so the API is never more than six hours out of date from the information available from AWS. Using Github Pages has some real benefits as well, with built-in SSL and the Fastly CDN. The whole system requires no direct hosting on my behalf, and will stay up to date without any need for me to interfere as long as AWS does not change the format of their giant json file. Since the whole thing is stored in git it also creates historical data as a matter of course, showing exactly when changes have occured.

The whole project is, of course, available on Github. The API itself, with documentation, is on Github Pages.

Share this:

  • Click to share on Mastodon (Opens in new window) Mastodon
  • Click to share on Reddit (Opens in new window) Reddit
  • Click to share on LinkedIn (Opens in new window) LinkedIn
  • Click to share on X (Opens in new window) X
  • Click to share on Pocket (Opens in new window) Pocket
  • Click to print (Opens in new window) Print
  • More
  • Click to share on Pinterest (Opens in new window) Pinterest
  • Click to share on Tumblr (Opens in new window) Tumblr
  • Click to email a link to a friend (Opens in new window) Email

Leave a ReplyCancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

About

Robert Hafner is a Principal Engineer based in Chicago focusing on distributed applications, infrastructure, and security. This blog is a running journal of projects, tutorials, and random ideas that pop into his head.

  • GitHub
  • Mastodon
  • LinkedIn

Popular Posts

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

©2025 tedious ramblings | Built using WordPress and Responsive Blogily theme by Superb