Browsing posts in: Open Source

urlparser – a simple python program for extracting info from URLs

I regularly need part of a URL inside shell scripts, such as extracting the hostname and port to check whether a service is reachable, and got a bit tired of screwing with regex. The parse component of Python’s urllib library is a great tool for this, so I wrote urlparser to expose that library directly from the command line and tossed it up on PyPI for others to use.
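To give a feel for what urllib's parse component does, here is a minimal sketch that pulls the hostname and port out of a URL. It is not urlparser's own code; the `host_and_port` helper and the `DEFAULT_PORTS` table are names made up for this illustration.

```python
from urllib.parse import urlsplit

# Illustrative subset of default ports for URLs that omit the port.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def host_and_port(url):
    """Return (hostname, port), falling back to the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.hostname, port


if __name__ == "__main__":
    print(host_and_port("https://example.com:8443/status?verbose=1"))
```

A shell script can then hand the two values to whatever reachability check it already uses.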

ec2details, the missing EC2 Instance Metadata API

When working with the AWS EC2 service programmatically I’ve repeatedly run into a simple problem: how can I get up-to-date metadata about the various instance types?

It turns out this simple problem does not have a simple solution. AWS offers their Bulk API, which has all the information about every EC2 instance offering in a single giant JSON file, but parsing it with Python 3.6 causes an out-of-memory error on machines with only 2 GB of RAM, and getting the desired data out of it is not a trivial task. The AWS Query API requires AWS credentials and specific IAM roles (and has almost no documentation), making it overly burdensome to use.

Despite that I’ve built support for the AWS Bulk API into at least two projects. While contemplating doing it for the third I decided it made more sense to simply build a better API for EC2 Instance Details, with a few goals in mind:

  • Information about each instance type should be easy to access.
  • The data should include hardware specs, prices, and regional availability.
  • The data should be accessible to pretty much any programming language.
  • The data should be reasonably up to date.
  • The API should have high availability and decent security (SSL).
  • Hosting this should not cost me a fortune, even if it gets popular.

In the end I built a “static” API hosted on GitHub Pages. Every six hours CircleCI kicks off a job to download and process the Bulk API data, generating two files (JSON and YAML) with a cleaned-up version of the instance data indexed by instance type. If the files differ from what is already stored in git then CircleCI commits the new files and pushes them back up to GitHub, so the API is never more than six hours behind the information available from AWS. Using GitHub Pages has some real benefits as well, with built-in SSL and the Fastly CDN. The whole system requires no direct hosting on my part, and will stay up to date without any need for me to intervene as long as AWS does not change the format of their giant JSON file. Since the whole thing is stored in git it also creates historical data as a matter of course, showing exactly when changes have occurred.

The whole project is, of course, available on GitHub. The API itself, with documentation, is on GitHub Pages.
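The processing step can be sketched roughly as follows. This is not the project's actual code: it assumes the Bulk API document has a top-level `products` map whose entries carry an `attributes` object with `instanceType`, `vcpu`, `memory`, and `location` fields, and it leaves prices out entirely. The function names are invented for the example.

```python
import json


def index_by_instance_type(offer):
    """Collapse a Bulk-API-style document into {instance_type: details}."""
    instances = {}
    for product in offer.get("products", {}).values():
        attrs = product.get("attributes", {})
        instance_type = attrs.get("instanceType")
        if not instance_type:
            continue  # skip products that are not instances
        entry = instances.setdefault(
            instance_type,
            {"vcpu": attrs.get("vcpu"), "memory": attrs.get("memory"), "regions": []},
        )
        location = attrs.get("location")
        if location and location not in entry["regions"]:
            entry["regions"].append(location)
    for entry in instances.values():
        entry["regions"].sort()
    return instances


def render(instances):
    """Serialize deterministically so unchanged data yields identical text."""
    return json.dumps(instances, indent=2, sort_keys=True) + "\n"
```

Sorting keys and region lists matters for the "commit only when different" step: if the output is byte-for-byte stable, a plain git diff is enough to decide whether anything changed upstream.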

Manage GitHub Pull Requests with gitconsensus

This weekend I dug into the GitHub API to build gitconsensus, which lets communities create truly democratic projects using Reactions as a voting mechanism. Projects can define consensus rules (minimum age of pull request, quorum for votes, threshold needed for passing) using a YAML file in their project root. Pull Requests that meet the consensus rules will get merged, and those that do not meet them after a certain amount of time will get closed.

The YAML file itself is pretty straightforward:

# .gitconsensus.yaml
# Add extra labels for the vote counts and age when merging
extra_labels: false

# Do not count any vote from a user who votes for multiple options
prevent_doubles: true

# Minimum number of voters
quorum: 5

# Required percentage of yes votes (ignoring abstentions)
threshold: 0.65

# Only process votes by contributors
contributors_only: false

# Only process votes by collaborators
collaborators_only: false

# When defined only process votes from these github users
whitelist:
  - alice
  - bob
  - carol

# Number of days after last action (commit or opening the pull request) before issue can be merged
mergedelay: 3

# Number of days after last action (commit or opening the pull request) before issue is autoclosed
timeout: 30
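To make the rules concrete, here is a small sketch of how a merge decision could be computed from those settings. It is an illustration, not gitconsensus's implementation: the function names are made up, and it assumes abstentions count toward the quorum and that a yes share exactly equal to the threshold passes.

```python
def count_votes(votes, prevent_doubles=True):
    """votes maps username -> set of chosen options ('yes', 'no', 'abstain')."""
    totals = {"yes": 0, "no": 0, "abstain": 0}
    for options in votes.values():
        if prevent_doubles and len(options) > 1:
            continue  # ignore anyone who voted for more than one option
        for option in options:
            if option in totals:
                totals[option] += 1
    return totals


def should_merge(votes, days_since_last_action, quorum=5, threshold=0.65,
                 mergedelay=3, prevent_doubles=True):
    """Apply the merge delay, quorum, and threshold rules in that order."""
    totals = count_votes(votes, prevent_doubles)
    voters = sum(totals.values())  # assumption: abstentions count toward quorum
    decisive = totals["yes"] + totals["no"]  # abstentions ignored for threshold
    if days_since_last_action < mergedelay or voters < quorum or decisive == 0:
        return False
    return totals["yes"] / decisive >= threshold
```

With the defaults above, three yes votes, one no, and one abstention on a four-day-old pull request would pass: quorum is met with five voters and 3 of 4 decisive votes is 75%.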

The project is available now on PyPI and can be installed using pip.

Introducing jsonsmash – work with large json files easily

Over the last year I’ve run into some pretty massive JSON files. One recent example is from AWS, which publishes a 120 MB file containing a list of their available services that they have yet to provide documentation for. Attempting to open that in a standard editor is not going to be pleasant, and while tearing it apart with something like jq is certainly an option it doesn’t feel like the best approach.

That’s why I’ve built jsonsmash, an emulated shell that lets users browse through their massive JSON files as if they were filesystems. It uses shell commands any Linux user would already be familiar with, which makes it very straightforward to figure out the schema of a JSON object (even if it is a few hundred megabytes) and pull out specific data.

Development is on GitHub and the package is published on npm.
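The core idea of treating JSON like a filesystem can be sketched in a few lines. This is only an illustration of the concept in Python, not jsonsmash's code (which is a JavaScript package); `resolve` and `ls` are names invented here.

```python
import json


def resolve(document, path):
    """Walk a slash-separated path such as '/services/0/name' into parsed JSON."""
    node = document
    for part in path.strip("/").split("/"):
        if part == "":
            continue  # the root path '/' has no components
        if isinstance(node, list):
            node = node[int(part)]
        elif isinstance(node, dict):
            node = node[part]
        else:
            raise KeyError(f"{part!r} is below a leaf value")
    return node


def ls(document, path="/"):
    """List a node's children: keys for objects, indexes for arrays."""
    node = resolve(document, path)
    if isinstance(node, dict):
        return list(node.keys())
    if isinstance(node, list):
        return [str(i) for i in range(len(node))]
    return []  # leaf values have no children


if __name__ == "__main__":
    doc = json.loads('{"services": [{"name": "ec2"}], "version": 1}')
    print(ls(doc))
    print(resolve(doc, "/services/0/name"))
```

Objects behave like directories keyed by name and arrays like directories keyed by index, which is why a handful of familiar commands is enough to explore an unknown schema.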

Building an Email Testing Environment with Vagrant, Dovecot and Travis-CI

In my previous post I introduced the testing suite I created for Fetch. Here I want to go through exactly what needed to be done to put that together. This post is a bit longer, and you really don’t need it to take advantage of this package, but it may provide some insight to anyone hoping to do something like this themselves.

For me testing has two major components: it should ease development on the local level by preventing regressions, and it should help me as the maintainer of an open source project by giving me trust in the pull requests that are coming in to me. Travis-CI, an application I’ve been using extensively for my projects, makes that last problem much easier, as it will run my test suites and tell me right in the pull request if any problems have come up. On the local level I decided to use Vagrant, as it makes adding virtual machines to source control (without adding a huge binary file) really simple.

Announcing a New Continuous Integration and Email Package using Travis-CI and Vagrant

Years ago I wrote a library, Fetch, which was designed to read email using the PHP IMAP Extension. At the time I did not expect it to get much reuse, so I skipped out on much of the test suite. If I’m going to be perfectly honest, though, a huge reason for not building out a test suite then was that I was unsure how to go about it. Building a test suite for an email library has some very interesting requirements, especially if you want to test it in as real-world a setting as possible.

  • You need a mail server.
  • That mail server needs a variety of messages, folders, flags and attachments for all the different types of tests that are needed.
  • Those messages have to remain consistent between test runs.
  • Resetting the mail server to its original state should be fast to encourage lots of testing during development.
  • The test suite, and thus the mail server, need to be portable so other people can run tests on their changes.
  • Continuous integration (integrated with something like GitHub) is really nice.

Unfortunately for me all of my great excuses have vanished, thanks in large part to Travis-CI and Vagrant. Both of these systems are fantastic for testing. Travis is a continuous integration environment that’s set up for a number of languages and integrates directly with services such as GitHub. Vagrant makes sharing server environments as easy as passing along a configuration file. With these two tools I was able to put together a testing package for Fetch that met all of those requirements, and after a bit of additional work I’ve broken that out into a standalone package for people to use in their own projects.

Using this is really simple: you just need to call SetupEnvironment.sh before each run of your test suite. Once that has finished running you have a fully functional Dovecot server with a consistent set of messages. It doesn’t matter if you’re running this from your home computer or on Travis-CI, as the SetupEnvironment.sh script will recognize where it’s being run and work accordingly.
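As a rough sketch of what "call the script before each run" might look like from a test harness, the helper below shells out to the setup script and fails loudly if it exits non-zero. It is written in Python purely for illustration (Fetch itself is PHP). Travis-CI does set `TRAVIS=true` in its build environment, but whether SetupEnvironment.sh relies on that particular check is an assumption, and both function names are hypothetical.

```python
import os
import subprocess


def running_on_travis(env=None):
    """Return True when the TRAVIS variable that Travis-CI sets is present."""
    env = os.environ if env is None else env
    return env.get("TRAVIS") == "true"


def setup_mail_environment(script_path):
    """Run the setup script; raises CalledProcessError if it fails.

    script_path is wherever SetupEnvironment.sh lives in your checkout,
    and the file needs to be executable.
    """
    subprocess.run([script_path], check=True)
```

Calling `setup_mail_environment` from the test runner's bootstrap step keeps the mail server state consistent without each test having to know about it.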

There are also some cool optimizations to make testing faster. The test environment will stay up and running for a half hour after the last test before shutting itself down, so you don’t have to wait for the virtual machine to boot between tests. Rather than start from a blank slate each time, the environment has a reset function that restores the original inboxes. With an already provisioned test environment this package adds almost no overhead between tests, as resetting the inbox takes only a couple of seconds.

The project is up on GitHub at tedivm/DovecotTesting, so if you’re writing an email reading library there are no more excuses. I’ve also put together a more detailed post describing how this was put together.

WordPress Syntax Highlighting for YAML

I was writing up a new blog post that had some YAML config files in it. I’ve been using the SyntaxHighlighter Evolved plugin for this, which has worked remarkably well. Unfortunately it seems that YAML is not one of the supported languages, and I couldn’t find much about it.

Luckily for me the author wrote a fantastic blog post about extending his plugin, which I used to create my very first WordPress plugin. If you’re looking to add Yaml highlighting to your blog you can grab my SyntaxHighlighter Evolved: Yaml Brush plugin right off of the WordPress Plugin Directory. The first version is pretty simple, but I’m considering collecting a few different languages together for a more comprehensive pack.


As part of maintaining Fetch I have to install the PHP IMAP extension quite a bit. Although this is pretty trivial on most variants of Linux it’s kind of a pain on OSX: you have to find a few dependencies, compile the IMAP C library from source, build the extension against your currently installed version of PHP (which typically won’t include its source on the system), and then take the one resulting file and set it up.

After doing this for the millionth time I decided to script it, and like any good programmer I looked to see what was already out there. I found a script by Ivan Vucica for an older version of OSX and then polished it up. My version should work on more than just OSX, although on other systems you’ll usually want to use the system package manager anyway.

I’ve posted the code and a Readme on GitHub, and have thrown together a first release. Please let me know if you find it useful!