Apache To CloudFront With Lambda At Edge

I've been running my (this) vanity website and mail server on Linux machines I administer myself since 1998 or so, but it's time to rebuild the machine, and hosting static HTTPS no longer makes sense in a world where GitHub or AWS can handle it flexibly and reliably for effectively free. I did want to keep running my own mail server, but centralization in email has made delivery iffy and everyone I'm communicating with is on Gmail, so the data is going there anyway.

Because I've redone the ry4an.org website so many times, and because cool URLs don't change, I have a lot of redirects from old locations to new locations. With Apache I grossly abused mod_rewrite to pattern match old URLs and transform them into new ones. No modern hosting provider is going to run Apache, especially with mod_rewrite enabled, so I needed to rebuild the rules for whatever hosting option I picked.

GitHub.io won't do real redirects (only meta refresh tags), so that was right out. AWS's S3 lets you configure redirects using either a custom x-amz-website-redirect-location header on a placeholder object in the S3 bucket or some hoary XML routing rules at the bucket level, but neither of those allows anything more complicated than key prefix matching.

AWS's content delivery edge network, CloudFront, doesn't host content or generate redirects (it's just a caching proxy), but it lets you deploy JavaScript functions directly to the edge nodes, where they can modify requests and responses on their way in and out. With this Lambda at Edge capability you're restricted to specific releases of only the JavaScript runtime, but that's enough to get full regular expression matching with group extraction.
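The post's functions are JavaScript, but the rule-matching idea is language independent. Here is a minimal sketch of it in Python, for illustration only: the rules and paths are invented, and the event and response shapes are an assumption based on CloudFront's documented Lambda@Edge request event structure.

```python
import re

# Hypothetical rewrite rules: (compiled pattern, replacement template).
# These paths are invented for illustration, not the site's real redirects.
REDIRECT_RULES = [
    (re.compile(r"^/old/(\d{4})/([^/]+)\.html$"), r"/new/\1/\2/"),
    (re.compile(r"^/resume(?:\.pdf)?$"), "/resume/"),
]


def find_redirect(uri):
    """Return the new location for an old URI, or None if no rule matches."""
    for pattern, template in REDIRECT_RULES:
        match = pattern.match(uri)
        if match:
            # expand() substitutes the captured groups into the template
            return match.expand(template)
    return None


def handler(event, context):
    """Answer with a 301 when a rule matches, else pass the request through."""
    request = event["Records"][0]["cf"]["request"]
    location = find_redirect(request["uri"])
    if location is None:
        return request
    return {
        "status": "301",
        "statusDescription": "Moved Permanently",
        "headers": {"location": [{"key": "Location", "value": location}]},
    }
```

Keeping the rules in one ordered list mirrors how a stack of mod_rewrite rules is evaluated: first match wins.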


Ry4an in Title Case

Python has a uniquely bad title case function which turns my already silly name into Ry4An, capitalizing the 'a' because it follows a non-letter character. I can't be sure that all the bulk email I get that's sent to Ry4An Brase has passed through Python's .title() function, but I've not found another language or framework with so bad an implementation.

At least Python warns you that their version is terrible right in the docstring for title and provides a slightly better one they suggest you paste directly into your code. There are, of course, better versions available in libraries like titlecase which handle things like not capitalizing articles.
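The behavior is easy to reproduce. In the sketch below, simple_title is a hypothetical helper written for this example (it is neither the version from Python's documentation nor the titlecase library); it only capitalizes the first character of each space-separated word.

```python
def simple_title(text):
    """Capitalize only the first character of each space-separated word."""
    return " ".join(word[:1].upper() + word[1:].lower() for word in text.split(" "))


print("ry4an brase".title())        # Ry4An Brase
print(simple_title("ry4an brase"))  # Ry4an Brase
```

The built-in treats the digit as a word boundary, which is why the letter after it gets capitalized.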

Other languages seem to avoid the fussiness of title case requirements by omitting it from the core language entirely (Ruby, Java), leaving it to third party implementors like Rails and Apache Commons.

Four emails with my name in bad titlecase

Kindle Highlights and Ratings

When reading I've always underlined sentences that make me happy. Once the kids got old enough to understand there's no email or fun on a Kindle I switched from dead tree books, and now the underlining is stored in Amazon's datacenters.

After a few years of highlighting on Kindle I started to wonder if the number of sentences that I liked and the eventual five-star scale rating I gave a book had any correlation. Amazon owns Goodreads and Kindle services sync data into Goodreads, but unfortunately highlight data isn't available through any API.

I was able to put together a little Python to scrape the highlight counts per book (yay, BeautifulSoup) and combine it with page count and rating info from the Goodreads APIs. Our family scientist explained "the statistical tests to compare values of a continuous variable across levels of an ordinal variable", and there was no meaningful relationship. Still it makes a nice picture:

Highlights Per Page vs. Rating
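The post doesn't say which test was used. One common choice for a continuous variable against an ordinal one is Spearman's rank correlation, sketched here in plain Python as an assumption. The sample numbers are toy values made up for the example, not the real highlight data.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j across the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy data, invented for illustration only
highlights_per_page = [0.02, 0.10, 0.05, 0.01, 0.07]
ratings = [3, 4, 5, 2, 3]
print(spearman(highlights_per_page, ratings))
```

A value near zero would match the "no meaningful relationship" conclusion; judging significance needs a p-value, which this sketch leaves out.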


Home Alarm Analytics With AWS Kinesis

Home security system projects are fun because everything about them screams "1980s legacy hardware design". Nowhere else in the modern tech landscape does one program by typing in a three digit memory address and then entering byte values on a numeric keypad. There's no enter-key -- you fill the memory address. There's no display -- just eight LEDs that will show you a byte at a time, and you hope it's the address you think it is. Arduinos and the like are great for hobby fun, but these are real working systems whose core configuration you enter byte by byte.

The feature set reveals 30 years of crazy product requirements. You can just picture the well-meaning sales person who sold a non-existent feature to a huge potential customer, resulting in the boolean setting that lives at address 017 bit 4 and whose description in the manual is:

ON: The double hit feature will be enabled. Two violations of the same zone within the Cross Zone Timer will be considered a valid Police Code or Cross Zone Event. The system will report the event and log it to the event buffer. OFF: Two alarms from the same zone is not a valid Police Code or Cross Zone Event

I've built out alarm systems for three different homes now, and while occasionally frustrating it's always a satisfying project. This most recent time I wanted an event log larger than the 512 events I can view a byte at a time. The central dispatch service I use will sell me back my event log in a horrid web interface, but I wanted something programmatically accessible and ideally including constant status.

The hardware side of the solution came in the form of the Alarm Decoder from Nu Tech. It translates alarm panel keypad bus events into events on an RS-232 serial bus, which I'm feeding into a Raspberry Pi. From there the alarmdecoder package on PyPI lets me get at decoded events as Python objects. But I wanted those in a real datastore.
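The title names Kinesis as the destination. As a rough sketch of that hand-off, assuming each decoded event has already been reduced to a plain dict, the code below builds the arguments for a Kinesis put_record call. The stream name, the event fields, and the choice of partition key are all hypothetical.

```python
import json
from datetime import datetime, timezone

STREAM_NAME = "home-alarm-events"  # hypothetical stream name


def build_put_record_args(event, stream_name=STREAM_NAME):
    """Turn a decoded alarm event (a plain dict here) into put_record kwargs."""
    payload = dict(event, received_at=datetime.now(timezone.utc).isoformat())
    return {
        "StreamName": stream_name,
        "Data": json.dumps(payload, sort_keys=True).encode("utf-8"),
        # assumption: partition by zone, falling back to a fixed key
        "PartitionKey": str(event.get("zone", "panel")),
    }


def send_event(kinesis_client, event):
    """Send one event; kinesis_client would be e.g. boto3.client("kinesis")."""
    return kinesis_client.put_record(**build_put_record_args(event))
```

Separating the record construction from the network call keeps the serialization testable without AWS credentials.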


Raspberry Pi UPS

I'm starting to do more on a Raspberry Pi I've got in the house, and I wanted it to survive short power outages. I looked at buying an off-the-shelf Uninterruptible Power Supply (UPS), but it just struck me as silly that I'd be using my house's 120V AC power to fill a 12V DC battery to be run through an inverter into 120V AC again to be run through a transformer into DC yet again. When the house is out of power that seemed like a lot of waste.

A little searching turned up the PicoUPS-100 UPS controller. It seems like it's mostly used in car applications, but it has two DC inputs and one DC output and handles the charging and fast switching. The non-battery input needs to be greater than the desired 12 volts, so I ebayed a 15v power supply from an old laptop. I added a voltage regulator and buck converter to get solid 12v (router) and 5v (rpi) outputs. Then it caught on fire:

Scorched UPS controller

But I re-bought the charred parts, and the second time it worked just fine:

Working UPS setup


Pylint To Github

I spent a few hours trying to get the Jenkins Git & Github plugins to:

  • run pylint on all remote branch heads that:
    • aren't too old
    • haven't already had pylint run on them
  • send the repo status back to GitHub

I'm sure it's possible, but the Jenkins Git plugin doesn't like a single build to operate on multiple revisions. The repo statuses weren't posting, the wrong branches were getting built, and it was easier to write a quick script.

Now whenever someone pushes code at DramaFever pylint does its thing, and their most recent commit gets a green checkmark or a red cross. If/when they open a PR the status is already ready on the PR and warns folks not to merge it if pylint is going to fail the build. They can keep heaping on commits until the PR goes green.

I run it from Jenkins triggered by a GitHub push hook, but it's set up so that even running it from cron on the minute is safe for those without a CI server yet.
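The selection logic from the list above can be sketched in a few lines. This is an illustration under assumptions, not the actual script: the age cutoff, the function names, and the pass/fail rule (pylint exit code zero) are invented, while the payload fields follow GitHub's commit status API, which accepts a POST to /repos/{owner}/{repo}/statuses/{sha}.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=14)  # hypothetical cutoff for "too old"


def heads_to_lint(heads, already_linted, now, max_age=MAX_AGE):
    """Pick branch heads that are recent and have no pylint status yet.

    heads maps branch name -> (commit sha, commit datetime).
    already_linted is a set of shas that already carry a status.
    """
    return {
        branch: sha
        for branch, (sha, committed) in heads.items()
        if now - committed <= max_age and sha not in already_linted
    }


def status_payload(pylint_exit_code):
    """Build the JSON body for GitHub's commit status endpoint."""
    passed = pylint_exit_code == 0
    return {
        "state": "success" if passed else "failure",
        "description": "pylint passed" if passed else "pylint reported problems",
        "context": "pylint",
    }
```

Because already-linted heads are skipped, running this every minute does no repeated work, which is what makes the cron approach safe.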

Branches with green checks

Bitcoin Conversion In Google Spreadsheets

I've been using Charlie Lee's excellent Google Spreadsheet Bitcoin tracker sheet for a while, but it pulls data from a single exchange at a time and relies on the ordering of those exchanges on the bitcoinwatch.com site, which varies with volume.

I figured out I could get better numbers more reliably from bitcoinaverage.com, which (predictably) averages multiple exchanges over various time periods. They offer a great JSON API, but unfortunately Google spreadsheets only export JSON -- they don't have a function for importing it.

Nonetheless I was able to fake it using a regex. You can pull the 24 hour average price in with this formula:

=regexextract(index(importdata("https://api.bitcoinaverage.com/ticker/USD"),2,1), ": (.*),")+0

If you want that to update live (not just when you open the spreadsheet) you need to use Charlie's hack to get the sheet to think the formula depends on live stock data:

    NOW()*1E3)&REPT(GoogleFinance("GOOG");0)),2,1), ": (.*),")+0

I've put together a sample spreadsheet based on Charlie's.
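For anyone puzzling over the first formula: it takes the second line of the imported response and captures whatever sits between ": " and the trailing comma, and the +0 coerces that text to a number. The same extraction in Python looks roughly like this; the sample line is a made-up stand-in for the API's second line, not real output.

```python
import re

# Hypothetical second line of the API response; the number is invented.
sample_line = '  "24h_avg": 123.45,'


def extract_price(line):
    """Mimic regexextract(line, ": (.*),")+0 from the spreadsheet formula."""
    match = re.search(r": (.*),", line)
    if match is None:
        raise ValueError("no price found in line: %r" % line)
    return float(match.group(1))


print(extract_price(sample_line))  # 123.45
```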

Occuped: Twine + Go + App Engine

In our NY office we've got 40 people working in a space with two bathrooms. Walking to the bathrooms, finding them both occupied, and grabbing a snack instead is a regular occurrence. For a lark I took a Twine with the breakout board and a few magnetic switches and connected them to the overtaxed bathroom doors.

The good folks at Twine will invoke a web hook on state change, so I created a tiny webapp in Go that takes the GET from Twine and stashes it in the App Engine datastore. I wrote a cheesy web front end to show the current state based on the most recent change. It also exposes a JSON API, allowing my excellent coworkers to build a native OS X menulet and a much nicer web version.

Occupied light
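The webapp itself is Go on App Engine. The core idea, recording each state change and reporting the most recent one per door, is sketched below in Python purely as an illustration. The query parameter names (door, state) and the "closed means occupied" rule are assumptions, not Twine's actual webhook format.

```python
import json
from urllib.parse import parse_qs, urlparse


def parse_change(url, timestamp):
    """Turn a webhook GET URL into a (timestamp, door, occupied) record."""
    query = parse_qs(urlparse(url).query)
    door = query["door"][0]
    occupied = query["state"][0] == "closed"  # assumption: closed door = occupied
    return (timestamp, door, occupied)


def current_state(changes):
    """Return each door's latest occupied flag, given all recorded changes."""
    state = {}
    for _, door, occupied in sorted(changes):
        state[door] = occupied  # later timestamps overwrite earlier ones
    return state


changes = [
    parse_change("/hook?door=left&state=closed", 100),
    parse_change("/hook?door=right&state=closed", 105),
    parse_change("/hook?door=left&state=open", 160),
]
print(json.dumps(current_state(changes), sort_keys=True))
```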

Crossed Lamps

Last weekend we bought two Rodd lamps at Ikea for the guest room, and it struck me how amused I'd be if each one switched the other. Six hours and a few new parts later, it came out pretty well:

The remote action is especially jarring because the switches are right next to the bulbs they would normally control:

Ikea Rodd lamp head


Amazon S3 as Append Only Datastore

As a hack, when I need an append-only datastore with no authentication or validation, I use Amazon S3. S3 is usually a read-only service from the unauthenticated web client's point of view, but if you enable access logging to a bucket you get full-query-parameter URLs recorded in a text file for GETs that can come from a form's action or via XHR.

There aren't a lot of internet-safe append-only datastores out there. All my favorite NoSQL solutions divide permissions into read and/or write, where write includes delete. SQL databases let you grant an account insert without update or delete, but even so nobody suggests letting one listen on a port that's open to the world.

This is a bummer because there are plenty of use cases when you want unauthenticated client-side code to add entries to a datastore, but not read or modify them: analytics gathering, polls, guest books, etc. Instead you end up with a bit of server side code that does little more than relay the insert to the datastore using over-privileged authentication credentials that you couldn't put in the client.

To play with this, first, create a file named vote.json in a bucket with contents like {"recorded": true}, make it world readable, set Cache-Control to max-age=0,no-cache and Content-Type to application/json. Now when a browser does a GET to that file's https URL, which looks like a real API endpoint, there's a record in the bucket's log that looks something like:

bucketname [31/Jan/2013:18:37:13 +0000]
- 289335FAF3AD11B1 REST.GET.OBJECT vote.json "GET
  /bucketname/vote.json?arg=val&arg2=val2 HTTP/1.1" 200 - 12 12
9 8 "-" "lwp-request/6.03 libwww-perl/6.03" -

The full format is described by Amazon, but with client IP and user agent you have enough data for basic ballot box stuffing detection, and you can parse and tally the query arguments with two lines of your favorite scripting language.
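As a rough sketch of that tally step, the code below pulls the query string out of each log record's quoted request and counts the values seen for one argument. It assumes each record sits on a single line; the regular expression only looks at the quoted "GET ... HTTP/x.y" portion and ignores the other fields.

```python
import re
from collections import Counter
from urllib.parse import parse_qs

# Captures the query string from the quoted request, e.g.
# "GET /bucketname/vote.json?arg=val&arg2=val2 HTTP/1.1"
REQUEST_RE = re.compile(r'"GET\s+\S+?\?(\S*)\s+HTTP/[\d.]+"')


def tally(log_lines, arg):
    """Count the values of one query argument across access log lines."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            counts.update(parse_qs(match.group(1)).get(arg, []))
    return counts


# The sample record from above, joined onto one line
sample = (
    'bucketname [31/Jan/2013:18:37:13 +0000] - 289335FAF3AD11B1 '
    'REST.GET.OBJECT vote.json "GET /bucketname/vote.json?arg=val&arg2=val2 '
    'HTTP/1.1" 200 - 12 12 9 8 "-" "lwp-request/6.03 libwww-perl/6.03" -'
)
print(tally([sample], "arg"))
```

Requests without a query string simply don't match and are skipped.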

This scheme is especially great for analytics gathering because one never has to worry about full log disks, load balancers, backed up queues, or unresponsive data collection servers. When you're ready to process the data it's already on S3 near EC2, AWS Data Pipeline or Elastic MapReduce. Plus, S3 has better uptime than anything else Amazon offers, so even if your app is down you're probably recording the failed usage attempts.