Filtering Mail Using Time of Day

I get a lot of email, and a good percentage of it is spam. To help filter the spam from the ham I use software called SpamAssassin (http://useast.spamassassin.org/). SpamAssassin applies hundreds of tests to each incoming email and increases or decreases the mail's spam score depending on the result. If the total spam score for a message is above a pre-set threshold (4.0 for me) it gets put aside.

No one rule is enough to get a message marked spam or ham; rather, each contributes a little to the overall determination. For example, here's the report for a piece of spam I recently received:

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.5 SPAM_SUB_ADDRS         Sent to a high spam sub address of mine
 1.5 MY_SUB_ADDRS           Sent to a sub address of mine
 0.0 BAYES_50               BODY: Bayesian spam probability is 50 to 56%
                            [score: 0.5002]
 0.6 HTML_FONT_INVISIBLE    BODY: HTML font color is same as background
 0.3 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.1 HTML_MESSAGE           BODY: HTML included in message
 0.1 HTML_FONTCOLOR_UNSAFE  BODY: HTML font color not in safe 6x6x6 palette
 0.1 BIZ_TLD                URI: Contains a URL in the BIZ top-level domain
 1.1 MIME_HTML_ONLY_MULTI   Multipart message only has text/html MIME parts

Various mis-uses of HTML formatting and sending to some email addresses I reserve for spam only earned that message a score of 4.3.

SpamAssassin is in common use, and its set of rules works very well. Looking at a summary of my email in the last 52 days one sees these numbers:

  • Spam : 1377 (missed: 53) (0.37%)
  • Virus: 207 (0.06%)
  • Ham : 2162 (0.58%)
  • Total: 3746

So of 1377 spam messages that came in during the 52 day span only 53 were missed by SpamAssassin -- just 3.85%. Not bad, but there's always room for improvement.

It seemd to me that the majority of the missed spam would come in overnight. In the morning I'd have plenty of ham messages and a piece or two of spam that SpamAssassin had missed sitting in my mail. I decided to track when spam came in to see if I could add a rule giving a higher spam score to messages arriving overnight.

A little googling showed that others have done the same sort of tests and found that spam arrives evenly throughout the day. On The Origin of Spam has some great stats available at http://db.org/spam/. Still I gathered my own to see if they'd show the same pattern (see attached graph).

I found that spam does, indeed, favor no particular hour. However, the non-spam ham messages show pretty much the pattern you'd expect. They largely arrive while other people in my timezone are awake. So if I slip in a rule that gives some score bonus to emails arriving during off-peak hours I should see a decrease in that 3.85% error rate. I'll try it out and post the results.

mailchart.png

Dumbing Down scp

The tool scp, UNIX for secure copy, is a wrapper around ssh, secure shell. It lets you move files from one machine to another through an encrypted connection with a command line syntax similar to that of the standard cp, local copy, command. I use it 100 times a day.

The command line syntax for scp is at its most basic:

scp <source> <destination>

Either the source, destination, or both can be on a remote computer. To denote that one just prefixes the file name with "username@machinename:". So this command:

scp myfile ry4an@elsewhere:/home/ry4an/myfile

would copy the contents of the local file named 'myfile' to a remote system named 'elsewhere' and place it in the directory /home/ry4an with the name myfile.

As a shortcut one can omit the destination directory and filename, in which case they default to the home directory of the specified user and the same filename as the original. Nice, simple, handy, straightforward.

However, when you're an idiot in a rush like me in addition to skipping the destination directory and filename you also skip the colon, yielding a command like:

scp myfile ry4an@elsewhere

When you make that mistake scp decides that what you wanted to do was take the local file named 'myfile' and copy it to another local file named 'ry4an@elsewhere'. That's certainly consistent with their syntax -- no colon, no remote system, but it is never what I want.

Instead I go to the 'elsewhere' machine and start looking for 'myfile' and it's not there, and I'm very puzzled because scp didn't give me any error messages. I don't know of anyone who has ever wanted scp to do a local copy, but for some reason the scp developers took extra time to add that feature. I want to remove it.

The best way would be to edit the scp source and neuter the part that does local copies. The problem with that is I'd have to modify the scp program on lots of machines, many of which I have no control over.

So, as a cheesy work around I spent a few seconds writing a bash shell function that refuses to invoke scp if there is no colon in the command line. It's a crappy band-aid on what's definitely a user-error problem, but it's already saved me from making the same frustrating mistake 10 times since I wrote it last week. Here it is in case anyone else has the same un-problem I do:

function scp () { if echo $* | grep -v -q -s : ; then
  echo scp: missing colon /usr/bin/scp
else
  /usr/bin/scp $@
fi

Just put that in your .bashrc and forever be saved from having a drive full of user@elsewhere files. Or don't; I don't care about you.

Diplomacy at Sea and a Templated Evolver

I've got a policy here on the unblog where I only do entries about things I've created. When I wanted to hype the Diplomacy at Sea V/Dip Con event coming up March 2005 I had to find something to do with it first, so I volunteered to set up their website.

Whenever I need to set up a quick site I head over to the Open Source Web Design site (http://oswd.org/) and pick from their vast array of great designs. This time I went with one called Evolver. It has a clean look and clean code. Rialto did a great job of synthesis and design on this one.

In order to avoid having to edit every page in a site when I want to update a footer or navigation bar I often use Apache's Server Side Includes (SSI) to pull in elements that are on the same on every page. This time I decided to go a step further and see if I could use a single .shtml page for every page on the site. It ended up working out in a fashion sufficiently general to be worth re-distributing.

The outcome can be seen on the https://ry4an.org/dipatsea/ site, though it looks just like Rialto's original design. Attached is a zip-file containing the templatized page and a few paragraphs detailing the necessary symlinks. You'll have to enable .shtml SSIs in your Apache install, of course.

evolver-template.zip

RSS Feed for Security Camera List

I added a RSS feed to the Minneapolis Security Camera Project site. Subscribers to it will see any new cameras as soon as they're posted. If you're not yet using an RSS aggregator let me know and I'll chisel the list on a stone tablet and mail it to you whenever it's updated.

The feed is at http://mpls-watched.org/map/rss.xml

History of the World - Attack Probabilities

History of the World is a fine game from Avalon Hill. It's distributed by Hasbro now, and it's one of the rare Avalon Hill games that Hasbro managed to improve when they "cleaned it up".

History of the World uses dice to simulate combat, and they do so in a way so as to intentionally skew the likelihood of success toward the attacker. There are, however, various terrains (mountainous, ocean straight, forest), types of attack (amphibious), bonuses (strong leader, elite troops, forts, weaponry, etc.) which can affect the success rate of an attacker.

While playing last night we tried to estimate the relative merits of the different troop-efficacy modifiers and found that as if often the case with probability it was difficult to agree even on estimates.

As a result I sat down and wrote some quick Perl simulations to find the numbers for all the various scenarios. This wasn't done in the most efficient way possible, but it's accurate. The terms 'win', 'tie', and 'loss' are from the attacker's point of view.

Attk Dice Attk Bonus Def. Dice Def. Bonus Win Tie Loss
2 0 1 0 57.87% 16.67% 25.46%
2 0 1 1 41.67% 16.20% 42.13%
2 0 2 0 38.97% 22.07% 38.97%
2 0 2 1 22.38% 16.59% 61.03%
2 0 3 0 28.07% 24.77% 47.16%
2 0 3 1 12.96% 15.11% 71.93%
2 1 1 0 74.54% 11.57% 13.89%
2 1 1 1 57.87% 16.67% 25.46%
2 1 2 0 61.03% 16.59% 22.38%
2 1 2 1 38.97% 22.07% 38.97%
2 1 3 0 52.84% 19.23% 27.93%
2 1 3 1 28.07% 24.77% 47.16%
3 0 1 0 65.97% 16.67% 17.36%
3 0 1 1 49.38% 16.59% 34.03%
3 0 2 0 47.16% 24.77% 28.07%
3 0 2 1 27.93% 19.23% 52.84%
3 0 3 0 35.23% 29.54% 35.23%
3 0 3 1 16.69% 18.54% 64.77%
3 1 1 0 82.64% 9.65% 7.72%
3 1 1 1 65.97% 16.67% 17.36%
3 1 2 0 71.93% 15.11% 12.96%
3 1 2 1 47.16% 24.77% 28.07%
3 1 3 0 64.77% 18.54% 16.69%
3 1 3 1 35.23% 29.54% 35.23%

Attached is the script and the same output in various formats.

HoW-dice.tar.gz

Bookmarksync Patch for Tab-Group Bookmarks

I use the Mozilla web browser on a few different machines, and its lack of roaming profiles/bookmarks is a source of annoyance. When I bookmark a site on my laptop, I want that bookmark to be available on my desktop machine. In order to achieve that I use a kludgy network of scripts, CVS, crontab entries, and the bookmarksync application.

Bookmarksync (http://sourceforge.net/projects/booksync/) is a tiny little program that takes as input two different bookmark files and outputs the combination of the two. It works just fine in all respects, except that it converts tab-group bookmarks into bookmark folders.

Mozilla has tabbed browsing which means one can have multiple pages open in the same window at once with little index tabs for each. Mozilla also has a feature wherein one can open more than one window and then bookmark the entire set of tabs. Selecting that bookmark in the future will open all the pages that were open when the tab set was bookmarked. I mostly use the feature to keep a bookmark group of all the news sites I read daily. When I select that bookmark twenty tabs spring open, and I just close each one as I read finish reading it.

Bookmarksync (v0.3.3) doesn't know about tab-group bookmarks and instead converts them to just folders full of individual bookmarks. I fired up my C skills for the first time in a very long while and hacked support for bookmark tabs in. I've no idea if I did it well or not, but it works now and that's all that matters to me.

Attached is the patch file and README which was submitted to the project.

foldergroup-patch--bookmarksync-0.3.3.tar.gz

University of MN Magic Number Guessing

Back when I started at the University of Minnesota in 1995 the course registration system was terminal/telnet based. Students would register using a clumsy mainframe-style form interface. When a class a student wanted was full or required unsatisfied prerequisites, the student come supplicant would go to the department to beg for a "magic number" which, when input into the on-line registration system, would allow him or her admission into the course.

Magic numbers were five digits long and came pre-printed in batches of about sixty when provided to departmental secretaries. For each course there existed a separate printed list of magic numbers. As each number was handed out to a student it was crossed off the list, indicating that they were single-use in nature.

As getting one's schedule "just so" was nearly impossible given the limited positions in some courses, and if I recall correctly being particularly frustrated that the only laboratory session remaining open for one of my courses was late on Friday afternoons, I set out to beat the magic number system.

The elegant solution would have been to find the formula used to test a five digit number against the course information to see if it was a match. This, however, presupposes that there existed an actual test and not just a list of sixty numbers for each course. Given than the U of MN had 1000s of courses it's certainly hoped that they didn't design a system requiring the generation and storage of 60,000 numbers, but one never knows. A day spent playing Bletchley Park with previously received magic numbers and their corresponding course numbers found no easily discernible pattern, and given the lack of certainty that there even was one I decided to move on.

A five digit magic number leaves only 100,000 possible options. With at least sixty available per course that's a one in 1,666 chance per guess. Given average luck that's only 833 expected guesses before a solution hits. Tedious when done manually, but no problem for a script.

At the time, Fall 1997, my script-fu was weak, but apparently sufficient. I used Perl (poorly) to create a pair of scripts that allowed me to login, attempt to register for a course, and then kick off a number guesser. In case the registration system had been programmed to watch for sequential guesses, I pre-randomized all 100,000 possible magic numbers and tried them in that order. Given that they didn't even bother to watch for thousands of failed guesses in a row this was probably overkill, but better safe than sorry.

The script worked. My friends and I got our pick of courses for the next few quarters, and despite my boastful nature news never made it back to the U that such a thing was occurring. We only stopped using the system when the telnet based registration was retired in favor of a web based system. Knowing what I now do about automating HTTP form submissions, the web based system would likely have been even easier to game.

The biggest glitch in the system was the fact that magic numbers were single use. Whenever I "guessed" a magic number that was later given by the department to a student, that student's number wouldn't work. However, being given non-working magic numbers was a fairly regular occurrence and certainly not a cause for further investigation on the part of the department. Indeed, the frequency with which my friends and I were given non-working magic numbers leads one to wonder if others weren't doing exactly as we were either using scripts, manual guessing, or by riffling the secretaries' desks.

I've attached a screen-shot of the script in progress from an actual course registration in 1998. Also attached are all the files necessary for use of the original script though since the target registration system is long gone they're only of historical interest. Looking at the code now, I'm really embarrassed at both the general style and the overall design. The open2 call, the Expect module, or at least named pipes would have made everything much cleaner. Still it worked well enough, and I never got caught which is what really matters.

magic-number.gif

magic-number.tar.gz

Comments


Doh, had no idea those attachment were as big as they were. Sorry 'bout that. -- Ry4an

Hang on a sec...you're using Emacs in that screenshot! -- Luke Francl

I know, I didn't see the light and switch to vi until 1998. Goes to show you're never too late to repent. -- Ry4an

Turn Sequence Diagram for Kill Doctor Lucky

Cheap Ass Games (http://cheapass.com) make a lot of great games which are available for very little money. A perennial favorite is Kill Doctor Lucky which is sort of like Clue, in reverse, and with a better sense of humor. It's an easy game to learn from the rules, but no one wants to read the rules. I've found that putting this turn sequence diagram I whipped up in front of a first time player augmented with a quick explanation of killing and failure is enough to get someone from zero to playing in 30 seconds. Maybe the computer geeks with whom I hang out are more into state diagrams than the general populace, but it works for us.

The attached image shows the diagram. The attached archive contains the source diagram, a .pdf version, and a the same image.

doctor-lucky.png

doctor-lucky.tar.gz

Comments


Sometimes the best things to use are the simplest ones, this works perfectly and takes any guesswork out of the equation for a new player. Now if I only one of the for my life, that would be truly useful! -- Louis Duhon

Dude, we made one for your life... it's blank -- Gabe Turner


Heh. (hate you!) -- Ry4an

Making Driftnet work in Webcollage in Xscreensaver

EtherPEG (http://www.etherpeg.org/) is software for the Mac the listens in on local network traffic, identifies any images being downloaded, and displays them. Driftnet (http://www.ex-parrot.com/~chris/driftnet/) is Linux software that does the same thing, but offers better command line integration. Xscreensaver (http://www.jwz.org/xscreensaver/) is the screen saver/console locking program I use to keep people from using my laptop when I'm getting a refill at my local coffee shop.

Xscreensaver has a display mode called 'webcollage' (http://www.jwz.org/webcollage/) that can use driftnet to show modified images from the network as the screen saver display. So when I'm away from the laptop it pops up pictures from all the websites that everyone else on the wireless network is looking at. At least in theory it does. Actually, I couldn't get it to do that at all.

I'm sure there's a single, simple thing I was missing to make xscreensaver use webcollage use driftnet, but I couldn't find it if there was one. Xscreensaver kept starting multiple copies of driftnet and never finding the images they captured. I had a blank screen saver and a lot of run-away processes. I'm sure jwz, Xscreensaver's author, could've told me what I was doing wrong, but people who ask him questions tend to get found hanging from meat hooks in his club.

I ended up hacking the xscreensaver config and webcollage script to get things working. I've attached a tarball containing instructions and a patch in case anyone else is having the same problems I did.

driftnet-webcollage-xscreensaver.tar.gz

Comments


That is actually very cool, every come back to some questionable content for a public place? -- Louis Duhon

Mostly just friendster stuff, cnn, and sports. You should try out etherpeg, the mac version, it's definitely fun. -- Ry4an

Drink Recipe Fortune File

I like mixing drinks, despite having no real skill for it. The commercial bar-tending courses seem to rely extensively on flashcards for the learning of drink recipes, so I though a UNIX fortune file of drink recipes would be a natural fit for learning.

Unfortunately, I couldn't find any good, public drink listings in a format I could parse. Eventually I found one in CVS from this project: http://drinkmixer.sourceforge.net/ It's not huge, but at least its a start and easily added to.

Output looks like:

[ry4an@home ~]$ fortune drinks Blue Blazer (serve in Old Fashioned Glass)

  • 1 1/2 oz. Scotch Whiskey
  • 1 1/2 oz. Boiling Water
  • 1 tsp. Sugar

Instructions: WARNING USE EXTREME CAUTION! Get two silver plated mugs with handles. pour Scotch and boiling water in one mug. Ignite the liquid, and while blazing pour rapidly back and forth from one mug to another. Pour into old-fashioned glass with sugar.

The nature of the fortune command also makes the recipes search-able using the -m option like this:

[ry4an@home ~]$ fortune -i -m 'Rob Roy' drinks (drinks) % Rob Roy (serve in Cocktail Glass)

  • 1 1/2 oz. Scotch
  • 3/4 oz. Sweet Vermouth
  • 1 dash Orange Bitters

Instructions: Shake well with cracked ice. Strain into cocktail glass. %

The fortune file is probably of no use to anyone actually taking bar-tending classes as everyone's recipes are always a little different, but for the rest of us it's a nice little learning tool.

The fortune file, corresponding .dat index file, source .csv file, and munging script are found in the attached archive.

drinks-fortune-1.0.tar.gz