Wiki History Overlay

Wikis, like ry4an.org, are websites meant to be easily edited. One simply clicks the edit button, changes the content, and poof, the page is changed. One of the most famous wikis is Wikipedia, the free encyclopedia. It's a wonderful resource and chock-full of information. Unfortunately, due to its anyone-can-change-it-at-any-time freedom, some folks are hesitant to consider it a reliable reference.

Wikipedia's documented accuracy is largely due to careful edit policing by interested persons. I could go change the date of Abraham Lincoln's birthday right now, but someone monitoring the changes would detect the "vandalism" and revert the change in minutes. Sadly, anyone viewing the Abraham_Lincoln page between my edit and the repair would see the wrong birthday.

Wikis also have great history features. One can look at every old version of every page and see the when, what, and who of every change. That's usually enough to identify intentional vandalism. The history information, however, lives on a separate page: it's easily available, but it's not presented with the primary content.

If, instead, the primary page text were altered to indicate where recent edits were made, that data could be identified as suspect and the rest could be assumed to be well vetted. I've done a mock-up of such a display below:

wikipedia.png

Using text background colors closer to full red to indicate newer data, we can see that the year has been very recently changed, that a phrase in paragraph one was changed second most recently, and that the last two paragraphs were added individually and in reverse order.

There are a variety of schemes one could use to color text, including:

  • last three edits
  • only edits made in the last week
  • all edits colored by calendar age
  • all edits colored by number of subsequent edits

What's more, the history information could be placed around the edited portions using HTML span tags styled with CSS, allowing the color rendering mode to be toggled or entirely disabled by the reader while viewing.
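As a rough illustration of the span idea, here is a minimal Python sketch that wraps edited chunks in spans and generates a matching stylesheet in which newer edits sit closer to full red. The function names, the `age-N` class names, and the "rank" convention (0 is the most recent edit, `None` is old text) are all assumptions made for this example, not part of any wiki software.

```python
from html import escape

def age_color(rank, tracked):
    """Color for an edit rank: 0 (newest) is full red, older ranks fade toward white."""
    fade = round(255 * rank / tracked)
    return "#ff{0:02x}{0:02x}".format(fade)

def stylesheet(tracked=3):
    """CSS rules for the last `tracked` edits; drop this sheet to disable coloring."""
    return "\n".join(
        "span.age-{} {{ background: {}; }}".format(rank, age_color(rank, tracked))
        for rank in range(tracked)
    )

def render(chunks):
    """chunks is a list of (text, rank) pairs; rank None means old, unmarked text."""
    parts = []
    for text, rank in chunks:
        if rank is None:
            parts.append(escape(text))
        else:
            parts.append('<span class="age-{}">{}</span>'.format(rank, escape(text)))
    return "".join(parts)
```

Because the colors live in the stylesheet rather than in the spans themselves, a reader could switch coloring schemes, or turn them off, by swapping or removing that one block of CSS.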

The CPU usage required to generate the history spans over the document may be quite intensive, especially if more than just the last few edits are included. Pre-caching and other web trickery would help. One could even make an external viewer or perhaps even a Greasemonkey script to insert the spans without burdening the wiki's server as much. External solutions for Wikipedia could use the database dumps.
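To give a feel for where that work goes, the sketch below attributes each word of the current page to the revision that introduced it, by diffing every revision against the one before it with Python's standard `difflib`. Splitting the page on whitespace and the `blame_words` name are assumptions for the example; a real tool would also have to cope with wiki markup, moved text, and reverts.

```python
from difflib import SequenceMatcher

def blame_words(revisions):
    """revisions: list of page texts, oldest first.
    Returns (word, revision_index) pairs for the newest text."""
    words = revisions[0].split()
    origin = [0] * len(words)
    for idx in range(1, len(revisions)):
        new_words = revisions[idx].split()
        new_origin = []
        matcher = SequenceMatcher(a=words, b=new_words, autojunk=False)
        for tag, a1, a2, b1, b2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged words keep whatever revision they came from.
                new_origin.extend(origin[a1:a2])
            elif tag in ("replace", "insert"):
                # New or rewritten words belong to this revision.
                new_origin.extend([idx] * (b2 - b1))
            # "delete" opcodes add nothing to the new text.
        words, origin = new_words, new_origin
    return list(zip(words, origin))
```

Each revision costs one diff against its predecessor, so the work grows with the number of revisions considered, which is why limiting the overlay to the last few edits, or caching the result, matters.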

Somewhat related: Some folks at IBM did a very cool project wherein they generated activity graphs for Wikipedia pages, but again that's on a separate page from the article, not embedded within it. It does, though, produce some pretty pictures.

Fixing the Roomba Circle Dance

My Roomba had been on the fritz lately. When I powered it on it went forward a few inches and then started backing up in a tight circle. I figured it was a dirty sensor, but I cleaned everything I could see and had no luck.

My coworker Brandon pointed me to the Circle Dance website, which explains how a dirty internal sensor can cause just that problem. I've got an older Roomba, but the wheel assembly seemed the same. The site has great instructions and photos showing how one can fix the problem. They do, however, go through incredible contortions, including removing 10 screws and a hard-to-replace panel, just to remove a single screw.

I found I could skip all that by drilling a small hole in the fender rather than removing it. Given that replacing the fender is so difficult that the original site recommends not bothering, I think a hole is an acceptable level of resulting cosmetic defect.

This image shows just where the hole was made:

roomba-hole.jpg

...and now the Roomba works great again.

KateAndRy4an.org

Kate Bauer and I put together the vanity/informational website for our wedding. Those joining us will find information on travel and hotels. Note the snazzy embedded Google map on the bottom of the hotels page. Thanks go to Kate for writing most of the content and putting up with my insistence on hand-edited HTML.

The site also links to our engagement photos, where you can see how very lucky I am.

Improving Nick Tracking using String Similarity

Years back I wrote an IRC nick tracking script. It's served me well since then, but it has one major annoyance. When people changed their name slightly it would remember that name change, even though the old/new mapping didn't contain any real identity change information.

For example, when Gabe_ became Gabe it would display every message from him as <Gabe_(Gabe)>. That doesn't tell me anything interesting about who Gabe is.

I decided to tweak the tracker to ignore small changes in names. Computers don't think in terms like "small"; they need a way to quantify difference and then see if it exceeds a specified threshold. Fortunately, lots of people have worked on just that problem -- mostly so that spell checkers can present you with a list that's close to the non-word you typed.

When I've worked with close-enough strings in the past I've used the Levenshtein distance as implemented in the String::Approx module or the ancient Soundex algorithm. This time, however, I tried out the String::Trigram module as written by Tarek Ahmed, which implements the method proposed by Angell in this paper. Here's an explanation from String::Trigram's README file:

This consists of splitting some string into triples of
characters and comparing those to the trigrams of some other string. For
example the string kangaroo has the trigrams "{kan ang nga gar aro
roo}". A wrongly typed kanagaroo has the trigrams "{kan ana nag aga gar
aro roo}". To compute the similarity we divide the number of matching
trigrams (tokens not types) by the number of all trigrams (types not
tokens). For our example this means dividing 4 / 9 resulting in 0.44.

Thus far, at a 50% match threshold it's never failed to detect a real change or ignore a minor change, and if it does I should just be able to notch the match threshold higher or lower. Great stuff.
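For anyone who wants to play with the idea outside Perl, here is a minimal Python sketch of the comparison described in the README excerpt. It is a simplification, not a port of String::Trigram: it counts distinct trigrams on both sides and ignores the module's tuning options, and the `is_minor_change` helper with its 0.5 default is just my threshold restated as an assumption.

```python
def trigrams(s):
    """Set of all three-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams in either string."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        # Both strings are too short to have trigrams; fall back to equality.
        return 1.0 if a == b else 0.0
    return len(ta & tb) / len(ta | tb)

def is_minor_change(old_nick, new_nick, threshold=0.5):
    """True when the two nicks are similar enough to be the 'same' name."""
    return similarity(old_nick.lower(), new_nick.lower()) >= threshold
```

With the README's example, `similarity("kangaroo", "kanagaroo")` works out to 4 / 9, and `is_minor_change("Gabe_", "Gabe")` is true because two of the three distinct trigrams are shared.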

The modified script can be viewed here and downloaded here.

Comments


If you wanted to only track nick changes in certain channels you'd add code like this at line 86:
return unless grep /^$chan$/, qw(#channelone #channeltwo #channel3);

I've modified 1.1 with a new /function, trackchan, that allows one to manage a list of channels where they want nick tracking to take place. If the list is empty, tracking will be done in all channels. The following is a unified diff.

What it doesn't do:

  1. Check to make sure that the channel you're passing in actually conforms to any standard channel naming conventions.
  2. Check to see if the channel already exists in the list before trying to remove it (though thanks to it just being a simple grep, no error is returned in any case).
  3. Check to see if you're adding a duplicate channel to the list (feel free, it doesn't affect the functionality one bit).
  4. Have an option for printing the channel list. I think I will modify it to just print the channel list in addition to the usage if /trackchan is called with no arguments.

-- Gabe

--- nick-track.pl.orig  Thu Dec 22 10:37:34 2005
+++ nick-track.pl.trackchan     Thu Dec 22 14:50:30 2005
@@ -22,7 +22,7 @@
 use Irssi;
 use strict;
 use String::Trigram;
-use vars qw($VERSION %IRSSI %MAP);
+use vars qw($VERSION %IRSSI %MAP @CHANNELS);

 $VERSION = "1.1";
 %IRSSI = (
@@ -47,6 +47,7 @@
     'Asrael' => 'Sammi',
     'Cordelia' => 'Sammi',
 );
+@CHANNELS = qw();

 sub call_cmd {
     my ($data, $server, $witem) = @_;
@@ -84,6 +85,13 @@
     my ($chan, $nick_rec, $old_nick) = @_;
     my $nick = $nick_rec->{'nick'};

+    # If channel list is empty, track for all channels.
+    # If channel list is non-empty, track only for channels in list.
+    my $channels = @CHANNELS;
+    if ($channels > 0) {
+       return unless grep /^$chan$/, @CHANNELS;
+    }
+
     if (defined $MAP{$old_nick}) {  # if a previous mappings exists
         if (String::Trigram::compare($nick, $MAP{$old_nick},
                 warp => 1.8,
@@ -101,6 +109,34 @@
         }
     }
 }
+
+sub trackchan_cmd {
+    my ($data, $server, $witem) = @_;
+    my ($cmd, $channel) = split ' ', $data;
+    my @cmds = qw(add del);
+
+    unless (defined $cmd && defined $channel && grep { $_ eq $cmd } @cmds) {
+        print "Usage: /trackchan [add|del] #channel";
+        return;
+    }
+
+    if ($cmd eq 'add') {
+        push @CHANNELS, $channel;
+        print "$channel added to channel list";
+    }
+
+    if ($cmd eq 'del') {
+        @CHANNELS = grep(!/^$channel$/, @CHANNELS);
+        print "$channel removed from channel list";
+    }
+
+    print "Current channel list:";
+    foreach my $channel (@CHANNELS) {
+        print "    $channel";
+    }
+}
+
+Irssi::command_bind trackchan => \&trackchan_cmd;

 Irssi::signal_add("message public", \&rewrite);
 Irssi::signal_add("nicklist changed", \&nick_change);

Thanks, Dopp, great stuff! -- Ry4an

Linux on the Dell X1

Yesterday I got the warranty replacement machine for my (company's) Dell X300 laptop. Dell mailed me an X1, which seems a nice enough machine. It meets my firm criteria: under 3 lbs and thinner than an inch. If Apple would hit those numbers I'd be there in a second.

Unfortunately, it looks like getting Linux onto this thing is going to be a pain. Emperor Linux will sell an X1 with Linux pre-installed, but they want $450 to take the X1 I already "own" and put Linux onto it. If they're not able to simply mirror a debugged installation over, that says a lot about their volume. I value my time pretty highly, but $450 for a software install seems extreme.

Fortunately there are plenty of pages detailing how to get Linux running on the X1. I'll muddle through the process and attach my notes as comments.

Comments


I've found that to boot Knoppix using the external optical drive I need to use this boot time invocation:

knoppix fromhd=/dev/uba

Now to try qtparted to squish down the NTFS Windows partition to something reasonable.


Using ntfsresize and fdisk I was able to squish the Windows install down to 10GiB. Now I'm just waiting for my Fedora Core 4 DVD to arrive for install.

Fedora Core 4 installed from the DVD without a hitch. I had to download the ipw2200 firmware RPM to make the wireless work. The 855resolution utility as invoked from rc.local and tweaked in xorg.conf got the resolution notched up. Next up... ion.

Email Sub-Address Spam Frequency

My email server is configured such that email to ry4an-anything@ry4an.org gets correctly delivered to me. The dash and whatever is after it are retained but ignored completely.

When I give an email address to a company, say Northwest Airlines, I'll give them an address that shows to whom it was given, say ry4an-nwa@ry4an.org. By doing this I'm able to check which companies are giving/selling/leaking my email address to spammers. Some of the leaks are surprising -- just a few weeks after I first gave out ry4an-philmont, and only to the Boy Scouts, I started getting porn spam on it. When I called to let them know about the leak they assured me it was impossible.

Last month I decided to save all of my inbound spam and run some totals to see which sub-addresses got the most spam. Here are the counts:

  • 6427 total spam messages to ry4an.org in 34 days
  • 679 spam messages to plain ry4an@ry4an.org
  • the 10 most spammed sub-addresses were
    Received  Address           Given to
    2542      ry4an-slashdot    Posted to http://slashdot.org
    252       ry4an-dip         Used in the Diplomacy community
    159       ry4an-resume      On my resume
    141       ry4an-yahoo       Given to yahoo.com
    125       ry4an-cnet        Byline for some articles I wrote
    98        ry4an-oldenburg   Defunct Oldenburg project
    88        ry4an-poker       Used at https://ry4an.org/poker/
    84        ry4an-tclug       Given to the Twin Cities Linux Users' Group
    62        ry4an-dns         Used for all my domain registrations
    44        ry4an-keysigning  Posted at https://ry4an.org/keysigning/

So it looks like the worst offenders aren't the companies to whom I've given my email address, but rather the addresses I've let get posted on the internet for automated crawlers to harvest.
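The tallying itself is simple. As a sketch, assuming you've already pulled the recipient address out of each saved spam message into a list (that extraction step, and the function names here, are hypothetical and not what I actually ran), the counting could look like this in Python:

```python
from collections import Counter

def sub_address(recipient):
    """Return the part of the local name after the first dash, or None if plain."""
    local, _, _domain = recipient.strip().lower().partition("@")
    _user, dash, tag = local.partition("-")
    return tag if dash else None

def tally(recipients):
    """Count messages per sub-address; plain addresses are grouped as '(plain)'."""
    counts = Counter()
    for recipient in recipients:
        counts[sub_address(recipient) or "(plain)"] += 1
    return counts
```

Calling `tally(recipients).most_common(10)` then gives a top-ten list like the one above.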

Gmail users: You can do the same thing using the plus sign.

Comments


Yeah, the one I gave to United actually garners me the most spam. I emailed them to complain but was brushed off relatively quickly. -- Anonymous

Pocket Pair Palsy

A little googling shows I'm the first person to (publicly) coin the phrase pocket pair palsy to describe the adrenaline-powered tremors poker players get when they've got a good hand. Dibs.

Comments


I would argue that palsy is not the best condition to compare this to. While involuntary movement is occasionally the result of palsy, paralysis is more common and likely. Some definitions of palsy do not even include tremor-like movements.


I know that we both know people who have dealt with Bell's Palsy, which is indeed primarily paralysis, but most people still associate palsy more with tremors than with paralysis. Besides, St. Vitus's Hole Cards just doesn't have the alliteration thing going for it. -- Ry4an

Novelty Keg Scale

At the Halloween party we got a keg of Newcastle, but most folks went for the mixed drinks or the bottled beer. We kept trying to steer folks toward the keg, but it's hard to get people behind a project that's not providing good status reports.

That got me thinking that one could make a simple scale showing a gas-tank style empty to full scale for a keg of liquid. Googling for keg scale turned up some products for bar owners, but nothing consumer focused:

t_19874

If instead you could start with a mechanism similar to that found in these scales:

52564

you could provide interchangeable faceplates for the standard keg gross and tare weights.

People rally around steadily progressing meters -- ask anyone who's been involved with a fundraiser/telethon. I suspect that, produced cheaply enough, something like this would sell to frats and the like.
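The arithmetic a faceplate would encode is just the measured weight's position between tare (empty keg) and gross (full keg). Here it is as a small Python sketch; the function names are made up for the example, and any weights you feed it would have to come from the actual keg's specs.

```python
def fill_fraction(measured, tare, gross):
    """Fraction of the keg remaining, clamped to the range 0.0 to 1.0."""
    if gross <= tare:
        raise ValueError("gross weight must exceed tare weight")
    fraction = (measured - tare) / (gross - tare)
    return max(0.0, min(1.0, fraction))

def gauge_label(fraction):
    """Nearest gas-tank style mark for a fill fraction."""
    marks = ["E", "1/4", "1/2", "3/4", "F"]
    return marks[round(fraction * 4)]
```

A keg reading exactly halfway between its tare and gross weights gives a fraction of 0.5 and a gauge label of "1/2".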

Using Mutt to Automate Mailman Message Rejections

Since this post I've upgraded my mailman installation to a newer version, which allows me to automatically reject messages from non-subscribers without having to resort to external scripting.

However, some of the mailing lists I run have a significant number of members who can't be counted on to post from the email address with which they subscribed, or indeed to even understand what that means. For those lists a policy that automatically rejects messages from non-members is just too draconian. Unfortunately, that means the few spam messages a day from non-members that make it past my filters, and that would otherwise be rejected automatically for their non-member origins, have to be manually discarded so that I can approve the few non-member messages per month that really do belong on the list.

Relying on mailman's email control interface (as differentiated from its web control interface) I was able to craft the following mutt macro to make the rejecting of undesirable non-member messages a single keystroke affair:

macro index X ":set editor=touch^Mv/confirm^Mryqd:set editor=\"vim -c 'set nocindent' -c 'set textwidth=72' -c '/^$/+1' -c 'nohlsearch'\"^M"

When the 'X' key is pressed the message editor is set to the UNIX 'touch' command, which represents absolute minimal message editing. The rest of the macro replies to the confirmation subpart of the mailman message, which indicates rejection to mailman's email control interface. After the reply is sent the message editor is set back to its usual value (vim).

Halloween 2005

Last weekend Bridget, Joe, and I threw our annual Halloween party. It was well attended and everyone seemed to have a good time. It peaked around midnight with a good 50 people inside and out, which is about the same as last year.

This year we did a little more with the decor, including building a coffin cut-out and a few corpses. The walls got covered with cheesy off-the-shelf decorations that drew more praise than anything else -- go figure.

15.jpg

Photos are starting to get posted by various attendees, and I'll post them here as more appear. Not all photos in all sets are from our party, but a goodly fraction are:

Thanks to all who attended,