Previously I mentioned I was importing the full corpus of Boing Boing posts into MongoDB, which went off without a hitch. The import was just to provide a decent dataset for trying out Rogue, the Mongo query DSL from the folks at Foursquare. Last weekend I was in New York for the Northeast Scala Symposium and the Foursquare Hackathon, so I took the opportunity to finish up the query part while I had their developers around to answer questions.
In the end, though, there was very little to do. I just had to define a record class to represent a Boing Boing post:
    class Post extends MongoRecord[Post] with MongoId[Post] {
      def meta = Post
      object comment_count extends LongField(this)
      object venuename extends StringField(this, 255)
      object basename extends StringField(this, 255)
      object author extends StringField(this, 255)
      object title extends StringField(this, 255)
      object body extends StringField(this, 86000)
      object categories extends MongoListField[Post, String](this)
      object created_on extends DateTimeField(this)
    }
and the rest took care of itself. After a few imports I could query the posts directly from the Scala REPL:
    scala> (Post where (_.author eqs "Cory Doctorow") fetch).length
    res0: Int = 27701

    scala> val z = Post where (_.author eqs "Cory Doctorow") and (_.categories contains "History") fetch
    z: List[org.ry4an.boingboingrogue.Post] = ...

    scala> z.map(_.title.toString)
    res1: List[java.lang.String] = List(Bailout costs more than Marshall Plan, Louisiana Purchase, moonshot, S&L bailout..
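For readers without Rogue or MongoDB at hand, the second query can be read as a filter with two conditions joined by a logical AND. The sketch below mimics that with plain Scala collections. It is only an in-memory analogue, not Rogue's API: `PostRow`, `InMemoryQuery`, `historyBy`, and the sample data are all hypothetical names invented for illustration, and the real query runs against the database rather than a local list.

```scala
// Plain-Scala stand-in for the Rogue query above; no MongoDB or Rogue involved.
// PostRow and the sample rows are hypothetical and exist only for illustration.
final case class PostRow(author: String, title: String, categories: List[String])

object InMemoryQuery {
  val posts: List[PostRow] = List(
    PostRow("Cory Doctorow", "Sample history post", List("History", "Politics")),
    PostRow("Cory Doctorow", "Sample gadget post", List("Gadgets")),
    PostRow("Another Author", "Another history post", List("History"))
  )

  // Rough in-memory equivalent of:
  //   Post where (_.author eqs author) and (_.categories contains "History") fetch
  def historyBy(author: String): List[PostRow] =
    posts.filter(p => p.author == author && p.categories.contains("History"))

  def main(args: Array[String]): Unit =
    // Print the titles of the matching posts, one per line.
    historyBy("Cory Doctorow").map(_.title).foreach(println)
}
```

The appeal of the DSL is that the same two conditions are expressed against typed fields of `Post`, so a misspelled field name fails at compile time instead of silently matching nothing.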
What code there is is up in the repository at Bitbucket. Thanks to @jorgeortiz85 for on-site help and to the Foursquare folks for a tool that's as easy to use as it looks.
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Generic License.
©Ry4an Brase