January, 2009


26
Jan 09

Chatsworth: A Google Talk group chat bot

For the past year and a half, I’ve been using PartyChat to participate in multi-user chats with friends. PartyChat essentially provides IRC-type functionality via GTalk and has been a great way to have ongoing conversations throughout the day while avoiding 100+ email threads.

The only pitfall PartyChat has is stability. PartyChat is a free service used by a large number of people, and it is run off the project creator’s computer in his apartment. I do not know exactly why the service goes down intermittently, but the recipe of home server plus lots of users cannot be helping. While it stinks that PartyChat isn’t always up, I understand I have no right to demand or expect 100% uptime, so I decided to do something about it. I leveraged some of the knowledge I picked up about XMPP, the protocol that Jabber and GTalk use, from another project and wrote a simple group chat bot called Chatsworth to improve the availability of the group chat that my friends and I use and depend on.

Chatsworth is a Windows service written in C# that provides basic chat room functionality. It does not have all the features that PartyChat has, but it does offer people the ability to set up and manage their own chat bots. Additionally, if you are concerned about having all your chat logs available not just to Google but also to the people running the PartyChat servers, then Chatsworth is the group chat provider for you. Chatsworth is fully functional but still immature and under development. I plan to add some additional features, take care of a few loose ends, and provide more unit tests in the immediate future.

Admittedly I could’ve just downloaded the PartyChat Java source, compiled it, and run it on my own set of servers, but where is the fun in that? I figured rolling my own chat bot would give me the opportunity to do more than address some of the availability issues I was having. Starting up Chatsworth gives me a non-trivial project that I can use to explore different software development concepts and techniques as well as get some code and design samples out on the web. So if you’re in the market for group chat in GTalk, give Chatsworth a try. I am looking forward to building this project out further and would love to hear any feedback.


11
Jan 09

Just Say No (to the Big Redesign)

In a post Robert Martin made last Friday, he described a pretty common scenario that companies and teams go through in an effort to bring peace and harmony to their codebase and restore order over all their components, known as the big redesign.  Having just gone through this at work and having been on the “Tiger Team” that ported/rewrote our legacy application shell that was a muddled mix of C++/MFC and C#/WinForms, this post really resonated with me.

The moral of Uncle Bob’s story is that the only way to clean up messes in your codebase is to do so via incremental changes, iteration by iteration and release by release. If I had read this post six months ago, I might have argued that this is an overly simplistic take and that exceptional cases exist that make big redesigns necessary. Having just gone through a big redesign, however, I am confident that fixing your messes iteration by iteration is the only way to fix your codebase issues for the long term, especially if your team is trying to develop with agile processes.

My redesign change of heart did not come because of a spectacular failure or the system ending up in worse shape than before. I like to think our sob story about how terrible our legacy codebase was made a pretty compelling case for a redesign (doesn’t everyone?) but I’ll spare you the back story. When it comes down to it, even the best reasons and intentions don’t justify a full-blown redesign. Here is why in big bold letters:

You are not addressing the root causes

A codebase does not fall into disrepair overnight. Having the Tiger Team do all the heavy lifting and make all the design decisions might solve the problem right this second, but none of the problems that got you in this bind will have been addressed. Also, if the redesign is radical enough, you could actually be reducing your team’s ability to manage and maintain your codebase even further. Choosing to solve your team’s problems by delegating cleanup to a Tiger Team is the software management equivalent of giving a man a fish instead of teaching him how to fish.

Having a big redesign short-circuits everything that the agile process tries to address. If your team is ignoring the feedback it is getting from brittle code, missed user story commitments, and rising incident counts, then what is the redesign buying you other than masking these problems for your team a little longer? If your team is considering doing a big redesign, do yourself a favor and address the real problems your team has. Just say no to the big redesign.


10
Jan 09

Crunching some NFL Stats with F#

To explore functional programming, I’ve decided to return to a familiar problem domain, football stats. I used this domain a couple years ago when I was in the process of making the transition from the Unix-based OS/Java world to the Microsoft/C# world. I am the type of person that learns better by doing than studying, so I’m going to try and jump in and cobble something together to start the learning process. I’ve watched the PDC presentation by Luca Bolognese, and I’ve read through the first couple chapters of Don Syme’s Expert F#, so consider me armed with an F# Interactive window and dangerous.

The first stat that I plan to look at is the QB Score Stat as outlined by Berri, et al. in Wages of Wins.  The stat is much easier to calculate than the traditional QB Rating used by the NFL, and if you read the link and/or book, you’ll see that it correlates much better to wins and points than QB rating. For our purposes, I’ll just outline the formula here but I do recommend checking out the links for more info.

QB Score = Total Yards – 3 * Plays – 30 * Turnovers

I got the 2008 QB stats from Yahoo, dumped them into Excel and then saved them off into CSV. This can be done programmatically fairly easily with HtmlAgilityPack and LINQ to XML, but I’ll save that for another post. I’ve provided a copy of the stats in CSV here.

So to get started, here is what we have to do in order to calculate the raw QB score and the QB score per play for all the NFL QBs:

  1. Read in the CSV file
  2. Grab the relevant stats for our calculation
  3. Calculate the QB score per play for each QB
  4. Return the QB name and the score.

I’ll tackle this step by step and we can verify our results via the F# Interactive window.

To read in the file we can leverage the .NET System.IO library. The call pattern to read a file into memory is identical to what you would see in C# or VB and is pretty straightforward.

    #light
    open System.IO
    let filePath = "D:\code\data\QB_Stats_2008.csv"
    let stream = new FileStream(filePath, FileMode.Open)
    let reader = new StreamReader(stream)
    let csv = reader.ReadToEnd()

Here is the output of the F# Interactive window.
val filePath : string

val stream : System.IO.FileStream

val reader : System.IO.StreamReader

val csv : string
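As a side note, the snippet above never closes the stream or the reader. A minimal sketch of one way to tidy that up is below; `readAll` is a hypothetical helper name that is not part of the original code, and it relies on `use` to dispose the reader when the function returns.

```fsharp
open System.IO

// Hypothetical helper (not from the original post): read a whole text file.
// `use` disposes the StreamReader, and the file handle it owns, on exit.
let readAll (path: string) =
    use reader = new StreamReader(path)
    reader.ReadToEnd()

// Usage would look like this, with filePath defined as in the post:
// let csv = readAll filePath
```

For a file this small, `File.ReadAllText` from the same namespace would do the same job in a single call.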

As we can see from the output, ‘csv’ is a string that holds the contents of the QB stats file. Since we know that the file is a CSV file, we can break it down into its individual elements like so:

    let stats =
        csv.Split([|'\n'|])
        |> Seq.skip 1
        |> Seq.map(fun line -> line.Split([|','|]))
        |> Seq.map(fun values ->
            string values.[0], // qb name
            System.Int32.Parse(values.[5]), // att
            System.Int32.Parse(values.[7]), // pass yds
            System.Int32.Parse(values.[11]), // int
            System.Int32.Parse(values.[12]), // rushes
            System.Int32.Parse(values.[13]), // rush yds
            System.Int32.Parse(values.[17]), // sacks
            System.Int32.Parse(values.[20])) // fumbles lost

Since ‘csv’ is a string, we can use the Split method to chunk it into individual lines, using the ‘\n’ character as our split token. Once the string is split into lines, the pipeline operator on line 3 passes them along for further processing. Sequences in F# can be thought of as IEnumerables from C# and come with some nice baked-in functions to help with processing. The first line of our QB stats CSV file is a key to the data, so we need to skip it before processing the real data, and one of those baked-in functions (Seq.skip) does exactly that.

Line 4 further deconstructs the CSV file, tokenizing each line into its individual comma-delimited values. After the lines have been tokenized, the individual values can be read. Here I’ve collected each line’s values in a tuple that holds 8 values. The mapping of the values is specified by the comments.

Here is the output of the F# Interactive window after step 2:

val stats : seq<string * int * int * int * int * int * int * int>
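The same split, skip, and map pipeline can be tried in F# Interactive without the real file. The sketch below is only an illustration: the four-column layout and both rows in `sample` are made up, and `parse` is a hypothetical name. It also adds two defensive touches the code above does not have, dropping empty lines and trimming a trailing '\r', since either would make Int32.Parse throw.

```fsharp
open System

// Made-up stand-in for the real file: name, attempts, yards, interceptions.
// One row ends in "\r\n" and the text ends with a newline, on purpose.
let sample = "Name,Att,Yds,Int\nQB One,400,3000,10\r\nQB Two,350,2500,12\n"

// Hypothetical helper: header row skipped, each data row turned into a tuple.
let parse (csv: string) =
    csv.Split([|'\n'|], StringSplitOptions.RemoveEmptyEntries) // drop blank lines
    |> Seq.skip 1                                               // skip the header row
    |> Seq.map (fun line -> line.Trim().Split([|','|]))         // Trim removes '\r'
    |> Seq.map (fun values ->
        values.[0],              // qb name
        Int32.Parse(values.[1]), // att
        Int32.Parse(values.[2]), // yds
        Int32.Parse(values.[3])) // int
    |> Seq.toList

let rows = parse sample
```

Converting to a list at the end forces the sequence to be evaluated, so a malformed row fails right away instead of when the sequence is first consumed.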

After step two we have a sequence of tuples that have only the stats and information that we care about. The next step now becomes calculating the QB score. The calculation of the score requires three sub-steps, so let us revise the outline we laid out earlier to include them.

  1. Read in the CSV file
  2. Grab the relevant stats for our calculation
  3. Calculate the QB score per play for each QB
    1. Create the formula function
    2. Compute the components of the formula
    3. Create the desired output
  4. Return the QB name and the score

Let’s tackle the first sub-step and codify the formula now and see what we’ll need to provide from the data we just acquired.

    let qbcalc (plays,yards,turnovers) = yards - 3 * plays - 30 * turnovers

This line of code creates a function called qbcalc that takes in a tuple composed of the plays, yards, and turnovers components of the formula.

If we run the qbcalc function through the interactive window we get:

val qbcalc : int * int * int -> int

The end result of this is the raw QB score. The arithmetic operations in F# are similar to most languages, so the formula is a straightforward expression without any surprises. Since we know plays, yards and turnovers are all integer values, we could further constrain the types of values that the tuple is composed of, but F#’s type inference already does this for us, so it is not needed. When the compiler analyzed this code, it was able to ascertain from the operations and the integers used that the plays, yards and turnover values were of type int and automatically created the int constraints.
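To make the formula concrete, here is a small worked example. The stat line is invented purely for illustration (500 plays, 4000 total yards, 15 turnovers) and does not belong to any real quarterback.

```fsharp
// Same formula as in the post.
let qbcalc (plays, yards, turnovers) = yards - 3 * plays - 30 * turnovers

// Invented stat line: 500 plays, 4000 total yards, 15 turnovers.
let rawScore = qbcalc (500, 4000, 15)        // 4000 - 1500 - 450 = 2050

// Per-play score, converted to decimal before dividing to keep the fraction.
let perPlay = decimal rawScore / decimal 500 // 2050 / 500 = 4.1

printfn "raw = %d, per play = %M" rawScore perPlay
```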

The next step is to compute the individual values of plays, yards, and turnovers. Before we start, I just want to note that I am sure there is a slicker, more concise way to do this, but this is my first go at this, so pardon the mess.

    let names = stats |> Seq.map(fun(name,_,_,_,_,_,_,_) -> name)

Here we start to perform operations on the stats sequence we captured from the CSV file. The basic structure of what I am doing here is grabbing the specific values of the components I am looking to either aggregate (names) or calculate (plays, yards, and turnovers) from the sequence and mapping them to a new sequence. Here is an example of how to create the plays sequence.

    let plays = stats |> Seq.map(fun(_,att,_,_,rush,_,sacks,_) -> att+rush+sacks)

Here the stats sequence is pushed through the pipeline operator ( |> ), which allows you to chain functions in a sequence. This works because, as pointed out in Expert F#, the pipeline operator is just function application in reverse. This can be expressed like so:

    let (|>) x f = f x

So in our case when we have the following:

    stats |> Seq.map (fun(_,att,_,_,rush,_,sacks,_) -> att+rush+sacks)

Chaining the stats sequence with the Seq.map function applies the function we’ve defined in the parentheses to each element of the stats sequence and returns a new sequence with the results. The function’s parameter is a pattern that matches the 8-value tuples that make up the stats sequence. Since each calculation needs only a few of those values, ‘_’ can stand in for the ones we want to ignore, and more meaningful names can be given to the ones we care about. On the right-hand side of the -> (the arrow that separates a lambda’s parameters from its body), we simply add the values. Again, the results of this function are collected in a new sequence that is returned from the Seq.map call.
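A quick way to convince yourself of the "function application in reverse" idea is to run both forms side by side. This is just a sketch: `square`, `oneQb`, and every number in it are made up for illustration.

```fsharp
let square x = x * x

// Ordinary application and the piped form are the same call.
let direct = square 4        // 16
let piped  = 4 |> square     // 16

// Chaining: each stage feeds the next, read left to right.
let sumOfSquares =
    [1; 2; 3]
    |> Seq.map square
    |> Seq.sum               // 1 + 4 + 9 = 14

// Wildcards in a tuple pattern: name only the fields you need.
// Invented 4-value tuple: (name, att, rush, sacks).
let oneQb = [ ("QB One", 400, 30, 20) ]
let playCounts =
    oneQb
    |> Seq.map (fun (_, att, rush, sacks) -> att + rush + sacks)
    |> Seq.toList            // [450]
```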

After all the individual components of the QB score formula have been computed, we’re left with a bunch of individual sequences that need to be reassembled into something that we can pass to the qbcalc function. The calculation function is defined as taking a tuple composed of plays, yards, and turnovers values, so we need to use another function that Seq provides called zip (along with its three-sequence sibling, zip3).

Here is the code that crunches the individual components.

    let getStats =
        let stats = loadQBStats
        let names = stats |> Seq.map(fun(name,_,_,_,_,_,_,_) -> name)
        let plays = stats |> Seq.map(fun(_,att,_,_,rush,_,sacks,_) -> att+rush+sacks)
        let yards = stats |> Seq.map(fun(_,_,passyd,_,_,rushyd,_,_) -> passyd + rushyd)
        let turnovers = stats |> Seq.map( fun(_,_,_,int,_,_,_,fum) -> int+fum)
        Seq.zip3 plays yards turnovers |> Seq.zip names
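The shape that comes out of that last line is easier to see with a couple of tiny lists. The names and numbers below are invented, and the `demo`-prefixed identifiers are hypothetical; the point is only what Seq.zip3 followed by Seq.zip produces, and how fst and snd take it apart again.

```fsharp
// Invented values, one entry per quarterback in each list.
let demoNames     = ["QB One"; "QB Two"]
let demoPlays     = [500; 450]
let demoYards     = [4000; 3100]
let demoTurnovers = [15; 20]

// zip3 builds (plays, yards, turnovers) triples; zip then pairs each with a name.
let zipped =
    Seq.zip3 demoPlays demoYards demoTurnovers
    |> Seq.zip demoNames
    |> Seq.toList
// [("QB One", (500, 4000, 15)); ("QB Two", (450, 3100, 20))]

// fst and snd pull a pair apart again.
let firstName  = zipped |> List.head |> fst   // "QB One"
let firstStats = zipped |> List.head |> snd   // (500, 4000, 15)
```

Note that the result is a pair whose second element is itself a triple, which is why the later code has to unpack with snd before it can reach the plays value.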

The final step is to apply the qbcalc function to each plays, yards, and turnovers tuple; zipping the resulting sequence with the names sequence rounds out the steps and completes our task. The values were balled up into tuples in previous steps, so a lot of what is left is unpacking what we need for the actual calculation and then reassembling the output. The tuples are unpacked with the fst and snd functions, which return the first and second elements of a pair, respectively. The last line of the doCalc function divides the raw QB score by the number of plays, completing the calculation, and then uses the backward pipe operator (<|) to pass that sequence along to be zipped up with the names. The zipped sequence gets returned, and at last we’ve calculated the QB score per play for the 2008 season. The last thing to note is that, in order to keep precision in the final result, the int values being divided need to be converted to decimal. If the integers aren’t converted, the division is integer division, the fractional part is discarded, and we lose precision on the calculation.

    let doCalc =
        let stats =
            getStats
        let names =
            stats |> Seq.map fst
        let rawScore =
            stats
            |> Seq.map snd
            |> Seq.map qbcalc
        let plays =
            stats
            |> Seq.map snd
            |> Seq.map (fun (plays,_,_) -> plays)
        let components =
            Seq.zip rawScore plays
        Seq.zip names <| Seq.map(fun(x:int, y:int) -> System.Convert.ToDecimal(x)/ System.Convert.ToDecimal(y)) components
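The precision point is easy to check in F# Interactive. In this sketch the numbers are made up, and a negative raw score is used on purpose (QB Score can go negative) to show that integer division drops the fractional part by truncating toward zero.

```fsharp
// Invented values: a raw score of -25 over 10 plays.
let demoRaw, demoPlayCount = -25, 10

// Integer division discards the fraction: -25 / 10 gives -2, not -2.5.
let intResult = demoRaw / demoPlayCount

// Converting both operands first keeps the fraction: -2.5M.
let decResult = System.Convert.ToDecimal(demoRaw) / System.Convert.ToDecimal(demoPlayCount)
```

The built-in `decimal` conversion function gives the same result as System.Convert.ToDecimal for int inputs and reads a little lighter.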

Below is the complete source listing of my first crack at doing something useful with F#. There are a couple things (the packing and repacking of the tuples, the CSV parsing) that scream “optimize me”. In my next F# post, I’ll refactor this code to slim it down and package it up so I can display these results graphically via C#.

    #light
    open System.IO

    let loadQBStats =
        let filePath = "D:\code\ProFootballDB\Data\QB_Stats_2008.csv"
        let stream = new FileStream(filePath, FileMode.Open)
        let reader = new StreamReader(stream)
        let csv = reader.ReadToEnd()

        let stats =
            csv.Split([|'\n'|])
            |> Seq.skip 1
            |> Seq.map(fun line -> line.Split([|','|]))
            |> Seq.map(fun values ->
                string values.[0], // qb name
                System.Int32.Parse(values.[5]), // att
                System.Int32.Parse(values.[7]), // pass yds
                System.Int32.Parse(values.[11]), // int
                System.Int32.Parse(values.[12]), // rushes
                System.Int32.Parse(values.[13]), // rush yds
                System.Int32.Parse(values.[17]), // sacks
                System.Int32.Parse(values.[20])) // fumbles lost
        stats

    let qbcalc (plays,yards,turnovers) = yards - 3 * plays - 30 * turnovers

    let getStats =
        let stats = loadQBStats
        let names = stats |> Seq.map(fun(name,_,_,_,_,_,_,_) -> name)
        let plays = stats |> Seq.map(fun(_,att,_,_,rush,_,sacks,_) -> att+rush+sacks)
        let yards = stats |> Seq.map(fun(_,_,passyd,_,_,rushyd,_,_) -> passyd + rushyd)
        let turnovers = stats |> Seq.map( fun(_,_,_,int,_,_,_,fum) -> int+fum)
        Seq.zip3 plays yards turnovers |> Seq.zip names

    let doCalc =
        let stats =
            getStats
        let names =
            stats |> Seq.map fst
        let rawScore =
            stats
            |> Seq.map snd
            |> Seq.map qbcalc
        let plays =
            stats
            |> Seq.map snd
            |> Seq.map (fun (plays,_,_) -> plays)
        let components =
            Seq.zip rawScore plays
        Seq.zip names <| Seq.map(fun(x:int, y:int) -> System.Convert.ToDecimal(x)/ System.Convert.ToDecimal(y)) components
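As a rough idea of where the tuple packing and repacking could be trimmed, the per-QB arithmetic can be done in a single Seq.map over the 8-value tuples instead of four separate sequences that get zipped back together. This is only a sketch of one possible direction, not the refactoring promised above; `scorePerPlay` and the sample row are hypothetical.

```fsharp
let qbcalc (plays, yards, turnovers) = yards - 3 * plays - 30 * turnovers

// Hypothetical single-pass version. Each input tuple has the same layout as
// the stats sequence: (name, att, passYds, ints, rushes, rushYds, sacks, fumblesLost).
let scorePerPlay stats =
    stats
    |> Seq.map (fun (name, att, passYds, ints, rushes, rushYds, sacks, fumbles) ->
        let plays = att + rushes + sacks
        let raw = qbcalc (plays, passYds + rushYds, ints + fumbles)
        name, decimal raw / decimal plays)

// Invented row: 400 + 60 + 40 = 500 plays, 3500 + 500 = 4000 yards, 10 + 5 = 15 turnovers.
let demoResult =
    [ ("QB One", 400, 3500, 10, 60, 500, 40, 5) ]
    |> scorePerPlay
    |> Seq.toList            // [("QB One", 4.1M)]
```

One caveat with this sketch: a row with zero plays makes the decimal division throw a DivideByZeroException, so such rows would need to be filtered out first.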


4
Jan 09

Best thing I’ve read all day

“Most programmers think it is a sin to write code w/o comments, but it is a greater sin to write code that cannot be understood without them” – Paul Berry via Twitter


4
Jan 09

Exploring functional programming

I am trying to heed the Pragmatic Programmer’s advice to learn a new language each year, and this year I figured it would not be a bad idea to look at functional languages. There is a lot of buzz about functional programming right now, and plenty of blog posts, so there should be no shortage of fresh material. This won’t be my first exposure to functional programming, so I’ll have to dust off some of my notes from the programming languages class I took during my master’s work.

I am planning to follow along with the Real World Haskell book club (which starts tomorrow night) and also mess around with F#, so I thankfully won’t have to rely on my one semester of OCaml to help me get going. As a fan of sports and statistics, I’ll probably try to use that as my problem domain while exploring these languages. I am really looking forward to trying to wrap my mind around functional programming, as it is so different from thinking about objects and should be a great “neurobic” activity for the ol’ brain.


2
Jan 09

F# Support for the SyntaxHighlighter WordPress Plug-in

I added the F# keywords to syntaxhighlighter.php and added the F# brush that Elijah Manor put together to the SyntaxHighlighter WordPress plug-in.  I am not sure if anyone is still maintaining that project but you can download the zip here and install it the same way as the original.

To highlight F# code you can now do code here.  You can also swap out ‘f#’ for ‘f-sharp’ or ‘fsharp’


1
Jan 09

Mini9 & Vista Quick Review

I’ve been playing around with the Dell Mini9 (1GB RAM, 32GB SSD) netbook that I picked up over the holidays and finally got around to swapping the Ubuntu installation that came from the factory for a slimmed-down version of Vista Ultimate. For a machine with these resource constraints and this intended use (email/web/light office app work), a typical full Vista install wouldn’t be practical, but thanks to the vLite configuration tool I was able to shrink my installation footprint down to roughly 3.5GB. If you have the right tools (a flash drive with a couple of gigabytes free, access to a copy of Vista, and some patience) then the conversion isn’t that painful or time consuming. The one recommendation that I would like to emphasize is to make sure that you have all the drivers downloaded ahead of time, especially the network drivers, since neither the wireless nor the Ethernet drivers got installed by default. You can dump them on the flash disk, as they don’t take up that much space, and it will expedite the process of getting your system fully operational.

For a resource-constrained machine, it certainly doesn’t seem sluggish. The Vista experience, while not in the same league as my tricked out M1530, isn’t terrible, and the solid state drive really makes a world of difference in these netbooks as it is silent, low power, and fast as hell.

In my eyes the one serious drawback of these Mini9s is the size and layout of the keyboard and touchpad. The keyboard on a 9” laptop obviously has to be compressed and tradeoffs conceded, but for someone like me who has stubby fat fingers this becomes more of a problem than I’d thought. The other major issue I have is with the touchpad. The touch-sensitive surface that comprises the touchpad runs all the way to the space bar without any buffer or separation. I continuously hit the touchpad surface, causing the cursor or focus to jump to where the pointer is. The keyboard issue by itself isn’t a terrible problem, and I’m sure I’ll acclimate, but when you compound that with the lack of a buffer between the space bar and the touchpad, the whole keyboard/mouse experience becomes extremely frustrating. I bought a Bluetooth notebook mouse and can turn off the touchpad so I am not impacted by inadvertent contact, but that defeats the purpose of this device for me, since I’d much rather have my 15” laptop if I am going to have to sit at a table/desk/tray to use the external mouse for any length of time.

Overall the Mini9 isn’t bad for couch surfing and travel, but this isn’t a machine that I could use for extended periods of time in the situations where a netbook would be convenient. While I like the concept and the price point that this class of portable addresses, I can’t strongly endorse the Mini9 due to the layout and design of the keyboard and touchpad.