Working with the Enron dataset

time to read 1 min | 85 words

Every now and then I need to do some work with text, and the Enron data set is one of the most well known corpuses.

I ended up writing the parsing code for that so many times that it isn’t even funny. Therefor, I decided to make my life easier and just post it somewhere that I can refer back to it.

This code simply unpack the Enron dataset into a .NET object, from where you can start processing the text in interesting ways.