Analyzing a (small) log file
I got a log file with some request trace data from a customer, and I wanted to get a better picture of what was actually going on. The log file was only 35MB, so that made things very easy.
I know about Log Parser, but to be honest, it would take more time to learn to use that effectively than to write my own tool for a single use case.
The first thing I needed to do was to actually get the file into a format that I could work with:
var file = @"C:\Users\Ayende\Downloads\u_ex140904\u_ex140904.log";
var parser = new TextFieldParser(file)
{
CommentTokens = new[] {"#"},
Delimiters = new[] {" "},
HasFieldsEnclosedInQuotes = false,
TextFieldType = FieldType.Delimited,
TrimWhiteSpace = false,
};
// Fields, in order:
// "date", "time", "s-ip", "cs-method", "cs-uri-stem", "cs-uri-query", "s-port", "cs-username", "c-ip",
// "cs(User-Agent)", "sc-status", "sc-substatus", "sc-win32-status", "time-taken"
var entries = new List<LogEntry>();
while (parser.EndOfData == false)
{
    var values = parser.ReadFields();
    if (values == null)
        break;
    if (values.Length < 14)
        continue; // skip truncated lines rather than blowing up on the index below
    var entry = new LogEntry
    {
        Date = DateTime.Parse(values[0]),
        Time = TimeSpan.Parse(values[1]),
        ServerIp = values[2],
        Method = values[3],
        Uri = values[4],
        Query = values[5],
        Port = int.Parse(values[6]),
        UserName = values[7],
        ClientIp = values[8],
        UserAgent = values[9],
        Status = int.Parse(values[10]),
        SubStatus = int.Parse(values[11]),
        Win32Status = int.Parse(values[12]),
        TimeTaken = int.Parse(values[13])
    };
    entries.Add(entry);
}
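The LogEntry class is nothing fancy, just a flat holder for the parsed fields. A minimal sketch of what it would look like (the property names follow the parsing code above; the [Serializable] attribute is there for the caching step below):

[Serializable]
public class LogEntry
{
    public DateTime Date { get; set; }
    public TimeSpan Time { get; set; }
    public string ServerIp { get; set; }
    public string Method { get; set; }
    public string Uri { get; set; }
    public string Query { get; set; }
    public int Port { get; set; }
    public string UserName { get; set; }
    public string ClientIp { get; set; }
    public string UserAgent { get; set; }
    public int Status { get; set; }
    public int SubStatus { get; set; }
    public int Win32Status { get; set; }
    public int TimeTaken { get; set; } // milliseconds, per the IIS time-taken field
}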
Since I want to run many queries, I just serialized the output to a binary file, to save the parsing cost next time. But the binary file (using BinaryFormatter) was actually 41MB in size, and while parsing the text file took 5.5 seconds, the binary load process took 6.7 seconds.
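The caching step was roughly this; a minimal sketch, assuming the LogEntry class above is marked [Serializable] (the cache file path is mine):

using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Cache the parsed entries so subsequent runs can skip the text parsing.
// As it turned out, this was both bigger and slower than just re-parsing.
var cacheFile = @"C:\Users\Ayende\Downloads\u_ex140904\entries.bin";

using (var stream = File.Create(cacheFile))
{
    new BinaryFormatter().Serialize(stream, entries);
}

// On the next run:
using (var stream = File.OpenRead(cacheFile))
{
    entries = (List<LogEntry>)new BinaryFormatter().Deserialize(stream);
}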
After that, I can run queries like this:
var q = from entry in entries
        where entry.TimeTaken > 10
        group entry by new { entry.Uri }
        into g
        where g.Count() > 2
        select new
        {
            g.Key.Uri,
            Avg = g.Average(e => e.TimeTaken)
        }
        into r
        orderby r.Avg descending
        select r;
And start digging into what the data is telling me.
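For a quick look at the worst offenders, dumping the query results to the console is enough (the formatting here is my own; time-taken is in milliseconds):

foreach (var result in q)
{
    Console.WriteLine("{0,8:F1} ms  {1}", result.Avg, result.Uri);
}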
Comments
The LINQPad CSV driver is brilliant for this kind of stuff.
https://github.com/dobrou/CsvLINQPadDriver
IMHO this is where F# type providers come in really useful; doing this kind of thing is trivial. See Deedle (http://bluemountaincapital.github.io/Deedle/) or FSharp.Data (http://fsharp.github.io/FSharp.Data/library/CsvProvider.html) for examples.
BinaryFormatter is not a good serializer; it's slow and extremely fragile.
http://stackoverflow.com/q/703073/161336 https://code.google.com/p/protobuf-net/wiki/Performance
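For comparison, the protobuf-net route would look roughly like this; a sketch, assuming the LogEntry class from the post gets annotated (the member numbering is mine):

using System.IO;
using ProtoBuf;

[ProtoContract]
public class LogEntry
{
    [ProtoMember(1)]  public DateTime Date { get; set; }
    [ProtoMember(2)]  public TimeSpan Time { get; set; }
    [ProtoMember(3)]  public string ServerIp { get; set; }
    [ProtoMember(4)]  public string Method { get; set; }
    [ProtoMember(5)]  public string Uri { get; set; }
    [ProtoMember(6)]  public string Query { get; set; }
    [ProtoMember(7)]  public int Port { get; set; }
    [ProtoMember(8)]  public string UserName { get; set; }
    [ProtoMember(9)]  public string ClientIp { get; set; }
    [ProtoMember(10)] public string UserAgent { get; set; }
    [ProtoMember(11)] public int Status { get; set; }
    [ProtoMember(12)] public int SubStatus { get; set; }
    [ProtoMember(13)] public int Win32Status { get; set; }
    [ProtoMember(14)] public int TimeTaken { get; set; }
}

// Serializing and loading with protobuf-net:
using (var stream = File.Create("entries.bin"))
    Serializer.Serialize(stream, entries);

using (var stream = File.OpenRead("entries.bin"))
    entries = Serializer.Deserialize<List<LogEntry>>(stream);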
Why do you use a serialized binary file and not a RavenDB/SQLite db file?
Marco, One-off work; the quickest solution is what I'll take.
Looks interesting for parsing trace logs, but it doesn't compile for me. TextFieldParser, FieldType, and LogEntry are all unrecognized, and I also get "Delegate 'System.Func<LogEntry,int,bool>' does not take 1 arguments" on the "where entry.TimeTaken > 10" line.
Clay, TextFieldParser and FieldType come from the Microsoft.VisualBasic.FileIO namespace: http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser(v=vs.110).aspx
LogEntry is a custom class.
Try Log Parser Studio: http://blogs.technet.com/b/exchange/archive/2013/06/17/log-parser-studio-2-2-is-now-available.aspx
Log Parser Studio is what I've used successfully for such scenarios many times.
It already has a template for IIS logs, and you use a SQL-like syntax (with some very useful functions) to process the data, so it really should be faster than writing your own parser.