Here's a quick example of why I like Splunk so much:

We have a whole heap of websites. Some unknown number of them are no longer used -- either they never went live, or they've been replaced -- but we don't know which ones. And of course one doesn't want to accidentally nuke an active site being used by clients.


The web hosting boxes are running Splunk in forwarder mode, watching all of /var/local/apache/logs. This tree contains one directory per website, with each directory containing all the Apache logs for that site.

The naive approach, letting Splunk apply its defaults to that directory, didn't work out so well. It picked up every log in there, including useless ones we don't care about like mod_jk.log, and it didn't classify all the access logs correctly, so some of them weren't parsed and didn't expose useful fields like "clientip" in the search tool.

Change inputs.conf to read thus:

[monitor:///var/local/apache/logs]
disabled = false
followTail = 0
sourcetype = access_common
whitelist = .*access_log$

Then restart the clients and flush the index on the master. Now only the current access_log for each website is being watched, and it is always parsed as access_common.
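To see what that whitelist line lets through, here is a rough Python sketch. It assumes Python's re.search is a close enough stand-in for how Splunk applies the pattern to the full file path, and the file names are made up for illustration; they are not our real sites.

```python
import re

# Same pattern as the whitelist line in inputs.conf.
WHITELIST = re.compile(r".*access_log$")


def is_monitored(path: str) -> bool:
    """Return True if the path would pass the whitelist.

    Approximation: Python's re.search stands in for Splunk's own regex matching.
    """
    return WHITELIST.search(path) is not None


# Hypothetical file names, for illustration only.
candidates = [
    "/var/local/apache/logs/example-site/access_log",
    "/var/local/apache/logs/example-site/access_log.1",
    "/var/local/apache/logs/example-site/error_log",
    "/var/local/apache/logs/example-site/mod_jk.log",
]

for path in candidates:
    print(path, "->", "watched" if is_monitored(path) else "skipped")
```

Because the pattern is anchored only at the end, anything ending in access_log passes, so a file called ssl_access_log would be picked up too, while rotated copies such as access_log.1 are skipped.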

This will have to be left running for a week or two to get enough data to be sure, but once the index is properly seeded the following search command will produce a lovely table suitable for finding the inactive sites:

sourcetype="access_common" | stats dc(clientip) by source

Could this have been done using a different tool, or scripted from scratch? Absolutely! But Splunk is much more generic, and the skills one picks up by working with it on this specific problem are widely applicable to any other task where log analysis is the answer.
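For comparison, a from-scratch version might look something like the Python sketch below. This is not what we run, just an illustration: it assumes every site directory holds a file named access_log in Common Log Format (client address as the first field), and the function names are my own invention.

```python
import re
from pathlib import Path

# In Common Log Format the client address is the first whitespace-delimited field.
CLIENT_RE = re.compile(r"^(\S+)\s")


def distinct_clients(lines):
    """Count the distinct client addresses in an iterable of access-log lines."""
    clients = set()
    for line in lines:
        match = CLIENT_RE.match(line)
        if match:
            clients.add(match.group(1))
    return len(clients)


def distinct_clients_by_source(log_root):
    """Map each <site>/access_log under log_root to its distinct-client count."""
    counts = {}
    for log_path in sorted(Path(log_root).glob("*/access_log")):
        # errors="replace" keeps odd bytes in request lines from aborting the scan.
        with log_path.open(errors="replace") as handle:
            counts[str(log_path)] = distinct_clients(handle)
    return counts
```

Calling distinct_clients_by_source("/var/local/apache/logs") gives roughly the same numbers as the search above, but only for whatever is in the current access_log files at that moment. There is no index accumulating over a couple of weeks, no search tool, and nothing reusable for the next log problem, which is rather the point.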

I love this tool, and wish it were easier to sell to management. Unfortunately the price tag is pretty high, and it's not something they use personally, so attempts to get them to want to pay for it have not been entirely successful. I suspect there are a lot of companies where the free 500MB/day version is getting significant use by people like me, in the vague hope that its usefulness will eventually become so obvious to those who control the budgets that they'll realise they really ought to just pay up and do it right.



October 2011
