[personal profile] abortrephrase
Here's a quick example of why I like Splunk so much:

We have a whole heap of websites. Some unknown number of them are no longer used -- either they never went live, or they've been replaced -- but we don't know which ones. And of course one doesn't want to accidentally nuke an active site being used by clients.

So.

The web hosting boxes are running Splunk in forwarder mode, watching all of /var/local/apache/logs. This tree contains one directory per website, with each directory containing all the Apache logs for that site.

Doing the naive thing and pointing Splunk at that directory with the default settings didn't work out so well. It caught every log in the tree, including useless things we don't care about like mod_jk.log, and it wasn't correctly classifying all the access logs, so not all of them were parsed properly or presenting useful fields like "clientip" in the search tool.

Change inputs.conf to read thus:

[monitor:///var/local/apache/logs]
disabled = false
followTail = 0
sourcetype = access_common
whitelist = .*access_log$
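The whitelist is a regular expression matched against the full path of each candidate file; only matches get monitored. A quick Python sketch (the site directory names are hypothetical) shows which files that pattern admits:

```python
import re

# Same regex as the whitelist line in inputs.conf.
whitelist = re.compile(r".*access_log$")

# Hypothetical files under the monitored tree.
candidates = [
    "/var/local/apache/logs/site-a/access_log",
    "/var/local/apache/logs/site-a/error_log",
    "/var/local/apache/logs/site-b/mod_jk.log",
    "/var/local/apache/logs/site-b/access_log",
    "/var/local/apache/logs/site-b/access_log.1",  # rotated copy
]

monitored = [p for p in candidates if whitelist.match(p)]
```

Note that the `$` anchor excludes rotated copies like access_log.1, which is why only the current access_log for each site ends up being watched.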


Then restart the clients and flush the index on the master. Now only the current access_log for each website is being watched, and it's always parsed as access_common.

This will have to be left running for a week or two to get enough data to be sure, but once the index is properly seeded the following search command will produce a lovely table suitable for finding the inactive sites:

sourcetype="access_common" | stats dc(clientip) by source
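For anyone who doesn't read Splunk's search language: that pipeline counts distinct client addresses per source logfile. A sketch of the same aggregation in Python, over a few made-up sample events (illustrative only, not how Splunk does it internally):

```python
import re
from collections import defaultdict

# Apache common-format lines begin with the client address.
CLIENTIP = re.compile(r"^(\S+) ")

# Hypothetical (source file, raw log line) pairs standing in for the index.
events = [
    ("site-a/access_log", '10.0.0.1 - - [24/Mar/2011:08:22:00 +0000] "GET / HTTP/1.1" 200 512'),
    ("site-a/access_log", '10.0.0.2 - - [24/Mar/2011:08:23:00 +0000] "GET /x HTTP/1.1" 200 512'),
    ("site-b/access_log", '10.0.0.1 - - [24/Mar/2011:08:24:00 +0000] "GET / HTTP/1.1" 404 0'),
]

clients = defaultdict(set)
for source, line in events:
    m = CLIENTIP.match(line)
    if m:
        clients[source].add(m.group(1))

# dc(clientip) by source: distinct client count per logfile.
table = {source: len(ips) for source, ips in clients.items()}
```

A site whose distinct-client count stays near zero (or consists solely of monitoring probes) over the observation window is a candidate for decommissioning.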


Could this have been done using a different tool, or scripted from scratch? Absolutely! But Splunk is much more generic: the skills one picks up by working with it on this specific problem are widely applicable to any other task where log analysis is the answer.

I love this tool, and wish it were easier to sell to management. Unfortunately the price tag is pretty high, and since it's not something they use personally, attempts to get them to want to pay for it have not been entirely successful. I suspect there are a lot of companies where the free 500MB/day version is getting significant use by people like me, in the vague hope that eventually it'll become so obvious to those who control the budgets how useful and efficiency-boosting it is that they'll realise they really ought to just pay up and do it right.

(no subject)

Date: 2011-03-24 08:22 am (UTC)
rbarclay: (Default)
From: [personal profile] rbarclay
I tried Splunk 2-3 years ago, and never saw any point. Granted, I already started to hate it when I tried to find out via the webshit what it's actually for, but that was rectifiable via a generic google search.
All the quadro-gazillion options weren't as easily found out, though, and after an hour or two I gave up in disgust, and went back to swatch.

(no subject)

Date: 2011-03-24 08:34 am (UTC)
ideological_cuddle: (Default)
From: [personal profile] ideological_cuddle
I've been using it for about three years now. The install and setup is about as easy as it could possibly be (install the package, run "/opt/splunk/bin/splunk start", poke at the web UI). Most of the time the automatic parsing detection Just Works, and it knows how to parse a *lot* of different logfile types and extract Useful Information with minimal fiddling.

And the web UI is very "discoverable".

So I'm really not seeing how it's difficult to figure out.

In the same boat

Date: 2011-03-24 03:32 pm (UTC)
From: (Anonymous)
I am also using the 500MB version and have turned a few of our sysadmins into converts. Unfortunately, management won't convert. I am going to get it in here one way or another. Good luck

Kevin
http://google.com/profiles/kefoster

(no subject)

Date: 2011-03-24 11:51 pm (UTC)
reddragdiva: (Default)
From: [personal profile] reddragdiva
Considering that at present I answer questions about access by hand-constructing a pipe starting with "grep" on the Apache logs, and more complicated stuff is done by handing our customers' intimate details to Google Analytics ... this is most tempting.

(no subject)

Date: 2011-03-24 11:57 pm (UTC)
ideological_cuddle: (Default)
From: [personal profile] ideological_cuddle
One really obvious and immediately-handy trick is that if you have multiple web servers you can very easily pull up all the requests across the farm from a specific client address, making it simple to trace the progress of a single session.
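The underlying operation is simple enough to sketch in a few lines of Python: merge lines from every server, filter by client address, and sort by timestamp (hypothetical sample lines from two servers; Splunk does this for you across the whole indexed farm):

```python
import re
from datetime import datetime

# Hypothetical common-format log lines from two front-end servers.
logs = {
    "web1": ['10.0.0.5 - - [24/Mar/2011:10:00:01 +0000] "GET /login HTTP/1.1" 200 312'],
    "web2": ['10.0.0.5 - - [24/Mar/2011:10:00:03 +0000] "POST /cart HTTP/1.1" 302 0',
             '10.0.0.9 - - [24/Mar/2011:10:00:04 +0000] "GET / HTTP/1.1" 200 100'],
}

# Capture the client address and the bracketed timestamp.
LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')

def session(clientip):
    """All requests from one client, across every server, in time order."""
    hits = []
    for host, lines in logs.items():
        for line in lines:
            m = LINE.match(line)
            if m and m.group(1) == clientip:
                ts = datetime.strptime(m.group(2), "%d/%b/%Y:%H:%M:%S %z")
                hits.append((ts, host, line))
    return [(host, line) for ts, host, line in sorted(hits)]
```

In Splunk the equivalent is just searching on the address, with no scripting at all.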

But bear in mind that you'll get locked out of searches if you routinely exceed 500MB/day of data unless you're paying for the Enterprise version.

Setup is a complete doddle, so there's very little reason not to play around with it. The documentation is pretty good, too.

(no subject)

Date: 2011-03-25 08:37 am (UTC)
rbarclay: (Default)
From: [personal profile] rbarclay
Not one of the options in the web UI (and I distinctly remember discovering that it even had one by typing 'netstat -pant', not from the docs) was self-explanatory for me. I thought "I read that it's about logs, now how do I throw them at the thing?" and found no obvious answer wherever I looked - just a gazillion docs about "queries". Great!

Probably it's like Perl - for some people just everything falls into place naturally, for others everything's just completely and utterly alien.

(no subject)

Date: 2011-03-25 09:01 am (UTC)
ideological_cuddle: (Default)
From: [personal profile] ideological_cuddle
So you didn't see the bit where it tells you, after you start it, that the web UI is at such-and-such a URL?

Maybe it didn't do that with the version you looked at. I didn't start using it until 3.0 came along, which was a couple of years ago now. By that point the documentation was very clear and straightforward. The only spot where I could see some people having some difficulty was with getting remote syslog data into it, because to do that without buying a license you need to set up syslog-ng and have that push data into a FIFO for Splunk to read.

You add data by going into the settings section and poking at "Inputs". You search on that data by typing stuff in the textarea. You refine the search by typing more stuff, or by clicking on things (e.g., an IP address) to get results that match that term.

Most of it is about querying the data you've stored. Getting data in is a really small part of the whole.

(no subject)

Date: 2011-03-25 09:19 am (UTC)
rbarclay: (Default)
From: [personal profile] rbarclay
It must have been almost exactly 3 years since I poked at it. I read about it somewhere, went to the website, looked at/for "download", did that, installed it, went back to the website, looked for "quickstart" or something like that, and found nothing to help me figure out where to go from that point.
LOTS of docs on how to do advanced queries and stuff, but nothing at all that I could find (in 1-2 hours of poking) on how to throw into it the data I wanted to massage in the first place - should I look in the web-UI, in some config file buried in the filesystem, or what?

Which left me with a feeling exactly like the camel book - I heard about Perl, bought the book, and after the first couple of pages was 900% more in the dark about it than before. (In the case of Perl it's because I know about variables, pointers, functions, double-pointers and all that from C, but didn't even know how to spell "scalar" (and it went steeply downhill from there - "that's how you deal with a hashref" - uh, and what the fuck is that?). The book wouldn't have been any less useful to me if it had been written in Neanderthal: a completely different vocabulary from anything I'd encountered before, and no explanation whatsoever for the (to me!) confusing things.)
