
Getting Data In: The Overlooked Art of Cooking Up Splunk


Here at GTRI, when we talk about Splunk we divide it up into two major tasks – getting data into Splunk (i.e., data ingest) and getting information out of Splunk (i.e., search, alerts, dashboards). These tasks are equally important to getting value from Splunk, and, after all, that’s the whole point of the exercise in the first place – payoff for your time and money.

In this blog post, I will cover data ingest, or getting data into Splunk. In my experience, data ingest is often overlooked in the rush to make things look pretty as the information comes out. Pretty dashboards turn heads, right? But, one of my favorite IT truisms is GIGO: garbage in, garbage out. There is only so much you can do to clean up a bad ingest at search time.

Take a few minutes, or a few hours if the data is complex, and get the data in right. Extract the fields that make sense, and do it as the admin so everyone can enjoy the fun. And sure, there are apps for that, right? Well, yes and no. Some of the apps out there are great and some not so much. Either way, you need to check the app and make sure the extractions work as advertised.

The process I use looks something like this:

1.  Identify the data and the use case.

This basic first step is where everything begins. For example:

  • Use case: Track failed authentication on Cisco switches or routers
  • Data needed: Authentication information from the routers and switches

2.  Determine ingest method.

For our use case, we’ll use a syslog listener on our Splunk server, although the better production answer is syslog-ng writing to files and a Splunk universal forwarder (UF) on a dedicated utility server.
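
To make that concrete, here is a minimal sketch of what each option might look like in inputs.conf. The index name (test_cisco), the sourcetype (cisco:ios) and the file path are example values for illustration; match them to whatever your environment and your chosen app expect.

    # inputs.conf on the Splunk server: a quick UDP syslog listener
    # (test_cisco and cisco:ios are example names, not requirements)
    [udp://514]
    sourcetype = cisco:ios
    index = test_cisco
    connection_host = ip

    # Preferred production alternative: syslog-ng writes to files on a
    # utility server and a universal forwarder monitors them. This assumes
    # a path layout of /var/log/cisco/<hostname>/..., hence host_segment = 4.
    [monitor:///var/log/cisco]
    sourcetype = cisco:ios
    index = test_cisco
    host_segment = 4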

3.  Is there an app for that?

Yes or no, it doesn’t matter, because you need to check the data anyway. Incidentally, there is an app for this Cisco data type, and it’s fairly good if you match your data up to the predefined sourcetype.

4.  Ingest data into a test index.

Hopefully you have Splunk installed on a dev/test instance on a virtual machine somewhere, so you can test without impacting your production Splunk system. If not, at a minimum, create a test index you can clean up later. As you probably know, the only way to truly get rid of something in Splunk is to delete the whole index, so this really does matter.
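
If you need to spin one up, a throwaway test index is only a few lines of indexes.conf, or a single CLI command. The name test_cisco is just the example used above.

    # indexes.conf: a disposable test index
    [test_cisco]
    homePath   = $SPLUNK_DB/test_cisco/db
    coldPath   = $SPLUNK_DB/test_cisco/colddb
    thawedPath = $SPLUNK_DB/test_cisco/thaweddb

    # Or from the command line:
    #   $SPLUNK_HOME/bin/splunk add index test_cisco
    # To wipe it later (stop Splunk first):
    #   $SPLUNK_HOME/bin/splunk clean eventdata -index test_cisco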

5.  Check your data.

Everything worked great the first time, right? Awesome! You’re almost done. Here’s where you check to make sure the fields look right. I typically check the user field and the action field at a minimum. If the fields don’t all come through, there are a number of resources that can help you extract them. I suggest taking a look at this Splunk Live! presentation. It’s very helpful and much more detailed than this overview.
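
A couple of quick searches go a long way here. These are sketches against the example test index; the sourcetype, the LOGIN_FAILED string and the field names will depend on your devices and whichever add-on you installed.

    index=test_cisco sourcetype=cisco:ios "LOGIN_FAILED"
    | stats count by host, user, action

And to see which fields are being extracted at all:

    index=test_cisco sourcetype=cisco:ios
    | fieldsummary
    | table field, count, distinct_count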

6.  Move the sourcetype to a permanent index.

Here’s a tip: Don’t use the “main” index. In fact, I recommend creating more indexes rather than fewer. There are several reasons for this rule of thumb, but the bottom line is that there are no hard limits to the number of indexes Splunk can handle, and creating specific indexes can help you with organization, access control and retention.
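As a sketch, a permanent index with explicit retention might look like the stanzas below. The index name, the ~90-day retention and the 50 GB size cap are placeholders for illustration, not recommendations.

    # indexes.conf: permanent index with explicit retention
    [network_cisco]
    homePath   = $SPLUNK_DB/network_cisco/db
    coldPath   = $SPLUNK_DB/network_cisco/colddb
    thawedPath = $SPLUNK_DB/network_cisco/thaweddb
    frozenTimePeriodInSecs = 7776000   # roll data to frozen after ~90 days
    maxTotalDataSizeMB = 51200         # cap the index at ~50 GB

    # inputs.conf: repoint the input from the test index to the permanent one
    [udp://514]
    sourcetype = cisco:ios
    index = network_cisco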

7.  Search and make as many pretty dashboards as you can imagine.

Go nuts! A picture is worth a thousand words and gigabytes of log data, so have fun!
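
For example, a single search like the sketch below could drive a “failed logins over time” panel; the index, sourcetype and search string are the same example names used above. Drop it into a dashboard as a column or line chart and you have your first head-turner.

    index=network_cisco sourcetype=cisco:ios "LOGIN_FAILED"
    | timechart span=1h count by host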

This is not meant to be an exhaustive treatise on the how-to of data ingest. I find that getting organized and having a process is the difference between boiling the ocean and actually getting something done. Hopefully this high-level process helps you get your data into Splunk the right way.

If you have questions or get stuck, feel free to leave a comment on this post or contact the GTRI Splunk team for assistance. Happy Splunking!

Micah Montgomery is a Solution Architect for Big Data, part of GTRI’s Professional Services team.
