Zivaro Blog

Splunk with Hadoop: Three Ways to Play

With Splunk’s recent announcement of Hunk (Hadoop plus Splunk), many customers are wondering exactly how these two leading big data platforms can work together. From a Splunk perspective, there are three ways to integrate with Hadoop. In this post I’ll briefly discuss all three and give you plenty of links to additional resources if you want to take a deeper dive.

Splunk App for HadoopOps

The first option is the Splunk App for HadoopOps. Available on Splunk Apps (formerly Splunkbase), this application template gives IT departments insight into how their Hadoop clusters are performing. In this scenario, Splunk indexes the machine data generated by Hadoop, and the app template speeds up the process of monitoring the cluster. Users can visualize Hadoop cluster resources and view real-time performance metrics. All Hadoop services, hosts, jobs, and users can be tracked, which allows for searching, alerting, reporting, visualization, and correlation of Hadoop-generated event data. When using this app template, the data stored in Splunk and the data stored in Hadoop are not shared between the systems in any way. Rather, Splunk is simply used as a monitoring tool, and perhaps a management tool, for Hadoop. Although it is not a data integration, this approach has proven extremely valuable to many organizations tasked with operating production Hadoop clusters.

Splunk Hadoop Connect

The next option is Splunk Hadoop Connect, which provides two-way integration between Splunk and Hadoop: data stored in either platform can be moved to the other. Customers find this capability useful as a way to first store machine data in Splunk, where it can be analyzed quickly and easily. Eventually the data is archived in Hadoop, where it can still be queried in batch if necessary. There are many other use cases and drivers for this type of bi-directional transfer capability. One obvious shortcoming in this scenario is that Splunk still cannot perform analytics on data while it is stored in Hadoop. For that to happen, the data would first need to be moved from Hadoop (back) into Splunk. Enter Hunk!

Hunk:  Splunk Analytics for Hadoop

The name Hunk is a combination of Hadoop and Splunk. Currently in beta, Hunk will allow users to apply Splunk’s rich analytics capabilities directly to data stored in Hadoop. This will make Hadoop’s data more usable and valuable by allowing users to quickly search, alert on, report on, and visualize it. This direct integration will be enabled by patent-pending technology from Splunk called Virtual Indexing, which will make Hadoop data appear to Splunk as a traditional Splunk index. Hunk promises to be a powerful data integration between the two platforms, unlocking a wealth of big data value that is currently untapped. Splunk has created a nice video explaining what Hunk is all about.
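To make the virtual index idea a little more concrete, here is a toy Python sketch. It is not Splunk or Hunk code, and every name in it (NativeIndex, VirtualIndex, search, fake_hdfs) is hypothetical. It only illustrates the general pattern described above, under the assumption that a virtual index is essentially a pointer to external storage that is read at search time, while a traditional index holds a copy made at ingest time. Both expose the same interface, so the same search runs against either one.

```python
from typing import Dict, Iterable, Iterator, List, Protocol


class Index(Protocol):
    """Anything a search can read events from (hypothetical interface)."""

    def events(self) -> Iterator[str]: ...


class NativeIndex:
    """Stands in for a traditional index: events are copied in at ingest time."""

    def __init__(self, ingested: Iterable[str]) -> None:
        self._events = list(ingested)  # a private copy of the data

    def events(self) -> Iterator[str]:
        return iter(self._events)


class VirtualIndex:
    """Stands in for a virtual index: keeps only a reference to external
    storage and reads it lazily at search time. Nothing is copied up front."""

    def __init__(self, store: Dict[str, List[str]], path_prefix: str) -> None:
        self._store = store          # reference to the external store, not a copy
        self._prefix = path_prefix   # which part of the store this index covers

    def events(self) -> Iterator[str]:
        for path in sorted(self._store):
            if path.startswith(self._prefix):
                yield from self._store[path]


def search(index: Index, term: str) -> List[str]:
    """The same search function works on either kind of index."""
    return [event for event in index.events() if term in event]


# A dict of path -> lines plays the role of files sitting in Hadoop.
fake_hdfs = {
    "/logs/web/part-0": ["GET /home 200", "GET /cart 500"],
    "/logs/web/part-1": ["POST /checkout 500"],
    "/logs/db/part-0": ["slow query 500ms"],
}

native = NativeIndex(["GET /login 200", "GET /login 500"])
virtual = VirtualIndex(fake_hdfs, "/logs/web/")

native_hits = search(native, "500")
virtual_hits = search(virtual, "500")
```

Because the toy virtual index reads the store at search time, files that land in the store later show up in the next search without any transfer step. That is the contrast with the Hadoop Connect approach, where data has to be moved before it can be analyzed.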

Questions?  Drop your GTRI account rep a line or reach out to me directly.