Wednesday, 9 October 2013

When big data meets bio data


I was invited to spend a few days last week at the users’ conference of data analytics company Splunk. Here are my thoughts on that event.

Much of the content was far too technical for me, but it was obvious from the rapturous reception given to the announcement of version 6 of the Splunk Enterprise software, and from talking to representatives of a number of Splunk’s partner companies in the accompanying expo, that Splunk has a lot of enthusiastic followers who are gaining significant advantage from its software. More than a third of the company’s 6,000-plus customers are using Splunk to help detect security attacks on their networks.

I can’t profess to totally understand how Splunk works, but here’s my potted version. It ingests any kind of text data from disparate sources and combines all this in a proprietary file structure. The Splunk software then enables rapid queries to be made across these data, even where volumes are massive.

So, for example, in a security application, log files could be ingested from numerous pieces of network gear, and if an interesting event were noted in one log it would be possible to identify events that occurred at the same time in other components of the network.
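To make that idea concrete, here is a minimal sketch in plain Python of finding events from different sources that fall within a few seconds of an event of interest. It is not Splunk's query language or API, and I make no claim that Splunk works this way internally; the function name `events_near`, the five-second window and the sample log entries are all assumptions made up for illustration.

```python
from datetime import datetime, timedelta

def events_near(events, anchor, window_seconds=5):
    """Return the events (from any source) whose timestamp lies within
    window_seconds of the anchor event, excluding the anchor itself."""
    window = timedelta(seconds=window_seconds)
    return [
        e for e in events
        if e is not anchor and abs(e["time"] - anchor["time"]) <= window
    ]

# Hypothetical entries, as if already merged from three devices' logs.
events = [
    {"time": datetime(2013, 10, 1, 9, 0, 0), "source": "firewall", "msg": "blocked inbound connection"},
    {"time": datetime(2013, 10, 1, 9, 0, 2), "source": "router", "msg": "route flap"},
    {"time": datetime(2013, 10, 1, 9, 0, 4), "source": "vpn", "msg": "failed login"},
    {"time": datetime(2013, 10, 1, 9, 10, 0), "source": "router", "msg": "config saved"},
]

anchor = events[0]  # the "interesting event" spotted in the firewall log
related = events_near(events, anchor)
for e in related:
    print(e["time"].isoformat(), e["source"], e["msg"])
```

The sketch scans every event, which is fine for a handful of entries; doing the same thing quickly across massive volumes is presumably where a product like Splunk earns its keep.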

One of the major enhancements of version 6, as I understand it, is that it enables people with much less expertise in the intricacies of Splunk to implement meaningful queries. This is important because it means that those who understand the significance of the information can use Splunk themselves, rather than having to rely on people with expertise in the software who may not fully understand the implications of what they are finding.

As Guido Schroeder, senior vice president of products at Splunk, put it: “Enterprise 6 gives technical users the ability to define the meaningful relationships in the underlying data, enabling business users and analysts to easily manipulate and visualise data in a simple drag-and-drop interface.”

This was reinforced by a comment from Eric Hanselman, chief analyst at 451 Research, who said: “Business users want and need to use software that makes it easier to dig deeper into analytic tasks without the help of IT or knowledge of coding and query languages.

“Those who’ve been using the Splunk product for years will benefit from usability and management enhancements that will make their Splunk lives easier and more productive. By providing machine data analytics to a new set of users and an improved user experience, Splunk Enterprise 6 has value for both audiences.”

As I said earlier, Splunk can be used with any kind of data, and one of the most interesting and accessible presentations at the conference came from Splunk senior software developer Ed Hunsinger, who - just for fun - has been ‘Splunking himself’ for the past two years.

In other words, he has been trying to generate electronic data about as many aspects of his life as possible and feed this into Splunk. He’s used brainwave monitors, heart rate monitors, location data and much more. For a number of everyday activities for which electronic data is not available, he’s developed an iPhone app so that he can just touch the screen to generate an event, for example every time he sneezes.

There is already a global movement for self monitoring, Quantified Self (www.quantifiedself.org). It has over 100 Meetup groups around the world, including one each in Sydney and Melbourne with over 300 members between them. It was founded by two editors from Wired Magazine, Gary Wolf and Kevin Kelly, and you’ll find a detailed explanation of it in this 2009 Wired article by Wolf.

However, the application of ‘big data’ or data analytics to self monitoring takes it to a whole new level.

Most of Hunsinger’s presentation focussed on the practicalities of his self-splunking: he dwelt only briefly on the analysis made possible by the exercise. He was, for example, able to link a spike in his heart rate to the fact that he was go-kart racing at the time.
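As a rough illustration of that kind of link, the sketch below flags heart rate readings above a threshold and looks up what was logged as happening at that moment. It is plain Python rather than anything Hunsinger showed, and the readings, the 140 bpm threshold, the activity log and the names `find_spikes` and `activity_at` are all invented for the example.

```python
from datetime import datetime

def find_spikes(samples, threshold_bpm=140):
    """Return the timestamps of heart rate samples at or above threshold_bpm."""
    return [when for when, bpm in samples if bpm >= threshold_bpm]

def activity_at(activities, when):
    """Return the label of the first activity whose interval contains
    `when`, or None if nothing was logged for that moment."""
    for start, end, label in activities:
        if start <= when <= end:
            return label
    return None

# Hypothetical heart rate readings: (timestamp, beats per minute).
samples = [
    (datetime(2013, 9, 28, 14, 0), 72),
    (datetime(2013, 9, 28, 14, 30), 75),
    (datetime(2013, 9, 28, 15, 10), 158),
    (datetime(2013, 9, 28, 15, 40), 80),
]

# Hypothetical activity log: (start, end, label).
activities = [
    (datetime(2013, 9, 28, 13, 0), datetime(2013, 9, 28, 14, 45), "desk work"),
    (datetime(2013, 9, 28, 15, 0), datetime(2013, 9, 28, 15, 30), "go-kart racing"),
]

spikes = find_spikes(samples)
labels = [activity_at(activities, when) for when in spikes]
for when, label in zip(spikes, labels):
    print(when.isoformat(), label)
```

A fixed threshold is the crudest possible definition of a spike; comparing each reading with a running average would be a more realistic choice.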

That might seem blindingly obvious, but it set me thinking about the potential for a whole new industry in self monitoring: one in which devices and tools conform to some sort of standard and work together with software like Splunk, so that people can gain a better understanding of what is happening with their bodies, and their lives, by analysing the data gathered.

Such a system could help identify food and other allergies, perhaps provide early warnings of health problems, and generally enable people to better understand how they are living their lives.

