In March this year Intel paid $US740m for an 18 percent
stake in Hadoop software company Cloudera, becoming its largest strategic
shareholder. According to Cloudera founder and CTO Amr Awadallah, it's a sign that
Intel believes the Cloudera approach could revolutionise the data centre and
the way large organisations manage their massive databases.
Intel's investment was made at the time Cloudera raised
$US160m in a round of venture funding, but it seems that Intel bought
most of its shares on the market, so Cloudera didn't get the money.
That, however, does not diminish the significance of the move, which is
the largest single investment in data centre technology in Intel's history,
according to Intel. The company
said: "The deal will join Cloudera's leading enterprise analytic data
management software powered by Apache Hadoop with the leading data centre architecture
based on Intel Xeon technology. The goal is acceleration of customer adoption
of big data solutions, making it easier for companies of all sizes to obtain
increased business value from data by deploying open source Apache Hadoop
solutions."
Awadallah likens the move to a number of other landmark initiatives
over the years that have helped Intel become the dominant chipmaker.
"Intel historically has been very clever in surveying their customers and
seeing which new workflows are growing very quickly within data centres,"
he told me in an interview earlier this week. "So Intel now has a 96
percent market share in the data centre: ninety-six percent of the servers run
Intel CPUs."
He identifies the earlier landmarks as the 'Wintel'
alliance some 20 years ago, the alliance with Red Hat about 15 years ago and the
backing of virtualisation leader VMware about a decade ago. "Intel saw
Cloudera growing very quickly and wanted to make sure that whatever they did
was optimised for Cloudera," Awadallah said.
So just what is it about Cloudera and its Hadoop
distribution that has merited Intel making the company the target of its
largest ever investment in data centre technology?
It all centres on Cloudera's concept of the Enterprise Data
Hub. According to Awadallah, a traditional data centre architecture comprises
dedicated and costly storage systems and processors connected by a network.
Data is by and large dedicated to each application and, where necessary,
replicated to serve different applications.
Cloudera's Enterprise Data Hub is made up of low-cost,
commodity 'pizza box' servers containing both CPU and disk, and open source
software. This software manages a single pool of data and serves data up to
applications as needed. For redundancy, data is replicated across multiple pizza
boxes and the management software automatically isolates any device that fails.
According to a Cloudera white paper on the EDH, "An enterprise data hub (EDH) is one place to
store all data, for as long as desired or required, in its original fidelity;
integrated with existing infrastructure and tools; with the flexibility to run
a variety of enterprise workloads—including batch processing, interactive SQL,
enterprise search, and advanced analytics—together with the robust security,
governance, data protection, and management that enterprises require. With an
enterprise data hub, leading organisations are changing the way they think
about data, transforming it from a cost to an asset."
Awadallah contrasts the EDH approach with a data warehouse
built on relational database technology and claims that the costs of data
storage are 30 to 100 times lower. "With relational systems you are
looking at average cost of $30,000 per terabyte, $30 million for one petabyte.
The cost with the Enterprise Data Hub is anywhere from $300,000 per petabyte to
$1 million per petabyte. We are 30 to 100 times cheaper."
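The arithmetic behind the "30 to 100 times" claim can be checked with a short sketch. It uses only the dollar figures Awadallah quotes; the variable names are illustrative, and it assumes 1,000 terabytes per petabyte, which is the conversion implied by his own "$30,000 per terabyte, $30 million for one petabyte".

```python
# Back-of-the-envelope check of the quoted storage costs.
# Assumption: 1 PB = 1,000 TB, as implied by the quoted $30,000/TB -> $30m/PB.
TB_PER_PB = 1_000

# Figures quoted by Awadallah (US dollars).
relational_cost_per_tb = 30_000
edh_cost_per_pb_low = 300_000
edh_cost_per_pb_high = 1_000_000

relational_cost_per_pb = relational_cost_per_tb * TB_PER_PB

# Ratio of relational cost to EDH cost at each end of the quoted EDH range.
ratio_at_cheapest_edh = relational_cost_per_pb / edh_cost_per_pb_low
ratio_at_dearest_edh = relational_cost_per_pb / edh_cost_per_pb_high

print(f"Relational: ${relational_cost_per_pb:,} per petabyte")
print(f"EDH is {ratio_at_dearest_edh:.0f}x to {ratio_at_cheapest_edh:.0f}x cheaper")
```

On these figures the two ends of the range work out to 30 and 100, so the multiplier in the quote is consistent with the per-petabyte costs given.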
This is only part of the story, according to Awadallah. The
power of Hadoop is that it enables analysis of and insights into both
structured and unstructured data. The lower cost means that much more data can
be simultaneously available for analysis than with a data warehouse, where
costs dictate that old and little-used data must be archived. This in turn
enables organisations to completely re-engineer the way they operate and
enables them to extract many more valuable insights from their data.
This, Awadallah says, represents "the highest level of maturity
of the hub vision," and is "when you have achieved enlightenment as
an organisation, what we refer to as converged analytics.
"This is where you have a single place with all your data and
your workloads all come to the data, as opposed to the data going to the
workloads. This is typically a four-year journey; for some organisations it can
be a ten-year journey and for some organisations it can be a one-year journey."
An organisation's path to enlightenment with the Enterprise
Data Hub, Awadallah says, must pass through a hiatus where the technology moves from being
the domain of IT to being the domain of users within the business. How it
manages that transition is yet another example of the challenges organisations
face in 'becoming digital', but that's a story for another day.