Spark versus MapReduce: which way for enterprise IT?

Step aside, MapReduce. You have had a good run, but today’s big data developers are hungry for speed and simplicity. So, when it comes to picking a processing framework for new workloads to run on their Hadoop environments, they are increasingly favouring a nimble young rival called Spark.

At least that’s the message from big data suppliers who are throwing their weight behind Apache Spark, casting it as big data’s next big thing.

At the recent Spark Summit in San Francisco in June, Cloudera chief strategy officer Mike Olson spoke of the “breathtaking” growth of Spark and the profound shift in customer preference that he says his company, a Hadoop distributor, is witnessing as a result.

“Before very long, we expect that Spark will be the dominant general-purpose processing framework for Hadoop,” he said. “If you want a good, general-purpose engine these days, you’re choosing Apache Spark, not Apache MapReduce.”

Olson’s words were chosen carefully, in particular his use of the phrase “general purpose”. His point was that there is still plenty of room for special-purpose processing engines for Hadoop, such as Apache Solr for search or Cloudera Impala for SQL queries. Among processing frameworks that developers can use to build a wide variety of analytic workloads (hence “general purpose”), however, the battle for supremacy is now a two-horse race, and it is one that Spark is winning.

Quite simply, Spark niftily addresses a number of longstanding criticisms that developers have levelled at MapReduce – in particular, its high-latency, batch-mode response.

“It has been known for a very long time that MapReduce was a good workhorse for the world that Hadoop grew up in,” says Arun Murthy, founder and architect at Hortonworks.

He points out that the technology was created in the labs at Google to tackle a very specific use case: web search. More than a decade on, it has evolved – but perhaps not enough to match the enterprise appetite for big data applications.

“Its strength was that it was malleable enough to take on more use cases,” Murthy adds. “But it’s been known forever that there are use cases that MapReduce can solve, sure, but not in the most optimum manner. Just as MapReduce disrupted other technologies, it’s entirely natural that new technologies come along to disrupt or displace MapReduce.”