SparkSummit East wrapup: Big demand for evolving technology

The Apache Spark analytics engine is still immature, and this year’s Spark Summit East was more of a technology conference than a business one. But Spark is also generating huge enthusiasm, and not just from techies. IT organizations and even non-technical business people, as well as big data pundits like Wikibon’s George Gilbert (@GGilbert41), see Spark as a game-changer that will become the preferred engine for real-time analysis of streaming unstructured, semi-structured and structured data.

Spark has two major advantages over Apache Hadoop in this area. First, it is designed from the ground up to operate in near real time, whereas Hadoop is fundamentally a batch database engine.

Second, Spark is simpler. Hadoop is not an application but rather an ecosystem of multiple pieces, which are constantly developing and in some cases being replaced, Gilbert said. Building an operational stack from these still-immature technologies is challenging at best. Spark, in contrast, is a single application designed to do a single thing: near-real-time streaming data analysis. If users want to do something more complicated, like manage large amounts of complex data, they need Hadoop or another large database.

However, Spark is immature enough that at least one presenter in the Spark Summit general session put actual lines of code on the screen, something that Gilbert referenced several times on #theCUBE during its two days of wall-to-wall coverage.

The SiliconAngle/Wikibon team did manage to snag representatives from three early adopter companies to fill out a user panel that closed the first day of coverage on #theCUBE. Two of those, security companies Terbium Labs LLC and White Ops Inc., are still early in their implementations, though, leaving DataXu, Inc. as the only company the team interviewed that has real experience using Spark. All three presented strong use cases for the technology, however, and the half-hour session is well worth watching.

The two days of coverage were dominated by big data startups, starting with Databricks Inc., the company that is productizing Apache Spark. Also appearing were executives from MapR Technologies Inc., which has announced a major technical educational program on Spark, as well as Syncsort Inc., Hortonworks, Inc., and Qubole, Inc.

Gilbert and Vellante also discussed the preliminary results of an intensive two-month study Gilbert just finished (see video below) based on interviews with Wikibon community members. Among his conclusions is that Spark’s role in big data is as a more unified analytic application platform that avoids the need to pull data from multiple silos. It eliminates batch handoffs between applications and operates fast enough that it can be used, for instance, to stop suspicious credit card transactions in the less than 100 milliseconds a bank has before it must accept or decline a transaction. These are major reasons that many companies, including big banks, are interested in Apache Spark despite its immaturity.

The full set of interviews from theCUBE, plus the Spark Summit East keynotes, is available on YouTube, and many of the individual sessions have been summarized on SiliconAngle. The full CrowdChat is also available and, while incomplete, can provide a guide to which videos might be most rewarding to watch, depending on your interests.

Gilbert will soon publish the full conclusions from his latest study on Wikibon. Together, these content assets provide a rich set of resources for technical and business people who want to educate themselves on Spark’s technology and use cases.

[“Source-siliconangle”]