Big Data Analytics

Rising demand for faster big data analytics apps is driving the adoption of Apache Spark, a core technology for modernizing data warehouses.

A full 91% of respondents cited performance as the top attribute, followed by ease of programming, at 77%; ease of deployment, at 71%; and advanced analytics, at 64%.

Real-time streaming (52%) and DataFrames (47%) are additional important features. A full 64% of respondents are running the latest version of Apache Spark.

Business intelligence was ranked highest, at 68%, followed by data warehousing (52%), recommendation systems (44%) and log processing (40%).

Nearly half (48%) deploy Apache Spark in stand-alone mode, followed by YARN running on Hadoop, at 40%. Just over half of respondents (51%) are running Apache Spark on public cloud.

A full 75% are running Apache Spark on a Linux/Unix platform, while 47% are running on OS X. The fastest-growing platform is Windows (23%), which grew 17 percentage points from 2014.

Nearly seven in 10 (69%) are using Spark SQL, followed by DataFrames (62%), MLlib + GraphX (58%) and streaming (58%). Three-quarters (75%) are using two or more Apache Spark components.

At 71%, Scala is the most widely used programming language, followed by Python (58%), SQL (36%) and Java (31%). Python use is up 49% year-over-year.

Nearly a quarter (24%) cite SQL, followed by DataFrames and advanced analytics (at 15% each), and streaming (14%). SQL use grew 380% from 2014.