Apache Spark Interfaces: RDDs, DataFrames & Datasets
Apache Spark, in a nutshell, is a powerful open-source distributed querying and processing engine, originally developed by Matei Zaharia as part of his PhD thesis at UC Berkeley. Because it can keep working data in memory rather than writing intermediate results to disk, it provides the flexibility and extensibility of MapReduce at significantly higher speeds.
The Spark APIs for reading, transforming and aggregating data, as well as other actions, are accessible from Java, Scala, Python, R and SQL. In terms of deployment, Apache Spark can be deployed in standalone mode, over YARN or over Apache Mesos, either…