How to use map, filter, and groupByKey
Problem statement: compute daily revenue by product, considering only orders in CLOSED or COMPLETED status, and save the output as text, CSV, TSV, and Parquet files. The example uses spark-shell to execute all the steps.

1. Read all the files from the local file system. The same data can be read from HDFS as well; for simplicity, this example reads from the local file system.

```scala
val orders = sc.textFile("file:///home/cloudera/workspace/scala/rajudata/data-master/retail_db/orders")
val order_items = sc.textFile("file:///home/cloudera/workspace/scala/rajudata/data-master/retail_db/order_items")
val products = sc.textFile("file:///home/cloudera/workspace/scala/rajudata/data-master/retail_db/products")
```

All the data files can be found on GitHub: https://github.com/dgadiraju/data/tree/master/retail_db

2. Check the contents of each RDD. This will help in understanding the data structure and the position of each element.

```scala
orders.take(3).f...
```
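Before running the Spark steps, the overall filter → map → group-and-sum logic can be sketched on plain Scala collections, with no Spark required. This is only an illustration of the transformation logic, not the Spark code itself: the sample rows and the field positions (status at index 3 of an order; order id, product id, and subtotal at indices 1, 2, and 4 of an order item) are assumptions based on the usual retail_db layout, so verify them against your own data in step 2.

```scala
// A minimal sketch of the daily-revenue-by-product logic on Scala Lists.
// Assumed record layouts (verify against the actual files):
//   orders:      order_id,order_date,customer_id,status
//   order_items: item_id,order_id,product_id,quantity,subtotal,product_price
object RevenueSketch {
  def dailyRevenue(orders: List[String],
                   items: List[String]): Map[(String, String), Double] = {
    // filter step: keep only CLOSED or COMPLETED orders, keyed by order_id
    val validOrders = orders
      .map(_.split(","))
      .filter(f => f(3) == "CLOSED" || f(3) == "COMPLETED")
      .map(f => f(0) -> f(1))                               // order_id -> order_date
      .toMap

    items
      .map(_.split(","))
      .filter(f => validOrders.contains(f(1)))              // join on order_id
      .map(f => ((validOrders(f(1)), f(2)), f(4).toDouble)) // ((date, product_id), subtotal)
      .groupBy(_._1)                                        // groupByKey equivalent
      .map { case (key, vs) => (key, vs.map(_._2).sum) }    // sum revenue per key
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample rows mimicking the assumed layouts above.
    val orders = List(
      "1,2013-07-25 00:00:00.0,11599,CLOSED",
      "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT")
    val items = List(
      "1,1,957,1,299.98,299.98",
      "2,2,1073,1,199.99,199.99")
    dailyRevenue(orders, items).foreach(println)
  }
}
```

In the Spark version the same shape applies to RDDs: `filter` keeps the valid orders, `map` produces `((date, product_id), subtotal)` pairs, and `groupByKey` (or, more efficiently, `reduceByKey`) aggregates the revenue per key.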