Spark: Read an HDFS text file and filter records based on certain criteria.
Spark - Exercise 1: Read an HDFS text file and filter records based on certain criteria.

Problem statement: Find all Person records having an age greater than 30 years.

1. Create a file on the local file system with the name Person.txt:

    > vi Person.txt

2. Add the records below:

    Name, Age
    Vinayak, 35
    Nilesh, 37
    Raju, 30
    Karthik, 28
    Shreshta, 1
    Siddhish, 2

3. Create a directory on the HDFS file system:

    hadoop fs -mkdir /user/spark/PersonExample/

4. Put the Person.txt file onto HDFS:

    hadoop fs -put Person.txt /user/spark/PersonExample/

5. Check whether the file has been uploaded:

    [root@localhost PersonExample]# hadoop fs -ls /user/spark/PersonExample/
    Found 1 items
    -rw-r--r--   1 root supergroup   77 2017-12-17 14:34 /user/spark/PersonExample/Person.txt

6. Start the Spark shell using the spark-shell command:

    $> spark-shell

7. Load the file using the Spark context:

    scala> var persons = sc.textFile("/user/spark/PersonExample/Person.txt")
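The walkthrough stops at the textFile call, so the filtering step itself is sketched below under a few assumptions: it runs in the same spark-shell session, it skips the "Name, Age" header row, and it treats the second comma-separated field as the age. The names header, records, and olderThan30 are illustrative and not part of the original walkthrough.

    scala> // Drop the header row so it is not parsed as a data record
    scala> val header = persons.first()
    scala> val records = persons.filter(line => line != header)
    scala> // Keep records whose age field (second column) is greater than 30
    scala> val olderThan30 = records.filter(line => line.split(",")(1).trim.toInt > 30)
    scala> olderThan30.collect().foreach(println)

With the sample data above, this should print the Vinayak (35) and Nilesh (37) records; Raju is excluded because his age is exactly 30, not greater than 30.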