Converting from one file format to another in Apache Spark
Read --> Write | V

Text file

sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username retail_dba --password cloudera \
  --table orders \
  --target-dir /user/cloudera/ReadDiffFileFormat/text \
  --as-textfile

Read:

scala> val textFile = sc.textFile("/user/cloudera/ReadDiffFileFormat/text")
textFile: org.apache.spark.rdd.RDD[String] = /user/cloudera/ReadDiffFileFormat/text MapPartitionsRDD[279] at textFile at <console>:30

Text file

textFile.saveAsTextFile("/user/cloudera/ReadDiffFileFormat/textout")

Using compression

textFile.saveAsTextFile("/user/cloudera/ReadDiffFileFormat/text/textoutput/compressed", classOf[org.apache.hadoop.io.compress.BZip2Codec])

Sequence file

A sequence file stores key-value pairs, so each record needs a key.

val textMap = textFile.map(e => (e.split(",")(0).toInt, e))
textMap.saveAsSequenceFile("/user/cloudera/ReadDiffF...
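
The keying step for the sequence file can be tried without a cluster. The following is a minimal sketch in plain Scala, not the original author's code: the sample lines are made up to resemble an orders export, and `toKeyed` is a hypothetical helper name. It applies the same idea as `e.split(",")(0).toInt`, but skips a line whose first field is not an integer instead of throwing.

```scala
import scala.util.Try

// Hypothetical sample lines; the values are illustrative, not real data.
val sampleLines = Seq(
  "1,2013-07-25 00:00:00.0,11599,CLOSED",
  "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
  "not-a-number,broken line"
)

// Use the first comma-separated field as an Int key and keep the whole
// line as the value. A missing or non-numeric first field yields None.
def toKeyed(line: String): Option[(Int, String)] =
  Try(line.split(",")(0).trim.toInt).toOption.map(id => (id, line))

// flatMap drops the None results, so only well-formed lines remain.
val keyed = sampleLines.flatMap(toKeyed)

keyed.foreach { case (k, v) => println(s"$k -> $v") }
```

Assuming the same helper is defined in the Spark shell, `textFile.flatMap(toKeyed)` should give an RDD of `(Int, String)` pairs suitable for `saveAsSequenceFile`. Whether silently dropping malformed lines is acceptable depends on the data; the original `map` fails fast instead.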