Apache Spark saveAsTextFile error

The article below was copied from Solai Murugan's blog. All credit goes to him for the fine work.

I copied it over so that I'd know where to find it in the future, should I ever forget about it.

Error

:19: error: value saveAsTextFile is not a member of Array[(String, Int)]
       arr.saveAsTextFile("hdfs://localhost:9000/sparkhadoop/sp1")
Steps to reproduce
val file = sc.textFile("hdfs://master:9000/sparkdata/file2.txt")
val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
val arr = counts.collect()
arr.saveAsTextFile("hdfs://master:9000/sparkhadoop/sp1") 

Solution 
The error is caused by the last line above, `arr.saveAsTextFile(...)`. `collect()` returns a plain Scala `Array[(String, Int)]`, and arrays have no `saveAsTextFile` method; that method is only available on RDDs (Resilient Distributed Datasets), the type Spark's save and transformation operations are defined on. To write the data to HDFS, convert the array back into an RDD first (replace the failing line with):
sc.makeRDD(arr).saveAsTextFile("hdfs://master:9000/sparkhadoop/sp1")
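
Putting it together, here is a sketch of the full word-count flow with the fix applied (assuming a Spark shell where `sc` is the SparkContext, and adjusting the HDFS host/port and paths to your cluster; the `sp2` output path is hypothetical, chosen so it doesn't collide with an existing directory):

```scala
// Assumes a running Spark shell: `sc` is the SparkContext,
// and the HDFS input file exists. Adjust host/port to your cluster.
val file = sc.textFile("hdfs://master:9000/sparkdata/file2.txt")
val counts = file.flatMap(line => line.split(" "))
                 .map(word => (word, 1))
                 .reduceByKey(_ + _)

// Option 1: save the RDD directly -- no collect() needed at all.
counts.saveAsTextFile("hdfs://master:9000/sparkhadoop/sp1")

// Option 2: if you already have a local Array from collect(),
// re-parallelize it into an RDD before saving.
val arr = counts.collect()
sc.makeRDD(arr).saveAsTextFile("hdfs://master:9000/sparkhadoop/sp2")
```

Note that Option 1 is usually preferable: collecting pulls the whole result to the driver's memory only to redistribute it again, whereas saving the RDD directly writes from the workers. Also, `saveAsTextFile` fails if the output directory already exists, which is why the second save uses a different path.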
