While creating a Spark DataFrame from an RDD with a case class, I ran into an error; below I share the code, the error, and a resolution. I was writing some Spark code in Scala when the following error appeared.
Below is the code:

    var dataRdd = sc.textFile("file:///home/sreekanth/practice/aadhar.csv")
    case class app(username: String, DOB: String, email: String, Phonenumber: Long, city: String, state: String, Zipcode: String)
    var dataRdd1 = dataRdd.flatMap(x => x.split(","))
    var dataRdd2 = dataRdd1.map(y => app(y(0).toString, y(1).toString, y(2).toString, y(3).toLong, y(4).toString, y(5).toString, y(6).toString))
    var appDataFrame = dataRdd2.toDF
    appDataFrame.show(5)
Running the above code in the Spark shell produces the error below.
Error:
java.lang.StringIndexOutOfBoundsException: String index out of range: 6
at java.lang.String.charAt(String.java:658)
at scala.collection.immutable.StringOps$.apply$extension(StringOps.scala:67)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:429)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:47)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:430)
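The StringOps frame in the stack trace is the clue: a String is being indexed character by character. The exception is reproducible in plain Scala without Spark (the value "hyd" here is just a hypothetical short field, not real data from aadhar.csv):

```scala
// Indexing a String with (i) goes through StringOps.apply, i.e. charAt.
// Any field shorter than 7 characters blows up at index 6 -- exactly the
// "String index out of range: 6" seen in the Spark job.
val field = "hyd" // hypothetical sample field, only 3 characters long
val thrown =
  try { field(6); false }
  catch { case _: StringIndexOutOfBoundsException => true }
println(thrown)
```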
Solution:

Option 1 - skip the hand-rolled parsing and let Spark read the CSV directly:

    val df = spark.read.format("csv").option("header", "true").load("aadhar.csv")

Option 2 - keep the RDD approach, but use map instead of flatMap so each record stays an Array[String] of fields:

    var dataRdd1 = dataRdd.map(x => x.split(","))
    var dataRdd2 = dataRdd1.map(y => app(y(0), y(1), y(2), y(3).toLong, y(4), y(5), y(6)))
Summary: The real culprit is the flatMap. It flattens every line into individual field strings, so in the following map each y is a single field rather than a line, and y(0), y(1), ... index characters inside that field instead of fields of the record; y(6) then throws StringIndexOutOfBoundsException for any field shorter than 7 characters, which is why the error says "String index out of range: 6". Replacing flatMap with map keeps each line's fields together as an Array[String], and the indexing works as intended.
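The flatMap-versus-map difference can be seen on a plain Scala collection, using a hypothetical one-line CSV sample standing in for aadhar.csv:

```scala
// One hypothetical CSV line with 7 comma-separated fields.
val lines = Seq("anil,1990-01-01,a@x.com,9999999999,hyd,TS,500001")

// flatMap flattens each line into individual field Strings: the result
// has 7 String elements, and indexing any element with (6) reads a
// character of a field, not a field of the line.
val flat = lines.flatMap(_.split(","))

// map keeps the fields of each line together as one Array[String],
// so rows(0)(6) is the 7th field of the first line.
val rows = lines.map(_.split(","))

println(flat.size)  // number of flattened field strings
println(rows(0)(6)) // the Zipcode field of the first line
```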