Error while creating a Spark DataFrame from an RDD using a case class

I am writing Spark code in Scala that builds a DataFrame from an RDD of case class objects. The code fails with an error at run time. This post shows the code, the error, and a resolution.

Below is the code:

var dataRdd = sc.textFile("file:///home/sreekanth/practice/aadhar.csv")
case class app(username: String, DOB: String, email: String, Phonenumber: Long, city: String, state: String, Zipcode: String)
var dataRdd1 = dataRdd.flatMap(x => x.split(" , "))
var dataRdd2 = dataRdd1.map(y => app(y(0) to String, y(1).toString, y(2).toString, y(3).toLong, y(4).toString, y(5).toString))
var appDataFrame = dataRdd2.toDF

Running the above code in the Spark shell produces the following error:


java.lang.StringIndexOutOfBoundsException: String index out of range: 6
at java.lang.String.charAt(
at scala.collection.immutable.StringOps$.apply$extension(StringOps.scala:67)
at scala.collection.Iterator$$anon$
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:430)
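
The StringOps and String.charAt frames in the trace indicate that y(...) is indexing into a String rather than into an array of fields. Here is a minimal sketch in plain Scala (no Spark needed) of how flatMap can lead to that. The sample lines are made-up values, not taken from aadhar.csv, and the names lines, flattened, rows, charLookup and zipcode are only for illustration.

```scala
import scala.util.Try

// Two hypothetical sample lines standing in for rows of the CSV file.
val lines = Seq(
  "ravi,1990-01-01,ravi@example.com,9000000001,Hyderabad,Telangana,500001",
  "anu,1992-05-12,anu@example.com,9000000002,Chennai,Tamil Nadu,600001"
)

// flatMap flattens every split array into one sequence of individual fields.
val flattened: Seq[String] = lines.flatMap(_.split(","))

// map keeps one Array[String] per input line.
val rows: Seq[Array[String]] = lines.map(_.split(","))

// On a String, y(6) means charAt(6); the 4-character field "ravi" has no index 6,
// so this lookup fails with StringIndexOutOfBoundsException (captured by Try).
val charLookup = Try(flattened.head(6))

// On an Array[String], y(6) is the seventh field of the row.
val zipcode = rows.head(6)
```

With map, each element is a whole row, so y(0) to y(6) refer to the seven fields that the case class expects.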


One option is to read the CSV file directly into a DataFrame:

val df = spark.read.format("csv").option("header", "true").load("aadhar.csv")


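Otherwise, split each line with map rather than flatMap, and convert every field with .toString:

var dataRdd1 = dataRdd.map(x => x.split(",").map(_.trim))
var dataRdd2 = dataRdd1.map(y => app(y(0).toString, y(1).toString, y(2).toString, y(3).toLong, y(4).toString, y(5).toString, y(6).toString))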


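Summary: y(0).toString must be used in place of y(0) to String. Also, because flatMap flattens the split fields into individual strings, y(i) indexes the characters of a single field instead of the fields of a row, which is why the shell reports a StringIndexOutOfBoundsException. Using map keeps one array of fields per line, and the case class needs all seven fields, including Zipcode.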
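
The code in this post assumes that every line has all seven fields and a numeric phone number. Below is a rough sketch of a more defensive parser, in case some lines (for example a header row) do not fit that shape. The helper parseLine and the capitalised class name App are hypothetical names introduced here, not part of the original code.

```scala
import scala.util.Try

// Same seven fields as the case class in the post; renamed App only for this sketch.
case class App(username: String, DOB: String, email: String, Phonenumber: Long,
               city: String, state: String, Zipcode: String)

// Hypothetical helper: returns None for a line that does not have exactly
// seven comma-separated fields or whose phone number is not a valid Long.
def parseLine(line: String): Option[App] = {
  // The -1 limit keeps trailing empty fields so that the length check is reliable.
  val f = line.split(",", -1).map(_.trim)
  if (f.length != 7) None
  else Try(f(3).toLong).toOption.map { phone =>
    App(f(0), f(1), f(2), phone, f(4), f(5), f(6))
  }
}
```

In the Spark shell this could be applied as dataRdd.flatMap(line => parseLine(line)).toDF, so that lines which cannot be parsed are dropped instead of failing the job. Whether silently dropping such lines is acceptable depends on the data.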

