Error while creating a Spark DataFrame from an RDD with a case class

While writing a Spark program in Scala, creating a DataFrame from an RDD through a case class failed with an error in the Spark shell. This post shows the code, the error, and a resolution.




Below is the code:

var dataRdd = sc.textFile("file:///home/sreekanth/practice/aadhar.csv")
case class app(username: String, DOB: String, email: String, Phonenumber: Long, city: String, state: String, Zipcode: String)
var dataRdd1 = dataRdd.flatMap(x => x.split(" , "))
var dataRdd2 = dataRdd1.map(y => app(y(0). to String, y(1).toString, y(2).toString, y(3).toLong, y(4).toString, y(5).toString))
var appDataFrame = dataRdd2.toDF
appDataFrame.show(5)

Running the above code in the Spark shell produces the error below.

Error:

java.lang.StringIndexOutOfBoundsException: String index out of range: 6
at java.lang.String.charAt(String.java:658)
at scala.collection.immutable.StringOps$.apply$extension(StringOps.scala:67)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:429)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:47)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:430)
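
The top two frames of the trace, String.charAt reached through StringOps.apply, suggest that y(i) was indexing into a String rather than into an array of columns. The following is a minimal sketch of how that can happen. It uses plain Scala collections in place of the RDD, and the sample row and the object name FlatMapVsMap are made up for illustration; they are not the original data.

```scala
// Plain Scala collections stand in for the RDD; map and flatMap behave
// the same way with respect to flattening. The sample row is hypothetical.
object FlatMapVsMap {
  val lines: Seq[String] =
    Seq("ravi,1990-01-01,ravi@example.com,9876543210,Hyderabad,Telangana,500001")

  // flatMap flattens every split result: each element is ONE field (a String).
  val flattened: Seq[String] = lines.flatMap(_.split(","))

  // map keeps one Array[String] per input line.
  val rows: Seq[Array[String]] = lines.map(_.split(","))

  def main(args: Array[String]): Unit = {
    val y = flattened.head // the single field "ravi"
    // Indexing a String goes through StringOps.apply, i.e. String.charAt.
    println(y(0))
    // "ravi" has only four characters, so y(6) fails with
    // StringIndexOutOfBoundsException; Try captures it here.
    println(scala.util.Try(y(6)).isFailure)
    // With map, index 6 is the seventh column of the row.
    println(rows.head(6))
  }
}
```

After flatMap, each element is a single field, so an index like y(6) is a character position inside that field. After map, the same index selects a column.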




Solution:

The CSV file can be read straight into a DataFrame with the built-in reader:

val df = spark.read.format("csv").option("header", "true").load("aadhar.csv")

 

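To keep the RDD and the case class, split each line with map (not flatMap) so that y is an array of columns, and pass all seven fields to app:

// Assumes comma-separated rows without a header line, with Zipcode as the seventh column, y(6)
var dataRdd2 = dataRdd.map(x => x.split(",")).map(y => app(y(0).toString, y(1).toString, y(2).toString, y(3).trim.toLong, y(4).toString, y(5).toString, y(6).toString))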

 

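Summary: y(0) to String has to be written as y(0).toString. The StringIndexOutOfBoundsException itself is raised by String.charAt, as the stack trace shows, which means y was a single String rather than an array of columns. flatMap flattens the result of split, so each y is one field, and an index such as y(6) asks for a character position that a short field does not have. Splitting with map keeps one array per line, and the case class needs all seven arguments.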
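
Index errors of this kind can also be avoided by checking each row before building the case class. Below is a hedged sketch in plain Scala: App mirrors the fields of the app case class above, while AppParser and parseLine are hypothetical names introduced only for this example. It assumes comma-separated rows with exactly seven columns.

```scala
import scala.util.Try

// Same fields as the post's `app` case class.
final case class App(username: String, DOB: String, email: String,
                     Phonenumber: Long, city: String, state: String, Zipcode: String)

// Hypothetical helper, not part of Spark.
object AppParser {
  // Returns None instead of throwing when a row is malformed.
  def parseLine(line: String): Option[App] = {
    // limit -1 keeps trailing empty fields, so a row ending in "," still has 7 columns
    val f = line.split(",", -1).map(_.trim)
    if (f.length != 7) None
    else Try(f(3).toLong).toOption.map { phone =>
      App(f(0), f(1), f(2), phone, f(4), f(5), f(6))
    }
  }
}
```

In the shell this could be wired in as dataRdd.flatMap(line => AppParser.parseLine(line)).toDF. Here flatMap is the right tool, because it drops the None results, which includes a header line whose Phonenumber column is not a number.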

 
