Spark Cast String Type to Integer Type (int)
In Spark SQL, to convert/cast a string column to integer type (int), you can use the cast() function of the Column class. It can be used with withColumn(), select(), and selectExpr(), and an equivalent cast is available in SQL expressions. cast() takes either a string that names the target type or a DataType object (any subclass of DataType, such as IntegerType).
1. Using select() Example
import org.apache.spark.sql.functions.col
// Using select()
df.select(col("salary").cast("int").as("salary")).printSchema()
// Using selectExpr()
df.selectExpr("cast(salary as int) salary").printSchema()
2. Setup a DataFrame
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .master("local[1]")
  .appName("SparkByExamples.com")
  .getOrCreate()
// Sample data: salary is stored as a string
val simpleData = Seq(("James",34,"true","M","3000.6089"),
  ("Michael",33,"true","F","3300.8067"),
  ("Robert",37,"false","M","5000.5034")
)
import spark.implicits._
val df = simpleData.toDF("firstname","age","isGraduated","gender","salary")
df.printSchema()
printSchema() outputs the schema of the DataFrame. Note that the salary column is a string type.
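If you prefer to check the column type in code rather than reading the printed schema, a minimal sketch could look like the following. The one-row DataFrame and the app name "SalaryTypeCheck" are assumptions made only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StringType

val spark = SparkSession.builder.master("local[1]").appName("SalaryTypeCheck").getOrCreate()
import spark.implicits._

// Hypothetical one-row DataFrame with salary stored as a string
val sample = Seq(("James", 34, "3000.6089")).toDF("firstname", "age", "salary")

// schema("salary") looks up the StructField for that column by name
val salaryType = sample.schema("salary").dataType
println(s"salary is a string column: ${salaryType == StringType}")
```

The same lookup can be repeated after a cast to confirm that the type has changed.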
3. Using Spark SQL – Cast String to Integer Type
Inside a Spark SQL statement you cannot call the Column cast() method. Instead, Spark SQL provides data type functions for casting. Below, INT(string column name) is used to convert the column to integer type.
df.createOrReplaceTempView("CastExample")
val df4 = spark.sql("SELECT firstname,age,isGraduated,INT(salary) as salary from CastExample")
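Besides INT(), the SQL CAST(column AS INT) syntax (the same expression passed to selectExpr() earlier) can also be used in spark.sql(). The sketch below compares the two forms; the view name "CastSketch", the app name, and the one-row data are assumptions for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.IntegerType

val spark = SparkSession.builder.master("local[1]").appName("SqlCastSketch").getOrCreate()
import spark.implicits._

// Hypothetical data: salary held as a purely numeric string
val people = Seq(("James", "3000")).toDF("firstname", "salary")
people.createOrReplaceTempView("CastSketch")

// Data type function form
val viaFunction = spark.sql("SELECT firstname, INT(salary) AS salary FROM CastSketch")
// Standard SQL CAST form
val viaCast = spark.sql("SELECT firstname, CAST(salary AS INT) AS salary FROM CastSketch")

// Both schemas should report salary as integer
viaFunction.printSchema()
viaCast.printSchema()
```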
4. withColumn() – Cast String to Integer Type
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.IntegerType
// Convert String to Integer Type
val df2 = df.withColumn("salary", col("salary").cast(IntegerType))
df2.printSchema()
df2.show()
Alternatively, you can change the data type by passing the type name as a string.
df.withColumn("salary", col("salary").cast("int"))
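It also helps to see what the cast does to the values, not only to the schema. The sketch below assumes ANSI mode is turned off (set explicitly here, since newer Spark releases may enable it by default, in which case invalid strings raise an error instead). Under that assumption, a decimal string should lose its fractional part and a non-numeric string should become null. The column name salary_int and the sample values are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.master("local[1]").appName("CastValuesSketch").getOrCreate()
// Assumption: legacy (non-ANSI) cast semantics
spark.conf.set("spark.sql.ansi.enabled", "false")
import spark.implicits._

// Hypothetical inputs: a whole number, a decimal, and an invalid string
val raw = Seq("3000", "3000.6089", "abc").toDF("salary")

// Keep the original column next to the casted one for comparison
val casted = raw.withColumn("salary_int", col("salary").cast("int"))
casted.show()
```

If the fractional part matters, casting to a double or decimal type first is the safer choice.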
In this simple Spark article, we have seen how to convert a DataFrame column from string type to integer type using the cast() function with withColumn(), select(), and selectExpr(), and finally with a Spark SQL query on a temporary view.
Hope you liked this blog.