
[VL] Should convert kSpillReadBufferSize and kShuffleSpillDiskWriteBufferSize to number #8684

Open

boneanxs wants to merge 2 commits into main from fix_unexpected_character
Conversation

boneanxs (Contributor) commented Feb 7, 2025

What changes were proposed in this pull request?

Fix an issue introduced by #8045: the following error occurs if spark.unsafe.sorter.spill.reader.buffer.size is manually set to a value like 2m.

org.apache.gluten.exception.GlutenException: Non-whitespace character found after end of conversion: "m"
	at org.apache.gluten.vectorized.PlanEvaluatorJniWrapper.nativeCreateKernelWithIterator(Native Method)
	at org.apache.gluten.vectorized.NativePlanEvaluator.createKernelWithBatchIterator(NativePlanEvaluator.java:68)
	at org.apache.gluten.backendsapi.velox.VeloxIteratorApi.genFirstStageIterator(VeloxIteratorApi.scala:204)
	at org.apache.gluten.execution.GlutenWholeStageColumnarRDD.$anonfun$compute$1(GlutenWholeStageColumnarRDD.scala:88)
	at org.apache.gluten.utils.Arm$.withResource(Arm.scala:25)
	at org.apache.gluten.metrics.GlutenTimeMetric$.millis(GlutenTimeMetric.scala:37)
	at org.apache.gluten.execution.GlutenWholeStageColumnarRDD.compute(GlutenWholeStageColumnarRDD.scala:77)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:380)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:344)
	at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:106)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:380)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:344)

We should parse the value to a number of bytes before putting it into nativeConf.
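
For illustration, a minimal sketch of the intended conversion (the helper name toBytesString is hypothetical; Spark's JavaUtils.byteStringAsBytes is one way to do the parsing):

import org.apache.spark.network.util.JavaUtils

// Hypothetical helper: normalize a Spark byte-size string such as "2m"
// to a plain byte count before it is put into nativeConf.
def toBytesString(value: String): String =
  JavaUtils.byteStringAsBytes(value).toString // "2m" -> "2097152"

// e.g. nativeConf.put("spark.unsafe.sorter.spill.reader.buffer.size",
//   toBytesString(conf("spark.unsafe.sorter.spill.reader.buffer.size")))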

(Fixes: #ISSUE-ID)

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

github-actions bot added the CORE (works for Gluten Core) label on Feb 7, 2025

github-actions bot commented Feb 7, 2025

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename the commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:


github-actions bot commented Feb 7, 2025

Run Gluten Clickhouse CI on x86

boneanxs (Contributor, Author) commented Feb 7, 2025

@jinchengchenghh @FelixYBW Hey, could you please help review this? Thanks!

Yohahaha (Contributor) commented Feb 7, 2025

object GlutenConfigUtil {
  private def getConfString(configProvider: ConfigProvider, key: String, value: String): String = {
    Option(ConfigEntry.findEntry(key))
      .map {
        _.readFrom(configProvider) match {
          case o: Option[_] => o.map(_.toString).getOrElse(value)
          case null => value
          case v => v.toString
        }
      }
      .getOrElse(value)
  }

  def parseConfig(conf: Map[String, String]): Map[String, String] = {
    val provider = new MapProvider(conf.filter(_._1.startsWith("spark.gluten.")))
    conf.map {
      case (k, v) =>
        if (k.startsWith("spark.gluten.")) {
          (k, getConfString(provider, k, v))
        } else {
          (k, v)
        }
    }.toMap
  }
}

spark.unsafe.sorter.spill.reader.buffer.size=2m should be converted by the code above; could you investigate why it doesn't work?

Yohahaha (Contributor) commented Feb 8, 2025

Oh, GlutenConfigUtil only processes configs whose key starts with 'spark.gluten.'.
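
For example (a hypothetical check against the parseConfig shown above):

// The key lacks the "spark.gluten." prefix, so parseConfig returns the
// value untouched, and the native side later fails to parse "2m".
GlutenConfigUtil.parseConfig(
  Map("spark.unsafe.sorter.spill.reader.buffer.size" -> "2m"))
// -> Map("spark.unsafe.sorter.spill.reader.buffer.size" -> "2m")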

boneanxs (Contributor, Author) commented Feb 8, 2025

Do we need to extend this function to handle all configs?

Yohahaha (Contributor) commented Feb 8, 2025

Yeah, we may need to add a new method GlutenConfigUtil#get(ConfigEntry) to process non-SQL configs.

boneanxs (Contributor, Author)

Hey @Yohahaha, we cannot use ConfigEntry since it is private to the Spark package:

https://github.com/apache/spark/blob/cea79dc1918b7f03870fe1cb189da9a152e3bbaf/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1892-L1899

How about extracting a specific method that handles byte-size values only, to reduce duplication?

Yohahaha (Contributor)

Sounds good to me.
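
A minimal sketch of what the agreed extraction could look like (the helper name parseBytesConfig and the key set are assumptions, using Spark's JavaUtils.byteStringAsBytes for the conversion):

import org.apache.spark.network.util.JavaUtils

// Assumed set of byte-size configs that must reach the native side as
// plain numbers (kSpillReadBufferSize / kShuffleSpillDiskWriteBufferSize).
private val bytesConfKeys = Set(
  "spark.unsafe.sorter.spill.reader.buffer.size",
  "spark.shuffle.spill.diskWriteBufferSize")

def parseBytesConfig(conf: Map[String, String]): Map[String, String] =
  conf.map {
    case (k, v) if bytesConfKeys.contains(k) =>
      (k, JavaUtils.byteStringAsBytes(v).toString) // "2m" -> "2097152"
    case other => other
  }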

boneanxs force-pushed the fix_unexpected_character branch from 3cf701b to 590a0a7 on February 12, 2025 08:33

Run Gluten Clickhouse CI on x86

Labels: CORE (works for Gluten Core)
Projects: None yet
2 participants