03 Sep 09:51

weiting-chen

c82af60

v1.2.0 Latest

Release Notes - Gluten version 1.2.0

We are pleased to announce that Gluten v1.2.0 has been published as the first official Apache release.

Highlights (Velox backend only)

Support Spark 3.2.2, 3.3.1, 3.4.2, and 3.5.1, with all UTs passing (where the data type is supported)
Support 31 common Spark operators (based on Spark 3.2)
Support 266 common Spark functions (based on Spark 3.2)
Velox codebase updated to 2024/07/05
New RSS support: Apache Uniffle integration added
New data lake support: Iceberg, Delta Lake
New file format support: CSV
Enhanced CI workflow
Refreshed documentation on the Gluten website (https://gluten.apache.org/)
Improved stability for spill, OOM, and other cases
More bug fixes

What's Changed

[CORE] Move all columnar rules to post-columnar transitions by @zhztheplayer in #4790
[GLUTEN-4398][FOLLOW] Mask PullOutPostProject and PullOutPreProject id by @zwangsheng in #4815
[GLUTEN-2956][VL] Support Spark NullType by @PHILO-HE in #2996
[CORE] Add logical link to rewritten spark plan by @ulysses-you in #4817
[GLUTEN-4803][UT] Add Golden Files for TPC-H Spark33 + Gluten Execution Plan by @zwangsheng in #4804
[VL] Allow replacing installed minio package by @PHILO-HE in #4825
[VL] Daily Update Velox Version (2024_03_01) by @GlutenPerfBot in #4821
[VL] Enable more tests of GlutenParquetIOSuite for Spark32/33/34 by @Yohahaha in #4823
[GLUTEN-1632][CH]Daily Update Clickhouse Version (20240302) by @lwz9103 in #4837
[GLUTEN-4039][VL] support map_keys and map_values by @konjac in #4826
[GLUTEN-4424][CORE] Upgrade spark version to 3.5.1 in Gluten by @JkSelf in #4822
[VL] Daily Update Velox Version (2024_03_04) by @GlutenPerfBot in #4841
[GLUTEN-4813] Replace resize/reserve to resize_extact/reserve_exact to reduce memory by @taiyang-li in #4824
[GLUTEN-1632][CH]Daily Update Clickhouse Version (20240305) by @lwz9103 in #4849
[VL] Fix boost installation issue and remove useless QueryCtx by @PHILO-HE in #4850
[VL] Enable "parquet v2 pages - delta encoding" test for Spark33/Spark34 by @Yohahaha in #4816
[CORE] Support FileSourceScanExec driver metrics for spark3.4/3.5 by @zhli1142015 in #4848
[GLUTEN-4772][VL] Support empty map/array literal by @WangGuangxin in #4771
[GLUTEN-4860][CELEBORN] Replace celeborn link by @kerwin-zk in #4861
[VL][CI] Fix CI failure related to Celeborn by @PHILO-HE in #4862
[CORE] Support In list option contains non-foldable expression by @ulysses-you in #4843
[VL] Daily Update Velox Version (2024_03_05) by @GlutenPerfBot in #4852
[VL] Enable more tests in GlutenParquetQuerySuite for Spark32/33/34 by @Yohahaha in #4854
[CORE] ColumnarShuffleExchangeExec should respect advisoryPartitionSize for Spark3.5 by @ulysses-you in #4865
[GLUTEN-4853][CORE] Only trim Alias when its child is semantically equal to resAttr by @liujiayi771 in #4857
[VL] minor change for delta ut by @zhli1142015 in #4869
[VL] Add libsodium.so to thirdparty lib for CentOS8 by @kerwin-zk in #4870
[VL] Updated documentation, refactoring and added more testcases for BNLJ by @Surbhi-Vijay in #4782
[VL] Daily Update Velox Version (2024_03_06) by @GlutenPerfBot in #4868
[MINOR] Remove ExtendedAnalysisException by @PHILO-HE in #4864
[GLUTEN-4831][VL] Support StructType in HashAggregate by @WangGuangxin in #4832
[VL] Support inline function by @marin-ma in #4847
[VL] Add flushable decimal sum test case by @liujiayi771 in #4871
[CORE] Add synchronized for ExplainUtils processPlan by @ulysses-you in #4876
[VL] Rewrite collect_set and collect_list aggregate function by @ulysses-you in #4805
[VL] Fix and use flattenVector by @marin-ma in #4783
[VL] Enable tests of ParquetPartitionDisconverySuite for Spark33/34 by @Yohahaha in #4881
[CORE] Minor adjustment to columnar rule list, and move all columnar sub-rules to one source folder by @zhztheplayer in #4863
[VL] Merge Partial and PartialMerge logic in generateMergeCompanionNode by @liujiayi771 in #4883
[CORE] Fix Spark-3.5 CI by @ulysses-you in #4886
[GLUTEN-4424][CORE] Follow up upgrading spark version to 3.5.1 by @JkSelf in #4845
Add .asf.yml by @yaooqinn in #4892
Update Vulnerability Handling Process by @yaooqinn in #4894
[VL] Daily Update Velox Version (2024_03_07) by @GlutenPerfBot in #4877
[GLUTEN-1632][CH]Daily Update Clickhouse Version (20240308) by @lwz9103 in #4890
[CORE] ColumnarBroadcastExchangeExec should set/cancel with job tag for Spark3.5 by @ulysses-you in #4882
[VL] Daily Update Velox Version (2024_03_08) by @GlutenPerfBot in #4895
[VL] Pass partition id to velox functions by @zhli1142015 in #4344
Add Incubation Standard Disclaimer by @yaooqinn in #4911
[GLUTEN-4835][CORE] Match metric names with Spark by @clee704 in #4834
[Gluten-4732][CH] delta-mergetree support update/delete/upsert/insert in a more native delta way by @binmahone in #4733
[GLUTEN-4898][CH]Bug fix to date diff by @KevinyhZou in #4900
[VL] Daily Update Velox Version(2024_03_11) by @GlutenPerfBot in #4908
[DOC] Update release & configuration doc by @PHILO-HE in #4910
[VL] Support lead window function by @ulysses-you in #4902
[VL] Fix protobuf configure arguments in get_velox.sh by @liujiayi771 in #4920
[Gluten-4918][CH]support CTAS for clickhouse table by @binmahone in #4919
[GLUTEN-4926][CELEBORN] CelebornShuffleManager should remove shuffleId from columnarShuffleIds after unregistering shuffle by @SteNicholas in #4927
[Gluten-4912][CH]Support Specifying columns in clickhouse tables to b… by @binmahone in #4925
[Gluten-4706] [CH][CORE] Add a mode to execute count distinct directly instead o… by @binmahone in #4708
[VL] Daily Update Velox Version (2024_03_12) by @GlutenPerfBot in #4923
[GLUTEN-4914][CH] Fix exceptions in ASTParser by @taiyang-li in #4916
[DOC] Minor fix for wrong gluten folder used in doc by @leoluan2009 in #4938
[VL] Refine log plan/split json into one line by @Yohahaha in #4934
[VL] Support posexplode function and code refactoring on GenerateExecTransformer by @marin-ma in #4901
[CORE] Prior to #4893, add vanilla Spark's original scan source code to keep git history by @zhztheplayer in #4931
[VL] Fix wrong plan equality due to case class inheritance by @zhztheplayer in #4893
[GLUTEN-3559][VL] enable more sql query tests for Spark34 by @zhouyuan in #4880
[VL] Daily Update Velox Version (2024_03_13) by @GlutenPerfBot in #4944
[VL]Bucket join support for Iceberg tables by @SinghAsDev in #4859
[GLUTEN-4827][UT] Add Golden Files for TPC-H Spark34 + Gluten Execution Plan by @zwangsheng in https://github.com/apache/i...

Contributors

zhouyuan, tigrux, and 73 other contributors

Assets 14

apache-gluten-1.2.0-incubating-bin-spark32.tar.gz, 115 MB, 2024-11-12T03:56:52Z
apache-gluten-1.2.0-incubating-bin-spark32.tar.gz.asc, 833 Bytes, 2024-11-12T03:58:49Z
apache-gluten-1.2.0-incubating-bin-spark32.tar.gz.sha512, 180 Bytes, 2024-11-12T03:58:50Z
apache-gluten-1.2.0-incubating-bin-spark33.tar.gz, 115 MB, 2024-11-12T03:56:45Z
apache-gluten-1.2.0-incubating-bin-spark33.tar.gz.asc, 833 Bytes, 2024-11-12T03:58:51Z
apache-gluten-1.2.0-incubating-bin-spark33.tar.gz.sha512, 180 Bytes, 2024-11-12T03:58:43Z
apache-gluten-1.2.0-incubating-bin-spark34.tar.gz, 115 MB, 2024-11-12T03:56:37Z
apache-gluten-1.2.0-incubating-bin-spark34.tar.gz.asc, 833 Bytes, 2024-11-12T03:58:46Z
apache-gluten-1.2.0-incubating-bin-spark34.tar.gz.sha512, 180 Bytes, 2024-11-12T03:58:47Z
apache-gluten-1.2.0-incubating-bin-spark35.tar.gz, 115 MB, 2024-11-12T03:56:26Z
Source code (zip), 2024-08-21T08:12:39Z
Source code (tar.gz), 2024-08-21T08:12:39Z
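
Each binary archive above is published alongside a .sha512 file, so a downloaded archive can be checked before use. The following is a minimal Python sketch of that check, not an official Gluten tool. It assumes the .sha512 file follows the common `sha512sum` layout (hex digest first, then the file name), which these notes do not state. The function names are hypothetical, and checking the .asc signature (which needs a GPG tool) is not covered.

```python
import hashlib
from pathlib import Path


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large archive is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_digest(checksum_path):
    """Return the first whitespace-separated token of the checksum file.

    Assumption: the file looks like "<hex digest>  <file name>".
    """
    tokens = Path(checksum_path).read_text(encoding="utf-8").split()
    if not tokens:
        raise ValueError("checksum file is empty")
    return tokens[0].lower()


def verify_archive(archive_path, checksum_path):
    """Return True when the archive's SHA-512 matches the published digest."""
    return sha512_of_file(archive_path) == expected_digest(checksum_path)
```

With both files in the current directory, `verify_archive("apache-gluten-1.2.0-incubating-bin-spark32.tar.gz", "apache-gluten-1.2.0-incubating-bin-spark32.tar.gz.sha512")` returns True only if the digests agree.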

21 Aug 09:34

weiting-chen

v1.2.0-rc3

c82af60

v1.2.0-rc3 Pre-release

What's Changed

[CORE] Move all columnar rules to post-columnar transitions by @zhztheplayer in #4790
[GLUTEN-4398][FOLLOW] Mask PullOutPostProject and PullOutPreProject id by @zwangsheng in #4815
[GLUTEN-2956][VL] Support Spark NullType by @PHILO-HE in #2996
[CORE] Add logical link to rewritten spark plan by @ulysses-you in #4817
[GLUTEN-4803][UT] Add Golden Files for TPC-H Spark33 + Gluten Execution Plan by @zwangsheng in #4804
[VL] Allow replacing installed minio package by @PHILO-HE in #4825
[VL] Daily Update Velox Version (2024_03_01) by @GlutenPerfBot in #4821
[VL] Enable more tests of GlutenParquetIOSuite for Spark32/33/34 by @Yohahaha in #4823
[GLUTEN-1632][CH]Daily Update Clickhouse Version (20240302) by @lwz9103 in #4837
[GLUTEN-4039][VL] support map_keys and map_values by @konjac in #4826
[GLUTEN-4424][CORE] Upgrade spark version to 3.5.1 in Gluten by @JkSelf in #4822
[VL] Daily Update Velox Version (2024_03_04) by @GlutenPerfBot in #4841
[GLUTEN-4813] Replace resize/reserve to resize_extact/reserve_exact to reduce memory by @taiyang-li in #4824
[GLUTEN-1632][CH]Daily Update Clickhouse Version (20240305) by @lwz9103 in #4849
[VL] Fix boost installation issue and remove useless QueryCtx by @PHILO-HE in #4850
[VL] Enable "parquet v2 pages - delta encoding" test for Spark33/Spark34 by @Yohahaha in #4816
[CORE] Support FileSourceScanExec driver metrics for spark3.4/3.5 by @zhli1142015 in #4848
[GLUTEN-4772][VL] Support empty map/array literal by @WangGuangxin in #4771
[GLUTEN-4860][CELEBORN] Replace celeborn link by @kerwin-zk in #4861
[VL][CI] Fix CI failure related to Celeborn by @PHILO-HE in #4862
[CORE] Support In list option contains non-foldable expression by @ulysses-you in #4843
[VL] Daily Update Velox Version (2024_03_05) by @GlutenPerfBot in #4852
[VL] Enable more tests in GlutenParquetQuerySuite for Spark32/33/34 by @Yohahaha in #4854
[CORE] ColumnarShuffleExchangeExec should respect advisoryPartitionSize for Spark3.5 by @ulysses-you in #4865
[GLUTEN-4853][CORE] Only trim Alias when its child is semantically equal to resAttr by @liujiayi771 in #4857
[VL] minor change for delta ut by @zhli1142015 in #4869
[VL] Add libsodium.so to thirdparty lib for CentOS8 by @kerwin-zk in #4870
[VL] Updated documentation, refactoring and added more testcases for BNLJ by @Surbhi-Vijay in #4782
[VL] Daily Update Velox Version (2024_03_06) by @GlutenPerfBot in #4868
[MINOR] Remove ExtendedAnalysisException by @PHILO-HE in #4864
[GLUTEN-4831][VL] Support StructType in HashAggregate by @WangGuangxin in #4832
[VL] Support inline function by @marin-ma in #4847
[VL] Add flushable decimal sum test case by @liujiayi771 in #4871
[CORE] Add synchronized for ExplainUtils processPlan by @ulysses-you in #4876
[VL] Rewrite collect_set and collect_list aggregate function by @ulysses-you in #4805
[VL] Fix and use flattenVector by @marin-ma in #4783
[VL] Enable tests of ParquetPartitionDisconverySuite for Spark33/34 by @Yohahaha in #4881
[CORE] Minor adjustment to columnar rule list, and move all columnar sub-rules to one source folder by @zhztheplayer in #4863
[VL] Merge Partial and PartialMerge logic in generateMergeCompanionNode by @liujiayi771 in #4883
[CORE] Fix Spark-3.5 CI by @ulysses-you in #4886
[GLUTEN-4424][CORE] Follow up upgrading spark version to 3.5.1 by @JkSelf in #4845
Add .asf.yml by @yaooqinn in #4892
Update Vulnerability Handling Process by @yaooqinn in #4894
[VL] Daily Update Velox Version (2024_03_07) by @GlutenPerfBot in #4877
[GLUTEN-1632][CH]Daily Update Clickhouse Version (20240308) by @lwz9103 in #4890
[CORE] ColumnarBroadcastExchangeExec should set/cancel with job tag for Spark3.5 by @ulysses-you in #4882
[VL] Daily Update Velox Version (2024_03_08) by @GlutenPerfBot in #4895
[VL] Pass partition id to velox functions by @zhli1142015 in #4344
Add Incubation Standard Disclaimer by @yaooqinn in #4911
[GLUTEN-4835][CORE] Match metric names with Spark by @clee704 in #4834
[Gluten-4732][CH] delta-mergetree support update/delete/upsert/insert in a more native delta way by @binmahone in #4733
[GLUTEN-4898][CH]Bug fix to date diff by @KevinyhZou in #4900
[VL] Daily Update Velox Version(2024_03_11) by @GlutenPerfBot in #4908
[DOC] Update release & configuration doc by @PHILO-HE in #4910
[VL] Support lead window function by @ulysses-you in #4902
[VL] Fix protobuf configure arguments in get_velox.sh by @liujiayi771 in #4920
[Gluten-4918][CH]support CTAS for clickhouse table by @binmahone in #4919
[GLUTEN-4926][CELEBORN] CelebornShuffleManager should remove shuffleId from columnarShuffleIds after unregistering shuffle by @SteNicholas in #4927
[Gluten-4912][CH]Support Specifying columns in clickhouse tables to b… by @binmahone in #4925
[Gluten-4706] [CH][CORE] Add a mode to execute count distinct directly instead o… by @binmahone in #4708
[VL] Daily Update Velox Version (2024_03_12) by @GlutenPerfBot in #4923
[GLUTEN-4914][CH] Fix exceptions in ASTParser by @taiyang-li in #4916
[DOC] Minor fix for wrong gluten folder used in doc by @leoluan2009 in #4938
[VL] Refine log plan/split json into one line by @Yohahaha in #4934
[VL] Support posexplode function and code refactoring on GenerateExecTransformer by @marin-ma in #4901
[CORE] Prior to #4893, add vanilla Spark's original scan source code to keep git history by @zhztheplayer in #4931
[VL] Fix wrong plan equality due to case class inheritance by @zhztheplayer in #4893
[GLUTEN-3559][VL] enable more sql query tests for Spark34 by @zhouyuan in #4880
[VL] Daily Update Velox Version (2024_03_13) by @GlutenPerfBot in #4944
[VL]Bucket join support for Iceberg tables by @SinghAsDev in #4859
[GLUTEN-4827][UT] Add Golden Files for TPC-H Spark34 + Gluten Execution Plan by @zwangsheng in #4828
[VL] Verify unhex has been offloaded to native successfully by @Yohahaha in #4937
[VL] Support skewness aggregate function by @liujiayi771 in #4939
[GLUTEN-1632][CH]Daily Update Clickhouse Version (20240314) by @lwz9103 in #4948
[VL] parquet file metadata columns support in velox by @gaoyangxiaozhu in #3870
[VL] Daily Update Velox Version (2024_03_14) by @GlutenPerfBot in #4949
[VL] Untangle code of TransformPreOverrides by @zhztheplayer in ht...

Contributors

zhouyuan, tigrux, and 73 other contributors

Assets 2

14 Aug 07:04

weiting-chen

v1.2.0-rc2

27e988d

v1.2.0-rc2 Pre-release

Contributors

zhouyuan, tigrux, and 73 other contributors

Assets 2

25 Jul 23:40

weiting-chen

v1.2.0-rc1

c9f3d89

v1.2.0-rc1 Pre-release

Contributors

zhouyuan, tigrux, and 73 other contributors

Assets 2

06 Jul 12:35

weiting-chen

v1.2.0-rc0

c215035

v1.2.0-rc0 Pre-release

Contributors

zhouyuan, tigrux, and 72 other contributors

Assets 2

02 Mar 05:29

weiting-chen

v1.1.1

7999b61

v1.1.1

Release Notes - Gluten - Version 1.1.1

We are pleased to announce that Gluten has been accepted as an Apache Incubating project. Additionally, we are excited to unveil the release of Gluten-1.1.1. This version marks the final release before our transition to Apache.

Highlights (Velox backend only)

Support Spark 3.2, 3.3, and 3.4 (API only)
Support 30 common Spark Operators
Support 220 common Spark Functions
Velox codebase updated to 2024/02/29
Refactor Data Lake API to support Delta Lake Scan and Iceberg read COW table
Better S3, GCS support
More stability in Spill support
Enhance metric support for spill, shuffle, and additional metrics
Enhance fallback case support by expanding coverage for missing cases and updating messages accordingly
Enhance shuffle, including merge before compression, push-based shuffle, and more
More bug fixes

What's Changed

[GLUTEN-3855][VL] Fix ORC related failed UT by @chenxu14 in #3805
[VL] Support IsNull filter pushdown by @rui-mo in #3791
[VL] Update velox-backend-limitations.md by @FelixYBW in #3639
[GLUTEN-2169][VL] Enable GlutenEnsureRequirementsSuite in unit tests by @JkSelf in #3860
[CH] Fix exception of pb MessageToJsonString by @exmy in #3823
[GLUTEN-3851][VL] Add remaining filter time metric by @zhli1142015 in #3852
[VL] Support ignoreNulls for NthValue window function by @PHILO-HE in #3857
[VL] Enable using static link for QAT by @marin-ma in #3863
[VL] Fix assertion failures when mixing use of partial aggregation spilling and flushing by @zhztheplayer in #3872
[GLUTEN-3796][VL][FOLLOW_UP] Correct test name match and move black list to exclude in VeloxTestSettings by @zwangsheng in #3874
[GLUTEN-3528][VL] Construct unique & non-overlapping partition/sort keys for window operator by @PHILO-HE in #3883
[GLUTEN-3879][CH] salt 1% of TPCH-1 data to NULL instead of 10% by @binmahone in #3880
[VL] Doc refresh by @zhouyuan in #3882
[GLUTEN-3865][CH] Refactor aggregating without keys by @lgbo-ustc in #3866
[GLUTEN-3722][CH] Improve shuffle writer by @taiyang-li in #3728
[VL] Map date_format to a Velox function name by @PHILO-HE in #3878
[VL]Daily Update Velox Version (20231129) by @yma11 in #3877
[CORE] Add InputIteratorTransformer to decouple ReadRel and iterator index by @ulysses-you in #3854
[GLUTEN-3732][VL] Use arrow result-returning variants FileWriter::Open API by @yangzhg in #3733
[CORE] Move validate methods from TransformerApi to ValidatorApi by @exmy in #3881
[GLUTEN-3824][CH]Bug fix hdfs path contains space by @KevinyhZou in #3825
[GLUTEN-1632][CH]Daily Update Clickhouse Version (20231201) by @lwz9103 in #3898
[VL] Break up spilling operation to two phases: shrink phase and spill phase by @zhztheplayer in #3895
[GLUTEN-1699][VL] Support loadLibFromJar on RedHat 7/8 by @ychris78 in #3893
[GLUTEN-3906] [VL] fix: fix package.sh failed for x86 by @lzjqsdd in #3907
[GLUTEN-3750][CH]Bug fix json parse error by @KevinyhZou in #3751
[GLUTEN-3902][VL] Add documentation to configure the Velox+GCS connector by @tigrux in #3902
[DOC] Revise Gluten document by @PHILO-HE in #3892
[VL]Daily Update Velox Version (20231203) by @yma11 in #3913
[VL] Minor improvements for CI stale bot by @zhztheplayer in #3888
[VL] Avoid reapplying code patches for external projects when ENABLE_EP_CACHE=ON by @zhztheplayer in #3916
[VL] minor change for fallback log by @zhli1142015 in #3919
[VL] Add sort merge join metrics by @ulysses-you in #3920
[GLUTEN-3378][CORE] Datasource V2 data lake read support by @liujiayi771 in #3843
[VL] ENABLE_EP_CACHE=ON still uses cached Velox build although the build arguments were changed by @zhztheplayer in #3926
[VL] Make bloom_filter_agg fall back when might_contain is not transformable by @zhli1142015 in #3917
[VL][CI] update docker build script by @zhouyuan in #3904
[GLUTEN-3917][FOLLOWUP] Add back SparkShimLoader import by @ulysses-you in #3940
[VL] Fix VeloxTPCHV1BhjSuite and VeloxTPCHV2Suite useV1SourceList by @liujiayi771 in #3930
[VL] Fix syntax error in stale.yml by @zhztheplayer in #3945
[GLUTEN-3854][CORE][FOLLOWUP] Add ColumnarInputAdapter back to recover UI graph by @ulysses-you in #3933
[GLUTEN-1632][CH]Daily Update Clickhouse Version (20231206) by @lwz9103 in #3938
[VL] Add output row metric for InputIteratorTransformer by @Yohahaha in #3939
[GLUTEN-3927][CH] Improve the performance of element_at by @taiyang-li in #3928
[GLUTEN-3908][CH] Improve shuffle split for clickhouse backend by remove ColumnNullable's memcmp by @KevinyhZou in #3909
[GLUTEN-3924][CORE] Match hive UDF name in case-insensitive mode during expression transformation by @taiyang-li in #3925
[GLUTEN-3958] Use getDeclaredConstructor().newInstance() in ScanTransformerFactory by @liujiayi771 in #3961
[GLUTEN-3944][CH]Fix gluten.jar with delta20 when use spark 3.3 by @lwz9103 in #3947
[VL] gluten-te: In dockerfiles, use symbolic link for /opt/velox by @zhztheplayer in #3946
[VL]Daily Update Velox Version (20231206) by @yma11 in #3954
Revert "[GLUTEN-3908][CH] Improve shuffle split for clickhouse backend by remove ColumnNullable's memcmp " by @baibaichen in #3965
[GLUTEN-3890][CH] Respect spill_threshold for all buffers in shuffle writer by @taiyang-li in #3891
[CORE] Fix wrong fallback cost by @ulysses-you in #3967
[GLUTEN-3922][CH] Fix incorrect shuffle hash id value when executing modulo by @zzcclp in #3923
[VL] quick fix for static build git conflict by @zhouyuan in #3971
[GLUTEN-3486][CH] Fix AQE cannot coalesce shuffle partitions by @exmy in #3941
[GLUTEN-3949][CH] Merge small blocks from upstream phase into a large one by @lgbo-ustc in #3952
[GLUTEN-3948][CH] Fix exception and diff of trunc function by @exmy in #3968
[GLUTEN-3979][CORE] Use exists() instead of map().exists() to improve code readability by @dcoliversun in #3980
[VL]Daily Update Velox Version (20231208) by @yma11 in #3973
Revert "[VL] Make bloom_filter_agg fall back when might_contain is not transformable (#3917)" by @loneylee in #3977
[GLUTEN-3580][VL] support read data from abfs with account key by @gaoyangxiaozhu in #3897
[GLUTEN-3991][CH] Fix the incorrect display name for the mergetree file format by @zzcclp in #3992
[VL] gluten-te: Enable BuildKit to support --cache-from by @zhztheplayer in #3964
[GLUTEN-3841][CH] Support spill in 2nd aggregate stage by @lgbo-ustc in #3772
[VL] Daily Update Velox Version (20231211) by @zhztheplayer in #3999
[VL] Fix StringToMap test failure by @PHILO-HE in #3995
[VL] Make bloom_filter_agg fall back when might_contain is not transformable by @zhli1142015 in #3994
[VL] Following #3996, fix CI error "Runtime factory already registered" by @zhztheplayer in #4001
[VL] Fix linking simdjson error when building benchmark by @PHILO-HE in #3960
[GLUTEN-4002][CH] Update InputIteratorTransformer metrics by @zzcclp in https://github.com/...

Contributors

zhouyuan, tigrux, and 54 other contributors

Assets 11

30 Nov 10:12

weiting-chen

v1.1.0

cce5596

Gluten v1.1.0

Release Notes - Gluten - Version 1.1.0

We are excited to announce the release of Gluten-1.1.0.
This version is the culmination of work from 45 contributors, who delivered features and bug fixes across more than 800 commits since 1.0.0.

Highlights (Velox backend only)

20% performance improvement in Decision Support Benchmarks compared to v1.0.0
Support Spark 3.2 and Spark 3.3
Support Spark 3.4 (experimental)
Pass all Velox UTs and Spark 3.2/3.3 SQL-related UTs
Support Ubuntu 20.04/22.04, CentOS 7/8, alinux 3, Anolis 7/8
Support File System: localfs, HDFS, S3, OSS(via s3a), GCS
Support File Format: Parquet, ORC
Support Data Lake: deltalake (experimental)
Support Data Types: Primitive Type, Decimal, Date, Timestamp, Array (partial), Map (partial), Struct (partial)
Support 28 common Spark Operators, detail here
Support 199 common Spark Functions, detail here
Support Dynamic Memory Pool and Spill
Support Velox UDF
Support Gluten UI to print fallback event in History Server
Support Hadoop HA and Kerberos
Velox code updated to 20231123(commit-id: aff0cde)
Document improvement for support features and configuration

Known Issues

Only static partition write is supported in Spark 3.2 and 3.3

New Features


#3722	[CH] improve mutex usage in shuffle writer
#2063	[CH] Spark sql config load dynamic by task
#3257	[VL] We may need more metrics collected by Velox
#3528	[VL] Construct unique partition/sort keys and removing overlapping sort key for window plan
#3381	[CH]Reuse last WholeStageTransformer instead of creating new one in FileFormatWriter
#2118	[CH] Support hive udtf
#2128	[CH]Support tablesample clause
#2163	[CH] support approx_percentile aggregate function
#2193	[CH] Support some array functions
#2207	[CH] Support function to_utc_timestamp/from_utc_timestamp
#2136	[CH] HiveTransform add metrics `readBytes`
#2439	[VL] array_aggregate support with lambda function
#2451	[CH] Support StaticInvoke function
#2460	Avoid force check Java thread in native side
#2465	Remove operator level fallback policy
#2472	[CH] Remove BasicScanExecTransformer#getInputFilePaths when CH support more general partition location parsing
#3187	[CH] Implement runtime native bloom filter
#2267	[CH] Support urldecoder which is used in reflect("java.net.URLDecoder", "decode", event.event_info['currenturl'], "UTF-8")
#2309	Implement Streaming Window in Velox backend to reduce the memory usage.
#2323	[CH] Build optimization
#2343	[VL] ShuffleWrite: Larger shuffle size than vanilla spark and long compression time
#2365	[CH] gluten should support setting max bytes for a partition for orc/parquet
#2390	[CH] Aligning the NULL and NaN compare semantics of Spark and CH
#2600	[CH] enhance S3 client caching
#2617	[VL][Spark 3.3+] support pushdown aggregate to native scan instead of fallback
#2619	[VL][Spark 3.3+] support match columns use fieldIds in native instead of fallback
#2667	[VL] Stacktrace-categorized memory allocation dumping for debugging
#2730	Request for documentation on how to write a backend for 3rd party engines
#2761	[DOC] A doc named index.md share same content with README.md
#2772	[VL] When performance degrades, what factors may affect the performance?
#2783	[VL]Run CI with DEBUG build mode to enhance stability
#2791	[VL] Support spark function: concat_ws
#2793	Code refactor: move some common code to a root module named common
#2807	Code cleanup: FunctionConfig may be useless
#2515	when we will support spark -gpu ,now we need spark -gpu feature to train big model
#2535	UnsupportedOperationException is abused
#2593	List parquet write semantic differences between Spark and Gluten
#2804	Handle timeZoneId for TimezoneAwareExpression
#2815	[VL] complex data type support in parquet scan
#2825	[VL] In Java, consolidate GlutenColumnarBatchSerializer and CelebornColumnarBatchSerializer
#2826	[VL] Use a dedicate class to maintain gluten native config
#2845	[VL] Separate each jni wrapper to different files
#2874	[VL] support `spark.sql.decimalOperations.allowPrecisionLoss`
#2877	[VL] Support read iceberg
#2905	[VL] Support percentile function
#2919	[VL] Support ORC format in HiveTableScanExecTransformer
#2956	[VL] Support NullType in Project
#2975	[VL] Track MemoryManager feature
#3015	[CH] ReusedExchange: Gluten does not touch it or does not support it
#3017	[VL] Allow users to set spill partitions/levels
#3033	[CH] Support aggregation spill for the second stage
#3049	[CORE] Statement level controls whether to use gluten
#3817	[CH] Optimize mergetree prewhere
#3704	[CH] support tuple subcolumn pruning for orc/parquet
#3784	DNM
#3144	[CH] Aggregation supports complicate type
#3715	[VL] Add support for GCS
#2106	[VL] CI: allow to benchmark TPCH performance on comment
#3702	[VL] Add sort based window support in velox backend
#2404	[VL] Enable Velox memory reclaimer for auto disk-spilling
#3082	[CORE] Support columnar CollectLimit
#3739	[VL] Add config to disable velox file handle cache
#3055	[VL] Use mixed memory (off-heap and on-heap) for native
#3077	[VL] EP: Centralized lifecycle management for C++ / JNI contextual objects
#3142	[VL] Tight Java-C++ object binding
#3075	[VL] Support static partition write in VL backend
#2533	Degrade Arrow version to 8.0 in VL backend.
#2629	Use Project + Unnest to implement Expand operator
#3132	Add streamingwindow support in velox backend
#3361	Support Spark 3.4 in Gluten.
#3425	[VL] Create Hdfs folder in Gluten side when writing hdfs file
#3541	[VL] Add minimal GHA CI job for debug build
[#3705](https://...

Assets 17

14 Jul 03:07

xieqi

v1.0.0

bfe394b

Gluten v1.0.0

Release Notes - Gluten - Version 1.0.0

Highlights (Velox backend only)

Support Spark 3.2 and Spark 3.3
Pass all Velox and Spark 3.2 UTs, and part of the Spark 3.3 UTs
Support Ubuntu 20.04/22.04, CentOS 7/8, alinux 3, Anolis 7/8
Support FileSystem: localfs, HDFS, S3, OSS (via s3a)
Support data types: Primitive type, Decimal, Date, Timestamp
Support 20 operators, detail here
Support 164 functions, detail here
Support native Parquet write
Support native ORC read
Support Intel® In-memory Analytics Accelerator (IAA/IAX) hardware accelerator in Shuffle compression
Support cap-based spill (static memory allocation) for join/agg/sort operator (experimental feature)
Support static build method via vcpkg
Support local cache (experimental feature)
2.71x speedup in Decision Support Benchmark1 (TPC-H Like) testing
2.29x speedup in Decision Support Benchmark2 (TPC-DS Like) testing
Velox code updated to commit
Document improvement for support features and configuration

Known Issues

Parquet write only supports the compression.codec, parquet.block.size, and parquet.block.rows configurations
Velox backend does not support dynamic partition write or bucket write
Spill may throw OutOfMemoryException

New Features

[GLUTEN-1243][VL] Support bit_xor aggregate function
[GLUTEN-1245][VL][Feat] Add VeloxParquetFileFormat to support parquet write in velox backend
[GLUTEN-1270][VL][Feat] Support multiple HDFS endpoints
[GLUTEN-1306][VL] feat: Link static depends via vcpkg
[GLUTEN-1306][FOLLOWUP] vcpkg setup script add alinux3 support
[GLUTEN-1346][VL] Support native velox row to column
[GLUTEN-1367] Support running gluten on anolis
[GLUTEN-1371][VL] Support First/Last aggregate functions
[GLUTEN-1374][VL] RangePartitioning supports velox columnar batch
[GLUTEN-1409][VL] feat: Support named_struct in Velox backend
[GLUTEN-1476][VL] Support GetStructField
[GLUTEN-1478] Support ordered result check for MapData
[GLUTEN-1490] refactor substrait literals using generics, and support map/struct/array literals based on it
[GLUTEN-1521][Core] Support to add the customer columnar rules by config
[GLUTEN-1623][VL] Support asinh, acosh, atanh, sec, csc math functions for Velox backend
[GLUTEN-1638][VL] feat: Add hdfs support in parquet write
[GLUTEN-1640] Support judging whether the execution plan has a fallback
[GLUTEN-1654][VL] support approx_count_distinct for velox
[GLUTEN-1658][CORE] feat: Support SparkResourcesUtil.scala in k8s
[GLUTEN-1662][VL] feat: Support InsertIntoHiveDirCommand in velox parquet write
[GLUTEN-1704][VL] Support metrics on splits and row groups
[GLUTEN-1794][VL] support split preload
[GLUTEN-1860] StructLiteral support null literal
[CORE] Support submit subqueries concurrently to improve scalar subquery performance
[VL] package.sh support centos7 and centos8
[VL] feat: support partial merge phase in aggregation
[VL] package and velox scripts add alinux support
[VL] feat: support more distinct functions
[VL] Support mocking map stage with no input files in micro benchmark
[VL] add support for reading ORC
[VL] add long decimal type support for Orc file format

Improvements

[GLUTEN-842][VL] convert expand op to expand exec in velox
[GLUTEN-842] remove group id transformer
[GLUTEN-1108][VL] Init NativeRowToColumnarJniWrapper with memory pool and schema
[GLUTEN-1199] Avoid throwing exception from destructor of JavaInputStreamAdaptor
[GLUTEN-1205][VL] Rename some class name and dir name for columnar sh…
[GLUTEN-1205][VL] Refactor shuffle partition writer
[GLUTEN-1205][VL] Refactor shuffle partitioner
[GLUTEN-1205][VL][FOLLOWUP] Refactor shuffle partition writer
[GLUTEN-1209][VL] refactor: Refactor Java Celeborn into an independent module
[GLUTEN-1296][VL] Remove some logs in CI
[GLUTEN-1325][VL] Optimize decimal arithmetic
[GLUTEN-1331][CORE] Enable some functions
[GLUTEN-1336][VL] add spark3.3 UT under connector and expression
[GLUTEN-1336][VL] move Spark3.3 Unit tests to separate job
[GLUTEN-1336][VL] add more spark3.3 UT
[GLUTEN-1336][VL] CI: move slow tests into another job for Spark3.3
[GLUTEN-1357][CORE] Change soft-affinity log level from INFO to DEBUG
[GLUTEN-1369][Core] Move config 'spark.gluten.enabled' to GlutenConfig from QueryPlanSelector
[GLUTEN-1393][VL] feat: Change velox pipeline input from arrow to velox ValueStreamNode
[GLUTEN-1407] Let profile control shim version
[GLUTEN-1416][VL] NoSuchMethodError from shaded Arrow
[GLUTEN-1433][VL] feat: offload timestamp scan to Velox - phase 1
[GLUTEN-1433][VL] Enable GlutenStatisticsCollectionSuite
[GLUTEN-1434][VL] Delete some unused files and functions
[GLUTEN-1434][VL] Refactor to add ColumnarBatchIterator
[GLUTEN-1434][VL] Remove unused arrow code and add GLUTEN_CHECK and GLUTEN_DCHECK
[GLUTEN-1458][VL][CI] feat: Adding Spark3.3 w/ Ubuntu22.04 test
[GLUTEN-1476][VL] Enable scan on struct and map types
[GLUTEN-1476][CORE] Use correct field name in struct type
[GLUTEN-1478][VL] enable timestamp expression tests
[GLUTEN-1478] Enable failed UT in GlutenIntervalExpressionsSuite
[GLUTEN-1478][VL] Enable some spark UTs for cast function
[GLUTEN-1478][VL] Enable tests on casting from string to decimal
[GLUTEN-1478][VL] Enable test on casting from decimal to bool
[GLUTEN-1480][DOC] Refactor to enable github pages
[GLUTEN-1491][VL][feat] Refine row_number() method in velox backend
[GLUTEN-1500][VL] feat: Use 0.6 * task memory cap as spill threshold for all spillable operators
[GLUTEN-1500][VL] Implement OOM cap shared by tasks, and spill threshold shared by tasks and operators
[GLUTEN-1500][VL] Integrate with Velox arbitration API
[GLUTEN-1533][VL][Feat] Replace sort agg with gluten hash agg
[[GLUTEN-1534][VL]](https://github.com/oap-proj...

Contributors

zhouyuan, xieqi, and 40 other contributors

Assets 31

07 Apr 09:32

zhejiangxiaomai

0.5.0

3c3267a

Gluten 0.5.0 Pre-release

Pre-release

Change log

Generated on 2023-04-07

Gluten 0.5.0

Gluten 0.5.0 is the 1st preview release from the repository (https://github.com/oap-project/gluten).
In this release, we have merged 971 PRs and fixed 216 issues.

Here are the major highlights in Gluten 0.5.0:

Support Spark3.2 and Spark3.3
Support Ubuntu20.04 or later
Support CentOS7 and 8
Support JDK8 only
Support GCC9 or later
Use Substrait as unified plan
Use Velox as default backend engine
Use Celeborn as default RSS
Support the most popular data types, including Boolean, Byte, Short, Int, Long, Float, Double, Date, Decimal, String, and more
Support Spill for Sort, Agg, and Join operators
Pass all Spark 3.2 unit tests
2.5x speedup in Decision Support Benchmark1 (TPC-H Like) testing
2x speedup in Decision Support Benchmark2 (TPC-DS Like) testing
Support Intel QAT accelerators in Shuffle compression

Limitations

Complex data types such as Array, Map, and Struct are not supported
OOM may occur in operators that do not support spill
Decimal results may mismatch in some cases

Features


#974	[CH] Support string repeat function
#1008	[CH] Support locate function
#1273	Implement cast decimal to int
#1223	[CH] support reading from S3 and using Clickhouse local cache to speed up
#1131	[Gluten-core] Add an option to only fallback once
#1165	Reduce GC Time when executing BHJ for CH backend.
#1147	[Gluten-core] Make validate failure logLevel configurable
#1100	Making transformer plan log more obvious
#1112	Refactor Gluten metrics and add apis for each backend
#926	gluten timezone not the same as backend
#1039	Remove compute pid metric in shuffle operator.
#882	Selective query execution
#959	Upgrade Arrow version to 11.0.0
#969	Docker for gluten running on centos 8
#986	Align and enrich metrics compare to Spark
#972	Can we separate native dynamic library from build generated jars?
#913	No Spark Shim Provider found for 3.2.0
#853	Support named struct type
#888	Clickhouse backend broadcast relation support r2c
#850	Add cast check in ExpressionTransformer
#825	Setup development environment for macOS
#788	Pass needed hadoop conf from driver to executor

Bugs Fixed


#1284	Scala double data is wrongly compared with null in a ut
#729	Validation failed for GlutenHashAggregateExecTransformer class
#799	This operator doesn't support doExecuteColumnar
#527	archives for Spark patch versions become unavailable on new releases affecting shims versioning
#523	Some basic failed SQL cases
#1028	[VL] SusbtraitToVeloxPlan error
#858	Sort result mismatch issue with different input records.
#877	Array/Map DataType result mismatch issue when containing null value
#1227	[CH] Scalar subquery filters execute twice for parquet file
#1265	[CH] Rescale decimal trigger fallback
#1233	[CH] Fix fallback issue when reading csv files
#1235	[CH] Fix missing reading from the broadcasted value when executing DPP
#1234	[CH] Fix error 'Invalid number of columns in chunk pushed to OutputPort' when executing hash agg after union all
#1207	shims-spark32 and shims-spark33 may be depended on at the same time
#1161	Bundle built by `buildbundle-veloxbe.sh` for Spark3.3 is broken
#1210	[CH] Fix the wrong table path of the orders table for TPCH in UT
#1175	FileNotFoundException while executing spark jobs -.so files
#1179	[VL] CI is failing on boost's checksum
#1162	[CK]fix CoaleseBatches metrics
#1124	Memory management not suitable with Velox split preload feature.
#1149	Run tpc-ds core
#741	Handle remainder for the case that its right input is zero
#1090	[TPCH][VL] tpch has some query execution error logs but queries could finish and the result is correct
#1068	[VL] Managed memory leak in imported Spark UTs
#772	Velox does not install folly in centos8 by default, break compile in centos8.
#789	Jar conflicts on Arrow and Protobuf between Vanilla Spark and Gluten
#700	AARCH64 port of Gluten
#1027	[VL] unsupported method
#1072	[CH] Fix NPE when executing BatchScanExecTransformer.getInputFilePaths with MergeTree DS V2
#489	cannot build gluten (velox backend) in Amazon Linux 2
#1012	Enable local cache throw exception
#995	Fix memory leak for ClickHouse Backend
#914	System variables related to Folly could not be found when compiling gluten.
#990	Failed to build velox
#946	Upgrade arrow version to 10.0.1
#860	CH backend inset result not equals spark result
#601	Can't decide data type of null value in gluten test framework, when transforming InteralRow to DataFrame
#843	Unable to convert BHJ to SHJ by using hint
#826	ch_backend not support inset is empty
#815	Gluten + Velox backend does not support Struct dataset with same element name.
#563	Error compiling within -Pbackends-xx,spark-3.3,spark-ut
#560	An unsupportedOperationException interrupted the query execution
#770	VeloxRuntimeError when reading parquet file with only meta data
#800	[UT]ExpectedAnswer may not match SparkAnswer when is sorted
#676	WholeStageTransformerSuite#logForFailedTest() swallows exceptions
#790	Join RuntimeException when having duplicated equal-join keys
#757	Parquet scan not offloaded
#797	It won't load the libparquet.so.1000 when we use Gluten with Velox backend and run it on the yarn.
#784	No Spark Shim Provider found for 3.3.0
#547	Jar conflict issue
#727	build from local velox repo doesn't work

PRs


#1266	[GLUTEN-1246] [CORE] Fix scale may be negative issue
#1313	[VL] Update doc for centos7 install
#1312	[CH] Ignore ch backend tpcds suite
#1198	[VL] fix: Update Velox setup scripts for centos 7
#1294	[VL] Following #1185, do some clean-ups against Velox + Celeborn CI
[#1196](https://github.com/oa...

Assets 28
