v1.2.0-rc0
Pre-release
What's Changed
- [CORE] Move all columnar rules to post-columnar transitions by @zhztheplayer in #4790
- [GLUTEN-4398][FOLLOW] Mask PullOutPostProject and PullOutPreProject id by @zwangsheng in #4815
- [GLUTEN-2956][VL] Support Spark NullType by @PHILO-HE in #2996
- [CORE] Add logical link to rewritten spark plan by @ulysses-you in #4817
- [GLUTEN-4803][UT] Add Golden Files for TPC-H Spark33 + Gluten Execution Plan by @zwangsheng in #4804
- [VL] Allow replacing installed minio package by @PHILO-HE in #4825
- [VL] Daily Update Velox Version (2024_03_01) by @GlutenPerfBot in #4821
- [VL] Enable more tests of GlutenParquetIOSuite for Spark32/33/34 by @Yohahaha in #4823
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240302) by @lwz9103 in #4837
- [GLUTEN-4039][VL] support map_keys and map_values by @konjac in #4826
- [GLUTEN-4424][CORE] Upgrade spark version to 3.5.1 in Gluten by @JkSelf in #4822
- [VL] Daily Update Velox Version (2024_03_04) by @GlutenPerfBot in #4841
- [GLUTEN-4813] Replace resize/reserve with resize_exact/reserve_exact to reduce memory by @taiyang-li in #4824
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240305) by @lwz9103 in #4849
- [VL] Fix boost installation issue and remove useless QueryCtx by @PHILO-HE in #4850
- [VL] Enable "parquet v2 pages - delta encoding" test for Spark33/Spark34 by @Yohahaha in #4816
- [CORE] Support FileSourceScanExec driver metrics for spark3.4/3.5 by @zhli1142015 in #4848
- [GLUTEN-4772][VL] Support empty map/array literal by @WangGuangxin in #4771
- [GLUTEN-4860][CELEBORN] Replace celeborn link by @kerwin-zk in #4861
- [VL][CI] Fix CI failure related to Celeborn by @PHILO-HE in #4862
- [CORE] Support In list option contains non-foldable expression by @ulysses-you in #4843
- [VL] Daily Update Velox Version (2024_03_05) by @GlutenPerfBot in #4852
- [VL] Enable more tests in GlutenParquetQuerySuite for Spark32/33/34 by @Yohahaha in #4854
- [CORE] ColumnarShuffleExchangeExec should respect advisoryPartitionSize for Spark3.5 by @ulysses-you in #4865
- [GLUTEN-4853][CORE] Only trim Alias when its child is semantically equal to resAttr by @liujiayi771 in #4857
- [VL] minor change for delta ut by @zhli1142015 in #4869
- [VL] Add libsodium.so to thirdparty lib for CentOS8 by @kerwin-zk in #4870
- [VL] Updated documentation, refactoring and added more testcases for BNLJ by @Surbhi-Vijay in #4782
- [VL] Daily Update Velox Version (2024_03_06) by @GlutenPerfBot in #4868
- [MINOR] Remove ExtendedAnalysisException by @PHILO-HE in #4864
- [GLUTEN-4831][VL] Support StructType in HashAggregate by @WangGuangxin in #4832
- [VL] Support inline function by @marin-ma in #4847
- [VL] Add flushable decimal sum test case by @liujiayi771 in #4871
- [CORE] Add synchronized for ExplainUtils processPlan by @ulysses-you in #4876
- [VL] Rewrite collect_set and collect_list aggregate function by @ulysses-you in #4805
- [VL] Fix and use flattenVector by @marin-ma in #4783
- [VL] Enable tests of ParquetPartitionDiscoverySuite for Spark33/34 by @Yohahaha in #4881
- [CORE] Minor adjustment to columnar rule list, and move all columnar sub-rules to one source folder by @zhztheplayer in #4863
- [VL] Merge Partial and PartialMerge logic in generateMergeCompanionNode by @liujiayi771 in #4883
- [CORE] Fix Spark-3.5 CI by @ulysses-you in #4886
- [GLUTEN-4424][CORE] Follow up upgrading spark version to 3.5.1 by @JkSelf in #4845
- Add .asf.yml by @yaooqinn in #4892
- Update Vulnerability Handling Process by @yaooqinn in #4894
- [VL] Daily Update Velox Version (2024_03_07) by @GlutenPerfBot in #4877
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240308) by @lwz9103 in #4890
- [CORE] ColumnarBroadcastExchangeExec should set/cancel with job tag for Spark3.5 by @ulysses-you in #4882
- [VL] Daily Update Velox Version (2024_03_08) by @GlutenPerfBot in #4895
- [VL] Pass partition id to velox functions by @zhli1142015 in #4344
- Add Incubation Standard Disclaimer by @yaooqinn in #4911
- [GLUTEN-4835][CORE] Match metric names with Spark by @clee704 in #4834
- [Gluten-4732][CH] delta-mergetree support update/delete/upsert/insert in a more native delta way by @binmahone in #4733
- [GLUTEN-4898][CH]Bug fix to date diff by @KevinyhZou in #4900
- [VL] Daily Update Velox Version(2024_03_11) by @GlutenPerfBot in #4908
- [DOC] Update release & configuration doc by @PHILO-HE in #4910
- [VL] Support lead window function by @ulysses-you in #4902
- [VL] Fix protobuf configure arguments in get_velox.sh by @liujiayi771 in #4920
- [Gluten-4918][CH]support CTAS for clickhouse table by @binmahone in #4919
- [GLUTEN-4926][CELEBORN] CelebornShuffleManager should remove shuffleId from columnarShuffleIds after unregistering shuffle by @SteNicholas in #4927
- [Gluten-4912][CH]Support Specifying columns in clickhouse tables to b… by @binmahone in #4925
- [Gluten-4706] [CH][CORE] Add a mode to execute count distinct directly instead o… by @binmahone in #4708
- [VL] Daily Update Velox Version (2024_03_12) by @GlutenPerfBot in #4923
- [GLUTEN-4914][CH] Fix exceptions in ASTParser by @taiyang-li in #4916
- [DOC] Minor fix for wrong gluten folder used in doc by @leoluan2009 in #4938
- [VL] Refine log plan/split json into one line by @Yohahaha in #4934
- [VL] Support posexplode function and code refactoring on GenerateExecTransformer by @marin-ma in #4901
- [CORE] Prior to #4893, add vanilla Spark's original scan source code to keep git history by @zhztheplayer in #4931
- [VL] Fix wrong plan equality due to case class inheritance by @zhztheplayer in #4893
- [GLUTEN-3559][VL] enable more sql query tests for Spark34 by @zhouyuan in #4880
- [VL] Daily Update Velox Version (2024_03_13) by @GlutenPerfBot in #4944
- [VL]Bucket join support for Iceberg tables by @SinghAsDev in #4859
- [GLUTEN-4827][UT] Add Golden Files for TPC-H Spark34 + Gluten Execution Plan by @zwangsheng in #4828
- [VL] Verify unhex has been offloaded to native successfully by @Yohahaha in #4937
- [VL] Support skewness aggregate function by @liujiayi771 in #4939
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240314) by @lwz9103 in #4948
- [VL] parquet file metadata columns support in velox by @gaoyangxiaozhu in #3870
- [VL] Daily Update Velox Version (2024_03_14) by @GlutenPerfBot in #4949
- [VL] Untangle code of TransformPreOverrides by @zhztheplayer in #4888
- [CORE] Refactor the inheritance relationship of joins by @ulysses-you in #4950
- [VL] Remove glog level config in unit test by @liujiayi771 in #4958
- [GLUTEN-3378][VL] Feat: Support read iceberg mor table for Velox backend by @liujiayi771 in #4779
- [CORE] Add Complete case match in PullOutPreProject by @liujiayi771 in #4968
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240315) by @lwz9103 in #4966
- [GLUTEN-4903][CELEBORN] Support multiple versions of Celeborn by @kerwin-zk in #4913
- [GLUTEN-4973][VL] Upgrade boost version to 1.84.0 by @liujiayi771 in #4974
- [VL] Daily Update Velox Version (2024_03_15) by @GlutenPerfBot in #4965
- [VL] Copy the libsodium.so with version number in dynamic packaging by @liujiayi771 in #4978
- [GLUTEN-4973][VL] Upgrade boost version to 1.84.0 for Ubuntu and Debian by @liujiayi771 in #4977
- [VL] Daily Update Velox Version (2024_03_17) by @GlutenPerfBot in #4984
- [VL] Add spark monotonically_increasing_id function support by @gaoyangxiaozhu in #4954
- [GLUTEN-4981] Add MacOS specific .DS_Store file to .gitignore by @xumingming in #4982
- [GLUTEN-4875][VL]Support spark sql conf sortBeforeRepartition to avoid stage partial retry causing result mismatch by @zjuwangg in #4872
- [VL] Daily Update Velox Version (2024_03_18) by @GlutenPerfBot in #4985
- [VL] Remove a fix for installing boost from get_velox.sh by @PHILO-HE in #4986
- [VL] Fix generic benchmark usage when both split and data exists by @Yohahaha in #4972
- [GLUTEN-4796][VL] Force fallback for orc char type scan by @kerwin-zk in #4797
- [GLUTEN-4943][CH] Reserved padding area in DB::Memory should never be read or written by @taiyang-li in #4957
- [GLUTEN-4994][CH]Fix function conversions by @KevinyhZou in #4995
- [VL] Add support for spark_partition_id function by @gaoyangxiaozhu in #4969
- [GLUTEN-5003][VL] Fix Null literal fallback by @WangGuangxin in #5004
- [CORE] Introduce GlutenNotSupportException for expected fallback behavior by @PHILO-HE in #4996
- [VL] Add velox decimal avg sum large precision test case by @liujiayi771 in #4961
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240319) by @lwz9103 in #5008
- [VL] Daily Update Velox Version (2024_03_19) by @GlutenPerfBot in #5011
- [GLUTEN-3452][CH]Bug fix decimal divide by @KevinyhZou in #4951
- [VL] Enable make_timestamp by @marin-ma in #4746
- [DOC] adding mail list / wechat group / slack channel by @zhouyuan in #5013
- [GLUTEN-3582] Support PageIndex by @baibaichen in #4634
- [VL] Update clang-format version to v4.11.0 for Velox backend by @yma11 in #5035
- [GLUTEN-5009][CH] Fix TPCDS q9 failed with debug build by @exmy in #5015
- [GLUTEN-4899][VL]Fix 3.5 build issue with -Pspark-ut by @ayushi-agarwal in #4975
- [CH][Build][Minor] Fix Build Due to Clickhouse refactor by @baibaichen in #5034
- [GLUTEN-4999] Fix ColumnarUnionExec to get PartitionerAwareUnionRDD used if child RDDs share same partitioner by @guixiaowen in #5021
- [CORE] Catch more specific exceptions in Gluten plan validation by @PHILO-HE in #5036
- [GLUTEN-5001][CORE] Support mapping to native function for HiveGenericUDF by @WangGuangxin in #5002
- [VL] Support regexp_replace function with position argument by @PHILO-HE in #4411
- [GLUTEN-3378][VL][FOLLOWUP] Use List to store Iceberg delete files by @liujiayi771 in #4971
- [VL] Support lead/lag window function with negative input offset by @PHILO-HE in #5026
- [GLUTEN-5020][VL] Add sudo for macOS related commands by @xumingming in #5031
- [VL] Daily Update Velox Version (2024_03_20) by @yma11 in #5038
- [GLUTEN-5041][CH] Fix primary not used when query with filter by @loneylee in #5045
- [GLUTEN-4989][CH] Support function timestamp_add by @loneylee in #5012
- [GLUTEN-5051][Core] Optimize document: HowTo.md by @xumingming in #5052
- [GLUTEN-5027][VL] Fail fast for unsupported compilers by @xumingming in #5030
- [GLUTEN-4933][VL] Update iceberg version to 1.4.3 for Spark 3.4 and above by @yma11 in #4967
- [GLUTEN-5049][CH] Clean code in substring function parser and fix s3 building issue by @taiyang-li in #5050
- [CH][Minor] Fix build due to Clickhouse Refactor by @baibaichen in #5059
- [VL] Fix bug where session config is lost when benchmark is enabled by @FelixYBW in #5054
- [GLUTEN-4956][CH] Fix parsing string with blank prefix/suffix to number by @taiyang-li in #5022
- [GLUTEN-5061][CH] Fix assert error when writing mergetree data with select * from table limit n by @zzcclp in #5068
- [GLUTEN-5060][CH] Remove unnecessary FilterExec execution when querying from MergeTree with the prewhere by @zzcclp in #5067
- [DOC] Remove arrow version setting with 11.0.0-gluten by @PHILO-HE in #5065
- [VL][Minor] Update micro benchmark by @marin-ma in #4959
- [CORE] Pullout pre-project for ExpandExec by @liujiayi771 in #5066
- [GLUTEN-5062][CH] Add a UT to ensure that IN filtering can apply on CH primary key by @zzcclp in #5072
- [VL] Replace "git clone" by downloading tar package for boost in setup-ubuntu.sh by @liujiayi771 in #5071
- [VL] Support uuid function by @zhli1142015 in #5014
- [VL] Daily Update Velox Version (2024_03_21) by @GlutenPerfBot in #5064
- [GLUTEN-4830][VL] Support MapType substrait signature by @WangGuangxin in #4833
- [GLUTEN-5024][VL] Enhance buildbundle-veloxbe.sh to run single step by @xumingming in #5032
- [CORE] Enable bit_length Spark function by @PHILO-HE in #5069
- [GLUTEN-5039][VL] Add support for AppleClang compiler by @xumingming in #5053
- [VL] Remove docs about threshold-based spill by @zhztheplayer in #5078
- [GLUTEN-5016][CH] Fix exchange fallback in simple aggregation sql if spark.gluten.sql.columnar.preferColumnar=false by @lwz9103 in #5042
- [VL] Daily Update Velox Version (2024_03_22) by @GlutenPerfBot in #5077
- [GLUTEN-4997][CH]Fix year diff by @KevinyhZou in #5079
- [Gluten-4912][CH] fix bug when a query has no shuffle by @binmahone in #5081
- [GLUTEN-4675][CH] Support write mergetree to s3 by @loneylee in #4676
- [VL][CI] Use pre-installed celeborn to avoid download failure by @PHILO-HE in #5082
- [GLUTEN-5085] [VL] Fix get_velox.sh on macOS by @xumingming in #5086
- [CORE] Add support for Spark url_decode function by @gaoyangxiaozhu in #5070
- [VL] Remove lead/lag ignoreNulls workaround by @ulysses-you in #5084
- [VL] Add config for the thread num of velox spill executor by @WangGuangxin in #4794
- [GLUTEN-4917][VL] GHA with pre-built docker image and github runner by @zhouyuan in #4936
- [VL] Remove installing openssl in centos8 setup script by @yma11 in #5087
- [GLUTEN-4946][CH] Fix avg(bigint) overflow by @loudongfeng in #5048
- [GLUTEN-5074][VL] fix: UDF load error in yarn-cluster mode by @kecookier in #5075
- [CORE] Pullout pre/post project for generate by @marin-ma in #4952
- [VL] Daily Update Velox Version (2024_03_25) by @marin-ma in #5097
- [VL] Add timetravel and partition filter UTs for Iceberg scan by @yma11 in #5092
- [GLUTEN-5112] Support build bundle package through github action by @wangyum in #5056
- [GLUTEN-5108][CH] Fix the classes in the hadoop-common conflict when running ut local by @zzcclp in #5109
- [CORE] Add a config to fall back all regexp expressions by @PHILO-HE in #5099
- [GLUTEN-4263][VL] Fix compression type 2 not supported in static build by @zhztheplayer in #5121
- [Gluten][Spark 3.5] Fix spark35 ut build by @gaoyangxiaozhu in #5111
- [VL] Daily Update Velox Version (2024_03_26) by @marin-ma in #5115
- [VL] Supports register udf with different signatures by @marin-ma in #5104
- [MINOR] Remove redundant string format by @wForget in #5126
- [VL][MINOR] Refactor operator/function tests by @PHILO-HE in #5037
- [VL] Velox patch to avoid installing libunwind-dev no longer works by @zhztheplayer in #5127
- [GLUTEN-5133]Modify the prompt information for TakeOrderedAndProjectE… by @guixiaowen in #5134
- [CORE] Move BackendBuildInfo case class from GlutenPlugin to Backend class file by @wForget in #5129
- [VL] Enable SPARK-10634 timestamp test case by @liujiayi771 in #5090
- [CH] Issue 5018 by @binmahone in #5019
- [CORE] Support JDK17 by @ulysses-you in #5120
- [GLUTEN-5083][CH] Invalid result with mergeTwoPhasesHashBaseAggregateIfNeed enabled by @lgbo-ustc in #5137
- [GLUTEN-5136][VL] Duplicated output from Spark-to-Velox broadcast relation conversion by @zhztheplayer in #5141
- [GLUTEN-5142][CELEBORN] Remove Incubating of Celeborn from reference by @SteNicholas in #5143
- [GLUTEN-4884][VL] Call getPartitions once in WholeStageTransformer by @acvictor in #4885
- [VL]Revert thrift build options change by @yma11 in #5146
- [VL] Daily Update Velox Version (2024_03_27) by @marin-ma in #5138
- [CORE] Basic runnable version of ACBO (Advanced CBO) by @zhztheplayer in #5058
- [VL] Enable a Spark test for window row frame with constant preceding/following by @PHILO-HE in #3887
- [CORE] Introduce aggregateExpressionMappings interface in SparkShims by @liujiayi771 in #5154
- [VL][CI] Add no-transfer-progress maven option to reduce verbose download log by @liujiayi771 in #5156
- [GLUTEN-5123][INFRA]set up java and maven according to os in build_bundle_package.yml by @dcoliversun in #5124
- [GLUTEN-4964][CORE]Fallback complex data type in parquet write for Spark32 & Spark33 by @JkSelf in #5107
- [CORE] Port "SPARK-39983 Should not cache unserialized broadcast relations on the driver" by @ulysses-you in #5149
- [VL] Support YearMonthIntervalType and enable make_ym_interval by @marin-ma in #4798
- [VL] Fix spark34 group-by.sql(.out) in GlutenSQLQueryTestSuite by @liujiayi771 in #5162
- [DOC] Add archive link for Gluten dev mail list by @PHILO-HE in #5172
- [CORE] Refine OOM message by @Yohahaha in #5166
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240328) by @lwz9103 in #5157
- [CI] Narrow down workflow's filter paths by @PHILO-HE in #5163
- [VL] Update gluten related jars info in docs by @yma11 in #5179
- [VL][CI] Add back upload golden files by @ulysses-you in #5173
- [VL] Daily Update Velox Version (2024_03_28) by @marin-ma in #5158
- [Core] minor change for eliminateProjectList by @zhli1142015 in #5144
- [VL] Fix Shuffle split get wrong binary length by @marin-ma in #5168
- [GLUTEN-5096][CH]Bug fix regex extract diff by @KevinyhZou in #5100
- [CH] Support Logarithm function by @exmy in #5184
- [GLUTEN-5189][VL] Correct boost lib path by @wForget in #5190
- [VL] Fix wrong task info when log split info by @Yohahaha in #5167
- [CORE][VL] ACBO: Add GlutenMetadataModel, move Gluten schema def from property model to metadata model by @zhztheplayer in #5159
- [VL] Remove useless agg mode check in applyExtractStruct by @liujiayi771 in #5161
- [GLUTEN-2163][CH] support aggregate function approx_percentile by @taiyang-li in #4829
- [VL] Restore the test cases for corr in group-by.sql and udf-group-by.sql by @liujiayi771 in #5175
- [VL] Enable bitwise_and, bitwise_not, bitwise_or, bitwise_xor for tinyint & int by @zhli1142015 in #5150
- [CH] Move project transformer rewriting code to CH backend by @zhztheplayer in #5171
- [VL] Daily Update Velox Version (2024_03_29) by @GlutenPerfBot in #5188
- [VL] Support kurtosis aggregate function by @liujiayi771 in #5151
- [CORE] Restore the function signature for eliminateProjectList by @liujiayi771 in #5191
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240330) by @kyligence-git in #5213
- [VL] Daily Update Velox Version (2024_03_30) by @GlutenPerfBot in #5214
- [GLUTEN-5203][VL] Support url_encode function by @wForget in #5204
- [Gluten-5152][CH]Support Optimize and VACUUM command for clickhouse tables by @binmahone in #5153
- [VL] CI: Enable GHA dependency cache on static Velox build by @zhztheplayer in #5145
- [CORE] Change domain name to org.apache.gluten by @yma11 in #5185
- [VL] Daily Update Velox Version (2024_04_01) by @GlutenPerfBot in #5223
- [GLUTEN-5211][VL] Fix typos in velox-backend-support-progress.md by @xumingming in #5212
- [VL] gluten-it: Remove specific result matching code for Q65 by @zhztheplayer in #5226
- [VL] gluten-it: Shorten table creation and query runner logs by @zhztheplayer in #5227
- [HOTFIX][CH] Ignore MergeTree write on hdfs/s3 ut by @zzcclp in #5232
- [CORE] Rename acbo to ras by @zhztheplayer in #5231
- [VL] Support regr_r2 aggregate function by @liujiayi771 in #5210
- [Gluten-5152][CH] fix core dump issues when running in parallel by @binmahone in #5235
- [VL] Support native UDAF by @marin-ma in #5130
- [VL] Correct preProjection to rowConstruction for hash aggregate metrics by @liujiayi771 in #5241
- Revert "[VL] gluten-it: Shorten table creation and query runner logs" by @zhztheplayer in #5237
- [DOC] highlight contributors in Readme by @zhouyuan in #5238
- [VL] Refactor native validation exception handling by @zhli1142015 in #5233
- [VL] Enable to_utc_timestamp Spark function by @acvictor in #5139
- [GLUTEN-4745][CH] support Sort Merge Join by @loudongfeng in #4812
- [GLUTEN-4917][VL][CI] Enable Gluten CPP tests by @PHILO-HE in #5114
- [GLUTEN-5178][VL] Revert "[VL] Remove installing openssl in centos8 setup script (#5087)" by @PHILO-HE in #5245
- [Core] Supports generating nested complex type in RandomParquetDataGenerator by @marin-ma in #5200
- [VL] Enable from_utc_timestamp Spark function by @acvictor in #5140
- [VL] Hot-fix for checkOperator by @marin-ma in #5262
- [VL] Daily Update Velox Version (2024_04_02) by @GlutenPerfBot in #5244
- [GLUTEN-5196][CH]Fix comments regexp extract by @KevinyhZou in #5198
- [CH] Support nanvl function by @exmy in #5199
- [CH] Support csc/sec/cot function by @exmy in #5239
- [VL] Fix yasm installation by @PHILO-HE in #5261
- [CORE] Generate junit xml for Delta/Iceberg module by @Yohahaha in #5263
- [GLUTEN-5264][VL] Correct libglog.so.0 & libglog.so.1 path when building third-lib package on centos-8 by @dcoliversun in #5265
- [CH] Hot fix for checkOperatorMatch by @liujiayi771 in #5267
- [VL] Support regr_slope aggregate function by @liujiayi771 in #5216
- [VL] Add 3 configs of spill by @FelixYBW in #5088
- [VL] Fix load and link libglog.so.1 in SharedLibraryLoaderCentos8 by @liujiayi771 in #5271
- [GLUTEN-4964][VL] Add null value in data validation for parquet and orc by @JkSelf in #5259
- [DOC] Fix minor typos in README by @sujithjay in #5270
- [VL] CI: Update dependency cache only when main branch is updated by @zhztheplayer in #5234
- [Gluten-5256][CH]optimizing table after spark restart bug by @binmahone in #5258
- [GLUTEN-4917][VL] Enable celeborn test in CI by @PHILO-HE in #5247
- [VL] Enable array_remove Spark function by @acvictor in #5268
- [GLUTEN-5277] Add a notebook example for enabling Gluten in PySpark by @yma11 in #5278
- [VL] Support regr_intercept aggregate function by @liujiayi771 in #5273
- [VL][DOC] Update velox-backend-support-progress.md by @PHILO-HE in #5284
- [VL] Daily Update Velox Version (2024_04_03) by @marin-ma in #5272
- [GLUTEN-4917][VL] Enable iceberg/delta in new CI by @yma11 in #5230
- [CH][Minor] Fix build due to Clickhouse Refactor by @baibaichen in #5291
- [VL] Daily Update Velox Version (2024_04_04) by @GlutenPerfBot in #5290
- [VL] Daily Update Velox Version (2024_04_05) by @GlutenPerfBot in #5294
- [GLUTEN-5219][CH]Fix the table metadata sync issue for the CH backend by @zzcclp in #5221
- [VL] RAS: Add EnumeratedApplier to manage columnar rule applications when ras is enabled by @zhztheplayer in #5276
- [GLUTEN-4917][VL][CI] update to use apache/gluten docker by @zhouyuan in #5186
- [VL] Re-enable some failed CI jobs by @zhztheplayer in #5224
- [VL] Daily Update Velox Version (2024_04_07) by @GlutenPerfBot in #5299
- [VL][DOC] Reformat velox-backend-support-progress.md and add regr_avgx regr_avgy regr_count by @liujiayi771 in #5292
- [GLUTEN-5102][VL] Support cast date as timestamp in velox by @dcoliversun in #5240
- [VL] Enable unix_date Spark function by @acvictor in #5287
- [VL] Use collect_list aggregate function in velox by @liujiayi771 in #5285
- [VL] Support regr_sxy aggregate function by @liujiayi771 in #5295
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240407) by @kyligence-git in #5298
- [VL][CI] Add a nightly build job by @PHILO-HE in #5280
- [GLUTEN-5182] [CH] fix fail to parse post join filter by @shuai-xu in #5183
- [GLUTEN-5301] Remove unnecessary test util method 'isSparkVersionAtleast' by @yma11 in #5302
- [CORE] [UT] Ensure GlutenBroadcastJoinSuite run with Gluten enabled by @Yohahaha in #5283
- [VL] Daily Update Velox Version (2024_04_08) by @GlutenPerfBot in #5313
- [CORE][VL] RAS: Refactor memo cache to look up on cluster-canonical node rather than on group-canonical node by @zhztheplayer in #5305
- [VL] RAS: Group reduction support by @zhztheplayer in #5201
- [CELEBORN] Check spark.shuffle.compress first to decide whether to compress shuffle data by @kerwin-zk in #5228
- [CORE][VL] RAS: Group expansion support by @zhztheplayer in #5323
- [GLUTEN-5309] Add UTs for Spark 3.5 by @yma11 in #5310
- [GLUTEN-5316][CORE] Add @Override annotation for some methods by @dcoliversun in #5317
- [GLUTEN-5324]Error message prompted during LocalLimitExec conversion … by @guixiaowen in #5325
- [VL] Fix Alinux3 velox compilation exception by @kerwin-zk in #5318
- [GLUTEN-5331]Update the doc of Build Gluten with Velox Backend #5331 by @guixiaowen in #5332
- [CORE] Remove useless GlutenFilePartition by @WangGuangxin in #5319
- [VL] Daily Update Velox Version (2024_04_09) by @zhztheplayer in #5328
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240409) by @kyligence-git in #5329
- [VL] RAS: Include aggregate transformation into enumerated transform, and add TPC-H golden checks for RAS by @zhztheplayer in #5333
- [VL] Enable Window oom in ci job by @JkSelf in #4929
- [VL] Enable Spark3.4 linear-regression.sql test case in GlutenSQLQueryTestSuite by @liujiayi771 in #5306
- [VL] Daily Update Velox Version (2024_04_10) by @GlutenPerfBot in #5347
- [GLUTEN-4039][VL] Add array filter function support by @ivoson in #5334
- [GLUTEN-4917][VL] Update dependencies in static packaging by @zhouyuan in #5339
- [VL] Fix wrong result caused by missing metadata in hash registration by @marin-ma in #5355
- [VL] Fix kParquetWriteTimestampUnit to kParquetWriteTimestampUnitSession by @Yohahaha in #5281
- [CORE][VL] RAS: Pattern matching by node classes by @zhztheplayer in #5361
- [CORE] Enhance gluten config parsing by @Yohahaha in #5357
- [CORE][VL] RAS: Avoid re-exploring explored nodes in DpPlanner by @zhztheplayer in #5363
- [VL] Add uniffle integration by @summaryzb in #3767
- [VL] Daily Update Velox Version (2024_04_11) by @GlutenPerfBot in #5360
- [VL] Fix wrong result for try_add by @zhli1142015 in #5356
- [GLUTEN-5336][DOC] Update build parameters #5338 by @guixiaowen in #5340
- [GLUTEN-5309][VL] Enable Spark3.5 UTs with failed ones excluded by @yma11 in #5342
- [GLUTEN-5344][VL] Add some parquet example files from parquet-mr for native read test by @yma11 in #5345
- [Gluten-5152][CH] fix bugs for optimizing tables on s3 by @binmahone in #5282
- [CORE] Support KnownNullable and KnownNotNull by @Yohahaha in #5365
- [VL] RAS: Integrate filter rules into enumerated transform by @zhztheplayer in #5367
- [VL] Enable weekofyear function by @Yohahaha in #5371
- [CH] Fix diff of factorial function by @exmy in #5330
- [VL] Spark 3.5: fix and enable all ut for GlutenFileMetadataStructSuite by @gaoyangxiaozhu in #5377
- [VL] Daily Update Velox Version (2024_04_12) by @GlutenPerfBot in #5375
- [CH] [Minor] fix ut due to ClickHouse/ClickHouse#61216 by @baibaichen in #5388
- [VL] Daily Update Velox Version (2024_04_13) by @GlutenPerfBot in #5390
- [DOC] Fix broken link in NewToGluten.md by @yew1eb in #5394
- [VL] Daily Update Velox Version (2024_04_14) by @GlutenPerfBot in #5397
- [VL] fix bug of string buffer size calculation in shuffle by @FelixYBW in #5395
- [GLUTEN-5381] Refine testWithSpecifiedSparkVersion to compare major version by @yma11 in #5382
- [GLUTEN-5403][CH] Fix build error due to delta.package.name not set by default by @lwz9103 in #5404
- [VL] Daily Update Velox Version (2024_04_15) by @GlutenPerfBot in #5400
- [GLUTEN-5341][VL] Fix and enable some uts of spark 3.5 by @gaoyangxiaozhu in #5379
- [VL] Fix timestamp precision loss in serializer by @zhli1142015 in #5376
- [VL] Move out a non-common test case from VeloxTPCHSuite by @zhztheplayer in #5402
- [GLUTEN-5303][CH]Fix get_json_object on abnormal string contains NULL control character by @KevinyhZou in #5304
- [GLUTEN-5380][CH] Support bin function by @exmy in #5383
- [GLUTEN-4483][CH]Improve divide by @KevinyhZou in #5387
- [GLUTEN-5341] Fix and enable delta UTs for Spark3.5 by @yma11 in #5393
- [VL] Support collect_list in window by @liujiayi771 in #5408
- [GLUTEN-5391][CH] Fix equalTo NaN issue by @loudongfeng in #5392
- [GLUTEN-5335][VL] Use common name for both celeborn and uniffle by @summaryzb in #5385
- [GLUTEN-5341][VL][Part 3] Fix and enable some uts of spark 3.5 by @gaoyangxiaozhu in #5411
- [GLUTEN-5417][CH] Fix CH backend build error due to uniffle not supported by @lwz9103 in #5418
- [GLUTEN-5249] [CH] fix throw Unexpected empty column when reading csv file by @shuai-xu in #5254
- [VL] Add independent operator for top-n processing in TakeOrderedAndProjectExecTransformer by @zhztheplayer in #5409
- [VL] Daily Update Velox Version (2024_04_16) by @GlutenPerfBot in #5413
- [VL] Fix ORC reader for ByteType by @kecookier in #5416
- [CORE] Move memory off-heap conf checks to driver plugin by @wForget in #5128
- [GLUTEN-5419][CH] Support writing and reading the mergetree data by the path based table by @zzcclp in #5421
- [VL] Fallback window operator when the range frame contain literal by @JkSelf in #5431
- [VL] Daily Update Velox Version (2024_04_17) by @GlutenPerfBot in #5429
- [VL] Add a bad test case when bloom_filter_agg is fallen back while might_contain is not by @zhztheplayer in #5433
- [DOCS] Make gluten_golden_file_upload.png size small by @ulysses-you in #5436
- [GLUTEN-5341]Fix test write parquet with compression codec by @ayushi-agarwal in #5424
- [GLUTEN-5341][VL] Enable UT of GlutenExpressionMappingSuite by @gaoyangxiaozhu in #5423
- [VL][UT] Fix scalar-subquery-select.sql in spark35 by @liujiayi771 in #5425
- [GLUTEN-5341] Fix VeloxParquetWriteForHiveSuite.scala by @ayushi-agarwal in #5426
- [CORE] Fix negative buffer size by @WangGuangxin in #5441
- [GLUTEN-5307][VL] Fix Potential Overflow Issue in VeloxShuffleWriter Due to Mismatched Data Types of RowNumber by @yangzhg in #5326
- [VL] Daily Update Velox Version (2024_04_18) by @GlutenPerfBot in #5443
- [GLUTEN-5251][VL] Fix inconsistency of the default value for spark.gluten.sql.columnar.backend.velox.maxSpillFileSize by @kecookier in #5450
- [CH] Support shuffle function by @exmy in #5432
- [CH] Support expm1 function by @exmy in #5422
- [VL] Remove batch size limit by @marin-ma in #5446
- [GLUTEN-5405][CH] Add rewrite todate function by @loneylee in #5406
- [GLUTEN-5341] Fix some Spark 3.5 UTs by @yma11 in #5445
- [VL] Support regr_sxx and regr_syy aggregate functions for Spark 3.4 by @liujiayi771 in #5444
- [VL] Daily Update Velox Version (2024_04_19) by @GlutenPerfBot in #5452
- [GLUTEN-5457][CH] Fix merge cause an error log when use mergetree by @loneylee in #5458
- [VL] Rework co-fallback mechanism of bloom-filter might_contain/agg by @zhztheplayer in #5435
- [GLUTEN-5454][CH] Support delete/update/optimize/vacuum API for the MergeTree + Delta by @zzcclp in #5460
- [GLUTEN-5341] Fix and enable all ut of VeloxAggregateFunctionsSuite by @gaoyangxiaozhu in #5466
- [VL][GLUTEN-5362] Enable iceberg tpch partitioned table test by @liujiayi771 in #5373
- [GLUTEN-5341][VL] Enable linear-regression.sql in GlutenSQLQueryTestSuite for Spark 3.5 by @liujiayi771 in #5469
- [GLUTEN-5448][VL] Fix the issue of cleaning when left with previous build in velox-backends part by @Donvi in #5449
- [VL] Refine install libhdfs3 script by @Yohahaha in #5465
- [VL] Daily Update Velox Version (2024_04_21) by @zhztheplayer in #5474
- [GLUTEN-5225][CH] Add mergetree index filter on driver by @loneylee in #5308
- [GLUTEN-5341] fix fail 3.5 ut of VeloxParquetWriteSuite by @gaoyangxiaozhu in #5463
- [CH]feat: Support external sort shuffle, reduce shuffle memory usage when the number of partitions is high by @liuneng1994 in #5279
- [GLUTEN-4039][VL] Add array forall and exists function support by @lyy-pineapple in #5420
- [CH][HOTFIX] Fix compile error after merging PR#5308 by @zzcclp in #5478
- [VL] Fix negative function mapping by @zhli1142015 in #5481
- [VL] Support fallback processing of velox_bloom_filter_agg by @zhztheplayer in #5477
- [VL] Avoid using debug instance of JniWorkspace in VeloxBloomFilterTest by @zhztheplayer in #5482
- [VL][DOC] Separate udf doc by @marin-ma in #5475
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240423) by @kyligence-git in #5488
- [CH] Fix CH CI by @zhztheplayer in #5493
- [VL] Support array transform function by @Yohahaha in #5410
- [VL] Daily Update Velox Version (2024_04_23) by @GlutenPerfBot in #5489
- [CH] Speed up mergetree metadata reloading by @liuneng1994 in #5498
- [VL] Fix case-class inheritance for VeloxColumnarWriteFilesExec by @zhztheplayer in #5480
- [GLUTEN-5502][VL] UnsafeProjection is only constructed once when converting rows to columns by @lyy-pineapple in #5503
- [GLUTEN-5484] add missing tests for clickhouse by @shuai-xu in #5485
- [VL] Rename Velox backend APIs to make consistent with CH (#5464) by @yma11 in #5464
- [GLUTEN-5499][VL] Enable HighOrderFunctionSuites by @Yohahaha in #5505
- [GLUTEN-4836][VL]Add support for WindowGroupLimitExec in gluten by @ayushi-agarwal in #5398
- [VL] Refactor VeloxParquetDatasource by @gaoyangxiaozhu in #5486
- [VL] CI: Add TPC-H / TPC-DS job at SF30 with Spark 3.4 by @zhztheplayer in #5490
- [VL] CI: Minor optimizations for cache build settings by @zhztheplayer in #5517
- [VL] Daily Update Velox Version (2024_04_24) by @GlutenPerfBot in #5510
- [GLUTEN-5512][CH] Fix the incorrect transformer for the round function with the decimal data type by @zzcclp in #5513
- [GLUTEN-5341] Support iceberg bucket join for Spark3.5 by @yma11 in #5378
- [VL][DOC] Add ABFS doc by @acvictor in #5479
- [CH] Support celeborn on external sort shuffle by @liuneng1994 in #5516
- [VL] Fix Velox Parquet Write UT by @gaoyangxiaozhu in #5483
- [VL] Enable remaining passed 3.5 UT by @gaoyangxiaozhu in #5494
- [CORE] Untangle AddTransformHintRule to extract pre-validation code out by @zhztheplayer in #5514
- [VL] CI: Split SF30 job to 4 jobs to speed up execution by @zhztheplayer in #5526
- [GLUTEN-4914][CH][FOLLOWUP] Fix exceptions in ASTParser by @exmy in #5518
- [VL] No need to increment the fixedWidthIdx variable by @XinShuoWang in #5520
- [VL] Improve checkNativeWrite in VeloxParquetWriteForHiveSuite by @Zouxxyy in #5496
- [VL] Daily Update Velox Version (2024_04_25) by @GlutenPerfBot in #5522
- [VL] Remove unused variable in VeloxJniWrapper by @liujiayi771 in #5528
- [GLUTEN-5341][VL][TEST] Fix SPARK-42782: Hive compatibility check for get_json_object by @ayushi-agarwal in #5467
- [VL] Remove linking jemalloc_extension lib belonging to DuckDB build by @PHILO-HE in #5537
- [VL] Use slice instead of resize in ensureFlattened by @marin-ma in #5523
- [GLUTEN-5532] Clean up some dead code in GlutenPlugin by @ivoson in #5533
- [VL][Doc] Remove duplicate content for local cache part by @gaoyangxiaozhu in #5535
- [VL] UDF: Support variable arity in function signatures by @marin-ma in #5495
- [GLUTEN-5461] FEAT: ColumnarArrowPythonEvalExec support for Velox backend by @yma11 in #5462
- [CORE] Upgrade Arrow to 15.0.0 by @Yohahaha in #5174
- [VL] Allow user to specify os to load corresponding third-party libraries by @ulysses-you in #5549
- [GLUTEN-4917][CI][VL] adding basic Velox unit tests by @zhouyuan in #5501
- [VL] [BUG fix] Make the hasNext method callable multiple times by @JkSelf in #5545
- [VL] Daily Update Velox Version (2024_04_26) by @GlutenPerfBot in #5543
- [GLUTEN-5547][VL] Add config to force fallback scan for timestamp type by @zml1206 in #5546
- [VL] RAS: Remove AddTransformHintRule route from EnumeratedApplier by @zhztheplayer in #5552
- [GLUTEN-5525][BUILD] Run `mvn clean` fails if a spark profile is not specified by @zhouyifan279 in #5519
- [UT] Add ignoreGluten method by @PHILO-HE in #5553
- [VL] Fix literal bound fallback for window range frame by @PHILO-HE in #5561
- [GLUTEN-5559][VL] Fix threads number setting for building gluten cpp on MacOS by @NEUpanning in #5560
- [VL] Daily Update Velox Version (2024_04_28) by @GlutenPerfBot in #5558
- [VL] Fix year_of_week function by @PHILO-HE in #5386
- [GLUTEN-5565][VL][BUILD] Fix setup-macos.sh: folly is installed twice by @zhouyifan279 in #5566
- [CORE] Fix delta.package.name error by @ulysses-you in #5564
- [CORE] Reuse broadcast exchange for different build keys with same table by @ulysses-you in #5563
- [GLUTEN-4652][VL] Fix min_by/max_by result mismatch by @yma11 in #5544
- [VL] Daily Update Velox Version (2024_04_29) by @GlutenPerfBot in #5568
- [GLUTEN-5571][DOC] Update Velox build info #5571 by @guixiaowen in #5572
- [VL][Doc] Update Velox backend support progress by @acvictor in #5578
- [VL] Bloom-filter expressions are unexpectedly fallen back by @zhztheplayer in #5579
- fix shuffle OOM if input batch is extremely large by @guhaiyan0221 in #5536
- [VL][DOC] Fix certificate expired when build on CentOS 7 by @zml1206 in #5576
- [VL] Daily Update Velox Version (2024_04_30) by @GlutenPerfBot in #5583
- [CORE] Fix gluten createOptional config contains Some by @ulysses-you in #5573
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240430) by @kyligence-git in #5585
- [VL] Daily Update Velox Version (2024_05_01) by @GlutenPerfBot in #5589
- [VL] Daily Update Velox Version (2024_05_02) by @GlutenPerfBot in #5592
- [VL] Daily Update Velox Version (2024_05_03) by @GlutenPerfBot in #5594
- [VL] Daily Update Velox Version (2024_05_04) by @GlutenPerfBot in #5596
- [GLUTEN-5586] Fix multiple generate functions failure by @marin-ma in #5587
- [VL] Daily Update Velox Version (2024_05_06) by @GlutenPerfBot in #5607
- [VL] Enable unix_millis,unix_micros,timestamp_millis,timestamp_micros functions by @zhli1142015 in #5601
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240506) by @kyligence-git in #5605
- [GLUTEN-5476][CH] Trigger merge on insert task by @loneylee in #5529
- [VL] Enable arrays_zip function by @zhli1142015 in #5609
- [VL] Add more metrics for generate by @ulysses-you in #5608
- [VL][CI] Fix nightly build job by @PHILO-HE in #5562
- [VL] CI: Reformat gluten-it code with Spark331's scalafmt configuration by @zhztheplayer in #5615
- [GLUTEN-5603] Add new added Spark3.4 UTs in Gluten by @yma11 in #5604
- [VL] CI: Gluten-it: Print planning time as well as execution time in test report by @zhztheplayer in #5616
- [GLUTEN-5618][CH] Fix 'Position x is out of bound in Block' error when executing count distinct by @zzcclp in #5619
- [CORE] Only return columns of partitions that require read for iceberg by @Zouxxyy in #5624
- [VL] Enable unix_seconds Spark function by @acvictor in #5602
- [VL] Daily Update Velox Version (2024_05_07) by @GlutenPerfBot in #5628
- [GLUTEN-5611] [VL] Avoid triggering Spark memory listener when native memory request can be handled internally by @Yohahaha in #5631
- [VL] RAS: Include rewrite rules used by RewriteSparkPlanRulesManager in EnumeratedTransform by @zhztheplayer in #5575
- [GLUTEN-5580][CH]Fix cast to int exceed max by @KevinyhZou in #5581
- [GLUTEN-5622] Add new added Spark3.5 UTs in Gluten by @yma11 in #5623
- [GLUTEN-4811][VL] Abfs FileSink Onboard by @gaoyangxiaozhu in #5527
- [VL] Enable split preloading by default by @zhli1142015 in #5456
- [GLUTEN-5603] Add new added Spark3.4 UTs in Gluten for Spark3.5 by @yma11 in #5637
- [GLUTEN-4917][CI] remove miniconda folder in image by @zhouyuan in #5646
- [VL] Enable map_zip_with, zip_with functions by @zhli1142015 in #5610
- [VL] Add a bad test case that final aggregate of collect_list is fallen back while partial aggregate is not by @zhztheplayer in #5649
- [GLUTEN-5414] [VL] Support Arrow native memory pool usage track by @jinchengchenghh in #5550
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240508) by @kyligence-git in #5645
- [GLUTEN-5352][GLUTEN-5459][CH]Fix and improve year function by @KevinyhZou in #5455
- [GLUTEN-5620][CORE] Simplify Decimal process logic by @baibaichen in #5621
- [VL] Fix clang-format version by @PHILO-HE in #5650
- [VL] Add test for getbit Spark function by @acvictor in #5633
- [GLUTEN-5613][CH] Fix CH function SparkCheckoverflow return type not equals with spark by @loneylee in #5614
- [GLUTEN-5414][VL] FEAT: Support read CSV by @jinchengchenghh in #5447
- [GLUTEN-5651][CH] Fix error 'Illegal type of argument of function parseDateTimeInJodaSyntaxOrNull, expected String, got Date32' when executing to_date/to_timestamp by @zzcclp in #5652
- [VL] Generate hdfs-client.xml for libhdfs by @ulysses-you in #5661
- [GLUTEN-5639] [CH] Support spark.sql.decimalOperations.allowPrecisionLoss = true by @baibaichen in #5640
- [VL] Daily Update Velox Version (2024_05_08) by @GlutenPerfBot in #5647
- [GLUTEN-5656][CORE] Avoid executing subqueries with complex data type during validation by @zhztheplayer in #5658
- [GLUTEN-5630][VL] Decrease peak memory by taking freeBytes into account by @Yohahaha in #5635
- [VL] Substrait-to-Velox: Support nested complex type signature parsing by @zhztheplayer in #5665
- [VL][Doc] Remove spark.gluten.sql.columnar.backend.lib config from example by @acvictor in #5671
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240509) by @kyligence-git in #5666
- [VL] Defer debug log generation by @Yohahaha in #5672
- [VL] Daily Update Velox Version (2024_05_09) by @GlutenPerfBot in #5664
- [DOC]add Gluten logo by @weiting-chen in #5680
- [GLUTEN-5662][VL] Fix literal array conversion with nested empty array/map ahead of non-empty by @ivoson in #5663
- [VL] Use the default pip3.6 from the alinux3 in the build of velox by @kerwin-zk in #5676
- [GLUTEN-5673][VL] Fix arbitrator grow logic when exist concurrent memory request by @Yohahaha in #5674
- [VL] Daily Update Velox Version (2024_05_10) by @GlutenPerfBot in #5678
- [VL] Add -Wno-stringop-overflow for alinux3 by @kerwin-zk in #5686
- [GLUTEN-2620][VL] Enable compile_arrow_java by default to avoid invalid pointer error by @PHILO-HE in #5648
- [GLUTEN-4917][VL] CI adding TPCDS benchmark by @zhouyuan in #5693
- [GLUTEN-4039][VL] Add flatten function support by @ivoson in #5551
- [VL] Add InsertIntoHadoopFsRelationCommand test case for csv format by @liujiayi771 in #5681
- [VL] Rename parsePartitionAndMetadataColumns to parseColumnTypes by @gaoyangxiaozhu in #5685
- [GLUTEN-5414] [VL] Fix and enable arrow native memory pool track in CSV scan by @jinchengchenghh in #5683
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240511) by @kyligence-git in #5694
- [VL] Fix NullPointerException when collect_list / collect_set are partially fallen back by @zhztheplayer in #5655
- [GLUTEN-5599][VL] Support json_tuple by @WangGuangxin in #5600
- [VL] Fix async io coredump by @marin-ma in #5657
- [VL] Daily Update Velox Version (2024_05_11) by @GlutenPerfBot in #5695
- [VL] Daily Update Velox Version (2024_05_12) by @GlutenPerfBot in #5705
- [GLUTEN-5414] [VL] Move ArrowFileScanExec class to module backends-velox by @jinchengchenghh in #5667
- [GLUTEN-5682][VL] Fix incorrect result when isNull & isNotNull coexist in filter by @zjuwangg in #5670
- [BUILD] Remove duplicated arrow-dataset dependency from gluten-data/pom.xml by @zhouyifan279 in #5703
- [GLUTEN-5708][VL] Minor wording polishing for NewToGluten.md by @xumingming in #5707
- [CORE] Add a compilation-time check to forbid case-class inheritance by @zhztheplayer in #5723
- [VL] Add test for shuffle function by @zhli1142015 in #5722
- Revert "[CORE] Add a compilation-time check to forbid case-class inheritance " by @zhztheplayer in #5727
- [VL]: Fix VeloxColumnarWriteFilesExec withNewChildren not replacing the dummy child by @zhztheplayer in #5726
- [GLUTEN-4652][VL] Fix min_by/max_by result mismatch when RDD partition num > 1 by @zhouyifan279 in #5711
- [GLUTEN-5724][VL] Remove redundant counter for calculating VeloxShuffleWriter spill time by @marin-ma in #5725
- [CORE] Add a compilation-time check to forbid case-class inheritance by @zhztheplayer in #5729
- [GLUTEN-5739][VL] Fix ShuffleReaderMetrics deserializeTime always is zero by @zjuwangg in #5738
- [GLUTEN-5620][CORE] Remove check_overflow and refactor code by @jinchengchenghh in #5654
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240514) by @kyligence-git in #5732
- [GLUTEN-5745][VL] Add more comments for GenerateRel conversion logic by @xumingming in #5746
- [VL] Daily Update Velox Version (2024_05_14) by @GlutenPerfBot in #5733
- [VL] Drop the test table after all tests in FallbackSuite by @ivoson in #5737
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240515) by @kyligence-git in #5747
- [VL] CI: Gluten-it: Fix unreadable test reporting when there are query failures by @zhztheplayer in #5753
- [VL] Fix build script in Alinux3 by @Yohahaha in #5749
- [VL] Enable GlutenParquetRowIndexSuite for Spark 3.4/3.5 by @gaoyangxiaozhu in #5740
- [GLUTEN-5731][CORE] Fix the logic to calculate rss shuffle write time by @hahazyb201 in #5742
- [VL] Move velox related configs to VeloxConfig.h by @Yohahaha in #5743
- [VL] Daily Update Velox Version (2024_05_15) by @GlutenPerfBot in #5748
- [VL] Enable length function for binary type by @zhli1142015 in #5761
- [CORE] Remove wrong comment for JoinSelectionOverrides by @zml1206 in #5730
- [GLUTEN-5696] Add preprojection support for ArrowEvalPythonExec by @yma11 in #5697
- [GLUTEN-5414] [VL] Support datasource v2 scan csv by @jinchengchenghh in #5717
- [VL] Daily Update Velox Version (2024_05_16) by @GlutenPerfBot in #5756
- [GLUTEN-5759][CORE] Optimize checkGlutenOperatorMatch to show clearer error message by @xumingming in #5760
- [VL][CI] disable Velox UT by @zhouyuan in #5780
- [VL] Refine CMAKE_CXX_FLAGS setting logic by @Yohahaha in #5769
- [GLUTEN-5438] feat: Dynamically sizing off-heap memory by @supermem613 in #5439
- [GLUTEN-5775][CELEBORN] Fix invoke celebornShuffleId exception by @onebox-li in #5776
- [GLUTEN-5777][VL] Supporting specify spark version when build by @xumingming in #5778
- [VL] Support celeborn sort based shuffle by @kerwin-zk in #5675
- [CORE] Add decimal precision tests by @ulysses-you in #5752
- [VL] Add BufferedOutputStream to track the memory usage in PrestoSerializer by @marin-ma in #5785
- [VL] Use MemConfig to replace MemConfigMutable, which makes the code cleaner and may also lead to some performance improvements. by @kecookier in #5784
- [VL][CI] Cache native libraries to re-use them in Spark test jobs by @PHILO-HE in #5768
- [CORE] Unify the transforming for shuffle expression by @exmy in #5793
- [VL] Refine evict logic in sort shuffle writer by @marin-ma in #5786
- [VL] Support simulate task spilling in GenericBenchmark by @Yohahaha in #5795
- [VL][CI] disable nightly job by @zhouyuan in #5803
- [VL] Daily Update Velox Version (2024_05_17) by @rui-mo in #5781
- [CORE]Add branch protection rule by @weiting-chen in #5808
- [CORE] ASF repo config: Set required_signatures to false by @zhztheplayer in #5810
- [CORE] Rework planner C2R / R2C code with new transition facilities by @zhztheplayer in #5767
- [GLUTEN-5741][CH] Fix core dump when executor exits by @exmy in #5787
- [CORE] Unify the aggregate function name mapping by @ulysses-you in #5809
- [GLUTEN-5792][CORE] Fix build on macOS by @xumingming in #5800
- [VL] Move memory reservation block computation logic into ListenableAllocator by @Yohahaha in #5770
- [VL] Daily Update Velox Version (2024_05_20) by @GlutenPerfBot in #5807
- [VL][Minor] Fix warnings caused by -Wunused-but-set-variable by @kecookier in #5797
- [VL] Enable rint function by @zhli1142015 in #5791
- [VL][CI] disable SF30 tpc tests on GHA by @zhouyuan in #5818
- [VL] Ensure get(GetArrayItem) function is offloaded by @Yohahaha in #5789
- [VL][DOC] Update udf doc by @marin-ma in #5814
- [VL] Daily Update Velox Version (2024_05_21) by @GlutenPerfBot in #5819
- [GLUTEN-5773][VL] Update aws-sdk-cpp version to 1.11.285 (from 1.11.169) by @yma11 in #5774
- [CORE] Refactor ExpressionTransformer by @ulysses-you in #5796
- [VL] Refactor data filter in scan transformer by @gaoyangxiaozhu in #5812
- [CORE] Remove duplicate pipeline metrics measurement by @Yohahaha in #5821
- [VL] Add config for memory pool init capacity to reduce arbitration times by @Yohahaha in #5815
- [GLUTEN-4039][VL] Implement stack function by @xumingming in #5813
- [VL] Remove unused code for sort based shuffle by @Yohahaha in #5826
- [VL] Not fallback for function spark_partition_id by @gaoyangxiaozhu in #5830
- [GLUTEN-5837][VL] Fix duplicated projection name during substrait GenerateRel conversion by @xumingming in #5838
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240522) by @kyligence-git in #5835
- [VL] RAS: Reuse same code path with heuristic planner for convention enforcement by @zhztheplayer in #5824
- [VL] Daily Update Velox Version (2024_05_22) by @GlutenPerfBot in #5834
- [GLUTEN-5832][VL] Fix build on macOS by @xumingming in #5833
- [GLUTEN-5844][CORE] Refactor the usage of spark.gluten.enabled by @sharmaplkt in #5845
- [GLUTEN-5771][VL] Add metrics for ColumnarArrowEvalPythonExec by @yma11 in #5772
- [VL] CI image update by @zhouyuan in #5842
- [VL] RAS: Optimize offload rule code to gain better compatibility with rewrite rules by @zhztheplayer in #5836
- [VL] Enable local sort-based shuffle by @marin-ma in #5811
- [GLUTEN-5757][CORE] Remove unnecessary ProjectExecTransformer for Generate by @xumingming in #5782
- [VL] Daily Update Velox Version (2024_05_23) by @rui-mo in #5847
- [GLUTEN-4917][VL][CI] update the docker image in nightly cache job by @zhouyuan in #5855
- [VL] Enable spark rand function by @gaoyangxiaozhu in #5829
- [VL] Enable NaN tests for array functions by @rui-mo in #5854
- [CORE] Remove static modifier on TreeMemoryConsumers.Factory.map by @zhztheplayer in #5849
- [VL] Spark width_bucket function support by @gaoyangxiaozhu in #5634
- [VL] Daily Update Velox Version (2024_05_24) by @GlutenPerfBot in #5860
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240524) by @kyligence-git in #5857
- [GLUTEN-5859][VL] Add support for GCS retry properties by @tigrux in #5858
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240526) by @kyligence-git in #5870
- [VL] RAS: Add config option for setting user cost model, remove fallback strategies from RAS rules list by @zhztheplayer in #5861
- [VL] Daily Update Velox Version (2024_05_27) by @GlutenPerfBot in #5872
- [VL] Allow hash on map for round robin repartitioning by @marin-ma in #5349
- [VL] Enable soundex function by @zhli1142015 in #5877
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240527) by @kyligence-git in #5871
- [VL] Enable arrays_overlap function by @zhli1142015 in #5878
- [VL] Support DecimalType for approx_count_distinct by @liujiayi771 in #5868
- [CORE] Avoid copy in ByteLiteralNode by @jinchengchenghh in #5763
- [INFRA] Do not require all conversations resolved by @ulysses-you in #5865
- [VL] Following #5861, append some nit changes by @zhztheplayer in #5873
- [CH] Add Compatibility test found by internal by @baibaichen in #5882
- [GLUTEN-5840][VL] Fix udaf register simple intermediate type by @marin-ma in #5876
- [VL][Core] SampleExec Operator Native Support by @gaoyangxiaozhu in #5856
- [VL] Include ClickBench benchmark in gluten-it by @zhztheplayer in #5887
- [CORE] Only materialize subquery before doing transform by @ulysses-you in #5862
- [VL] Fix build error by @zhli1142015 in #5891
- [VL] Upgrade folly to v2024.04.01.00 by @PHILO-HE in #5314
- [GLUTEN-5314][VL] Separate FileSink instantiation for different file systems by @PHILO-HE in #5881
- [GLUTEN-4422][CORE] Fix core dump caused by spill on closed iterator by @WangGuangxin in #5874
- [VL] Enable partial merge mode for HLL by @zhli1142015 in #5754
- [VL] Keep gluten fat jar built out previously for other Spark versions by @PHILO-HE in #5905
- [VL] Gluten-it: Improve test report table rendering by @zhztheplayer in #5889
- [VL] Daily Update Velox Version (2024_05_29) by @GlutenPerfBot in #5903
- [GLUTEN-5852] [CH] fix mismatch result columns size exception by @shuai-xu in #5853
- [GLUTEN-4917][VL] CI: update GHA docker image by @zhouyuan in #5907
- [CORE] Enable SortShuffleSuite with ColumnarShuffleManager by @acvictor in #5816
- [VL] Following #5889, correct / simplify the table indenting algorithm by @zhztheplayer in #5917
- [GLUTEN-4942][VL] refine vcpkg package script by @zhouyuan in #5900
- [CH] Adaptive sort memory control and support memory sort shuffle by @liuneng1994 in #5893
- [GLUTEN-5656][CORE][FOLLOWUP] Support GetStructField with NullLiteralNode as subqueries not executing during validation by @jackylee-ch in #5923
- [GLUTEN-5898][CH] Fix regexp_extract function with brackets behaving differently from Spark by @loneylee in #5908
- [GLUTEN-5691][CH] Enable merge on local disk first after insert into mergetree by @loneylee in #5692
- [VL] Daily Update Velox Version (2024_05_30) by @GlutenPerfBot in #5919
- [GLUTEN-5701][VL] Add overflow test case for from_unixtime function by @NEUpanning in #5894
- [GLUTEN-5904][CH] Convert `nan` to `null` which comes from `stddev` by @lgbo-ustc in #5913
- [VL] Fix shuffle with round robin partitioning fail by @ulysses-you in #5928
- [CORE] Move driver/executor endpoint to CH backend by @ulysses-you in #5914
- [VL] Gluten-it: Optimize Maven dependency list by @zhztheplayer in #5925
- [GLUTEN-5921][CH] Function trim of trim_character support value from column by @loneylee in #5922
- [CORE] Use the smaller table to build hashmap in shuffled hash join by @zml1206 in #5750
- [VL] Fall back collect_set, min and max when input is complex type by @zhli1142015 in #5934
- [GLUTEN-5896][CH]Fix greatest diff by @KevinyhZou in #5920
- [CORE] Remove IteratorApi.genNativeFileScanRDD, both velox and clickhouse backend needn't it. by @baibaichen in #5937
- [GLUTEN-5901][CH] Support CH backend parquet + delta by @zzcclp in #5902
- [VL] Daily Update Velox Version (2024_05_31) by @GlutenPerfBot in #5931
- [GLUTEN-5944][CH] Fallback to run delta vacuum command by @zzcclp in #5945
- [GLUTEN-5939][CH] Support java timezone id named 'GMT+8' or 'GMT+08:00' by @loneylee in #5940
- [GLUTEN-5414] [VL] Support arrow csv option and schema by @jinchengchenghh in #5850
- [VL] Upgrade simdjson to 3.9.3 in vcpkg build by @PHILO-HE in #5938
- [VL] Daily Update Velox Version (2024_06_03) by @GlutenPerfBot in #5956
- [GLUTEN-5668][CH] Support mixed conditions in shuffle hash join by @lgbo-ustc in #5735
- [GLUTEN-3582] Support FLBAType and BOOLEAN by @baibaichen in #5962
- [VL] update mirror for Centos8 by @zhouyuan in #5970
- [VL] Gluten-it: Add option --scan-partitions by @zhztheplayer in #5958
- [VL] Remove reselect build side in ShuffledHashJoinExecTransformer by @zml1206 in #5935
- [VL][CI] Follow-up: update centos-8 mirror list by @PHILO-HE in #5972
- [CH] Fix left and substring with length -1 by @liuneng1994 in #5943
- [GLUTEN-5620][CH] Simplify Decimal process for Remainder(%) operator by @baibaichen in #5977
- [VL] Fix shuffle with null type failure by @ulysses-you in #5961
- [CORE] HashJoinLikeExecTransformer simpleStringWithNodeId adds buildSide info by @zml1206 in #5978
- [GLUTEN-5959] Fix function replace report an error with null value by @loneylee in #5960
- [VL] Quick fix for Uniffle CI error by @zhztheplayer in #5986
- [CORE] ExpandFallbackPolicy should propagate fallback reason to vanilla SparkPlan by @ulysses-you in #5971
- [VL] Do not skip updating children's metrics while visiting an operator with NoopMetricsUpdater by @zhztheplayer in #5933
- [VL] Add unknown type to shuffle cpp ut by @marin-ma in #5973
- [VL] Daily Update Velox Version (2024_06_04) by @GlutenPerfBot in #5968
- [GLUTEN-3582][CH] Using ParquetBlockInputFormat instead of VectorizedParquetBlockInputFormat for complex type by @baibaichen in #5995
- [CORE] Drop inputAdaptor in plan tree string by @zml1206 in #5993
- [GLUTEN-5720][VL] Enable left and right semi join type in smj by @JkSelf in #5825
- [GLUTEN-5841][CH]Fix session timezone diff by @KevinyhZou in #5892
- [GLUTEN-5957][CH]Fix get_json_object on filter condition by @KevinyhZou in #5989
- [GLUTEN-5787][CH]Make pipeline and shuffle exit gracefully when tasks in executors are killed or interrupted by @taiyang-li in #5839
- [CORE] Rename CoalesceExecTransformer to ColumnarCoalesceExec by @ulysses-you in #6000
- [VL] Handle try_subtract, try_multiply, try_divide by @zhli1142015 in #5985
- [GLUTEN-5996][CH] Fixed missing columns in join with mixed conditions by @lgbo-ustc in #5997
- [VL] Make ColumnarBatch::getRowBytes leak-safe by @zhztheplayer in #6002
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240606) by @kyligence-git in #5999
- [VL] Daily Update Velox Version (2024_06_05) by @GlutenPerfBot in #5998
- Update CPP Formatting Script by @acvictor in #6006
- [GLUTEN-5910] [CH] add custom type to ASTLiteral by @shuai-xu in #5911
- [VL] Daily Update Velox Version (2024_06_07) by @GlutenPerfBot in #6007
- [VL] Update to_utc_timestamp and from_utc_timestamp tests by @acvictor in #5358
- [CH] Disable automatic switching of sort shuffle by @liuneng1994 in #6015
- [GLUTEN-5981][CH] Make the result be null when the queried field in `get_json_object` is null by @lgbo-ustc in #6001
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240608) by @kyligence-git in #6023
- [VL] Use `mvn -ntp` for all workflow jobs by @Yohahaha in #6025
- [VL] Daily Update Velox Version (2024_06_09) by @GlutenPerfBot in #6028
- [CI][VL] Re-enable native benchmark test by @PHILO-HE in #6020
- [VL] Pass file size and modification time in split by @acvictor in #6029
- [VL] Optimize the performance of hash based shuffle by accumulating batches by @XinShuoWang in #5951
- [VL] RAS: Validate against all offloaded plan nodes to decide whether to do this offload by @zhztheplayer in #6017
- [VL] Daily Update Velox Version (2024_06_11) by @GlutenPerfBot in #6034
- [VL] Provide options to combine small batches before sending to shuffle by @zhztheplayer in #6009
- [VL] Add gluten iceberg jar to bundle package by @leoluan2009 in #6008
- [GLUTEN-5827][CH]support UTC timestamp transform by @KevinyhZou in #5828
- [GLUTEN-5979][CH] Fix CHListenerApi initialize twice on spark local mode by @lwz9103 in #6037
- [GLUTEN-6040][CH] Fix parts failing to load after restarting the Spark session by @loneylee in #6041
- [GLUTEN-5625][VL] Support window range frame by @WangGuangxin in #5626
- [CH] add throttler to GlutenHDFSDisk by @liuneng1994 in #6046
- [CORE] Rework Gluten + DPP compatibility by @zhztheplayer in #6035
- [VL] Support Row Index Metadata Column by @gaoyangxiaozhu in #5351
- [MISC] adding discussion link in issue template by @zhouyuan in #6047
- [VL] Support PreciseTimestampConversion function by @zhli1142015 in #6036
- [VL] Daily Update Velox Version (2024_06_12) by @GlutenPerfBot in #6051
- [GLUTEN-6042][CH]Fix to_date function result type nullable check by @KevinyhZou in #6043
- [VL] Fix load udf library for spark local mode by @marin-ma in #6038
- [GLUTEN-5720][VL][FOLLOWUP][MIRROR] Fix invalid adding int to string by @jackylee-ch in #6054
- [VL] Gluten-it: Improve test report table format for parameterized test by @zhztheplayer in #6052
- [VL] Update supportColumnarShuffleExec for Velox to consider enableColumnarShuffle config by @acvictor in #6055
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240612) by @kyligence-git in #6050
- [CORE] Use SortShuffleManager instance in ColumnarShuffleManager by @acvictor in #6022
- [CORE] Rephrase metric names using "totaltime" as prefix by @zhztheplayer in #6058
- [VL] Change VLOG to DLOG in shuffle to fix performance issue in corner cases by @marin-ma in #6044
- [VL][BUILD] Improve compilation speed for Arrow by @jackylee-ch in #6061
- [VL] Daily Update Velox Version (2024_06_13) by @JkSelf in #6070
- [CH][UT] Fix UT due to ClickHouse/ClickHouse#64427 by @baibaichen in #6079
- [CI] Add CMake format check by @PHILO-HE in #5941
- [GLUTEN-5965][VL] Support pushdown "not in" to scan node by @WangGuangxin in #5966
- [VL] Fix undefined symbol with qat by @marin-ma in #6081
- [VL] Daily Update Velox Version (2024_06_14) by @GlutenPerfBot in #6084
- [VL] Fix inaccurate calculation of task slot number used by s.g.s.c.b.v.IOThreads by @zhztheplayer in #6071
- [Core] Remove getPartitionFilters from scanTransformer by @gaoyangxiaozhu in #6076
- [GLUTEN-6067][CH][Part 1] Support CH backend with Spark3.5 by @zzcclp in #6068
- [VL] Small change to always use testGluten instead of legacy old way by @gaoyangxiaozhu in #6075
- [DOC] Document how to use cmake-format in vs code by @PHILO-HE in #6089
- [VL] Minor refactors on ColumnarRuleApplier by @zhztheplayer in #6086
- [GLUTEN-6026][VL] Add Support for HiveFileFormat parquet write for Spark 3.4+ by @surnaik in #6062
- [GLUTEN-6091][CH] Avoid using LD_PRELOAD in child process by @baibaichen in #6092
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240616) by @kyligence-git in #6100
- [GLUTEN-6072][BUILD] Permission denied error run ./dev/package.sh by @zhouyifan279 in #6073
- [VL] Enable sort-based shuffle in micro benchmark by @marin-ma in #5942
- [GLUTEN-6110] Parallel run gluten ut and spark ut by @lwz9103 in #6090
- [CH] Support function base64/unbase64 by @liuneng1994 in #6077
- [GLUTEN-6067][CH][Minor] Compile Spark-3.5 ut with backends-clickhouse by @baibaichen in #6114
- [VL] Doc update since Spark 3.5.1 has been fully supported by @gaoyangxiaozhu in #6097
- [VL] Fix RowToColumn metric convert time by @jinchengchenghh in #6106
- [VL] RAS: New rule RemoveSort to remove unnecessary sorts by @zhztheplayer in #6107
- [VL] Daily Update Velox Version (2024_06_17) by @JkSelf in #6109
- [GLUTEN-6082][CH]Fix lag diff by @KevinyhZou in #6085
- [GLUTEN-6111][CH]Fix core problem of get_json_object by @KevinyhZou in #6113
- [GLUTEN-6091][CI] Disable ENABLE_GWP_ASAN by @baibaichen in #6119
- [GLUTEN-6053][CH] Move collect native metrics from last hasNext to close and cancel by @lwz9103 in #6069
- [VL] Set s.g.s.c.b.v.coalesceBatchesBeforeShuffle=true by default by @zhztheplayer in #6056
- [VL] Daily Update Velox Version (2024_06_18) by @JkSelf in #6120
- [VL] Support Spark transform_keys, transform_values function by @gaoyangxiaozhu in #6095
- [VL] [Core] Spark Input_file_name Support by @gaoyangxiaozhu in #6021
- [GLUTEN-6078][CH] Enable mergetree hdfs suite by @loneylee in #6080
- [VL] Gluten-it: Reuse Spark sessions that share same configuration by @zhztheplayer in #6117
- [VL] Support linking system libprotobuf.a when building arrow by @jackylee-ch in #6129
- [MINOR] Add spark 3.4.x and 3.5.x options in Github issue by @jackylee-ch in #6141
- [GLUTEN-6134] Polish Configuration.md by @xumingming in #6135
- [CH] support function rint by @liuneng1994 in #6121
- [VL] Daily Update Velox Version (2024_06_19) by @GlutenPerfBot in #6138
- [GLUTEN-6064][VL] Support loading shared libraries on RedHat-9 by @deepashreeraghu in #6063
- [GLUTEN-6016][CH] Add uts for decimal convert to int overflow case by @taiyang-li in #6018
- [VL] Minor command script correction in GHA CI by @zhztheplayer in #6142
- [VL] Prefer to use `path.getFileSystem` instead of `FileSystem.get` to create `FileSystem` by @yikf in #6123
- Revert SortShuffleManager changes in ColumnarShuffleManager by @acvictor in #6149
- [CORE] Support JDK 11 by @surnaik in #6112
- [VL] Daily Update Velox Version (2024_06_19) by @GlutenPerfBot in #6153
- [VL] Avoid using WriteFilesSpec which is not serializable by @jackylee-ch in #6144
- [CORE] Add custom cost evaluator for optimize buildSide of shuffled hash join by @zml1206 in #6143
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240620) by @kyligence-git in #6150
- [VL][Core] Turn off InputFileNameReplaceRule with feature flag by default by @gaoyangxiaozhu in #6161
- [CORE] Bump scalawarts version to prepare for Scala 2.13 support by @zhztheplayer in #6154
- [VL] Daily Update Velox Version (2024_06_20) by @GlutenPerfBot in #6158
- [BUILD] Syntax error when running `./dev/builddeps-veloxbe.sh --enable_s3=ON` on Ubuntu by @zhouyifan279 in #6169
- [VL] Daily Update Velox Version (2024_06_21) by @GlutenPerfBot in #6173
- [CH] support Levenshtein distance by @liuneng1994 in #6108
- [VL] Fix build package script on GHA by @zhouyuan in #5969
- [GLUTEN-4451] [CH] fix header maybe changed by FilterTransform by @shuai-xu in #6166
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240621) by @kyligence-git in #6170
- [GLUTEN-6151] Reset local property after finishing write operator by @JkSelf in #6163
- [VL] RAS: Incorporate query plan's logical link into metadata model by @zhztheplayer in #6165
- [CORE] Use sc.listFiles instead of addedFiles.keys by @ulysses-you in #6175
- [GLUTEN-6122] Fix crash when driver send shutdown command to executor by @taiyang-li in #6130
- [GLUTEN-6178][CH] Add config to insert remote file system directly by @loneylee in #6192
- [VL] Support KnownNullable for Spark 3.5 by @zhli1142015 in #6193
- [VL] Daily Update Velox Version (2024_06_24) by @GlutenPerfBot in #6187
- [GLUTEN-6067][CH] [Part 2] Support CH backend with Spark3.5 - Prepare for supporting sink transform by @baibaichen in #6197
- [GLUTEN-6176][CH] Support aggregate avg returning decimal by @loneylee in #6177
- [GLUTEN-5659][VL] Add more configs for AWS s3 by @yma11 in #5660
- [CH] Support flatten by @liuneng1994 in #6194
- [VL] Fix greatest and least function tests by @zhli1142015 in #6209
- [VL] Fix udf segfault for static build by @marin-ma in #6215
- [VL] Daily Update Velox Version (2024_06_25) by @marin-ma in #6204
- [GLUTEN-6219] Fix some code style issue for BasicScanExecTransformer by @xumingming in #6220
- [GLUTEN-6180][VL] Fix NPE if spilling is requested during task creation by @zhztheplayer in #6205
- [CELEBORN] Fix potential ClassNotFoundException by @kerwin-zk in #6217
- [VL] Add a benchmark to track on iterator facility's performance by @zhztheplayer in #6225
- [GLUTEN-5643] Fix the failure when the pre-project of GenerateExec falls back by @marin-ma in #6167
- [VL] Daily Update Velox Version (2024_06_26) by @marin-ma in #6223
- [GLUTEN-6208][CH] Enable more uts in GlutenStringExpressionsSuite by @taiyang-li in #6218
- [GLUTEN-6124][CH]Fix json output diff by @KevinyhZou in #6125
- [GLUTEN-6156][CH]Fix least diff by @KevinyhZou in #6155
- [VL][Minor] Fix udf jni signature mismatch by @marin-ma in #6212
- [VL] Make jni debug workspace configurable by @Yohahaha in #6228
- [VL] Link lib jemalloc produced by custom building by @PHILO-HE in #4747
- [VL] Remove the registry for Velox's prestosql scalar functions by @PHILO-HE in #5202
- [CELEBORN] Upgrade celeborn to 0.4.1 to support scala 2.13-based compilation by @kerwin-zk in #6226
- [CELEBORN] Add config to control celeborn fallback for CI by @kerwin-zk in #6230
- [VL] Remove useless function registering code by @PHILO-HE in #6245
- [GLUTEN-6235][CH] Fix crash on ExpandTransform::work() by @exmy in #6238
- [CH] Support use dynamic disk path by @liuneng1994 in #6232
- [VL] Daily Update Velox Version (2024_06_27) by @GlutenPerfBot in #6242
- [UT] Remove isVeloxBackendLoaded usage from file metadata UT by @gaoyangxiaozhu in #6249
- [Core] Log unknown fallback reason by @gaoyangxiaozhu in #6237
- [GLUTEN-2790][CH] Fix diff between ch char and spark chr by @taiyang-li in #6236
- [Core] Rename isTransformable API to maybeTransformable by @gaoyangxiaozhu in #6233
- [GLUTEN-6251][CH] Disable GlutenSortShuffleSuite in clickhouse backend by @lwz9103 in #6252
- [CH] Fix array distinct core dump by @liuneng1994 in #6256
- [GLUTEN-6257][CH] Mismatched headers in broadcast join by @lgbo-ustc in #6258
- [VL] Support building arrow CPP and finding installed arrow libs from system by @PHILO-HE in #6229
- [CORE] Remap the name of LOG/LOGARITHM by @exmy in #6266
- [VL] Daily Update Velox Version (2024_06_28) by @marin-ma in #6261
- [CORE][VL] Add OffloadProject to offload project having input_file_name's support considered by @gaoyangxiaozhu in #6200
- [GLUTEN-6253] Use internal udf config to avoid modify the original one by @marin-ma in #6255
- [CORE] Creates vanilla plan when the join operators fall back by @zml1206 in #6093
- [CELEBORN] Avoid CelebornShuffleManager#getWriter adding shuffle id repeatedly to columnarShuffleIds by @SteNicholas in #6281
- [CH] Support bit_get/bit_count function by @exmy in #5636
- [CH] Support bit_length/octet_length function by @exmy in #6259
- [CORE] Execution runtime / native memory manager refactor by @zhztheplayer in #6243
- [CELEBORN] Fix Celeborn support of get-started document by @SteNicholas in #6282
- [CI] Fix centos7 CI build error by @marin-ma in #6298
- [VL] Daily Update Velox Version (2024_06_30) by @GlutenPerfBot in #6284
- [GLUTEN-6300][CH] Ensure same hash results from NaN by @lgbo-ustc in #6301
- [VL] Disable protobuf build by default by @PHILO-HE in #6297
- [VL] Daily Update Velox Version (2024_07_02) by @GlutenPerfBot in #6303
- [GLUTEN-5248][VL] Directly pass legacySizeOfNull to native size function by @PHILO-HE in #6014
- [VL] CI: Fix CPP tests are not running by @zhztheplayer in #6295
- [VL] CI: Update job run-tpc-test-ubuntu-oom for latest memory usage status by @zhztheplayer in #6291
- [VL] Link lib gluten to arrow's static libraries by @PHILO-HE in #6231
- [GLUTEN-6159][CH] Support array functions with lambda functions by @lgbo-ustc in #6248
- [VL] IndicatorVectorPool to avoid sharing native columnar batches' ownerships among runtime instances by @zhztheplayer in #6293
- [CORE] Fix non-deterministic filter executed twice when push down to scan by @zml1206 in #6296
- [VL] Add isStreamingAgg info to HashAggregateTransformer by @liujiayi771 in #6307
- [GLUTEN-6279][CH] Introduce JNI safe array by @baibaichen in #6280
- [VL] Add test for log function by @zhli1142015 in #6211
- [MINOR] ADD NOTICE by @caicancai in #6277
- [VL] bug fix for S3 read by @FelixYBW in #6313
- [GLUTEN-6067][CH] Support Spark3.5 with Scala2.13 for CH backend by @zzcclp in #6311
- [CORE] Rename TransformHint to FallbackTag by @gaoyangxiaozhu in #6254
- [CH] Support replicaterows by @liuneng1994 in #6308
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240703) by @kyligence-git in #6314
- [GLUTEN-6272] Revert #6130 by @baibaichen in #6273
- [VL] Daily Update Velox Version (2024_07_03) by @GlutenPerfBot in #6315
- [CELEBORN] Support celeborn 0.5.0 by @yikf in #6264
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240704) by @kyligence-git in #6327
- [VL] Fix build Velox script incorrectly judged as successful when run make by @j7nhai in #6331
- [GLUTEN-6334][CH] Support ntile window function by @zzcclp in #6335
- [CORE] Drop redundant partial sort which has pre-project when offload sortAgg by @zml1206 in #6294
- [VL] RAS: Remove NoopFilter if has same output with child by @zml1206 in #6324
- [GLUTEN-6333][CH] Support rangepartitioning by timestamptype by @loneylee in #6336
- [VL] Add Support for tencentos 2.4 by @zhixingheyi-tian in #5207
- [VL] Daily Update Velox Version (2024_07_04) by @GlutenPerfBot in #6328
- [GLUTEN-6159][CH] Support array_sort by @lgbo-ustc in #6323
- [CI] Hotfix centos7 CI failure by @marin-ma in #6340
- [VL] Deduplicate sorting keys by @ulysses-you in #6332
- [VL] Daily Update Velox Version (2024_07_05) by @GlutenPerfBot in #6339
- [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240705) by @kyligence-git in #6338
- [GLUTEN-2874][VL] support allowDecimalPrecisionLoss by @zhouyuan in #2895
- [VL] Fix row to column batch size by @jinchengchenghh in #6342
- Fix gcs build issue when vcpkg build is enabled by @PHILO-HE in #6343
- [CORE]update profile for apache release by @weiting-chen in #6349
New Contributors
- @konjac made their first contribution in #4826
- @clee704 made their first contribution in #4834
- @SteNicholas made their first contribution in #4927
- @leoluan2009 made their first contribution in #4938
- @guixiaowen made their first contribution in #5021
- @wangyum made their first contribution in #5056
- @acvictor made their first contribution in #4885
- @sujithjay made their first contribution in #5270
- @ivoson made their first contribution in #5334
- @yew1eb made their first contribution in #5394
- @Donvi made their first contribution in #5449
- @lyy-pineapple made their first contribution in #5420
- @XinShuoWang made their first contribution in #5520
- @Zouxxyy made their first contribution in #5496
- @zml1206 made their first contribution in #5546
- @zhouyifan279 made their first contribution in #5519
- @NEUpanning made their first contribution in #5560
- @hahazyb201 made their first contribution in #5742
- @supermem613 made their first contribution in #5439
- @onebox-li made their first contribution in #5776
- @sharmaplkt made their first contribution in #5845
- @deepashreeraghu made their first contribution in #6063
- @caicancai made their first contribution in #6277
Full Changelog: v1.1.1...v1.2.0-rc0