From e27abcd478a7322fc5e450b7407d11c5a9d1166b Mon Sep 17 00:00:00 2001
From: Balazs Meszaros
Date: Fri, 26 Apr 2019 12:56:43 +0200
Subject: [PATCH] HBASE-22220 Release hbase-connectors-1.0.0

* added NOTICE.txt
* generated new changelog and release notes
* added license header for README.mds
---
 CHANGELOG.md | 93 ++++++++---
 LICENSE.txt | 2 +-
 NOTICE.txt | 2 +
 README.md | 18 +++
 RELEASENOTES.md | 146 +++++++++++++++---
 .../main/assembly/hbase-connectors-bin.xml | 12 +-
 kafka/README.md | 18 +++
 pom.xml | 1 -
 spark/README.md | 18 +++
 9 files changed, 264 insertions(+), 46 deletions(-)
 mode change 100755 => 100644 LICENSE.txt
 create mode 100644 NOTICE.txt

diff --git a/CHANGELOG.md b/CHANGELOG.md
index ad605207..8879d49e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,26 +1,24 @@
+
+
 # HBase Changelog
-
-## Release connector-1.0.0 - Unreleased (as of 2019-04-23)
+## Release connector-1.0.0 - Unreleased (as of 2019-04-26)
@@ -28,24 +26,57 @@
 
 | JIRA | Summary | Priority | Component |
 |:---- |:---- | :--- |:---- |
-| [HBASE-15320](https://issues.apache.org/jira/browse/HBASE-15320) | HBase connector for Kafka Connect | Major | Replication |
+| [HBASE-13992](https://issues.apache.org/jira/browse/HBASE-13992) | Integrate SparkOnHBase into HBase | Major | hbase-connectors, spark |
+| [HBASE-14150](https://issues.apache.org/jira/browse/HBASE-14150) | Add BulkLoad functionality to HBase-Spark Module | Major | hbase-connectors, spark |
+| [HBASE-14181](https://issues.apache.org/jira/browse/HBASE-14181) | Add Spark DataFrame DataSource to HBase-Spark Module | Minor | hbase-connectors, spark |
+| [HBASE-14340](https://issues.apache.org/jira/browse/HBASE-14340) | Add second bulk load option to Spark Bulk Load to send puts as the value | Minor | hbase-connectors, spark |
+| [HBASE-14849](https://issues.apache.org/jira/browse/HBASE-14849) | Add option to set block cache to false on SparkSQL executions | Major | hbase-connectors, spark |
+| [HBASE-15572](https://issues.apache.org/jira/browse/HBASE-15572) | Adding optional timestamp semantics to HBase-Spark | Major | hbase-connectors, spark |
+| [HBASE-17933](https://issues.apache.org/jira/browse/HBASE-17933) | [hbase-spark] Support Java api for bulkload | Major | hbase-connectors, spark |
+| [HBASE-15320](https://issues.apache.org/jira/browse/HBASE-15320) | HBase connector for Kafka Connect | Major | hbase-connectors, Replication |
 
 
 ### IMPROVEMENTS:
 
 | JIRA | Summary | Priority | Component |
 |:---- |:---- | :--- |:---- |
+| [HBASE-14515](https://issues.apache.org/jira/browse/HBASE-14515) | Allow spark module unit tests to be skipped with a profile | Minor | build, hbase-connectors, spark |
+| [HBASE-14158](https://issues.apache.org/jira/browse/HBASE-14158) | Add documentation for Initial Release for HBase-Spark Module integration | Major | documentation, hbase-connectors, spark |
+| [HBASE-14159](https://issues.apache.org/jira/browse/HBASE-14159) | Resolve warning introduced by HBase-Spark module | Minor | build, hbase-connectors, spark |
+| [HBASE-15434](https://issues.apache.org/jira/browse/HBASE-15434) | [findbugs] Exclude scala generated source and protobuf generated code in hbase-spark module | Major | hbase-connectors, spark |
+| [HBASE-16638](https://issues.apache.org/jira/browse/HBASE-16638) | Reduce the number of Connection's created in classes of hbase-spark module | Critical | hbase-connectors, spark |
+| [HBASE-16823](https://issues.apache.org/jira/browse/HBASE-16823) | Add examples in HBase Spark module | Major | hbase-connectors, spark |
+| [HBASE-17549](https://issues.apache.org/jira/browse/HBASE-17549) | HBase-Spark Module : Incorrect log at println and unwanted comment code | Major | hbase-connectors, spark |
+| [HBASE-18176](https://issues.apache.org/jira/browse/HBASE-18176) | add enforcer rule to make sure hbase-spark / scala aren't dependencies of unexpected modules | Major | build, hbase-connectors, spark |
 | [HBASE-21491](https://issues.apache.org/jira/browse/HBASE-21491) | [hbase-connectors] Edit on spark connector README | Trivial | hbase-connectors |
 | [HBASE-21841](https://issues.apache.org/jira/browse/HBASE-21841) | Allow inserting null values throw DataSource API | Major | spark |
 | [HBASE-21880](https://issues.apache.org/jira/browse/HBASE-21880) | [hbase-connectors] clean up site target | Minor | hbase-connectors |
 | [HBASE-21842](https://issues.apache.org/jira/browse/HBASE-21842) | Properly use flatten-maven-plugin in hbase-connectors | Major | hbase-connectors |
 | [HBASE-21931](https://issues.apache.org/jira/browse/HBASE-21931) | [hbase-connectors] Bump surefire version | Major | hbase-connectors |
+| [HBASE-14789](https://issues.apache.org/jira/browse/HBASE-14789) | Enhance the current spark-hbase connector | Major | hbase-connectors, spark |
 
 
 ### BUG FIXES:
 
 | JIRA | Summary | Priority | Component |
 |:---- |:---- | :--- |:---- |
+| [HBASE-14377](https://issues.apache.org/jira/browse/HBASE-14377) | JavaHBaseContextSuite not being run | Critical | hbase-connectors, spark |
+| [HBASE-14406](https://issues.apache.org/jira/browse/HBASE-14406) | The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior | Blocker | hbase-connectors, spark |
+| [HBASE-15184](https://issues.apache.org/jira/browse/HBASE-15184) | SparkSQL Scan operation doesn't work on kerberos cluster | Critical | hbase-connectors, spark |
+| [HBASE-16804](https://issues.apache.org/jira/browse/HBASE-16804) | JavaHBaseContext.streamBulkGet is void but should be JavaDStream | Major | hbase-connectors, spark |
+| [HBASE-17547](https://issues.apache.org/jira/browse/HBASE-17547) | HBase-Spark Module : TableCatelog doesn't support multiple columns from Single Column family | Major | hbase-connectors, spark |
+| [HBASE-17574](https://issues.apache.org/jira/browse/HBASE-17574) | Clean up how to run tests under hbase-spark module | Major | hbase-connectors, spark |
+| [HBASE-15597](https://issues.apache.org/jira/browse/HBASE-15597) | Clean up configuration keys used in hbase-spark module | Critical | hbase-connectors, spark |
+| [HBASE-17909](https://issues.apache.org/jira/browse/HBASE-17909) | Redundant exclusion of jruby-complete in pom of hbase-spark | Minor | hbase-connectors, spark |
+| [HBASE-17546](https://issues.apache.org/jira/browse/HBASE-17546) | Incorrect syntax at HBase-Spark Module Examples | Minor | hbase-connectors, spark |
+| [HBASE-19387](https://issues.apache.org/jira/browse/HBASE-19387) | HBase-spark snappy.SnappyError on Arm64 | Minor | hbase-connectors, spark, test |
+| [HBASE-16179](https://issues.apache.org/jira/browse/HBASE-16179) | Fix compilation errors when building hbase-spark against Spark 2.0 | Critical | hbase-connectors, spark |
+| [HBASE-20124](https://issues.apache.org/jira/browse/HBASE-20124) | Make hbase-spark module work with hadoop3 | Major | dependencies, hadoop3, hbase-connectors, spark |
+| [HBASE-20177](https://issues.apache.org/jira/browse/HBASE-20177) | Fix warning: Class org.apache.hadoop.minikdc.MiniKdc not found in hbase-spark | Minor | hbase-connectors |
+| [HBASE-20375](https://issues.apache.org/jira/browse/HBASE-20375) | Remove use of getCurrentUserCredentials in hbase-spark module | Major | hbase-connectors, spark |
+| [HBASE-20880](https://issues.apache.org/jira/browse/HBASE-20880) | Fix for warning It would fail on the following input in hbase-spark | Minor | hbase-connectors |
+| [HBASE-21038](https://issues.apache.org/jira/browse/HBASE-21038) | SAXParseException when hbase.spark.use.hbasecontext=false | Major | hbase-connectors |
+| [HBASE-20175](https://issues.apache.org/jira/browse/HBASE-20175) | hbase-spark needs scala dependency convergance | Major | dependencies, hbase-connectors, spark |
 | [HBASE-21429](https://issues.apache.org/jira/browse/HBASE-21429) | [hbase-connectors] pom refactoring adding kafka dir intermediary | Minor | hbase-connectors, kafka |
 | [HBASE-21431](https://issues.apache.org/jira/browse/HBASE-21431) | [hbase-connectors] Fix build and test issues | Blocker | hbase-connectors |
 | [HBASE-21434](https://issues.apache.org/jira/browse/HBASE-21434) | [hbase-connectors] Cleanup of kafka dependencies; clarify hadoop version | Major | hbase-connectors, kafka |
@@ -53,12 +84,26 @@
 | [HBASE-21448](https://issues.apache.org/jira/browse/HBASE-21448) | [hbase-connectors] Make compile/tests pass on scala 2.10 AND 2.11 | Major | hbase-connectors, spark |
 | [HBASE-21878](https://issues.apache.org/jira/browse/HBASE-21878) | [hbase-connectors] Fix hbase-checkstyle version reference | Critical | hbase-connectors |
 | [HBASE-21923](https://issues.apache.org/jira/browse/HBASE-21923) | [hbase-connectors] Make apache-rat pass | Critical | hbase-connectors |
+| [HBASE-21450](https://issues.apache.org/jira/browse/HBASE-21450) | [documentation] Point spark doc at hbase-connectors spark | Major | documentation, hbase-connectors, spark |
+
+
+### TESTS:
+
+| JIRA | Summary | Priority | Component |
+|:---- |:---- | :--- |:---- |
+| [HBASE-18175](https://issues.apache.org/jira/browse/HBASE-18175) | Add hbase-spark integration test into hbase-spark-it | Critical | hbase-connectors, spark |
+| [HBASE-20176](https://issues.apache.org/jira/browse/HBASE-20176) | Fix warnings about Logging import in hbase-spark test code | Minor | hbase-connectors |
 
 
 ### SUB-TASKS:
 
 | JIRA | Summary | Priority | Component |
 |:---- |:---- | :--- |:---- |
+| [HBASE-15336](https://issues.apache.org/jira/browse/HBASE-15336) | Support Dataframe writer to the spark connector | Major | hbase-connectors, spark |
+| [HBASE-15333](https://issues.apache.org/jira/browse/HBASE-15333) | [hbase-spark] Enhance dataframe filters to handle naively encoded short, integer, long, float and double | Major | hbase-connectors, spark |
+| [HBASE-15473](https://issues.apache.org/jira/browse/HBASE-15473) | Documentation for the usage of hbase dataframe user api (JSON, Avro, etc) | Blocker | documentation, hbase-connectors, spark |
+| [HBASE-19482](https://issues.apache.org/jira/browse/HBASE-19482) | Fix Checkstyle errors in hbase-spark-it | Minor | hbase-connectors |
+| [HBASE-19597](https://issues.apache.org/jira/browse/HBASE-19597) | Fix Checkstyle errors in hbase-spark | Minor | hbase-connectors, spark |
 | [HBASE-21002](https://issues.apache.org/jira/browse/HBASE-21002) | Create assembly and scripts to start Kafka Proxy | Minor | hbase-connectors |
 | [HBASE-21435](https://issues.apache.org/jira/browse/HBASE-21435) | [hbase-connectors] Cleanup of kafka dependencies; clarify hadoop version; addendum | Minor | hbase-connectors, kafka |
 
@@ -67,6 +112,10 @@
 
 | JIRA | Summary | Priority | Component |
 |:---- |:---- | :--- |:---- |
+| [HBASE-14184](https://issues.apache.org/jira/browse/HBASE-14184) | Fix indention and typo in JavaHBaseContext | Minor | hbase-connectors, spark |
+| [HBASE-21022](https://issues.apache.org/jira/browse/HBASE-21022) | Review kafka-connection repo's POMs | Major | hbase-connectors, kafka |
+| [HBASE-20257](https://issues.apache.org/jira/browse/HBASE-20257) | hbase-spark should not depend on com.google.code.findbugs.jsr305 | Minor | build, hbase-connectors, spark |
+| [HBASE-21273](https://issues.apache.org/jira/browse/HBASE-21273) | Move classes out of org.apache.spark namespace | Major | hbase-connectors, spark |
 | [HBASE-21432](https://issues.apache.org/jira/browse/HBASE-21432) | [hbase-connectors] Add Apache Yetus integration for hbase-connectors repository | Major | build, hbase-connectors |
 | [HBASE-22221](https://issues.apache.org/jira/browse/HBASE-22221) | Extend kafka-proxy documentation with required hbase settings | Major | hbase-connectors |
 | [HBASE-22210](https://issues.apache.org/jira/browse/HBASE-22210) | Fix hbase-connectors-assembly to include every jar | Major | hbase-connectors |
diff --git a/LICENSE.txt b/LICENSE.txt
old mode 100755
new mode 100644
index 1db8e3cf..8467187b
--- a/LICENSE.txt
+++ b/LICENSE.txt
@@ -187,7 +187,7 @@
       same "printed page" as the copyright notice for easier
       identification within third-party archives.
 
-   Copyright 2018 Apache HBase
+   Copyright 2019 Apache HBase
 
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
diff --git a/NOTICE.txt b/NOTICE.txt
new file mode 100644
index 00000000..1927c39c
--- /dev/null
+++ b/NOTICE.txt
@@ -0,0 +1,2 @@
+Apache HBase - Connectors
+Copyright 2019 The Apache Software Foundation
diff --git a/README.md b/README.md
index b0159f94..3534f511 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,21 @@
+
+
 # hbase-connectors
 
 Connectors for [Apache HBase™](https://hbase.apache.org)
diff --git a/RELEASENOTES.md b/RELEASENOTES.md
index 86b0a45a..b3bbacf8 100644
--- a/RELEASENOTES.md
+++ b/RELEASENOTES.md
@@ -1,30 +1,116 @@
-# RELEASENOTES
-
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
 # HBase connector-1.0.0 Release Notes
 
 These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.
+---
+
+* [HBASE-13992](https://issues.apache.org/jira/browse/HBASE-13992) | *Major* | **Integrate SparkOnHBase into HBase**
+
+This release includes initial support for running Spark against HBase with a richer feature set than was previously possible with MapReduce bindings:
+
+\* Support for Spark and Spark Streaming against Spark 2.1.1
+\* RDD/DStream formation from scan operations
+\* convenience methods for interacting with HBase from an HBase backed RDD / DStream instance
+\* examples in both the Spark Java API and Spark Scala API
+\* support for running against a secure HBase cluster
+
+
+---
+
+* [HBASE-14849](https://issues.apache.org/jira/browse/HBASE-14849) | *Major* | **Add option to set block cache to false on SparkSQL executions**
+
+For user configurable parameters for HBase datasources. Please refer to org.apache.hadoop.hbase.spark.datasources.HBaseSparkConf for details.
+
+User can either set them in SparkConf, which will take effect globally, or configure it per table, which will overwrite the value set in SparkConf. If not set, the default value will take effect.
+
+Currently three parameters are supported.
+1. spark.hbase.blockcache.enable for blockcache enable/disable. Default is enable, but note that this potentially may slow down the system.
+2. spark.hbase.cacheSize for cache size when performing HBase table scan. Default value is 1000
+3. spark.hbase.batchNum for the batch number when performing HBase table scan. Default value is 1000.
+
+
+---
+
+* [HBASE-15184](https://issues.apache.org/jira/browse/HBASE-15184) | *Critical* | **SparkSQL Scan operation doesn't work on kerberos cluster**
+
+Before this patch, users of the spark HBaseContext would fail due to lack of privilege exceptions.
+
+Note:
+\* It is preferred to have spark in spark-on-yarn mode if Kerberos is used.
+\* This is orthogonal to issues with a kerberized spark cluster via InputFormats.
+
+
+---
+
+* [HBASE-15572](https://issues.apache.org/jira/browse/HBASE-15572) | *Major* | **Adding optional timestamp semantics to HBase-Spark**
+
+Right now the timestamp is always latest. With this patch, users can select timestamps they want.
+In this patch, 4 parameters, "timestamp", "minTimestamp", "maxiTimestamp" and "maxVersions" are added to HBaseSparkConf. Users can select a timestamp, they can also select a time range with minimum timestamp and maximum timestamp.
+
+
+---
+
+* [HBASE-17574](https://issues.apache.org/jira/browse/HBASE-17574) | *Major* | **Clean up how to run tests under hbase-spark module**
+
+Run test under root dir or hbase-spark dir
+1. mvn test //run all small and medium java tests, and all scala tests
+2. mvn test -P skipSparkTests //skip all scala and java tests in hbase-spark
+3. mvn test -P runAllTests //run all tests, including scala and all java test even the large test
+
+Run specified test case, since we have two plugins, we need specify both java and scala.
+When only test scala or jave test case, disable the other one use -Dxx=None as follow:
+1. mvn test -Dtest=TestJavaHBaseContext -DwildcardSuites=None // java unit test
+2. mvn test -Dtest=None -DwildcardSuites=org.apache.hadoop.hbase.spark.BulkLoadSuite //scala unit test, only support full name in scalatest plugin
+
+
+---
+
+* [HBASE-17933](https://issues.apache.org/jira/browse/HBASE-17933) | *Major* | **[hbase-spark] Support Java api for bulkload**
+
+
+The integration module for Apache Spark now includes Java-friendly equivalents for the `bulkLoad` and `bulkLoadThinRows` methods in `JavaHBaseContext`.
+
+
+---
+
+* [HBASE-18175](https://issues.apache.org/jira/browse/HBASE-18175) | *Critical* | **Add hbase-spark integration test into hbase-spark-it**
+
+
+HBase now ships with an integration test for our integration with Apache Spark.
+
+You can run this test on a cluster by using an equivalent to the below, e.g. if the version of HBase is 2.0.0-alpha-2
+
+```
+spark-submit --class org.apache.hadoop.hbase.spark.IntegrationTestSparkBulkLoad HBASE_HOME/lib/hbase-spark-it-2.0.0-alpha-2-tests.jar -Dhbase.spark.bulkload.chainlength=500000 -m slowDeterministic
+```
+
+
+---
+
+* [HBASE-16179](https://issues.apache.org/jira/browse/HBASE-16179) | *Critical* | **Fix compilation errors when building hbase-spark against Spark 2.0**
+
+As of this JIRA, Spark version is upgraded from 1.6 to 2.1.1
+
+
 ---
 
 * [HBASE-21002](https://issues.apache.org/jira/browse/HBASE-21002) | *Minor* | **Create assembly and scripts to start Kafka Proxy**
@@ -46,4 +132,22 @@ Cleaned up kafka submodule dependencies. Added used dependencies to pom and remo
 
 Updates our hbase-spark integration so defaults spark 2.4.0 (October 2018) from 2.1.1 and Scala 2.11.12 (from 2.11.8).
 
 
+---
+
+* [HBASE-15320](https://issues.apache.org/jira/browse/HBASE-15320) | *Major* | **HBase connector for Kafka Connect**
+
+This commit adds a kafka connector. The connectors acts as a replication peer and sends modifications in HBase to kafka.
+
+For further information, please refer to kafka/README.md.
+
+
+---
+
+* [HBASE-14789](https://issues.apache.org/jira/browse/HBASE-14789) | *Major* | **Enhance the current spark-hbase connector**
+
+New features in hbase-spark:
+\* native type support (short, int, long, float, double),
+\* support for Dataframe writes,
+\* avro support,
+\* catalog can be defined in json.
diff --git a/hbase-connectors-assembly/src/main/assembly/hbase-connectors-bin.xml b/hbase-connectors-assembly/src/main/assembly/hbase-connectors-bin.xml
index 080608c2..284939df 100755
--- a/hbase-connectors-assembly/src/main/assembly/hbase-connectors-bin.xml
+++ b/hbase-connectors-assembly/src/main/assembly/hbase-connectors-bin.xml
@@ -49,5 +49,15 @@
-
+
+
+
+ ../LICENSE.txt
+ /
+
+
+ ../NOTICE.txt
+ /
+
+
diff --git a/kafka/README.md b/kafka/README.md
index 827e86f5..02770529 100755
--- a/kafka/README.md
+++ b/kafka/README.md
@@ -1,3 +1,21 @@
+
+
 # Apache HBase™ Kafka Proxy
 
 Welcome to the HBase kafka proxy. The purpose of this proxy is to act as a _fake peer_.
diff --git a/pom.xml b/pom.xml
index 5393b527..7ee9a610 100755
--- a/pom.xml
+++ b/pom.xml
@@ -348,7 +348,6 @@
  .git/**
  **/.settings/**
  **/patchprocess/**
- README.md
  **/.flattened-pom.xml
  **/src/main/resources/META-INF/LEGAL
diff --git a/spark/README.md b/spark/README.md
index eccf663b..f0199e2a 100755
--- a/spark/README.md
+++ b/spark/README.md
@@ -1,3 +1,21 @@
+
+
 # Apache HBase™ Spark Connector
 
 ## Scala and Spark Versions