CHANGES.txt
1.2.0 alpha 1
* Added support for TTL and timestamp in the writer (#153)
* Added support for UDT column types (SPARKC-1)
* Upgraded Spark to version 1.2.0 (SPARKC-15)
* For the 1.2.0 release, table names containing a dot are not supported in Spark SQL;
  this will be fixed in the next release
* Added fast spanBy and spanByKey methods to RDDs for grouping Cassandra
  data by partition key / clustering columns, e.g. for time-series data. (SPARKC-2)
* Refactored the write path so that the writes are now token-aware (SPARKC-5, previously #442)
* Added support for INSET predicate pushdown (patch by granturing)
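The TTL/timestamp write support and the new spanBy methods can be sketched as
follows; keyspace, table, and column names are made up, and the exact WriteConf
fields may differ between connector versions:

```scala
import com.datastax.spark.connector._
import com.datastax.spark.connector.writer.{TTLOption, WriteConf}

// Write rows with a constant TTL in seconds; assumes an existing SparkContext
// `sc` and a table test.kv(k text PRIMARY KEY, v int).
sc.parallelize(Seq(("a", 1), ("b", 2)))
  .saveToCassandra("test", "kv", SomeColumns("k", "v"),
    writeConf = WriteConf(ttl = TTLOption.constant(3600)))

// Group rows sharing a partition key without a shuffle, e.g. time-series data.
val bySensor = sc.cassandraTable("test", "events")
  .spanBy(row => row.getString("sensor_id"))
```

Because rows arrive ordered by clustering columns within a Cassandra partition,
spanBy can group them in a single pass, avoiding the shuffle a groupBy would need.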
********************************************************************************
1.1.1
* Fixed NoSuchElementException in SparkSQL predicate pushdown code (SPARKC-7, #454)
1.1.0
* Switch to Java driver 2.1.3 and Guava 14.0.1 (yay!).
1.1.0 rc 3
* Fix NPE when saving CassandraRows containing null values (#446)
1.1.0 rc 2
* Added JavaTypeConverter to make it easy to implement a custom TypeConverter in Java (#429)
* Fix SparkSQL failures caused by presence of non-selected columns of UDT type in the table.
1.1.0 rc 1
* Fixed problem with setting a batch size in bytes (#435)
* Fixed handling of null column values in Java API (#429)
1.1.0 beta 2
* Fixed bug in Java API which might cause ClassNotFoundException
* Added stubs for UDTs. It is possible to read tables with UDTs, but
values of UDTs will come out as java driver UDTValue objects (#374)
* Upgraded Java driver to version 2.1.2 and fixed deprecation warnings.
Use correct protocolVersion when serializing/deserializing Cassandra columns.
* Don't fail with "contact points contain multiple datacenters"
if one or more of the nodes given as contact points don't have DC information,
because they are unreachable.
* Removed annoying slf4j warnings when running tests (#395)
* CassandraRDD is fully lazy now - initialization no longer fetches Cassandra
schema (#339).
1.1.0 beta 1
* Redesigned Java API, some refactorings (#300)
* Simplified AuthConf - more responsibility on CassandraConnectionFactory
* Improved the performance of the embedded Kafka framework
- Kafka consumer and producer added that are configurable
- Kafka shutdown cleaned up
- Kafka server more configurable for speed and use cases
* Added new word count demo and a new Kafka streaming word count demo
* Modified build file to allow easier module id for usages of 'sbt project'
1.1.0 alpha 4
* Use asynchronous prefetching of multi-page ResultSets in CassandraRDD
to reduce waiting for Cassandra query results.
* Make token range start and end be parameters of the query, not part of the query
template to reduce the number of statements requiring preparation.
* Added type converter for GregorianCalendar (#334)
1.1.0 alpha 3
* Pluggable mechanism for obtaining connections to Cassandra;
  ability to pass a custom CassandraConnector to CassandraRDDs (#192)
* Provided a row reader which allows creating RDDs of pairs of objects as well
  as RDDs of simple objects handled directly by a type converter;
  added meaningful compiler messages when an invalid type is provided (#88)
* Fixed serialization problem in CassandraSQLContext by making conf transient (#310)
* Cleaned up the SBT assembly task and added build documentation (#315)
1.1.0 alpha 2
* Upgraded Apache Spark to 1.1.0.
* Made the connector compatible with Cassandra 2.1.0 and Cassandra 2.0.
* Added spark.cassandra.connection.local_dc option
* Added spark.cassandra.connection.timeout_ms option
* Added spark.cassandra.read.timeout_ms option
* Added support for SparkSQL (#197)
* Fixed problems with saving DStreams to Cassandra directly (#280)
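A minimal sketch of the new connection options, set through SparkConf
(the host and timeout values are placeholders):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.cassandra.connection.host", "127.0.0.1")
  .set("spark.cassandra.connection.local_dc", "DC1")      // prefer nodes in this DC
  .set("spark.cassandra.connection.timeout_ms", "5000")   // connection timeout
  .set("spark.cassandra.read.timeout_ms", "120000")       // per-read timeout
```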
1.1.0 alpha 1
* Add an ./sbt/sbt script (like with Spark) so people don't need to install sbt
* Replace internal spark Logging with own class (#245)
* Accept partition key predicates in CassandraRDD#where. (#37)
* Add indexedColumn to ColumnDef (#122)
* Upgrade Spark to version 1.0.2
* Removed deprecated toArray, replaced with collect.
* Updated imports to org.apache.spark.streaming.receiver
and import org.apache.spark.streaming.receiver.ActorHelper
* Updated streaming demo and spec for Spark 1.0.2 behavior compatibility
* Added new StreamingEvent types for Spark 1.0.2 Receiver readiness
* Added the following Spark Streaming dependencies to the demos module:
Kafka, Twitter, ZeroMQ
* Added embedded Kafka and ZooKeeper servers for the Kafka Streaming demo
- keeping non private for user prototyping
* Added new Kafka Spark Streaming demo which reads from Kafka
and writes to Cassandra (Twitter and ZeroMQ are next)
* Added new 'embedded' module
- Refactored the 'connector' module's IT SparkRepl, CassandraServer and
CassandraServerRunner as well as 'demos' EmbeddedKafka
and EmbeddedZookeeper to the 'embedded' module. This allows the 'embedded'
module to be used as a dependency by the 'connector' IT tests, demos,
and user local quick prototyping without requiring a Spark and Cassandra
Cluster, local or remote, to get started.
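The CassandraRDD#where change can be illustrated with a short sketch;
the table and column names here are hypothetical:

```scala
import com.datastax.spark.connector._

// Predicates on partition key columns are now accepted and pushed down
// to Cassandra (#37), alongside the clustering-column predicates that
// were already supported.
val rdd = sc.cassandraTable("test", "events")
  .where("sensor_id = ?", "sensor-1")
```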
********************************************************************************
1.0.6
* Upgraded Java Driver to 2.0.8 and added some logging in LocalNodeFirstLoadBalancingPolicy (SPARKC-18)
1.0.5
* Fixed setting output consistency level which was being set on prepared
statements instead of being set on batches (#463)
1.0.4
* Synchronized TypeConverter.forType methods to work around some Scala 2.10
  reflection thread-safety problems (#235)
* Synchronized computation of TypeTags in TypeConverter#targetTypeTag,
  ColumnType#scalaTypeTag methods and other places to work around some of the
  Scala 2.10 reflection thread-safety problems (#364)
* Downgraded Guava to version 14.
Upgraded Java driver to 2.0.7.
Upgraded Cassandra to 2.0.11. (#366)
* Made SparkContext variable transient in SparkContextFunctions (#373)
* Fixed saving to tables with uppercase column names (#377)
* Fixed saving collections of Tuple1 (#420)
1.0.3
* Fixed handling of Cassandra rpc_address set to 0.0.0.0 (#332)
1.0.2
* Fixed batch counter columns updates (#234, #316)
* Expose both rpc addresses and local addresses of Cassandra nodes in partition
  preferred locations (#325)
* Cleaned up the SBT assembly task and added build documentation
(backport of #315)
1.0.1
* Add logging of error message when asynchronous task fails in AsyncExecutor.
(#265)
* Fix connection problems with fetching token ranges from hosts with
  rpc_address different from listen_address.
Log host address(es) and ports on connection failures.
Close thrift transport if connection fails for some reason after opening the transport,
e.g. authentication failure.
* Upgrade cassandra driver to 2.0.6.
1.0.0
* Fix memory leak in PreparedStatementCache leaking PreparedStatements after
closing Cluster objects. (#183)
* Allow multiple comma-separated hosts in spark.cassandra.connection.host
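With multiple comma-separated contact points, the connector can fall back to
another node from the list if one is unreachable; a config sketch (the
addresses are placeholders):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.cassandra.connection.host", "10.0.0.1,10.0.0.2,10.0.0.3")
```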
1.0.0 RC 6
* Fix reading a Cassandra table as an RDD of Scala class objects in REPL
1.0.0 RC 5
* Added assembly task to the build, in order to build fat jars. (#126)
- Added a system property flag to enable assembly for the demo module
which is disabled by default.
- Added simple submit script to submit a demo assembly jar to a local
spark master
* Fix error message on column conversion failure. (#208)
* Add toMap and nameOf methods to CassandraRow.
Reduce size of serialized CassandraRow. (#194)
* Fixed a bug which caused problems with connecting to Cassandra under
heavy load (#185)
* Skip $_outer constructor param in ReflectionColumnMapper, fixes working with
case classes in Spark shell, added appropriate test cases (#188)
* Added streaming demo with documentation, new streaming page to docs,
new README for running all demos. (#115)
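A sketch of the CassandraRow additions from this release; the table name is
made up and the index-based nameOf signature shown is an assumption:

```scala
import com.datastax.spark.connector._

val row = sc.cassandraTable("test", "users").first
val asMap: Map[String, Any] = row.toMap  // all columns as a name -> value map
val col0: String = row.nameOf(0)         // column name at index 0 (assumed signature)
```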
1.0.0 RC 4
* Upgrade Java driver for Cassandra to 2.0.4. (#171)
* Added missing CassandraRDD#getPreferredLocations to improve data-locality. (#164)
* Don't use hosts outside the datacenter of the connection host. (#137)
1.0.0 RC 3
* Fix open Cluster leak in CassandraConnector#createSession (#142)
* TableWriter#saveToCassandra accepts ColumnSelector instead of Seq[String] for
passing a column list. Seq[String] still accepted for backwards compatibility,
but deprecated.
* Added Travis CI build yaml file.
* Added demos module. (#84)
* Extracted Java API into a separate module (#99)
1.0.0 RC 2
* Language specific highlighting in the documentation (#105)
* Fixed a bug which caused problems when a column of VarChar type was used
  in a where clause. (04fd8d9)
* Fixed an AnyObjectFactory bug which caused problems with instantiation of
classes which were defined inside Scala objects. (#82)
* Added support for Spark Streaming. (#89)
- Added implicit wrappers which simplify access to Cassandra related
functionality from StreamingContext and DStream.
- Added a stub for further Spark Streaming integration tests.
* Upgraded Java API. (#98)
- Refactored existing Java API
- Added CassandraJavaRDD as a Java counterpart of CassandraRDD
- Added Java helpers for accessing Spark Streaming related methods
- Added several integration tests
- Added a documentation page for the Java API
- Extended Java API demo
- Added a lot of API docs
1.0.0 RC 1
* Ability to register custom TypeConverters. (#32)
* Handle null values in StringConverter. (#79)
* Improved error message when there are no replicas in the local DC. (#69)
1.0.0 beta 2
* DSE compatibility improvements. (#64)
- Column types and type converters use TypeTags instead of Strings to
announce their types.
- CassandraRDD#tableDef is public now.
- Added methods for getting keyspaces and tables by name from the Schema.
- Refactored Schema class - loading schema from Cassandra moved
from the constructor to a factory method.
- Remove unused methods for returning system keyspaces from Schema.
* Improved JavaDoc explaining CassandraConnector withClusterDo
and withSessionDo semantics.
* Support for updating counter columns. (#27)
* Configure consistency level for reads/writes. Set default consistency
levels to LOCAL_ONE for reads and writes. (#42)
* Values passed as arguments to `where` are converted to proper types
expected by the java-driver. (#26)
* Include more information in the exception message when query in
CassandraRDD fails. (#69)
* Fallback to describe_ring in case describe_local_ring does not exist to
improve compatibility with earlier Cassandra versions. (#47)
* Session object sharing in CassandraConnector. (#41 and #53)
* Modify cassandra.* configuration settings to prefix with "spark." so they
can be used from spark-shell and set via conf/spark-default.conf (#51)
* Fixed race condition in AsyncExecutor causing inaccuracy of success/failure
counters. (#40)
* Added Java API. Fixed a bug in ClassBasedRowReader which caused
problems when data were read into Java beans. Added type converters
for boxed Java primitive types. (#11)
* Extracted out initial testkit for unit and integration tests, and future
testkit module.
* Added new WritableToCassandra trait which both RDDFunction and
  DStreamFunction implement. Documentation moved to WritableToCassandra.
* Fixed broken links in API documentation.
* Refactored RDDFunctions and DStreamFunctions - merged saveToCassandra
overloaded methods into a single method with defaults.
1.0.0 beta 1
* CassandraRDD#createStatement doesn't obtain a new session, but reuses
the task's Session.
* Integration tests. (#12)
* Added contains and indexOf methods to CassandraRow. Missing value from
CassandraRow does not break writing - null is written instead.
* Caching of PreparedStatements. Subsequent preparations of the same
PreparedStatement are returned from the cache and don't cause
a warning. (#3)
* Move partitioner ForkJoinPool to companion object to share it between RDDs.
(#24)
* Fixed thread-safety of ClassBasedRowReader.
* Detailed user guide with code examples, reviewed by Kris Hahn. (#15)
* Support for saving RDD[CassandraRow]. New demo program copying data from one
table to another. (#16)
* Using a PreparedStatement makes the createStatement method compatible with
  Cassandra 1.2.x. (#17)
* More and better logging. Using org.apache.spark.Logging instead of log4j.
(#13)
* Better error message when attempting to write to a table that doesn't exist.
(#1)
* Added more robust scala build to allow for future clean releases, and
publish settings for later integration. (#8)
* Refactored classes and objects used for authentication to support pluggable
authentication.
* Record cause of TypeConversionException.
* Improved error messages informing about failure to convert column value.
Fixed missing conversion for setters.
* Split CassandraWriter into RowWriter and TableWriter.
* Refactored package structure. Moved classes from rdd to rdd.reader
and rdd.partitioner packages. Renamed RowTransformers to RowReaders.
* Fix writing ByteBuffers to Cassandra.
* Throw meaningful exception when non-existing column is requested by name.
* Add isNull method on CassandraRow.
* Fix converting blobs to arrays of bytes in CassandraRow. Fix printing blobs
and collections.