Releases: yurkao/spark-dns
Spark-DNS 1.0.3
Spark-DNS 1.0.2
- Added integration tests
- Added Spark DNS batch write support (Dataset API and Spark SQL) to publish DNS updates to a DNS server
Features added
Support for ignoring XFR failures
Spark-DNS
Introduction
Spark data source for retrieving DNS A type records from a DNS server.
The Spark DNS data source uses zone transfers to retrieve data from the DNS server.
It requests IXFR for every zone transfer, though some DNS server implementations may return an AXFR response.
The Spark DNS data source may operate on multiple DNS zones in a single DataFrame.
Due to the nature of DNS zone transfers, data retrieval for a single zone cannot be done in parallel,
though data from multiple zones is retrieved in parallel (each DNS zone is handled in a separate Spark partition of the RDD).
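The per-zone parallelism described above can be sketched without Spark or a DNS server. This is only an illustration of the idea, not the data source's implementation: `FAKE_ZONES`, `fetch_zone` and `fetch_all` are hypothetical names, and a thread pool stands in for Spark partitions.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy data standing in for what a DNS server would return per zone (assumption).
FAKE_ZONES = {
    "example.com.": [("a.example.com.", "192.0.2.1"), ("b.example.com.", "192.0.2.2")],
    "example.org.": [("www.example.org.", "198.51.100.7")],
}


def fetch_zone(zone):
    """Hypothetical stand-in for one zone transfer: sequential within a zone."""
    rows = []
    for name, address in FAKE_ZONES[zone]:  # records are consumed one after another
        rows.append((zone, name, address))
    return rows


def fetch_all(zones):
    """Fetch every zone concurrently, one task per zone."""
    if not zones:
        return []
    with ThreadPoolExecutor(max_workers=len(zones)) as pool:
        per_zone = list(pool.map(fetch_zone, zones))  # results keep input order
    # Flatten the per-zone results into a single list of rows.
    return [row for rows in per_zone for row in rows]
```

Calling `fetch_all(["example.com.", "example.org."])` yields the rows of both zones in one list, much like a single DataFrame built from one partition per zone.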
Rationale
- Learning Spark internals
- Integrating Spark with 3rd party data sources
- Just for fun
Features and limitations
Limitations
- Providing multiple DNS servers in options for the same dataset/table is currently not supported
- Continuous Structured Streaming is not supported yet
- On Spark 2.4 (incl. CDH 6.3.x), only batch reading is supported.
Currently implemented features
- Spark batch read
- Retrieving DNS A records from multiple DNS zones (though from a single DNS server)
- The new DNS SOA serial of each DNS zone is available in an Accumulator via the Spark UI (refer to the relevant stage)
- Spark Structured Streaming read support (only the Once and ProcessingTime triggers are supported)
- Zone transfer timeout
- Specifying explicit zone transfer type (AXFR/IXFR) to use when retrieving data from DNS server.
  - When using xfr=ixfr, only DNS zone updates since the initial serial will be returned.
    - On Structured Streaming, this may produce empty DataFrames when there are no updates
  - When using xfr=axfr, all A records of the DNS zone will be returned
- Handling temporary failures during zone transfer (similar to failOnDataLoss in Spark+Kafka)
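The transfer-type and failure-handling items above can be modelled with a small toy, shown here as a sketch under stated assumptions rather than the data source's actual behaviour. `HISTORY`, `transfer`, `tolerant` and `TransferError` are hypothetical names. Serials are treated as plain integers (no wrap-around), and the IXFR branch reports only added records, which is a simplification.

```python
class TransferError(Exception):
    """Hypothetical stand-in for a temporary zone transfer failure."""


# Toy zone history (assumption): SOA serial -> set of (name, IPv4 address) A records.
HISTORY = {
    100: {("a.example.com.", "192.0.2.1")},
    101: {("a.example.com.", "192.0.2.1"), ("b.example.com.", "192.0.2.2")},
}


def transfer(history, xfr, initial_serial=None):
    """Return (rows, new_serial) for one simulated zone transfer."""
    new_serial = max(history)  # plain integers; serial wrap-around is ignored
    current = history[new_serial]
    if xfr == "axfr":
        rows = current  # the whole zone
    elif xfr == "ixfr":
        # An unknown initial serial degrades to the full zone in this toy model.
        known = history.get(initial_serial, set())
        rows = current - known  # additions only, a simplification
    else:
        raise ValueError(f"unknown transfer type: {xfr}")
    return sorted(rows), new_serial


def tolerant(fetch, ignore_failures):
    """Run fetch(); on TransferError either re-raise or return no rows."""
    try:
        return fetch()
    except TransferError:
        if not ignore_failures:
            raise
        return []
```

In this model, `transfer(HISTORY, "ixfr", 101)` returns an empty row list because the zone has not changed since serial 101, which mirrors the empty DataFrames mentioned for Structured Streaming. The `tolerant` wrapper shows the two possible reactions to a temporary failure: propagate it, or skip the zone for this read.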