From 93041c837f508d7f50c1d268a4e0e7ba689e633e Mon Sep 17 00:00:00 2001 From: XuzhouQin <17144939+qxzzxq@users.noreply.github.com> Date: Thu, 20 Aug 2020 17:07:12 +0200 Subject: [PATCH 1/5] Update Connector.md --- docs/Connector.md | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/docs/Connector.md b/docs/Connector.md index 81d860fe..223758fc 100644 --- a/docs/Connector.md +++ b/docs/Connector.md @@ -98,6 +98,33 @@ To use `org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider`: | fs.s3a.secret.key | your_s3a_secret_key | | fs.s3a.session.token | your_s3a_session_token | +| key | value | +| ------ | ------ | +| fs.s3a.aws.credentials.provider | org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider | +| fs.s3a.access.key | your_s3a_access_key | +| fs.s3a.secret.key | your_s3a_secret_key | +| fs.s3a.session.token | your_s3a_session_token | +| fs.s3a.aws.credentials.provider | org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider | +| fs.s3a.access.key | your_s3a_access_key | +| fs.s3a.secret.key | your_s3a_secret_key | +| fs.s3a.session.token | your_s3a_session_token | +| fs.s3a.aws.credentials.provider | org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider | +| fs.s3a.access.key | your_s3a_access_key | +| fs.s3a.secret.key | your_s3a_secret_key | +| fs.s3a.session.token | your_s3a_session_token | +| fs.s3a.aws.credentials.provider | org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider | +| fs.s3a.access.key | your_s3a_access_key | +| fs.s3a.secret.key | your_s3a_secret_key | +| fs.s3a.session.token | your_s3a_session_token | +| fs.s3a.aws.credentials.provider | org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider | +| fs.s3a.access.key | your_s3a_access_key | +| fs.s3a.secret.key | your_s3a_secret_key | +| fs.s3a.session.token | your_s3a_session_token | +| fs.s3a.aws.credentials.provider | org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider | +| fs.s3a.access.key | your_s3a_access_key | +| fs.s3a.secret.key | your_s3a_secret_key | +| fs.s3a.session.token | your_s3a_session_token | + To use `com.amazonaws.auth.InstanceProfileCredentialsProvider`: | key | value | From 8a7ddfb89e55bf4900997373db218af354834c48 Mon Sep 17 00:00:00 2001 From: XuzhouQin <17144939+qxzzxq@users.noreply.github.com> Date: Thu, 20 Aug 2020 17:10:05 +0200 Subject: [PATCH 2/5] Update Connector.md --- docs/Connector.md | 53 +++++++++++++++++++++-------------------------- 1 file changed, 24 insertions(+), 29 deletions(-) diff --git a/docs/Connector.md b/docs/Connector.md index 223758fc..715fe163 100644 --- a/docs/Connector.md +++ b/docs/Connector.md @@ -24,9 +24,11 @@ trait Connector extends Logging { The **Connector** trait was inherited by two abstract classes: **FileConnector** and **DBConnector** ## Implementation + 
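Because every concrete connector eventually extends `Connector`, they all expose the same basic contract and can be used interchangeably behind that trait. As a minimal sketch (assuming only the `read(): DataFrame` and `write(df: DataFrame): Unit` members of the `Connector` trait; the helper itself is not part of SETL), copying data between two storages could look like this:

```scala
import com.jcdecaux.setl.storage.connector.Connector
import org.apache.spark.sql.DataFrame

// Hypothetical helper: it only relies on the read/write contract of the
// Connector trait, so any concrete connector (CSVConnector, ParquetConnector,
// CassandraConnector, ...) can be plugged in as source or sink.
def copyData(source: Connector, sink: Connector): Unit = {
  val data: DataFrame = source.read()
  sink.write(data)
}
```

The class hierarchy is summarized in the diagram below.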
[![](https://mermaid.ink/img/eyJjb2RlIjoiICBncmFwaCBURDtcblxuICBDb25uZWN0b3IgLS0-IEZpbGVDb25uZWN0b3I7XG4gIENvbm5lY3RvciAtLT4gREJDb25uZWN0b3I7XG5cbiAgRmlsZUNvbm5lY3RvciAtLT4gQ1NWQ29ubmVjdG9yO1xuICBGaWxlQ29ubmVjdG9yIC0tPiBKU09OQ29ubmVjdG9yO1xuICBDb25uZWN0b3IgLS0-IEV4Y2VsQ29ubmVjdG9yO1xuICBGaWxlQ29ubmVjdG9yIC0tPiBQYXJxdWV0Q29ubmVjdG9yO1xuXG4gIERCQ29ubmVjdG9yIC0tPiBDYXNzYW5kcmFDb25uZWN0b3I7XG4gIERCQ29ubmVjdG9yIC0tPiBEeW5hbW9EQkNvbm5lY3RvcjsiLCJtZXJtYWlkIjp7InRoZW1lIjoiZGVmYXVsdCJ9fQ)](https://mermaid-js.github.io/mermaid-live-editor/#/edit/eyJjb2RlIjoiICBncmFwaCBURDtcblxuICBDb25uZWN0b3IgLS0-IEZpbGVDb25uZWN0b3I7XG4gIENvbm5lY3RvciAtLT4gREJDb25uZWN0b3I7XG5cbiAgRmlsZUNvbm5lY3RvciAtLT4gQ1NWQ29ubmVjdG9yO1xuICBGaWxlQ29ubmVjdG9yIC0tPiBKU09OQ29ubmVjdG9yO1xuICBDb25uZWN0b3IgLS0-IEV4Y2VsQ29ubmVjdG9yO1xuICBGaWxlQ29ubmVjdG9yIC0tPiBQYXJxdWV0Q29ubmVjdG9yO1xuXG4gIERCQ29ubmVjdG9yIC0tPiBDYXNzYW5kcmFDb25uZWN0b3I7XG4gIERCQ29ubmVjdG9yIC0tPiBEeW5hbW9EQkNvbm5lY3RvcjsiLCJtZXJtYWlkIjp7InRoZW1lIjoiZGVmYXVsdCJ9fQ) ## FileConnector + [**FileConnector**](https://github.com/SETL-Developers/setl/tree/master/src/main/scala/com/jcdecaux/setl/storage/connector/FileConnector.scala) could be used to access files stored in the different file systems ### Functionalities @@ -38,38 +40,47 @@ val fileConnector = new FileConnector(spark, options) where `spark` is the current **SparkSession** and `options` is a `Map[String, String]` object. #### Read + Read data from persistence storage. Need to be implemented in a concrete **FileConnector**. #### Write + Write data to persistence storage. Need to be implemented in a concrete **FileConnector**. #### Delete + Delete a file if the value of `path` defined in **options** is a file path. If `path` is a directory, then delete the directory with all its contents. Use it with care! #### Schema + The schema of data could be set by adding a key `schema` into the **options** map of the constructor. The schema must be a DDL format string: > partition1 INT, partition2 STRING, clustering1 STRING, value LONG #### Partition + Data could be partitioned before saving. To do this, call `partitionBy(columns: String*)` before `write(df)` and *Spark* will partition the *DataFrame* by creating subdirectories in the root directory. #### Suffix + A suffix is similar to a partition, but it is defined manually while calling `write(df, suffix)`. **Connector** handles the suffix by creating a subdirectory with the same naming convention as Spark partition (by default it will be `_user_defined_suffix=suffix`. >:warning: Currently (v0.3), you **can't** mix with-suffix write and non-suffix write when your data are partitioned. An **IllegalArgumentException** will be thrown in this case. The reason for which it's not supported is that, as suffix is handled by *Connector* and partition is handled by *Spark*, a suffix may confuse Spark when the latter tries to infer the structure of DataFrame. #### Multiple files reading and name pattern matching + You can read multiple files at once if the `path` you set in **options** is a directory (instead of a file path). You can also filter files by setting a regex pattern `filenamePattern` in **options**. #### File system support + - Local file system - AWS S3 - Hadoop File System #### S3 Authentication + To access S3, if *authentication error* occurs, you may have to provide extra settings in **options** for its authentication process. There are multiple authentication methods that could be set by changing **Authentication Providers**. 
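These authentication settings go into the same `options` map that is used to build the connector, alongside the usual file options. The sketch below is only an illustration: the bucket name and credentials are placeholders, and the `new FileConnector(spark, options)` call mirrors the constructor example shown earlier in this section. The exact keys expected by each provider are listed just after.

```scala
import com.jcdecaux.setl.storage.connector.FileConnector
import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder().getOrCreate()

// Placeholder values: replace the bucket, keys and token with your own.
val s3Options: Map[String, String] = Map(
  "path" -> "s3a://my-bucket/input",
  "filenamePattern" -> "(.+)\\.csv",
  "fs.s3a.aws.credentials.provider" ->
    "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider",
  "fs.s3a.access.key" -> "your_s3a_access_key",
  "fs.s3a.secret.key" -> "your_s3a_secret_key",
  "fs.s3a.session.token" -> "your_s3a_session_token"
)

val s3Connector = new FileConnector(spark, s3Options)
```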
To configure authentication, you can: @@ -98,33 +109,6 @@ To use `org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider`: | fs.s3a.secret.key | your_s3a_secret_key | | fs.s3a.session.token | your_s3a_session_token | -| key | value | -| ------ | ------ | -| fs.s3a.aws.credentials.provider | org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider | -| fs.s3a.access.key | your_s3a_access_key | -| fs.s3a.secret.key | your_s3a_secret_key | -| fs.s3a.session.token | your_s3a_session_token | -| fs.s3a.aws.credentials.provider | org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider | -| fs.s3a.access.key | your_s3a_access_key | -| fs.s3a.secret.key | your_s3a_secret_key | -| fs.s3a.session.token | your_s3a_session_token | -| fs.s3a.aws.credentials.provider | org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider | -| fs.s3a.access.key | your_s3a_access_key | -| fs.s3a.secret.key | your_s3a_secret_key | -| fs.s3a.session.token | your_s3a_session_token | -| fs.s3a.aws.credentials.provider | org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider | -| fs.s3a.access.key | your_s3a_access_key | -| fs.s3a.secret.key | your_s3a_secret_key | -| fs.s3a.session.token | your_s3a_session_token | -| fs.s3a.aws.credentials.provider | org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider | -| fs.s3a.access.key | your_s3a_access_key | -| fs.s3a.secret.key | your_s3a_secret_key | -| fs.s3a.session.token | your_s3a_session_token | -| fs.s3a.aws.credentials.provider | org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider | -| fs.s3a.access.key | your_s3a_access_key | -| fs.s3a.secret.key | your_s3a_secret_key | -| fs.s3a.session.token | your_s3a_session_token | - To use `com.amazonaws.auth.InstanceProfileCredentialsProvider`: | key | value | @@ -134,14 +118,17 @@ To use `com.amazonaws.auth.InstanceProfileCredentialsProvider`: More information could be found [here](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#S3A_Authentication_methods) ## DBConnector + [DBConnector](https://github.com/SETL-Developers/setl/tree/master/src/main/scala/com/jcdecaux/setl/storage/connector/DBConnector.scala) could be used for accessing databases. ### Functionalities #### Read + Read data from a database. Need to be implemented in a concrete **DBConnector**. #### Create + Create a table in a database. Need to be implemented in a concrete **DBConnector**. #### Write @@ -153,6 +140,7 @@ Send a delete request. ## CSVConnector ### Options + | name | default | | ------ | ------- | | path | | @@ -174,7 +162,9 @@ Send a delete request. For other options, please refer to [this doc](https://docs.databricks.com/spark/latest/data-sources/read-csv.html). 
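As an illustrative sketch of how these options fit together (the values are examples only, and the `(spark, options)` constructor mirrors the FileConnector constructor example earlier in this document, so it may differ across SETL versions), writing a partitioned CSV dataset and reading it back could look like this:

```scala
import com.jcdecaux.setl.storage.connector.CSVConnector
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark: SparkSession = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Example values only; options left out fall back to the defaults listed above.
val csvOptions: Map[String, String] = Map(
  "path" -> "src/main/resources/test_csv",
  "inferSchema" -> "true",
  "delimiter" -> ";",
  "header" -> "true",
  "saveMode" -> "Overwrite"
)

val csvConnector = new CSVConnector(spark, csvOptions)

val data: DataFrame = Seq(
  (1, "a", "x", 1L),
  (2, "b", "y", 2L)
).toDF("partition1", "partition2", "clustering1", "value")

// Partition the output by "partition1" (see the Partition paragraph above),
// then write the DataFrame and read it back.
csvConnector.partitionBy("partition1")
csvConnector.write(data)
val reloaded: DataFrame = csvConnector.read()
```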
## JSONConnector + ### Options + | name | default | | ------ | ------- | | path | | @@ -196,7 +186,9 @@ For other options, please refer to [this doc](https://docs.databricks.com/spark/ ## ParquetConnector + ### Options + | name | default | | ------ | ------- | | path | | @@ -205,6 +197,7 @@ For other options, please refer to [this doc](https://docs.databricks.com/spark/ ## ExcelConnector ### Options + | name | default | | ------ | ------- | | path | | @@ -218,11 +211,12 @@ For other options, please refer to [this doc](https://docs.databricks.com/spark/ | addColorColumns | `false` | | dateFormat | `yyyy-MM-dd` | | timestampFormat | `yyyy-mm-dd hh:mm:ss.000` | -| maxRowsInMemory | None | +| maxRowsInMemory | `None` | | excerptSize | 10 | -| workbookPassword | None | +| workbookPassword | `None` | ## DynamoDBConnector + ### Options | name | default | @@ -232,6 +226,7 @@ For other options, please refer to [this doc](https://docs.databricks.com/spark/ | saveMode | | ## CassandraConnector + ### Options | name | default | From 0871022079e434de57fa47d7a0ac68cf6bf821a2 Mon Sep 17 00:00:00 2001 From: XuzhouQin <17144939+qxzzxq@users.noreply.github.com> Date: Thu, 20 Aug 2020 17:21:47 +0200 Subject: [PATCH 3/5] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 5fe0c0e9..ac8c67f8 100644 --- a/README.md +++ b/README.md @@ -5,7 +5,7 @@ [![codecov](https://codecov.io/gh/SETL-Developers/setl/branch/master/graph/badge.svg)](https://codecov.io/gh/SETL-Developers/setl) [![Maven Central](https://img.shields.io/maven-central/v/com.jcdecaux.setl/setl_2.11.svg?label=Maven%20Central&color=blue)](https://mvnrepository.com/artifact/com.jcdecaux.setl/setl) [![javadoc](https://javadoc.io/badge2/com.jcdecaux.setl/setl_2.11/javadoc.svg)](https://javadoc.io/doc/com.jcdecaux.setl/setl_2.11) -[![Gitter](https://badges.gitter.im/setl-by-jcdecaux/community.svg)](https://gitter.im/setl-by-jcdecaux/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge) +[![documentation](https://img.shields.io/badge/docs-passing-1f425f.svg)](https://setl-developers.github.io/setl/) If you’re a **data scientist** or **data engineer**, this might sound familiar while working on an **ETL** project: From 534759f56dbab6a009cd7bdbb5a8ab96588da852 Mon Sep 17 00:00:00 2001 From: XuzhouQin <17144939+qxzzxq@users.noreply.github.com> Date: Thu, 20 Aug 2020 17:23:09 +0200 Subject: [PATCH 4/5] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index ac8c67f8..028d93bf 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,7 @@ ![logo](docs/img/logo_setl.png) ---------- -![build](https://github.com/SETL-Developers/setl/workflows/build/badge.svg?branch=master) +[![build](https://github.com/SETL-Developers/setl/workflows/build/badge.svg?branch=master)](https://github.com/SETL-Developers/setl/actions) [![codecov](https://codecov.io/gh/SETL-Developers/setl/branch/master/graph/badge.svg)](https://codecov.io/gh/SETL-Developers/setl) [![Maven Central](https://img.shields.io/maven-central/v/com.jcdecaux.setl/setl_2.11.svg?label=Maven%20Central&color=blue)](https://mvnrepository.com/artifact/com.jcdecaux.setl/setl) [![javadoc](https://javadoc.io/badge2/com.jcdecaux.setl/setl_2.11/javadoc.svg)](https://javadoc.io/doc/com.jcdecaux.setl/setl_2.11) From 6819da5462d4102625c7a8eb7e63da453897d74c Mon Sep 17 00:00:00 2001 From: XuzhouQin <17144939+qxzzxq@users.noreply.github.com> Date: Thu, 20 Aug 2020 17:32:37 +0200 Subject: 
[PATCH 5/5] Update README.md --- README.md | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 028d93bf..8e1745d8 100644 --- a/README.md +++ b/README.md @@ -18,9 +18,11 @@ If you’re a **data scientist** or **data engineer**, this might sound familiar ## Use SETL ### In a new project + You can start working by cloning [this template project](https://github.com/qxzzxq/setl-template). ### In an existing project + ```xml com.jcdecaux.setl @@ -48,7 +50,9 @@ To use the SNAPSHOT version, add Sonatype snapshot repository to your `pom.xml` ``` ## Quick Start + ### Basic concept + With SETL, an ETL application could be represented by a `Pipeline`. A `Pipeline` contains multiple `Stages`. In each stage, we could find one or several `Factories`. The class `Factory[T]` is an abstraction of a data transformation that will produce an object of type `T`. It has 4 methods (*read*, *process*, *write* and *get*) that should be implemented by the developer. @@ -58,6 +62,7 @@ The class `SparkRepository[T]` is a data access layer abstraction. It could be u The entry point of a SETL project is the object `com.jcdecaux.setl.Setl`, which will handle the pipeline and spark repository instantiation. ### Show me some code + You can find the following tutorial code in [the starter template of SETL](https://github.com/qxzzxq/setl-template). Go and clone it :) Here we show a simple example of creating and saving a **Dataset[TestObject]**. The case class **TestObject** is defined as follows: @@ -67,6 +72,7 @@ case class TestObject(partition1: Int, partition2: String, clustering1: String, ``` #### Context initialization + Suppose that we want to save our output into `src/main/resources/test_csv`. We can create a configuration file **local.conf** in `src/main/resources` with the following content that defines the target datastore to save our dataset: ```txt @@ -92,6 +98,7 @@ setl.setSparkRepository[TestObject]("testObjectRepository") ``` #### Implementation of Factory + We will create our `Dataset[TestObject]` inside a `Factory[Dataset[TestObject]]`. A `Factory[A]` will always produce an object of type `A`, and it contains 4 abstract methods that you need to implement: - read - process @@ -133,6 +140,7 @@ class MyFactory() extends Factory[Dataset[TestObject]] with HasSparkSession { ``` #### Define the pipeline + To execute the factory, we should add it into a pipeline. When we call `setl.newPipeline()`, **Setl** will instantiate a new **Pipeline** and configure all the registered repositories as inputs of the pipeline. Then we can call `addStage` to add our factory into the pipeline. @@ -144,12 +152,14 @@ val pipeline = setl ``` #### Run our pipeline + ```scala pipeline.describe().run() ``` The dataset will be saved into `src/main/resources/test_csv` #### What's more? + As our `MyFactory` produces a `Dataset[TestObject]`, it can be used by other factories of the same pipeline. ```scala @@ -180,6 +190,7 @@ pipeline.addStage[AnotherFactory]() ``` ### Generate pipeline diagram (with v0.4.1+) + You can generate a [Mermaid diagram](https://mermaid-js.github.io/mermaid/#/) by doing: ```scala pipeline.showDiagram() @@ -264,15 +275,19 @@ You should also provide Scala and Spark in your pom file. 
SETL is tested against | 2.3 | 2.11 | :warning: see *known issues* | ## Known issues + - `DynamoDBConnector` doesn't work with Spark version 2.3 - `Compress` annotation can only be used on Struct field or Array of Struct field with Spark 2.3 ## Test Coverage -![](https://codecov.io/gh/SETL-Developers/setl/branch/master/graphs/sunburst.svg) + +[![coverage.svg](https://codecov.io/gh/SETL-Developers/setl/branch/master/graphs/sunburst.svg)](https://codecov.io/gh/SETL-Developers/setl) ## Documentation + [https://setl-developers.github.io/setl/](https://setl-developers.github.io/setl/) ## Contributing to SETL + [Check our contributing guide](https://github.com/SETL-Developers/setl/blob/master/CONTRIBUTING.md)