ETL(Extract Transform Load) for Jenkins data.
pip install duck-jenkins
- Extract and serialize Jenkins' build information along with artefact metadata into files.
- A fix file structure can support multiple Jenkins servers.
- Support multi-branch structure
└── data
├── jenkins1.example.io
└── jenkins2.example.io
├── pipeline1
│ └── 1_info.json
└── pipeline2
└── master
├── 1_info.json
└── 1_artifact.csv
Transform all serialized data above to relational database, DuckDB.
erDiagram
Jenkins ||--o{ Job: has
Job ||--o{ Build: has
Build ||--o{ Artifact: has
Build ||--o| Jenkins_User: has
Build ||--o{ Cause: has
Build ||--o{ Parameter: has
Build ||--|| Result: has
Parameter ||--|| ParameterDictionary: has
Jenkins{
int id PK
str domain_name
}
Job{
int id PK
str name
int jenkins_id FK
}
Result{
int id PK
str name
}
Jenkins_User{
int id PK
str name
str lan_id
}
Cause{
int id PK
str category
}
Build{
int id PK
int job_id FK
int build_number
int result_id FK
int user_id FK
int trigger_type FK "Cause table's PK"
int duration
datetime timestamp
int upstream_job_id FK
int upstream_build_number
int upstream_type FK "Cause table's PK"
int previous_build_number
}
ParameterDictionary{
int id PK
str name
}
Parameter{
int build_id FK
int name_id FK
str value
}
Artifact{
int id PK
int build_id FK
str file_name
str dir
int size
datetime timestamp
}
Following examples try to emulate the file structure aboved.
Extracting a multi-branch pipeline
from duck_jenkins import JenkinsData
jd = JenkinsData(
domain_name='jenkins1.example.io',
verify_ssl=False,
user_id='C001',
secret='elwerqoqiweucv',
data_directory='data'
)
jd.pull(
project_name='pipeline2/master',
build_number=1,
artifact=True
)
Let assume the upstream of pipeline2/master/1
is pipeline1/1
.
from duck_jenkins import JenkinsData
jd = JenkinsData(
domain_name='jenkins1.example.io',
verify_ssl=False,
user_id='C001',
secret='elwerqoqiweucv',
data_directory='data'
)
jd.pull_upstream(
project_name='pipeline2/master',
build_number=1,
artifact=False
)
from duck_jenkins import JenkinsData
jd = JenkinsData(
domain_name='jenkins1.example.io',
verify_ssl=False,
user_id='C001',
secret='elwerqoqiweucv',
data_directory='data'
)
jd.pull_previous(
project_name='pipeline2/master',
build_number=2, # build 2 is excluded from the extraction in this function.
artifact=True,
overwrite=True,
size=1 # say, you only interested 1 previous build.
)
Without transform into a database, it is useless. Following steps demostrate how to import into DuckDB.
from duck_jenkins import DuckLoader
import duckdb
db = duckdb.connect('1.ddb')
cursor = db.cursor()
dl = DuckLoader(cursor, 'data')
dl.import_into_db(
jenkins_domain_name='jenkins1.example.io',
overwrite=False # False to skip insert for existing record.
)
cursor.commit()
cursor.close()
For more usage of DuckDB
, visit the official document:
https://duckdb.org/docs/