See Diligence Doer in action! Watch the demo video on YouTube
Diligence Doer is an Atlassian Forge app for Jira. It parses the summary of a Jira Issue for database tables or fields, then displays the other resources where those tables or fields are used.
Currently, those resources can come from two places: GitHub and Tableau.
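For a flavor of the matching step, here is a minimal sketch of pulling table references out of a summary. The pattern and function are illustrative only, not the app's actual implementation:

```python
import re

# Matches qualified names like "ods.customers" or "ods.customers.address1".
# Illustrative only; the app's real matching logic lives in its own modules.
TABLE_PATTERN = re.compile(r"\b\w+\.\w+(?:\.\w+)*\b")

def extract_references(summary: str) -> list[str]:
    """Return candidate table/field references found in an issue summary."""
    return TABLE_PATTERN.findall(summary)

print(extract_references("Combine address1 and address2 fields in ods.customers"))
# -> ['ods.customers']
```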
GitHub
- Given a GitHub repository and an authentication token, Diligence Doer returns the name of and a link to the file(s) that contain the database table(s) in the summary of the Jira Issue (sketched below).
- In the app, these files are marked with the 📄 emoji.
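As a rough sketch of how such a lookup could work, the snippet below queries GitHub's code-search REST endpoint directly. This is an assumption for illustration; the app's own logic lives in get_repository.py and differs in the details:

```python
import requests

def files_mentioning_table(repo: str, token: str, table: str) -> list[dict]:
    """Sketch: find files in `repo` ("owner/name") that mention `table`
    using GitHub's code search API with a personal access token."""
    resp = requests.get(
        "https://api.github.com/search/code",
        params={"q": f'"{table}" repo:{repo}'},
        headers={"Authorization": f"token {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    # Return the same shape the app surfaces: file name plus link.
    return [
        {"name": item["name"], "link": item["html_url"]}
        for item in resp.json().get("items", [])
    ]
```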
Tableau
- Given a Tableau Server and an authentication token, Diligence Doer returns the name of and a link to the dashboard(s) whose datasources contain the database table(s) or field(s) in the summary of the Jira Issue (sketched below).
- In the app, these dashboards are marked with the 📈 emoji.
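A comparable sketch for the Tableau side, assuming an already-authenticated Metadata API token. The GraphQL field names follow Tableau's Metadata API schema and may vary by server version; treat this as an illustration rather than the app's exact queries:

```python
import requests

def dashboards_using_table(server: str, token: str, table: str) -> list[str]:
    """Sketch: list workbooks downstream of a database table
    via Tableau's Metadata API (GraphQL)."""
    query = """
    query usage($name: String) {
      databaseTables(filter: { name: $name }) {
        downstreamWorkbooks { name }
      }
    }
    """
    resp = requests.post(
        f"{server}/api/metadata/graphql",
        json={"query": query, "variables": {"name": table}},
        headers={"X-Tableau-Auth": token},
        timeout=30,
    )
    resp.raise_for_status()
    tables = resp.json()["data"]["databaseTables"]
    return [wb["name"] for t in tables for wb in t["downstreamWorkbooks"]]
```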
The information displayed by Diligence Doer can be seen directly in a Jira Issue underneath the description...
and in other places an Issue may exist, like the Backlog...
If the database table referenced in the ticket is not used in any other resources, Diligence Doer lets you know that, too!
View the SETUP.md documentation for an in-depth walkthrough of the cloud deployment.
This project was built for the Atlassian Codegeist Hackathon 2021. If you would like to learn more about building apps with Atlassian Forge, here are some notes I took that will help you get started!
- Make an account or log in
  - Visit the Atlassian website
- Download Docker
  - Visit Docker's website and download Docker.dmg
- Install & run Docker.dmg
  - Make sure the Docker whale is running in the menu bar
- Install Forge on macOS
$ nvm install --lts=Erbium
$ nvm use --lts=Erbium
- Install the Forge CLI
$ npm install -g @forge/cli
- Hello World App in Jira
- Here is a quick video I took after getting the Hello World app up and running.
- Here are some of the commands you will use after installing the CLI
$ forge login
$ forge create
$ forge deploy
$ forge install
$ forge tunnel
You may need to customize this tool for your specific use case. This section identifies use cases that require code or configuration changes and points you to the appropriate files in this repository for making those changes.
You will need to change the URL endpoint to access the API for GitHub Enterprise Server. Edit the authenticate_github() function in the authentication.py file to point to your Enterprise account. The change you need to make is described in the function's docstring.
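For reference, if the client is PyGithub (an assumption here; check authentication.py for the actual client), the change typically amounts to overriding the API base URL:

```python
from github import Github

def authenticate_github(token: str) -> Github:
    # Default, for github.com:
    #   return Github(login_or_token=token)
    # GitHub Enterprise Server: point the client at your instance's v3 API.
    # "github.your-company.com" is a placeholder for your Enterprise hostname.
    return Github(
        base_url="https://github.your-company.com/api/v3",
        login_or_token=token,
    )
```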
Not all data pipelines or orchestrators use YML, and certainly not all of those that do use it in the same way. The functionality to look for SQL in YML files will be useful for those who use AWS Data Pipeline or Dataduct, but may be a noisy feature for those who do not.
If you DO NOT have YML files that contain SQL in your repo...
You can disable this feature by changing the following line in the get_files_containing_sql() function of the get_repository.py file.
- Change from:
if len(split_name) >= 2 and split_name[1] in ['sql', 'yml']:
- Change to:
if len(split_name) >= 2 and split_name[1] == 'sql':
If you DO have YML files that contain SQL in your repo...
- The YML parser is hardcoded to read the keys used by Dataduct. It parses all keys named steps that are of step_type: sql_command (see the sketch after this list).
- If you want to parse your YML files but do not use Dataduct, you may need to adjust the YML keys and their properties to match your YML structure in the parse_yml.py file.
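For illustration, here is a Dataduct-style pipeline definition and a minimal pass that pulls out the SQL. The document and function are a sketch under those assumptions; the real key handling lives in parse_yml.py:

```python
import yaml

# Illustrative Dataduct-style pipeline definition.
PIPELINE = """
steps:
- step_type: extract-local
  path: data/customers.csv
- step_type: sql_command
  command: SELECT address1, address2 FROM ods.customers;
"""

def extract_sql_commands(document: str) -> list[str]:
    """Return the SQL of every step whose step_type is sql_command."""
    parsed = yaml.safe_load(document)
    return [
        step["command"]
        for step in parsed.get("steps", [])
        if step.get("step_type") == "sql_command"
    ]

print(extract_sql_commands(PIPELINE))
# -> ['SELECT address1, address2 FROM ods.customers;']
```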
The use of environment variables in YML files may introduce breaking characters (%, {{, }}) to the PyYAML parser. Instead of letting these fail silently, which would result in none of the YML file being parsed, we have provided two solutions for this case.
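To see the failure mode, a contrived example: a value that begins with a templated variable is enough to break the parse.

```python
import yaml

# "{{" opens a YAML flow mapping, so this document cannot be parsed as-is.
raw = "steps:\n- step_type: sql_command\n  table: {{ db_schema }}.customers"

try:
    yaml.safe_load(raw)
except yaml.YAMLError as exc:
    print(f"YML parse failed: {exc}")
```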
If you DO NOT have YML files that contain environment variables pertinent to this project, such as schema, table, or field names, you can opt to exclude this altogether.
- To prevent the parser from finding and replacing environment variables, simply pass an empty string '' as the env_var_file_path in the parse_yml() function found in the parse_yml.py file.
- Alternatively, in the read_bytestream_to_yml() function in the same file, you could comment out the line replace_yml_env_vars(linted_yml_file, replace_dict) and change the input of the yaml.safe_load() function to stream=linted_yml_file
If you DO have YML files that contain environment variables pertinent to this project, such as schema, table, or field names, you can find and replace environment variables with their keys.
- To find and replace environment variables with their keys, provide the path to the specific YML file in the repository that contains the environment variables as key-value pairs to the function mentioned above (a sketch follows).
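A minimal sketch of that find-and-replace step; the real logic is the replace_yml_env_vars() call described above, and the regex and fallback behavior here are illustrative:

```python
import re

def replace_env_vars(yml_text: str, replacements: dict) -> str:
    """Swap {{ var }} placeholders for values from the env-var YML file."""
    def substitute(match):
        key = match.group(1)
        # Fall back to the bare key name if no replacement is known.
        return replacements.get(key, key)
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", substitute, yml_text)

print(replace_env_vars("table: {{ db_schema }}.customers", {"db_schema": "ods"}))
# -> table: ods.customers
```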
- Add support for parsing Spark SQL and DataFrames
- Add support for other code repository hosts (Bitbucket, GitLab) and BI tools (Looker, Power BI)
- Add support for matching individual fields
- Ingest and parse fields from Github and Tableau
- Determine best way to shape the data for this use case
- Determine best way to identify a field name separate from database table in a Jira Issue Summary
- For instance, consider the issue summary:
"Combine address1 and address2 fields in ods.customers"
- If we add databases as a source (Snowflake, Redshift, BigQuery), we could then check each word in the summary against the actual schema for the database table:
"ods.customers.Combine" "ods.customers.address1" "ods.customers.and" "ods.customers.address2" "ods.customers.fields" "ods.customers.in"
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.