See Diligence Doer in action! Watch the demo video on YouTube
Diligence Doer is an Atlassian Forge app for Jira. It parses the summary of a Jira Issue for database tables or fields, then displays the other resources where those tables or fields are used.
Currently, those resources can come from two places: GitHub and Tableau.
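For a flavor of the matching step, here is a minimal sketch of pulling table references out of a summary. The pattern and function are illustrative only, not the app's actual implementation:

```python
import re

# Matches qualified names like "ods.customers" or "ods.customers.address1".
# Illustrative only; the app's real matching logic lives in its own modules.
TABLE_PATTERN = re.compile(r"\b\w+\.\w+(?:\.\w+)*\b")

def extract_references(summary: str) -> list[str]:
    """Return candidate table/field references found in an issue summary."""
    return TABLE_PATTERN.findall(summary)

print(extract_references("Combine address1 and address2 fields in ods.customers"))
# -> ['ods.customers']
```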
GitHub
- Given a GitHub repository and an authentication token, Diligence Doer returns the name of and a link to the file(s) that contain the database table(s) in the summary of the Jira Issue (sketched below).
- In the app, these files are marked with the 📄 emoji.
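As a rough sketch of how such a lookup could work, the snippet below queries GitHub's code-search REST endpoint directly. This is an assumption for illustration; the app's own logic lives in get_repository.py and differs in the details:

```python
import requests

def files_mentioning_table(repo: str, token: str, table: str) -> list[dict]:
    """Sketch: find files in `repo` ("owner/name") that mention `table`
    using GitHub's code search API with a personal access token."""
    resp = requests.get(
        "https://api.github.com/search/code",
        params={"q": f'"{table}" repo:{repo}'},
        headers={"Authorization": f"token {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    # Return the same shape the app surfaces: file name plus link.
    return [
        {"name": item["name"], "link": item["html_url"]}
        for item in resp.json().get("items", [])
    ]
```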
Tableau
- Given a Tableau Server and an authentication token, Diligence Doer returns the name of and a link to the dashboard(s) whose datasources contain the database table(s) or field(s) in the summary of the Jira Issue (sketched below).
- In the app, these dashboards are marked with the 📈 emoji.
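A comparable sketch for the Tableau side, assuming an already-authenticated Metadata API token. The GraphQL field names follow Tableau's Metadata API schema and may vary by server version; treat this as an illustration rather than the app's exact queries:

```python
import requests

def dashboards_using_table(server: str, token: str, table: str) -> list[str]:
    """Sketch: list workbooks downstream of a database table
    via Tableau's Metadata API (GraphQL)."""
    query = """
    query usage($name: String) {
      databaseTables(filter: { name: $name }) {
        downstreamWorkbooks { name }
      }
    }
    """
    resp = requests.post(
        f"{server}/api/metadata/graphql",
        json={"query": query, "variables": {"name": table}},
        headers={"X-Tableau-Auth": token},
        timeout=30,
    )
    resp.raise_for_status()
    tables = resp.json()["data"]["databaseTables"]
    return [wb["name"] for t in tables for wb in t["downstreamWorkbooks"]]
```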
The information displayed by Diligence Doer can be seen directly in a Jira Issue underneath the description...
and in other places an Issue may exist, like the Backlog...
If the database table referenced in the ticket is not used in any other resources, Diligence Doer lets you know that, too!
View the SETUP.md documentation for an in-depth walkthrough of the cloud deployment.
This project was built for the Atlassian Codegeist Hackathon 2021. If you would like to learn more about building apps with Atlassian Forge, here are some notes I took that will help you get started!
- Make an account or log in
  - Visit the Atlassian website
- Download Docker
  - Visit Docker's website and download Docker.dmg
- Install & run Docker.dmg
  - Make sure the Docker whale is running in the menu bar
- Install Forge on macOS
$ nvm install --lts=Erbium
$ nvm use --lts=Erbium
- Install the Forge CLI
$ npm install -g @forge/cli
- Hello World App in Jira
- Here is a quick video I took after getting the Hello World app up and running.
- Here are some of the commands you will use after installing the CLI
$ forge login
$ forge create
$ forge deploy
$ forge install
$ forge tunnel
You may need to customize this tool for your specific use case. This section identifies use cases that require code or configuration changes and points you to the appropriate files in this repository for making those changes.
You will need to change the URL endpoint to access the API for GitHub Enterprise Server. Edit the authenticate_github() function in the authentication.py file to point to your Enterprise account. The change you need to make is described in the function's docstring.
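For reference, if the client is PyGithub (an assumption here; check authentication.py for the actual client), the change typically amounts to overriding the API base URL:

```python
from github import Github

def authenticate_github(token: str) -> Github:
    # Default, for github.com:
    #   return Github(login_or_token=token)
    # GitHub Enterprise Server: point the client at your instance's v3 API.
    # "github.your-company.com" is a placeholder for your Enterprise hostname.
    return Github(
        base_url="https://github.your-company.com/api/v3",
        login_or_token=token,
    )
```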
Not all data pipelines or orchestrators use YML, and certainly not all of those that do use it in the same way. The functionality to look for SQL in YML files will be useful for those who use AWS Data Pipeline or Dataduct, but may be a noisy feature for those who do not.
If you DO NOT have YML files that contain SQL in your repo...
You can disable this feature by changing the following line in the get_files_containing_sql() function of the get_repository.py file.
- Change from:
if len(split_name) >= 2 and split_name[1] in ['sql', 'yml']:
- Change to:
if len(split_name) >= 2 and split_name[1] == 'sql':
If you DO have YML files that contain SQL in your repo...
- The YML parser is hardcoded to read the keys used by Dataduct. It parses all keys named steps that are of step_type: sql_command (see the sketch after this list).
- If you want to parse your YML files but do not use Dataduct, you may need to adjust the YML keys and their properties to match your YML structure in the parse_yml.py file.
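For illustration, here is a Dataduct-style pipeline definition and a minimal pass that pulls out the SQL. The document and function are a sketch under those assumptions; the real key handling lives in parse_yml.py:

```python
import yaml

# Illustrative Dataduct-style pipeline definition.
PIPELINE = """
steps:
- step_type: extract-local
  path: data/customers.csv
- step_type: sql_command
  command: SELECT address1, address2 FROM ods.customers;
"""

def extract_sql_commands(document: str) -> list[str]:
    """Return the SQL of every step whose step_type is sql_command."""
    parsed = yaml.safe_load(document)
    return [
        step["command"]
        for step in parsed.get("steps", [])
        if step.get("step_type") == "sql_command"
    ]

print(extract_sql_commands(PIPELINE))
# -> ['SELECT address1, address2 FROM ods.customers;']
```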
The use of environment variables in YML files may introduce breaking characters (%, {{, }}) to the PyYAML parser. Instead of letting these fail silently, which would result in none of the YML file being parsed, we have provided two solutions for this case.
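To see the failure mode, a contrived example: a value that begins with a templated variable is enough to break the parse.

```python
import yaml

# "{{" opens a YAML flow mapping, so this document cannot be parsed as-is.
raw = "steps:\n- step_type: sql_command\n  table: {{ db_schema }}.customers"

try:
    yaml.safe_load(raw)
except yaml.YAMLError as exc:
    print(f"YML parse failed: {exc}")
```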
If you DO NOT have YML files that contain environment variables pertinent to this project, such as schema, table, or field names, you can opt to exclude this altogether.
- To prevent the parser from finding and replacing environment variables, simply pass an empty string '' as the env_var_file_path in the parse_yml() function found in the parse_yml.py file.
- Alternatively, in the read_bytestream_to_yml() function in the same file, you could comment out the line replace_yml_env_vars(linted_yml_file, replace_dict) and change the input of the yaml.safe_load() function to stream=linted_yml_file
If you DO have YML files that contain environment variables pertinent to this project, such as schema, table, or field names, you can find and replace environment variables with their keys.
- To find and replace environment variables with their keys, provide the path to the specific YML file in the repository that contains the environment variables as key-value pairs to the function mentioned above (a sketch follows).
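A minimal sketch of that find-and-replace step; the real logic is the replace_yml_env_vars() call described above, and the regex and fallback behavior here are illustrative:

```python
import re

def replace_env_vars(yml_text: str, replacements: dict) -> str:
    """Swap {{ var }} placeholders for values from the env-var YML file."""
    def substitute(match):
        key = match.group(1)
        # Fall back to the bare key name if no replacement is known.
        return replacements.get(key, key)
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", substitute, yml_text)

print(replace_env_vars("table: {{ db_schema }}.customers", {"db_schema": "ods"}))
# -> table: ods.customers
```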
- Add support for parsing Spark SQL and DataFrames
- Add support for other code repository hosts (Bitbucket, GitLab) and BI tools (Looker, Power BI)
- Add support for matching individual fields
- Ingest and parse fields from Github and Tableau
- Determine best way to shape the data for this use case
- Determine best way to identify a field name separate from database table in a Jira Issue Summary
- For instance, consider the issue summary:
"Combine address1 and address2 fields in ods.customers"
- If we add databases as a source (Snowflake, Redshift, BigQuery), we could then check each word in the summary against the actual schema for the database table:
"ods.customers.Combine" "ods.customers.address1" "ods.customers.and" "ods.customers.address2" "ods.customers.fields" "ods.customers.in"
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.