
[Work in progress] Refactor to only run get_all_models if necessary #828

Conversation

jsnb-devoted

Change description

The API request that builds the whole project isn't necessary for the SQL and content validations, depending on the provided arguments. For SQL validation, you only need the full list of models/explores if you provide an explore filter with a wildcard; if you provide an exact explore, you can run the SQL validation on that explore directly. For content validation, the models/explores are strictly used to filter the errors after the content validation has run. If you don't provide any explore filters, there is no need to fetch that full list in the first place: you can build the project hierarchy as you receive errors from the content validator.
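The short-circuit described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual spectacles API; `needs_all_models`, `resolve_explores`, and `fetch_all` are all invented names.

```python
# Hypothetical sketch of the proposed short-circuit; these names are
# illustrative, not the actual spectacles API.

def needs_all_models(explore_filters):
    """The full model/explore list is only needed to expand wildcards."""
    return any("*" in f for f in explore_filters)

def resolve_explores(explore_filters, fetch_all):
    """fetch_all stands in for the expensive call that builds the project."""
    if needs_all_models(explore_filters):
        return fetch_all()
    # Every filter names an exact model/explore pair: validate those directly.
    return [tuple(f.split("/")) for f in explore_filters]

# With exact filters, the expensive fetch is never invoked.
explores = resolve_explores(["model_a/orders"], fetch_all=None)
```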

Type of change

  • Bug fix (fixes an issue)
  • New feature (adds functionality)

Related issues

Closes #1

Checklists

Security

  • Security impact of change has been considered
  • Code follows security best practices and guidelines

Code review

  • Pull request has a descriptive title and context useful to a reviewer

@jsnb-devoted
Author

@joshtemple / @DylanBaker -- This is more of a proof of concept than anything. The changes to the SQL validator are pretty straightforward. The changes to the content validator are more complicated, and our project has so many errors in production that it's hard for me to tell whether the incremental diff is working after all these changes. I'm not sure how viable this particular set of changes is, but I think the concept is viable.

If the /lookml_models endpoint weren't so slow, I would say ignore this. I have been monitoring our API calls today and I'm noticing many calls hitting the 5-minute timeout. My working theory is that the data resources the /lookml_models endpoint hits take a while to "compile" and cache. It's not clear to me when that compilation begins. It could happen per branch, when the branch is published; or the endpoint could effectively "lazy load" the data and wait to compile it until the API is hit. I have a support ticket open with Looker, but I have some doubts that I'll be able to get to the bottom of this going down that route.


@DylanBaker DylanBaker left a comment


Thanks @jsnb-devoted. This is really interesting, and I appreciate you putting in the work.

I agree on the SQL side this feels like a pretty clean change.

On the content side, I think this creates a path where errors are surfaced from multiple projects. If you have project_a with explore model_a/explore_a and project_b with explore model_b/explore_b, I think you could pass in project_a with the filter model_b/explore_b and it would return errors that aren't from the project you passed in. (Obviously that input would be human error, but I'm not sure what the desired behaviour is in that scenario.) Is my understanding correct here?
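A toy illustration of that concern (hypothetical data and function names; spectacles' real content-error objects have more fields): because post-validation filtering matches on model/explore only, an error from a different project can pass through.

```python
# Hypothetical post-validation filtering; the project field is never
# consulted, so a filter naming another project's explore still matches.
errors = [
    {"project": "project_a", "explore": "model_a/explore_a"},
    {"project": "project_b", "explore": "model_b/explore_b"},
]

def filter_errors(errors, explore_filters):
    # Filtering happens after content validation has already run.
    return [e for e in errors if e["explore"] in explore_filters]

# Running against project_a with a project_b filter still surfaces
# the project_b error.
surfaced = filter_errors(errors, ["model_b/explore_b"])
```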

I've also made a few comments in line.

@@ -552,13 +565,21 @@ async def validate_content(
name=self.project,
filters=filters,
include_all_explores=True,
get_full_project=get_full_project,
Where does get_full_project get defined if the if block on line 543 isn't entered?
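One hedged sketch of a fix for that risk (variable and filter shapes assumed from the diff, not taken from the actual code): give `get_full_project` a safe default before the conditional, so every path through the function defines it.

```python
# Hypothetical sketch: initialize before the conditional so the later
# call site always sees a defined value.
filters = ["model_a/orders"]  # example input: one exact explore filter
get_full_project = True       # safe default: preserve current behavior
if filters and not any("*" in f for f in filters):
    # Every filter names an exact model/explore, so skip the full fetch.
    get_full_project = False
```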

Comment on lines +595 to +596
# Indicates whether the project has all of the models/explores or just the selected ones
project.is_complete_project = is_complete_project

There is notionally a world where you have manually passed in every explore within a project and this is true, even though get_full_project was passed as false. This may not matter, but I wanted to flag it.
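In code, that edge case looks something like this (names hypothetical): the flag records how the project was fetched, not whether coverage is actually complete.

```python
# Hypothetical illustration of the flagged edge case.
all_explores = {"model_a/explore_a", "model_a/explore_b"}
requested = {"model_a/explore_a", "model_a/explore_b"}  # user listed every explore

get_full_project = False                # fetched selectively, not in full
is_complete_project = get_full_project  # flag says "incomplete"...
actually_complete = requested == all_explores  # ...yet coverage is complete
```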

@jsnb-devoted
Author

@DylanBaker - thanks for taking a look. In our current setup, only the SQL validation is required for PRs to pass. Is the change to the SQL validation something you would consider merging if I broke it out into its own PR?

re: the content -- this is the area I understand much less. I think I follow that the issue has to do with imported projects. We only have the single project, so I definitely didn't have that use case in mind when I wrote this up. It sounds like this might work for us with a single project, but there probably isn't a great path to getting that portion of this merged in, right?

I'm not really sure what to do about the content check. Even if we take it off of PRs and let people run it on demand, that UX is going to be greatly degraded by the fact that you will likely have to run it twice, waiting 15-20 minutes for it to fail before rerunning it.

@jsnb-devoted
Author

@DylanBaker / @joshtemple -- I think we have a path forward on our end that involves moving the content check out of the PRs to either an on-demand check or something that runs post-merge. My understanding is that the content check effectively needs the whole project structure in order to support multiple projects. To that end, I think we can close this and discuss whether the change to the SQL validator is viable:
#831

I'm going to be installing it in our CI checks via our fork, but I would definitely prefer to get your buy-in and move off of our fork.
