
[Work in progress] Refactor to only run get_all_models if necessary #828

Conversation

jsnb-devoted

Change description

The API request that builds the whole project isn't necessary for the SQL and content validations, depending on the provided arguments. For SQL validation, you only need the full list of models/explores if you provide an explore filter with a wildcard; if you provide an exact explore, you can run the SQL validation on that explore directly. For content validation, the models/explores are strictly used to filter the errors after the content validation has run. If you don't provide any explore filters, there is no need to fetch that full list in the first place: you can build the project hierarchy as you receive errors from the content validator.
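The short-circuit described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual spectacles API; `needs_all_models`, `resolve_explores`, and `fetch_all` are all invented names.

```python
# Hypothetical sketch of the proposed short-circuit; these names are
# illustrative, not the actual spectacles API.

def needs_all_models(explore_filters):
    """The full model/explore list is only needed to expand wildcards."""
    return any("*" in f for f in explore_filters)

def resolve_explores(explore_filters, fetch_all):
    """fetch_all stands in for the expensive call that builds the project."""
    if needs_all_models(explore_filters):
        return fetch_all()
    # Every filter names an exact model/explore pair: validate those directly.
    return [tuple(f.split("/")) for f in explore_filters]

# With exact filters, the expensive fetch is never invoked.
explores = resolve_explores(["model_a/orders"], fetch_all=None)
```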

Type of change

  • Bug fix (fixes an issue)
  • New feature (adds functionality)

Related issues

Closes #1

Checklists

Security

  • Security impact of change has been considered
  • Code follows security best practices and guidelines

Code review

  • Pull request has a descriptive title and context useful to a reviewer

@jsnb-devoted
Author

@joshtemple / @DylanBaker -- This is more of a proof of concept than anything. The changes to the SQL validator are pretty straightforward. The changes to the content validator are more complicated, and our project has so many errors in production that it's hard for me to tell whether the incremental diff is working after all these changes. I'm not sure how viable this particular set of changes is, but I think the concept is viable.

If the /lookml_models endpoint weren't so slow, I would say ignore this. I have been monitoring our API calls today and I'm noticing many calls hitting the 5-minute timeout. My working theory is that the data resources the /lookml_models endpoint hits take a while to "compile" and cache. It's not clear to me when that compilation begins. It could happen per branch, when the branch is published; or the endpoint could effectively "lazy load" the data and wait to compile it until the API is hit. I have a support ticket open with Looker, but I have some doubts that I'll be able to get to the bottom of this going down that route.


@DylanBaker DylanBaker left a comment


Thanks @jsnb-devoted. This is really interesting, and I appreciate you putting in the work.

I agree on the SQL side this feels like a pretty clean change.

On the content side, I think this creates a path where errors are surfaced from multiple projects. If you have project_a with explore model_a/explore_a and project_b with explore model_b/explore_b, I think you could pass in project_a with the filter model_b/explore_b and it would return errors that aren't from the project you passed in. (Obviously that input would be human error, but I'm not sure what the desired behaviour is in that scenario.) Is my understanding correct here?
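A toy illustration of that concern (hypothetical data and function names; spectacles' real content-error objects have more fields): because post-validation filtering matches on model/explore only, an error from a different project can pass through.

```python
# Hypothetical post-validation filtering; the project field is never
# consulted, so a filter naming another project's explore still matches.
errors = [
    {"project": "project_a", "explore": "model_a/explore_a"},
    {"project": "project_b", "explore": "model_b/explore_b"},
]

def filter_errors(errors, explore_filters):
    # Filtering happens after content validation has already run.
    return [e for e in errors if e["explore"] in explore_filters]

# Running against project_a with a project_b filter still surfaces
# the project_b error.
surfaced = filter_errors(errors, ["model_b/explore_b"])
```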

I've also made a few comments in line.

@@ -552,13 +565,21 @@ async def validate_content(
name=self.project,
filters=filters,
include_all_explores=True,
get_full_project=get_full_project,
Where does get_full_project get defined if the if block on line 543 isn't entered?
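One hedged sketch of a fix for that risk (variable and filter shapes assumed from the diff, not taken from the actual code): give `get_full_project` a safe default before the conditional, so every path through the function defines it.

```python
# Hypothetical sketch: initialize before the conditional so the later
# call site always sees a defined value.
filters = ["model_a/orders"]  # example input: one exact explore filter
get_full_project = True       # safe default: preserve current behavior
if filters and not any("*" in f for f in filters):
    # Every filter names an exact model/explore, so skip the full fetch.
    get_full_project = False
```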

Comment on lines +595 to +596
# Indicates whether the project has all of the models/explores or just the selected ones
project.is_complete_project = is_complete_project

There is notionally a world where you have manually passed in every explore within a project and this is true, even though get_full_project was passed as false. This may not matter, but I wanted to flag it.
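In code, that edge case looks something like this (names hypothetical): the flag records how the project was fetched, not whether coverage is actually complete.

```python
# Hypothetical illustration of the flagged edge case.
all_explores = {"model_a/explore_a", "model_a/explore_b"}
requested = {"model_a/explore_a", "model_a/explore_b"}  # user listed every explore

get_full_project = False                # fetched selectively, not in full
is_complete_project = get_full_project  # flag says "incomplete"...
actually_complete = requested == all_explores  # ...yet coverage is complete
```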

@jsnb-devoted
Author

@DylanBaker - thanks for taking a look. In our current setup, only the SQL validation is required for PRs to pass. Is the change to the SQL validation something you would consider merging if I broke it out into its own PR?

re: the content -- this is the area I understand much less. I think I follow that the issue has to do with imported projects. We only have the single project, so I definitely didn't have that use case in mind when I wrote this up. It sounds like this might work for us with a single project, but there probably isn't a great path to getting that portion of this merged in, right?

I'm not really sure what to do about the content check. Even if we take it off of PRs and let people run it on demand, that UX is going to be greatly degraded by the fact that you will likely have to run it twice, waiting 15-20 minutes for it to fail before rerunning it.

@jsnb-devoted
Author

@DylanBaker / @joshtemple -- I think we have a path forward on our end that involves moving the content check out of the PRs to either an on-demand check or something that runs post-merge. My understanding is that the content check effectively needs the whole project structure in order to support multiple projects. To that end, I think we can close this and discuss whether the change to the SQL validator is viable:
#831

I'm going to be installing it in our CI checks via our fork, but I would definitely prefer to get your buy-in and move off of our fork.
