Optimize tf template E2E deployment time #444
/gcbrun
/gcbrun
/gcbrun
/gcbrun
/gcbrun
/gcbrun
/gcbrun
/gcbrun
cloudbuild.yaml (Outdated)
set -e

cd /workspace/modules/custom-network
terraform apply \
I think we want to test network creation as part of infrastructure/, since that's how it'll be used for RAG, Ray, and Jupyter. Does it have to be a separate step in the tests?
Yes, you are right. I combined it into cluster creation.
I might have misunderstood, so I want to confirm: if we want to accelerate the deployment by parallelizing Cloud SQL database and GKE cluster creation, we have to move network creation out of infrastructure/, because the Cloud SQL database depends on the network.
@imreddy13
@andrewsykim this PR splits the network creation out of infra, because for RAG we need to create the Cloud SQL instance (which needs a network to exist) in parallel with the GKE cluster.
But for Ray and Jupyter, we don't need that. @yiyinglovecoding can we test both flows in our E2E?
- network created by infra for ray and jupyter applications
- network created by rag for rag application
Might need to restructure and/or add tests
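One way to support both flows from a single infra module is a boolean toggle. The following is only a minimal sketch: the `create_network` variable name, its default, and the `source` path are assumptions for illustration, not something this PR confirms, and required module inputs are omitted.

```hcl
# Hypothetical variable; the name and default are assumptions.
variable "create_network" {
  type        = bool
  default     = true # Ray and Jupyter flow: infra creates the network
  description = "Set to false when the network is created outside infra (RAG flow)."
}

# The network module is instantiated only when the toggle is true.
# `count` on module blocks requires Terraform 0.13 or later.
module "custom-network" {
  source = "../modules/custom-network" # assumed path
  count  = var.create_network ? 1 : 0
}
```

With a toggle like this, the E2E tests could exercise the default for Ray and Jupyter and pass `false` from the RAG template.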
I think that could add a lot of complexity. How much faster is creation due to this change? If it's not significant should we reconsider?
It would be at least around 10 mins for AP.
There have been lots of changes in the template, so I just ran the
The AP creation is around 25 mins before the change and around 20 mins after it.
/gcbrun
/gcbrun
/gcbrun
… and only not create network when in RAG template
/gcbrun
/gcbrun
/gcbrun
Closing old issues with merge conflicts. Please rebase and re-open if still relevant.
Making terraform apply faster by parallelizing creation of the GKE cluster and the Cloud SQL database:
- custom-network is a separate module, and module infra should depend on custom-network
- in rag, cloudsql-secret is a separate resource that depends on namespace and cloudsql

It doesn't affect JupyterHub and Ray, which use the Infra module, as infra still creates the network for them. In RAG, however, custom-network will create the network outside the infra module.

test:
- time terraform apply on AP cluster
  - before: around 30min
  - after: around 20min
- e2e: done
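
The module layout described above could be sketched roughly as follows. This is an illustration under stated assumptions rather than the PR's actual code: the `source` paths are guesses, `cloudsql` is assumed to be a module, and required module inputs are omitted for brevity.

```hcl
# Hypothetical root module for the RAG template; all source paths are assumed.

# The network is created on its own, outside the infra module.
module "custom-network" {
  source = "../../modules/custom-network" # assumed path
}

# GKE cluster creation waits only on the network.
module "infra" {
  source     = "../../infrastructure" # assumed path
  depends_on = [module.custom-network]
}

# Cloud SQL also waits only on the network, so Terraform is free to
# create it at the same time as the cluster.
module "cloudsql" {
  source     = "../../modules/cloudsql" # assumed path
  depends_on = [module.custom-network]
}
```

Because `infra` and `cloudsql` share a single upstream dependency and do not depend on each other, Terraform's graph walk can apply them concurrently, which is where the reported time saving would come from.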