Add script to terminate stale jobs #497

Merged
merged 2 commits into from
Sep 16, 2024
63 changes: 63 additions & 0 deletions vars/abortStaleJenkinsJobs.groovy
@@ -0,0 +1,63 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*
* The OpenSearch Contributors require contributions made to
* this file be licensed under the Apache-2.0 license or a
* compatible open source license.
*/

/** Library to abort stale in-progress builds of a Jenkins job. A build is treated as stale when it is
* still running, has a lower build number than the current build, and has the same build description.
*
* @param args A map of the following parameters
* @param args.jobName <required> - The name of the Jenkins job.
* @param args.lookupTime <optional> - Only consider builds started in the past N hours; defaults to 6.
*/

import java.time.Instant
import java.time.temporal.ChronoUnit
import jenkins.model.Jenkins
import hudson.model.Result

void call(Map args = [:]) {
    // Safe navigation keeps a missing argument as null instead of the string "null".
    String jobName = args.jobName?.toString()
    String lookupArg = args.lookupTime?.toString()
    long lookupTime = isNullOrEmpty(lookupArg) ? 6 : Long.parseLong(lookupArg.trim())

    if (isNullOrEmpty(jobName)) {
        throw new IllegalArgumentException("Error: jobName is null or empty")
    }

    def currentBuildNumber = currentBuild.number
    def currentBuildDescription = currentBuild.description
    def endTime = Instant.now()
    def startTime = endTime.minus(lookupTime, ChronoUnit.HOURS)
    def startMillis = startTime.toEpochMilli()
    def endMillis = endTime.toEpochMilli()

    // Add sleep to let job-id get assigned to queued jobs when triggered via generic webhook url
    sleep(15)

    def currentJob = Jenkins.instance.getItemByFullName(jobName)

    // Fetch all builds for the job based on the lookup time provided
    def builds = currentJob.getBuilds().byTimestamp(startMillis, endMillis)
    for (build in builds) {
        if (build.isBuilding() && currentBuildNumber > build.number && currentBuildDescription == build.description) {
Member

Are we terminating purely based on build description?

Member

Same question.

Collaborator Author

AFAIK, when gradle-check is triggered on a pull request, the build description stays the same for each new build of the job. Let me know if there is another attribute I should check.
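
The matching rule discussed in this thread can be tried outside Jenkins. Below is a minimal sketch, assuming builds are plain maps with the hypothetical keys `number`, `building`, and `description` standing in for Jenkins build objects, and using made-up descriptions such as `PR #1`. It also adds a guard for blank descriptions, which is an assumption and not part of the condition in this PR.

```groovy
// Hypothetical stand-in for Jenkins builds: plain maps, not real Jenkins build objects.
List<Map> findStaleBuilds(List<Map> builds, int currentNumber, String currentDescription) {
    // Assumption (not in the PR): skip matching when the current build has no description,
    // so that unrelated builds without a description are never selected.
    if (currentDescription == null || currentDescription.trim().isEmpty()) {
        return []
    }
    // Same three checks as the PR: still running, older number, same description.
    return builds.findAll { b ->
        b.building && b.number < currentNumber && b.description == currentDescription
    }
}

def builds = [
    [number: 10, building: true,  description: 'PR #1'],
    [number: 11, building: false, description: 'PR #1'],
    [number: 12, building: true,  description: 'PR #2'],
    [number: 13, building: true,  description: 'PR #1'],
]

// Build 13 plays the role of the current build.
def stale = findStaleBuilds(builds, 13, 'PR #1')
println stale*.number
```

Only build 10 is selected here: build 11 has finished, build 12 has a different description, and build 13 is the current build itself.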

            try {
                build.doStop()
Member

Is doStop enough for our case? These builds might take a long time to stop.
Do we want to stop it immediately, since the agents are deleted anyway?

Collaborator Author

I think this is a graceful stop, which also takes care of stopping the underlying running process.
If we use doKill, it will hard-stop the pipeline, and there is no guarantee that the underlying gradle process will be stopped on the host.
If the agents are deleted for aborted or killed pipelines, then yes, we can use doKill instead of doStop. Let me know.

Member

Yes, each agent runs only one build and is deleted afterwards.
Though the existing graceful stop is probably enough for the time being.
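
For reference, the trade-off in this thread (graceful stop first, hard kill only if needed) can be sketched as follows. This is an illustration only, not what the PR does: `FakeBuild`, `stopWithEscalation`, and the grace period are hypothetical, and the methods on `FakeBuild` merely imitate the `doStop`/`doKill` names used in the discussion.

```groovy
// Hypothetical stand-in for a running build; the PR itself calls build.doStop() on a Jenkins build.
class FakeBuild {
    int number
    boolean building = true
    boolean stopsGracefully = true
    List<String> calls = []

    void doStop() { calls << 'doStop'; if (stopsGracefully) { building = false } }
    void doKill() { calls << 'doKill'; building = false }
    boolean isBuilding() { building }
}

// Request a graceful stop, poll until the grace period ends, then escalate to a hard kill.
void stopWithEscalation(build, long graceMillis, long pollMillis = 10) {
    build.doStop()
    long deadline = System.currentTimeMillis() + graceMillis
    while (build.isBuilding() && System.currentTimeMillis() < deadline) {
        Thread.sleep(pollMillis)
    }
    if (build.isBuilding()) {
        build.doKill()
    }
}

def polite = new FakeBuild(number: 1)
def stubborn = new FakeBuild(number: 2, stopsGracefully: false)
stopWithEscalation(polite, 50)
stopWithEscalation(stubborn, 50)
println polite.calls    // [doStop]
println stubborn.calls  // [doStop, doKill]
```

Since each agent runs a single build and is deleted afterwards, the plain graceful stop agreed on above avoids the extra waiting logic.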

                println "Aborted build #${build.number} for ${build.description}"
            }
            catch (Exception e) {
                if (build.result == Result.ABORTED) {
Member

Do we need to add any checks for failed/already stopped builds?

Collaborator Author

The build.isBuilding() check in the if condition means only in-progress builds are processed. This try/catch covers the edge case where two jobs execute this block at the same time and one of them fails because the other has already terminated the build. We don't want to fail the build job if this block fails because of that race condition.
This block just catches the condition without stopping the flow.
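
The race described above can be reproduced in isolation. A minimal sketch, assuming a hypothetical `RacyBuild` class whose `doStop()` throws when the build has already been stopped, and a plain string in place of Jenkins' `Result.ABORTED`:

```groovy
// Hypothetical stand-in: doStop() throws when another caller has already stopped the build.
class RacyBuild {
    int number
    String result = null

    void doStop() {
        if (result == 'ABORTED') { throw new IllegalStateException('already stopped') }
        result = 'ABORTED'
    }
}

// Mirrors the try/catch in the PR: never propagate the failure, only report it.
String abortQuietly(build) {
    try {
        build.doStop()
        return "Aborted build #${build.number}"
    } catch (Exception e) {
        // A concurrent caller won the race: report it and carry on.
        if (build.result == 'ABORTED') {
            return "Build #${build.number} is already aborted!"
        }
        return "Failed to abort build #${build.number}: ${e.message}"
    }
}

def build = new RacyBuild(number: 7)
println abortQuietly(build)  // Aborted build #7
println abortQuietly(build)  // Build #7 is already aborted!
```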

                    println "Build #${build.number} is already aborted!"
                }
                else {
                    println "Failed to abort build #${build.number}: ${e.message}"
                }
            }
        }
    }
}


// Groovy renders null.toString() as the lowercase string "null", so the comparison ignores case.
boolean isNullOrEmpty(String str) { return (str == null || str.equalsIgnoreCase('null') || str.trim().isEmpty()) }
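
A call from a pipeline might look like the fragment below. It assumes the shared library containing `vars/abortStaleJenkinsJobs.groovy` is already loaded; the job name `gradle-check` is taken from the review discussion, and the 3-hour window is an arbitrary illustrative value.

```groovy
// Jenkinsfile fragment: runs only inside Jenkins with the shared library loaded.
// Aborts older in-progress gradle-check builds that share this build's description.
abortStaleJenkinsJobs(jobName: 'gradle-check', lookupTime: 3)
```

Omitting `lookupTime` falls back to the documented default of 6 hours.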