spec: add ExitPolicy type in pod manifest. #500

yifan-gu · 2015-09-24T01:38:42Z

The optional ExitPolicy type defines the behavior of the pod when
the apps within it exit.

This PR adds 3 valid policies:

- untilAll: The pod exits only when all the apps have exited (whether or not they succeeded).
- onAny: The pod exits when any of the apps exits (whether or not it succeeded).
- onAnyFailure: The pod exits when any of the apps exits unsuccessfully.
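
As a rough illustration of the three policies listed above, here is a minimal Go sketch of how a runtime might decide whether the pod should exit. The type and function names (`ExitPolicy`, `AppState`, `ShouldPodExit`) are hypothetical and not part of the spec or of any implementation. The sketch also assumes that under `onAnyFailure` the pod still ends once every app has exited cleanly, which the description above does not state.

```go
package main

import "fmt"

// ExitPolicy names when the pod as a whole should exit.
// The string values mirror the proposal; the Go names are hypothetical.
type ExitPolicy string

const (
	UntilAll     ExitPolicy = "untilAll"
	OnAny        ExitPolicy = "onAny"
	OnAnyFailure ExitPolicy = "onAnyFailure"
)

// AppState is a hypothetical record of one app's status inside the pod.
type AppState struct {
	Exited   bool
	ExitCode int // meaningful only when Exited is true
}

// ShouldPodExit reports whether the pod should exit under policy p,
// given the current state of its apps.
func ShouldPodExit(p ExitPolicy, apps []AppState) bool {
	allExited, anyExited, anyFailed := true, false, false
	for _, a := range apps {
		if !a.Exited {
			allExited = false
			continue
		}
		anyExited = true
		if a.ExitCode != 0 {
			anyFailed = true
		}
	}
	switch p {
	case OnAny:
		return anyExited
	case OnAnyFailure:
		// Assumption: with no failures, the pod ends once every app has exited.
		return anyFailed || allExited
	default:
		// UntilAll, and the fallback for unknown values in this sketch.
		return allExited
	}
}

func main() {
	// One app has exited successfully, the other is still running.
	apps := []AppState{{Exited: true, ExitCode: 0}, {Exited: false}}
	for _, p := range []ExitPolicy{UntilAll, OnAny, OnAnyFailure} {
		fmt.Printf("%s: pod should exit = %v\n", p, ShouldPodExit(p, apps))
	}
}
```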

yifan-gu · 2015-09-24T01:41:04Z

Let's iterate on this PR specifically for the pod's exit policy :)

/cc @iaguis @alban for the rkt-related issue. Also cc @thockin @jonboulle @vbatts @xiang90, who have discussed this on the original issue.

jonboulle · 2015-09-24T04:47:09Z

spec/pods.md

@@ -179,3 +180,4 @@ JSON Schema for the Pod Manifest, conforming to [RFC4627](https://tools.ietf.org
 * **ports** (list of objects, optional) list of ports that SHOULD be exposed on the host.
    * **name** (string, required, restricted to the [AC Name](#ac-name-type) formatting) name of the port to be exposed on the host. This field is a key referencing by name ports specified in the Image Manifest(s) of the app(s) within this Pod Manifest; consequently, port names MUST be unique among apps within a pod.
    * **hostPort** (integer, required) port number on the host that will be mapped to the application port.
+* **exitPolicy** (string, optional) a string that specify the exit policy of the pod, valid values are "untilAll" (the pod exits only when all the apps exit, no matter they are successful or not) , "onAny" (the pod exits when any of the apps exits either successfully or unsuccessfully), and "onAnyFailure" (the pod exits when any of the pod exits unsuccessfully).


Could you break out the possible values into a sub-list, e.g. https://github.com/appc/spec/pull/500/files#diff-7bbac9ed3bbf6dbc1688669ce478f5deL174

We need to discuss a) default behaviour if this is omitted, b) whether it's required/optional by implementations. Perhaps we need a short lifecycle section in ace.md

What is current behaviour of Rocket? As far as I understand, it's defined by systemd it invokes as stage1, but I tend to get lost in systemd docs.

FWIW, the current work-in-progress multi-app branch of Jetpack exits when all processes exit ("untilAll") or when Jetpack itself is explicitly killed (SIGTERM/SIGINT/SIGQUIT).

With "onAnyFailure", what happens when one of the apps exits successfully? What happens when all of them exit successfully? Is restarting the app a possibility?

With all possible combinations, I'd rather see it on app level. Say, "onExit" and "onFailure" fields that could be "nothing" (default), "restart", or "stopPod". This allows me to easily say that if my flaky webapp dies, just bring it up, but if Postgres server exits with a failure, it's probably serious and we're better off shutting down everything and waiting for somebody to inspect.
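
To make the per-app idea above concrete, here is a small Go sketch of what such fields could look like. Everything in it is hypothetical: the `Action` and `AppPolicy` names are invented for illustration, and it assumes that "onExit" applies to a successful exit (code 0) while "onFailure" applies to a non-zero exit, which the comment does not spell out.

```go
package main

import "fmt"

// Action is what the runtime does when an app exits.
// The three values come from the suggestion above.
type Action string

const (
	Nothing Action = "nothing"
	Restart Action = "restart"
	StopPod Action = "stopPod"
)

// AppPolicy holds the two hypothetical per-app fields.
type AppPolicy struct {
	OnExit    Action // assumed to apply when the app exits with code 0
	OnFailure Action // assumed to apply when the app exits with a non-zero code
}

// ActionFor picks the action for a given exit code.
// An unset field falls back to "nothing", the suggested default.
func (p AppPolicy) ActionFor(exitCode int) Action {
	a := p.OnExit
	if exitCode != 0 {
		a = p.OnFailure
	}
	if a == "" {
		return Nothing
	}
	return a
}

func main() {
	// The two cases from the comment: a flaky webapp that is simply
	// brought back up, and a database whose failure stops the whole pod.
	webapp := AppPolicy{OnExit: Restart, OnFailure: Restart}
	postgres := AppPolicy{OnFailure: StopPod}

	fmt.Println("webapp exits with 1:", webapp.ActionFor(1))
	fmt.Println("postgres exits with 0:", postgres.ActionFor(0))
	fmt.Println("postgres exits with 1:", postgres.ActionFor(1))
}
```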

jonboulle · 2015-09-24T04:48:44Z

/cc @philips @thockin @vbatts

yifan-gu · 2015-09-24T19:50:33Z

@mpasternacki

> What is current behaviour of Rocket? As far as I understand, it's defined by systemd it invokes as stage1, but I tend to get lost in systemd docs.

Currently rkt behaves roughly like onAnyFailure: if any app exits with a non-zero status, the pod exits. Otherwise it waits for all apps (if an app exits with zero, the other apps continue).

> With "onAnyFailure", what happens when one of the apps exits successfully? What happens when all of them exit successfully? Is restarting the app a possibility?

With onAnyFailure, when one of the apps exits successfully, the other apps continue to run until all of them have exited successfully. Restarting is not defined in this scope.

> With all possible combinations, I'd rather see it on app level. Say, "onExit" and "onFailure" fields that could be "nothing" (default), "restart", or "stopPod". This allows me to easily say that if my flaky webapp dies, just bring it up, but if Postgres server exits with a failure, it's probably serious and we're better off shutting down everything and waiting for somebody to inspect.

This sounds like a valid use case, any thoughts @jonboulle ?

jonboulle · 2015-09-24T21:09:11Z

> With all possible combinations, I'd rather see it on app level. Say, "onExit" and "onFailure" fields that could be "nothing" (default), "restart", or "stopPod". This allows me to easily say that if my flaky webapp dies, just bring it up, but if Postgres server exits with a failure, it's probably serious and we're better off shutting down everything and waiting for somebody to inspect.

This is not unreasonable, I'm just a little wary of the lifecycle complexity it implies. (Actually, it should be fine to implement in rkt, but thinking about the spec more abstractly..)

yifan-gu · 2015-09-24T23:30:58Z

/cc @iaguis @alban: if we go down this path, we probably need another rework of the service files... However, while an app is being restarted, it is also considered stopped. I don't know whether there is a way to differentiate an app that has really stopped from one that has stopped but will be restarted.

thockin · 2015-09-25T15:24:10Z

spec/pods.md

@@ -179,3 +180,7 @@ JSON Schema for the Pod Manifest, conforming to [RFC4627](https://tools.ietf.org
 * **ports** (list of objects, optional) list of ports that SHOULD be exposed on the host.
    * **name** (string, required, restricted to the [AC Name](#ac-name-type) formatting) name of the port to be exposed on the host. This field is a key referencing by name ports specified in the Image Manifest(s) of the app(s) within this Pod Manifest; consequently, port names MUST be unique among apps within a pod.
    * **hostPort** (integer, required) port number on the host that will be mapped to the application port.
+* **exitPolicy** (string, optional) a string that specify the exit policy of the pod, if left empty, then it's up to ACE to choose the default behavior. Valid values are:


First: Kubernetes is assuming "untilAll", and we resisted adding this for lack of really concrete use-cases. My instinct is that it SOUNDS cool, but isn't that useful IRL. As far as I know we have no such equivalent internally. If any app container exits with failure, we know the pod is doomed to fail, but we let the other containers finish.

But I'm going to assume you have a concrete set of use-cases that justify this (you should write them down in this PR description) or else you would not be adding hypothetical complexity.

I just went to refresh on the state of the spec, and I realize there is no (what kubernetes calls) restartPolicy. Is this supposed to be an analog of that? I think it's interesting to contrast the approaches.

Kubernetes defines:

RestartAlways: Always restart app containers, regardless of exit code. The pod can only terminate in Failure if the runtime decides that it is not viable (hardware failure, machine drain, etc).

RestartOnFailure: Restart containers if and only if they exited with a non-zero code. The pod's terminal state is the worst-of any container's terminal state.

RestartNever: Never intentionally restart containers. The pod's terminal state is the worst-of any container's terminal state.
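
The three policies as described in this comment can be sketched in Go as follows. This is an illustration of the definitions above only, not Kubernetes code: the identifiers `RestartPolicy`, `ShouldRestart`, and `PodFailed` are hypothetical, and "worst-of" is interpreted here as "the pod failed if any container's final exit code is non-zero".

```go
package main

import "fmt"

// RestartPolicy uses the names from the comment above.
type RestartPolicy string

const (
	RestartAlways    RestartPolicy = "RestartAlways"
	RestartOnFailure RestartPolicy = "RestartOnFailure"
	RestartNever     RestartPolicy = "RestartNever"
)

// ShouldRestart reports whether a container that exited with exitCode
// would be restarted under policy p.
func ShouldRestart(p RestartPolicy, exitCode int) bool {
	switch p {
	case RestartAlways:
		return true // regardless of exit code
	case RestartOnFailure:
		return exitCode != 0 // if and only if it failed
	default:
		return false // RestartNever, and unknown values in this sketch
	}
}

// PodFailed computes the "worst-of" terminal state: the pod counts as
// failed if any container's final exit code is non-zero.
func PodFailed(finalExitCodes []int) bool {
	for _, c := range finalExitCodes {
		if c != 0 {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []RestartPolicy{RestartAlways, RestartOnFailure, RestartNever} {
		fmt.Printf("%s: restart after exit 0 = %v, after exit 1 = %v\n",
			p, ShouldRestart(p, 0), ShouldRestart(p, 1))
	}
	fmt.Println("pod failed:", PodFailed([]int{0, 0, 3}))
}
```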

Superficially kube's RestartAlways feels the same as untilAll here. But here's the rub - the definition of untilAll doesn't actually say anything about restart. Is that part of the policy here or is that governed somewhere else that I am not seeing?

I'll not write much more now, because I have asked enough questions that I am probably attacking a straw man.

From a functional POV I think the concepts that matter to a user are "when does my container get restarted?" and "what does that mean for the fate of my pod?", but this only answers the latter, and only partially.

From an API usability POV I think it might be clearer to express these things "in the positive". E.g. I think a "RunPolicy" would be clearer (RunForever, RunToCompletion, RunOnce), and I sort of wish Kubernetes had done it that way.

@thockin Actually that's part of the plan to implement k8s' restart policy in rkt

Basically, as we are using systemd to launch rkt pods, my original plan was to use systemd's restart policy combined with this pod exit policy.

But I see your point, and we actually shouldn't add something just to ease the implementation... I am thinking of changing this to a restart policy and implementing it in the runtime itself. Thanks!

Thinking a little more about this, I found that a RestartPolicy would imply that the runtime is long-running; otherwise, if the runtime gets killed, nothing can enforce the restart policy (e.g. if you kill a pod after killing the kubelet, the pod is not restarted).

Any policy around exit/restart needs something to babysit, right? There's not a way (AFAIK) to tell the OS to kill process B when process A dies (short of SIGCHLD which is a stretch)

You can do that with systemd service dependencies, though.

My point is that if the thing (or runtime) that launches the pod is not PID1, then the restart policy will not be enforced in some cases.
But maybe that's OK for now, as we can limit the scope of the restart policy: e.g. we assume the runtime is always there, and we don't consider what happens if the runtime gets killed.

People can just let PID1 monitor the runtime; when the runtime fails, we treat it like a machine crash and restart the runtime anyway (which will consequently restart the pod).

We've made the argument that "userspace is unreliable" in many arguments with Kernel folks, but their pushback (and rightly so) is "make it more reliable". There will always be corner cases, but there has to be a turtle at the bottom, and that turtle can't always be the kernel. In this case, I think kernel includes systemd - it really does fancy itself as important as the kernel.

So define the behavior you think is correct, and engineer towards a good enough answer.

Sure, I am happy with changing this to restart policy. Waiting for other maintainers' feedback.

Please note that I am not saying you should change it to be like Kubernetes. Consider it in a fresh light. I think RestartPolicy is clearer than ExitPolicy, but I think RunPolicy might be even better.

I'd like to have something other than empty here :) Any thoughts/votes on ExitPolicy vs RestartPolicy vs RunPolicy?
@jonboulle @vbatts @philips ?

ExecPolicy ?

yifan-gu · 2015-11-18T02:23:42Z

ExecPolicy sounds good to me. What do you think, @jonboulle @thockin @philips?

yifan-gu · 2015-11-23T20:57:19Z

Ping?

yifan-gu · 2015-11-26T00:04:58Z

To refresh everyone's memory:
Originally, this proposal was intended for implementing the kubelet's restart policy. If this were implemented in the runtime, we could add `Restart=` to the service files[1] to match the Kubernetes restart policy. However, there is no exponential back-off, and if an app restarts, it causes the other apps within the pod to be restarted.

So after today's discussion with @jonboulle @philips, we plan to pull the Kubernetes restart policy into the spec and let the runtime (e.g. rkt) handle how each app restarts. This also implies that restarting app A should not affect the running state of app B.

[1] The service files manage the pods started by kubelet, e.g. they all have ExecStart=/bin/rkt run ${pod_id}
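
For readers unfamiliar with the exponential back-off mentioned above as missing, here is a minimal Go sketch of the idea: each successive restart waits twice as long as the previous one, up to a cap. The function name `BackoffDelay` and the base and cap values are hypothetical, chosen only for illustration; they are not taken from rkt, the kubelet, or systemd.

```go
package main

import (
	"fmt"
	"time"
)

// BackoffDelay returns how long to wait before restart attempt n
// (n starts at 0). The delay doubles on every attempt, starting from
// base and never exceeding max.
func BackoffDelay(base, max time.Duration, n int) time.Duration {
	d := base
	for i := 0; i < n; i++ {
		if d > max/2 {
			// Doubling again would exceed the cap.
			return max
		}
		d *= 2
	}
	if d > max {
		return max
	}
	return d
}

func main() {
	// Hypothetical values: start at 10 seconds, cap at 5 minutes.
	base, max := 10*time.Second, 5*time.Minute
	for n := 0; n <= 6; n++ {
		fmt.Printf("restart %d: wait %v\n", n, BackoffDelay(base, max, n))
	}
}
```

A per-app restart counter fed into a function like this would let the runtime slow down a crash-looping app without touching the other apps in the pod.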

yifan-gu · 2015-11-30T22:43:25Z

Closed for #547

jonboulle reviewed Sep 24, 2015

yifan-gu force-pushed the exit_policy branch from 785e67d to 23ce7eb Compare September 24, 2015 21:34

yifan-gu force-pushed the exit_policy branch from 23ce7eb to ae17631 Compare September 24, 2015 21:36

thockin reviewed Sep 25, 2015

yifan-gu mentioned this pull request Sep 30, 2015

spec: define pod lifecycle #276

Open

5 tasks

jonboulle added this to the v0.7.2 milestone Oct 6, 2015

jonboulle added needs review area/spec labels Oct 6, 2015

alban mentioned this pull request Nov 24, 2015

(WIP) stage1: app exit code propagated to rkt exit code rkt/rkt#1783

Closed

yifan-gu referenced this pull request in cgonyeo/kubernetes Nov 25, 2015

rkt: rewrote GetPods to use rkt's api service

833b035

jonboulle modified the milestones: v0.8.0, v0.7.2 Nov 25, 2015

jonboulle closed this Dec 1, 2015
