I believe that by having get_container_status return information other than pod_status, we can display more appropriate errors to the users.
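As a rough illustration of the kind of message this could enable, the sketch below turns the waiting reasons found in a pod's status into readable text. The helper name `describe_waiting_containers` is hypothetical, and the status dictionary is assumed to follow the Kubernetes pod status layout; both key spellings are accepted because the spelling depends on how the pod object was serialized.

```python
def describe_waiting_containers(status: dict) -> list:
    """Build one readable line per container that is stuck in a waiting state.

    Hypothetical helper: `status` is assumed to be the "status" part of a pod
    that has already been converted to a plain dict.
    """
    messages = []
    # The key spelling depends on the serializer, so try both (assumption).
    statuses = status.get("container_statuses") or status.get("containerStatuses") or []
    for cs in statuses:
        waiting = (cs.get("state") or {}).get("waiting")
        if not waiting:
            continue  # container is running or terminated, nothing to report
        name = cs.get("name") or "<unknown>"
        reason = waiting.get("reason") or "unknown reason"
        text = f"container '{name}' is waiting: {reason}"
        detail = waiting.get("message")
        if detail:
            text += f" ({detail})"
        messages.append(text)
    return messages


# Sample data made up for illustration only.
example_status = {
    "phase": "Pending",
    "container_statuses": [
        {
            "name": "kernel",
            "state": {
                "waiting": {
                    "reason": "ImagePullBackOff",
                    "message": "Back-off pulling image",
                }
            },
        }
    ],
}
example_messages = describe_waiting_containers(example_status)
```

A message like this could be logged or surfaced to the user instead of a bare phase name.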
Proposed Solution
This is quite simplified, but here's the idea. I return the pod data as a stringified JSON so that the overridden method keeps its existing signature, but there might be a better way.
k8s.py
# datetime values cannot be JSON-encoded by default, so define a custom encoder and use it
class DatetimeJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime.datetime):
            return obj.isoformat()
        # let the base class raise TypeError for any other unsupported type
        return super().default(obj)

@overrides
def get_container_status(self, iteration: Optional[str]) -> str:
    # Locates the kernel pod using the kernel_id selector.  Note that we also include 'component=kernel'
    # in the selector so that executor pods (when Spark is in use) are not considered.
    # If the phase indicates Running, the pod's IP is used for the assigned_ip.
    pod_status = ""
    kernel_label_selector = f"kernel_id={self.kernel_id},component=kernel"
    ret = client.CoreV1Api().list_namespaced_pod(
        namespace=self.kernel_namespace, label_selector=kernel_label_selector
    )
    if ret and ret.items:
        # if ret.items is not empty, return the pod data as a stringified JSON
        pod_dict = ret.items[0].to_dict()
        dump_json = json.dumps(pod_dict, cls=DatetimeJSONEncoder)
        return dump_json
    else:
        self.log.warning(f"kernel server pod not found in namespace '{self.kernel_namespace}'")
        return ""
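To show why the custom encoder is needed, here is a minimal, self-contained sketch that serializes a dictionary containing a datetime value and reads it back. The `sample_pod` dictionary is a made-up stand-in for real pod data, and this version of the encoder defers to the base class for unsupported types.

```python
import datetime
import json


class DatetimeJSONEncoder(json.JSONEncoder):
    """Serialize datetime values as ISO 8601 strings."""

    def default(self, obj):
        if isinstance(obj, datetime.datetime):
            return obj.isoformat()
        # Defer to the base class, which raises TypeError for unsupported types.
        return super().default(obj)


# Hypothetical stand-in for the dictionary produced from a pod object.
sample_pod = {
    "status": {
        "phase": "Pending",
        "start_time": datetime.datetime(2024, 1, 2, 3, 4, 5),
    }
}

# Without cls=DatetimeJSONEncoder, json.dumps would raise TypeError here.
encoded = json.dumps(sample_pod, cls=DatetimeJSONEncoder)
decoded = json.loads(encoded)
```

After the round trip, the timestamp is a plain string, so the caller has to parse it again if it needs a datetime object.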
Additional context
This might be specific to my environment, but after I made the startup check wait while the k8s pod is in the ContainerCreating state, or while no error has occurred and ContainersReady is false, kernels started to launch properly even without a kernel image puller.
This is a quite simplified code example:
@overrides
async def confirm_remote_startup(self):
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    pod_info = self.get_container_status(str(i))
    # if pod_info is an empty string or None, it means the container is not found
    if pod_info:
        pod_info_json = json.loads(pod_info)
        status = pod_info_json["status"]
        pod_phase = status["phase"].lower()
        if pod_phase == "running":
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        else:
            if "conditions" in status:
                for condition in status["conditions"]:
                    if "containerStatuses" in status:
                        # check whether the container is in the ContainerCreating state
                        if (
                            status["containerStatuses"][0]["state"]["waiting"]["reason"]
                            == "ContainerCreating"
                        ):
                            self.log.info("Container is creating ...")
                            continue
                    if (
                        condition["type"] == "ContainersReady" and condition["status"] != "True"
                    ):
                        self.log.warning("Containers are not ready waiting 1 second.")
                        await asyncio.sleep(1)
                        continue
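The waiting logic above can also be pulled out into a small function that is easy to test without a cluster. The following is only a sketch: the function name `classify_pod_status` and its return labels are hypothetical, and it uses `.get()` lookups because fields such as `waiting` may be missing or null when the container is not in that state.

```python
def classify_pod_status(status: dict) -> str:
    """Return 'running', 'creating', 'not_ready', or 'unknown' (hypothetical labels)."""
    phase = (status.get("phase") or "").lower()
    if phase == "running":
        return "running"

    # Accept both key spellings, since the spelling depends on how the pod
    # object was serialized (assumption).
    container_statuses = (
        status.get("containerStatuses") or status.get("container_statuses") or []
    )
    for cs in container_statuses:
        waiting = (cs.get("state") or {}).get("waiting") or {}
        if waiting.get("reason") == "ContainerCreating":
            return "creating"

    for condition in status.get("conditions") or []:
        if condition.get("type") == "ContainersReady" and condition.get("status") != "True":
            return "not_ready"
    return "unknown"


# Made-up sample: a pending pod whose only container is still being created.
creating_status = {
    "phase": "Pending",
    "containerStatuses": [{"state": {"waiting": {"reason": "ContainerCreating"}}}],
}
creating_result = classify_pod_status(creating_status)
```

A caller such as confirm_remote_startup could then sleep and retry on 'creating' or 'not_ready', and fall through to its existing error handling otherwise.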
This is just my opinion based on my experience. Thank you for the wonderful product :)
Problem
The current code that monitors container startup only checks the pod's status, so the resulting error messages are not user-friendly.