Resolve K8s Comments (Part 2) #197

Draft
wants to merge 2 commits into
base: main


TaekyungHeo
Member

TaekyungHeo commented Sep 17, 2024

Summary

This PR relies on #195.

Resolves the K8s comments in #180.

Proposal to Refactor KubernetesSystem by Introducing Manager Classes

1. Background

  • The current KubernetesSystem class interacts directly with three Kubernetes APIs (CoreV1Api, BatchV1Api, and CustomObjectsApi).
  • All these API interactions are handled within the KubernetesSystem class itself, which can lead to a bloated design as more functionalities are added.

2. Problem

  • Monolithic Design: All Kubernetes API interactions are handled in a single class, which can become difficult to manage and maintain over time.
  • Reduced Modularity: Since all API interactions are in one place, it's harder to isolate changes to specific Kubernetes API interactions.
  • Complexity in Testing: Testing individual API interactions requires mocking a single class that handles everything, reducing flexibility in unit testing.

3. Proposed Solution

  • Introduce Manager Classes: Refactor the KubernetesSystem class by delegating specific API interactions to separate manager classes, where each manager class is responsible for interacting with one API.
    • CoreV1APIManager: Handles interactions with the CoreV1Api, such as managing pods, namespaces, and nodes.
    • BatchAPIManager: Manages batch jobs via the BatchV1Api.
    • CustomObjectsAPIManager: Manages custom objects such as MPIJob via the CustomObjectsApi.

4. Benefits

  • Modularity: By delegating responsibilities to specific manager classes, the system becomes more modular and easier to maintain. Each manager class will handle a specific subset of Kubernetes API interactions.
  • Improved Readability: API interactions are organized in separate classes, so KubernetesSystem reads as orchestration logic rather than a sequence of raw client calls.
  • Better Testing: Each manager class can be tested independently, which allows more granular unit tests.
  • Extensibility: New Kubernetes APIs can be easily integrated by adding new manager classes without disrupting existing functionality.
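
As a rough illustration of the testing benefit, a manager can be exercised against a mock instead of a live cluster. The sketch below assumes the API object is passed into the constructor, which is not how the managers in this proposal are written (they create the client themselves); with the proposal's constructor, the same test would replace `manager.api` with the mock after construction. The class here is a stand-in, not the proposed implementation.

```python
from typing import Any, List
from unittest.mock import MagicMock


class CoreV1APIManager:
    """Stand-in for the proposed manager; injecting `api` is an assumption."""

    def __init__(self, api: Any) -> None:
        self.api = api

    def list_pods(self, namespace: str) -> List[Any]:
        return self.api.list_namespaced_pod(namespace=namespace).items


def test_list_pods_delegates_to_api() -> None:
    # The mock plays the role of client.CoreV1Api; no cluster is contacted.
    fake_api = MagicMock()
    fake_api.list_namespaced_pod.return_value.items = ["pod-a", "pod-b"]

    manager = CoreV1APIManager(api=fake_api)

    assert manager.list_pods("default") == ["pod-a", "pod-b"]
    fake_api.list_namespaced_pod.assert_called_once_with(namespace="default")
```

Only the one API wrapper is mocked, so the test stays independent of the batch and custom-object code paths.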

5. API Manager Classes

5.1 CoreV1APIManager
  • Purpose: Handles operations related to the Kubernetes CoreV1 API (e.g., pods, namespaces, nodes).
  • Key Methods:
    • list_pods: Lists pods in a namespace.
    • create_namespace: Creates a new namespace.
    • delete_namespace: Deletes an existing namespace.
    • patch_node: Updates a node by applying a patch.
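from typing import Dict, List

from kubernetes import client


class CoreV1APIManager:
    """Handles Kubernetes Core V1 API interactions (pods, namespaces, nodes)."""

    def __init__(self) -> None:
        # Assumes the kube config has been loaded before the manager is created.
        self.api = client.CoreV1Api()

    def list_pods(self, namespace: str) -> List[client.V1Pod]:
        """Return all pods in the given namespace."""
        return self.api.list_namespaced_pod(namespace=namespace).items

    def patch_node(self, node: str, body: Dict) -> None:
        """Apply a patch body to the named node."""
        self.api.patch_node(name=node, body=body)

    def create_namespace(self, namespace: str) -> None:
        """Create a namespace with the given name."""
        body = client.V1Namespace(metadata=client.V1ObjectMeta(name=namespace))
        self.api.create_namespace(body=body)

    def delete_namespace(self, namespace: str) -> None:
        """Delete the namespace with the given name."""
        self.api.delete_namespace(name=namespace, body=client.V1DeleteOptions())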
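
The `body` argument of `patch_node` is left as an untyped dict above. As one possible use, the sketch below builds a body that sets labels on a node. The helper name and the node name in the usage comment are hypothetical, and whether the system needs node labeling at all is an assumption.

```python
from typing import Any, Dict


def node_label_patch(labels: Dict[str, str]) -> Dict[str, Any]:
    """Build a patch body that sets the given labels on a node (hypothetical helper)."""
    # Only the fields present in the body are changed by the patch;
    # the labels are copied so later edits to the input do not leak in.
    return {"metadata": {"labels": dict(labels)}}


# Possible usage with the manager (node name is a placeholder):
#   CoreV1APIManager().patch_node(node="worker-0", body=node_label_patch({"pool": "gpu"}))
patch = node_label_patch({"pool": "gpu"})
```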

5.2 BatchAPIManager
  • Purpose: Manages batch job operations using the BatchV1Api.
  • Key Methods:
    • create_job: Submits a new batch job.
    • delete_job: Deletes an existing batch job.
    • read_job_status: Reads the status of a batch job.
from typing import Any, Dict

from kubernetes import client


class BatchAPIManager:
    """Handles Kubernetes Batch V1 API interactions (batch jobs)."""

    def __init__(self) -> None:
        self.api = client.BatchV1Api()

    def create_job(self, job_spec: Dict[Any, Any], namespace: str) -> client.V1Job:
        """Submit a batch job built from the given manifest."""
        return self.api.create_namespaced_job(body=job_spec, namespace=namespace)

    def delete_job(self, job_name: str, namespace: str):
        """Delete a batch job by name."""
        return self.api.delete_namespaced_job(
            name=job_name,
            namespace=namespace,
            # Foreground propagation removes dependent pods before the job object itself.
            body=client.V1DeleteOptions(propagation_policy="Foreground", grace_period_seconds=5),
        )

    def read_job_status(self, job_name: str, namespace: str) -> client.V1Job:
        """Return the job object, including its current status."""
        return self.api.read_namespaced_job_status(name=job_name, namespace=namespace)
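
`read_job_status` returns the whole job object, so a caller still has to interpret it. The following is a minimal sketch of one way to do that, assuming the caller only needs a coarse state. The helper name `job_state` and the returned strings are hypothetical; it relies on the `conditions` and `active` fields of the job status, and the example objects are plain stand-ins rather than real `V1Job` instances.

```python
from types import SimpleNamespace
from typing import Any


def job_state(job: Any) -> str:
    """Reduce a V1Job-like object to a coarse state string (hypothetical helper)."""
    status = job.status
    # A terminal condition ("Complete" or "Failed") with status "True" wins.
    for cond in status.conditions or []:
        if cond.status == "True" and cond.type in ("Complete", "Failed"):
            return cond.type.lower()
    # Otherwise the job is running if it has active pods, else still pending.
    if (status.active or 0) > 0:
        return "running"
    return "pending"


# Stand-in objects shaped like V1Job, for illustration only.
finished = SimpleNamespace(
    status=SimpleNamespace(conditions=[SimpleNamespace(type="Complete", status="True")], active=None)
)
in_flight = SimpleNamespace(status=SimpleNamespace(conditions=None, active=2))
```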

5.3 CustomObjectsAPIManager
  • Purpose: Manages custom objects like MPIJob using the CustomObjectsApi.
  • Key Methods:
    • create_mpi_job: Submits a new MPI job.
    • delete_mpi_job: Deletes an existing MPI job.
    • get_mpi_job_status: Gets the status of a specific MPI job.
from typing import Any, Dict

from kubernetes import client


class CustomObjectsAPIManager:
    """Handles Kubernetes Custom Objects API interactions (MPIJobs)."""

    def __init__(self) -> None:
        self.api = client.CustomObjectsApi()

    # The plural below is the resource name of the Kubeflow MPIJob CRD ("mpijobs");
    # Kubernetes resource names cannot contain underscores.
    def create_mpi_job(self, job_spec: Dict[Any, Any], namespace: str) -> Dict:
        return self.api.create_namespaced_custom_object(
            group="kubeflow.org", version="v2beta1", namespace=namespace, plural="mpijobs", body=job_spec
        )

    def delete_mpi_job(self, job_name: str, namespace: str):
        return self.api.delete_namespaced_custom_object(
            group="kubeflow.org", version="v2beta1", namespace=namespace, plural="mpijobs", name=job_name,
            body=client.V1DeleteOptions(propagation_policy="Foreground", grace_period_seconds=5),
        )

    def get_mpi_job_status(self, job_name: str, namespace: str) -> Dict:
        # Returns the full custom object; the status is under its "status" key.
        return self.api.get_namespaced_custom_object(
            group="kubeflow.org", version="v2beta1", namespace=namespace, plural="mpijobs", name=job_name
        )
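
Because custom objects come back as plain dictionaries, `get_mpi_job_status` leaves status interpretation to the caller. The sketch below shows one way this might look. The helper name and return strings are hypothetical, and the condition types `Succeeded` and `Failed` are an assumption based on Kubeflow job conventions; they should be checked against the operator version in use.

```python
from typing import Any, Dict


def mpi_job_state(job: Dict[str, Any]) -> str:
    """Reduce an MPIJob dict to a coarse state string (hypothetical helper)."""
    # "status" may be missing entirely right after the job is created.
    conditions = (job.get("status") or {}).get("conditions") or []
    for cond in conditions:
        # Assumed terminal condition types; see the lead-in above.
        if cond.get("status") == "True" and cond.get("type") in ("Succeeded", "Failed"):
            return cond["type"].lower()
    return "in_progress"


# Hand-written example object, not real operator output.
example = {"status": {"conditions": [{"type": "Succeeded", "status": "True"}]}}
```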

6. Refactoring KubernetesSystem

The KubernetesSystem class will be refactored to delegate API interactions to the newly introduced manager classes. This reduces the complexity in KubernetesSystem, making it more focused on high-level orchestration rather than API details.

from pathlib import Path
from typing import Any, Dict, List, Optional

from kubernetes import config
from pydantic import BaseModel, ConfigDict

# System is the project's base system class; its import is omitted here.


class KubernetesSystem(BaseModel, System):
    """
    Represents a Kubernetes system and delegates API-specific tasks to API managers.
    """

    model_config = ConfigDict(extra="forbid", arbitrary_types_allowed=True)

    name: str
    kube_config_path: Path  # read in __init__, so it must be a declared field
    default_namespace: str
    scheduler: str = "kubernetes"
    # The managers are created in __init__ after the kube config is loaded, so
    # they default to None instead of being required constructor arguments.
    core_v1_manager: Optional[CoreV1APIManager] = None
    batch_manager: Optional[BatchAPIManager] = None
    custom_objects_manager: Optional[CustomObjectsAPIManager] = None

    def __init__(self, **data):
        super().__init__(**data)

        # Fall back to ~/.kube/config when the configured path is not a file.
        if self.kube_config_path.is_file():
            kube_config_path = self.kube_config_path.resolve()
        else:
            kube_config_path = Path.home() / ".kube" / "config"
        if not kube_config_path.exists():
            raise FileNotFoundError(f"Kube config file '{kube_config_path}' not found.")

        config.load_kube_config(config_file=str(kube_config_path))

        # Initialize the API managers
        self.core_v1_manager = CoreV1APIManager()
        self.batch_manager = BatchAPIManager()
        self.custom_objects_manager = CustomObjectsAPIManager()

    def create_job(self, job_spec: Dict[Any, Any]) -> str:
        """Create the job and return its name, matching the str return type."""
        job_kind = job_spec.get("kind", "").lower()
        # An MPIJob manifest has kind "MPIJob"; check it first because "job" is a substring.
        if "mpijob" in job_kind:
            created = self.custom_objects_manager.create_mpi_job(job_spec, self.default_namespace)
            return created["metadata"]["name"]
        elif "job" in job_kind:
            created = self.batch_manager.create_job(job_spec, self.default_namespace)
            return created.metadata.name
        else:
            raise ValueError(f"Unsupported job kind: {job_kind}")

    def delete_job(self, job_name: str, job_kind: str) -> None:
        if "mpijob" in job_kind.lower():
            self.custom_objects_manager.delete_mpi_job(job_name, self.default_namespace)
        elif "job" in job_kind.lower():
            self.batch_manager.delete_job(job_name, self.default_namespace)
        else:
            raise ValueError(f"Unsupported job kind: {job_kind}")

    def list_pods(self) -> List[Any]:
        return self.core_v1_manager.list_pods(self.default_namespace)
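
The routing in `create_job` and `delete_job` depends on matching the manifest's `kind`. A Kubernetes manifest for an MPIJob carries `kind: MPIJob`, which lowercases to `mpijob`, and the plain batch kind `job` is a substring of it, so the more specific check has to come first. The sketch below isolates that routing decision; the function name and the returned labels are hypothetical and only stand for "which manager handles this kind".

```python
def manager_for_kind(kind: str) -> str:
    """Return which manager should handle a manifest kind (hypothetical helper)."""
    normalized = kind.lower()
    # Check the more specific kind first: "job" is a substring of "mpijob".
    if "mpijob" in normalized:
        return "custom_objects"
    if "job" in normalized:
        return "batch"
    raise ValueError(f"Unsupported job kind: {kind}")


# With the checks reversed, "MPIJob" would be routed to the batch manager.
routes = {kind: manager_for_kind(kind) for kind in ("MPIJob", "Job")}
```

Substring matching also sends kinds such as `CronJob` to the batch branch; an exact comparison against the supported kinds would avoid that, if the broader match is not intended.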

Test Plan


TaekyungHeo and others added 2 commits September 16, 2024 19:34
Co-authored-by: Andrei Maslennikov <andreyma@nvidia.com>
…config

Co-authored-by: Srinivas Sridharan <srinivas212@users.noreply.github.com>
TaekyungHeo added the enhancement and Oct24 labels on Sep 17, 2024
TaekyungHeo removed the Oct24 label on Oct 4, 2024