Created 3 new scenarios for troubleshooting
sofusalbertsen committed Jan 18, 2024
1 parent dba6641 commit 9c47348
Showing 16 changed files with 416 additions and 0 deletions.
143 changes: 143 additions & 0 deletions scenarios/README.md
@@ -0,0 +1,143 @@



### 3. Persistent Volume Issues

#### Overview
Applications might face issues when trying to mount persistent volumes, which can be due to misconfigurations or underlying storage issues.

#### Tasks
- **Creating the Problem**:
1. Create a persistent volume with incorrect access modes (see the sketch below).
2. Create a persistent volume claim that references the volume.
3. Create a pod that mounts the persistent volume claim.
4. Apply all configurations and observe the pod failing to start.

- **Fixing the Problem**:
1. Update the persistent volume to have the correct access modes.
2. Apply the updated configuration: `kubectl apply -f persistent-volume.yaml`.
3. Delete and recreate the pod to attempt mounting the volume again.
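
For illustration, here is a minimal sketch of how such a mismatch might look. The names, the `hostPath` backing, and the `manual` storage class are invented for this example, not taken from this repository:

```yaml
# Hypothetical persistent-volume.yaml: the PV's access mode is the deliberate mistake.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: demo-pv
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
    - ReadOnlyMany        # the claim below asks for ReadWriteOnce, so it never binds
  hostPath:
    path: /data/demo-pv
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce       # no PV offers this mode; the claim stays Pending
  resources:
    requests:
      storage: 1Gi
```

Changing the PV's `accessModes` to `ReadWriteOnce` and re-applying lets the claim bind, after which the pod can mount it.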

---

### 4. Resource Limitations

#### Overview
Setting appropriate resource requests and limits is crucial: set them too low and the workload is throttled or OOMKilled; set them too high and cluster capacity is wasted.

#### Tasks
- **Creating the Problem**:
1. Create a pod with very low resource limits (see the sketch below).
2. Apply the configuration: `kubectl apply -f pod-definition.yaml`.
3. Observe that the pod is throttled, `OOMKilled`, or evicted.

- **Fixing the Problem**:
1. Update the pod definition to have more reasonable resource limits.
2. Apply the updated configuration: `kubectl apply -f pod-definition.yaml`.
3. Observe that the pod is now running effectively.
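
As a sketch of step 1 (the pod name and numbers are illustrative; the image mirrors the `scenarios/limits` example), a memory limit this small gets the container `OOMKilled`:

```yaml
# Hypothetical pod-definition.yaml: the memory limit is far below what MySQL needs.
apiVersion: v1
kind: Pod
metadata:
  name: starved-pod
spec:
  containers:
    - name: mysql
      image: mysql:5.7
      env:
        - name: MYSQL_ROOT_PASSWORD
          value: "my-secret-pw"
      resources:
        requests:
          memory: "32Mi"
          cpu: "100m"
        limits:
          memory: "64Mi"    # MySQL cannot start within this; the kernel kills the container
          cpu: "250m"
```

Raising `limits.memory` to something on the order of `512Mi` and re-applying is the fix described above.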

---

### 5. Security and RBAC Issues

#### Overview
Role-Based Access Control (RBAC) in Kubernetes defines which actions users and applications may perform. Misconfigured Roles or RoleBindings surface as `Forbidden` errors.

#### Tasks
- **Creating the Problem**:
1. Create a Role and RoleBinding (or ClusterRole and ClusterRoleBinding) with very limited permissions (see the sketch below).
2. Try to perform an action that requires more permissions and observe the failure.

- **Fixing the Problem**:
1. Update the Role (or ClusterRole) to include the necessary permissions.
2. Apply the updated configuration: `kubectl apply -f role-definition.yaml`.
3. Retest the action and observe that it now succeeds.
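
A sketch of a deliberately narrow Role and RoleBinding (all names here are invented for illustration): the bound service account can list pods but nothing more, so a `delete` attempt fails with `Forbidden`.

```yaml
# Hypothetical role-definition.yaml: read-only on pods, deliberately too narrow.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]   # the fixing step adds "delete" here
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: default
subjects:
  - kind: ServiceAccount
    name: demo-sa
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
```

`kubectl auth can-i delete pods --as=system:serviceaccount:default:demo-sa` is a quick way to check what the binding actually grants without running a workload.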

---

### 6. ConfigMap and Secret Issues

#### Overview
Applications might fail if they cannot access the necessary configuration data or secrets.

#### Tasks
- **Creating the Problem**:
1. Create a ConfigMap or Secret with incorrect data (see the sketch below).
2. Mount the ConfigMap or Secret in a pod.
3. Apply all configurations and observe the application failing.

- **Fixing the Problem**:
1. Update the ConfigMap or Secret with the correct data.
2. Apply the updated configuration: `kubectl apply -f configmap-or-secret.yaml`.
3. Delete and recreate the pod to mount the updated ConfigMap or Secret.
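
As an illustration (the ConfigMap name and keys are invented for this sketch): a container whose environment references a key the ConfigMap does not actually contain gets stuck in `CreateContainerConfigError`.

```yaml
# Hypothetical configmap-or-secret.yaml: the key name carries a deliberate typo.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  databse_url: "postgres://db:5432/app"   # typo; the pod below reads "database_url"
---
apiVersion: v1
kind: Pod
metadata:
  name: config-consumer
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "echo $DATABASE_URL && sleep 3600"]
      env:
        - name: DATABASE_URL
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: database_url   # not present in the ConfigMap above
```

Fixing the key in the ConfigMap, re-applying it, and recreating the pod clears the error.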

---

### 7. Upgrade Issues

#### Overview
Upgrading Kubernetes or applications can sometimes lead to issues if not done carefully.

#### Tasks
- **Creating the Problem**:
1. Upgrade your Kubernetes cluster or application without checking compatibility.
2. Observe any issues that arise post-upgrade.

- **Fixing the Problem**:
1. Roll back to the previous version if necessary.
2. Check compatibility and perform any required pre-upgrade steps.
3. Retry the upgrade.

---

### 8. High Availability and Failover

#### Overview
Ensuring high availability and seamless failover is crucial for production applications.

#### Tasks
- **Creating the Problem**:
1. Set up an application with a single replica.
2. Simulate a node failure or delete the pod and observe downtime.

- **Fixing the Problem**:
1. Update the deployment to use multiple replicas spread across different nodes (see the sketch below).
2. Apply the updated configuration: `kubectl apply -f deployment.yaml`.
3. Retest node failure or pod deletion and observe reduced downtime.
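
A sketch of the fixed deployment (image and labels illustrative): three replicas plus a topology spread constraint so the scheduler avoids placing them all on one node.

```yaml
# Hypothetical deployment.yaml: multiple replicas spread across nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway   # prefer spreading, but still schedule on small clusters
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: nginx
          ports:
            - containerPort: 80
```

With this in place, deleting one pod or losing one node leaves the remaining replicas serving traffic.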

---

### 9. Monitoring and Logging

#### Overview
Proper monitoring and logging are essential for troubleshooting and maintaining the health of your applications.

#### Tasks
- **Creating the Problem**:
1. Set up an application without any monitoring or logging solutions in place.
2. Try to troubleshoot an issue without sufficient logs or metrics.

- **Fixing the Problem**:
1. Set up and configure a monitoring and logging solution, such as Prometheus and Grafana for metrics and the ELK stack for logs.
2. Ensure logs and metrics are being collected.
3. Retest troubleshooting with the available logs and metrics.

---

### 10. Resource Leaks and Orphaned Resources

#### Overview
Over time, Kubernetes clusters might accumulate unused or orphaned resources, leading to resource wastage.

#### Tasks
- **Creating the Problem**:
1. Create and delete numerous resources without cleaning up associated resources.
2. Observe the accumulation of orphaned resources.

- **Fixing the Problem**:
1. Identify and manually delete orphaned resources.
2. Use tools like `kubectl-prune` to automate the cleanup of unused resources.

---

Each section provides a basic guide on how to create a specific problem in a Kubernetes environment and steps to resolve it. Adjustments may be needed based on the specific Kubernetes setup and application configurations.
38 changes: 38 additions & 0 deletions scenarios/connectivity/README.md
@@ -0,0 +1,38 @@
### Service Connectivity Issues


#### Overview
Services in Kubernetes provide a way for pods to communicate with each other. Misconfigurations or network policies can lead to connectivity issues.

In this scenario, we have an `nginx` deployment with an associated service. The goal is to get connectivity from the `probe` pod to the `nginx` pod through the service.

#### Tasks
- **Creating the Problem**:
1. Run `bash setup.sh`
1. Exec into the `probe` pod and try to reach the nginx deployment through the service with a curl call: `curl nginx-service:80`. Observe that the connection fails.

- **Fixing the Problem**:

<details>
<summary> Hint </summary>
The connection is clearly not working.
Describe the service with `kubectl describe service nginx-service` and see if you can spot the problem.
</details>

<details>
<summary> Hint </summary>

A service lists its endpoints under the `Endpoints` section. Are any shown there?

</details>

<details>
<summary> Hint </summary>

The service is not associated with any pods, because its selector does not match the pod labels.
Make sure that the labels on the deployment's pods and the service's selector are the same.

</details>
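
<details>
<summary> Solution sketch </summary>

The deployment labels its pods `app: myapp`, so the service selector has to say the same. A sketch of the corrected `service.yaml`:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: myapp        # was `app: app`, which matched no pods
  ports:
    - port: 80
      targetPort: 80
```

Re-apply with `kubectl apply -f service.yaml` and the `Endpoints` section fills in.

</details>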

### Clean up

Remove the resources by executing `kubectl delete -f .` in the folder
22 changes: 22 additions & 0 deletions scenarios/connectivity/deployment.yaml
@@ -0,0 +1,22 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: nginx
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"
          ports:
            - containerPort: 80
16 changes: 16 additions & 0 deletions scenarios/connectivity/pod.yaml
@@ -0,0 +1,16 @@
apiVersion: v1
kind: Pod
metadata:
  name: probe
  labels:
    name: probe
spec:
  containers:
    - name: probe
      image: ghcr.io/eficode-academy/network-multitool
      resources:
        limits:
          memory: "128Mi"
          cpu: "500m"
      ports:
        - containerPort: 80
10 changes: 10 additions & 0 deletions scenarios/connectivity/service.yaml
@@ -0,0 +1,10 @@
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: app
  ports:
    - port: 80
      targetPort: 80
5 changes: 5 additions & 0 deletions scenarios/connectivity/setup.sh
@@ -0,0 +1,5 @@
#!/bin/bash
# apply the configuration to your Kubernetes cluster
kubectl apply -f .

echo "setup completed"
24 changes: 24 additions & 0 deletions scenarios/limits/README.md
@@ -0,0 +1,24 @@
### Limits

#### Overview
Applications should have boundaries on the resources they can consume, but limits set too low prevent them from running at all.

#### Tasks
- **Creating the Problem**:
1. Run `bash setup.sh`
1. Observe that the pod is failing to start.

- **Fixing the Problem**:

<details>
<summary> Hint </summary>

`kubectl get pods` will show you the status of the pod. Check the `STATUS` column: is the pod `Pending`, crash-looping, or `OOMKilled`?
</details>

<details>
<summary> Hint </summary>

`kubectl describe pod mysql-pod` shows the pod's events and the container's last state. Given the scenario's theme, look hard at the resource limits: MySQL needs considerably more than `128Mi` of memory to run.

</details>

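<details>
<summary> Solution sketch </summary>

Give MySQL enough memory to start. The relevant excerpt of `pod.yaml`, with `512Mi` as a reasonable guess for a limit it can live with:

```yaml
      resources:
        limits:
          memory: "512Mi"   # was "128Mi", too little for MySQL
          cpu: "500m"
```

Then run `kubectl delete -f .` and `bash setup.sh` again to recreate the pod.

</details>
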
### Clean up

Remove the pod by executing `kubectl delete -f .` in the folder
24 changes: 24 additions & 0 deletions scenarios/limits/pod.yaml
@@ -0,0 +1,24 @@
apiVersion: v1
kind: Pod
metadata:
  name: mysql-pod
spec:
  containers:
    - name: mysql
      image: mysql:5.7
      env:
        - name: MYSQL_ROOT_PASSWORD
          value: "my-secret-pw"
      ports:
        - containerPort: 3306
      volumeMounts:
        - mountPath: /var/lib/mysql
          name: my-volume
      resources:
        limits:
          memory: "128Mi"
          cpu: "500m"
  volumes:
    - name: my-volume
      persistentVolumeClaim:
        claimName: my-pvc
5 changes: 5 additions & 0 deletions scenarios/limits/setup.sh
@@ -0,0 +1,5 @@
#!/bin/bash
# apply the configuration to your Kubernetes cluster
kubectl apply -f .

echo "setup completed"
37 changes: 37 additions & 0 deletions scenarios/persistency/README.md
@@ -0,0 +1,37 @@
### Persistency Issues

#### Overview
Applications might face issues when trying to mount persistent volumes, which can be due to misconfigurations or underlying storage issues.

Note that this scenario only works on AWS clusters, as persistent storage provisioning is very cloud-provider specific.

#### Tasks
- **Creating the Problem**:
1. Run `bash setup.sh`
1. Observe that the pod is failing to start.

- **Fixing the Problem**:

<details>
<summary> Hint </summary>

`kubectl get pods` will show you the status of the pod: it is stuck in `Pending`.
</details>

<details>
<summary> Hint </summary>

Describe the PVC with `kubectl describe pvc my-pvc`. Is it `Bound`? Which storage class does it request, and does that class exist on the cluster (`kubectl get storageclass`)?

</details>

<details>
<summary> Hint </summary>

The PVC requests the `standard` storage class. If no class by that name exists on your cluster, the claim can never be provisioned and the pod stays `Pending`.
Point `storageClassName` in `pvc.yaml` at a class that `kubectl get storageclass` actually lists.

</details>
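
<details>
<summary> Solution sketch </summary>

Point the claim at a storage class the cluster actually has. `gp2` below is an assumption (the usual default on AWS EKS); check `kubectl get storageclass` for yours:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: gp2   # assumption: use a class listed by `kubectl get storageclass`
```

Delete and re-apply the resources afterwards; a PVC's storage class cannot be changed in place.

</details>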

### Clean up

Remove the resources by executing `kubectl delete -f .` in the folder
24 changes: 24 additions & 0 deletions scenarios/persistency/pod.yaml
@@ -0,0 +1,24 @@
apiVersion: v1
kind: Pod
metadata:
  name: mysql-pod
spec:
  containers:
    - name: mysql
      image: mysql:5.7
      env:
        - name: MYSQL_ROOT_PASSWORD
          value: "my-secret-pw"
      ports:
        - containerPort: 3306
      volumeMounts:
        - mountPath: /var/lib/mysql
          name: my-volume
      resources:
        limits:
          memory: "512Mi"
          cpu: "500m"
  volumes:
    - name: my-volume
      persistentVolumeClaim:
        claimName: my-pvc
11 changes: 11 additions & 0 deletions scenarios/persistency/pvc.yaml
@@ -0,0 +1,11 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: standard
5 changes: 5 additions & 0 deletions scenarios/persistency/setup.sh
@@ -0,0 +1,5 @@
#!/bin/bash
# apply the configuration to your Kubernetes cluster
kubectl apply -f .

echo "setup completed"