Merge pull request #146 from Cray-HPE/CASMTRIAGE-7327
CASMTRIAGE-7327 - fix reading default values from ims-config.
dlaine-hpe authored Oct 3, 2024
2 parents a4f3d7d + eb19b38 commit bc16a2c
Showing 6 changed files with 54 additions and 14 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+### Changed
+- CASMTRIAGE-7327 - fix loading default values from ims-config.
+- CASMTRIAGE-7274 - fix cpu limits to not overdrive kata vm, add job pod anti-affinity.
+- CASMCMS-9147 - stop using alpine:latest image.
+
 ## [3.18.0] - 2024-09-24

 ### Changed
2 changes: 1 addition & 1 deletion kubernetes/cray-ims/Chart.yaml
@@ -57,5 +57,5 @@ annotations:
 - name: cray-ims-sshd
 image: artifactory.algol60.net/csm-docker/stable/cray-ims-sshd:0.0.0-imssshd
 - name: alpine
-image: alpine:latest
+image: artifactory.algol60.net/csm-docker/stable/docker.io/library/alpine:3
 artifacthub.io/license: MIT
Original file line number Diff line number Diff line change
@@ -20,7 +20,13 @@ THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
 OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 OTHER DEALINGS IN THE SOFTWARE.
+
+NOTE: Kata hypervisor setup adds ALL container cpu limits together for
+the hardware description. This changes the nproc return of available
+cpus in the container, possibly overloading the VM causing it to
+crash. Be careful adjusting any cpu limits for the containers.
 */}}
+
 apiVersion: v1
 data:
 image_configmap_create.yaml.template: |
@@ -75,6 +81,18 @@ data:
 namespace: $namespace
 spec:
 backoffLimit: 0
+affinity:
+podAntiAffinity:
+preferredDuringSchedulingIgnoredDuringExecution:
+- topologyKey: kubernetes.io/hostname
+labelSelector:
+matchExpressions:
+- key: app.kubernetes.io/name
+operator: In
+values:
+- cray-ims
+namespaces:
+- $namespace
 template:
 metadata:
 labels:
@@ -121,7 +139,7 @@ data:
 cpu: "500m"
 limits:
 memory: "$job_mem_limit"
-cpu: "8"
+cpu: "2" # NOTE: see comment at top of the file
 # Step 2: Wait for Repos
 - image: {{ .Values.cray_ims_utils.image.repository }}:{{ .Values.cray_ims_utils.image.tag }}
 imagePullPolicy: {{ .Values.cray_ims_utils.image.imagePullPolicy }}
@@ -151,7 +169,7 @@ data:
 cpu: "500m"
 limits:
 memory: "$job_mem_limit"
-cpu: "8"
+cpu: "1" # NOTE: see comment at top of the file
 # Step 3: Build a RPM containing the Cray Root CA certificate
 - image: {{ .Values.cray_ims_utils.image.repository }}:{{ .Values.cray_ims_utils.image.tag }}
 imagePullPolicy: {{ .Values.cray_ims_utils.image.imagePullPolicy }}
@@ -186,7 +204,7 @@ data:
 cpu: "500m"
 limits:
 memory: "$job_mem_limit"
-cpu: "8"
+cpu: "1" # NOTE: see comment at top of the file
 # Step 4: Build the image
 - image: {{ .Values.cray_ims_kiwi_ng_opensuse_x86_64_builder.image.repository }}:{{ .Values.cray_ims_kiwi_ng_opensuse_x86_64_builder.image.tag }}
 imagePullPolicy: {{ .Values.cray_ims_kiwi_ng_opensuse_x86_64_builder.image.imagePullPolicy }}
@@ -197,7 +215,7 @@ data:
 cpu: "500m"
 limits:
 memory: "$job_mem_limit"
-cpu: "48"
+cpu: "8" # NOTE: see comment at top of the file
 securityContext:
 privileged: true
 capabilities:
@@ -255,7 +273,7 @@ data:
 cpu: "500m"
 limits:
 memory: "$job_mem_limit"
-cpu: "48"
+cpu: "8" # NOTE: see comment at top of the file
 envFrom:
 - configMapRef:
 name: cray-ims-$id-configmap
@@ -372,7 +390,7 @@ data:
 cpu: "500m"
 limits:
 memory: "$job_mem_limit"
-cpu: "8"
+cpu: "2" # NOTE: see comment at top of the file
 volumes:
 - name: image-vol
 persistentVolumeClaim:
Original file line number Diff line number Diff line change
@@ -20,6 +20,11 @@ THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
 OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 OTHER DEALINGS IN THE SOFTWARE.
+
+NOTE: Kata hypervisor setup adds ALL container cpu limits together for
+the hardware description. This changes the nproc return of available
+cpus in the container, possibly overloading the VM causing it to
+crash. Be careful adjusting any cpu limits for the containers.
 */}}
 apiVersion: v1
 data:
@@ -73,6 +78,18 @@ data:
 namespace: $namespace
 spec:
 backoffLimit: 0
+affinity:
+podAntiAffinity:
+preferredDuringSchedulingIgnoredDuringExecution:
+- topologyKey: kubernetes.io/hostname
+labelSelector:
+matchExpressions:
+- key: app.kubernetes.io/name
+operator: In
+values:
+- cray-ims
+namespaces:
+- $namespace
 template:
 metadata:
 annotations:
@@ -130,7 +147,7 @@ data:
 cpu: "500m"
 limits:
 memory: "$job_mem_limit"
-cpu: "8"
+cpu: "4" # NOTE: see comment at top of the file
 securityContext:
 privileged: true
 capabilities:
@@ -147,7 +164,7 @@ data:
 cpu: "500m"
 limits:
 memory: "$job_mem_limit"
-cpu: "48"
+cpu: "8" # NOTE: see comment at top of the file
 env:
 - name: API_GATEWAY_HOSTNAME
 value: {{ .Values.api_gw.api_gw_service_name }}.{{ .Values.api_gw.api_gw_service_namespace }}.svc.cluster.local
@@ -261,7 +278,7 @@ data:
 cpu: "500m"
 limits:
 memory: "$job_mem_limit"
-cpu: "8"
+cpu: "8" # NOTE: see comment at top of the file
 volumeMounts:
 - name: image-vol
 mountPath: /mnt/image
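
Reviewer note: the NOTE added at the top of both templates says Kata sums every container's cpu limit when it describes the VM hardware. The sketch below adds up only the limits visible in this diff, before and after the change, to show roughly how much the advertised CPU count drops. The grouping into "create" and "customize" and the helper names are illustrative assumptions; containers not touched by the diff are not counted, so the real totals may be higher.

```python
# CPU limits as they appear in this diff, in template order.
# Assumption: containers not touched by the diff are not counted here.
LIMITS = {
    "create": {
        "old": ["8", "8", "8", "48", "48", "8"],
        "new": ["2", "1", "1", "8", "8", "2"],
    },
    "customize": {
        "old": ["8", "48", "8"],
        "new": ["4", "8", "8"],
    },
}


def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "500m" or "8" to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def summed_limit(quantities) -> float:
    """Sum the per-container limits, as the NOTE says Kata does."""
    return sum(parse_cpu(q) for q in quantities)


for template, versions in LIMITS.items():
    old = summed_limit(versions["old"])
    new = summed_limit(versions["new"])
    print(f"{template}: {old:g} -> {new:g} CPUs")
```

Under these assumptions the create template goes from 128 to 22 summed CPUs and the customize template from 64 to 20.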
4 changes: 2 additions & 2 deletions kubernetes/cray-ims/values.yaml
@@ -43,8 +43,8 @@ s3:

 alpine:
 image:
-repository: alpine
-tag: latest
+repository: artifactory.algol60.net/csm-docker/stable/docker.io/library/alpine
+tag: 3
 pullPolicy: IfNotPresent

 ims_config:
4 changes: 2 additions & 2 deletions src/server/models/jobs.py
@@ -145,7 +145,7 @@ class V2JobRecordInputSchema(Schema):
 validate=Length(min=1, error="image_root_archive_name field must not be blank"))
 enable_debug = fields.Boolean(load_default=False,dump_default=False,
 metadata={"metadata": {"description": "Whether to enable debugging of the job"}})
-build_env_size = fields.Integer(load_default=60,dump_default=60,
+build_env_size = fields.Integer(dump_default=DEFAULT_IMAGE_SIZE,
 metadata={"metadata": {"description": "Approximate disk size in GiB to reserve for the image build environment (usually 2x final image size)"}},
 validate=Range(min=1, error="build_env_size must be greater than or equal to 1"))
 kernel_file_name = fields.Str(metadata={"metadata": {"description": "Name of the kernel file to extract and upload"}})
@@ -166,7 +166,7 @@ class V2JobRecordInputSchema(Schema):
 metadata={"metadata": {"description": "Job requires the use of dkms"}})

 # v2.2
-job_mem_size = fields.Integer(dump_default=8, required=False,
+job_mem_size = fields.Integer(dump_default=DEFAULT_JOB_MEM_SIZE, required=False,
 validate=Range(min=1, error="build_env_size must be greater than or equal to 1"),
 metadata={"metadata": {"description": "Approximate working memory in GiB to reserve for the build job "
 "environment (loosely proportional to the final image size)"}})
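
Reviewer note: one plausible reading of the `build_env_size` change is that a literal `load_default=60` fills in the field whenever a request omits it, so a default configured in ims-config is never consulted. The plain-Python sketch below imitates that behaviour without marshmallow. `CONFIG`, `load_old`, `load_new` and `effective_size` are hypothetical names, and the caller-side fallback is an assumption about how the configured value gets applied, not the actual IMS code path.

```python
# Hypothetical stand-in for a value read from ims-config at startup.
CONFIG = {"default_image_size": 30}

# The literal the old schema passed as load_default.
HARD_CODED_DEFAULT = 60


def load_old(payload: dict) -> dict:
    """Old behaviour: a missing field is filled with the literal on load."""
    loaded = dict(payload)
    loaded.setdefault("build_env_size", HARD_CODED_DEFAULT)
    return loaded


def load_new(payload: dict) -> dict:
    """New behaviour: a missing field stays missing after load."""
    return dict(payload)


def effective_size(loaded: dict) -> int:
    """Assumed caller-side fallback to the configured default."""
    return loaded.get("build_env_size", CONFIG["default_image_size"])


print(effective_size(load_old({})))                      # literal wins
print(effective_size(load_new({})))                      # configured value wins
print(effective_size(load_new({"build_env_size": 45})))  # explicit value wins
```

An explicit value in the request still takes precedence in both versions; only the behaviour for an omitted field differs.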