
GoKubeDownscaler


A horizontal autoscaler for Kubernetes workloads. This is a Golang port of the popular (py-)kube-downscaler with improvements and quality-of-life changes.


Scalable Resources

These are the resources the Downscaler can scale:

  • CronJobs:
    • sets the CronJob's suspend property to true, preventing it from running on its schedule
  • DaemonSets:
    • adds a label which matches none of the nodes to the nodeSelector, stopping its pods from running on any node
  • Deployments:
    • sets the replica count to the downscale replicas
  • Horizontal Pod Autoscalers (HPA):
    • sets the minReplicas of the HPA to the downscale replicas. Will throw an error if the downscale replicas value is smaller than 1
  • Jobs:
    • sets the Job's suspend property to true, stopping execution of the job until it is upscaled again
  • PodDisruptionBudgets:
    • sets either maxUnavailable or minAvailable to the downscale replicas. Will not scale if minAvailable or maxUnavailable are percentages instead of replica counts
  • ScaledObjects:
    • sets the paused-replicas annotation to the downscale replicas
  • StatefulSets:
    • sets the replica count to the downscale replicas
  • Rollouts:
    • sets the replica count to the downscale replicas
  • Stacks:
    • sets the replica count to the downscale replicas
  • Prometheuses:
    • sets the replica count to the downscale replicas

Installation

Installation is done via the Helm Chart
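A minimal install sketch is shown below; the chart repository and chart name are placeholders, not the real values, which are documented in the Helm Chart itself:

# placeholders only - substitute the chart repository and chart name from the Helm Chart documentation
helm repo add <repo-name> <chart-repository-url>
helm upgrade --install go-kube-downscaler <repo-name>/go-kube-downscaler --namespace kube-downscaler --create-namespace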

Configuration

Annotations

Annotations can be applied to a workload or its namespace. See the layers concept for more details on which layer's values will be used.
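For example, a workload can carry its schedule directly in its metadata. The sketch below assumes the downscaler/ annotation prefix known from kube-downscaler; see the values reference for the available keys:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  annotations:
    downscaler/uptime: "Mon-Fri 08:00-20:00 Europe/Berlin" # assumed key name
    downscaler/downscale-replicas: "0"                     # assumed key name
spec:
  replicas: 3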

Arguments

CLI arguments set layer values and runtime configuration at the start of the program. See the layers concept for more details on which layer's values will be used. An example invocation follows the list of flags below.

Layer Values:

Runtime Configuration:

  • --dry-run:
    • boolean
    • sets the downscaler into dry run mode, which makes it just print the actions it would have performed
    • default: false
  • --debug:
    • boolean
    • makes the downscaler print more/debug information on what it currently does and what happens to the workloads
    • default: false
  • --once:
    • boolean
    • makes the downscaler exit after one scan
    • default: false
  • --interval:
    • duration
    • sets the wait time between scans
    • default: 30s
  • --namespace:
    • comma separated list of namespaces (some-ns,other-ns or some-ns, other-ns)
    • makes the downscaler get workloads only from the specified namespaces
    • default: all namespaces
  • --include-resources:
    • comma separated list of (case-insensitive) scalable resources (deployments,statefulsets or deployments, statefulsets)
    • enables scaling of workloads with the specified resource type
    • default: deployments
  • --exclude-namespaces:
    • comma separated list of regex patterns matching namespaces (some-ns,other-ns,kube-.* or some-ns, other-ns, kube-.*)
    • excludes the matching namespaces from being scaled
    • default: kube-system, kube-downscaler
  • --exclude-deployments:
    • comma separated list of regex patterns matching workload names (some-workload,other-workload,.*kube-downscaler or some-workload, other-workload, .*kube-downscaler)
    • excludes the matching workloads from being scaled
    • default: none
  • --matching-labels:
    • comma separated list of regex patterns matching labels with their value (some-label=val,other-label=value,another-label=.* or some-label=val, other-label=value, another-label=.*)
    • makes the downscaler only include workloads which have at least one label that matches any of the specified labels and values
    • default: none
  • --time-annotation:
    • string key of an annotation on the workload containing an RFC3339 timestamp
    • when set, grace-period will use the timestamp in the annotation instead of the creation time of the workload
    • default: none (uses the workload's creation time)
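For illustration, a run combining a few of these flags could look like this (a sketch only, run from the repository's main package; the flags are taken from the list above):

go run . --dry-run --interval 60s --namespace some-ns,other-ns --include-resources deployments,statefulsets --exclude-namespaces "kube-.*"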

Environment Variables

Environment Variables set layer values on the env layer and runtime configuration at the start of the program. See the layers concept for more details on which layer's values will be used.

Layer Values:

Runtime Configuration:
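As a sketch, values could be set via the container's environment in the Helm values or the deployment manifest. The variable names below (DEFAULT_UPTIME, EXCLUDE_NAMESPACES) are assumptions carried over from py-kube-downscaler's conventions; the supported names are listed in the Environment Variables reference:

env:
  - name: DEFAULT_UPTIME # assumed name, sets an uptime value on the env layer
    value: "Mon-Fri 08:00-20:00 Europe/Berlin"
  - name: EXCLUDE_NAMESPACES # assumed name, runtime configuration
    value: "kube-system,monitoring"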

Timespans

There are two different kinds of Timespans:

  • Absolute Timespans: a timespan defined by two RFC3339 Timestamps
  • Relative Timespans: recurring on a schedule

Configuration of an Absolute Timespan

<RFC3339-Timestamp>-<RFC3339-Timestamp>
or
<RFC3339-Timestamp> - <RFC3339-Timestamp>

example: 2024-07-29T08:30:00Z - 2024-07-29T16:00:00+02:00

See RFC3339 Timestamps for more information

Configuration of a Relative Timespan

<Weekday-From>-<Weekday-To> <Time-Of-Day-From>-<Time-Of-Day-To> <Timezone>

example:

Mon-Fri 08:00-20:00 Asia/Tokyo          # From Monday to Friday: from 08:00 to 20:00
Sat-Sun 00:00-24:00 UTC                 # On The Weekend: the entire day
Mon-Fri 20:00-08:00 Australia/Sydney    # From Monday to Friday: from Midnight to 08:00 and from 20:00 until end of day
Mon-Sun 00:00-00:00 America/New_York    # The timespan never matches, this would not do anything
Mon-Tue 20:00-24:00 Africa/Johannesburg # On Monday and Tuesday: from 20:00 to midnight
Mon-Tue 20:00-00:00 Europe/Amsterdam    # On Monday and Tuesday: from 20:00 to midnight

Valid Values:

Weekdays: (case-insensitive)

  • Mon
  • Tue
  • Wed
  • Thu
  • Fri
  • Sat
  • Sun

Timezones:

Note

The IANA Time Zone database mainly supports regional/city timezones (example: Europe/Berlin, America/Los_Angeles) instead of abbreviations (example: CEST, PST, PDT). It supports some abbreviations like CET, MET and PST8PDT, but these (not including UTC) shouldn't be used and only exist for backwards compatibility.

Time of day: 00:00 - 24:00
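A quick way to check whether a timezone name is valid is to load it with Go's time package, which reads names from the same IANA database:

// small sketch: verifies that a name exists in the IANA time zone database
package main

import (
    "fmt"
    "time"
)

func main() {
    if _, err := time.LoadLocation("Europe/Berlin"); err != nil {
        fmt.Println("invalid timezone:", err)
        return
    }
    fmt.Println("Europe/Berlin is a valid IANA timezone")
}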

Multiple/Complex Timespans

In some cases you need to define multiple timespans. You can do this by listing them comma separated:

<TIMESPAN>,<TIMESPAN>,<TIMESPAN>

OR with optional spaces:

<TIMESPAN>, <TIMESPAN>, <TIMESPAN>

The timespans can be absolute, relative or mixed.

Example: downscale over the weekend and at night:

Sat-Sun 00:00-24:00 Europe/Berlin, Mon-Fri 20:00-07:00 Europe/Berlin

Duration

A duration can be defined either by an integer representing seconds

"120" # 120 seconds (2 minutes)
"900" # 900 seconds (15 minutes)

Or by a duration string:

"1h30m" # 1 hour and 30 minutes
"1.5h"  # 1 hour and 30 minutes
"2m"    # 2 minutes
"10s"   # 10 seconds
"300s"  # 300 seconds

Other units:

"ns"      # nanoseconds
"us"/"µs" # microseconds
"ms"      # milliseconds
"s"       # seconds
"m"       # minutes
"h"       # hours

See Golang's official documentation for more information
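These duration strings follow Go's standard duration format; a short sketch of how they parse:

// sketch: duration strings like "1h30m" parse with Go's time.ParseDuration
package main

import (
    "fmt"
    "time"
)

func main() {
    d, err := time.ParseDuration("1h30m")
    if err != nil {
        panic(err)
    }
    fmt.Println(d) // 1h30m0s
}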

Concepts

Layers

Layers are levels of configuration values. If the highest layer doesn't have a value set, the lookup falls through to the next lower layer.

Layer Hierarchy

  1. Workload Layer
  2. Namespace Layer
  3. CLI Layer
  4. ENV Layer

Workload Layer

Defined by the annotations on the workload every scan.

Namespace Layer

Defined by the annotations on the namespace every scan.

CLI Layer

Defined by the command line arguments at startup.

ENV Layer

Defined by the environment variables at startup.

Examples

Note

A process line ending with "(...)" is a compacted form, instead of showing the process on each layer separately

--- Layers
Workload: (no annotations)
Namespace: exclude=true
CLI: (defaults)
ENV: (no env vars)
--- Process:
Exclusion not specified on workload layer, going to next layer
Exclusion set to true on namespace layer, excluding workload
--- Result:
Workload is excluded, no changes will be made to it

--- Layers
Workload: exclude=false
Namespace: exclude=true
CLI: downtime="Mon-Fri 08:00-16:00 Europe/Berlin"
ENV: (no env vars)
--- Process:
Exclusion set to false on workload layer, not excluding workload
No forced scaling found on any layer (...)
No scaling specified on Workload layer, going to next layer
No scaling specified on Namespace layer, going to next layer
Scaling "downtime" specified on CLI layer, scaling according to the downtime schedule on the cli layer
--- Result:
Workload will be scaled according to the downtime schedule on the cli layer

--- Layers
Workload: uptime="Mon-Fri 08:00-16:00 Europe/Berlin"
Namespace: force-downtime=true
CLI: downtime="Mon-Fri 20:00-08:00 America/Los_Angeles"
ENV: (no env vars)
--- Process:
Exclusion not set on any layer (...)
Forced scaling found on namespace layer, forcing downscale (...)
--- Result:
Workload will be forced into a down-scaled state

--- Layers
Workload: uptime="Mon-Fri 08:00-16:00 Europe/Berlin"
Namespace: (no annotations)
CLI: downtime="Mon-Fri 20:00-08:00 America/Los_Angeles"
ENV: (no env vars)
--- Process:
Exclusion not set on any layer (...)
No forced scaling found on any layer (...)
Scaling "uptime" set on workload layer, scaling according to the uptime schedule on the cli layer
--- Result:
Workload will be scaled according to the uptime schedule on the cli layer

Values

  • downscale-period:
    • comma separated list of timespans
    • within these periods the workload will be scaled down, outside of them the state will be ignored
    • incompatible with downtime, uptime
  • downtime:
    • comma separated list of timespans
    • within these periods the workload will be scaled down, outside of them it will be scaled up
    • incompatible with downscale-period, upscale-period, uptime
  • upscale-period:
    • comma separated list of timespans
    • within these periods the workload will be scaled up, outside of them the state will be ignored
    • incompatible with downtime, uptime
  • uptime:
    • comma separated list of timespans
    • within these periods the workload will be scaled up, outside of them it will be scaled down
    • incompatible with downscale-period, upscale-period, downtime
  • exclude:
    • boolean
    • when true, the workload will be excluded/ignored while scaling
  • exclude-until:
    • RFC3339 timestamp
    • the workload will be excluded until this time
  • force-uptime:
    • boolean
    • if set to true the workload will be forced into an uptime state
    • incompatible with force-downtime
  • force-downtime:
    • boolean
    • if set to true the workload will be forced into a downtime state
    • incompatible with force-uptime
  • downscale-replicas:
    • int
    • the replicas that the workload should have while downscaled
  • grace-period:
    • duration
    • the duration a workload has to exist before it is first scaled. Will use the time annotation instead of the creation time of the workload if the time annotation argument is set.

See the layers concept for more details on which layer's values will be used
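As a sketch, these values can also be set on a namespace so they apply to every workload in it (annotation keys assume the downscaler/ prefix; see the annotations reference for the exact names):

apiVersion: v1
kind: Namespace
metadata:
  name: dev-environment
  annotations:
    downscaler/downtime: "Sat-Sun 00:00-24:00 Europe/Berlin, Mon-Fri 20:00-07:00 Europe/Berlin" # assumed key name
    downscaler/grace-period: "15m" # assumed key name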

Migrating from py-kube-downscaler

Basic migration

  1. Remove the old kube-downscaler:
     helm uninstall py-kube-downscaler
  2. Make sure all programs/non-default use cases support the breaking changes
  3. Make sure all timestamps are RFC3339 compatible
  4. Install the new downscaler

Edge cases

If you had an implementation that used some of the quirks of the py-kube-downscaler you might need to change them to work with the GoKubeDownscaler.

Some cases where this might be needed include:

  • Incompatibility instead of priority
    • example: if you had a program that dynamically added an uptime annotation on a workload with a downtime annotation because you relied on the uptime annotation taking over
  • Layer system
    • example: the behaviour of excluding a namespace (resulting in all workloads in it being excluded) is not quite the same, as a workload can now override this by setting exclude to false
  • A pod that upscales the whole cluster
    • this behaviour is no longer available
  • RFC3339 timestamp
    • if you used the short form versions of the ISO 8601 timestamp (2023-08-12, 2023-233 or 2023-W34-1)
  • Actual exclusion
    • example: if you had a program that dynamically excluded a namespace and need it to then go into an upscaled state

Differences to py-kube-downscaler

Incompatibility instead of priority:

  • some values are now incompatible instead of using one over the other if both are set
  • backwards compatible: shouldn't break anything in most cases

Duration units:

  • instead of integers representing seconds you can also use duration strings
  • backwards compatible: fully compatible, integer seconds are still supported

Layer system:

  • makes it easier and more uniform to know what configuration is going to be used. All annotations can now also be easily applied to namespaces. See the layers concept for information on the new behaviour
  • backwards compatible: shouldn't break anything in most cases

--explicit-include cli argument:

  • a simple way to explicitly include single workloads. See --explicit-include for more details.
  • backwards compatible: fully compatible, no prior behaviour was changed

Comfort spaces:

  • allows for spaces in configuration to make the configuration more readable. (applies to: any comma separated list, absolute timespans)
  • backwards compatible: fully compatible, you can still use the configuration without spaces

Uniform timestamp:

  • all timestamps are RFC3339 timestamps; this is more optimized for Golang, more consistent, and also used by Kubernetes itself
  • backwards compatible: mostly; unless you used a short form of ISO 8601 (2023-08-12, 2023-233 or 2023-W34-1) it should be totally fine to not change anything

Overlapping relative timespans into next day:

  • relative timespans can overlap into the "next" day (Mon-Fri 20:00-06:00 UTC). See Relative Timespans for information on how this behaves
  • backwards compatible: fully compatible, this didn't change any existing functionality

Actual exclusion:

  • excluding a workload won't force the workload to be upscaled, instead it will just ignore its state
  • backwards compatible: should be fully compatible, unless your implementation relies on this

IANA Timezones:

  • the downscaler uses the IANA timezone database
  • backwards compatible: fully compatible, "Olson timezones" is just a lesser known synonym for the IANA time zone database

Workload error events:

  • errors with the configuration on the namespace or workload layer are shown as events on the workload
  • backwards compatible: fully compatible, doesn't change any existing functionality

--deployment-time-annotation -> --time-annotation:

  • the --deployment-time-annotation cli argument was changed to --time-annotation
  • backwards compatible: no; if you used this cli argument, you have to change it to --time-annotation

Missing Features

Currently the GoKubeDownscaler is still a WIP. This means that there are still some features missing. You can find a list of the known-missing features here. If you think that any other features are missing or you have an idea for a new feature, feel free to open an Issue

Troubleshooting

See troubleshooting

Developing

Please read the contribution manifest

Cloning the Repository

git clone https://github.com/caas-team/GoKubeDownscaler.git
cd GoKubeDownscaler

Setting up Pre-Commit

brew install pre-commit
pre-commit install
brew install golangci-lint
brew install gofumpt

Testing the downscaler

running the unit tests

go test -v --cover ./...

running the downscaler locally

The downscaler can be run locally by specifying a kubeconfig to use. The kubeconfig should have at least the same permissions as the Helm Chart's role.yaml. The downscaler will use the current-context in the kubeconfig.

go run . -k=path/to/kubeconfig # ... additional configuration