Policies and administration
Deployments on Kubernetes are done using helmfile.
Deploying with helmfile
Code deployment/configuration changes
Note that both new code deployments and configuration changes are considered deployments!
- Clone the deployment-charts repo.
- Using your editor, modify the files under the helmfile.d folder of the service you want to change. For example, the myservice deployment lives under deployment-charts/helmfile.d/services/myservice. Most changes are made to the values.yaml and values-*.yaml files to tune the deployment parameters.
- If you need to update or add a secret such as a password or a certificate, ask an SRE to commit it into the private puppet repo. Do not commit secrets to the deployment-charts repo.
- Make a CR and, after a successful review, merge it. Note: an SRE may give a +1 to your patch, which is sufficient for you to self-merge and deploy (see the notes about deployment changes in mw:Gerrit/Privilege policy#Merging without review).
- After the merge, log in to a deployment server. A cron job (running every minute) updates the /srv/deployment-charts directory with the contents from git.
- Go to /srv/deployment-charts/helmfile.d/services/${SERVICE}, where SERVICE is the name of your service, e.g. myservice.
- Execute
helmfile -e ${CLUSTER} -i apply --context 5
where $CLUSTER is the k8s cluster you're operating on, currently one of staging, eqiad and codfw. This shows the changes that will be applied to the cluster and prompts you to confirm. The --context 5 flag gives a more compact diff (by default the whole rendered resources are displayed). Once you confirm, helmfile applies that diff to the cluster and logs the change into SAL.
Keep in mind that the diff generated by helmfile may contain sensitive information such as passwords and API keys. Use caution when sharing the output.
- All done!
If there are multiple releases of your service in the same helmfile, you can use the --selector name=RELEASE_NAME option, e.g.
helmfile -e $CLUSTER --selector name=test -i apply --context 5
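The apply command and the optional release selector described above can be assembled by a small helper. This is only a sketch: build_apply_cmd is a hypothetical function name, the list of accepted clusters is copied from the text above and may be out of date, and the helper prints the command rather than running it.

```shell
# Sketch only: prints the helmfile command instead of executing it.
# build_apply_cmd is a hypothetical helper, not part of helmfile.
build_apply_cmd() {
    local cluster="$1"
    local release="${2:-}"   # optional release name for --selector

    # Clusters named in the text above; adjust if the list changes.
    case "$cluster" in
        staging|eqiad|codfw) ;;
        *) echo "unknown cluster: $cluster" >&2; return 1 ;;
    esac

    if [ -n "$release" ]; then
        echo "helmfile -e $cluster --selector name=$release -i apply --context 5"
    else
        echo "helmfile -e $cluster -i apply --context 5"
    fi
}

build_apply_cmd staging
build_apply_cmd codfw test
```

Rejecting unknown cluster names before anything is run is a design choice of this sketch; helmfile itself would also fail on an environment it does not know.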
Release breaking changes
In some cases (TODO: add a list of cases?), changes to your chart will not be able to be applied by helmfile. In these cases, you will need to destroy and recreate the deployments, which involves doing a little DNS pooling dance.
- Depool your service from codfw
sudo cookbook sre.discovery.service-route depool codfw service-foo
- Watch your dashboards and wait for traffic to die out
- Destroy your service's codfw deployments
cd /srv/deployment-charts/helmfile.d/services/service-foo; helmfile -e codfw -i destroy
- Wait for a bit. helmfile destroy returns before all actions are done on the Kubernetes cluster. If you recreate the deployments too soon, you will encounter the following error or a similar one:
Error: release production failed, and has been uninstalled due to atomic being set: Service "service-foo-production-tls-service" is invalid: spec.ports[0].nodePort: Invalid value: $service-port: provided port is already allocated
- Recreate your deployment in codfw:
helmfile -e codfw -i apply --context 5
- Repool your service in codfw:
sudo cookbook sre.discovery.service-route pool codfw service-foo
- Watch traffic come back to codfw, then repeat from the first step for eqiad
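The sequence above can be written out as a dry run that prints the commands for one datacenter at a time. This is a sketch under assumptions: print_recreate_steps is a hypothetical helper name, service-foo is the placeholder service used above, and the waiting steps are left as comments because how long to wait is a human judgement.

```shell
# Sketch only: prints the steps for one datacenter; nothing is executed.
# print_recreate_steps is a hypothetical helper name.
print_recreate_steps() {
    local dc="$1" service="$2"
    echo "sudo cookbook sre.discovery.service-route depool $dc $service"
    echo "# watch dashboards until traffic to $dc has died out"
    echo "cd /srv/deployment-charts/helmfile.d/services/$service"
    echo "helmfile -e $dc -i destroy"
    echo "# wait until the cluster has finished cleaning up"
    echo "helmfile -e $dc -i apply --context 5"
    echo "sudo cookbook sre.discovery.service-route pool $dc $service"
}

# One datacenter at a time, in the same order as the steps above.
for dc in codfw eqiad; do
    print_recreate_steps "$dc" service-foo
    echo
done
```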
Seeing the current status
This is done using helmfile.
- Change directory to /srv/deployment-charts/helmfile.d/services/${SERVICE} on a deployment server
- Unless there are changes that have not been applied yet, the current values files should reflect the deployed values
- You can check for unapplied changes with:
helmfile -e $CLUSTER diff --context 5
(again, the --context option allows you to tune the amount of context surrounding your changes)
- You can see the status with:
helmfile -e $CLUSTER status
Rolling back changes
If you need to roll back a change because something went wrong:
- Revert the git commit to the deployment-charts repo
- Merge the revert (with review if needed)
- Wait one minute for the cron job to pull the change to the deployment server
- Change directory to /srv/deployment-charts/helmfile.d/services/${SERVICE}, where SERVICE is the name of your service
- Execute
helmfile -e $CLUSTER diff --context 5
- Execute
helmfile -e $CLUSTER apply
Rolling back in an emergency
If you can't wait the one minute, or the cron job that updates from git fails, it is possible to roll back manually using helm. This is discouraged in favor of using helmfile, though.
- Find the revision to roll back to
sudo -i
kube_env admin $CLUSTER; helm3 -n $SERVICE history $RELEASE
  - e.g. perhaps the penultimate one
REVISION  UPDATED                   STATUS      CHART          DESCRIPTION
1         Tue Jun 18 08:39:20 2019  SUPERSEDED  termbox-0.0.2  Install complete
2         Wed Jun 19 08:20:42 2019  SUPERSEDED  termbox-0.0.3  Upgrade complete
3         Wed Jun 19 10:33:34 2019  SUPERSEDED  termbox-0.0.3  Upgrade complete
4         Tue Jul 9 14:21:39 2019   SUPERSEDED  termbox-0.0.3  Upgrade complete
- Roll back with (still under sudo -i):
kube_env admin $CLUSTER; helm3 rollback -n $SERVICE $RELEASE 3
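Picking "the penultimate one" from the history output can be done mechanically. The sketch below assumes the history is printed as a table whose first column is the revision number, as in the example above. The history_output variable stands in for real command output, and penultimate_revision is a hypothetical helper name.

```shell
# Sketch only: history_output stands in for the output of the history command.
history_output='REVISION  UPDATED                   STATUS      CHART          DESCRIPTION
1         Tue Jun 18 08:39:20 2019  SUPERSEDED  termbox-0.0.2  Install complete
2         Wed Jun 19 08:20:42 2019  SUPERSEDED  termbox-0.0.3  Upgrade complete
3         Wed Jun 19 10:33:34 2019  SUPERSEDED  termbox-0.0.3  Upgrade complete
4         Tue Jul 9 14:21:39 2019   SUPERSEDED  termbox-0.0.3  Upgrade complete'

penultimate_revision() {
    # Skip the header, remember the last two revision numbers seen, and
    # print the second-to-last one. Prints nothing if fewer than two exist.
    awk 'NR > 1 && $1 ~ /^[0-9]+$/ { prev = last; last = $1; n++ }
         END { if (n >= 2) print prev }'
}

printf '%s\n' "$history_output" | penultimate_revision   # prints 3
```

Whether the penultimate revision is the right rollback target still needs a human check: it is only the right choice when the latest revision is the one that broke things.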
Rolling restart
If you want to force all pods of your deployment to restart, you can use the roll_restart parameter during deployment with helmfile:
helmfile -e $CLUSTER --state-values-set roll_restart=1 sync
Undeploy/delete a release
You may undeploy/delete your service completely using:
helmfile -e $CLUSTER destroy
If you want to undeploy/delete just a specific release of your service, use a selector like:
helmfile -e $CLUSTER --selector name=$RELEASE_NAME destroy
Advanced use cases: using kubeconfig
If you need to use kubeconfig (for a port-forward or to get logs for debugging), you can execute
kube_env $SERVICE $CLUSTER; kubectl COMMAND
e.g. for logs:
kube_env myservice staging; kubectl logs POD_NAME -c CONTAINER_NAME
kube_env $SERVICE-deploy $CLUSTER will configure your shell to use an account with more expansive permissions within the $SERVICE namespace. This is a bit like using sql enwiki --write, so please pay attention when you are using these extended rights.
Advanced use cases: using helm
Sometimes you might need to use helm directly. This is strongly discouraged: use it only at your own risk and in emergencies. It assumes that you know what you are doing with helm.
kube_env <service> <cluster>
helm <command>
akosiaris@deploy1002:~$ kube_env mathoid eqiad
akosiaris@deploy1002:~$ helm list
production 1 Tue Mar 23 10:37:50 2021 DEPLOYED mathoid-0.0.35 mathoid
akosiaris@deploy1002:~$ helm status
Error: release name is required
akosiaris@deploy1002:~$ helm status production
LAST DEPLOYED: Tue Mar 23 10:37:50 2021
NAMESPACE: mathoid
==> v1/ConfigMap
config-production 1 26d
mathoid-production-envoy-config-volume 1 26d
mathoid-production-tls-proxy-certs 2 26d
production-metrics-config 1 26d
==> v1/Deployment
mathoid-production 30/30 30 30 26d
==> v1/NetworkPolicy
mathoid-production app=mathoid,release=production 26d
==> v1/Pod(related)
mathoid-production-64787b97c5-24pzw 3/3 Running 0 26d
mathoid-production-64787b97c5-z74n2 3/3 Running 0 26d
==> v1/Service
mathoid-production NodePort <none> 10044:10042/TCP 26d
mathoid-production-tls-service NodePort <none> 4001:4001/TCP 26d
When `helmfile apply` Does Nothing
In T347521, an application was in a state where kubectl get pod and kubectl get deploy showed no resources, but helmfile apply did nothing. Looking with kubectl get networkpolicy, we were able to see that the application was in a partially-deployed state. Running helmfile destroy and then helmfile apply was enough to recover the application.
`helmfile destroy` without root permissions
If you need to destroy a release, you can use kube_env to become the deploy user (might require global root/SRE-level access, please update if so):
kube_env ${namespace}-deploy ${env}
This will allow you to destroy a release with the minimum amount of permissions.
Deploying a change
- +2 your change on https://gerrit.wikimedia.org/g/operations/deployment-charts and wait for Jenkins to merge it.
- Log in to the active deployment server:
$ ssh deployment.eqiad.wmnet
- Apply the helm chart to all 3 clusters:
$ cd /srv/deployment-charts/helmfile.d/services/${SERVICE}
$ helmfile -e staging -i apply --context 5
$ helmfile -e eqiad -i apply --context 5
$ helmfile -e codfw -i apply --context 5
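The three apply commands above can be wrapped in a loop that stops at the first failure. This is a sketch, not an existing tool: RUN and deploy_all are hypothetical names, and RUN defaults to echo so that the commands are only printed. Stopping before the next cluster when an apply fails is an assumption about the desired behaviour that the steps above do not state.

```shell
# Sketch only. RUN defaults to "echo", so commands are printed, not executed.
# RUN and deploy_all are hypothetical names used for illustration.
RUN="${RUN:-echo}"

deploy_all() {
    local cluster
    # Same order as the steps above: staging first, then eqiad and codfw.
    for cluster in staging eqiad codfw; do
        # Stop at the first failure so the change does not go any further.
        $RUN helmfile -e "$cluster" -i apply --context 5 || {
            echo "apply failed on $cluster, stopping" >&2
            return 1
        }
    done
}

deploy_all
```

Setting RUN to an empty string would make the function run helmfile for real, from the service directory. The -i flag keeps each apply interactive, so every cluster still asks for confirmation.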
Rendering the helmfile file itself
Usually, a chart's helmfile.yaml file includes some templating logic, which can, like any logic, introduce bugs. To inspect the fully rendered version of the helmfile, you can use helmfile -e $CLUSTER build:
brouberol@deploy1002:/srv/deployment-charts/helmfile.d/dse-k8s-services/superset-next$ helmfile -e dse-k8s-eqiad build
helmfile.yaml: basePath=.
# Source: helmfile.yaml
filepath: helmfile.yaml
helmBinary: helm3
- releases:
- staging
missingFileHandler: Warn
tillerless: false
- --kubeconfig
- /etc/kubernetes/superset-next-deploy-dse-k8s-eqiad.config
verify: false
devel: false
wait: false
timeout: 600
recreatePods: false
force: false
atomic: true
Rendering the final values overlay
In the helmfile, we define an order of precedence for values files, from lowest to highest. These values are then "overlaid" on top of each other into a single values file, which is then passed to the chart for rendering. Sometimes we make typos, or a value isn't indented enough, which results in chart rendering issues (for example, the wrong value is passed to the chart). If you're facing such a bug, you can use the helmfile -e $CLUSTER write-values command to generate the final overlaid values file, which you can then inspect for mistakes.
brouberol@deploy1002:/srv/deployment-charts/helmfile.d/dse-k8s-services/superset-next$ sudo helmfile -e dse-k8s-eqiad write-values
helmfile.yaml: basePath=.
skipping missing values file matching "values.yaml"
Writing values file helmfile-638f63f0/staging.yaml # one file per release
helmfile.yaml: basePath=.
brouberol@deploy1002:/srv/deployment-charts/helmfile.d/dse-k8s-services/superset-next$ sudo cat helmfile-638f63f0/staging.yaml | head -n 20
- port: 3320
protocol: tcp
- port: 3350
protocol: tcp
- port: 3306
protocol: tcp
version: ea12de870b81a7c735701ae0e23d0f416fb2bfc9-production-backend
version: ea12de870b81a7c735701ae0e23d0f416fb2bfc9-production-frontend
exporter: prometheus-apache-exporter:0.0.3-20231015
version: 45377f59c5bdf8bae1b967c49ee29a144c5cba44-production
Note that this will create a directory containing a flat values file per helmfile release. Don't forget to remove it afterwards, as /srv/deployment-charts is a git repository.
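The cleanup can be scripted so that the generated directory is not left behind in the git checkout. This is a sketch: cleanup_written_values is a hypothetical helper, and the helmfile-<suffix> directory naming is inferred only from the example output above. The demo runs in a temporary directory rather than in /srv/deployment-charts.

```shell
# Sketch only: remove the helmfile-* directories that write-values leaves behind.
# cleanup_written_values is a hypothetical helper; the directory naming pattern
# is inferred from the example output above.
cleanup_written_values() {
    local dir="${1:-.}"
    # Match only directories directly inside $dir whose name starts with
    # "helmfile-", so helmfile.yaml and helmfile.d are left alone.
    find "$dir" -mindepth 1 -maxdepth 1 -type d -name 'helmfile-*' -exec rm -rf {} +
}

# Demo in a scratch directory that mimics a service directory.
workdir="$(mktemp -d)"
mkdir "$workdir/helmfile-638f63f0"
touch "$workdir/helmfile-638f63f0/staging.yaml" "$workdir/helmfile.yaml"
cleanup_written_values "$workdir"
ls "$workdir"    # prints helmfile.yaml
rm -rf "$workdir"
```

If write-values was run with sudo as in the example above, the generated directory will likely be owned by root, and the cleanup would need to run with sudo as well.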
Rendering a chart
To render a chart using its associated helmfile, you can simply run helmfile -e $CLUSTER template
brouberol@deploy1002:/srv/deployment-charts/helmfile.d/dse-k8s-services/superset-next$ helmfile -e dse-k8s-eqiad template | head -n 20
helmfile.yaml: basePath=.
skipping missing values file matching "values.yaml"
Templating release=staging, chart=wmf-stable/superset
# Source: superset/templates/networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
name: superset-staging
app: superset
chart: superset-0.0.19
release: staging
heritage: Helm
app: superset
release: staging
- Egress
- Ingress
helmfile.yaml: basePath=.