
Output of helm version:

Client: &version.Version{SemVer:"v2.14.1", GitCommit:"5270352a09c7e8b6e8c9593002a73535276507c0", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.1", GitCommit:"5270352a09c7e8b6e8c9593002a73535276507c0", GitTreeState:"clean"}

Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:26:52Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.6-gke.13", GitCommit:"fcbc1d20b6bca1936c0317743055ac75aef608ce", GitTreeState:"clean", BuildDate:"2019-06-19T20:50:07Z", GoVersion:"go1.11.5b4", Compiler:"gc", Platform:"linux/amd64"}

Scenario:

  • Helm install a chart with resources X, Y, and Z (it doesn't seem to matter which).
  • Helm upgrade chart to add resource W (in this case a CronJob) - success
  • $ helm upgrade --install --wait --timeout 600 --namespace myNamespace -f someValues.yaml test .
    Release "test" has been upgraded.
    LAST DEPLOYED: Wed Jul 17 10:14:58 2019
    NAMESPACE: myNamespace
    STATUS: DEPLOYED
    ==> v1beta1/CronJob
    NAME                          SCHEDULE     SUSPEND  ACTIVE  LAST SCHEDULE  AGE
    test-myCron * */1 * * *  False    0       <none>         6s
    
  • Helm upgrade again, with or without changing anything - failure
  • $ helm upgrade --install --wait --timeout 600 --namespace myNamespace -f someValues.yaml test .
    UPGRADE FAILED
    Error: kind CronJob with the name "test-myCron" already exists in the cluster and wasn't defined in the previous release. Before upgrading, please either delete the resource from the cluster or remove it from the chart
    Error: UPGRADE FAILED: kind CronJob with the name "test-myCron" already exists in the cluster and wasn't defined in the previous release. Before upgrading, please either delete the resource from the cluster or remove it from the chart
    

    I can delete the CronJob (or whatever resource was added) with kubectl and repeat steps 2 and 3 with the same results. Adding --debug doesn't add anything of value.

    It seems related to #1193 but, if I'm reading it correctly, in that issue the deployment would have failed in step 2.

    daern91, acbeni, anas-aso, manusinha27, Constantin07, thiago, jflorencio, lorenzo-cavazzi, kgrygiel, 27Bslash6, and 26 more reacted with thumbs up emoji daern91, macrozone, and yves-vogl reacted with confused emoji All reactions

    @goodman-sullivan I tried to reproduce the issue (using the scaffolded chart and a simple cronjob), but it seems to work OK for me. Here is the output of my investigation. Maybe you could provide more details to help me reproduce it?

    $ helm create chrt-6031
    Creating chrt-6031
    $ helm install --name chrt-6031 chrt-6031/
    NAME:   chrt-6031
    LAST DEPLOYED: Wed Jul 17 16:25:54 2019
    NAMESPACE: default
    STATUS: DEPLOYED
    RESOURCES:
    ==> v1/Deployment
    NAME       READY  UP-TO-DATE  AVAILABLE  AGE
    chrt-6031  0/1    0           0          1s
    ==> v1/Pod(related)
    NAME                       READY  STATUS   RESTARTS  AGE
    chrt-6031-6987788fb-gvwbv  0/1    Pending  0         0s
    ==> v1/Service
    NAME       TYPE       CLUSTER-IP    EXTERNAL-IP  PORT(S)  AGE
    chrt-6031  ClusterIP  10.96.111.27  <none>       80/TCP   1s
    ==> v1/ServiceAccount
    NAME       SECRETS  AGE
    chrt-6031  1        1s
    NOTES:
    1. Get the application URL by running these commands:
      export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=chrt-6031,app.kubernetes.io/instance=chrt-6031" -o jsonpath="{.items[0].metadata.name}")
      echo "Visit http://127.0.0.1:8080 to use your application"
      kubectl port-forward $POD_NAME 8080:80
    $ helm ls
    NAME      	REVISION	UPDATED                 	STATUS  	CHART          	APP VERSION	NAMESPACE
    chrt-6031 	1       	Wed Jul 17 16:25:54 2019	DEPLOYED	chrt-6031-0.1.0	1.0        	default
    $ vim chrt-6031/templates/cronjob.yaml
    $ cat chrt-6031/templates/cronjob.yaml
    apiVersion: batch/v1beta1
    kind: CronJob
    metadata:
      name: {{ include "chrt-6031.fullname" . }}
    spec:
      schedule: "*/1 * * * *"
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: hello
                image: busybox
                args:
                - /bin/sh
                - -c
                - date; echo Hello from the Kubernetes cluster
              restartPolicy: OnFailure
    $ helm upgrade chrt-6031 chrt-6031/
    Release "chrt-6031" has been upgraded.
    LAST DEPLOYED: Wed Jul 17 16:31:08 2019
    NAMESPACE: default
    STATUS: DEPLOYED
    RESOURCES:
    ==> v1/Deployment
    NAME       READY  UP-TO-DATE  AVAILABLE  AGE
    chrt-6031  1/1    1           1          5m14s
    ==> v1/Pod(related)
    NAME                       READY  STATUS   RESTARTS  AGE
    chrt-6031-6987788fb-gvwbv  1/1    Running  0         5m13s
    ==> v1/Service
    NAME       TYPE       CLUSTER-IP    EXTERNAL-IP  PORT(S)  AGE
    chrt-6031  ClusterIP  10.96.111.27  <none>       80/TCP   5m14s
    ==> v1/ServiceAccount
    NAME       SECRETS  AGE
    chrt-6031  1        5m14s
    ==> v1beta1/CronJob
    NAME       SCHEDULE     SUSPEND  ACTIVE  LAST SCHEDULE  AGE
    chrt-6031  */1 * * * *  False    0       <none>         0s
    NOTES:
    1. Get the application URL by running these commands:
      export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=chrt-6031,app.kubernetes.io/instance=chrt-6031" -o jsonpath="{.items[0].metadata.name}")
      echo "Visit http://127.0.0.1:8080 to use your application"
      kubectl port-forward $POD_NAME 8080:80
    $ helm ls
    NAME      	REVISION	UPDATED                 	STATUS  	CHART          	APP VERSION	NAMESPACE
    chrt-6031 	2       	Wed Jul 17 16:31:08 2019	DEPLOYED	chrt-6031-0.1.0	1.0        	default  
    $ kubectl get cronjobs
    NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
    chrt-6031   */1 * * * *   False     0        <none>          35s
    $ helm upgrade chrt-6031 chrt-6031/
    Release "chrt-6031" has been upgraded.
    LAST DEPLOYED: Wed Jul 17 16:32:05 2019
    NAMESPACE: default
    STATUS: DEPLOYED
    RESOURCES:
    ==> v1/Deployment
    NAME       READY  UP-TO-DATE  AVAILABLE  AGE
    chrt-6031  1/1    1           1          6m10s
    ==> v1/Pod(related)
    NAME                       READY  STATUS   RESTARTS  AGE
    chrt-6031-6987788fb-gvwbv  1/1    Running  0         6m9s
    ==> v1/Service
    NAME       TYPE       CLUSTER-IP    EXTERNAL-IP  PORT(S)  AGE
    chrt-6031  ClusterIP  10.96.111.27  <none>       80/TCP   6m10s
    ==> v1/ServiceAccount
    NAME       SECRETS  AGE
    chrt-6031  1        6m10s
    ==> v1beta1/CronJob
    NAME       SCHEDULE     SUSPEND  ACTIVE  LAST SCHEDULE  AGE
    chrt-6031  */1 * * * *  False    1       5s             56s
    NOTES:
    1. Get the application URL by running these commands:
      export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=chrt-6031,app.kubernetes.io/instance=chrt-6031" -o jsonpath="{.items[0].metadata.name}")
      echo "Visit http://127.0.0.1:8080 to use your application"
      kubectl port-forward $POD_NAME 8080:80
    $ helm ls
    NAME      	REVISION	UPDATED                 	STATUS  	CHART          	APP VERSION	NAMESPACE
    chrt-6031 	3       	Wed Jul 17 16:32:05 2019	DEPLOYED	chrt-6031-0.1.0	1.0        	default  
    $ kubectl get cronjobs
    NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
    chrt-6031   */1 * * * *   False     0        49s             100s

    We are experiencing the same issue, but with an HPA resource.
    EKS version: 1.11
    Helm version: 2.14.2

    While trying to update the deployment with a new version of the chart that adds an HPA resource, we get:

    helmfile sync
    Building dependency chart
    No requirements found in chart/charts.
    Upgrading chart
    UPGRADE FAILED
    Error: kind HorizontalPodAutoscaler with the name "some_service" already exists in the cluster and wasn't defined in the previous release. Before upgrading, please either delete the resource from the cluster or remove it from the chart

    Manually removing the HPA resource and re-running the deployment fixes the situation, but that is not acceptable for production.

    We are also facing the same kind of issue.

    helm upgrade --namespace monitoring -f config.yaml central-monitoring stable/prometheus-operator --version 6.1.1 --force  --debug
    [debug] Created tunnel using local port: 'xxxx'
    [debug] SERVER: "127.0.0.1:xxxx"
    [debug] Fetched stable/prometheus-operator to /Users/xxxx/.helm/v2.14.1/cache/archive/prometheus-operator-6.1.1.tgz
    UPGRADE FAILED
    Error: kind Service with the name "central-monitoring-prometh-kube-proxy" already exists in the cluster and wasn't defined in the previous release. Before upgrading, please either delete the resource from the cluster or remove it from the chart
    Error: UPGRADE FAILED: kind Service with the name "central-monitoring-prometh-kube-proxy" already exists in the cluster and wasn't defined in the previous release. Before upgrading, please either delete the resource from the cluster or remove it from the chart
    

    Even if we delete the service it throws the same error while upgrading.

    We just rolled back to previous successful release.

    Same here with prometheus-operator on my bare-metal cluster (v1.14), which has never had any problems and had no recent upgrades.

     helm upgrade --install mon \
      --namespace monitoring \
      -f values.yaml \
      stable/prometheus-operator
    UPGRADE FAILED
    Error: kind ConfigMap with the name "mon-prometheus-operator-apiserver" already exists in the cluster and wasn't defined in the previous release. Before upgrading, please either delete the resource from the cluster or remove it from the chart
    Error: UPGRADE FAILED: kind ConfigMap with the name "mon-prometheus-operator-apiserver" already exists in the cluster and wasn't defined in the previous release. Before upgrading, please either delete the resource from the cluster or remove it from the chart
              

    Does anyone have an idea of the nature/root cause of these errors?
    Unfortunately, I don't have enough knowledge of Go to dig into it myself.

    It just makes it impossible to use Helm in production.

    That error indicates that you introduced a new resource into the chart, but that resource already existed in the cluster. Helm does not "adopt" existing resources; if a resource exists before it was introduced in the chart, it's considered an error condition, hence the error message.

    To fix this, remove the resource from the cluster using kubectl delete, then call helm upgrade again.
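
    For the scenario in the original report, that would look roughly like this (a sketch only, reusing the names from the report; substitute your own release, namespace, and resource):

    # delete the resource Helm is complaining about, so it no longer pre-exists
    kubectl delete cronjob test-myCron --namespace myNamespace
    # then re-run the upgrade so Helm itself creates and starts tracking it
    helm upgrade --install --wait --timeout 600 --namespace myNamespace -f someValues.yaml test .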

    @bacongobbler I have not done a super deep inspection, but this is not the case,
    and somehow I'm not the only one who has this problem with prometheus-operator specifically.
    I'm deleting all the resources it asks about; it just never stops complaining about existing/non-existent resources.
    I update this chart almost as soon as new versions come out, and the latest update has no huge changes in chart structure:
    https://github.com/helm/charts/tree/0459eb2bdade42a88a44ee115184ba584bd3131c (pinned to the problematic commit)

    I will definitely try to inspect the YAML closely.

    But for now rollback has worked just fine.

    @bacongobbler I second @fernandoalex - I'm 100% sure that no new resources for the chart had been added to the cluster, manually or by any other means, before updating the chart.

    And it's not related to specific resource type - in one case it failed when we added an HPA, in another a config map.

    Given that Helm is quite widespread and widely used as a deployment tool in the Kubernetes ecosystem, I don't understand how people are using it in production ...

    Is it supposed to work like that? Break sporadically and require manual intervention to delete resources ...

    I'm seeing this issue too.
    Step 1> Deploy a chart with new CronJob objects added (they don't yet exist in the cluster). Deploy is successful.
    Step 2> The next deploy fails with the error:

    Error: UPGRADE FAILED: kind CronJob with the name "staging-sitemap-cron-job" already exists in the cluster and wasn't defined in the previous release. Before upgrading, please either delete the resource from the cluster or remove it from the chart
    Detected that there are existing resources.
    

    The only workaround we have found is to run helm del --purge on the release and re-deploy. We are using helm v2.14.3.
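
    A rough sketch of that last-resort workaround (destructive: it deletes the release history and the resources Helm manages; the release name, namespace, and chart path below are placeholders):

    helm del --purge <release-name>
    helm install --name <release-name> --namespace <namespace> -f values.yaml ./chart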


    I had this issue more than once, but this time I found something in common.
    It may have to do with changing a subchart's version and (this part I don't remember quite well) the upgrade failing for some reason.
    From that point, you'll run over this problem again and again.
    Hope this helps!

    Can anyone shed some more light on this? We are seeing this as well. The usual procedure is to delete the resource (in our case a CronJob) from the cluster manually and redeploy. This works occasionally, but we are also seeing this bug come back (even after multiple successful deploys).

    edit: using helm v2.14.1

    Try adding the --cleanup-on-fail flag to your helm upgrade invocations. This flag will automatically remove any new resources the chart introduced during a failed upgrade. See #4871 for more info
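
    As a sketch, applied to the invocation from the first report it would look something like this:

    helm upgrade --install --wait --timeout 600 --cleanup-on-fail \
      --namespace myNamespace -f someValues.yaml test .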

    Will try that. Which Helm versions include this feature?

    Unfortunately we are infrequently experiencing the same issue:

    helm secrets upgrade --atomic --cleanup-on-fail --timeout 3600 -i integ2 umbrella --namespace xxx -f values/envs/integ2/values.yaml -f values/envs/integ2/secrets.yaml -f control/values.yaml -f control/envs/integ2/values.yaml
    UPGRADE FAILED
    Error: kind PersistentVolumeClaim with the name "biocatch-log" already exists in the cluster and wasn't defined in the previous release. Before upgrading, please either delete the resource from the cluster or remove it from the chart
    ROLLING BACKError: kind ConfigMap with the name "alertmanager-jiramail" already exists in the cluster and wasn't defined in the previous release. Before upgrading, please either delete the resource from the cluster or remove it from the chart
    

    The thing is, both resources were provisioned by Helm, and neither object was touched during the current deployment (we always run a helm diff beforehand). We already use, and have always used, the --cleanup-on-fail feature.

    Installed Helm version v2.14.3
    Installed Tiller version v2.14.3
    

    What would happen if we deleted the latest Secrets (where Helm keeps track of the release info) by hand?
    Deleting the individual resources might be a valid option for now, but not in production, and surely not by wiping the complete namespace. Would a Tiller log be of any assistance in debugging?

    $ oc version
    Client Version: version.Info{Major:"4", Minor:"1+", GitVersion:"v4.1.0+b4261e0", GitCommit:"b4261e07ed", GitTreeState:"clean", BuildDate:"2019-05-18T05:40:34Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.0+b81c8f8", GitCommit:"b81c8f8", GitTreeState:"clean", BuildDate:"2019-09-05T14:19:18Z", GoVersion:"go1.9.7", Compiler:"gc", Platform:"linux/amd64"}
              

    This continues to happen to us consistently. This workaround seems to work (a rough command sketch follows the list):

  • upgrade with the new resource
  • upgrade again (it fails)
  • rollback the failed upgrade
  • subsequent upgrades work for me; I didn't see anything interesting in the log
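
    A rough sketch of those steps (release name, values file, and revision number are hypothetical; check helm history for the actual revision to roll back to):

    helm upgrade --install -f someValues.yaml test .    # upgrade that introduces the new resource: succeeds
    helm upgrade --install -f someValues.yaml test .    # next upgrade: fails with "already exists ..."
    helm history test                                   # note the last good DEPLOYED revision, e.g. 2
    helm rollback test 2                                # roll back the failed upgrade
    helm upgrade --install -f someValues.yaml test .    # subsequent upgrades succeed again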

    STDERR: Error: UPGRADE FAILED: kind PodDisruptionBudget with the name "istio-sidecar-injector" already exists in the cluster and wasn't defined in the previous release. Before upgrading, please either delete the resource from the cluster or remove it from the chart

    I am experiencing this problem as well.

    UPGRADE FAILED
    Error: kind Service with the name "XXXXXXXXXXXX" already exists in the cluster and wasn't defined in the previous release. Before upgrading, please either delete the resource from the cluster or remove it from the chart
    

    Edit:
    Ok, so I have found a temporary solution (which may not work for everyone), but at least it's better than doing a helm delete --purge, which will delete all of your data. You can do:

    helm rollback <release-name> 0
    

    to rollback to the previous revision of your helm deployment and then do a:

    helm upgrade <release-name> .
    

    This worked for me.

    Ah, I finally understood why this error repeats even when we delete the service account.
    What really happens is that when you apply your file with

    helm upgrade -f

    it creates the first service account without any problem (that's why there is some delay before it responds), then it moves on to create the next one, but surprise: Kubernetes finds that the service account is already there. So to fix this, simply don't declare your service account more than once in your values.yaml file (this only applies to my case).

    This seems to be caused by Tiller's database getting corrupted. For example, on one of my clusters:

    $ kubectl get cm -n kube-system -l OWNER=TILLER,STATUS=DEPLOYED 
    NAME       DATA   AGE
    foo.v4     1      8m12s
    bar.v206   1      8h
    bar.v207   1      8h
    bar.v99    1      33d
    

    In other words, Tiller's internals might take either v99, v206 or v207 as the DEPLOYED version, which leads to the error above.

    I ended up (brutally) deleting the extra CMs: kubectl delete -n kube-system cm bar.v99.
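
    A sketch of how to inspect this (assuming Tiller's usual NAME and STATUS labels on its release ConfigMaps; the release name "bar" follows the example above):

    # list every stored revision of the release together with its status label
    kubectl get cm -n kube-system -l OWNER=TILLER,NAME=bar --show-labels
    # if more than one revision is labelled STATUS=DEPLOYED, delete the older ones,
    # keeping only the most recent DEPLOYED revision, e.g.:
    kubectl delete cm -n kube-system bar.v99 bar.v206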

    For some reason, many users experience corrupted storage with the
    ConfigMaps storage backend. Specifically, several Releases are marked as
    DEPLOYED. This patch improves handling of such situations by taking the latest
    DEPLOYED Release. Eventually, the storage will clean itself out, after
    the corrupted Releases are deleted due to --history-max.
    Closes helm#6031
    Signed-off-by: Cristian Klein <[email protected]>
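
    For reference, a hedged sketch of how --history-max is typically set with Helm 2 (the limit of 10 is just an example; --upgrade redeploys an already-installed Tiller with the new setting):

    helm init --upgrade --history-max 10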