$ oc get pods 
NAME                                         READY  STATUS                RESTARTS  AGE
kube-controller-manager-master1.example.com  3/4    CreateContainerError  0         18h
kube-controller-manager-master2.example.com  4/4    Running               0         12m
kube-controller-manager-master3.example.com  4/4    Running               0         18h
  • oc describe shows kubelet errors reporting that the executable file was not found in $PATH:
    Events:
      Type     Reason  Age                     From     Message
      ----     ------  ----                    ----     -------
      Warning  Failed  59m (x8706 over 17h)    kubelet  (combined from similar events): Error: container create failed: time="2021-04-11T21:03:07Z" level=error msg="container_linux.go:366: starting container process caused: exec: \"cluster-kube-scheduler-operator\": executable file not found in $PATH"
      Normal   Pulled  4m10s (x4600 over 17h)  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6468c1dd1ca2d855e171dda54efcb56b8915ba65f9b915899d922c8720d8e7e1" already present on machine
    
  • The issue can affect one or more images.
  • The errors are normally confined to a specific node.
  • Deleting and re-downloading the image does not resolve the issue.
  • Running the affected image with podman on the node produces a different error:
    $ podman run 2810ace6e1fe
    readlink /var/lib/containers/storage/overlay: invalid argument
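
To see which pods show this symptom, the STATUS column of the oc get pods table can be filtered. The following is a minimal sketch, not part of the original solution: the function name pods_with_create_error is hypothetical, and it assumes the default table output of oc get pods (a header row containing NAME and STATUS columns).

```shell
# Print the names of pods whose STATUS column reads CreateContainerError.
# Reads `oc get pods` table output on stdin. The NAME and STATUS columns are
# located from the header row, so an extra leading NAMESPACE column
# (as printed with --all-namespaces) is handled as well.
pods_with_create_error() {
  awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "NAME")   name_col = i
        if ($i == "STATUS") status_col = i
      }
      next
    }
    status_col && $status_col == "CreateContainerError" { print $name_col }
  '
}

# Usage on a live cluster (assumes a logged-in oc session):
#   oc get pods --all-namespaces | pods_with_create_error
```

If the listed pods all run on the same node, that node is the candidate for the workaround in the Resolution section.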
          

    Resolution

    This issue is being tracked in Red Hat Bugzilla 1950536.

    The workaround is to reboot the affected node and delete all images under /var/lib/containers/storage. The steps are:

  • Drain the node with the problematic images:
    $ oc adm drain master1.example.com --ignore-daemonsets --delete-local-data --force --grace-period=1
    node/master1.example.com cordoned
    WARNING: ignoring DaemonSet-managed Pods: openshift-cluster-node-tuning-operator/tuned-859bg, openshift-controller-manager/controller-manager-2bvrq, openshift-dns/dns-default-d995f, openshift-image-registry/node-ca-xrw5r, openshift-machine-config-operator/machine-config-daemon-dxj98, openshift-machine-config-operator/machine-config-server-q7gpv, openshift-monitoring/node-exporter-jzxvt, openshift-multus/multus-74zhp, openshift-multus/multus-admission-controller-xqj2r, openshift-multus/network-metrics-daemon-vrst2, openshift-sdn/ovs-9vvlq, openshift-sdn/sdn-controller-m6kz9, openshift-sdn/sdn-psnlt
    evicting pod openshift-image-registry/cluster-image-registry-operator-548576fb5b-frmfp
    evicting pod openshift-apiserver-operator/openshift-apiserver-operator-67fd49986d-9tdmf
    evicting pod openshift-apiserver/apiserver-7f54fbf8f6-psv55
    evicting pod openshift-authentication-operator/authentication-operator-74c6b567fb-bx5h6
    pod/apiserver-64f575f4f6-cr99f evicted
    node/ocp46ipi-t46gj-master-0 evicted
    
  • SSH to the node, disable the crio and kubelet services, and reboot:
    $ systemctl disable crio; systemctl disable kubelet; reboot
    
  • Once the node has restarted, SSH to it again and delete the container storage directories, then enable and start the crio and kubelet services. As the root user, run:
    $ rm -rf /var/lib/containers/storage/*
    $ systemctl enable crio; systemctl enable kubelet
    Created symlink /etc/systemd/system/multi-user.target.wants/crio.service → /usr/lib/systemd/system/crio.service.
    Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /etc/systemd/system/kubelet.service.
    $ systemctl start crio; systemctl start kubelet
    
  • Wait a few minutes and check that the containers are running again:
    $ crictl ps
    CONTAINER           IMAGE                                                                                                                    CREATED             STATE               NAME                    ATTEMPT             POD ID
    96afb20435c62       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5f3da24f9a2383afa1cf31d707cdcd03df0e21084523d17373f74d03349700ff   15 seconds ago      Running             sdn-controller          0                   1862d340c8fe8
    984dfa6a1f4f2       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5f3da24f9a2383afa1cf31d707cdcd03df0e21084523d17373f74d03349700ff   15 seconds ago      Running             openvswitch             0                   ba8fb32dcdf30
    5fb941a0a4c4d       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5f3da24f9a2383afa1cf31d707cdcd03df0e21084523d17373f74d03349700ff   15 seconds ago      Running             sdn                     0                   2c6bc4f7d59b9
    9b2b03880dd6c       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9b58995c876bcb431e0f1d54d611a8b8e9cb7a60744a9df0a9193786d8865020   22 seconds ago      Running             machine-config-server   0                   454b048591d4c
    4cd8d971f9a6c       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9b58995c876bcb431e0f1d54d611a8b8e9cb7a60744a9df0a9193786d8865020   22 seconds ago      Running             machine-config-daemon   0                   490c2493f6b0e
    d128e50e478be       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c6b99fa7f1114aac1818c48eab061f13d7c0d02d70f2308c36b03a4dcda20282   27 seconds ago      Running             kube-rbac-proxy         0                   2d95fa31b9536
    
  • Uncordon the node.
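
The original text does not show the command for the last step. As a hedged sketch, oc adm uncordon is the standard counterpart of the drain command used earlier, and the result can be checked from the STATUS column of oc get nodes, which reads "Ready,SchedulingDisabled" while a node is still cordoned. The helper name node_is_schedulable is hypothetical, and the node name is the one used in the drain example.

```shell
# Succeed (exit 0) only if the named node is listed with STATUS exactly "Ready",
# i.e. it is Ready and no longer marked SchedulingDisabled.
# Reads `oc get nodes` table output on stdin; $1 is the node name.
node_is_schedulable() {
  awk -v node="$1" '
    NR > 1 && $1 == node {
      found = 1
      ok = ($2 == "Ready")
    }
    END {
      if (found && ok) exit 0
      exit 1
    }
  '
}

# Usage on a live cluster (assumes a logged-in oc session):
#   oc adm uncordon master1.example.com
#   oc get nodes | node_is_schedulable master1.example.com && echo "node is schedulable again"
```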

    Diagnostic Steps

    Trying to execute podman run with the problematic image gives a different error:

    $ podman run 2810ace6e1fe
    readlink /var/lib/containers/storage/overlay: invalid argument
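
This check can be scripted by matching the error text in the podman output. The following is only a sketch: the function name has_overlay_readlink_error is hypothetical, and the pattern is taken from the message shown above, so a differently worded error in another podman version would not match.

```shell
# Succeed (exit 0) if the text on stdin contains the overlay readlink error
# shown in this article.
has_overlay_readlink_error() {
  grep -q 'readlink /var/lib/containers/storage/overlay.*: invalid argument'
}

# Usage on the affected node (the image ID is the one from the example above):
#   podman run 2810ace6e1fe 2>&1 | has_overlay_readlink_error \
#     && echo "matches the storage error described in this article"
```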
                          

    This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
