In Kubernetes, the central scheduler assigns workloads to nodes. Once a Pod has been scheduled onto a node, the kubelet there asks the container runtime whether the required image is already in the node's local cache. If not, the image is pulled from the registry and the container is then started. If the image is already present, the cached version is used by default: this is the IfNotPresent pull policy, which Kubernetes applies to any image with an explicit tag other than :latest.
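The decision made for each container start can be modeled roughly as follows. This is a simplified sketch, not the kubelet's actual code: the names `PullPolicy`, `default_policy` and `should_pull` are hypothetical, the image cache is assumed to be a plain set of image references, and the tag parsing ignores edge cases. The defaulting rule follows the documented Kubernetes behavior: `Always` for `:latest` or for images without tag and digest, otherwise `IfNotPresent`.

```python
from __future__ import annotations

from enum import Enum


class PullPolicy(Enum):
    ALWAYS = "Always"
    IF_NOT_PRESENT = "IfNotPresent"
    NEVER = "Never"


def default_policy(image: str) -> PullPolicy:
    """Hypothetical model of the default applied when imagePullPolicy is omitted."""
    name, _, digest = image.partition("@")
    # Only the last path segment can carry a tag; earlier colons belong to a registry port.
    last_segment = name.rsplit("/", 1)[-1]
    tag = last_segment.split(":", 1)[1] if ":" in last_segment else None
    if tag == "latest" or (tag is None and not digest):
        return PullPolicy.ALWAYS
    return PullPolicy.IF_NOT_PRESENT


def should_pull(policy: PullPolicy, image: str, cache: set[str]) -> bool:
    """Return True if the node has to contact the registry before starting the container."""
    if policy is PullPolicy.ALWAYS:
        return True
    if policy is PullPolicy.NEVER:
        return False
    # IfNotPresent: only pull when the reference is missing from the local cache.
    return image not in cache


cache = {"scraper:27.15"}
policy = default_policy("scraper:27.15")
print(policy, should_pull(policy, "scraper:27.15", cache))
```

With the reference already cached, `should_pull` returns False under the default policy, so whatever build sits in the cache is started.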
In the example scenario, the old and the new build were published under the same image name and tag (scraper:27.15). The image in question was a log scraper. The old version collected logs via the Docker daemon. The cluster, however, had switched to containerd some time earlier, and only the updated image talks to containerd. This dependency on the runtime was not reflected in the image name, so nodes that still had the old build cached kept running it. The short-term fix (or workaround) was to change the ImagePullPolicy in the corresponding DaemonSet from IfNotPresent to Always. With that, everything worked again.
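The failure mode can be replayed with a toy model in which the registry maps a tag to a digest and the node remembers which digest it resolved a tag to. The dictionaries, the function `start_container` and the digest strings are hypothetical placeholders; only the policy names are taken from Kubernetes.

```python
from __future__ import annotations


def start_container(policy: str, ref: str, registry: dict[str, str], node_cache: dict[str, str]) -> str:
    """Return the digest the node would run for `ref` (hypothetical model)."""
    if policy == "IfNotPresent" and ref in node_cache:
        # Cache hit: the registry is never consulted, so a moved tag goes unnoticed.
        return node_cache[ref]
    # Always (or a cache miss): resolve the tag against the registry again.
    digest = registry[ref]
    node_cache[ref] = digest
    return digest


registry = {"scraper:27.15": "sha256:old-docker-build"}  # placeholder digests
node_cache: dict[str, str] = {}

first = start_container("IfNotPresent", "scraper:27.15", registry, node_cache)
registry["scraper:27.15"] = "sha256:new-containerd-build"  # tag overwritten in place
stale = start_container("IfNotPresent", "scraper:27.15", registry, node_cache)
fresh = start_container("Always", "scraper:27.15", registry, node_cache)
print(first, stale, fresh)
```

`stale` still holds the old digest, while `fresh` picks up the new one, which mirrors the effect of switching the DaemonSet to Always.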
Beyond this workaround, two best practices would have prevented the problem altogether: 1) reference images by their SHA256 checksum (digest) instead of a tag, and 2) enforce ImagePullPolicy: Always cluster-wide.
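A digest is a SHA256 checksum computed over the image's content (strictly, over its manifest) and therefore identifies that content uniquely. Because the two image versions have different digests, referencing the image by digest would have forced the new version to be pulled. Unlike tags, which can be moved to point at different content, as happened in this example, digests are immutable.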
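A minimal sketch of why a digest reference cannot go stale: the reference is derived from the content itself, so different content yields a different reference even when the tag stays the same. It assumes that hashing the raw manifest bytes with SHA256 is all that matters, which matches how OCI registries compute digests but skips fetching real manifests. The helpers `digest_of`, `pin` and `is_pinned` and the manifest bytes are hypothetical.

```python
import hashlib
import re

# A digest reference ends in "@sha256:" followed by 64 lowercase hex characters.
DIGEST_REF = re.compile(r"@sha256:[0-9a-f]{64}$")


def digest_of(manifest: bytes) -> str:
    """SHA256 digest in the "sha256:<hex>" form used by OCI registries."""
    return "sha256:" + hashlib.sha256(manifest).hexdigest()


def pin(repository: str, manifest: bytes) -> str:
    """Build an immutable reference such as "scraper@sha256:<hex>"."""
    return f"{repository}@{digest_of(manifest)}"


def is_pinned(ref: str) -> bool:
    """True if the reference ends in a full SHA256 digest."""
    return DIGEST_REF.search(ref) is not None


# Hypothetical manifest contents standing in for the two scraper builds.
old_manifest = b'{"example": "docker-based scraper"}'
new_manifest = b'{"example": "containerd-based scraper"}'

print(pin("scraper", old_manifest))
print(pin("scraper", new_manifest))
print(is_pinned("scraper:27.15"), is_pinned(pin("scraper", new_manifest)))
```

The two pinned references differ, so a node that only has the old build cached would not find the new reference locally and would pull it.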
Pulling the image before every container start would also have helped, and it has a further advantage. (With Always, the kubelet resolves the image against the registry on every start and only downloads it if the cached copy differs.) Suppose several clients share the same cluster. Client A launches a Deployment that uses a particular image, image A. The container runtime pulls the image from the registry using client A's credentials. As a rule, client B should not have access to A's images, whether for security or licensing reasons. Once the image is present on the node, however, B could start containers from it without authenticating anywhere. Enforcing ImagePullPolicy: Always closes this gap, because every start then requires a successful registry request with the requesting client's own credentials.
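The tenant-isolation argument can be sketched with a toy registry that checks an access list on every pull and a node that shares one image cache between tenants. `Registry`, `Node`, `AccessDenied` and the tenant names are hypothetical; in a real cluster the credentials would come from each tenant's image pull secrets, which this model leaves out.

```python
from __future__ import annotations


class AccessDenied(Exception):
    """Raised by the hypothetical registry when a tenant lacks pull rights."""


class Registry:
    def __init__(self, acl: dict[str, set[str]]):
        self.acl = acl  # image reference -> tenants allowed to pull it

    def pull(self, tenant: str, image: str) -> None:
        if tenant not in self.acl.get(image, set()):
            raise AccessDenied(f"{tenant} may not pull {image}")


class Node:
    def __init__(self, registry: Registry):
        self.registry = registry
        self.cache: set[str] = set()  # shared by all tenants on this node

    def start(self, tenant: str, image: str, policy: str) -> bool:
        """Return True if the container may start under the given pull policy."""
        if policy == "IfNotPresent" and image in self.cache:
            return True  # cache hit: the tenant's credentials are never checked
        try:
            self.registry.pull(tenant, image)
        except AccessDenied:
            return False
        self.cache.add(image)
        return True


registry = Registry({"image-a:1.0": {"client-a"}})
node = Node(registry)
print(node.start("client-a", "image-a:1.0", "IfNotPresent"))  # A pulls with its own credentials
print(node.start("client-b", "image-a:1.0", "IfNotPresent"))  # B reuses A's cached image
print(node.start("client-b", "image-a:1.0", "Always"))        # the registry rejects B
```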
For the current customer, both measures are recommended. Whether to use digests can be decided by each team and should be discussed there, in particular where digests would be used and whether this changes the team's workflow. Enforcing the ImagePullPolicy means that a cached image is never used without first checking it against the registry. This decision is made centrally, has a somewhat broader impact, requires fewer changes to the workflow and, not least, is recommended for security reasons.
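Kubernetes ships an admission plugin, AlwaysPullImages, that enforces this central policy by setting `imagePullPolicy: Always` on the containers of every new Pod; it is enabled through the kube-apiserver `--enable-admission-plugins` flag. The sketch below only imitates that effect on a Pod manifest held as a Python dict, and adds a hypothetical check a team could use to find images not yet pinned by digest. The function names and the sample manifest are illustrative assumptions, and ephemeral containers are ignored.

```python
from __future__ import annotations

import copy
import re

DIGEST_REF = re.compile(r"@sha256:[0-9a-f]{64}$")
CONTAINER_KEYS = ("initContainers", "containers")


def enforce_always_pull(pod: dict) -> dict:
    """Return a copy of the Pod with imagePullPolicy forced to Always (hypothetical helper)."""
    patched = copy.deepcopy(pod)
    spec = patched.get("spec", {})
    for key in CONTAINER_KEYS:
        for container in spec.get(key, []):
            container["imagePullPolicy"] = "Always"
    return patched


def unpinned_images(pod: dict) -> list[str]:
    """List container images that are referenced by tag rather than by digest."""
    spec = pod.get("spec", {})
    return [
        container["image"]
        for key in CONTAINER_KEYS
        for container in spec.get(key, [])
        if not DIGEST_REF.search(container["image"])
    ]


pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "scraper"},
    "spec": {
        "containers": [
            {"name": "scraper", "image": "scraper:27.15", "imagePullPolicy": "IfNotPresent"}
        ]
    },
}

patched = enforce_always_pull(pod)
print(patched["spec"]["containers"][0]["imagePullPolicy"])
print(unpinned_images(pod))
```

Running the sample prints `Always` and `['scraper:27.15']`. Working on a deep copy keeps the original manifest untouched, much as an admission plugin changes only the object being admitted.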