Kubernetes fails to create a checkpoint using CRIU #2594

Open
jmLee4 opened this issue Feb 9, 2025 · 5 comments

@jmLee4

jmLee4 commented Feb 9, 2025

When sending a POST request to kubelet, I get the following response:

checkpointing of default/user-5c57b749d8-4bx8t/hotel-reserv-user failed (rpc error: code = Unknown desc = failed to checkpoint container 87ea0f1e6cf80fca7c22afee97cbea6ba749686c6942a0e86d22b42523562c7d: running "/usr/bin/runc" ["checkpoint" "--file-locks" "--image-path" "/var/lib/containers/storage/overlay-containers/87ea0f1e6cf80fca7c22afee97cbea6ba749686c6942a0e86d22b42523562c7d/userdata/checkpoint" "--work-path" "/var/lib/containers/storage/overlay-containers/87ea0f1e6cf80fca7c22afee97cbea6ba749686c6942a0e86d22b42523562c7d/userdata" "--leave-running" "87ea0f1e6cf80fca7c22afee97cbea6ba749686c6942a0e86d22b42523562c7d"] failed: /usr/bin/runc --root /run/runc --systemd-cgroup checkpoint --file-locks --image-path /var/lib/containers/storage/overlay-containers/87ea0f1e6cf80fca7c22afee97cbea6ba749686c6942a0e86d22b42523562c7d/userdata/checkpoint --work-path /var/lib/containers/storage/overlay-containers/87ea0f1e6cf80fca7c22afee97cbea6ba749686c6942a0e86d22b42523562c7d/userdata --leave-running 87ea0f1e6cf80fca7c22afee97cbea6ba749686c6942a0e86d22b42523562c7d failed: time="2025-02-10T01:38:52+08:00" level=error msg="criu failed: type NOTIFY errno 0\nlog file: /var/lib/containers/storage/overlay-containers/87ea0f1e6cf80fca7c22afee97cbea6ba749686c6942a0e86d22b42523562c7d/userdata/dump.log"

Strangely, when I try to create a checkpoint of a simple Nginx application, most attempts succeed, but I occasionally hit this error. Could this be an environmental issue?

Here is dump.log:
dump.log

Here is some potentially useful environment information:

CRIU version: 3.17.1

$ criu check --all -v4
...
(00.016909) netlink: Collect netlink sock 0x34d7
(00.016911) netlink: Collect netlink sock 0x26
(00.016912) netlink: Collect netlink sock 0x6094
(00.016914) netlink: Collect netlink sock 0x20
(00.016919) Warn  (criu/cr-check.c:1231): clone3() with set_tid not supported
(00.016921) Error (criu/cr-check.c:1273): Time namespaces are not supported
(00.016923) Error (criu/cr-check.c:1283): IFLA_NEW_IFINDEX isn't supported
(00.016925) Warn  (criu/cr-check.c:1300): Pidfd store requires pidfd_open syscall which is not supported
(00.016967) Warn  (criu/cr-check.c:1334): Nftables based locking requires libnftables and set concatenations support
(00.016970) Warn  (criu/cr-check.c:804): ptrace(PTRACE_GET_RSEQ_CONFIGURATION) isn't supported. C/R of processes which are using rseq() won't work.
(00.017046) Warn  (criu/cr-check.c:1160): compat_cr is not supported. Requires kernel >= v4.12
Looks good but some kernel features are missing
which, depending on your process tree, may cause
dump or restore failure.

Other environment information:
$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
$ uname -a
Linux host5 3.10.0-1160.119.1.el7.x86_64 #1 SMP Tue Jun 4 14:43:51 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
$ crio --version
crio version 1.28.4
Version:        1.28.4
GitCommit:      unknown
GitCommitDate:  unknown
GitTreeState:   clean
BuildDate:      2024-06-10T21:59:35Z
GoVersion:      go1.20.12
Compiler:       gc
Platform:       linux/amd64
Linkmode:       dynamic
BuildTags:
  rpm_crashtraceback
  exclude_graphdriver_btrfs
  btrfs_noversion
  exclude_graphdriver_devicemapper
  libdm_no_deferred_remove
  seccomp
  selinux
  containers_image_openpgp
LDFlags:          -X github.com/cri-o/cri-o/internal/pkg/criocli.DefaultsPath= -X  github.com/cri-o/cri-o/internal/version.buildDate=2024-06-10T21:59:35Z -X  github.com/cri-o/cri-o/internal/version.gitCommit=c5fc2a463053cf988db2aebe9b762700484922e5 -X  github.com/cri-o/cri-o/internal/version.version=1.28.4 -X  github.com/cri-o/cri-o/internal/version.gitTreeState=clean  -B 0x63be025121120488d3d1f85d94f9095e97f334a8 -extldflags '-Wl,-z,relro  ' -compressdwarf=false
SeccompEnabled:   true
AppArmorEnabled:  false
@jmLee4
Author

jmLee4 commented Feb 9, 2025

@adrianreber Could you take a look at this? Thank you.

@adrianreber
Member

Don't use CentOS 7. It is EOL and was never a supported platform for CRIU.

In your case the container has an established TCP connection. Currently the only way to handle this is to use a CRIU configuration file that tells CRIU to checkpoint established TCP connections.
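
As a rough sketch of what such a file might contain: the fragment below assumes that runc hands CRIU an extra configuration file at /etc/criu/runc.conf (this path is not named anywhere in this thread) and that the relevant setting is CRIU's `--tcp-established` option, written in config-file form without the leading dashes. Treat both as assumptions to verify against the CRIU man page and wiki.

```
# /etc/criu/runc.conf  (assumed location; not confirmed in this thread)
# CRIU config files list long option names, one per line, without the leading "--".
tcp-established
```

The same option normally has to be in effect on restore as well. Whether this is enough on a 3.10 kernel with the missing features reported by `criu check` is not something this thread establishes.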

@jmLee4
Author

jmLee4 commented Feb 9, 2025

I will consider moving away from CentOS 7 later, but for now, how should I use a CRIU configuration file?

@adrianreber
Member

Take a look at the man page or the wiki. It is all documented there.

@jmLee4
Author

jmLee4 commented Feb 9, 2025

I'm not quite sure which options should be added to the CRIU configuration file to resolve this issue, or whether creating checkpoints through kubelet would still work with that configuration in place. Could you elaborate on this in more detail?
