
Buscando consejos sobre cómo diagnosticar una carga alta de espacio en el sistema/kernel.
Configuración
- Docker CE en CentOS 7 (sin SELinux)
- Carga de trabajo de IA/ML basada en Python en contenedores.
- El multiprocesamiento de Python se utiliza con ((n/2)-2) trabajadores del grupo
- n => núcleos cpu/vcpu (1 hiperproceso)
Problema general
- En el contenedor Bare metal de nuestra imagen acoplable se ejecuta como se esperaba.
- En el contenedor GCP basado en la misma imagen se ejecuta lento, 3 o 4 veces más lento.
- Los AI Pipelines que originalmente demoraban 2 días ahora demoran 4/5 días.
Cuestión específica
- GCP superior -> alta carga de espacio del sistema/kernel (30-50).
- Parte superior de metal desnudo (BM) -> carga baja del sistema (1-10).
- Procesos individuales de BM como máximo 100-120 %CPU.
- Procesos individuales de GCP entre 300 y 500 % de CPU.
- Promedio de carga de BM 11,85, 32,87, 19,43
- PCG LA 43,82, 32,87, 19,43
Lo que intentamos
- Establecer privilegios completos = verdadero al ejecutar la ventana acoplable
- Duplicó la vCPU en GCP (32->64)
- Máquina usada tipo C2 (Compute Intensive).
Como no tengo idea de cómo diagnosticar este problema de arriba hacia abajo, también probé perf para ver qué está sucediendo en el espacio del kernel/sistema, a continuación se muestra el resultado de ambos sistemas.
GCP VM 'rastreo de rendimiento'
Summary of events:
python (2551), 6580977 events, 3.9%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
sched_yield 3232286 50396.443 0.002 0.016 68.851 0.69%
futex 58139 23715.636 0.001 0.408 21989.789 92.73%
brk 1643 312.000 0.002 0.190 19.195 13.42%
close 5 35.710 3.630 7.142 18.245 39.07%
openat 4 12.764 2.710 3.191 3.767 7.16%
write 165 9.681 0.003 0.059 6.496 66.95%
ioctl 85 0.605 0.002 0.007 0.028 9.58%
lseek 182 0.460 0.002 0.003 0.004 1.70%
read 10 0.104 0.003 0.010 0.019 15.84%
fstat 4 0.030 0.006 0.007 0.008 6.21%
python (2847), 10240661 events, 6.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 2909 269175.495 0.001 92.532 21907.964 13.70%
sched_yield 5119771 63387.192 0.002 0.012 98.500 0.60%
ython (2848), 10952416 events, 6.4%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 2944 269333.572 0.001 91.486 21908.135 13.70%
sched_yield 5475608 62191.735 0.002 0.011 54.373 0.56%
python (2849), 10095381 events, 5.9%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 2869 269495.562 0.001 93.934 21907.785 13.69%
sched_yield 5046875 63404.111 0.002 0.013 69.027 0.53%
python (2850), 10871629 events, 6.4%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 3379 269541.688 0.001 79.770 21908.129 13.71%
sched_yield 5434655 61881.631 0.002 0.011 92.192 0.66%
python (2851), 10471715 events, 6.2%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 3267 269665.818 0.001 82.542 21908.306 13.70%
sched_yield 5234762 62440.899 0.002 0.012 70.764 0.59%
python (2852), 11048544 events, 6.5%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 3075 269845.352 0.001 87.755 21908.402 13.69%
sched_yield 5523384 61269.561 0.002 0.011 77.207 0.58%
python (2853), 10802913 events, 6.4%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 3411 269995.415 0.001 79.154 21907.717 13.70%
sched_yield 5400121 61586.390 0.000 0.011 63.498 0.63%
python (2854), 10580227 events, 6.2%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 3237 270029.557 0.001 83.420 21907.737 13.69%
sched_yield 5289038 62372.649 0.002 0.012 50.329 0.56%
python (2855), 11269016 events, 6.6%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 3309 270088.861 0.001 81.623 21908.143 13.69%
sched_yield 5633482 61010.474 0.002 0.011 59.756 0.54%
python (2856), 11194619 events, 6.6%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 3355 270348.851 0.001 80.581 21917.988 13.68%
sched_yield 5596230 61122.634 0.002 0.011 52.892 0.60%
python (2857), 11236749 events, 6.6%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 2559 273140.199 0.001 106.737 21916.745 13.53%
sched_yield 5618107 61171.202 0.002 0.011 74.198 0.59%
python (2858), 10328444 events, 6.1%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 2759 270424.625 0.001 98.015 21916.130 13.66%
sched_yield 5163708 62490.561 0.002 0.012 95.956 0.65%
python (2859), 11507002 events, 6.8%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 3065 270481.725 0.001 88.249 21910.849 13.67%
sched_yield 5752618 60412.759 0.002 0.011 96.043 0.69%
python (2860), 11013249 events, 6.5%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 3254 270691.944 0.001 83.187 21912.965 13.67%
sched_yield 5505249 60965.211 0.002 0.011 69.026 0.62%
python (2861), 11793518 events, 6.9%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 2851 270700.999 0.001 94.949 21911.187 13.66%
sched_yield 5896062 60165.571 0.002 0.010 69.030 0.69%
python (2862), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (2863), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (2864), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (2865), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
ARRIBA (VM de GCP)
top - 11:09:17 up 3 min, 1 user, load average: 35.10, 12.57, 4.53
Tasks: 337 total, 15 running, 322 sleeping, 0 stopped, 0 zombie
%Cpu(s): 52.0 us, 43.1 sy, 0.0 ni, 4.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
GiB Mem : 28.1 total, 24.3 free, 3.3 used, 0.6 buff/cache
GiB Swap: 0.0 total, 0.0 free, 0.0 used. 24.5 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2501 root 20 0 2955128 169656 7612 R 380.5 0.6 3:24.55 python
2502 root 20 0 2959472 176016 7612 R 307.6 0.6 3:17.58 python
2495 root 20 0 2959088 176400 7608 R 288.4 0.6 3:35.93 python
2496 root 20 0 2933612 151452 7600 R 262.9 0.5 3:27.08 python
2494 root 20 0 2933356 153248 7604 R 253.3 0.5 3:22.47 python
2489 root 20 0 2949156 166916 7652 R 250.7 0.6 3:13.78 python
2500 root 20 0 2941704 153928 7632 R 225.5 0.5 3:24.54 python
2499 root 20 0 2950132 164576 7620 R 210.6 0.6 3:14.79 python
2493 root 20 0 2936304 151568 7608 R 177.2 0.5 3:28.18 python
2490 root 20 0 2949248 169748 7596 R 170.9 0.6 3:16.57 python
2497 root 20 0 2932684 156128 7600 R 167.9 0.5 3:13.12 python
2491 root 20 0 2942860 165400 7608 R 149.0 0.6 3:17.05 python
2492 root 20 0 2947632 163008 7604 R 117.2 0.6 3:20.01 python
2498 root 20 0 2935784 153348 7600 R 76.2 0.5 3:14.54 python
2943 root 20 0 162376 2560 1588 R 0.7 0.0 0:00.08 top
1 root 20 0 191028 4024 2592 S 0.0 0.0 0:01.68 systemd
Bare Metal 'rastreo de perforación'
Summary of events:
python (113694), 88758 events, 2.7%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 3174 13767.708 0.001 4.338 13745.177 99.84%
sched_yield 39353 130.937 0.002 0.003 3.972 3.91%
brk 1132 91.005 0.002 0.080 4.163 15.54%
write 101 1.928 0.002 0.019 0.673 34.59%
lseek 110 0.310 0.001 0.003 0.032 12.49%
openat 2 0.234 0.112 0.117 0.122 4.49%
ioctl 36 0.162 0.002 0.005 0.010 7.15%
read 6 0.033 0.003 0.005 0.008 16.09%
fstat 2 0.008 0.004 0.004 0.004 2.01%
close 3 0.007 0.002 0.002 0.003 5.55%
python (114304), 3210496 events, 97.3%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 3640 97379.595 0.000 26.753 13930.078 20.68%
sched_yield 1599811 3995.576 0.001 0.002 15.957 0.91%
python (114305), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114306), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114307), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114308), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114309), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114310), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114311), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114312), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114313), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114314), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114315), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114316), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114317), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114318), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114319), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114320), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114321), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114322), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114323), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114324), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114325), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114326), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114327), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114328), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114329), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114330), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114331), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114332), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114333), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114334), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114335), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114336), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114337), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114338), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114339), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114340), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
python (114341), 1 events, 0.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
futex 1 0.000 0.000 0.000 0.000 0.00%
Aparentemente hay demasiados eventos en GCP, específicamente futex
y sched_yield
, y ambos claramente consumen mucho tiempo en el espacio del sistema.
Gracias