目前我發現 CPU 使用率很高events/1
,我想知道如何找出造成這種情況的原因?
cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15
0: 575075290 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-edge timer
1: 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-edge i8042
8: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-edge rtc0
9: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-fasteoi acpi
18: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-fasteoi ata_piix
19: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-fasteoi ata_piix
20: 27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-fasteoi ehci_hcd:usb1
21: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb5, uhci_hcd:usb8
22: 48 0 0 0 203 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb4, uhci_hcd:usb7
23: 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-fasteoi ehci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb6
32: 7511 0 0 0 0 0 0 169991 0 0 0 0 0 0 0 0 IO-APIC-fasteoi megasas
49: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth0
50: 14013 79342 0 0 13699 5817 0 0 916084 0 0 0 43447 0 0 0 PCI-MSI-edge eth0-TxRx-0
51: 21 0 0 0 0 0 0 0 0 491153 0 0 0 0 0 0 PCI-MSI-edge eth0-TxRx-1
52: 16 0 0 0 0 0 0 0 0 0 0 0 0 490363 0 0 PCI-MSI-edge eth0-TxRx-2
53: 15 0 0 0 0 0 0 0 0 0 512295 0 0 0 0 0 PCI-MSI-edge eth0-TxRx-3
54: 14 0 0 0 0 0 0 0 0 0 0 0 0 0 2137545 0 PCI-MSI-edge eth0-TxRx-4
55: 14 0 0 0 0 0 0 0 0 0 0 472322 0 0 0 0 PCI-MSI-edge eth0-TxRx-5
56: 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1261400 PCI-MSI-edge eth0-TxRx-6
57: 46039 101974 16307 0 177472 22446 0 0 78146 0 0 0 3060 0 0 0 PCI-MSI-edge eth0-TxRx-7
NMI: 116990 104030 89718 76478 57282 42770 27106 7229 11177 12608 16074 17471 15123 17563 17220 9457 Non-maskable interrupts
LOC: 1240959 513359079 608524106 453845650 545193366 480747439 402785555 456482461 620998409 526207907 405289993 406272537 426647321 459716091 532029492 578607757 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 116990 104030 89718 76478 57282 42770 27106 7229 11177 12608 16074 17471 15123 17563 17220 9457 Performance monitoring interrupts
IWI: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IRQ work interrupts
RES: 3518 3786 1864 535 5509 4985 1982 2083 2640 1854 1547 1067 2261 2259 1744 1742 Rescheduling interrupts
CAL: 57062 229 228 228 3541 227 228 225 222 210 222 226 212 212 224 217 Function call interrupts
TLB: 10184 9632 3623 3081 15017 12586 3242 2803 7624 33023 5225 4085 4565 45383 7271 4827 TLB shootdowns
TRM: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Machine check exceptions
MCP: 1185 1185 1185 1185 1185 1185 1185 1185 1185 1185 1185 1185 1185 1185 1185 1185 Machine check polls
ERR: 0
MIS: 0
這可能是硬體問題嗎?
編輯:
ps auxf
答案1
好的,感謝您補充的資訊。所以,events/1 就是問題所在。此事件/表示事件/CPU_NO。引入事件執行緒作為核心喚醒和工作的一種方式,取代了 keventd。然而,在 v3.9-rc8 git 樹中,我仍然看到 keventd 和 events/per_cpu 。
我的假設是 irq 57 正在佔用 CPU 1。工作量幾乎是平衡的。你看得到 irq 是 57 嗎?
另外,老實說,我認為我們無法使用 top、ps 等傳統方法來調試這個問題。很微妙的是,這需要進行效能分析。如果你能得到,請告訴我。如果你確實得到了,你需要把它郵寄給我,因為你可以把那麼多數據放在這裡。