Debian server keeps restarting unexpectedly

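My lab server, running Debian Wheezy 7.8 (stable), keeps restarting without notice, sometimes after only a few hours of uptime. The server is set up for fairly high-load numerical computation as well as parallel computation. I have printed the output of last reboot and the contents of /var/log/messages, but I find the log messages hard to understand. I looked at the entries just before each reboot time and at the reboot time itself, but /var/log/messages seems to show only messages logged after the reboot happened.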

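Searching the internet, I found other people with the same problem, but the causes seem to differ from case to case, so /var/log/messages looks like the key to investigating it. What does /var/log/messages actually say about an unwanted reboot? How should a beginner start learning to read this log? Are there important keywords to look for?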

Thanks for any help you can provide.

last reboot

reboot   system boot  3.2.0-4-amd64    Wed May 20 03:29 - 12:43  (09:14)
reboot   system boot  3.2.0-4-amd64    Tue May 19 16:01 - 12:43  (20:42)

/var/log/messages

May 18 07:35:01 labserver rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2400" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
May 19 07:35:01 labserver rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2400" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
May 19 16:01:19 labserver kernel: imklog 5.8.11, log source = /proc/kmsg started.
May 19 16:01:19 labserver rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2401" x-info="http://www.rsyslog.com"] start
May 19 16:01:19 labserver kernel: [    0.000000] Initializing cgroup subsys cpuset
May 19 16:01:19 labserver kernel: [    0.000000] Initializing cgroup subsys cpu
May 19 16:01:19 labserver kernel: [    0.000000] Linux version 3.2.0-4-amd64 ([email protected]) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.65-1+deb7u2
May 19 16:01:19 labserver kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.2.0-4-amd64 root=UUID=1fc245ac-9058-4208-862a-7f4e8e1b20b2 ro text
May 19 16:01:19 labserver kernel: [    0.000000] BIOS-provided physical RAM map:
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 0000000000000000 - 000000000009ac00 (usable)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 000000000009ac00 - 00000000000a0000 (reserved)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 0000000000100000 - 000000007df71000 (usable)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 000000007df71000 - 000000007e0f1000 (reserved)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 000000007e0f1000 - 000000007e2ec000 (ACPI NVS)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 000000007e2ec000 - 000000007f367000 (reserved)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 000000007f367000 - 000000007f800000 (ACPI NVS)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 0000000080000000 - 0000000090000000 (reserved)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 00000000fed1c000 - 00000000fed40000 (reserved)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
May 19 16:01:19 labserver kernel: [    0.000000]  BIOS-e820: 0000000100000000 - 0000000880000000 (usable)
May 19 16:01:19 labserver kernel: [    0.000000] NX (Execute Disable) protection: active
May 19 16:01:19 labserver kernel: [    0.000000] SMBIOS 2.7 present.
May 19 16:01:19 labserver kernel: [    0.000000] No AGP bridge found
May 19 16:01:19 labserver kernel: [    0.000000] last_pfn = 0x880000 max_arch_pfn = 0x400000000
May 19 16:01:19 labserver kernel: [    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
May 19 16:01:19 labserver kernel: [    0.000000] last_pfn = 0x7df71 max_arch_pfn = 0x400000000
May 19 16:01:19 labserver kernel: [    0.000000] found SMP MP-table at [ffff8800000fd900] fd900
May 19 16:01:19 labserver kernel: [    0.000000] Using GB pages for direct mapping
May 19 16:01:19 labserver kernel: [    0.000000] init_memory_mapping: 0000000000000000-000000007df71000
May 19 16:01:19 labserver kernel: [    0.000000] init_memory_mapping: 0000000100000000-0000000880000000
May 19 16:01:19 labserver kernel: [    0.000000] RAMDISK: 36bea000 - 375ed000
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: RSDP 00000000000f04a0 00024 (v02 ALASKA)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: XSDT 000000007e204088 0008C (v01 ALASKA    A M I 01072009 AMI  00010013)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: FACP 000000007e211040 0010C (v05 ALASKA    A M I 01072009 AMI  00010013)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI Warning: FADT (revision 5) is longer than ACPI 2.0 version, truncating length 268 to 244 (20110623/tbfadt-288)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: DSDT 000000007e2041a8 0CE96 (v02 ALASKA    A M I 00000015 INTL 20051117)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: FACS 000000007e2e3080 00040
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: APIC 000000007e211150 00100 (v03 ALASKA    A M I 01072009 AMI  00010013)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: FPDT 000000007e211250 00044 (v01 ALASKA    A M I 01072009 AMI  00010013)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: MCFG 000000007e211298 0003C (v01 ALASKA OEMMCFG. 01072009 MSFT 00000097)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: HPET 000000007e2112d8 00038 (v01 ALASKA    A M I 01072009 AMI. 00000005)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: PRAD 000000007e211310 000BE (v02 PRADID  PRADTID 00000001 MSFT 03000001)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: SPMI 000000007e2113d0 00040 (v05 A M I   OEMSPMI 00000000 AMI. 00000000)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: SSDT 000000007e211410 D0CB0 (v02  INTEL    CpuPm 00004000 INTL 20051117)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: EINJ 000000007e2e20c0 00130 (v01    AMI AMI EINJ 00000000      00000000)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: ERST 000000007e2e21f0 00230 (v01  AMIER AMI ERST 00000000      00000000)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: HEST 000000007e2e2420 000A8 (v01    AMI AMI HEST 00000000      00000000)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: BERT 000000007e2e24c8 00030 (v01    AMI AMI BERT 00000000      00000000)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: DMAR 000000007e2e24f8 000C4 (v01 A M I   OEMDMAR 00000001 INTL 00000001)
May 19 16:01:19 labserver kernel: [    0.000000] No NUMA configuration found
May 19 16:01:19 labserver kernel: [    0.000000] Faking a node at 0000000000000000-0000000880000000
May 19 16:01:19 labserver kernel: [    0.000000] Initmem setup node 0 0000000000000000-0000000880000000
May 19 16:01:19 labserver kernel: [    0.000000]   NODE_DATA [000000087fffb000 - 000000087fffffff]
May 19 16:01:19 labserver kernel: [    0.000000] Zone PFN ranges:
May 19 16:01:19 labserver kernel: [    0.000000]   DMA      0x00000010 -> 0x00001000
May 19 16:01:19 labserver kernel: [    0.000000]   DMA32    0x00001000 -> 0x00100000
May 19 16:01:19 labserver kernel: [    0.000000]   Normal   0x00100000 -> 0x00880000
May 19 16:01:19 labserver kernel: [    0.000000] Movable zone start PFN for each node
May 19 16:01:19 labserver kernel: [    0.000000] early_node_map[3] active PFN ranges
May 19 16:01:19 labserver kernel: [    0.000000]     0: 0x00000010 -> 0x0000009a
May 19 16:01:19 labserver kernel: [    0.000000]     0: 0x00000100 -> 0x0007df71
May 19 16:01:19 labserver kernel: [    0.000000]     0: 0x00100000 -> 0x00880000
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: PM-Timer IO Port: 0x408
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x08] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x0a] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x09] lapic_id[0x09] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0b] enabled)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x08] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0a] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x09] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x0b] high edge lint[0x1])
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
May 19 16:01:19 labserver kernel: [    0.000000] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: IOAPIC (id[0x02] address[0xfec01000] gsi_base[24])
May 19 16:01:19 labserver kernel: [    0.000000] IOAPIC[1]: apic_id 2, version 32, address 0xfec01000, GSI 24-47
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
May 19 16:01:19 labserver kernel: [    0.000000] Using ACPI (MADT) for SMP configuration information
May 19 16:01:19 labserver kernel: [    0.000000] ACPI: HPET id: 0x8086a701 base: 0xfed00000
May 19 16:01:19 labserver kernel: [    0.000000] SMP: Allowing 12 CPUs, 0 hotplug CPUs
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 000000000009a000 - 000000000009b000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 000000000009b000 - 00000000000a0000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000e0000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 00000000000e0000 - 0000000000100000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 000000007df71000 - 000000007e0f1000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 000000007e0f1000 - 000000007e2ec000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 000000007e2ec000 - 000000007f367000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 000000007f367000 - 000000007f800000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 000000007f800000 - 0000000080000000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 0000000080000000 - 0000000090000000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 0000000090000000 - 00000000fed1c000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 00000000fed1c000 - 00000000fed40000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 00000000fed40000 - 00000000ff000000
May 19 16:01:19 labserver kernel: [    0.000000] PM: Registered nosave memory: 00000000ff000000 - 0000000100000000
May 19 16:01:19 labserver kernel: [    0.000000] Allocating PCI resources starting at 90000000 (gap: 90000000:6ed1c000)
May 19 16:01:19 labserver kernel: [    0.000000] Booting paravirtualized kernel on bare hardware
May 19 16:01:19 labserver kernel: [    0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:12 nr_node_ids:1
May 19 16:01:19 labserver kernel: [    0.000000] PERCPU: Embedded 27 pages/cpu @ffff88087fc00000 s78848 r8192 d23552 u131072
May 19 16:01:19 labserver kernel: [    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 8258294
May 19 16:01:19 labserver kernel: [    0.000000] Policy zone: Normal
May 19 16:01:19 labserver kernel: [    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.2.0-4-amd64 root=UUID=1fc245ac-9058-4208-862a-7f4e8e1b20b2 ro text
May 19 16:01:19 labserver kernel: [    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
May 19 16:01:19 labserver kernel: [    0.000000] xsave/xrstor: enabled xstate_bv 0x7, cntxt size 0x340
May 19 16:01:19 labserver kernel: [    0.000000] Checking aperture...
May 19 16:01:19 labserver kernel: [    0.000000] No AGP bridge found
May 19 16:01:19 labserver kernel: [    0.000000] Memory: 32975732k/35651584k available (3434k kernel code, 2130964k absent, 544888k reserved, 3305k data, 576k init)
May 19 16:01:19 labserver kernel: [    0.000000] Hierarchical RCU implementation.
May 19 16:01:19 labserver kernel: [    0.000000]    RCU dyntick-idle grace-period acceleration is enabled.
May 19 16:01:19 labserver kernel: [    0.000000] NR_IRQS:33024 nr_irqs:1184 16
May 19 16:01:19 labserver kernel: [    0.000000] Extended CMOS year: 2000
May 19 16:01:19 labserver kernel: [    0.000000] Console: colour VGA+ 80x25
May 19 16:01:19 labserver kernel: [    0.000000] console [tty0] enabled
May 19 16:01:19 labserver kernel: [    0.000000] Fast TSC calibration using PIT
May 19 16:01:19 labserver kernel: [    0.004000] Detected 2100.074 MHz processor.
May 19 16:01:19 labserver kernel: [    0.000003] Calibrating delay loop (skipped), value calculated using timer frequency.. 4200.14 BogoMIPS (lpj=8400296)
May 19 16:01:19 labserver kernel: [    0.000144] pid_max: default: 32768 minimum: 301
May 19 16:01:19 labserver kernel: [    0.000253] Security Framework initialized
May 19 16:01:19 labserver kernel: [    0.000324] AppArmor: AppArmor disabled by boot time parameter
May 19 16:01:19 labserver kernel: [    0.002355] Dentry cache hash table entries: 4194304 (order: 13, 33554432 bytes)
May 19 16:01:19 labserver kernel: [    0.011585] Inode-cache hash table entries: 2097152 (order: 12, 16777216 bytes)
May 19 16:01:19 labserver kernel: [    0.015724] Mount-cache hash table entries: 256
May 19 16:01:19 labserver kernel: [    0.015915] Initializing cgroup subsys cpuacct
May 19 16:01:19 labserver kernel: [    0.015986] Initializing cgroup subsys memory
May 19 16:01:19 labserver kernel: [    0.016063] Initializing cgroup subsys devices
May 19 16:01:19 labserver kernel: [    0.016133] Initializing cgroup subsys freezer
May 19 16:01:19 labserver kernel: [    0.016201] Initializing cgroup subsys net_cls
May 19 16:01:19 labserver kernel: [    0.016270] Initializing cgroup subsys blkio
May 19 16:01:19 labserver kernel: [    0.016344] Initializing cgroup subsys perf_event
May 19 16:01:19 labserver kernel: [    0.016441] CPU: Physical Processor ID: 0
May 19 16:01:19 labserver kernel: [    0.016509] CPU: Processor Core ID: 0
May 19 16:01:19 labserver kernel: [    0.017564] mce: CPU supports 23 MCE banks
May 19 16:01:19 labserver kernel: [    0.017670] CPU0: Thermal monitoring enabled (TM1)
May 19 16:01:19 labserver kernel: [    0.017768] using mwait in idle threads.
May 19 16:01:19 labserver kernel: [    0.018315] ACPI: Core revision 20110623
May 19 16:01:19 labserver kernel: [    0.049889] DMAR: Host address width 46
May 19 16:01:19 labserver kernel: [    0.049958] DMAR: DRHD base: 0x000000fbffc000 flags: 0x1
May 19 16:01:19 labserver kernel: [    0.050034] IOMMU 0: reg_base_addr fbffc000 ver 1:0 cap d2078c106f0466 ecap f020de
May 19 16:01:19 labserver kernel: [    0.050122] DMAR: RMRR base: 0x0000007f239000 end: 0x0000007f247fff
May 19 16:01:19 labserver kernel: [    0.050195] DMAR: ATSR flags: 0x0
May 19 16:01:19 labserver kernel: [    0.050261] DMAR: RHSA base: 0x000000fbffc000 proximity domain: 0x0
May 19 16:01:19 labserver kernel: [    0.050427] IOAPIC id 0 under DRHD base  0xfbffc000 IOMMU 0
May 19 16:01:19 labserver kernel: [    0.050497] IOAPIC id 2 under DRHD base  0xfbffc000 IOMMU 0
May 19 16:01:19 labserver kernel: [    0.050568] HPET id 0 under DRHD base 0xfbffc000
May 19 16:01:19 labserver kernel: [    0.050741] Enabled IRQ remapping in x2apic mode
May 19 16:01:19 labserver kernel: [    0.050810] Enabling x2apic
May 19 16:01:19 labserver kernel: [    0.050875] Enabled x2apic
May 19 16:01:19 labserver kernel: [    0.050943] Switched APIC routing to cluster x2apic.
May 19 16:01:19 labserver kernel: [    0.051552] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
May 19 16:01:19 labserver kernel: [    0.091256] CPU0: Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz stepping 04
May 19 16:01:19 labserver kernel: [    0.195570] Performance Events: PEBS fmt1+, generic architected perfmon, Intel PMU driver.
May 19 16:01:19 labserver kernel: [    0.195802] ... version:                3
May 19 16:01:19 labserver kernel: [    0.195869] ... bit width:              48
May 19 16:01:19 labserver kernel: [    0.195936] ... generic registers:      4
May 19 16:01:19 labserver kernel: [    0.196003] ... value mask:             0000ffffffffffff
May 19 16:01:19 labserver kernel: [    0.196073] ... max period:             000000007fffffff
May 19 16:01:19 labserver kernel: [    0.196143] ... fixed-purpose events:   3
May 19 16:01:19 labserver kernel: [    0.196210] ... event mask:             000000070000000f
May 19 16:01:19 labserver kernel: [    0.196468] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    0.196637] Booting Node   0, Processors  #1
May 19 16:01:19 labserver kernel: [    0.312587] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    0.312765]  #2
May 19 16:01:19 labserver kernel: [    0.424400] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    0.424578]  #3
May 19 16:01:19 labserver kernel: [    0.536316] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    0.536489]  #4
May 19 16:01:19 labserver kernel: [    0.648124] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    0.648303]  #5
May 19 16:01:19 labserver kernel: [    0.759941] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    0.760115]  #6
May 19 16:01:19 labserver kernel: [    0.871864] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    0.872050]  #7
May 19 16:01:19 labserver kernel: [    0.983690] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    0.983866]  #8
May 19 16:01:19 labserver kernel: [    1.095600] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    1.095774]  #9
May 19 16:01:19 labserver kernel: [    1.207414] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    1.207589]  #10
May 19 16:01:19 labserver kernel: [    1.319223] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    1.319400]  #11 Ok.
May 19 16:01:19 labserver kernel: [    1.431095] NMI watchdog enabled, takes one hw-pmu counter.
May 19 16:01:19 labserver kernel: [    1.431192] Brought up 12 CPUs
May 19 16:01:19 labserver kernel: [    1.431260] Total of 12 processors activated (50398.84 BogoMIPS).
May 19 16:01:19 labserver kernel: [    1.450786] devtmpfs: initialized
May 19 16:01:19 labserver kernel: [    1.455360] PM: Registering ACPI NVS region at 7e0f1000 (2076672 bytes)
May 19 16:01:19 labserver kernel: [    1.455494] PM: Registering ACPI NVS region at 7f367000 (4820992 bytes)
May 19 16:01:19 labserver kernel: [    1.455843] print_constraints: dummy: 
May 19 16:01:19 labserver kernel: [    1.455977] NET: Registered protocol family 16
May 19 16:01:19 labserver kernel: [    1.456140] ACPI: bus type pci registered
May 19 16:01:19 labserver kernel: [    1.456268] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
May 19 16:01:19 labserver kernel: [    1.456361] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
May 19 16:01:19 labserver kernel: [    1.466673] PCI: Using configuration type 1 for base access
May 19 16:01:19 labserver kernel: [    1.468173] bio: create slab <bio-0> at 0
May 19 16:01:19 labserver kernel: [    1.468353] ACPI: Added _OSI(Module Device)
May 19 16:01:19 labserver kernel: [    1.468422] ACPI: Added _OSI(Processor Device)
May 19 16:01:19 labserver kernel: [    1.468491] ACPI: Added _OSI(3.0 _SCP Extensions)
May 19 16:01:19 labserver kernel: [    1.468560] ACPI: Added _OSI(Processor Aggregator Device)
May 19 16:01:19 labserver kernel: [    1.484562] ACPI: Executed 1 blocks of module-level executable AML code
May 19 16:01:19 labserver kernel: [    1.727818] ACPI: Interpreter enabled
May 19 16:01:19 labserver kernel: [    1.727891] ACPI: (supports S0 S1 S4 S5)
May 19 16:01:19 labserver kernel: [    1.728159] ACPI: Using IOAPIC for interrupt routing
May 19 16:01:19 labserver kernel: [    1.736531] ACPI: No dock devices found.
May 19 16:01:19 labserver kernel: [    1.736630] HEST: Table parsing has been initialized.
May 19 16:01:19 labserver kernel: [    1.736704] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
May 19 16:01:19 labserver kernel: [    1.737041] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe])
May 19 16:01:19 labserver kernel: [    1.737361] pci_root PNP0A08:00: host bridge window [io  0x0000-0x03af]
May 19 16:01:19 labserver kernel: [    1.737435] pci_root PNP0A08:00: host bridge window [io  0x03e0-0x0cf7]
May 19 16:01:19 labserver kernel: [    1.737508] pci_root PNP0A08:00: host bridge window [io  0x03b0-0x03df]
May 19 16:01:19 labserver kernel: [    1.737586] pci_root PNP0A08:00: host bridge window [io  0x0d00-0xffff]
May 19 16:01:19 labserver kernel: [    1.737659] pci_root PNP0A08:00: host bridge window [mem 0x000a0000-0x000bffff]
May 19 16:01:19 labserver kernel: [    1.737747] pci_root PNP0A08:00: host bridge window [mem 0x000c0000-0x000dffff]
May 19 16:01:19 labserver kernel: [    1.737834] pci_root PNP0A08:00: host bridge window [mem 0xfed0e000-0xfed0ffff]
May 19 16:01:19 labserver kernel: [    1.737922] pci_root PNP0A08:00: host bridge window [mem 0x80000000-0xfbffffff]
May 19 16:01:19 labserver kernel: [    1.740791] pci 0000:00:01.0: PCI bridge to [bus 01-01]
May 19 16:01:19 labserver kernel: [    1.745575] pci 0000:00:01.1: PCI bridge to [bus 02-03]
May 19 16:01:19 labserver kernel: [    1.745700] pci 0000:00:02.0: PCI bridge to [bus 04-04]
May 19 16:01:19 labserver kernel: [    1.745816] pci 0000:00:03.0: PCI bridge to [bus 05-05]
May 19 16:01:19 labserver kernel: [    1.745933] pci 0000:00:03.2: PCI bridge to [bus 06-06]
May 19 16:01:19 labserver kernel: [    1.746285] pci 0000:00:11.0: PCI bridge to [bus 07-07]
May 19 16:01:19 labserver kernel: [    1.746541] pci 0000:00:1e.0: PCI bridge to [bus 08-08] (subtractive decode)
May 19 16:01:19 labserver kernel: [    1.747170]  pci0000:00: Requesting ACPI _OSC control (0x1d)
May 19 16:01:19 labserver kernel: [    1.747465]  pci0000:00: ACPI _OSC control (0x15) granted
May 19 16:01:19 labserver kernel: [    1.756901] ACPI: PCI Root Bridge [UNC0] (domain 0000 [bus ff])
May 19 16:01:19 labserver kernel: [    1.758443]  pci0000:ff: Requesting ACPI _OSC control (0x1d)
May 19 16:01:19 labserver kernel: [    1.758528]  pci0000:ff: ACPI _OSC control (0x1d) granted
May 19 16:01:19 labserver kernel: [    1.759439] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 *11 12 14 15)
May 19 16:01:19 labserver kernel: [    1.760105] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 *10 11 12 14 15)
May 19 16:01:19 labserver kernel: [    1.760768] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 6 10 11 12 14 15)
May 19 16:01:19 labserver kernel: [    1.761383] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 10 *11 12 14 15)
May 19 16:01:19 labserver kernel: [    1.762006] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 11 12 14 15) *0
May 19 16:01:19 labserver kernel: [    1.762729] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11 12 14 15) *0
May 19 16:01:19 labserver kernel: [    1.763450] ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 10 11 12 14 15) *0
May 19 16:01:19 labserver kernel: [    1.764170] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 *7 10 11 12 14 15)

Answer 1

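You need to provide more information, especially the log entries from just before the system rebooted. As far as I can tell, though, they may not tell you much more. Check the other logs as well, such as syslog.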
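
One way to pull out the entries just before each reboot is sketched below. It assumes that every boot is announced by rsyslog's kernel-log module with a line like the "imklog 5.8.11, log source = /proc/kmsg started." entry in the excerpt above; the function names and the marker test are my own choices, not anything the logging system defines. It needs Python 3, so it may be easier to copy the log to another machine first.

```python
from collections import deque

# Assumption: each boot starts with rsyslog's "imklog ... started." line,
# as seen in the excerpt from /var/log/messages in the question.
BOOT_MARKER = "imklog"


def lines_before_boots(lines, context=5, marker=BOOT_MARKER):
    """Return one list per boot holding the `context` lines logged just before it."""
    recent = deque(maxlen=context)  # rolling window of the most recent lines
    results = []
    for line in lines:
        line = line.rstrip("\n")
        if marker in line and "started" in line:
            results.append(list(recent))  # what was logged right before this boot
            recent.clear()
        else:
            recent.append(line)
    return results


def report(path="/var/log/messages", context=5):
    """Print the last lines before every boot found in the given log file."""
    with open(path, errors="replace") as f:
        for i, chunk in enumerate(lines_before_boots(f, context), 1):
            print("--- last lines before boot #{} ---".format(i))
            print("\n".join(chunk))
```

Calling report() as a user who can read the log prints one block per boot. In the excerpt from the question, the last entry before the 16:01:19 boot is the routine "rsyslogd was HUPed" line from 07:35:01, with nothing that looks like an orderly shutdown in between, which is consistent with an abrupt reset.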

Sudden restarts that leave no indication of what went wrong are most often hardware related. Otherwise the kernel usually gets a chance to write something to the logs that gives a clue.
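
As for keywords: if the kernel did manage to write a clue, a simple filter can surface it. The sketch below is only a starting point; the pattern list is my own guess at terms worth searching for, not an exhaustive or official list.

```python
import re

# Assumption: a hand-picked starting list of terms, not a complete one.
KEYWORDS = [
    r"machine check",
    r"\bmce\b",
    r"hardware error",
    r"kernel panic",
    r"\boops\b",
    r"out of memory",
    r"temperature",
    r"thermal",
    r"i/o error",
]
PATTERN = re.compile("|".join(KEYWORDS), re.IGNORECASE)


def suspicious_lines(lines):
    """Return the log lines that match any of the keyword patterns."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]
```

Every hit still needs a human look. In the boot log from the question, "mce: CPU supports 23 MCE banks" and "CPU0: Thermal monitoring enabled (TM1)" would both match, yet they are ordinary informational boot messages. Hits that appear shortly before a boot are the interesting ones.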

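Some common causes of sudden restarts:

  • Overheating, probably the leading cause. Try to get an idea of the temperatures and log them. Does the server have a display that can show the temperature? Is the room adequately cooled? You could also replace the thermal compound under the heat sink that covers the CPU.

  • Faulty hardware or drivers. Use "lspci" to get a list of devices. A bad DIMM, for example, can make a system hang or reboot suddenly (reseat the DIMMs, CPU, and cards). I remember a server that rebooted occasionally because of a problem with an Intel Ethernet card. A bad disk can sometimes cause this kind of problem too, but it usually just hangs the machine rather than restarting it.

  • A bad UPS. I remember a battery-backed UPS that was slowly going bad, and one of the symptoms was a regular weekly power cycle of the attached servers. A misconfigured power-cycle schedule is another possibility.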
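
For the temperature logging suggested above, here is a minimal sketch. It assumes the kernel exposes sensors under /sys/class/thermal/thermal_zone*/temp, where values are reported in millidegrees Celsius; on some servers those files do not exist and the readings live elsewhere (for example under /sys/class/hwmon), so the glob pattern is left as a parameter. The function names and the log file location are hypothetical.

```python
import glob
import os
import time


def parse_millidegrees(text):
    """Convert a sysfs reading such as '45000\\n' to degrees Celsius."""
    return int(text.strip()) / 1000.0


def read_temperatures(pattern="/sys/class/thermal/thermal_zone*/temp"):
    """Return {sensor path: degrees Celsius} for every readable sensor file."""
    temps = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                temps[path] = parse_millidegrees(f.read())
        except (OSError, ValueError):
            continue  # sensor missing or unreadable: skip it
    return temps


def log_forever(logfile, interval=60):
    """Append a timestamped reading every `interval` seconds until interrupted."""
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(logfile, "a") as out:
            out.write("{} {}\n".format(stamp, read_temperatures()))
            out.flush()
            os.fsync(out.fileno())  # push to disk so the last reading survives a reset
        time.sleep(interval)
```

Running log_forever with a path of your choice in the background during a heavy computation, and then comparing the last recorded values with the reboot times from last reboot, should show whether the restarts follow a temperature climb.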
