마침내 '120초 동안 작업이 중단되었습니다' 오류에서 추적을 얻을 수 있었습니다. 우리 서버는 로드 급증 시 무작위로 충돌할 수 있습니다. IO 대기 시간은 정상적인 것 같습니다. 메모리/스왑은 괜찮습니다. 이는 Xen 기반 Ubuntu 10.04 - 64비트입니다. VPS, 일부는 심지어 "베어메탈"입니다.
아래는 추적입니다.
Aug 24 11:13:25 admin1 kernel: [2732107.501749] INFO: task java:29952 blocked for more than 120 seconds.
Aug 24 11:13:25 admin1 kernel: [2732107.501954] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 24 11:13:25 admin1 kernel: [2732107.502223] java D 0000000000000000 0 29952 1 0x00000000
Aug 24 11:13:25 admin1 kernel: [2732107.502227] ffff8801d3b5fbb8 0000000000000082 ffffffff81a35800 ffff8801d635be8c
Aug 24 11:13:25 admin1 kernel: [2732107.502230] 0000000000014cc0 ffff88014fcc5b80 0000000000014cc0 ffff88014fcc5f18
Aug 24 11:13:25 admin1 kernel: [2732107.502233] ffff8801d3b5ffd8 ffff88014fcc5f20 ffff8801d3b5e010 0000000000014cc0
Aug 24 11:13:25 admin1 kernel: [2732107.502236] Call Trace:
Aug 24 11:13:25 admin1 kernel: [2732107.502243] [<ffffffff81589bd7>] __mutex_lock_slowpath+0xf7/0x180
Aug 24 11:13:25 admin1 kernel: [2732107.502247] [<ffffffff81589abb>] mutex_lock+0x2b/0x50
Aug 24 11:13:25 admin1 kernel: [2732107.502251] [<ffffffff8122633a>] __log_wait_for_space+0xda/0x1b0
Aug 24 11:13:25 admin1 kernel: [2732107.502255] [<ffffffff814fed2f>] ? inet_recvmsg+0x4f/0x80
Aug 24 11:13:25 admin1 kernel: [2732107.502258] [<ffffffff812220e1>] start_this_handle+0xf1/0x440
Aug 24 11:13:25 admin1 kernel: [2732107.502260] [<ffffffff812225c1>] ? journal_start+0x81/0x100
Aug 24 11:13:25 admin1 kernel: [2732107.502263] [<ffffffff812225f5>] journal_start+0xb5/0x100
Aug 24 11:13:25 admin1 kernel: [2732107.502266] [<ffffffff811d3e71>] ext3_journal_start_sb+0x31/0x60
Aug 24 11:13:25 admin1 kernel: [2732107.502269] [<ffffffff811ca63d>] ext3_dirty_inode+0x3d/0xa0
Aug 24 11:13:25 admin1 kernel: [2732107.502273] [<ffffffff811776ca>] __mark_inode_dirty+0x3a/0x1f0
Aug 24 11:13:25 admin1 kernel: [2732107.502275] [<ffffffff811ca7f5>] ext3_setattr+0x155/0x2c0
Aug 24 11:13:25 admin1 kernel: [2732107.502279] [<ffffffff810097ce>] ? __switch_to+0x26e/0x320
Aug 24 11:13:25 admin1 kernel: [2732107.502283] [<ffffffff8116c2c5>] notify_change+0x175/0x2f0
Aug 24 11:13:25 admin1 kernel: [2732107.502285] [<ffffffff81588781>] ? schedule+0x411/0xa50
Aug 24 11:13:25 admin1 kernel: [2732107.502289] [<ffffffff81153064>] do_truncate+0x64/0xa0
Aug 24 11:13:25 admin1 kernel: [2732107.502291] [<ffffffff81153d7b>] ? generic_file_llseek+0x5b/0x70
Aug 24 11:13:25 admin1 kernel: [2732107.502294] [<ffffffff811532d8>] sys_ftruncate+0xe8/0xf0
Aug 24 11:13:25 admin1 kernel: [2732107.502297] [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
그리고 'unzip'이 실행되었을 때 바로 아래에 또 다른 하나가 있습니다. 위의 추적과 정확히 같은 시간입니다.
Aug 24 11:13:25 admin1 kernel: [2732107.503946] unzip D 0000000000000000 0 28123 1 0x00000000
Aug 24 11:13:25 admin1 kernel: [2732107.503951] ffff880031e156e8 0000000000000086 0000000000000000 0000000000000000
Aug 24 11:13:25 admin1 kernel: [2732107.503954] 0000000000014cc0 ffff8801183f5b80 0000000000014cc0 ffff8801183f5f18
Aug 24 11:13:25 admin1 kernel: [2732107.503957] ffff880031e15fd8 ffff8801183f5f20 ffff880031e14010 0000000000014cc0
Aug 24 11:13:25 admin1 kernel: [2732107.503960] Call Trace:
Aug 24 11:13:25 admin1 kernel: [2732107.503964] [<ffffffff81227745>] log_wait_commit+0xc5/0x150
Aug 24 11:13:25 admin1 kernel: [2732107.503967] [<ffffffff81081150>] ? autoremove_wake_function+0x0/0x40
Aug 24 11:13:25 admin1 kernel: [2732107.503969] [<ffffffff8158b08e>] ? _raw_spin_lock+0xe/0x20
Aug 24 11:13:25 admin1 kernel: [2732107.503972] [<ffffffff81225ddb>] log_do_checkpoint+0xab/0x530
Aug 24 11:13:25 admin1 kernel: [2732107.503976] [<ffffffff81036bd6>] ? __ticket_spin_lock+0x16/0x20
Aug 24 11:13:25 admin1 kernel: [2732107.503979] [<ffffffff81226306>] __log_wait_for_space+0xa6/0x1b0
Aug 24 11:13:25 admin1 kernel: [2732107.503982] [<ffffffff81070974>] ? mod_timer+0x174/0x2d0
Aug 24 11:13:25 admin1 kernel: [2732107.503985] [<ffffffff812220e1>] start_this_handle+0xf1/0x440
Aug 24 11:13:25 admin1 kernel: [2732107.503987] [<ffffffff812225c1>] ? journal_start+0x81/0x100
Aug 24 11:13:25 admin1 kernel: [2732107.503990] [<ffffffff812225f5>] journal_start+0xb5/0x100
Aug 24 11:13:25 admin1 kernel: [2732107.503993] [<ffffffff811d3e71>] ext3_journal_start_sb+0x31/0x60
Aug 24 11:13:25 admin1 kernel: [2732107.503995] [<ffffffff811ccbe2>] ext3_write_begin+0xa2/0x230
Aug 24 11:13:25 admin1 kernel: [2732107.503998] [<ffffffff8110029e>] generic_file_buffered_write+0xfe/0x250
Aug 24 11:13:25 admin1 kernel: [2732107.504001] [<ffffffff811dc65f>] ? ext3_xattr_security_get+0x2f/0x40
Aug 24 11:13:25 admin1 kernel: [2732107.504004] [<ffffffff81102df0>] __generic_file_aio_write+0x240/0x470
Aug 24 11:13:25 admin1 kernel: [2732107.504006] [<ffffffff81103085>] generic_file_aio_write+0x65/0xd0
Aug 24 11:13:25 admin1 kernel: [2732107.504009] [<ffffffff8115370a>] do_sync_write+0xda/0x120
Aug 24 11:13:25 admin1 kernel: [2732107.504012] [<ffffffff812941a8>] ? apparmor_file_permission+0x18/0x20
Aug 24 11:13:25 admin1 kernel: [2732107.504015] [<ffffffff8126366f>] ? security_file_permission+0x1f/0x80
Aug 24 11:13:25 admin1 kernel: [2732107.504017] [<ffffffff811539f8>] vfs_write+0xc8/0x190
Aug 24 11:13:25 admin1 kernel: [2732107.504020] [<ffffffff81154251>] sys_write+0x51/0x90
Aug 24 11:13:25 admin1 kernel: [2732107.504023] [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
누구든지 도와줄 수 있나요?
파일 시스템의 교착 상태입니까?
티아.
답변1
이러한 스택 추적은 둘 다 당시 파일 시스템 쓰기 작업이 시도 중이었음을 나타내므로 스토리지 관련 문제를 확인하세요. 네트워크에 마운트된 파일 시스템이 있는 경우 서버에 액세스할 수 있고 정상적으로 작동하는지 확인하세요. 그렇지 않은 경우 디스크 드라이브를 확인하십시오. dmesg
이 경우 메시지를 확인 하고 사용해야 합니다.스마트몬툴즈드라이브 상태를 확인하려면