我們正在為一個客戶管理一台 MySQL 伺服器,該伺服器擁有超過 100 個資料庫,每個資料庫大約有 50 個表,其中許多是 InnoDB 表。伺服器崩潰了,我正在尋找罪魁禍首。使用 重新啟動時innodb_force_recovery = 2
,我可以連線並且在 error.log 中看不到任何錯誤。更重要的是,mysqlcheck --all-databases
所有表都報告“Ok”。但是當我刪除 時innodb_force_recovery
,伺服器再次崩潰,將堆疊追蹤寫入 error.log 並且只能使用 停止kill -9
。
在這種情況下我應該如何找到有問題的資料庫以及什麼會導致 mysqlcheck 錯過損壞的表?請不要告訴我忽略它並從轉儲中恢復所有資料庫。對於一兩個資料庫來說,如果這種情況千載難逢,這可能是可以接受的,但我在同一台伺服器上多次遇到問題,並且從轉儲中恢復所有內容只需要太多的時間和手動工作來做到這一點時間。
伺服器版本為 5.5.46 並且innodb_file_per_table
處於活動狀態。
根據要求摘錄 error.log(這是否The tablespace free space info is corrupt
意味著存在不在特定表中且無法糾正的錯誤?):
180222 17:13:48 mysqld_safe Starting mysqld daemon with databases from /home/mysql
180222 17:13:48 [Warning] 'THREAD_CONCURRENCY' is deprecated and will be removed in a future release.
180222 17:13:48 [Note] /usr/libexec/mysqld (mysqld 5.5.46) starting as process 26242 ...
180222 17:13:48 [Note] Plugin 'FEDERATED' is disabled.
180222 17:13:48 InnoDB: The InnoDB memory heap is disabled
180222 17:13:48 InnoDB: Mutexes and rw_locks use InnoDB's own implementation
180222 17:13:48 InnoDB: Compressed tables use zlib 1.2.3
180222 17:13:48 InnoDB: Using Linux native AIO
180222 17:13:48 InnoDB: Initializing buffer pool, size = 128.0M
180222 17:13:49 InnoDB: Completed initialization of buffer pool
180222 17:13:49 InnoDB: highest supported file format is Barracuda.
180222 17:13:49 InnoDB: Waiting for the background threads to start
180222 17:13:50 InnoDB: 5.5.46 started; log sequence number 1632912830888
180222 17:13:50 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
180222 17:13:50 [Note] - '0.0.0.0' resolves to '0.0.0.0';
180222 17:13:50 [Note] Server socket created on IP: '0.0.0.0'.
180222 17:13:50 [Note] Event Scheduler: Loaded 0 events
180222 17:13:50 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.5.46' socket: '/var/lib/mysql/mysql.sock' port: 3306 MySQL Community Server (GPL) by Remi
InnoDB: Dump of the tablespace extent descriptor: len 40; hex 000000000000000200000000061600000000126e00000004ffffffffffffffffffffffffffffbfaa; asc n ;
InnoDB: Serious error! InnoDB is trying to free page 512
InnoDB: though it is already marked as free in the tablespace!
InnoDB: The tablespace free space info is corrupt.
InnoDB: You may need to dump your InnoDB tables and recreate the whole
InnoDB: database!
InnoDB: Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
180222 17:13:50 InnoDB: Assertion failure in thread 2499464080 in file fsp0fsp.c line 3309
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
16:13:50 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
key_buffer_size=268435456
read_buffer_size=1048576
max_used_connections=0
max_threads=512
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1314506 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x30000
/usr/libexec/mysqld(my_print_stacktrace+0x33)[0x842a1f3]
/usr/libexec/mysqld(handle_fatal_signal+0x42b)[0x82d9d3b]
[0x7bc420]
[0x7bc410]
/lib/libc.so.6(gsignal+0x50)[0x626b10]
/lib/libc.so.6(abort+0x101)[0x628421]
/usr/libexec/mysqld[0x85012e7]
/usr/libexec/mysqld[0x850147e]
/usr/libexec/mysqld[0x849c0b1]
/usr/libexec/mysqld[0x84a8a61]
/usr/libexec/mysqld[0x8561fef]
/usr/libexec/mysqld[0x85570a9]
/usr/libexec/mysqld[0x847b082]
/usr/libexec/mysqld[0x846bf04]
/usr/libexec/mysqld[0x846dad4]
/lib/libpthread.so.0[0x50d912]
/lib/libc.so.6(clone+0x5e)[0x6d347e]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
180222 17:13:50 mysqld_safe Number of processes running now: 0
180222 17:13:50 mysqld_safe mysqld restarted
答案1
錯誤訊息本身告訴您為什麼驗證表資料不能解決問題。它試圖釋放一個頁面(可能是從表或索引中);但該頁面已被標記為免費。換句話說,當您的一個表或索引正在使用某個頁面時,該頁面被認為可提供給另一個表或索引。而且,顯然,如果 InnoDB 不知道哪些頁面實際上是免費的,那麼就會發生不好的事情。
轉儲所有表/索引資料並重新加載,意味著 InnoDB 有機會重建其空閒頁面集合。理想情況下,您可以在新資料庫上執行此操作。為什麼?嗯,你永遠不應該假設只有一個、單一的、單獨的損壞錯誤。因此,如果您將資料移至全新安裝,則不必擔心是否有其他未偵測到的損壞問題。
答案2
正如所建議的,轉儲和從這些轉儲中恢復是我最終選擇恢復所有資料庫的方式。幸運的是,innodb_force_recovery = 2
它允許我毫無錯誤地轉儲所有內容,這樣我就不必使用備份中的轉儲。當然,我寧願找出錯誤的真正原因,但除了錯誤報告中提到「表空間可用空間資訊已損壞」之外,MySQL 沒有提供任何幫助。如果不找出並消除錯誤的原因,我預計它最終會再次發生 - 我們的客戶可能會比我更生氣。
也許應該歸咎於壞硬件,但係統中所有磁碟的 SMART 數據看起來都不錯,並且/var/log/messages
在崩潰時不包含任何可疑內容。也沒有出現意外斷電或重新啟動的情況。