
Estamos utilizando dos instancias de Percona Server 5.7.20-18-log (mientras tanto actualizado a 5.7.21-21-lo) en una replicación maestro-maestro (dbm1/dbm2). La aplicación solo usa dbm1 para escrituras y lecturas, por lo que dbm2 se usa solo como modo de espera en frío. Desde hace un par de semanas, la replicación en dbm2 se detiene varias veces por semana con el mensaje de error "error desconocido al leer el evento de registro en el maestro".
Aquí está el estado completo del esclavo (dbm2) del último incidente:
mysql> show slave status \G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 10.6.233.101
Master_User: replicator
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.007091
Read_Master_Log_Pos: 71137
Relay_Log_File: mysqld-relay-bin.001603
Relay_Log_Pos: 71350
Relay_Master_Log_File: mysql-bin.007091
Slave_IO_Running: No
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 71137
Relay_Log_Space: 71605
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'unknown error reading log event on the master; the first event 'mysql-bin.007003' at 43445, the last event read from './mysql-bin.007091' at 71137, the last byte read from './mysql-bin.007091' at 71137.'
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: 7cb8eccb-e1a0-11e7-9d96-545edb2e572a
Master_Info_File: /var/lib/mysql/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp: 180308 06:41:25
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
Generalmente solucionamos esto con una restauración completa de dbm2 con XtraBackup pero después de unos días el problema en dbm2 vuelve a aparecer. Como podemos ver en nuestro monitoreo, a ambos servidores les quedan suficientes recursos (y no hay IO-Wait); lo que podemos ver es que esto sucede a menudo cuando la aplicación está realizando algunos trabajos de importación.
Ambos servidores están ubicados en el mismo centro de datos, comunicación a través de la red interna.
Más detalles:
OS: Ubuntu 14.04.5 LTS
Kernel: OpenVZ VPS (Kernel 2.6.32-042stab127.2)
CPU: 8 cores Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
RAM: 16 gb
Percona: 5.7.20-18-log Percona Server (GPL), Release '18', Revision '7ce54a6deef'
mi.cnf:
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock
[isamchk]
key_buffer_size = 16M
[mysqld]
basedir = /usr
bind_address = *
binlog_cache_size = 1M
binlog_format = mixed
bulk_insert_buffer_size = 64M
datadir = /var/lib/mysql
expire_logs_days = 2
innodb_buffer_pool_dump_at_shutdown = ON
innodb_buffer_pool_load_at_startup = ON
innodb_buffer_pool_size = 12G
innodb_doublewrite = OFF
innodb_flush_log_at_trx_commit = 2
innodb_io_capacity = 1000
innodb_log_file_size = 256M
innodb_read_io_threads = 8
innodb_write_io_threads = 8
join_buffer_size = 244K
key_buffer_size = 100M
log-bin = mysql-bin
log-error = /var/log/mysql/error.log
log_slow_verbosity = full
long_query_time = 1
max_allowed_packet = 16M
max_binlog_files = 200
max_binlog_size = 100M
max_connections = 500
max_heap_table_size = 128M
max_relay_log_size = 256M
max_slowlog_files = 1
max_slowlog_size = 4G
myisam_sort_buffer_size = 64M
optimizer_switch = index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=off,block_nested_loop=off,batched_key_access=on,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on
pid-file = /var/run/mysqld/mysqld.pid
port = 3306
query_cache_limit = 1M
query_cache_size = 128M
query_cache_type = ON
read_buffer_size = 244K
relay_log = mysqld-relay-bin
server_id = 1
skip-external-locking
skip_name_resolve = ON
slow_query_log = ON
slow_query_log_file = /var/log/mysql/mysqld.slowlog
socket = /var/run/mysqld/mysqld.sock
sort_buffer_size = 2047K
sql_mode = STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
ssl = false
ssl-ca = /etc/mysql/cacert.pem
ssl-cert = /etc/mysql/server-cert.pem
ssl-key = /etc/mysql/server-key.pem
thread_cache_size = 8
thread_stack = 256K
tmp_table_size = 128M
tmpdir = /tmp
user = mysql
userstat = ON
[mysqld-5.0]
myisam-recover = BACKUP
[mysqld-5.1]
myisam-recover = BACKUP
[mysqld-5.5]
myisam-recover = BACKUP
[mysqld-5.6]
myisam-recover-options = BACKUP
[mysqld-5.7]
myisam-recover-options = BACKUP
[mysqld_safe]
log-error = /var/log/mysql/error.log
nice = 0
socket = /var/run/mysqld/mysqld.sock
[mysqldump]
max_allowed_packet = 16M
quick
quote-names
¿Alguna idea sobre cómo mitigar la causa raíz de este tipo de error de replicación?
Edite para responder preguntas de @wilson-hauck:
Informe MySQLTuner de dbm1
>> MySQLTuner 1.7.9 - Major Hayden <[email protected]>
>> Bug reports, feature requests, and downloads at mysqltuner.com
>> Run with '--help' for additional options and output filtering
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.7.20-18-log
[OK] Operating on 64-bit architecture
-------- Log file Recommendations ------------------------------------------------------------------
[--] Log file: /var/log/mysql/error.log(0B)
[OK] Log file /var/log/mysql/error.log exists
[OK] Log file /var/log/mysql/error.log is readable.
[!!] Log file /var/log/mysql/error.log is empty
[OK] Log file /var/log/mysql/error.log is smaller than 32 Mb
[OK] /var/log/mysql/error.log doesn't contain any warning.
[OK] /var/log/mysql/error.log doesn't contain any error.
[--] 0 start(s) detected in /var/log/mysql/error.log
[--] 0 shutdown(s) detected in /var/log/mysql/error.log
-------- Storage Engine Statistics -----------------------------------------------------------------
[--] Status: +ARCHIVE +BLACKHOLE +CSV -FEDERATED +InnoDB +MEMORY +MRG_MYISAM +MyISAM +PERFORMANCE_SCHEMA
[--] Data in MEMORY tables: 46M (Tables: 2)
[--] Data in MyISAM tables: 9G (Tables: 368)
[--] Data in InnoDB tables: 12G (Tables: 131)
[OK] Total fragmented tables: 0
-------- Security Recommendations ------------------------------------------------------------------
[OK] There are no anonymous accounts for any database users
[OK] All database users have passwords assigned
[!!] There is no basic password file list!
-------- CVE Security Recommendations --------------------------------------------------------------
[--] Skipped due to --cvefile option undefined
-------- Performance Metrics -----------------------------------------------------------------------
[--] Up for: 6d 19h 49m 31s (3B q [5K qps], 7M conn, TX: 5877G, RX: 1160G)
[--] Reads / Writes: 94% / 6%
[--] Binary logging is enabled (GTID MODE: OFF)
[--] Physical Memory : 16.0G
[--] Max MySQL memory : 13.8G
[--] Other process memory: 160.2M
[--] Total buffers: 12.4G global + 3.0M per thread (500 max threads)
[--] P_S Max memory usage: 72B
[--] Galera GCache Max memory usage: 0B
[OK] Maximum reached memory usage: 13.1G (82.10% of installed RAM)
[!!] Maximum possible memory usage: 13.8G (86.35% of installed RAM)
[OK] Overall possible memory usage with other process is compatible with memory available
[OK] Slow queries: 0% (31K/3B)
[OK] Highest usage of available connections: 53% (266/500)
[OK] Aborted connections: 0.13% (9830/7378007)
[!!] Query cache may be disabled by default due to mutex contention.
[OK] Query cache efficiency: 53.3% (1B cached / 3B selects)
[!!] Query cache prunes per day: 93123042
[OK] Sorts requiring temporary tables: 0% (261K temp sorts / 112M sorts)
[OK] No joins without indexes
[OK] Temporary tables created on disk: 12% (4M on disk / 35M total)
[OK] Thread cache hit rate: 88% (883K created / 7M connections)
[!!] Table cache hit rate: 0% (2K open / 360K opened)
[OK] Open file limit used: 28% (1K/5K)
[OK] Table locks acquired immediately: 99% (1B immediate / 1B locks)
[OK] Binlog cache memory access: 100.00% (47747694 Memory / 47748535 Total)
-------- Performance schema ------------------------------------------------------------------------
[--] Memory used by P_S: 72B
[--] Sys schema is installed.
-------- ThreadPool Metrics ------------------------------------------------------------------------
[--] ThreadPool stat is enabled.
[--] Thread Pool Size: 8 thread(s).
[!!] thread_pool_size between 16 and 36 when using InnoDB storage engine.
-------- MyISAM Metrics ----------------------------------------------------------------------------
[OK] Key buffer used: 100.0% (104M used / 104M cache)
[OK] Key buffer size / total MyISAM indexes: 100.0M/3.0G
[OK] Read Key buffer hit rate: 99.6% (41B cached / 152M reads)
[!!] Write Key buffer hit rate: 14.2% (1B cached / 242M writes)
-------- InnoDB Metrics ----------------------------------------------------------------------------
[--] InnoDB is enabled.
[--] InnoDB Thread Concurrency: 0
[OK] InnoDB File per table is activated
[!!] InnoDB buffer pool / data size: 12.0G/12.8G
[!!] Ratio InnoDB log file size / InnoDB Buffer pool size (4.16666666666667 %): 256.0M * 2/12.0G should be equal 25%
[!!] InnoDB buffer pool instances: 8
[--] Number of InnoDB Buffer Pool Chunk : 96 for 8 Buffer Pool Instance(s)
[OK] Innodb_buffer_pool_size aligned with Innodb_buffer_pool_chunk_size & Innodb_buffer_pool_instances
[OK] InnoDB Read buffer efficiency: 100.00% (55125068296 hits/ 55125632976 total)
[OK] InnoDB Write log efficiency: 92.59% (140220341 hits/ 151440226 total)
[OK] InnoDB log waits: 0.00% (0 waits / 11219885 writes)
-------- AriaDB Metrics ----------------------------------------------------------------------------
[--] AriaDB is disabled.
-------- TokuDB Metrics ----------------------------------------------------------------------------
[--] TokuDB is disabled.
-------- XtraDB Metrics ----------------------------------------------------------------------------
[--] XtraDB is disabled.
-------- RocksDB Metrics ---------------------------------------------------------------------------
[--] RocksDB is disabled.
-------- Spider Metrics ----------------------------------------------------------------------------
[--] Spider is disabled.
-------- Connect Metrics ---------------------------------------------------------------------------
[--] Connect is disabled.
-------- Galera Metrics ----------------------------------------------------------------------------
[--] Galera is disabled.
-------- Replication Metrics -----------------------------------------------------------------------
[--] Galera Synchronous replication: NO
[--] No replication slave(s) for this server.
[--] Binlog format: MIXED
[--] XA support enabled: ON
[--] Semi synchronous replication Master: Not Activated
[--] Semi synchronous replication Slave: Not Activated
[!!] This replication slave is running with the read_only option disabled.
[OK] This replication slave is up to date with master.
-------- Recommendations ---------------------------------------------------------------------------
General recommendations:
Reduce your overall MySQL memory footprint for system stability
Increasing the query_cache size over 128M may reduce performance
Increase table_open_cache gradually to avoid file descriptor limits
Read this before increasing table_open_cache over 64:
This is MyISAM only table_cache scalability problem, InnoDB not affected.
See more details here:
This bug already fixed in MySQL 5.7.9 and newer MySQL versions.
Beware that open_files_limit (5000) variable
should be greater than table_open_cache (2000)
Thread pool size for InnoDB usage (8)
Read this before changing innodb_log_file_size and/or innodb_log_files_in_group:
Variables to adjust:
query_cache_size (=0)
query_cache_type (=0)
query_cache_size (> 128M) [see warning above]
table_open_cache (> 2000)
thread_pool_size between 16 and 36 for InnoDB usage
innodb_buffer_pool_size (>= 12G) if possible.
innodb_log_file_size should be (=1G) if possible, so InnoDB total log files size equals to 25% of buffer pool size.
innodb_buffer_pool_instances(=12)
Informe MySQLTuner de dbm2
>> MySQLTuner 1.7.9 - Major Hayden <[email protected]>
>> Bug reports, feature requests, and downloads at mysqltuner.com
>> Run with '--help' for additional options and output filtering
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.7.20-18-log
[OK] Operating on 64-bit architecture
-------- Log file Recommendations ------------------------------------------------------------------
[--] Log file: /var/log/mysql/error.log(0B)
[OK] Log file /var/log/mysql/error.log exists
[OK] Log file /var/log/mysql/error.log is readable.
[!!] Log file /var/log/mysql/error.log is empty
[OK] Log file /var/log/mysql/error.log is smaller than 32 Mb
[OK] /var/log/mysql/error.log doesn't contain any warning.
[OK] /var/log/mysql/error.log doesn't contain any error.
[--] 0 start(s) detected in /var/log/mysql/error.log
[--] 0 shutdown(s) detected in /var/log/mysql/error.log
-------- Storage Engine Statistics -----------------------------------------------------------------
[--] Status: +ARCHIVE +BLACKHOLE +CSV -FEDERATED +InnoDB +MEMORY +MRG_MYISAM +MyISAM +PERFORMANCE_SCHEMA
[--] Data in InnoDB tables: 12G (Tables: 131)
[--] Data in MEMORY tables: 46M (Tables: 2)
[--] Data in MyISAM tables: 9G (Tables: 368)
[!!] Total fragmented tables: 2
-------- Security Recommendations ------------------------------------------------------------------
[OK] There are no anonymous accounts for any database users
[OK] All database users have passwords assigned
[!!] There is no basic password file list!
-------- CVE Security Recommendations --------------------------------------------------------------
[--] Skipped due to --cvefile option undefined
-------- Performance Metrics -----------------------------------------------------------------------
[--] Up for: 6d 21h 4m 15s (89K q [0.150 qps], 39K conn, TX: 325M, RX: 52G)
[--] Reads / Writes: 0% / 100%
[--] Binary logging is enabled (GTID MODE: OFF)
[--] Physical Memory : 20.0G
[--] Max MySQL memory : 13.8G
[--] Other process memory: 166.5M
[--] Total buffers: 12.4G global + 3.0M per thread (500 max threads)
[--] P_S Max memory usage: 72B
[--] Galera GCache Max memory usage: 0B
[OK] Maximum reached memory usage: 12.4G (61.86% of installed RAM)
[OK] Maximum possible memory usage: 13.8G (69.08% of installed RAM)
[OK] Overall possible memory usage with other process is compatible with memory available
[OK] Slow queries: 2% (1K/89K)
[OK] Highest usage of available connections: 0% (3/500)
[!!] Aborted connections: 24.95% (9904/39702)
[!!] Query cache may be disabled by default due to mutex contention.
[!!] Query cache efficiency: 0.0% (0 cached / 29K selects)
[OK] Query cache prunes per day: 0
[!!] Sorts requiring temporary tables: 7837% (26K temp sorts / 342 sorts)
[OK] No joins without indexes
[OK] Temporary tables created on disk: 0% (392 on disk / 1M total)
[OK] Thread cache hit rate: 99% (3 created / 39K connections)
[!!] Table cache hit rate: 14% (1K open / 7K opened)
[OK] Open file limit used: 13% (657/5K)
[OK] Table locks acquired immediately: 100% (30M immediate / 30M locks)
[OK] Binlog cache memory access: 0% (0 Memory / 0 Total)
-------- Performance schema ------------------------------------------------------------------------
[--] Memory used by P_S: 72B
[--] Sys schema is installed.
-------- ThreadPool Metrics ------------------------------------------------------------------------
[--] ThreadPool stat is enabled.
[--] Thread Pool Size: 8 thread(s).
[!!] thread_pool_size between 16 and 36 when using InnoDB storage engine.
-------- MyISAM Metrics ----------------------------------------------------------------------------
[OK] Key buffer used: 99.9% (104M used / 104M cache)
[OK] Key buffer size / total MyISAM indexes: 100.0M/3.0G
[OK] Read Key buffer hit rate: 98.8% (6B cached / 73M reads)
[!!] Write Key buffer hit rate: 14.9% (1B cached / 186M writes)
-------- InnoDB Metrics ----------------------------------------------------------------------------
[--] InnoDB is enabled.
[--] InnoDB Thread Concurrency: 0
[OK] InnoDB File per table is activated
[!!] InnoDB buffer pool / data size: 12.0G/12.7G
[!!] Ratio InnoDB log file size / InnoDB Buffer pool size (4.16666666666667 %): 256.0M * 2/12.0G should be equal 25%
[!!] InnoDB buffer pool instances: 8
[--] Number of InnoDB Buffer Pool Chunk : 96 for 8 Buffer Pool Instance(s)
[OK] Innodb_buffer_pool_size aligned with Innodb_buffer_pool_chunk_size & Innodb_buffer_pool_instances
[OK] InnoDB Read buffer efficiency: 99.99% (2954768806 hits/ 2955078519 total)
[OK] InnoDB Write log efficiency: 95.93% (111839144 hits/ 116582132 total)
[OK] InnoDB log waits: 0.00% (0 waits / 4742988 writes)
-------- AriaDB Metrics ----------------------------------------------------------------------------
[--] AriaDB is disabled.
-------- TokuDB Metrics ----------------------------------------------------------------------------
[--] TokuDB is disabled.
-------- XtraDB Metrics ----------------------------------------------------------------------------
[--] XtraDB is disabled.
-------- RocksDB Metrics ---------------------------------------------------------------------------
[--] RocksDB is disabled.
-------- Spider Metrics ----------------------------------------------------------------------------
[--] Spider is disabled.
-------- Connect Metrics ---------------------------------------------------------------------------
[--] Connect is disabled.
-------- Galera Metrics ----------------------------------------------------------------------------
[--] Galera is disabled.
-------- Replication Metrics -----------------------------------------------------------------------
[--] Galera Synchronous replication: NO
[--] This server is acting as master for 1 server(s).
[--] Binlog format: MIXED
[--] XA support enabled: ON
[--] Semi synchronous replication Master: Not Activated
[--] Semi synchronous replication Slave: Not Activated
[!!] This replication slave is not running but seems to be configured.
-------- Recommendations ---------------------------------------------------------------------------
General recommendations:
Run OPTIMIZE TABLE to defragment tables for better performance
OPTIMIZE TABLE `REDACTED`.`REDACTED`; -- can free 191.703922271729 MB
OPTIMIZE TABLE `REDACTED`.`REDACTED`; -- can free 143.067314147949 MB
Total freed space after theses OPTIMIZE TABLE : 334.771236419678 Mb
Reduce or eliminate unclosed connections and network issues
Increase table_open_cache gradually to avoid file descriptor limits
Read this before increasing table_open_cache over 64:
This is MyISAM only table_cache scalability problem, InnoDB not affected.
See more details here:
This bug already fixed in MySQL 5.7.9 and newer MySQL versions.
Beware that open_files_limit (5000) variable
should be greater than table_open_cache (2000)
Thread pool size for InnoDB usage (8)
Read this before changing innodb_log_file_size and/or innodb_log_files_in_group:
Variables to adjust:
query_cache_size (=0)
query_cache_type (=0)
query_cache_limit (> 1M, or use smaller result sets)
sort_buffer_size (> 1M)
read_rnd_buffer_size (> 256K)
table_open_cache (> 2000)
thread_pool_size between 16 and 36 for InnoDB usage
innodb_buffer_pool_size (>= 12G) if possible.
innodb_log_file_size should be (=1G) if possible, so InnoDB total log files size equals to 25% of buffer pool size.
innodb_buffer_pool_instances(=12)
ulimites de dbm1
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 514886
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 514886
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
ulimites de dbm2
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1030587
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1030587
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
MOSTRAR ESTADO GLOBAL @ dbm1:https://gist.github.com/adriankirchner/cfffedfcc0699798e6c9276b09094ee1
MOSTRAR ESTADO GLOBAL @ dbm2:https://gist.github.com/adriankirchner/b021d8943f59c9a2448caf5dba6f89af
MOSTRAR VARIABLES GLOBALES @ dbm1:https://gist.github.com/adriankirchner/b644072f063f80c2f41c604d5551eb18
MOSTRAR VARIABLES GLOBALES @ dbm2:https://gist.github.com/adriankirchner/fa0ce51eb30a7368a30242fc9acba93b
Desafortunadamente, iostat no funciona dentro de contenedores OpenVZ. Para estar seguro, aumenté la RAM en ambos contenedores de 16 GB a 20 GB.
El historial de monitoreo de fallas de replicación se puede encontrar aquí:https://i.stack.imgur.com/18Dff.png
Actualización 2018-04-30 Wilson nos proporcionó una serie de ajustes de rendimiento que poco a poco implementamos en esta configuración. ¡Muchas gracias por esto! Mientras tanto actualizamos ambos servidores a Percona.5.7.21-21-registropero los errores de replicación aún ocurren. Cuando profundizo en la posición binlog indicada en el registro de errores ([..]el último evento leído desde './mysql-bin.007091' en 71137, el último byte leído desde './mysql-bin.007091' en71137.) esta es la declaración que causa el error:
# at 71049
#180308 6:41:25 server id 1 end_log_pos 71137 CRC32 0x8bbeaab8 Query thread_id=61875722 exec_time=0 error_code=0
SET TIMESTAMP=1520487685/*!*/;
COMMIT
/*!*/;
# at 71137 <---- position from the error message
#180308 6:41:25 server id 1 end_log_pos 71202 CRC32 0x87513662 Anonymous_GTID last_committed=140 sequence_number=141 rbr_only=no
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 71202
#180308 6:41:25 server id 1 end_log_pos 71289 CRC32 0xda97e104 Query thread_id=61875722 exec_time=0 error_code=0
SET TIMESTAMP=1520487685/*!*/;
BEGIN
/*!*/;
# at 71289
#180308 6:41:25 server id 1 end_log_pos 71631 CRC32 0x7c566dd1 Query thread_id=61875722 exec_time=0 error_code=0
SET TIMESTAMP=1520487685/*!*/;
update some_tables set some_col = 'some_value' where some_pk = 'some_value' <-- REDACTED
/*!*/;
# at 71631
#180308 6:41:25 server id 1 end_log_pos 71719 CRC32 0x37a2a3a9 Query thread_id=61875722 exec_time=0 error_code=0
SET TIMESTAMP=1520487685/*!*/;
COMMIT
/*!*/;
Entonces, la declaración que causa el error es SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
: verifiqué esto con el binlog de otra interrupción de la replicación. Hay muy pocos registros sobre esto: -https://bugs.mysql.com/bug.php?id=85480
-https://groups.google.com/forum/#!topic/percona-discussion/u1k85oY5DB8
Ambos están operando con GTID_MODE=ON
(nosotros usamos GTID_MODE=OFF
) y enfrentan mensajes de error totalmente diferentes en el estado esclavo.
Respuesta1
Este error suele ser causado por una falla de hardware o un error tan oscuro que no se ha encontrado en más de una década. No ha mencionado nada sobre el hardware, por lo que la primera pregunta obvia es si tiene memoria ECC y si ve algún error MCE en su syslog, o si los sensores muestran sobrecalentamiento en alguno de los componentes. Si siempre ve esto en los mismos servidores (pero no en TODOS los servidores), entonces una falla en el hardware podría ser una explicación muy probable.
Obviamente, la solución óptima es reinicializar completamente la réplica, ya que eso evitará cualquier riesgo de que se produzcan anomalías.
Dicho esto, esto también es claramente desagradable si tiene terabytes de datos, por lo que puede haber una manera de restaurar la replicación sin reconstruir el esclavo por completo. Utilice mysqlbinlog en el maestro para identificar dónde comienzan las transacciones y busque las transacciones adyacentes cerca de las posiciones 71137 (su Exec_Master_Log_Pos
) y 43445 (donde el error dice que se ahogó al leer el binlog). Existe una gran posibilidad de que las posiciones enumeradas sean realmente erróneas y que ninguna transacción en el binlog comience en esas posiciones.
Si ese es el caso, puede restablecer sus coordenadas de replicación en el esclavo a las últimas mysqlbinlog
listas de transacciones, es decirantesla posición en la que el esclavo cree que está. Puede encontrar más detalles sobre qué hacer exactamente en esteartículo de solución de problemas de MySQL. Existe un riesgo asociado con hacer esto (si las coordenadas se corrompieron, no se sabe qué más se corrompió), pero si realizar la reinicialización completa de inmediato es prohibitivamente costoso, es posible que puedas ganar algo de tiempo usando este enfoque. al menos hasta que pueda verificar (por ejemplo, pt-table-checksum
si tiene que reconstruir todo el conjunto de datos de réplica).
Respuesta2
Sugerencias para considerar en la sección my.cnf-ini [mysqld]
innodb_buffer_pool_size=8G # from 12G only 16G on server and MyISAM involved
#sort_buffer_size=2047K lead with # for default
#read_buffer_size=244K lead with # for default
#join_buffer_size=244K lead with # for default
thread_cache_size=16 # from 8 to avoid threads_created as often
query_cache_min_res_unit=512 # from 4096 to reduce QC RAM reqd per SELECT stored
Dedique 2 minutos a MySQLCalculator.com para reconocer el uso extremo de RAM que requerirían sus variables my.cnf-ini actuales. Sólo 16G disponibles.
Además, publique el informe completo de Mysqltuner que incluirá la versión de MySQL que está utilizando, por espacio de datos de ENGINE utilizado y recuentos de tablas, utilización de caché, subprocesos creados y otra información útil.
Publique los resultados completos de A) ulimit -a para que sepamos los límites con los que funciona su sistema operativo y B) iostat -x para lectura/escritura de IOPS extendida en sus unidades cuando esté ocupada.