I am running a CentOS 6.2 server with 1 GB of memory and 98 GB of disk.
Elasticsearch version 1.4.2, with around 9000 documents, using JDBC to pull the records from a MySQL database.
Over the last few weeks it has crashed on average twice a day. When this happens, the server cannot be reached until it is rebooted (no ssh access, error 500 when trying to reach the website, and so on).
Restarting Elasticsearch frequently seems to keep the server from going down, but I would rather find and fix the root cause of the problem. These are the logs from a crash that happened a few days ago:
Elasticsearch.log
[2015-10-04 10:20:58,247][WARN ][monitor.jvm ] [Speedball] [gc][young][152940][2627] duration [2.2s], collections [1]/[3.9s], total [2.2s]/[12.5m], memory [89.8mb]->[35.9mb]/[197.2mb], all_pools {[young] [54.1mb]->[110.1kb]/[54.5mb]}{[survivor] [472.9kb]->[656.6kb]/[6.7mb]}{[old] [35.2mb]->[35.2mb]/[136mb]}
[2015-10-04 10:21:05,538][INFO ][index.shard.service ] [Speedball] [rondreis][0] updating refresh_interval from [1s] to [-1000]
[2015-10-04 10:21:06,795][INFO ][index.shard.service ] [Speedball] [tblproductoffer][0] updating refresh_interval from [1s] to [-1000]
[2015-10-04 10:22:54,653][WARN ][monitor.jvm ] [Speedball] [gc][young][153002][2628] duration [1.3s], collections [1]/[1.9s], total [1.3s]/[12.5m], memory [90.3mb]->[36.8mb]/[197.2mb], all_pools {[young] [54.5mb]->[24kb]/[54.5mb]}{[survivor] [656.6kb]->[1.5mb]/[6.7mb]}{[old] [35.2mb]->[35.2mb]/[136mb]}
[2015-10-04 10:22:57,406][INFO ][index.shard.service ] [Speedball] [rondreis][0] updating refresh_interval from [-1000] to [1s]
[2015-10-04 10:22:59,971][WARN ][monitor.jvm ] [Speedball] [gc][young][153006][2630] duration [1.7s], collections [1]/[2.2s], total [1.7s]/[12.5m], memory [83.1mb]->[36.8mb]/[197.2mb], all_pools {[young] [42.8mb]->[45.3kb]/[54.5mb]}{[survivor] [5mb]->[835.8kb]/[6.7mb]}{[old] [35.2mb]->[35.9mb]/[136mb]}
[2015-10-04 10:23:08,215][INFO ][river.jdbc.RiverMetrics ] pipeline org.xbib.elasticsearch.plugin.jdbc.RiverPipeline@f106d7f complete: river jdbc/rondreis metrics: 3304 rows, 0.1295726757143051 mean, (6.604331351617896 1.365257914334395 0.45761655479553015), ingest metrics: elapsed 2 minutes 2 seconds, 915.27 KB bytes, 283.0 bytes avg, 0.007 MB/s
[2015-10-04 10:25:33,021][WARN ][monitor.jvm ] [Speedball] [gc][young][153151][2632] duration [1.5s], collections [1]/[1s], total [1.5s]/[12.5m], memory [64.7mb]->[91.3mb]/[197.2mb], all_pools {[young] [27.8mb]->[54.5mb]/[54.5mb]}{[survivor] [917.6kb]->[2.1mb]/[6.7mb]}{[old] [35.9mb]->[35.9mb]/[136mb]}
[2015-10-04 10:25:43,724][INFO ][monitor.jvm ] [Speedball] [gc][young][153154][2639] duration [1.4s], collections [2]/[2.2s], total [1.4s]/[12.6m], memory [91.7mb]->[45.3mb]/[197.2mb], all_pools {[young] [48.4mb]->[168.9kb]/[54.5mb]}{[survivor] [4.3mb]->[4.2mb]/[6.7mb]}{[old] [39mb]->[40.9mb]/[136mb]}
[2015-10-04 10:25:57,116][WARN ][monitor.jvm ] [Speedball] [gc][young][153165][2641] duration [3s], collections [1]/[3.3s], total [3s]/[12.7m], memory [90.7mb]->[52.1mb]/[197.2mb], all_pools {[young] [44.5mb]->[2kb]/[54.5mb]}{[survivor] [4.2mb]->[6.7mb]/[6.7mb]}{[old] [41.8mb]->[45.4mb]/[136mb]}
[2015-10-04 10:26:06,283][WARN ][monitor.jvm ] [Speedball] [gc][young][153171][2642] duration [3.6s], collections [1]/[3.9s], total [3.6s]/[12.7m], memory [89.4mb]->[54.8mb]/[197.2mb], all_pools {[young] [37.2mb]->[4.3kb]/[54.5mb]}{[survivor] [6.7mb]->[5mb]/[6.7mb]}{[old] [45.4mb]->[49.8mb]/[136mb]}
[2015-10-04 10:26:12,194][WARN ][monitor.jvm ] [Speedball] [gc][young][153172][2643] duration [4.9s], collections [1]/[5.9s], total [4.9s]/[12.8m], memory [54.8mb]->[57.9mb]/[197.2mb], all_pools {[young] [4.3kb]->[4.1kb]/[54.5mb]}{[survivor] [5mb]->[3.7mb]/[6.7mb]}{[old] [49.8mb]->[54.2mb]/[136mb]}
[2015-10-04 10:26:38,171][WARN ][monitor.jvm ] [Speedball] [gc][young][153176][2644] duration [3s], collections [1]/[22.2s], total [3s]/[12.9m], memory [101.9mb]->[42.2mb]/[197.2mb], all_pools {[young] [43.9mb]->[4.1kb]/[54.5mb]}{[survivor] [3.7mb]->[0b]/[6.7mb]}{[old] [54.2mb]->[42.2mb]/[136mb]}
[2015-10-04 10:26:39,186][WARN ][monitor.jvm ] [Speedball] [gc][old][153176][209] duration [18.3s], collections [1]/[22.2s], total [18.3s]/[4.4m], memory [101.9mb]->[42.2mb]/[197.2mb], all_pools {[young] [43.9mb]->[4.1kb]/[54.5mb]}{[survivor] [3.7mb]->[0b]/[6.7mb]}{[old] [54.2mb]->[42.2mb]/[136mb]}
[2015-10-04 10:26:47,155][WARN ][monitor.jvm ] [Speedball] [gc][young][153182][2646] duration [2.6s], collections [1]/[2.9s], total [2.6s]/[12.9m], memory [102.9mb]->[67.5mb]/[197.2mb], all_pools {[young] [47mb]->[6.1kb]/[54.5mb]}{[survivor] [6.7mb]->[6.7mb]/[6.7mb]}{[old] [49.1mb]->[60.8mb]/[136mb]}
[2015-10-04 10:26:54,247][WARN ][monitor.jvm ] [Speedball] [gc][young][153186][2647] duration [1.4s], collections [1]/[3.1s], total [1.4s]/[12.9m], memory [106mb]->[77.7mb]/[197.2mb], all_pools {[young] [38.4mb]->[842.4kb]/[54.5mb]}{[survivor] [6.7mb]->[6.7mb]/[6.7mb]}{[old] [60.8mb]->[70.2mb]/[136mb]}
[2015-10-04 10:27:03,745][WARN ][monitor.jvm ] [Speedball] [gc][young][153189][2648] duration [1.6s], collections [1]/[7s], total [1.6s]/[13m], memory [118.3mb]->[51.7mb]/[197.2mb], all_pools {[young] [41.3mb]->[8.1kb]/[54.5mb]}{[survivor] [6.7mb]->[0b]/[6.7mb]}{[old] [70.2mb]->[51.7mb]/[136mb]}
[2015-10-04 10:27:58,542][INFO ][index.shard.service ] [Speedball] [tblproductoffer][0] updating refresh_interval from [-1000] to [1s]
[2015-10-04 10:28:27,313][INFO ][river.jdbc.RiverMetrics ] pipeline org.xbib.elasticsearch.plugin.jdbc.RiverPipeline@72a191c1 complete: river jdbc/tblproductoffer metrics: 76480 rows, 2.962182052284631 mean, (98.523212205934 28.85622315906601 10.274778138333478), ingest metrics: elapsed 7 minutes 21 seconds, 16.72 MB bytes, 229.0 bytes avg, 0.038 MB/s
Messages
Oct 4 10:47:33 kernel: ps invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Oct 4 10:47:35 kernel: ps cpuset=/ mems_allowed=0
Oct 4 10:47:35 kernel: Pid: 31557, comm: ps Not tainted 2.6.32-504.3.3.el6.x86_64 #1
Oct 4 10:47:35 kernel: Call Trace:
Oct 4 10:47:35 kernel: [<ffffffff810d40c1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Oct 4 10:47:35 kernel: [<ffffffff81127300>] ? dump_header+0x90/0x1b0
Oct 4 10:47:35 kernel: [<ffffffff8122eabc>] ? security_real_capable_noaudit+0x3c/0x70
Oct 4 10:47:35 kernel: [<ffffffff81127782>] ? oom_kill_process+0x82/0x2a0
Oct 4 10:47:35 kernel: [<ffffffff811276c1>] ? select_bad_process+0xe1/0x120
Oct 4 10:47:35 kernel: [<ffffffff81127bc0>] ? out_of_memory+0x220/0x3c0
Oct 4 10:47:35 kernel: [<ffffffff811344df>] ? __alloc_pages_nodemask+0x89f/0x8d0
Oct 4 10:47:35 kernel: [<ffffffff8116c69a>] ? alloc_pages_current+0xaa/0x110
Oct 4 10:47:35 kernel: [<ffffffff811246f7>] ? __page_cache_alloc+0x87/0x90
Oct 4 10:47:35 kernel: [<ffffffff811240de>] ? find_get_page+0x1e/0xa0
Oct 4 10:47:35 kernel: [<ffffffff81125697>] ? filemap_fault+0x1a7/0x500
Oct 4 10:47:35 kernel: [<ffffffff8114eae4>] ? __do_fault+0x54/0x530
Oct 4 10:47:35 kernel: [<ffffffff8114f0b7>] ? handle_pte_fault+0xf7/0xb00
Oct 4 10:47:35 kernel: [<ffffffff81198b9b>] ? pipe_wait+0x5b/0x80
Oct 4 10:47:35 kernel: [<ffffffff8114fcea>] ? handle_mm_fault+0x22a/0x300
Oct 4 10:47:35 kernel: [<ffffffff8104d0d8>] ? __do_page_fault+0x138/0x480
Oct 4 10:47:35 kernel: [<ffffffff8128c245>] ? _atomic_dec_and_lock+0x55/0x80
Oct 4 10:47:35 kernel: [<ffffffff811b07d0>] ? mntput_no_expire+0x30/0x110
Oct 4 10:47:35 kernel: [<ffffffff8118f981>] ? __fput+0x1a1/0x210
Oct 4 10:47:35 kernel: [<ffffffff8152ff7e>] ? do_page_fault+0x3e/0xa0
Oct 4 10:47:35 kernel: [<ffffffff8152d335>] ? page_fault+0x25/0x30
Oct 4 10:47:35 kernel: Mem-Info:
Oct 4 10:47:35 kernel: Node 0 DMA per-cpu:
Oct 4 10:47:35 kernel: CPU 0: hi: 0, btch: 1 usd: 0
Oct 4 10:47:35 kernel: Node 0 DMA32 per-cpu:
Oct 4 10:47:35 kernel: CPU 0: hi: 186, btch: 31 usd: 173
Oct 4 10:47:35 kernel: active_anon:87918 inactive_anon:88009 isolated_anon:5376
Oct 4 10:47:35 kernel: active_file:136 inactive_file:122 isolated_file:332
Oct 4 10:47:35 kernel: unevictable:0 dirty:0 writeback:284 unstable:0
Oct 4 10:47:35 kernel: free:12258 slab_reclaimable:3027 slab_unreclaimable:14974
Oct 4 10:47:35 kernel: mapped:478 shmem:22 pagetables:36389 bounce:0
Oct 4 10:47:35 kernel: Node 0 DMA free:4648kB min:668kB low:832kB high:1000kB active_anon:4400kB inactive_anon:4648kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):768kB isolated(file):48kB present:15368kB mlocked:0kB dirty:0kB writeback:20kB mapped:44kB shmem:0kB slab_reclaimable:36kB slab_unreclaimable:244kB kernel_stack:16kB pagetables:864kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:12 all_unreclaimable? no
Oct 4 10:47:35 kernel: lowmem_reserve[]: 0 994 994 994
Oct 4 10:47:35 kernel: Node 0 DMA32 free:44384kB min:44384kB low:55480kB high:66576kB active_anon:347272kB inactive_anon:347388kB active_file:544kB inactive_file:488kB unevictable:0kB isolated(anon):20736kB isolated(file):1280kB present:1018016kB mlocked:0kB dirty:0kB writeback:1116kB mapped:1868kB shmem:88kB slab_reclaimable:12072kB slab_unreclaimable:59652kB kernel_stack:3440kB pagetables:144692kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:3034 all_unreclaimable? yes
Oct 4 10:47:35 kernel: lowmem_reserve[]: 0 0 0 0
Oct 4 10:47:35 kernel: Node 0 DMA: 34*4kB 64*8kB 22*16kB 2*32kB 8*64kB 8*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 4648kB
Oct 4 10:47:35 kernel: Node 0 DMA32: 2240*4kB 2782*8kB 373*16kB 81*32kB 34*64kB 3*128kB 2*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 44384kB
Oct 4 10:47:35 kernel: 17908 total pagecache pages
Oct 4 10:47:35 kernel: 17292 pages in swap cache
Oct 4 10:47:35 kernel: Swap cache stats: add 2986384, delete 2969092, find 1357308/1573932
Oct 4 10:47:35 kernel: Free swap = 0kB
Oct 4 10:47:35 kernel: Total swap = 2097148kB
Oct 4 10:47:35 kernel: 262127 pages RAM
Oct 4 10:47:35 kernel: 7514 pages reserved
Oct 4 10:47:35 kernel: 132906 pages shared
Oct 4 10:47:35 kernel: 231782 pages non-shared
Oct 4 10:47:35 kernel: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
Oct 4 10:47:35 kernel: [ 294] 0 294 2725 36 0 -17 -1000 udevd
Oct 4 10:47:35 kernel: [ 812] 0 812 6905 74 0 -17 -1000 auditd
Oct 4 10:47:35 kernel: [ 842] 0 842 62271 98 0 0 0 rsyslogd
Oct 4 10:47:35 kernel: [ 863] 25 863 40013 68 0 0 0 named
Oct 4 10:47:35 kernel: [ 878] 81 878 5358 38 0 0 0 dbus-daemon
Oct 4 10:47:35 kernel: [ 905] 0 905 1020 37 0 0 0 acpid
Oct 4 10:47:35 kernel: [ 925] 0 925 16672 39 0 -17 -1000 sshd
Oct 4 10:47:35 kernel: [ 937] 0 937 13145 14 0 0 0 vsftpd
Oct 4 10:47:35 kernel: [ 974] 0 974 27075 41 0 0 0 mysqld_safe
Oct 4 10:47:35 kernel: [ 1289] 27 1289 237303 9646 0 0 0 mysqld
Oct 4 10:47:35 kernel: [ 1369] 0 1369 4816 35 0 0 0 dovecot
Oct 4 10:47:35 kernel: [ 1375] 97 1375 3246 43 0 0 0 anvil
Oct 4 10:47:35 kernel: [ 1376] 0 1376 3278 45 0 0 0 log
Oct 4 10:47:35 kernel: [ 1378] 0 1378 3589 44 0 0 0 config
Oct 4 10:47:35 kernel: [ 1394] 498 1394 271775 3584 0 0 0 java
Oct 4 10:47:35 kernel: [ 1408] 93 1408 23430 42 0 0 0 exim
Oct 4 10:47:35 kernel: [ 1420] 0 1420 112748 522 0 0 0 httpd
Oct 4 10:47:35 kernel: [ 1437] 0 1437 16578 52 0 0 0 nginx
Oct 4 10:47:35 kernel: [ 1439] 496 1439 16713 139 0 0 0 nginx
Oct 4 10:47:35 kernel: [ 1441] 496 1441 16762 137 0 0 0 nginx
Oct 4 10:47:35 kernel: [ 1442] 496 1442 16579 85 0 0 0 nginx
Oct 4 10:47:35 kernel: [ 1450] 0 1450 13814 11 0 0 0 vesta-nginx
Oct 4 10:47:35 kernel: [ 1452] 502 1452 13814 29 0 0 0 vesta-nginx
Oct 4 10:47:35 kernel: [ 1462] 0 1462 34253 56 0 0 0 vesta-php
Oct 4 10:47:35 kernel: [ 1471] 502 1471 34253 31 0 0 0 vesta-php
Oct 4 10:47:35 kernel: [ 1472] 502 1472 34253 31 0 0 0 vesta-php
Oct 4 10:47:35 kernel: [ 1482] 0 1482 29333 68 0 0 0 crond
Oct 4 10:47:35 kernel: [ 1499] 0 1499 204119 485 0 0 0 fail2ban-server
Oct 4 10:47:35 kernel: [ 1539] 0 1539 1016 34 0 0 0 mingetty
Oct 4 10:47:35 kernel: [ 1541] 0 1541 1016 34 0 0 0 mingetty
Oct 4 10:47:35 kernel: [ 1543] 0 1543 1016 34 0 0 0 mingetty
Oct 4 10:47:35 kernel: [ 1545] 0 1545 1016 34 0 0 0 mingetty
Oct 4 10:47:35 kernel: [ 1547] 0 1547 1016 34 0 0 0 mingetty
Oct 4 10:47:35 kernel: [ 1549] 0 1549 1016 34 0 0 0 mingetty
Oct 4 10:47:35 kernel: [ 1560] 0 1560 2724 31 0 -17 -1000 udevd
Oct 4 10:47:35 kernel: [ 1561] 0 1561 2724 31 0 -17 -1000 udevd
Oct 4 10:47:35 kernel: [10565] 0 10565 62866 239 0 0 0 httpd
Oct 4 10:47:35 kernel: [10567] 503 10567 141456 1109 0 0 0 httpd
Oct 4 10:47:35 kernel: [10568] 503 10568 141469 1260 0 0 0 httpd
Oct 4 10:47:35 kernel: [10569] 503 10569 142692 1186 0 0 0 httpd
Oct 4 10:47:35 kernel: [10570] 503 10570 141602 1183 0 0 0 httpd
Oct 4 10:47:35 kernel: [10571] 503 10571 141610 589 0 0 0 httpd
Oct 4 10:47:35 kernel: [10572] 503 10572 141756 1256 0 0 0 httpd
Oct 4 10:47:35 kernel: [10573] 503 10573 141157 1020 0 0 0 httpd
Oct 4 10:47:35 kernel: [10574] 503 10574 142386 1227 0 0 0 httpd
Oct 4 10:47:35 kernel: [12531] 503 12531 141259 606 0 0 0 httpd
Oct 4 10:47:35 kernel: [12532] 503 12532 141667 945 0 0 0 httpd
Oct 4 10:47:35 kernel: [12533] 503 12533 141454 609 0 0 0 httpd
Oct 4 10:47:35 kernel: [29641] 502 29641 35055 52 0 0 0 crond
Oct 4 10:47:35 kernel: [29645] 0 29645 43896 51 0 0 0 sudo
Oct 4 10:47:35 kernel: [29648] 0 29648 26582 46 0 0 0 v-backup-users
Oct 4 10:47:35 kernel: [29958] 0 29958 26681 46 0 0 0 v-backup-user
Oct 4 10:47:35 kernel: [30067] 0 30067 28084 67 0 0 0 find
Oct 4 10:47:35 kernel: [30068] 0 30068 30701 45 0 0 0 tar
Oct 4 10:47:35 kernel: [30826] 503 30826 116293 1098 0 0 0 httpd
Oct 4 10:47:35 kernel: [30834] 503 30834 116141 630 0 0 0 httpd
Oct 4 10:47:35 kernel: [30835] 503 30835 116329 1236 0 0 0 httpd
Oct 4 10:47:35 kernel: [30837] 503 30837 116329 997 0 0 0 httpd
Oct 4 10:47:35 kernel: [30838] 503 30838 116397 1242 0 0 0 httpd
Oct 4 10:47:35 kernel: [30839] 503 30839 116329 1057 0 0 0 httpd
Oct 4 10:47:35 kernel: [30844] 503 30844 116236 1131 0 0 0 httpd
Oct 4 10:47:35 kernel: [30845] 503 30845 116108 673 0 0 0 httpd
Oct 4 10:47:35 kernel: [30848] 503 30848 140602 988 0 0 0 httpd
Oct 4 10:47:35 kernel: [30849] 503 30849 116104 614 0 0 0 httpd
[... continues for a while ...]
Oct 4 10:47:35 kernel: [31576] 503 31576 112875 833 0 0 0 httpd
Oct 4 10:47:35 kernel: [31577] 503 31577 112879 978 0 0 0 httpd
Oct 4 10:47:35 kernel: [31578] 503 31578 112785 797 0 0 0 httpd
Oct 4 10:47:35 kernel: [31579] 503 31579 112876 841 0 0 0 httpd
Oct 4 10:47:35 kernel: [31580] 503 31580 112781 496 0 0 0 httpd
Oct 4 10:47:35 kernel: [31581] 503 31581 112782 726 0 0 0 httpd
Oct 4 10:47:35 kernel: [31582] 503 31582 112875 845 0 0 0 httpd
Oct 4 10:47:35 kernel: [31583] 48 31583 112748 480 0 0 0 httpd
Oct 4 10:47:35 kernel: [31584] 48 31584 112748 480 0 0 0 httpd
Oct 4 10:47:35 kernel: [31585] 48 31585 112748 480 0 0 0 httpd
Oct 4 10:47:35 kernel: [31586] 0 31586 112748 445 0 0 0 httpd
Oct 4 10:47:35 kernel: [31587] 48 31587 112748 441 0 0 0 httpd
Oct 4 10:47:35 kernel: [31588] 0 31588 112748 414 0 0 0 httpd
Oct 4 10:47:35 kernel: [31589] 48 31589 112748 434 0 0 0 httpd
Oct 4 10:47:35 kernel: [31590] 48 31590 112748 441 0 0 0 httpd
Oct 4 10:47:35 kernel: [31591] 0 31591 112748 423 0 0 0 httpd
Oct 4 10:47:35 kernel: [31592] 0 31592 112748 404 0 0 0 httpd
Oct 4 10:47:35 kernel: [31593] 0 31593 112748 445 0 0 0 httpd
Oct 4 10:47:35 kernel: Out of memory: Kill process 1394 (java) score 107 or sacrifice child
Oct 4 10:47:35 kernel: Killed process 1394, UID 498, (java) total-vm:1087100kB, anon-rss:14132kB, file-rss:204kB
And the Elasticsearch log from when it went down today (this time I was able to revive it over ssh without rebooting, using 'service elasticsearch restart'):
[2015-10-09 10:21:08,062][INFO ][index.shard.service ] [Black Panther] [tblproductoffer][0] updating refresh_interval from [1s] to [-1000]
[2015-10-09 10:22:22,647][DEBUG][action.admin.indices.settings.put] [Black Panther] failed to update settings on indices [rondreis]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (update-settings) within 30s
at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:263)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2015-10-09 10:22:23,002][WARN ][monitor.jvm ] [Black Panther] [gc][young][83250][1221] duration [5.5s], collections [1]/[14.2s], total [5.5s]/[3.8m], memory [125.8mb]->[74.4mb]/[197.2mb], all_pools {[young] [53.1mb]->[1.5mb]/[54.5mb]}{[survivor] [1018kb]->[1.1mb]/[6.7mb]}{[old] [71.7mb]->[71.7mb]/[136mb]}
[2015-10-09 10:22:23,101][ERROR][river.jdbc.SimpleRiverFlow] failed to process cluster event (update-settings) within 30s
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (update-settings) within 30s
at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:263)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2015-10-09 10:25:02,504][WARN ][monitor.jvm ] [Black Panther] [gc][young][83345][1222] duration [3.4s], collections [1]/[4.2s], total [3.4s]/[3.9m], memory [127.3mb]->[76mb]/[197.2mb], all_pools {[young] [54.5mb]->[1.1mb]/[54.5mb]}{[survivor] [1.1mb]->[3.1mb]/[6.7mb]}{[old] [71.7mb]->[71.7mb]/[136mb]}
[2015-10-09 10:25:42,385][WARN ][monitor.jvm ] [Black Panther] [gc][young][83361][1223] duration [8.8s], collections [1]/[9.7s], total [8.8s]/[4m], memory [128.8mb]->[75.4mb]/[197.2mb], all_pools {[young] [53.9mb]->[66kb]/[54.5mb]}{[survivor] [3.1mb]->[2.8mb]/[6.7mb]}{[old] [71.7mb]->[72.5mb]/[136mb]}
[2015-10-09 10:25:45,132][INFO ][river.jdbc.RiverMetrics ] pipeline org.xbib.elasticsearch.plugin.jdbc.RiverPipeline@322e6dcf complete: river jdbc/rondreis metrics: 3288 rows, 0.12815869356635873 mean, (6.572349117469625 1.358646495863042 0.45540049399748883), ingest metrics: elapsed 4 minutes 38 seconds, 908.23 KB bytes, 282.0 bytes avg, 0.003 MB/s
[2015-10-09 10:27:44,274][INFO ][node ] [Black Panther] stopping ...
[2015-10-09 10:28:01,713][WARN ][river.jdbc.RiverThread ] interrupted
[2015-10-09 10:29:45,413][WARN ][common.jna ] Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out. Increase RLIMIT_MEMLOCK (ulimit).
[2015-10-09 10:30:01,866][INFO ][node ] [Vance Astrovik] version[1.4.2], pid[11796], build[927caff/2014-12-16T14:11:12Z]
[2015-10-09 10:30:01,867][INFO ][node ] [Vance Astrovik] initializing ...
[2015-10-09 10:30:04,408][INFO ][plugins ] [Vance Astrovik] loaded [jdbc-1.4.0.6-664f819], sites []
[2015-10-09 10:32:14,502][INFO ][node ] [Vance Astrovik] initialized
[2015-10-09 10:32:14,502][INFO ][node ] [Vance Astrovik] starting ...
[2015-10-09 10:32:17,555][INFO ][transport ] [Vance Astrovik] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/185.13.224.246:9300]}
[2015-10-09 10:32:18,209][INFO ][discovery ] [Vance Astrovik] elasticsearch/3s3H31MrSCmyEze2t2L_jA
[2015-10-09 10:32:22,383][INFO ][cluster.service ] [Vance Astrovik] new_master [Vance Astrovik][3s3H31MrSCmyEze2t2L_jA][sensead.localdomain][inet[/185.13.224.246:9300]], reason: zen-disco-join (elected_as_master)
[2015-10-09 10:32:22,864][INFO ][http ] [Vance Astrovik] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/185.13.224.246:9200]}
[2015-10-09 10:32:22,864][INFO ][node ] [Vance Astrovik] started
[2015-10-09 10:32:34,032][INFO ][gateway ] [Vance Astrovik] recovered [3] indices into cluster_state
[2015-10-09 10:32:37,313][DEBUG][action.search.type ] [Vance Astrovik] All shards failed for phase: [query]
[2015-10-09 10:32:40,213][INFO ][river.jdbc.JDBCRiver ] scheduled river instance with cron expressions [0 21 * * * ?]
[2015-10-09 10:32:42,423][INFO ][river.jdbc.JDBCRiver ] scheduled river instance with cron expressions [0 21 * * * ?]
[2015-10-09 10:33:14,578][WARN ][indices.ttl ] [Vance Astrovik] failed to execute ttl purge
That last line keeps being printed to the log every 60 seconds.
I have been trying to solve this problem for a couple of weeks now. Am I overlooking something simple, or is something bigger going on?
Answer 1
Based on this log entry:
Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out. Increase RLIMIT_MEMLOCK (ulimit).
You will need to make changes to your Elasticsearch configuration, in addition to upgrading your RAM, as mentioned here.
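For reference, a minimal sketch of what that usually involves on a CentOS 6 RPM install (the paths are the stock RPM layout, and the 256m heap is an assumption for a 1 GB box; adjust both to your setup):

# /etc/security/limits.conf -- let the elasticsearch user lock memory
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited

# /etc/sysconfig/elasticsearch -- fixed heap size so mlockall can lock all of it
ES_HEAP_SIZE=256m
MAX_LOCKED_MEMORY=unlimited

# /etc/elasticsearch/elasticsearch.yml -- lock the JVM heap into RAM
bootstrap.mlockall: true

After a 'service elasticsearch restart' you can check whether the lock took effect with:

curl 'localhost:9200/_nodes/process?pretty'

and look for "mlockall" : true in the output.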
Answer 2
Well, this message states very clearly that the problem you have is too little memory:
Oct 4 10:47:35 kernel: Out of memory: Kill process 1394 (java) score 107 or sacrifice child
Either add more RAM, or limit the amount of memory your java/httpd instances are allowed to use.
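As a rough sketch of that second option (the numbers below are assumptions for a 1 GB box, not tested values): cap the Elasticsearch heap and shrink the Apache prefork pool, since the process table above shows dozens of httpd workers each holding roughly 1000 resident pages (about 4 MB):

# /etc/sysconfig/elasticsearch -- hard cap on the JVM heap
ES_HEAP_SIZE=256m

# /etc/httpd/conf/httpd.conf -- prefork MPM: fewer, shorter-lived workers
<IfModule prefork.c>
    StartServers         2
    MinSpareServers      2
    MaxSpareServers      4
    MaxClients          10
    MaxRequestsPerChild 500
</IfModule>

With around 10 workers at ~4 MB each plus a 256 MB Java heap, the combined working set stays well inside 1 GB instead of pushing the box into swap and the OOM killer.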