CentOS 6.2 server with Elasticsearch crashing several times a day

I am running a CentOS 6.2 server with 1 GB of memory and 98 GB of disk space.

It runs Elasticsearch 1.4.2 with roughly 9,000 documents, using the JDBC river plugin to pull the records from a MySQL database.

Over the past few weeks it has crashed on average twice a day. When that happens, the server is unreachable until it is rebooted (no SSH access, 500 errors when trying to reach the site, and so on).

Restarting Elasticsearch frequently seems to keep the server from going down, but I would rather find and fix the root cause. These are the logs from a crash that happened a few days ago:

Elasticsearch.log

[2015-10-04 10:20:58,247][WARN ][monitor.jvm              ] [Speedball] [gc][young][152940][2627] duration [2.2s], collections [1]/[3.9s], total [2.2s]/[12.5m], memory [89.8mb]->[35.9mb]/[197.2mb], all_pools {[young] [54.1mb]->[110.1kb]/[54.5mb]}{[survivor] [472.9kb]->[656.6kb]/[6.7mb]}{[old] [35.2mb]->[35.2mb]/[136mb]}
[2015-10-04 10:21:05,538][INFO ][index.shard.service      ] [Speedball] [rondreis][0] updating refresh_interval from [1s] to [-1000]
[2015-10-04 10:21:06,795][INFO ][index.shard.service      ] [Speedball] [tblproductoffer][0] updating refresh_interval from [1s] to [-1000]
[2015-10-04 10:22:54,653][WARN ][monitor.jvm              ] [Speedball] [gc][young][153002][2628] duration [1.3s], collections [1]/[1.9s], total [1.3s]/[12.5m], memory [90.3mb]->[36.8mb]/[197.2mb], all_pools {[young] [54.5mb]->[24kb]/[54.5mb]}{[survivor] [656.6kb]->[1.5mb]/[6.7mb]}{[old] [35.2mb]->[35.2mb]/[136mb]}
[2015-10-04 10:22:57,406][INFO ][index.shard.service      ] [Speedball] [rondreis][0] updating refresh_interval from [-1000] to [1s]
[2015-10-04 10:22:59,971][WARN ][monitor.jvm              ] [Speedball] [gc][young][153006][2630] duration [1.7s], collections [1]/[2.2s], total [1.7s]/[12.5m], memory [83.1mb]->[36.8mb]/[197.2mb], all_pools {[young] [42.8mb]->[45.3kb]/[54.5mb]}{[survivor] [5mb]->[835.8kb]/[6.7mb]}{[old] [35.2mb]->[35.9mb]/[136mb]}
[2015-10-04 10:23:08,215][INFO ][river.jdbc.RiverMetrics  ] pipeline org.xbib.elasticsearch.plugin.jdbc.RiverPipeline@f106d7f complete: river jdbc/rondreis metrics: 3304 rows, 0.1295726757143051 mean, (6.604331351617896 1.365257914334395 0.45761655479553015), ingest metrics: elapsed 2 minutes 2 seconds, 915.27 KB bytes, 283.0 bytes avg, 0.007 MB/s
[2015-10-04 10:25:33,021][WARN ][monitor.jvm              ] [Speedball] [gc][young][153151][2632] duration [1.5s], collections [1]/[1s], total [1.5s]/[12.5m], memory [64.7mb]->[91.3mb]/[197.2mb], all_pools {[young] [27.8mb]->[54.5mb]/[54.5mb]}{[survivor] [917.6kb]->[2.1mb]/[6.7mb]}{[old] [35.9mb]->[35.9mb]/[136mb]}
[2015-10-04 10:25:43,724][INFO ][monitor.jvm              ] [Speedball] [gc][young][153154][2639] duration [1.4s], collections [2]/[2.2s], total [1.4s]/[12.6m], memory [91.7mb]->[45.3mb]/[197.2mb], all_pools {[young] [48.4mb]->[168.9kb]/[54.5mb]}{[survivor] [4.3mb]->[4.2mb]/[6.7mb]}{[old] [39mb]->[40.9mb]/[136mb]}
[2015-10-04 10:25:57,116][WARN ][monitor.jvm              ] [Speedball] [gc][young][153165][2641] duration [3s], collections [1]/[3.3s], total [3s]/[12.7m], memory [90.7mb]->[52.1mb]/[197.2mb], all_pools {[young] [44.5mb]->[2kb]/[54.5mb]}{[survivor] [4.2mb]->[6.7mb]/[6.7mb]}{[old] [41.8mb]->[45.4mb]/[136mb]}
[2015-10-04 10:26:06,283][WARN ][monitor.jvm              ] [Speedball] [gc][young][153171][2642] duration [3.6s], collections [1]/[3.9s], total [3.6s]/[12.7m], memory [89.4mb]->[54.8mb]/[197.2mb], all_pools {[young] [37.2mb]->[4.3kb]/[54.5mb]}{[survivor] [6.7mb]->[5mb]/[6.7mb]}{[old] [45.4mb]->[49.8mb]/[136mb]}
[2015-10-04 10:26:12,194][WARN ][monitor.jvm              ] [Speedball] [gc][young][153172][2643] duration [4.9s], collections [1]/[5.9s], total [4.9s]/[12.8m], memory [54.8mb]->[57.9mb]/[197.2mb], all_pools {[young] [4.3kb]->[4.1kb]/[54.5mb]}{[survivor] [5mb]->[3.7mb]/[6.7mb]}{[old] [49.8mb]->[54.2mb]/[136mb]}
[2015-10-04 10:26:38,171][WARN ][monitor.jvm              ] [Speedball] [gc][young][153176][2644] duration [3s], collections [1]/[22.2s], total [3s]/[12.9m], memory [101.9mb]->[42.2mb]/[197.2mb], all_pools {[young] [43.9mb]->[4.1kb]/[54.5mb]}{[survivor] [3.7mb]->[0b]/[6.7mb]}{[old] [54.2mb]->[42.2mb]/[136mb]}
[2015-10-04 10:26:39,186][WARN ][monitor.jvm              ] [Speedball] [gc][old][153176][209] duration [18.3s], collections [1]/[22.2s], total [18.3s]/[4.4m], memory [101.9mb]->[42.2mb]/[197.2mb], all_pools {[young] [43.9mb]->[4.1kb]/[54.5mb]}{[survivor] [3.7mb]->[0b]/[6.7mb]}{[old] [54.2mb]->[42.2mb]/[136mb]}
[2015-10-04 10:26:47,155][WARN ][monitor.jvm              ] [Speedball] [gc][young][153182][2646] duration [2.6s], collections [1]/[2.9s], total [2.6s]/[12.9m], memory [102.9mb]->[67.5mb]/[197.2mb], all_pools {[young] [47mb]->[6.1kb]/[54.5mb]}{[survivor] [6.7mb]->[6.7mb]/[6.7mb]}{[old] [49.1mb]->[60.8mb]/[136mb]}
[2015-10-04 10:26:54,247][WARN ][monitor.jvm              ] [Speedball] [gc][young][153186][2647] duration [1.4s], collections [1]/[3.1s], total [1.4s]/[12.9m], memory [106mb]->[77.7mb]/[197.2mb], all_pools {[young] [38.4mb]->[842.4kb]/[54.5mb]}{[survivor] [6.7mb]->[6.7mb]/[6.7mb]}{[old] [60.8mb]->[70.2mb]/[136mb]}
[2015-10-04 10:27:03,745][WARN ][monitor.jvm              ] [Speedball] [gc][young][153189][2648] duration [1.6s], collections [1]/[7s], total [1.6s]/[13m], memory [118.3mb]->[51.7mb]/[197.2mb], all_pools {[young] [41.3mb]->[8.1kb]/[54.5mb]}{[survivor] [6.7mb]->[0b]/[6.7mb]}{[old] [70.2mb]->[51.7mb]/[136mb]}
[2015-10-04 10:27:58,542][INFO ][index.shard.service      ] [Speedball] [tblproductoffer][0] updating refresh_interval from [-1000] to [1s]
[2015-10-04 10:28:27,313][INFO ][river.jdbc.RiverMetrics  ] pipeline org.xbib.elasticsearch.plugin.jdbc.RiverPipeline@72a191c1 complete: river jdbc/tblproductoffer metrics: 76480 rows, 2.962182052284631 mean, (98.523212205934 28.85622315906601 10.274778138333478), ingest metrics: elapsed 7 minutes 21 seconds, 16.72 MB bytes, 229.0 bytes avg, 0.038 MB/s

Messages

    Oct  4 10:47:33   kernel: ps invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
    Oct  4 10:47:35   kernel: ps cpuset=/ mems_allowed=0
    Oct  4 10:47:35   kernel: Pid: 31557, comm: ps Not tainted 2.6.32-504.3.3.el6.x86_64 #1
    Oct  4 10:47:35   kernel: Call Trace:
    Oct  4 10:47:35   kernel: [<ffffffff810d40c1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
    Oct  4 10:47:35   kernel: [<ffffffff81127300>] ? dump_header+0x90/0x1b0
    Oct  4 10:47:35   kernel: [<ffffffff8122eabc>] ? security_real_capable_noaudit+0x3c/0x70
    Oct  4 10:47:35   kernel: [<ffffffff81127782>] ? oom_kill_process+0x82/0x2a0
    Oct  4 10:47:35   kernel: [<ffffffff811276c1>] ? select_bad_process+0xe1/0x120
    Oct  4 10:47:35   kernel: [<ffffffff81127bc0>] ? out_of_memory+0x220/0x3c0
    Oct  4 10:47:35   kernel: [<ffffffff811344df>] ? __alloc_pages_nodemask+0x89f/0x8d0
    Oct  4 10:47:35   kernel: [<ffffffff8116c69a>] ? alloc_pages_current+0xaa/0x110
    Oct  4 10:47:35   kernel: [<ffffffff811246f7>] ? __page_cache_alloc+0x87/0x90
    Oct  4 10:47:35   kernel: [<ffffffff811240de>] ? find_get_page+0x1e/0xa0
    Oct  4 10:47:35   kernel: [<ffffffff81125697>] ? filemap_fault+0x1a7/0x500
    Oct  4 10:47:35   kernel: [<ffffffff8114eae4>] ? __do_fault+0x54/0x530
    Oct  4 10:47:35   kernel: [<ffffffff8114f0b7>] ? handle_pte_fault+0xf7/0xb00
    Oct  4 10:47:35   kernel: [<ffffffff81198b9b>] ? pipe_wait+0x5b/0x80
    Oct  4 10:47:35   kernel: [<ffffffff8114fcea>] ? handle_mm_fault+0x22a/0x300
    Oct  4 10:47:35   kernel: [<ffffffff8104d0d8>] ? __do_page_fault+0x138/0x480
    Oct  4 10:47:35   kernel: [<ffffffff8128c245>] ? _atomic_dec_and_lock+0x55/0x80
    Oct  4 10:47:35   kernel: [<ffffffff811b07d0>] ? mntput_no_expire+0x30/0x110
    Oct  4 10:47:35   kernel: [<ffffffff8118f981>] ? __fput+0x1a1/0x210
    Oct  4 10:47:35   kernel: [<ffffffff8152ff7e>] ? do_page_fault+0x3e/0xa0
    Oct  4 10:47:35   kernel: [<ffffffff8152d335>] ? page_fault+0x25/0x30
    Oct  4 10:47:35   kernel: Mem-Info:
    Oct  4 10:47:35   kernel: Node 0 DMA per-cpu:
    Oct  4 10:47:35   kernel: CPU    0: hi:    0, btch:   1 usd:   0
    Oct  4 10:47:35   kernel: Node 0 DMA32 per-cpu:
    Oct  4 10:47:35   kernel: CPU    0: hi:  186, btch:  31 usd: 173
    Oct  4 10:47:35   kernel: active_anon:87918 inactive_anon:88009 isolated_anon:5376
    Oct  4 10:47:35   kernel: active_file:136 inactive_file:122 isolated_file:332
    Oct  4 10:47:35   kernel: unevictable:0 dirty:0 writeback:284 unstable:0
    Oct  4 10:47:35   kernel: free:12258 slab_reclaimable:3027 slab_unreclaimable:14974
    Oct  4 10:47:35   kernel: mapped:478 shmem:22 pagetables:36389 bounce:0
    Oct  4 10:47:35   kernel: Node 0 DMA free:4648kB min:668kB low:832kB high:1000kB active_anon:4400kB inactive_anon:4648kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):768kB isolated(file):48kB present:15368kB mlocked:0kB dirty:0kB writeback:20kB mapped:44kB shmem:0kB slab_reclaimable:36kB slab_unreclaimable:244kB kernel_stack:16kB pagetables:864kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:12 all_unreclaimable? no
    Oct  4 10:47:35   kernel: lowmem_reserve[]: 0 994 994 994
    Oct  4 10:47:35   kernel: Node 0 DMA32 free:44384kB min:44384kB low:55480kB high:66576kB active_anon:347272kB inactive_anon:347388kB active_file:544kB inactive_file:488kB unevictable:0kB isolated(anon):20736kB isolated(file):1280kB present:1018016kB mlocked:0kB dirty:0kB writeback:1116kB mapped:1868kB shmem:88kB slab_reclaimable:12072kB slab_unreclaimable:59652kB kernel_stack:3440kB pagetables:144692kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:3034 all_unreclaimable? yes
    Oct  4 10:47:35   kernel: lowmem_reserve[]: 0 0 0 0
    Oct  4 10:47:35   kernel: Node 0 DMA: 34*4kB 64*8kB 22*16kB 2*32kB 8*64kB 8*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 4648kB
    Oct  4 10:47:35   kernel: Node 0 DMA32: 2240*4kB 2782*8kB 373*16kB 81*32kB 34*64kB 3*128kB 2*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 44384kB
    Oct  4 10:47:35   kernel: 17908 total pagecache pages
    Oct  4 10:47:35   kernel: 17292 pages in swap cache
    Oct  4 10:47:35   kernel: Swap cache stats: add 2986384, delete 2969092, find 1357308/1573932
    Oct  4 10:47:35   kernel: Free swap  = 0kB
    Oct  4 10:47:35   kernel: Total swap = 2097148kB
    Oct  4 10:47:35   kernel: 262127 pages RAM
    Oct  4 10:47:35   kernel: 7514 pages reserved
    Oct  4 10:47:35   kernel: 132906 pages shared
    Oct  4 10:47:35   kernel: 231782 pages non-shared
    Oct  4 10:47:35   kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
    Oct  4 10:47:35   kernel: [  294]     0   294     2725       36   0     -17         -1000 udevd
    Oct  4 10:47:35   kernel: [  812]     0   812     6905       74   0     -17         -1000 auditd
    Oct  4 10:47:35   kernel: [  842]     0   842    62271       98   0       0             0 rsyslogd
    Oct  4 10:47:35   kernel: [  863]    25   863    40013       68   0       0             0 named
    Oct  4 10:47:35   kernel: [  878]    81   878     5358       38   0       0             0 dbus-daemon
    Oct  4 10:47:35   kernel: [  905]     0   905     1020       37   0       0             0 acpid
    Oct  4 10:47:35   kernel: [  925]     0   925    16672       39   0     -17         -1000 sshd
    Oct  4 10:47:35   kernel: [  937]     0   937    13145       14   0       0             0 vsftpd
    Oct  4 10:47:35   kernel: [  974]     0   974    27075       41   0       0             0 mysqld_safe
    Oct  4 10:47:35   kernel: [ 1289]    27  1289   237303     9646   0       0             0 mysqld
    Oct  4 10:47:35   kernel: [ 1369]     0  1369     4816       35   0       0             0 dovecot
    Oct  4 10:47:35   kernel: [ 1375]    97  1375     3246       43   0       0             0 anvil
    Oct  4 10:47:35   kernel: [ 1376]     0  1376     3278       45   0       0             0 log
    Oct  4 10:47:35   kernel: [ 1378]     0  1378     3589       44   0       0             0 config
    Oct  4 10:47:35   kernel: [ 1394]   498  1394   271775     3584   0       0             0 java
    Oct  4 10:47:35   kernel: [ 1408]    93  1408    23430       42   0       0             0 exim
    Oct  4 10:47:35   kernel: [ 1420]     0  1420   112748      522   0       0             0 httpd
    Oct  4 10:47:35   kernel: [ 1437]     0  1437    16578       52   0       0             0 nginx
    Oct  4 10:47:35   kernel: [ 1439]   496  1439    16713      139   0       0             0 nginx
    Oct  4 10:47:35   kernel: [ 1441]   496  1441    16762      137   0       0             0 nginx
    Oct  4 10:47:35   kernel: [ 1442]   496  1442    16579       85   0       0             0 nginx
    Oct  4 10:47:35   kernel: [ 1450]     0  1450    13814       11   0       0             0 vesta-nginx
    Oct  4 10:47:35   kernel: [ 1452]   502  1452    13814       29   0       0             0 vesta-nginx
    Oct  4 10:47:35   kernel: [ 1462]     0  1462    34253       56   0       0             0 vesta-php
    Oct  4 10:47:35   kernel: [ 1471]   502  1471    34253       31   0       0             0 vesta-php
    Oct  4 10:47:35   kernel: [ 1472]   502  1472    34253       31   0       0             0 vesta-php
    Oct  4 10:47:35   kernel: [ 1482]     0  1482    29333       68   0       0             0 crond
    Oct  4 10:47:35   kernel: [ 1499]     0  1499   204119      485   0       0             0 fail2ban-server
    Oct  4 10:47:35   kernel: [ 1539]     0  1539     1016       34   0       0             0 mingetty
    Oct  4 10:47:35   kernel: [ 1541]     0  1541     1016       34   0       0             0 mingetty
    Oct  4 10:47:35   kernel: [ 1543]     0  1543     1016       34   0       0             0 mingetty
    Oct  4 10:47:35   kernel: [ 1545]     0  1545     1016       34   0       0             0 mingetty
    Oct  4 10:47:35   kernel: [ 1547]     0  1547     1016       34   0       0             0 mingetty
    Oct  4 10:47:35   kernel: [ 1549]     0  1549     1016       34   0       0             0 mingetty
    Oct  4 10:47:35   kernel: [ 1560]     0  1560     2724       31   0     -17         -1000 udevd
    Oct  4 10:47:35   kernel: [ 1561]     0  1561     2724       31   0     -17         -1000 udevd
    Oct  4 10:47:35   kernel: [10565]     0 10565    62866      239   0       0             0 httpd
    Oct  4 10:47:35   kernel: [10567]   503 10567   141456     1109   0       0             0 httpd
    Oct  4 10:47:35   kernel: [10568]   503 10568   141469     1260   0       0             0 httpd
    Oct  4 10:47:35   kernel: [10569]   503 10569   142692     1186   0       0             0 httpd
    Oct  4 10:47:35   kernel: [10570]   503 10570   141602     1183   0       0             0 httpd
    Oct  4 10:47:35   kernel: [10571]   503 10571   141610      589   0       0             0 httpd
    Oct  4 10:47:35   kernel: [10572]   503 10572   141756     1256   0       0             0 httpd
    Oct  4 10:47:35   kernel: [10573]   503 10573   141157     1020   0       0             0 httpd
    Oct  4 10:47:35   kernel: [10574]   503 10574   142386     1227   0       0             0 httpd
    Oct  4 10:47:35   kernel: [12531]   503 12531   141259      606   0       0             0 httpd
    Oct  4 10:47:35   kernel: [12532]   503 12532   141667      945   0       0             0 httpd
    Oct  4 10:47:35   kernel: [12533]   503 12533   141454      609   0       0             0 httpd
    Oct  4 10:47:35   kernel: [29641]   502 29641    35055       52   0       0             0 crond
    Oct  4 10:47:35   kernel: [29645]     0 29645    43896       51   0       0             0 sudo
    Oct  4 10:47:35   kernel: [29648]     0 29648    26582       46   0       0             0 v-backup-users
    Oct  4 10:47:35   kernel: [29958]     0 29958    26681       46   0       0             0 v-backup-user
    Oct  4 10:47:35   kernel: [30067]     0 30067    28084       67   0       0             0 find
    Oct  4 10:47:35   kernel: [30068]     0 30068    30701       45   0       0             0 tar
    Oct  4 10:47:35   kernel: [30826]   503 30826   116293     1098   0       0             0 httpd
    Oct  4 10:47:35   kernel: [30834]   503 30834   116141      630   0       0             0 httpd
    Oct  4 10:47:35   kernel: [30835]   503 30835   116329     1236   0       0             0 httpd
    Oct  4 10:47:35   kernel: [30837]   503 30837   116329      997   0       0             0 httpd
    Oct  4 10:47:35   kernel: [30838]   503 30838   116397     1242   0       0             0 httpd
    Oct  4 10:47:35   kernel: [30839]   503 30839   116329     1057   0       0             0 httpd
    Oct  4 10:47:35   kernel: [30844]   503 30844   116236     1131   0       0             0 httpd
    Oct  4 10:47:35   kernel: [30845]   503 30845   116108      673   0       0             0 httpd
    Oct  4 10:47:35   kernel: [30848]   503 30848   140602      988   0       0             0 httpd
    Oct  4 10:47:35   kernel: [30849]   503 30849   116104      614   0       0             0 httpd
    ——continues for a while—
    Oct  4 10:47:35   kernel: [31576]   503 31576   112875      833   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31577]   503 31577   112879      978   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31578]   503 31578   112785      797   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31579]   503 31579   112876      841   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31580]   503 31580   112781      496   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31581]   503 31581   112782      726   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31582]   503 31582   112875      845   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31583]    48 31583   112748      480   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31584]    48 31584   112748      480   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31585]    48 31585   112748      480   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31586]     0 31586   112748      445   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31587]    48 31587   112748      441   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31588]     0 31588   112748      414   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31589]    48 31589   112748      434   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31590]    48 31590   112748      441   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31591]     0 31591   112748      423   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31592]     0 31592   112748      404   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31593]     0 31593   112748      445   0       0             0 httpd
    Oct  4 10:47:35   kernel: Out of memory: Kill process 1394 (java) score 107 or sacrifice child
    Oct  4 10:47:35   kernel: Killed process 1394, UID 498, (java) total-vm:1087100kB, anon-rss:14132kB, file-rss:204kB

And here is the Elasticsearch log from when it went down today (this time I was able to revive it over SSH, without rebooting the server, using 'service elasticsearch reboot').

[2015-10-09 10:21:08,062][INFO ][index.shard.service      ] [Black Panther] [tblproductoffer][0] updating refresh_interval from [1s] to [-1000]
[2015-10-09 10:22:22,647][DEBUG][action.admin.indices.settings.put] [Black Panther] failed to update settings on indices [rondreis]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (update-settings) within 30s
    at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:263)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
[2015-10-09 10:22:23,002][WARN ][monitor.jvm              ] [Black Panther] [gc][young][83250][1221] duration [5.5s], collections [1]/[14.2s], total [5.5s]/[3.8m], memory [125.8mb]->[74.4mb]/[197.2mb], all_pools {[young] [53.1mb]->[1.5mb]/[54.5mb]}{[survivor] [1018kb]->[1.1mb]/[6.7mb]}{[old] [71.7mb]->[71.7mb]/[136mb]}
[2015-10-09 10:22:23,101][ERROR][river.jdbc.SimpleRiverFlow] failed to process cluster event (update-settings) within 30s
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (update-settings) within 30s
    at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:263)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
[2015-10-09 10:25:02,504][WARN ][monitor.jvm              ] [Black Panther] [gc][young][83345][1222] duration [3.4s], collections [1]/[4.2s], total [3.4s]/[3.9m], memory [127.3mb]->[76mb]/[197.2mb], all_pools {[young] [54.5mb]->[1.1mb]/[54.5mb]}{[survivor] [1.1mb]->[3.1mb]/[6.7mb]}{[old] [71.7mb]->[71.7mb]/[136mb]}
[2015-10-09 10:25:42,385][WARN ][monitor.jvm              ] [Black Panther] [gc][young][83361][1223] duration [8.8s], collections [1]/[9.7s], total [8.8s]/[4m], memory [128.8mb]->[75.4mb]/[197.2mb], all_pools {[young] [53.9mb]->[66kb]/[54.5mb]}{[survivor] [3.1mb]->[2.8mb]/[6.7mb]}{[old] [71.7mb]->[72.5mb]/[136mb]}
[2015-10-09 10:25:45,132][INFO ][river.jdbc.RiverMetrics  ] pipeline org.xbib.elasticsearch.plugin.jdbc.RiverPipeline@322e6dcf complete: river jdbc/rondreis metrics: 3288 rows, 0.12815869356635873 mean, (6.572349117469625 1.358646495863042 0.45540049399748883), ingest metrics: elapsed 4 minutes 38 seconds, 908.23 KB bytes, 282.0 bytes avg, 0.003 MB/s
[2015-10-09 10:27:44,274][INFO ][node                     ] [Black Panther] stopping ...
[2015-10-09 10:28:01,713][WARN ][river.jdbc.RiverThread   ] interrupted
[2015-10-09 10:29:45,413][WARN ][common.jna               ] Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out. Increase RLIMIT_MEMLOCK (ulimit).
[2015-10-09 10:30:01,866][INFO ][node                     ] [Vance Astrovik] version[1.4.2], pid[11796], build[927caff/2014-12-16T14:11:12Z]
[2015-10-09 10:30:01,867][INFO ][node                     ] [Vance Astrovik] initializing ...
[2015-10-09 10:30:04,408][INFO ][plugins                  ] [Vance Astrovik] loaded [jdbc-1.4.0.6-664f819], sites []
[2015-10-09 10:32:14,502][INFO ][node                     ] [Vance Astrovik] initialized
[2015-10-09 10:32:14,502][INFO ][node                     ] [Vance Astrovik] starting ...
[2015-10-09 10:32:17,555][INFO ][transport                ] [Vance Astrovik] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/185.13.224.246:9300]}
[2015-10-09 10:32:18,209][INFO ][discovery                ] [Vance Astrovik] elasticsearch/3s3H31MrSCmyEze2t2L_jA
[2015-10-09 10:32:22,383][INFO ][cluster.service          ] [Vance Astrovik] new_master [Vance Astrovik][3s3H31MrSCmyEze2t2L_jA][sensead.localdomain][inet[/185.13.224.246:9300]], reason: zen-disco-join (elected_as_master)
[2015-10-09 10:32:22,864][INFO ][http                     ] [Vance Astrovik] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/185.13.224.246:9200]}
[2015-10-09 10:32:22,864][INFO ][node                     ] [Vance Astrovik] started
[2015-10-09 10:32:34,032][INFO ][gateway                  ] [Vance Astrovik] recovered [3] indices into cluster_state
[2015-10-09 10:32:37,313][DEBUG][action.search.type       ] [Vance Astrovik] All shards failed for phase: [query]
[2015-10-09 10:32:40,213][INFO ][river.jdbc.JDBCRiver     ] scheduled river instance with cron expressions [0 21 * * * ?]
[2015-10-09 10:32:42,423][INFO ][river.jdbc.JDBCRiver     ] scheduled river instance with cron expressions [0 21 * * * ?]
[2015-10-09 10:33:14,578][WARN ][indices.ttl              ] [Vance Astrovik] failed to execute ttl purge

That last line keeps being printed to the log every 60 seconds.

I have been trying to solve this for a few weeks now. Am I overlooking something simple, or is something bigger going on?

Answer 1

Based on this log entry:

Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out. Increase RLIMIT_MEMLOCK (ulimit).

you will need to make changes to the Elasticsearch configuration, in addition to upgrading your RAM, as mentioned here.
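
For Elasticsearch 1.4.x installed from the RPM on CentOS, that usually comes down to letting the elasticsearch user lock memory and giving the JVM an explicit, small heap. A minimal sketch, assuming the stock file locations; the 256m heap is only a guess for a 1 GB machine that also has to fit MySQL, Apache and nginx:

    # /etc/security/limits.conf -- allow the elasticsearch user to lock memory
    elasticsearch soft memlock unlimited
    elasticsearch hard memlock unlimited

    # /etc/sysconfig/elasticsearch -- fixed heap size, and let the init script raise the memlock limit
    ES_HEAP_SIZE=256m
    MAX_LOCKED_MEMORY=unlimited

    # /etc/elasticsearch/elasticsearch.yml -- lock the heap so it cannot be swapped out
    bootstrap.mlockall: true

After restarting the service, curl 'localhost:9200/_nodes/process?pretty' should report mlockall: true if the lock succeeded; if it still shows false, the ENOMEM warning above will keep appearing.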

Answer 2

Well, this message makes it quite clear that the problem you are having is running out of memory:

         Oct  4 10:47:35   kernel: Out of memory: Kill process 1394 (java) score 107 or sacrifice child

Either add more memory to the server, or limit how much memory your java/httpd instances are allowed to use.
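
As a rough sketch of what capping those two services might look like on this box (the paths match the stock CentOS 6 packages; the numbers are only starting points, not tested recommendations):

    # /etc/sysconfig/elasticsearch -- pin the Elasticsearch heap to a small fixed size
    ES_HEAP_SIZE=256m

    # /etc/httpd/conf/httpd.conf -- keep the prefork MPM from spawning more workers
    # than 1 GB of RAM can hold next to MySQL and the JVM
    <IfModule prefork.c>
        StartServers          2
        MinSpareServers       2
        MaxSpareServers       4
        ServerLimit          10
        MaxClients           10
        MaxRequestsPerChild 500
    </IfModule>

The OOM dump above shows dozens of httpd workers alive at the moment of the kill, so even modest caps like these might have been enough to keep the kernel from sacrificing the java process.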
