High %sys usage on an Nginx cache node

2024-6-21

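We have set up Nginx with Lua (the OpenResty bundle) as a local cache node for our file-sharing servers. To improve efficiency, files are split into 50 MB chunks (using this method) before being stored in the cache. It works fine under low traffic, but as the number of cached files and the load increase (even when the load is not very high), the cache becomes unresponsive, most of the time with %sys usage above 80%. What could be degrading performance in this situation?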

I have tried tuning several parameters (cache directory levels, RAID parameters, and so on), but have not found a proper solution yet.
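For readers unfamiliar with what the cache directory levels control: Nginx names each cache file after the MD5 hash of the cache key, and `levels` takes directory names from the end of that hash. The sketch below reproduces that layout. The helper names are hypothetical, and the key passed to `cache_path_for_key` is an illustrative expansion of the `proxy_cache_key` in the configuration, not a value taken from this server.

```python
import hashlib

def cache_path_for_hash(md5_hex, levels, root):
    """Build the on-disk path Nginx uses for a cache entry with this MD5 hash."""
    parts = []
    end = len(md5_hex)
    for width in levels:
        # Each level consumes `width` hex characters, starting from the end.
        parts.append(md5_hex[end - width:end])
        end -= width
    return "/".join([root.rstrip("/")] + parts + [md5_hex])

def cache_path_for_key(key, levels=(2, 2, 2), root="/mnt/cache"):
    """Hash an already expanded cache key and map it to a path."""
    md5_hex = hashlib.md5(key.encode("utf-8")).hexdigest()
    return cache_path_for_hash(md5_hex, levels, root)

if __name__ == "__main__":
    # Hypothetical expanded key: $scheme$proxy_host$uri$http_range
    print(cache_path_for_key("httpexample.com/file.binbytes=0-52428799"))
```

With `levels=2:2:2` each level can hold up to 256 directories, so 10,000 cached files are spread very thinly across the tree.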

P.S. The symptoms appear with only about 10,000 files in the cache and roughly 300 connections/s on the server.
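To put a number on the symptom, %sys can be sampled from `/proc/stat`. The following is a minimal sketch, not the tool used for the figures above. It counts only the `system` field, so it will read lower than tools that fold interrupt time into %sys, and the function names are hypothetical.

```python
import time

def parse_cpu_line(line):
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(v) for v in fields[1:]]

def sys_percent(before, after):
    """Percentage of elapsed CPU time spent in kernel mode between two samples."""
    # Field order: user nice system idle iowait irq softirq steal [guest ...]
    # Guest time is already included in user, so only the first 8 are summed.
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[2] / total

def read_cpu(path="/proc/stat"):
    with open(path) as f:
        return parse_cpu_line(f.readline())

if __name__ == "__main__":
    first = read_cpu()
    time.sleep(1.0)
    second = read_cpu()
    print("%%sys over 1s: %.1f" % sys_percent(first, second))
```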

Cache server specs

    1x CPU, 2.5 GHz, 12 cores
    128 GB RAM
    10x 500 GB Samsung SSD RAID0 (128 KB chunks) storage
    Linux OS: CentOS 6.6 64-bit
    File system: ext4, 4k block

Nginx configuration

worker_processes  auto;

events {
    use epoll;
    worker_connections 1024;
    multi_accept on;
}


http {
    include       /usr/local/openresty/nginx/conf/mime.types;

    proxy_cache_path  /mnt/cache/ levels=2:2:2 keys_zone=default:1000m loader_threshold=100 loader_files=2000
                     loader_sleep=10 inactive=1y max_size=3500000m;
    proxy_temp_path /mnt/temp2 2 2;
    client_body_temp_path /mnt/temp 2 2;
    limit_conn_zone $remote_addr$uri zone=addr:100m;

    map $request_method $disable_cache {
      HEAD  1;
      default   0;
    }

    lua_package_path "/opt/ranger/external/lua-resty-http/lib/?.lua;/opt/ranger/external/nginx_log_by_lua/?.lua;/opt/ranger/external/bitset/lib/?.lua;;";

    lua_shared_dict file_dict  50M;
    lua_shared_dict log_dict   100M;
    lua_shared_dict cache_dict 100M;
    lua_shared_dict chunk_dict 100M;


    proxy_read_timeout 20s;
    proxy_send_timeout 25s;
    reset_timedout_connection on;

    init_by_lua_file '/opt/ranger/init.lua';

    # Server that has the lua code and will be accessed by clients
    server {
      listen       80 default;
      server_name  _;
      server_name_in_redirect off;

      set $ranger_cache_status $upstream_cache_status;

      lua_check_client_abort on;
      lua_code_cache on;

      resolver ----;
      server_tokens off;
      resolver_timeout 1s;

      location / {
        try_files $uri $uri/ index.html;
      }

      location  ~* ^/download/ {
        lua_http10_buffering off;
        content_by_lua_file '/opt/ranger/content.lua';
        log_by_lua_file '/opt/ranger/log.lua';
        limit_conn addr 2;
      } 
    }

    # Server that works as a backend to the lua code
    server {
      listen 8080;

      server_tokens off;
      resolver_timeout 1s;

      location  ~* ^/download/(.*?)/(.*?)/(.*) {
        set $download_uri  $3;
        set $download_host $2;
        set $download_url http://$download_host/$download_uri?$args;
        proxy_no_cache $disable_cache;
        proxy_cache_valid 200 1y;
        proxy_cache_valid 206 1y;
        proxy_cache_key "$scheme$proxy_host$uri$http_range"; 
        proxy_cache_use_stale error timeout http_502;
        proxy_cache default; 
        proxy_cache_min_uses 1;

        proxy_pass $download_url;
      }
    }
}
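Because `proxy_cache_key` ends in `$http_range`, every distinct `Range` header sent to the backend on port 8080 becomes its own cache entry. The sketch below shows one way a client byte range could be mapped onto aligned 50 MB chunk requests so that the same chunks are always requested. It is an illustration only, not the actual `content.lua`: the function name is hypothetical, "50 MB" is assumed to mean 50 MiB, and the file length is ignored (the last chunk may extend past the end of the file).

```python
CHUNK_SIZE = 50 * 1024 * 1024  # assumption: "50 MB" means 50 MiB

def chunk_ranges(start, end, chunk_size=CHUNK_SIZE):
    """Return Range header values for the aligned chunks covering
    bytes start..end (inclusive)."""
    if start < 0 or end < start:
        raise ValueError("invalid byte range")
    first = start // chunk_size
    last = end // chunk_size
    return [
        "bytes=%d-%d" % (i * chunk_size, (i + 1) * chunk_size - 1)
        for i in range(first, last + 1)
    ]

if __name__ == "__main__":
    # A request that straddles the first chunk boundary needs two chunks.
    for value in chunk_ranges(52428000, 52429000):
        print(value)
```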

Answer 1

Thanks to @myaut for the advice. I tracked down _spin_lock_irqsave, and it turned out to be related to the kernel itself rather than to Nginx.

According to this article, the problem can be solved by disabling the RedHat Transparent Huge Pages feature:

    echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
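To confirm that the setting took effect, the same sysfs file can be read back: the kernel marks the active mode with square brackets. The sketch below assumes that format. The first path is the one from the command above, the second is where mainline kernels expose the same switch, and the function names are hypothetical.

```python
import os
import re

THP_PATHS = (
    "/sys/kernel/mm/redhat_transparent_hugepage/enabled",  # path used in this answer
    "/sys/kernel/mm/transparent_hugepage/enabled",         # mainline kernel path
)

def active_thp_mode(text):
    """Return the bracketed (active) mode, e.g. 'never' from 'always [never]'."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None

def current_thp_mode(paths=THP_PATHS):
    """Return (path, mode) for the first THP switch found, or (None, None)."""
    for path in paths:
        if os.path.exists(path):
            with open(path) as f:
                return path, active_thp_mode(f.read())
    return None, None

if __name__ == "__main__":
    path, mode = current_thp_mode()
    if path is None:
        print("no transparent hugepage switch found")
    else:
        print("%s -> %s" % (path, mode))
```

A value written to sysfs does not survive a reboot, so the `echo` has to be reapplied at boot for the change to persist.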
