High %sys utilization on Nginx caching node

2024-6-21

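We have set up Nginx with Lua (the openresty bundle) as a local caching node for our file sharing servers. We split the files into chunks of 50MB each (via this method) and store them in the cache to increase efficiency. With low traffic it works fine, but as the number of cached files and the load increase (even though the load is not really high), the cache becomes unresponsive most of the time because of more than 80% sys usage. So, what could be the performance killer in such a situation?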

I have experimented with tuning several parameters (e.g. caching directory levels, RAID parameters), but none of them has given an optimal solution yet.

P.S. The symptoms start with only 10000 files in the cache and up to 300 connections per second to the server.
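To put a number on the %sys figure described above without extra tooling, the aggregate `cpu` line in `/proc/stat` can be sampled twice. The following is a minimal sketch, not the asker's method: `sys_pct` is a hypothetical helper name, and it only counts the `system` field (tools such as `sar` may also fold interrupt time into their system figure).

```shell
#!/bin/sh
# sys_pct is a hypothetical helper name. It takes two aggregate "cpu" lines
# from /proc/stat and prints the share of the "system" field in that interval.
sys_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { sys[NR] = $4; total[NR] = 0
          # Fields 2..9 are user, nice, system, idle, iowait, irq, softirq, steal.
          for (i = 2; i <= 9 && i <= NF; i++) total[NR] += $i }
        END {
          dt = total[2] - total[1]
          if (dt <= 0) { print "0.0"; exit }
          printf "%.1f\n", 100 * (sys[2] - sys[1]) / dt
        }'
}

# Sample the counters one second apart (Linux only).
if [ -r /proc/stat ]; then
    before=$(grep '^cpu ' /proc/stat)
    sleep 1
    after=$(grep '^cpu ' /proc/stat)
    echo "%sys over 1s: $(sys_pct "$before" "$after")"
fi
```

Running this in a loop while the cache is under load should show whether the system share really stays above 80% or only spikes.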

Cache server specs

    1x CPU, 2.5 GHz, 12 cores
    128 GB RAM
    10x 500 GB Samsung SSD, RAID0 (128 KB chunks)
    OS: Linux, CentOS 6.6 64-bit
    File system: ext4, 4k block

Nginx configuration

worker_processes  auto;

events {

    use epoll;
    worker_connections 1024;
    multi_accept on;
}


http {
    include       /usr/local/openresty/nginx/conf/mime.types;

    proxy_cache_path  /mnt/cache/ levels=2:2:2 keys_zone=default:1000m loader_threshold=100 loader_files=2000
                     loader_sleep=10 inactive=1y max_size=3500000m;
    proxy_temp_path /mnt/temp2 2 2;
    client_body_temp_path /mnt/temp 2 2;
    limit_conn_zone $remote_addr$uri zone=addr:100m;

    map $request_method $disable_cache {
      HEAD  1;
      default   0;
    }

    lua_package_path "/opt/ranger/external/lua-resty-http/lib/?.lua;/opt/ranger/external/nginx_log_by_lua/?.lua;/opt/ranger/external/bitset/lib/?.lua;;";

    lua_shared_dict file_dict  50M;
    lua_shared_dict log_dict   100M;
    lua_shared_dict cache_dict 100M;
    lua_shared_dict chunk_dict 100M;


    proxy_read_timeout 20s;
    proxy_send_timeout 25s;
    reset_timedout_connection on;

    init_by_lua_file '/opt/ranger/init.lua';

    # Server that has the lua code and will be accessed by clients
    server {
      listen       80 default;
      server_name  _;
      server_name_in_redirect off;

      set $ranger_cache_status $upstream_cache_status;

      lua_check_client_abort on;
      lua_code_cache on;

      resolver ----;
      server_tokens off;
      resolver_timeout 1s;

      location / {
        try_files $uri $uri/ index.html;
      }

      location  ~* ^/download/ {
        lua_http10_buffering off;
        content_by_lua_file '/opt/ranger/content.lua';
        log_by_lua_file '/opt/ranger/log.lua';
        limit_conn addr 2;
      } 
    }

    # Server that works as a backend to the lua code
    server {
      listen 8080;

      server_tokens off;
      resolver_timeout 1s;

      location  ~* ^/download/(.*?)/(.*?)/(.*) {
        set $download_uri  $3;
        set $download_host $2;
        set $download_url http://$download_host/$download_uri?$args;
        proxy_no_cache $disable_cache;
        proxy_cache_valid 200 1y;
        proxy_cache_valid 206 1y;
        proxy_cache_key "$scheme$proxy_host$uri$http_range"; 
        proxy_cache_use_stale error timeout http_502;
        proxy_cache default; 
        proxy_cache_min_uses 1;

        proxy_pass $download_url;
      }
    }
}

Answer 1

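Thanks to @myaut for the guidance. I tracked down _spin_lock_irqsave, and it turned out to be related to the kernel itself, not to Nginx.

According to this article, the problem can be fixed by disabling the RedHat Transparent Huge Pages feature, which solved the issue for me: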

echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
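Before and after applying the command above, it can help to check which THP setting is actually active. The following is a rough sketch: `thp_dir` and `thp_current` are hypothetical helper names, and it assumes the usual sysfs layout where the active value is shown in square brackets (the `redhat_` prefixed directory on CentOS/RHEL 6 kernels, the unprefixed one on mainline kernels).

```shell
#!/bin/sh
# thp_dir and thp_current are hypothetical helper names for this sketch.

# Print the first THP sysfs directory that exists on this machine.
thp_dir() {
    for d in /sys/kernel/mm/redhat_transparent_hugepage \
             /sys/kernel/mm/transparent_hugepage; do
        if [ -d "$d" ]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    return 1
}

# Print the active value of a THP control file, i.e. the word in [brackets].
thp_current() {
    sed -n 's/.*\[\([a-z+_-]*\)\].*/\1/p' "$1"
}

# Read-only report; nothing is changed here.
if dir=$(thp_dir); then
    echo "enabled: $(thp_current "$dir/enabled")"
    echo "defrag:  $(thp_current "$dir/defrag")"
else
    echo "no transparent hugepage sysfs directory found"
fi
```

A value written with `echo` to sysfs does not survive a reboot, so the setting usually has to be reapplied at boot, for example from an init script. Whether the `defrag` control also needs changing on a given system is an assumption worth testing rather than a confirmed part of the fix.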
