Slow SSL termination on AWS ALB


I'm tracking down a performance problem with SSL requests.

We run two web servers on two EC2 instances (us-east-2a/us-east-2b) behind an ALB that also does the SSL termination. Route53 is authoritative for the domain, with a CNAME pointing to the ALB's CNAME. Everything runs in a private VPC with two private subnets, and both subnets have a route table with Internet access through a NAT gateway. I use a VPN to reach the load balancer/EC2 endpoints.

Hitting the ALB directly over HTTP (no HTTP-to-HTTPS redirect):

% ab -n10 -c1 \
    -H "Host: service.internal.stg" \
    http://service.internal.stg/

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking service.internal.stg (be patient).....done


Server Software:        Skipper
Server Hostname:        service.internal.stg
Server Port:            80

Document Path:          /
Document Length:        199 bytes

Concurrency Level:      1
Time taken for tests:   5.015 seconds
Complete requests:      10
Failed requests:        1
   (Connect: 0, Receive: 0, Length: 1, Exceptions: 0)
Non-2xx responses:      10
Total transferred:      4059 bytes
HTML transferred:       1989 bytes
Requests per second:    1.99 [#/sec] (mean)
Time per request:       501.536 [ms] (mean)
Time per request:       501.536 [ms] (mean, across all concurrent requests)
Transfer rate:          0.79 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      203  251  51.0    243     315
Processing:   216  251  43.5    221     309
Waiting:      216  250  43.5    221     309
Total:        420  501  77.9    520     617

Percentage of the requests served within a certain time (ms)
  50%    520
  66%    536
  75%    550
  80%    612
  90%    617
  95%    617
  98%    617
  99%    617
 100%    617 (longest request)

Hitting the ALB directly over HTTPS:

% ab -n10 -c1 \
    -H "Host: service.internal.stg" \
    https://service.internal.stg/

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking service.internal.stg (be patient).....done


Server Software:        Skipper
Server Hostname:        service.internal.stg
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
Server Temp Key:        ECDH P-256 256 bits
TLS Server Name:        service.internal.stg

Document Path:          /
Document Length:        199 bytes

Concurrency Level:      1
Time taken for tests:   9.822 seconds
Complete requests:      10
Failed requests:        0
Non-2xx responses:      10
Total transferred:      4060 bytes
HTML transferred:       1990 bytes
Requests per second:    1.02 [#/sec] (mean)
Time per request:       982.242 [ms] (mean)
Time per request:       982.242 [ms] (mean, across all concurrent requests)
Transfer rate:          0.40 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      633  737 100.4    792     883
Processing:   220  245  31.5    231     303
Waiting:      220  245  31.5    231     303
Total:        858  982 105.1   1039    1114

Percentage of the requests served within a certain time (ms)
  50%   1039
  66%   1041
  75%   1061
  80%   1108
  90%   1114
  95%   1114
  98%   1114
  99%   1114
 100%   1114 (longest request)

I get much higher connect times. But when running ab with HTTP keep-alive (-k), only the first request is slow (~900 ms spent connecting); the rest are fine, bringing the mean down to ~320 ms.
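
A rough model of why keep-alive helps this much: with -k only the first request pays for the TCP + TLS setup, so that cost is spread over all requests. This is a simplification that assumes constant connect and processing times and ignores variance; the inputs are the mean Connect and Processing values from the HTTPS run without -k, and `mean_request_ms` is a hypothetical helper.

```python
# Rough model of `ab -c1`: mean time per request with and without keep-alive.
# Assumes constant connect and processing times (a simplification).


def mean_request_ms(connect_ms: float, processing_ms: float, n: int,
                    keepalive: bool) -> float:
    """Mean time per request for n sequential requests."""
    if keepalive:
        # One connection: only the first request pays the TCP + TLS setup.
        total = connect_ms + n * processing_ms
    else:
        # A fresh TCP + TLS connection for every request.
        total = n * (connect_ms + processing_ms)
    return total / n


if __name__ == "__main__":
    connect, processing = 737.0, 245.0  # HTTPS run without -k (mean values)
    print(f"without -k: {mean_request_ms(connect, processing, 10, False):.0f} ms")
    print(f"with -k:    {mean_request_ms(connect, processing, 10, True):.0f} ms")
```

It predicts about 982 ms per request without -k and about 319 ms with it, close to the ~324 ms that ab reported.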

% ab -n10 -c1 -k \
    -H "Host: service.internal.stg" \
    https://service.internal.stg/

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking service.internal.stg (be patient).....done


Server Software:        Skipper
Server Hostname:        service.internal.stg
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
Server Temp Key:        ECDH P-256 256 bits
TLS Server Name:        service.internal.stg

Document Path:          /
Document Length:        199 bytes

Concurrency Level:      1
Time taken for tests:   3.242 seconds
Complete requests:      10
Failed requests:        1
   (Connect: 0, Receive: 0, Length: 1, Exceptions: 0)
Non-2xx responses:      10
Keep-Alive requests:    10
Total transferred:      4109 bytes
HTML transferred:       1989 bytes
Requests per second:    3.08 [#/sec] (mean)
Time per request:       324.238 [ms] (mean)
Time per request:       324.238 [ms] (mean, across all concurrent requests)
Transfer rate:          1.24 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   92 292.2      0     924
Processing:   217  232  22.7    223     279
Waiting:      217  232  22.6    223     279
Total:        217  324 289.5    224    1146

Percentage of the requests served within a certain time (ms)
  50%    224
  66%    227
  75%    269
  80%    279
  90%   1146
  95%   1146
  98%   1146
  99%   1146
 100%   1146 (longest request)

I suspect the performance of the SSL termination on the ALB, but I'm not sure how to investigate or deal with it.
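
One way to see where the time goes is to time the TCP connect and the TLS handshake separately from the same client. Below is a minimal sketch using Python's standard `socket` and `ssl` modules. `time_connection` is a hypothetical helper. The optional `cafile` argument assumes the ALB certificate might not be signed by a CA in the system trust store. As a rule of thumb, if the TLS part is about twice the TCP part (a full TLS 1.2 handshake is 2 round trips), the extra time is network round trips rather than slow processing on the ALB.

```python
# Minimal sketch: time the TCP connect and the TLS handshake separately.
import socket
import ssl
import sys
import time
from typing import Optional, Tuple


def time_connection(host: str, port: int = 443, timeout: float = 5.0,
                    cafile: Optional[str] = None) -> Tuple[float, float, str]:
    """Open one fresh connection; return (tcp_ms, tls_ms, tls_version).

    tcp_ms includes name resolution, since create_connection resolves host.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        tcp_done = time.perf_counter()
        # Defer the handshake so it can be timed on its own.
        with ctx.wrap_socket(sock, server_hostname=host,
                             do_handshake_on_connect=False) as tls:
            tls.do_handshake()
            tls_done = time.perf_counter()
            version = tls.version() or "unknown"
    return (tcp_done - start) * 1000, (tls_done - tcp_done) * 1000, version


if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]
    for _ in range(5):
        tcp_ms, tls_ms, ver = time_connection(target)
        print(f"tcp {tcp_ms:7.1f} ms   tls {tls_ms:7.1f} ms   ({ver})")
```

Running it as `python probe.py service.internal.stg` (the file name is arbitrary) from both the workstation and an instance inside the VPC lets you compare the two columns side by side.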

Additional information:

  • Ping from my location to the EC2 instance

% ping 10.1.1.95 -c 10
PING 10.1.1.95 (10.1.1.95): 56 data bytes
64 bytes from 10.1.1.95: icmp_seq=0 ttl=61 time=203.177 ms
64 bytes from 10.1.1.95: icmp_seq=1 ttl=61 time=202.369 ms
64 bytes from 10.1.1.95: icmp_seq=2 ttl=61 time=317.346 ms
64 bytes from 10.1.1.95: icmp_seq=3 ttl=61 time=232.651 ms
64 bytes from 10.1.1.95: icmp_seq=4 ttl=61 time=252.859 ms
64 bytes from 10.1.1.95: icmp_seq=5 ttl=61 time=271.837 ms
64 bytes from 10.1.1.95: icmp_seq=6 ttl=61 time=204.135 ms
64 bytes from 10.1.1.95: icmp_seq=7 ttl=61 time=208.154 ms
64 bytes from 10.1.1.95: icmp_seq=8 ttl=61 time=201.772 ms
64 bytes from 10.1.1.95: icmp_seq=9 ttl=61 time=208.608 ms

--- 10.1.1.95 ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 201.772/230.291/317.346/37.138 ms

  • ab running from an EC2 instance in the same VPC, going through the ALB over HTTPS
ubuntu@ip-10-1-11-72:~$ ab -n10 -c1 \
    -H "Host: service.internal.stg" \
    https://service.internal.stg/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking service.internal.stg (be patient).....done


Server Software:        Skipper
Server Hostname:        service.internal.stg
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
TLS Server Name:        service.internal.stg

Document Path:          /
Document Length:        199 bytes

Concurrency Level:      1
Time taken for tests:   0.164 seconds
Complete requests:      10
Failed requests:        2
   (Connect: 0, Receive: 0, Length: 2, Exceptions: 0)
Non-2xx responses:      10
Total transferred:      4058 bytes
HTML transferred:       1988 bytes
Requests per second:    61.11 [#/sec] (mean)
Time per request:       16.363 [ms] (mean)
Time per request:       16.363 [ms] (mean, across all concurrent requests)
Transfer rate:          24.22 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        4    6   2.6      5      11
Processing:     8   11   2.1     11      15
Waiting:        8   11   2.1     11      15
Total:         12   16   4.0     15      24

Percentage of the requests served within a certain time (ms)
  50%     15
  66%     16
  75%     20
  80%     21
  90%     24
  95%     24
  98%     24
  99%     24
 100%     24 (longest request)

  • ab running from an EC2 instance in the same VPC, hitting the web server directly
ubuntu@ip-10-1-11-72:~$ ab -n10 -c1 -k \
    -H "Host: service.internal.stg" \
    http://10.1.1.95:9999/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 10.1.1.95 (be patient).....done


Server Software:        Skipper
Server Hostname:        10.1.1.95
Server Port:            9999

Document Path:          /
Document Length:        199 bytes

Concurrency Level:      1
Time taken for tests:   0.075 seconds
Complete requests:      10
Failed requests:        0
Non-2xx responses:      10
Keep-Alive requests:    10
Total transferred:      4110 bytes
HTML transferred:       1990 bytes
Requests per second:    133.79 [#/sec] (mean)
Time per request:       7.475 [ms] (mean)
Time per request:       7.475 [ms] (mean, across all concurrent requests)
Transfer rate:          53.70 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       0
Processing:     6    7   1.4      7      11
Waiting:        6    7   1.4      7      11
Total:          6    7   1.4      7      11

Percentage of the requests served within a certain time (ms)
  50%      7
  66%      8
  75%      8
  80%      9
  90%     11
  95%     11
  98%     11
  99%     11
 100%     11 (longest request)

1 Answer

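Establishing the connection takes a few round trips between client and server. From memory, it is between 1 and 4 depending on the TLS version. With TLS 1.2, which is what your connection negotiated, a full handshake needs 2 round trips on top of the 1 for the TCP handshake.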
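
The round-trip argument becomes a back-of-the-envelope estimate: connection setup ≈ RTT × number of round trips. The sketch below assumes full handshakes, with no session resumption and no TLS 1.3 0-RTT. It uses the ~230 ms mean RTT from the ping output. `HANDSHAKE_ROUND_TRIPS` and `expected_connect_ms` are illustrative names.

```python
# Back-of-the-envelope: connection setup time ~= RTT x round trips.
# Round-trip counts assume full handshakes (no resumption, no 0-RTT).
HANDSHAKE_ROUND_TRIPS = {
    "http (TCP only)": 1,
    "https, TLS 1.2": 1 + 2,  # TCP + full TLS 1.2 handshake
    "https, TLS 1.3": 1 + 1,  # TCP + full TLS 1.3 handshake
}


def expected_connect_ms(rtt_ms: float, round_trips: int) -> float:
    """Estimated time before the first request byte can be sent."""
    return rtt_ms * round_trips


if __name__ == "__main__":
    rtt = 230.291  # mean RTT from the ping output in the question
    for name, trips in HANDSHAKE_ROUND_TRIPS.items():
        print(f"{name:16} ~{expected_connect_ms(rtt, trips):.0f} ms")
```

This gives about 230 ms for plain HTTP and about 691 ms for TLS 1.2. Those figures are in line with the mean Connect times of 251 ms and 737 ms measured with ab. The model also suggests TLS 1.3 would save roughly one RTT per new connection.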

Your latency to the server is 200 to 320 ms and highly variable. That high latency is why establishing the SSL session is slow from your location. It also explains why the same test is much faster when run locally, from inside the VPC.
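
You can also go the other way: divide ab's mean Connect time by the number of round trips to get the effective RTT. The sketch below assumes a full TLS 1.2 handshake (3 round trips including TCP) for both measurements from the question. `implied_rtt_ms` is a hypothetical helper.

```python
# Back out the effective RTT from ab's mean "Connect" time,
# assuming a full TLS 1.2 handshake: 1 TCP + 2 TLS round trips.
TLS12_ROUND_TRIPS = 3


def implied_rtt_ms(connect_ms: float, round_trips: int = TLS12_ROUND_TRIPS) -> float:
    """Effective round-trip time implied by a measured connect time."""
    return connect_ms / round_trips


if __name__ == "__main__":
    observed = {"over the VPN": 737.0, "inside the VPC": 6.0}  # ab means
    for where, connect in observed.items():
        print(f"{where:15} connect {connect:5.0f} ms -> RTT ~{implied_rtt_ms(connect):.1f} ms")
```

Over the VPN this comes to about 246 ms, close to the ping average. From inside the VPC it is about 2 ms. The handshake work is the same in both cases; only the round-trip time differs.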

Possible solutions include:

  • Locating the server closer to you or your users, or running multiple servers with geolocation routing
  • Using CloudFront to do TLS termination/offload at the edge. Terminating HTTPS at the edge is probably not a great solution, but CloudFront or another CDN might make this more efficient by using a better-optimized network.
  • Forcing newer TLS versions, which are more efficient (a full TLS 1.3 handshake needs one round trip fewer than TLS 1.2).
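
Before relying on the newer-TLS suggestion, you can check whether the endpoint accepts it by having a client refuse anything older than TLS 1.3. Below is a minimal sketch with Python's `ssl` module. `minimum_version` needs Python 3.7+ built against an OpenSSL that supports TLS 1.3. `tls13_context` and `negotiated_version` are hypothetical helpers. Enabling TLS 1.3 on the server side is done through the ALB HTTPS listener's security policy, which this sketch does not touch.

```python
# Minimal sketch: does the endpoint accept a TLS 1.3-only client?
import socket
import ssl
import sys
from typing import Optional


def tls13_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Certificate-verifying client context that refuses anything below TLS 1.3."""
    ctx = ssl.create_default_context(cafile=cafile)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def negotiated_version(host: str, port: int = 443,
                       cafile: Optional[str] = None) -> str:
    """Return the TLS version negotiated with host:port, or raise ssl.SSLError."""
    ctx = tls13_context(cafile)
    with socket.create_connection((host, port), timeout=5.0) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version() or "unknown"


if __name__ == "__main__" and len(sys.argv) > 1:
    try:
        print(negotiated_version(sys.argv[1]))
    except ssl.SSLError as exc:
        # Either TLS 1.3 was refused or certificate verification failed.
        print(f"TLS 1.3 handshake failed: {exc}")
```

If this prints `TLSv1.3`, new connections from that client can skip one round trip compared with the TLS 1.2 handshakes in the ab runs.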
