Terminación SSL de AWS ALB lenta

Terminación SSL de AWS ALB lenta

Estoy rastreando un problema de rendimiento con las solicitudes SSL.

Ejecutamos dos servidores web en dos instancias EC2 (us-east-2a/us-east-2b) con un ALB que también realiza la terminación SSL allí, Route53 está a cargo del dominio con un CNAME para el CNAME del ALB. Todo corre en una VPC privada, con dos subredes privadas, ambas subredes tienen una tabla de rutas con acceso a Internet a través de una puerta de enlace NAT. Estoy usando una VPN para llegar a los puntos finales del balanceador/EC2.

Ir directamente al ALB usando HTTP (sin redirección de HTTP a HTTPS),

% ab -n10 -c1 \
    -H "Host: service.internal.stg" \
    http://service.internal.stg/

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking service.internal.stg (be patient).....done


Server Software:        Skipper
Server Hostname:        service.internal.stg
Server Port:            80

Document Path:          /
Document Length:        199 bytes

Concurrency Level:      1
Time taken for tests:   5.015 seconds
Complete requests:      10
Failed requests:        1
   (Connect: 0, Receive: 0, Length: 1, Exceptions: 0)
Non-2xx responses:      10
Total transferred:      4059 bytes
HTML transferred:       1989 bytes
Requests per second:    1.99 [#/sec] (mean)
Time per request:       501.536 [ms] (mean)
Time per request:       501.536 [ms] (mean, across all concurrent requests)
Transfer rate:          0.79 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      203  251  51.0    243     315
Processing:   216  251  43.5    221     309
Waiting:      216  250  43.5    221     309
Total:        420  501  77.9    520     617

Percentage of the requests served within a certain time (ms)
  50%    520
  66%    536
  75%    550
  80%    612
  90%    617
  95%    617
  98%    617
  99%    617
 100%    617 (longest request)

Llegando directamente al ALB usando HTTPS,

% ab -n10 -c1 \
    -H "Host: service.internal.stg" \
    http://service.internal.stg/

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking service.internal.stg (be patient).....done


Server Software:        Skipper
Server Hostname:        service.internal.stg
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
Server Temp Key:        ECDH P-256 256 bits
TLS Server Name:        service.internal.stg

Document Path:          /
Document Length:        199 bytes

Concurrency Level:      1
Time taken for tests:   9.822 seconds
Complete requests:      10
Failed requests:        0
Non-2xx responses:      10
Total transferred:      4060 bytes
HTML transferred:       1990 bytes
Requests per second:    1.02 [#/sec] (mean)
Time per request:       982.242 [ms] (mean)
Time per request:       982.242 [ms] (mean, across all concurrent requests)
Transfer rate:          0.40 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      633  737 100.4    792     883
Processing:   220  245  31.5    231     303
Waiting:      220  245  31.5    231     303
Total:        858  982 105.1   1039    1114

Percentage of the requests served within a certain time (ms)
  50%   1039
  66%   1041
  75%   1061
  80%   1108
  90%   1114
  95%   1114
  98%   1114
  99%   1114
 100%   1114 (longest request)

Obtuve tiempos de conexión muchísimo más altos. Pero, al ejecutar ab con HTTP Keepalive (-k), solo tengo una solicitud lenta (~900 ms), pero mientras tanto estamos bastante bien alcanzando ~320 ms.

% ab -n10 -c1 \
    -H "Host: service.internal.stg" \
    http://service.internal.stg/

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking service.internal.stg (be patient).....done


Server Software:        Skipper
Server Hostname:        service.internal.stg
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
Server Temp Key:        ECDH P-256 256 bits
TLS Server Name:        service.internal.stg

Document Path:          /
Document Length:        199 bytes

Concurrency Level:      1
Time taken for tests:   3.242 seconds
Complete requests:      10
Failed requests:        1
   (Connect: 0, Receive: 0, Length: 1, Exceptions: 0)
Non-2xx responses:      10
Keep-Alive requests:    10
Total transferred:      4109 bytes
HTML transferred:       1989 bytes
Requests per second:    3.08 [#/sec] (mean)
Time per request:       324.238 [ms] (mean)
Time per request:       324.238 [ms] (mean, across all concurrent requests)
Transfer rate:          1.24 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   92 292.2      0     924
Processing:   217  232  22.7    223     279
Waiting:      217  232  22.6    223     279
Total:        217  324 289.5    224    1146

Percentage of the requests served within a certain time (ms)
  50%    224
  66%    227
  75%    269
  80%    279
  90%   1146
  95%   1146
  98%   1146
  99%   1146
 100%   1146 (longest request)

Entonces tengo dudas con el rendimiento de la terminación SSL en el ALB, pero no estoy seguro de cómo manejar/trabajar en esto.

Información adicional: - Hacer ping desde mi ubicación a la instancia EC2

% ping 10.1.1.95 -c 10                                                                                                                              ~
PING 10.1.1.95 (10.1.1.95): 56 data bytes
64 bytes from 10.1.1.95: icmp_seq=0 ttl=61 time=203.177 ms
64 bytes from 10.1.1.95: icmp_seq=1 ttl=61 time=202.369 ms
64 bytes from 10.1.1.95: icmp_seq=2 ttl=61 time=317.346 ms
64 bytes from 10.1.1.95: icmp_seq=3 ttl=61 time=232.651 ms
64 bytes from 10.1.1.95: icmp_seq=4 ttl=61 time=252.859 ms
64 bytes from 10.1.1.95: icmp_seq=5 ttl=61 time=271.837 ms
64 bytes from 10.1.1.95: icmp_seq=6 ttl=61 time=204.135 ms
64 bytes from 10.1.1.95: icmp_seq=7 ttl=61 time=208.154 ms
64 bytes from 10.1.1.95: icmp_seq=8 ttl=61 time=201.772 ms
64 bytes from 10.1.1.95: icmp_seq=9 ttl=61 time=208.608 ms

--- 10.1.1.95 ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 201.772/230.291/317.346/37.138 ms
  • AB ejecutándose desde una instancia EC2 en la misma VPC
ubuntu@ip-10-1-11-72:~$ ab -n10 -c1 \
    -H "Host: service.internal.stg" \
    http://service.internal.stg/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking service.internal.stg (be patient).....done


Server Software:        Skipper
Server Hostname:        service.internal.stg
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
TLS Server Name:        service.internal.stg

Document Path:          /
Document Length:        199 bytes

Concurrency Level:      1
Time taken for tests:   0.164 seconds
Complete requests:      10
Failed requests:        2
   (Connect: 0, Receive: 0, Length: 2, Exceptions: 0)
Non-2xx responses:      10
Total transferred:      4058 bytes
HTML transferred:       1988 bytes
Requests per second:    61.11 [#/sec] (mean)
Time per request:       16.363 [ms] (mean)
Time per request:       16.363 [ms] (mean, across all concurrent requests)
Transfer rate:          24.22 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        4    6   2.6      5      11
Processing:     8   11   2.1     11      15
Waiting:        8   11   2.1     11      15
Total:         12   16   4.0     15      24

Percentage of the requests served within a certain time (ms)
  50%     15
  66%     16
  75%     20
  80%     21
  90%     24
  95%     24
  98%     24
  99%     24
 100%     24 (longest request)
  • AB ejecutándose desde una instancia EC2 en la misma VPC y accediendo al servidor web.
ubuntu@ip-10-1-11-72:~$ ab -n10 -c1 -k \
>     -H "Host: service.internal.stg" \
>     http://10.1.1.95:9999/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 10.1.1.95 (be patient).....done


Server Software:        Skipper
Server Hostname:        10.1.1.95
Server Port:            9999

Document Path:          /
Document Length:        199 bytes

Concurrency Level:      1
Time taken for tests:   0.075 seconds
Complete requests:      10
Failed requests:        0
Non-2xx responses:      10
Keep-Alive requests:    10
Total transferred:      4110 bytes
HTML transferred:       1990 bytes
Requests per second:    133.79 [#/sec] (mean)
Time per request:       7.475 [ms] (mean)
Time per request:       7.475 [ms] (mean, across all concurrent requests)
Transfer rate:          53.70 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       0
Processing:     6    7   1.4      7      11
Waiting:        6    7   1.4      7      11
Total:          6    7   1.4      7      11

Percentage of the requests served within a certain time (ms)
  50%      7
  66%      8
  75%      8
  80%      9
  90%     11
  95%     11
  98%     11
  99%     11
 100%     11 (longest request)
ubuntu@ip-10-1-11-72:~$

Respuesta1

El establecimiento de la conexión requiere pocas solicitudes del cliente al servidor; según la versión de TLS, está entre 1 y 4, desde la memoria.

Su latencia hacia el servidor es de 200 a 320 ms y es muy variable. La alta latencia es la razón por la cual el establecimiento de la sesión SSL es lento desde su ubicación y también explica por qué es mucho más rápido cuando se ejecuta localmente.

Las soluciones podrían incluir:

  • Ubicar el servidor más cerca de usted o de sus usuarios, o ejecutar múltiples servidores con geolocalización
  • Utilice CloudFront para realizar la terminación/descarga de TLS en el borde. Probablemente no sea una gran solución para realizar la terminación https en el borde, pero tal vez CloudFront o una CDN puedan hacer esto más eficiente utilizando una red más optimizada.
  • Fuerce versiones más nuevas de TLS, que son más eficientes.

información relacionada