
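We have a Hadoop cluster with active/standby ResourceManager services. The active ResourceManager runs on the master1 machine and the standby ResourceManager runs on the master2 machine.
In our cluster, the YARN service (which includes the ResourceManager service) manages 276 NodeManager components on the worker machines.
In the Ambari web UI alerts (ResourceManager alerts), we noticed the following: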
Resource Manager Web UI
Connection failed to http://master2.jupiter.com:8088(timed out)
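We started debugging the problem by running wget against port 8088, and found that the process hangs at: HTTP request sent, awaiting response... No data received.
Example from the ResourceManager machine: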
wget --debug http://master2.jupiter.com:8088
DEBUG output created by Wget 1.14 on Linux-gnu.
URI encoding = ‘UTF-8’
Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
--2024-02-21 10:13:42-- http://master2.jupiter.com:8088/
Resolving master2.jupiter.com (master2.jupiter.com)... 192.9.201.169
Caching master2.jupiter.com => 192.9.201.169
Connecting to master2.jupiter.com (master2.jupiter.com)|192.9.201.169|:8088... connected.
Created socket 3.
Releasing 0x0000000000a0da00 (new refcount 1).
---request begin---
GET / HTTP/1.1
User-Agent: Wget/1.14 (linux-gnu)
Accept: */*
Host: master2.jupiter.com:8088
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Wed, 21 Feb 2024 10:13:42 GMT
Date: Wed, 21 Feb 2024 10:13:42 GMT
Pragma: no-cache
Expires: Wed, 21 Feb 2024 10:13:42 GMT
Date: Wed, 21 Feb 2024 10:13:42 GMT
Pragma: no-cache
Content-Type: text/plain; charset=UTF-8
X-Frame-Options: SAMEORIGIN
Location: http://master1.jupiter.com:8088/
Content-Length: 43
Server: Jetty(6.1.26.hwx)
---response end---
307 TEMPORARY_REDIRECT
Registered socket 3 for persistent reuse.
URI content encoding = ‘UTF-8’
Location: http://master1.jupiter.com:8088/ [following]
Skipping 43 bytes of body: [This is standby RM. The redirect url is: /
] done.
URI content encoding = None
Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
--2024-02-21 10:13:42-- http://master1.jupiter.com:8088/
conaddr is: 192.9.201.169
Resolving master1.jupiter.com (master1.jupiter.com)... 192.9.66.14
Caching master1.jupiter.com => 192.9.66.14
Releasing 0x0000000000a0f320 (new refcount 1).
Found master1.jupiter.com in host_name_addresses_map (0xa0f320)
Connecting to master1.jupiter.com (master1.jupiter.com)|192.9.66.14|:8088... connected.
Created socket 4.
Releasing 0x0000000000a0f320 (new refcount 1).
.
.
.
---response end---
302 Found
Disabling further reuse of socket 3.
Closed fd 3
Registered socket 4 for persistent reuse.
URI content encoding = ‘UTF-8’
Location: http://master1.jupiter.com:8088/cluster [following]
] done.
URI content encoding = None
Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
--2024-02-21 10:27:07-- http://master1.jupiter.com:8088/cluster
Reusing existing connection to master1.jupiter.com:8088.
Reusing fd 4.
---request begin---
GET /cluster HTTP/1.1
User-Agent: Wget/1.14 (linux-gnu)
Accept: */*
Host: master1.jupiter.com:8088
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Cache-Control: no-cache
Expires: Wed, 21 Feb 2024 10:30:23 GMT
Date: Wed, 21 Feb 2024 10:30:23 GMT
Pragma: no-cache
Expires: Wed, 21 Feb 2024 10:30:23 GMT
Date: Wed, 21 Feb 2024 10:30:23 GMT
Pragma: no-cache
Content-Type: text/html; charset=utf-8
X-Frame-Options: SAMEORIGIN
Transfer-Encoding: chunked
Server: Jetty(6.1.26.hwx)
---response end---
200 OK
URI content encoding = ‘utf-8’
Length: unspecified [text/html]
Saving to: ‘index.html’
[ <=> ] 1,018,917 --.-K/s in 0.04s
2024-02-21 10:31:31 (24.0 MB/s) - ‘index.html’ saved [1018917]
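As the trace above shows, wget took a very long time to complete (about 20 minutes; the timestamps run from 10:13:42 to 10:31:31) instead of finishing within a second or two.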
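One way to narrow this down might be to time each hop on its own instead of letting wget follow the redirects. In the trace, the 307 from master2 and the connection to master1 are both stamped 10:13:42, while the next step on master1 is only logged at 10:27:07, which suggests the wait is on master1 rather than on the standby. The following is a minimal sketch, assuming curl is installed; the function name `probe` and the `PROBE_TIMEOUT` variable are made up for illustration, and the 60-second cap is arbitrary.

```shell
# Probe one URL WITHOUT following redirects and print where the time went.
# Assumption: curl is available; PROBE_TIMEOUT (seconds) is a hypothetical knob.
probe() {
    curl --silent --output /dev/null --max-time "${PROBE_TIMEOUT:-60}" \
         --write-out 'code=%{http_code} connect=%{time_connect}s first_byte=%{time_starttransfer}s total=%{time_total}s\n' \
         "$1"
}

# Probe every URL given on the command line, one hop at a time.
for url in "$@"; do
    printf '%s\n' "$url"
    probe "$url"
done
```

Saved as probe.sh (a hypothetical file name), it could be run as `sh probe.sh http://master2.jupiter.com:8088/ http://master1.jupiter.com:8088/ http://master1.jupiter.com:8088/cluster`. A large gap between connect and first_byte on one URL would point at the server side of that hop rather than at the network. The ResourceManager REST endpoint /ws/v1/cluster/info could be probed the same way, as a much lighter page than /cluster.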
We can capture the traffic with tcpdump, for example:
tcpdump -vv -s0 tcp port 8088 -w /tmp/why_8088_hang.pcap
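But I would like to know whether there is a better, simpler way to understand why we get stuck at "HTTP request sent, awaiting response...". Perhaps it is related to the ResourceManager service itself.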
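If the time is being lost inside the ResourceManager itself, a few thread dumps of its JVM, taken while a request is hanging, may show what the web threads are waiting on. The following is only a sketch. It assumes a JDK with jstack is on the PATH, that it is run as the user owning the ResourceManager process (often yarn), and that the process can be found by its main class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager. The function name, output file names and defaults are hypothetical.

```shell
# Write COUNT thread dumps of the JVM with process id PID into OUTDIR,
# waiting INTERVAL seconds between dumps.
# Assumption: jstack can attach to the target JVM as the current user.
dump_threads() {
    local pid="$1" count="${2:-3}" interval="${3:-10}" outdir="${4:-/tmp}"
    local i=1
    while [ "$i" -le "$count" ]; do
        jstack "$pid" > "$outdir/rm_threads_$i.txt" || return 1
        # No need to wait after the last dump.
        if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
        i=$((i + 1))
    done
}

# Usage (on master1, as the ResourceManager's user):
#   pid=$(pgrep -f org.apache.hadoop.yarn.server.resourcemanager.ResourceManager | head -n 1)
#   dump_threads "$pid" 3 10 /tmp
```

The idea is to compare the dumps and look for threads that sit in the same stack across all samples. Which threads matter is something to check against the actual output.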