![wget failed: connection timed out](https://rvso.com/image/769294/wget%20%E5%A4%B1%E6%95%97%EF%BC%9A%E9%80%A3%E7%B7%9A%E9%80%BE%E6%99%82.png)
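I use the following command to mirror a website,
but when it tries to access sun.com, the connection times out.
I want wget to exclude sun.com so that it moves on to the next item.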
The current problem:
$ wget --recursive --page-requisites --adjust-extension --span-hosts --convert-links --restrict-file-names=windows http://pt.jikos.cz/garfield/
.
.
2021-08-09 03:28:28 (19.1 MB/s) - ‘packages.debian.org/robots.txt’ saved [24/24]
2021-08-09 03:28:30 (19.1 MB/s) - ‘packages.debian.org/robots.txt’ saved [24/24]
.
Location: https://packages.debian.org/robots.txt [following]
--2021-08-09 03:28:33-- https://packages.debian.org/robots.txt
Connecting to packages.debian.org (packages.debian.org)|128.0.10.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24 [text/plain]
Saving to: ‘packages.debian.org/robots.txt’
packages.debian.org 100%[===================>] 24 --.-KB/s in 0s
2021-08-09 03:28:34 (19.1 MB/s) - ‘packages.debian.org/robots.txt’ saved [24/24]
Loading robots.txt; please ignore errors.
--2021-08-09 03:28:34-- http://wwws.sun.com/robots.txt
Resolving wwws.sun.com (wwws.sun.com)... 137.254.16.75
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.
--2021-08-09 03:28:56-- (try: 2) http://wwws.sun.com/robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.
--2021-08-09 03:29:19-- (try: 3) http://wwws.sun.com/robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.
--2021-08-09 03:29:43-- (try: 4) http://wwws.sun.com/robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.
--2021-08-09 03:30:08-- (try: 5) http://wwws.sun.com/robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.
--2021-08-09 03:30:34-- (try: 6) http://wwws.sun.com/robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.
--2021-08-09 03:31:01-- (try: 7) http://wwws.sun.com/robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.
Expected: wget saves the entire website without timing out; if a connection does time out, wget skips that connection and continues.
Answer 1
Please read the manual's detailed explanation of the "risks" of using the --span-hosts (-H) option, and of how to limit those risks by adding restrictions:
https://www.gnu.org/software/wget/manual/wget.html#Spanning-Hosts
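The --span-hosts or -H option turns on host spanning, thus allowing Wget's recursive run to visit any host referenced by a link. Unless sufficient recursion-limiting criteria are applied, these foreign hosts will typically link to yet more hosts, and so on until Wget ends up sucking up much more data than you have intended.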
…
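Limit spanning to certain domains
-D
The -D option allows you to specify the domains that will be followed, thus limiting the recursion only to the hosts that belong to these domains.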
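As a hedged illustration of the -D restriction, the sketch below keeps the command from the question but adds --domains=jikos.cz (the long form of -D). It assumes that everything worth mirroring lives under jikos.cz, which neither the question nor the manual states. Since -D matches hosts by domain suffix, pt.jikos.cz is followed while hosts such as wwws.sun.com are left out; the trade-off is that page requisites hosted on any other domain are not downloaded either. RUN is a hypothetical switch used only in this sketch.

```shell
#!/bin/sh
# Sketch: the question's mirror command, with host spanning limited by -D.
# Assumption (not from the source): everything worth mirroring is under jikos.cz.
# RUN is a hypothetical switch for this sketch; set RUN=1 to actually download.
url='http://pt.jikos.cz/garfield/'

# Collect the wget options as positional parameters so they can be printed or run.
set -- \
  --recursive \
  --page-requisites \
  --adjust-extension \
  --span-hosts \
  --domains=jikos.cz \
  --convert-links \
  --restrict-file-names=windows \
  "$url"

# Print the full command line so it can be checked before anything is fetched.
printf '%s ' wget "$@"
printf '\n'

# Start the (potentially large) download only when explicitly requested.
if [ "${RUN:-0}" = 1 ]; then
  wget "$@"
fi
```

Saved under any name (for example mirror.sh), running it with RUN=1 sh mirror.sh performs the download; without RUN it only prints the command.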
…
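Keep download off certain domains
--exclude-domains
If there are domains you want to exclude specifically, you can do it with --exclude-domains, which accepts the same type of arguments as -D, but will exclude all the listed domains.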
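Applied to the command from the question, a minimal sketch might look like the following. It keeps host spanning on but adds --exclude-domains=sun.com, so wwws.sun.com (covered by the sun.com suffix) should not be contacted at all. The --tries=3 and --timeout=10 values are additions of this sketch, not part of the answer: they cap how long wget keeps retrying any other unreachable host, since by default wget retries 20 times. RUN is a hypothetical switch used only in this sketch.

```shell
#!/bin/sh
# Sketch: the question's mirror command with sun.com excluded.
# --tries/--timeout values are illustrative assumptions, not from the answer.
# RUN is a hypothetical switch for this sketch; set RUN=1 to actually download.
url='http://pt.jikos.cz/garfield/'

# Collect the wget options as positional parameters so they can be printed or run.
set -- \
  --recursive \
  --page-requisites \
  --adjust-extension \
  --span-hosts \
  --exclude-domains=sun.com \
  --tries=3 \
  --timeout=10 \
  --convert-links \
  --restrict-file-names=windows \
  "$url"

# Print the full command line so it can be checked before anything is fetched.
printf '%s ' wget "$@"
printf '\n'

# Start the download only when explicitly requested.
if [ "${RUN:-0}" = 1 ]; then
  wget "$@"
fi
```

More domains can be excluded by listing them comma-separated in the same option, e.g. --exclude-domains=sun.com,example.org (example.org is only a placeholder).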