Make wget reference local copies without redundantly downloading files

I want to archive a message board, which I do with wget using the switches --page-requisites, --span-hosts, --convert-links, and --no-clobber.
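
The full invocation looks something like the sketch below (the forum URL is a placeholder for illustration, not part of the original question):

$ # hypothetical board URL, for illustration only
$ wget --page-requisites --span-hosts --convert-links --no-clobber \
      http://forum.example.com/viewtopic.php?t=123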

The problem is that using --convert-links disables --no-clobber. As a result, for every topic page wget re-downloads the site skin, scripts, and icons (to keep them up to date).

Is there a way to make wget skip files that already exist locally, rewrite links so they point at those local copies, and download only the files that are not yet present in the filesystem?

Answer 1

I believe that if you include the -N switch, it will force wget to use time-stamping.

   -N
   --timestamping
       Turn on time-stamping.

With this switch, wget downloads a file only if the local copy is missing or older than the copy on the server.

Example

Downloading robots.txt, which does not yet exist locally:

$ wget -N http://google.com/robots.txt
--2014-06-15 21:18:16--  http://google.com/robots.txt
Resolving google.com (google.com)... 173.194.41.9, 173.194.41.14, 173.194.41.0, ...
Connecting to google.com (google.com)|173.194.41.9|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.google.com/robots.txt [following]
--2014-06-15 21:18:17--  http://www.google.com/robots.txt
Resolving www.google.com (www.google.com)... 173.194.46.83, 173.194.46.84, 173.194.46.80, ...
Connecting to www.google.com (www.google.com)|173.194.46.83|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: ‘robots.txt’

    [ <=>                                                                                                                                 ] 7,608       --.-K/s   in 0s      

2014-06-15 21:18:17 (359 MB/s) - ‘robots.txt’ saved [7608]

Trying again, now that robots.txt exists locally:

$ wget -N http://google.com/robots.txt
--2014-06-15 21:18:19--  http://google.com/robots.txt
Resolving google.com (google.com)... 173.194.41.8, 173.194.41.9, 173.194.41.14, ...
Connecting to google.com (google.com)|173.194.41.8|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.google.com/robots.txt [following]
--2014-06-15 21:18:19--  http://www.google.com/robots.txt
Resolving www.google.com (www.google.com)... 173.194.46.82, 173.194.46.83, 173.194.46.84, ...
Connecting to www.google.com (www.google.com)|173.194.46.82|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Server file no newer than local file ‘robots.txt’ -- not retrieving.

Notice that the second time around, wget did not retrieve the file again.
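
Applied back to the original command, the fix is to replace --no-clobber with -N; as far as I know, wget refuses to combine those two switches, but -N alone skips unchanged page requisites while still working with --convert-links. A minimal sketch, reusing the placeholder URL from above:

$ # -N replaces --no-clobber (wget will not accept both at once)
$ wget -N --page-requisites --span-hosts --convert-links \
      http://forum.example.com/viewtopic.php?t=123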
