Getting the complete data of a web page

I am working with the BusyBox tools, and I want to extract all the HTTP links from a web page. I can save the example page with curl or wget, but that saves the page as HTML. How can I get just the links using the curl or wget command?

Example webpage: http://www.turanevdekorasyon.com/wp-includes/test/

The data below is what the Firefox browser saves when the page is stored in text format.

Index of /wp-includes/test/

Name <http://www.turanevdekorasyon.com/wp-includes/test/?ND>                                                                             Last modified <http://www.turanevdekorasyon.com/wp-includes/test/?MA>         Size <http://www.turanevdekorasyon.com/wp-includes/test/?SA>  Description  <http://www.turanevdekorasyon.com/wp-includes/test/?DA>

------------------------------------------------------------------------
up Parent Directory <http://www.turanevdekorasyon.com/wp-includes/>                                                                 28-May-2019 02:15        -       
[CMP] v1.0.zip <http://www.turanevdekorasyon.com/wp-includes/test/v1.0.zip>                                                                         28-May-2019 02:15       4k       
[CMP] v1.1.zip <http://www.turanevdekorasyon.com/wp-includes/test/v1.1.zip>                                                                         28-May-2019 02:15       4k       
[CMP] v1.2.zip <http://www.turanevdekorasyon.com/wp-includes/test/v1.2.zip>                                                                         28-May-2019 02:15       4k       

------------------------------------------------------------------------
Proudly Served by LiteSpeed Web Server at www.turanevdekorasyon.com Port 80

Answer 1

I suggest using Chromium's File | Save As feature and saving the web page in MHT format, after enabling the experimental "Save Page as MHTML" option by visiting "chrome://flags/#save-page-as-mhtml" in the browser.

Answer 2

What is the point of using curl or wget? Use lynx:

lynx -dump 'www.example.com'

It will output all visible and hidden links.
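If lynx is not available (BusyBox ships wget, grep, and sed, but not lynx), a rough BusyBox-only sketch is to fetch the page and pull out the href attributes. The inline `$html` sample below is a stand-in for the output of `wget -q -O - 'http://www.turanevdekorasyon.com/wp-includes/test/'`:

```shell
# Sample HTML standing in for the fetched page.
html='<a href="v1.0.zip">v1.0</a> <a href="v1.1.zip">v1.1</a>'

# grep -o prints each href="..." match on its own line;
# sed then strips the surrounding href="..." wrapper.
links=$(printf '%s' "$html" | grep -o 'href="[^"]*"' | sed 's/^href="//; s/"$//')

echo "$links"
# prints:
# v1.0.zip
# v1.1.zip
```

Note that this only catches double-quoted href attributes; single-quoted or unquoted links would need extra patterns, which is why an HTML-aware tool such as lynx is the more robust choice when it is available.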
