Creating a WARC file from a Twitter archive

Answer 1

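It looks like you are using the warcit tool to create a WARC file from a local Twitter archive. warcit converts files that are already on disk (HTML pages and related resources such as images and stylesheets) into a WARC file, assigning each file a URL under the prefix you supply. It does not crawl a live website.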

To create a WARC file that includes the Twitter archive's index.html file as a page, you can pass the --index-files option with the path to index.html. For example:

warcit --name twitter_archive -o --no-gzip -d 20221122010159 --index-files=E:/twitter_archive/index.html http://website.com/ "E:/twitter_archive/"

This creates a WARC file that contains index.html as a page, along with any other pages and resources linked from it.
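
To check which URLs actually ended up in the output, one option is to scan the uncompressed WARC for its WARC-Target-URI headers. The following is a minimal sketch using only the Python standard library. It assumes the output file is named twitter_archive.warc (inferred from --name and --no-gzip, not stated above), and the helper name list_target_uris is hypothetical. It is a simple line scan rather than a full WARC parser, so a payload line that happens to begin with the same header name would also be picked up; a dedicated library such as warcio is the more robust choice.

```python
from pathlib import Path


def list_target_uris(warc_path):
    """Return distinct WARC-Target-URI values from an uncompressed WARC,
    in order of first appearance."""
    prefix = b"WARC-Target-URI:"
    seen = []
    with open(warc_path, "rb") as fh:
        for line in fh:
            # Header lines end in CRLF; strip() removes it along with spaces.
            if line.startswith(prefix):
                uri = line[len(prefix):].strip().decode("utf-8", "replace")
                if uri not in seen:
                    seen.append(uri)
    return seen


if __name__ == "__main__":
    warc = Path("twitter_archive.warc")  # assumed output name
    if warc.exists():
        for uri in list_target_uris(warc):
            print(uri)
```

If index.html or a resource you expected is missing from the printed list, the conversion did not pick it up.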

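To include every file in the Twitter archive in the WARC file, you can use the --mirror option instead of --index-files. This creates a WARC file containing all files in the archive, not only those linked from a particular index file. Run warcit --help first to confirm that your installed version lists this option.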

warcit --name twitter_archive -o --no-gzip -d 20221122010159 --mirror http://website.com/ "E:/twitter_archive/"
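
To see which files a full conversion of the folder would need to cover, it can help to enumerate the archive directory and pair each file with the URL it would plausibly receive under the prefix. The sketch below only illustrates that path-to-URL idea and is not warcit's actual implementation; the helper name expected_urls is hypothetical, and the directory and prefix are the ones from the command above.

```python
import os
from urllib.parse import quote


def expected_urls(root, url_prefix):
    """Map every file under root to a URL under url_prefix (illustrative only)."""
    urls = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            # URLs use forward slashes, even when the local path does not.
            rel_url = quote(rel.replace(os.sep, "/"))
            urls.append(url_prefix.rstrip("/") + "/" + rel_url)
    return sorted(urls)


if __name__ == "__main__":
    # Prints nothing if the directory does not exist on this machine.
    for url in expected_urls("E:/twitter_archive/", "http://website.com/"):
        print(url)
```

Comparing this list against the URLs recorded in the WARC file is one way to spot files that were left out.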
