
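I have a large pile of e-mails saved from an IMAP account that I have since closed.
The file names are the subject line of each e-mail.
Unfortunately, when a non-ASCII encoding was used, the subject lines look the way they do internally: they are prefixed with =_
and the encoding used: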
=_UTF-8_Q_Auftragsbest=C3=A4tigung_(Kundennummer__)_=_20100819_150312_37.eml
=_windows-1252_Q_Best=E4tigung=3A_Wir_haben_Ihre_=_20100819_150310_28.eml
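Does anybody know a tool that can fix this en masse at the file system level?
The solution would have to 1. remove the =_ENCODING
prefix, and 2. if possible, convert the encoded characters in the file name to their proper file system equivalents (umlauts and the like).
I'm on Windows 7 or XP, but I'm prepared to take this to a Linux VM, because it is a large folder and an automated solution would be great.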
Answer 1
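I built myself a PHP script. I thought I'd share it in case somebody else runs into a similar problem. It works for me and for the encodings I needed (you may have to extend the encodings array).
The script recursively converts the MIME-encoded file names in the specified directory structure to UTF-8.
It does not produce completely perfect results: a few special characters get converted twice, or not at all. As far as I can tell, this is the fault of the IMAP exporter or of incorrect encoding information inside the e-mails themselves.
mb_decode_mimeheader()
is the core of the whole thing.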
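As a rough illustration of what that function does on its own, here is a minimal sketch. It assumes the mbstring extension is loaded, and the two input strings are made-up encoded-words in standard RFC 2047 form (with "?" where the exporter wrote "_"), modelled on the file names in the question rather than taken from real mail.

```php
<?php
// Ask mbstring to hand results back as UTF-8, whatever charset
// the encoded-word itself declares.
mb_internal_encoding("UTF-8");

// Illustrative inputs only: one UTF-8 and one windows-1252 encoded-word.
$utf8  = mb_decode_mimeheader("=?UTF-8?Q?Auftragsbest=C3=A4tigung?=");
$latin = mb_decode_mimeheader("=?windows-1252?Q?Best=E4tigung?=");

echo $utf8, "\n";
echo $latin, "\n";
```

Both results should come back as UTF-8 text containing a proper "ä", which is why the script below first turns the exporter's underscores back into question marks before decoding.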
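Released into the public domain; no warranty of any kind. Requires PHP 5.2.
It should run both on the CLI and through the web; I tested it in the browser.
Make a backup before running a script like this on your data.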
<?php
/* Directory to parse */
$dir = "D:/IMAP";
/* Extensions to parse. Leave empty for none */
$extensions = array("eml");
/* Set to true to actually run the renaming */
define ("GO", true);
/* No need to change past this point */
/* Output content type header if not in CLI */
if (strtolower(php_sapi_name()) != "cli")
    header("Content-type: text/plain; charset=utf-8");
$FixNames = new FixEmlNames($dir, $extensions);
$FixNames->fixAll();
class FixEmlNames
{
    /* List of possible encodings here */
    private $encodings = array("iso-8859-1", "iso-8859-15", "windows-1252", "utf-8");
    /* Encoding Prefix. The exporter exports e.g. =_iso-8859-1_ with underscores
       instead of question marks */
    private $encoding_prefix = "=_";
    /* Encoding postfix */
    private $encoding_postfix = "_";
    /* Temporary storage for files */
    private $files;
    /* Array of file extensions to process. Leave empty to parse all files and directories */
    private $extensions = array();
    /* Count of renamed files */
    private $count = 0;
    /* Count of failed renames */
    private $failed = 0;
    /* Count of skipped renames */
    private $skipped = 0;
    /* Transform forbidden characters in host OS */
    private $transform_characters = array(":" => "_", "?" => "_", ">" => "_");

    function __construct($dir, $extensions = array("eml"))
    {
        $this->files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($dir));
        $this->extensions = $extensions;
    }

    function fixAll()
    {
        echo "Starting....\n";
        while ($this->files->valid())
        {
            if (!$this->files->isDot())
            {
                $path = $this->files->key();
                $ext = pathinfo($path, PATHINFO_EXTENSION);
                if ((count($this->extensions) == 0 ) or (in_array($ext, $this->extensions)))
                    $this->renameOne($path);
            }
            $this->files->next();
        }
        echo "Done. ";
        /* Show stats */
        $status = array();
        if ($this->count > 0) $status[] = $this->count." OK";
        if ($this->failed > 0) $status[] = $this->failed." failed";
        if ($this->skipped > 0) $status[] = $this->skipped." skipped";
        echo implode(", ", $status);
    }

    function renameOne($fullPath)
    {
        $filename = pathinfo($fullPath, PATHINFO_BASENAME);
        $is_mime = false;
        // See whether file name is MIME encoded or not
        foreach ($this->encodings as $encoding)
        {
            if (stristr($filename, $this->encoding_prefix.$encoding.$this->encoding_postfix))
                $is_mime = true;
        }
        // No MIME encoding? Skip.
        if (!$is_mime)
        {
            # uncomment to see skipped files
            # echo "Skipped: $filename\n";
            $this->skipped++;
            return true;
        }
        mb_internal_encoding("UTF-8");
        $filename = str_replace("_", "?", $filename); // Question marks were converted to underscores
        $filename = mb_decode_mimeheader($filename);
        $filename = str_replace("?", "_", $filename);
        // Remove forbidden characters
        $filename = strtr($filename, $this->transform_characters);
        // Rename
        if (constant("GO") == true)
        {
            // We catch the error manually
            $old = error_reporting(0);
            $success = rename($fullPath, realpath(dirname($fullPath)).DIRECTORY_SEPARATOR.$filename);
            error_reporting($old);
            if ($success)
            {
                echo "OK: $filename\n";
                $this->count++;
                return true;
            }
            else
            {
                $error = error_get_last();
                $message = $error["message"];
                $this->failed++;
                echo "Failed renaming $fullPath. Error message: ".$message."\n";
                return false;
            }
        }
        else
        {
            $this->count++;
            echo "Simulation: $filename\n";
            return true;
        }
    }
}
Answer 2
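Since you are willing to move to Linux, you could install a PHP server on it and write a fairly simple script to re-encode the files. How hard that is depends on whether you have done any programming before. You can look these functions up on php.net.
These are the functions you will need: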
<?php
opendir ( string $path [, resource $context ] )
readdir ([ resource $dir_handle ] )
file_get_contents(ENTER THE FILE NAMES HERE WITH A VARIABLE PASSED FROM readdir)
preg_replace(REGULAR EXPRESSION TO REMOVE THE =ENCODING part of the filename)
string mb_convert_encoding ( string $str , string $to_encoding [, mixed $from_encoding ] )
file_put_contents(THE NEW FILE NAME.eml)
?>
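One way the pieces listed above might fit together is sketched below. This is not the answerer's code: decode_exported_name() and rename_exported_files() are hypothetical helper names, the "=_CHARSET_Q_" prefix shape is assumed from the two examples in the question, and rename() is swapped in for the file_get_contents()/file_put_contents() pair so that files are moved rather than copied. It leaves the exporter's trailing "_=" marker and any characters the host file system forbids (such as ":") untouched, so treat it as a starting point only.

```php
<?php
// Hypothetical helper: turn one exported file name into a UTF-8 name.
// Assumes names start with "=_CHARSET_Q_" as in the question's examples.
function decode_exported_name($name)
{
    // Charsets this sketch is willing to convert (same list as Answer 1).
    $known = array("iso-8859-1", "iso-8859-15", "windows-1252", "utf-8");

    // Capture the charset and everything after the "=_CHARSET_Q_" prefix.
    if (!preg_match('/^=_([A-Za-z0-9-]+)_Q_(.*)$/i', $name, $m)) {
        return $name; // not an encoded name, leave it alone
    }
    if (!in_array(strtolower($m[1]), $known)) {
        return $name; // unknown charset, leave it alone
    }

    // "=E4" style escapes are quoted-printable byte values.
    $bytes = quoted_printable_decode($m[2]);

    // Convert those bytes from the declared charset to UTF-8.
    return mb_convert_encoding($bytes, "UTF-8", $m[1]);
}

// Hypothetical driver: walk one directory (not recursive) and rename.
// With $dryRun left at true it only prints what it would do.
function rename_exported_files($dir, $dryRun = true)
{
    $handle = opendir($dir);
    if ($handle === false) {
        return;
    }
    while (($entry = readdir($handle)) !== false) {
        $new = decode_exported_name($entry);
        if ($new === $entry) {
            continue; // ".", ".." and unencoded names fall through here
        }
        echo $entry . " => " . $new . "\n";
        if (!$dryRun) {
            rename($dir . "/" . $entry, $dir . "/" . $new);
        }
    }
    closedir($handle);
}
```

Calling rename_exported_files("D:/IMAP") would only list the planned renames; passing false as the second argument performs them, so a backup beforehand is still a good idea.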