Unexpected behavior of GnuWin32 sed in PowerShell

Answer 1

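The problem is Unicode. sed is emitting its output as Unicode (UTF-16LE), but without the 2-byte prefix (the byte-order mark, FF FE) that PowerShell uses to tell Unicode apart from ASCII. PowerShell therefore treats the stream as ASCII and keeps the \0 bytes (the high byte of each 2-byte Unicode character), which are displayed as blanks. And because PowerShell handles text as Unicode internally, it actually widens each original byte into its own 2-byte Unicode character. As far as I know, there is no way to force PowerShell to read this output as Unicode. The possible workarounds are: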

Check whether Unicode is already coming in as sed's input. It is unlikely, but I suppose it is possible.

Make sed's output start with the Unicode indicator, \uFEFF. This is probably what was missed in the sed source code:

```
_setmode(_fileno(stdout), _O_WTEXT); // probably present; makes it send Unicode
wprintf(L"\uFEFF"); // probably missing
```

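You can add the marker from within the sed command itself, something like:

```
sed "1s/^/\xFF\xFE/;..." # won't work if sed produces Unicode itself, but would work if sed passes Unicode through from its input
sed "1s/^/\uFEFF/;..." # use if sed produces Unicode itself, provided sed supports \u
```

Note that GNU sed documents \u in a replacement as "uppercase the next character", so the second form may insert the literal text FEFF rather than the marker.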
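
To illustrate why the marker matters, here is a minimal Python sketch of a reader that sniffs for the FF FE prefix. It is a simulation only: `decode_like_bom_sniffer` is a hypothetical helper, Latin-1 stands in for whatever single-byte code page the reader would really use, and it does not claim to reproduce PowerShell's actual detection logic.

```python
import codecs


def decode_like_bom_sniffer(raw: bytes) -> str:
    """Decode as UTF-16LE if the FF FE marker is present, else byte-for-byte."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        # Skip the 2-byte marker and decode the rest as UTF-16LE.
        return raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    # No marker: every byte becomes its own character (Latin-1 is a stand-in).
    return raw.decode("latin-1")


payload = "abc".encode("utf-16-le")  # b'a\x00b\x00c\x00'
without_bom = decode_like_bom_sniffer(payload)
with_bom = decode_like_bom_sniffer(codecs.BOM_UTF16_LE + payload)

print(repr(without_bom))  # 'a\x00b\x00c\x00'
print(repr(with_bom))     # 'abc'
```

Without the prefix, every letter is followed by a \0 character, which matches the blanks seen in the output; with the prefix, the text decodes cleanly.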

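Write sed's output to a file, then read it back with Get-Content -Encoding Unicode. Note that the redirection to the file has to happen inside the command run by cmd.exe, like this:
```
cmd /c "sed ... >file"
```
If you let PowerShell handle >file itself, the output gets mangled in the same way.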
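
Strip the \0 characters from the resulting text in PowerShell. This does not work well for international characters whose Unicode encoding contains a 0x0A or 0x0D byte: you end up with line breaks in their place.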
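
The pitfall with stripping \0 can be reproduced outside PowerShell. The Python sketch below is a simulation under assumptions: `strip_nuls_after_misread` is a hypothetical helper, Latin-1 stands in for the byte-per-character reading described above, and U+010A is just one example of a character whose UTF-16LE encoding contains a 0x0A byte.

```python
def strip_nuls_after_misread(text: str) -> list:
    """Simulate: UTF-16LE bytes read one byte per character, split into
    lines on 0x0A, then cleaned by deleting the NUL characters."""
    raw = text.encode("utf-16-le")
    misread = raw.decode("latin-1")   # one character per byte
    lines = misread.split("\n")       # a 0x0A byte is taken as a line break
    return [line.replace("\x00", "") for line in lines]


# Plain ASCII survives the round trip.
print(strip_nuls_after_misread("abc"))        # ['abc']

# U+010A encodes as the bytes 0A 01, so the line is split in the middle
# and a stray \x01 is left behind.
print(strip_nuls_after_misread("a\u010Ab"))   # ['a', '\x01b']
```

The second call shows a single input line turning into two, which is why this workaround is only safe for text that stays within ASCII.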
