Inspecting every character in a text file, visible or invisible

Question 1

A good hex editor is probably your best bet. Try FrHed (http://frhed.sourceforge.net/en/) if you are on Windows, or Bless (http://home.gna.org/bless/) on Linux.
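If installing an editor is not an option, the same byte-level view can be approximated in a few lines. This is only a rough sketch in Python, not how FrHed or Bless work internally; the function name `hex_dump` and the sample bytes are made up for illustration.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Render bytes as rows of: offset, hex values, printable-ASCII column."""
    pad = width * 3 - 1  # width of a full row of "xx xx xx ..." pairs
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is; every other byte becomes a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part.ljust(pad)}  {text_part}")
    return "\n".join(rows)


# Hypothetical sample: "a", a no-break space, "b", then a CRLF line ending.
sample = "a\u00a0b\r\n".encode("utf-8")
print(hex_dump(sample))
```

In the output the no-break space appears as the two bytes `c2 a0` and the line ending as `0d 0a`, which is exactly the kind of detail a normal text editor hides. To inspect a real file, read it in binary mode (`open(path, "rb").read()`) so nothing is decoded or normalized before you see it.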


Question 2

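The BabelPad editor is great: when you place the cursor after a character, it shows that character's Unicode code point and Unicode name. It also has a built-in Unicode information viewer that shows many of a character's Unicode properties. Unfortunately, it processes the BOM rather than displaying it, and it likewise interprets line breaks rather than showing them. There may be a way to change that; its documentation is... well, not its strongest part. But it does show invisible controls such as LRM, and it lets you tell a space from a no-break space, and so on.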
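For the two things the editor hides (the BOM and the line breaks), a small script can fill the gap. The following is a hedged sketch in Python using only the standard `codecs` and `unicodedata` modules; the helper names `detect_bom` and `reveal` are my own, and the choice to tag every character in a "C" (control/format) or "Z" (separator) category other than the plain space is an assumption about what counts as "invisible".

```python
import codecs
import unicodedata

# Longer marks come first so a UTF-32 LE BOM is not mistaken for UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def detect_bom(data: bytes):
    """Return a label for the byte order mark at the start of data, or None."""
    for bom, label in BOMS:
        if data.startswith(bom):
            return label
    return None


def reveal(text: str) -> str:
    """Replace each non-printing character with a visible <U+XXXX NAME> tag."""
    out = []
    for ch in text:
        category = unicodedata.category(ch)
        if ch != " " and category[0] in "CZ":
            # Control characters such as CR and LF have no Unicode name.
            name = unicodedata.name(ch, "unnamed")
            out.append(f"<U+{ord(ch):04X} {name}>")
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical sample: UTF-8 BOM, "a", no-break space, "b", LRM, CRLF.
raw = codecs.BOM_UTF8 + "a\u00a0b\u200e\r\n".encode("utf-8")
print(detect_bom(raw))
# Decoding with "utf-8" (not "utf-8-sig") keeps the BOM as U+FEFF.
print(reveal(raw.decode("utf-8")))
```

Working on the raw bytes matters here: opening a file in text mode may translate line endings before the script ever sees them.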


Question 3

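Maybe this helps, although the answer would be a better fit for Stack Overflow. I built a small parser in Perl that does what you need. Unfortunately there is no syntax highlighting here.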

#!/usr/bin/perl
use strict; use warnings;
use feature qw(say);
use Data::Dumper;
use Unicode::String;
use utf8;

my $line_no = 1;
# Read stuff from the __DATA__ section as if it were a file,
# one line at a time
while (my $line = <DATA>) {
  # Create a Unicode::String object
  my $us = Unicode::String->new($line);

  # Iterate over the length of the string
  for (my $i = 0; $i < $us->length; $i++) {
    # Get the next char
    my $char = $us->substr($i, 1);
    # Output a description, one line per character
    printf "Line %i, column %i, 0x%x '%s' (%s)\n",
      $line_no,         # line number
      $i,               # column number (zero-based)
      $char->ord,       # the ordinal of the char, in hex
      $char->as_string, # the stringified char (as in the input)
      $char->name;      # the glyph's name
  }
  # increment line number
  $line_no++;
}

# Below is the DATA section, which can be used as a file handle
__DATA__
This is some very strange unicode stuff right here:
٩(-̮̮̃-̃)۶ ٩(●̮̮̃•̃)۶ ٩(͡๏̯͡๏)۶ ٩(-̮̮̃•̃).

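Let's look at what this does:

Read line by line from the DATA file handle (the __DATA__ section can be used like a file this way).
Create an object representing a Unicode string from that line.
Iterate over the characters in that string.
Print each character's name, code point, and content.

It's really quite simple. Maybe you can adapt it to PHP, although I don't know whether there is a handy library for the character names.

Hope this helps.

I lifted the smiley stuff from here: What Unicode characters make up emoticons like ٩(•̮̮̃•̃)ö?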
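As an illustration of how little is needed for such a port, here is the same four-step idea sketched in Python rather than PHP, since Python ships a character-name lookup in its standard `unicodedata` module. This is my own adaptation, not the Perl script's exact behavior; the function name `describe`, the `<no name>` placeholder, and the sample string are assumptions for the example.

```python
import io
import unicodedata


def describe(stream):
    """Yield one description per character: line, column, code point, name."""
    for line_no, line in enumerate(stream, start=1):
        # Columns are zero-based, as in the Perl version above.
        for col, ch in enumerate(line):
            # Control characters (for example the newline) have no name.
            name = unicodedata.name(ch, "<no name>")
            # repr() makes non-printing characters visible as escapes.
            yield f"Line {line_no}, column {col}, 0x{ord(ch):x} {ch!r} ({name})"


# Hypothetical sample written with escapes so the source stays plain ASCII.
sample = io.StringIO("\u0669(\u2022\u0303)\u06f6\n")
for row in describe(sample):
    print(row)
```

Any open text file can be passed instead of the `io.StringIO` object; opening it with `newline=""` keeps the original line endings so that a `\r` shows up as its own row.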

Answer