I am using the code below to convert escape sequences like '\u00c0' into the actual Unicode characters, such as 'À'.
unicode(){ sed -i 's/\\\u00c0/À/g' $1;sed -i 's/\\\u00c1/Á/g' $1;sed -i 's/\\\u00c2/Â/g' $1;sed -i 's/\\\u00c3/Ã/g' $1;sed -i 's/\\\u00c4/Ä/g' $1;sed -i 's/\\\u00c5/Å/g' $1;sed -i 's/\\\u00c6/Æ/g' $1;sed -i 's/\\\u00c7/Ç/g' $1;sed -i 's/\\\u00c8/È/g' $1;sed -i 's/\\\u00c9/É/g' $1;sed -i 's/\\\u00ca/Ê/g' $1;sed -i 's/\\\u00cb/Ë/g' $1;sed -i 's/\\\u00cc/Ì/g' $1;sed -i 's/\\\u00cd/Í/g' $1;sed -i 's/\\\u00ce/Î/g' $1;sed -i 's/\\\u00cf/Ï/g' $1;sed -i 's/\\\u00d0/Ð/g' $1;sed -i 's/\\\u00d1/Ñ/g' $1;sed -i 's/\\\u00d2/Ò/g' $1;sed -i 's/\\\u00d3/Ó/g' $1;sed -i 's/\\\u00d4/Ô/g' $1;sed -i 's/\\\u00d5/Õ/g' $1;sed -i 's/\\\u00d6/Ö/g' $1;sed -i 's/\\\u00d7/×/g' $1;sed -i 's/\\\u00d8/Ø/g' $1;sed -i 's/\\\u00d9/Ù/g' $1;sed -i 's/\\\u00da/Ú/g' $1;sed -i 's/\\\u00db/Û/g' $1;sed -i 's/\\\u00dc/Ü/g' $1;sed -i 's/\\\u00dd/Ý/g' $1;sed -i 's/\\\u00de/Þ/g' $1;sed -i 's/\\\u00df/ß/g' $1;sed -i 's/\\\u00e0/à/g' $1;sed -i 's/\\\u00e1/á/g' $1;sed -i 's/\\\u00e2/â/g' $1;sed -i 's/\\\u00e3/ã/g' $1;sed -i 's/\\\u00e4/ä/g' $1;sed -i 's/\\\u00e5/å/g' $1;sed -i 's/\\\u00e6/æ/g' $1;sed -i 's/\\\u00e7/ç/g' $1;sed -i 's/\\\u00e8/è/g' $1;sed -i 's/\\\u00e9/é/g' $1;sed -i 's/\\\u00ea/ê/g' $1;sed -i 's/\\\u00eb/ë/g' $1;sed -i 's/\\\u00ec/ì/g' $1;sed -i 's/\\\u00ed/í/g' $1;sed -i 's/\\\u00ee/î/g' $1;sed -i 's/\\\u00ef/ï/g' $1;sed -i 's/\\\u00f0/ð/g' $1;sed -i 's/\\\u00f1/ñ/g' $1;sed -i 's/\\\u00f2/ò/g' $1;sed -i 's/\\\u00f3/ó/g' $1;sed -i 's/\\\u00f4/ô/g' $1;sed -i 's/\\\u00f5/õ/g' $1;sed -i 's/\\\u00f6/ö/g' $1;sed -i 's/\\\u00f7/÷/g' $1;sed -i 's/\\\u00f8/ø/g' $1;sed -i 's/\\\u00f9/ù/g' $1;sed -i 's/\\\u00fa/ú/g' $1;sed -i 's/\\\u00fb/û/g' $1;sed -i 's/\\\u00fc/ü/g' $1;sed -i 's/\\\u00fd/ý/g' $1;sed -i 's/\\\u00fe/þ/g' $1;sed -i 's/\\\u00ff/ÿ/g' $1; }
Then I run unicode file.txt
to convert the file's escape sequences to Unicode characters.
For example, if I have a file called original_text that contains a string such as \u00d8rsted, running unicode original_text
will convert that string to Ørsted.
This works very well, but the code feels like the wrong way to do it and, frankly, looks rather ugly.
Is there a better way to do this conversion, either in the shell or with an existing Unix command that converts these characters?
Answer 1
ascii2uni, from the uni2ascii package, can do this.
$ ./ascii2uni -q -a U <<< '\u00d8rsted'
Ørsted