Sorting a file by groups of lines

Question 1

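With Perl, you can do something like this:

Slurp the file (perl -0n; -0 sets the input record separator to NUL, so a file without NUL bytes is read as a single record)
Split the input at the start of each unindented line with split(/^(?=\S)/m)
Sort and print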

perl -0ne 'print sort split(/^(?=\S)/m) ' ex

Question 2

First, sed puts each section on a single line, using the text <EOL> as the delimiter between the lines of a section. The sections are then sorted, and a second sed turns each <EOL> back into a newline.

sed -r ':r;$!{N;br};s:\n([[:blank:]])(\1*):<EOL>\1\2:g' file|sort|sed -r '/^$/d;:l;G;s:(.*)<EOL>(.*)(\n):\1\3\2:;tl;$s:\n$::'

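I did not pick a single character as the delimiter, because the input file might contain it; I used the string <EOL> instead.

Output: to reproduce the style of the input file, I added a newline after each section except the last.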

FirstSection
    Unique first line in first section
    Unique second line in first section

NthSection
    Unique first line in Nth section
    Unique second line in Nth section

SecondSection
    Unique first line in second section
    Unique second line in second section
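
To see what the first sed stage hands to sort, the following sketch runs only that stage on made-up data. The helper name join_sections and the sample contents are assumptions for illustration, and GNU sed is assumed because of the -r option.

```shell
# Hypothetical helper wrapping the first sed stage: every newline that is
# followed by indentation is replaced by the text <EOL>.
join_sections() {
    sed -r ':r;$!{N;br};s:\n([[:blank:]])(\1*):<EOL>\1\2:g' "$1"
}

# Illustrative sample data (not from the original post).
demo=$(mktemp)
printf 'B\n  b1\n  b2\n\nA\n  a1\n' > "$demo"
join_sections "$demo"
# prints:
#   B<EOL>  b1<EOL>  b2
#   (an empty line)
#   A<EOL>  a1
rm -f "$demo"
```

Each section is now a single line, so a plain sort can reorder sections; the empty lines are the ones the second sed drops with /^$/d.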

Question 3

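With GNU awk, using asort() or PROCINFO["sorted_in"], you can hold every group of records in an awk associative array, starting a new group at each blank line between groups, then sort the array with asort() and print all the groups in a for loop.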

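awk '/^$/{ ++grpNr; next }
{ groups[grpNr]=(groups[grpNr]==""? "" : groups[grpNr] RS) $0 }
END{ n=asort(groups)   # asort() renumbers the sorted values 1..n and returns n
     for(i=1; i<=n; i++) print groups[i]   # for(grp in groups) has no guaranteed order
}'  infile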

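Note: you can set the PROCINFO["sorted_in"] element to choose the kind of ordering you need; it controls the order in which a for (... in ...) loop visits the array. For example, PROCINFO["sorted_in"]="@val_str_desc" orders by the values (val) of the array, compared as strings (str), in descending (desc) order.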
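
The note above could be put to use roughly like this; a minimal sketch, assuming gawk is installed under that name. The helper name sort_groups_desc and the sample data are made up for illustration.

```shell
# Hypothetical helper: print the blank-line-separated groups of file "$1"
# in descending string order. PROCINFO["sorted_in"] is specific to GNU awk.
sort_groups_desc() {
    gawk '/^$/ { ++grpNr; next }
    { groups[grpNr] = (groups[grpNr] == "" ? "" : groups[grpNr] RS) $0 }
    END {
        PROCINFO["sorted_in"] = "@val_str_desc"   # order used by the for-in loop
        for (grp in groups) print groups[grp]
    }' "$1"
}

# Illustrative sample data (not from the original post); run only if gawk exists.
if command -v gawk >/dev/null 2>&1; then
    demo=$(mktemp)
    printf 'B\n  b1\n\nA\n  a1\n\nC\n  c1\n' > "$demo"
    sort_groups_desc "$demo"   # groups come out as C, B, A
    rm -f "$demo"
fi
```

With this setting no asort() call is needed, because the for-in loop itself walks the array in the requested order.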

Alternatively, you can combine any awk (to produce blocks of records delimited by Nul), sort -z (to sort on Nul characters instead of newlines), and tr (to remove the Nul characters that awk added), like this:

<infile awk '/^$/{ ++grpNr; next }
{ groups[grpNr]=(groups[grpNr]==""? "\0" : groups[grpNr] RS) $0 }
END{ for(grp in groups) print groups[grp] }' |sort -z |tr -d '\0'
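
The reason the Nul delimiter helps can be seen in isolation. In this minimal sketch, printf stands in for the awk stage, and the helper name sort_nul_records and the data are assumptions for illustration; sort -z is assumed to be available, as in GNU coreutils.

```shell
# Hypothetical helper: sort Nul-terminated records read from stdin,
# then strip the Nul bytes again.
sort_nul_records() {
    sort -z | tr -d '\0'
}

# Two multi-line records, each terminated by a Nul byte (illustrative data).
printf 'B\n  b1\n\000A\n  a1\n\000' | sort_nul_records
# prints:
#   A
#     a1
#   B
#     b1
```

Because sort -z treats Nul rather than newline as the record terminator, the newlines inside each group are ordinary data and every group moves as one unit.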

Tested on an input file like this:

BFirstSection
    Unique first line in first section
    Unique second line in first section

DSecondSection
    Unique first line in second section
    Unique second line in second section

Aanothersection...
    ...
    ...

CfourthSection
    Unique first line in Nth section
    Unique second line in Nth section

The output is:

Aanothersection...
    ...
    ...
BFirstSection
    Unique first line in first section
    Unique second line in first section
CfourthSection
    Unique first line in Nth section
    Unique second line in Nth section
DSecondSection
    Unique first line in second section
    Unique second line in second section

Answer