Merging columns within a file based on column headers

Question 1

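If the input is tab-separated:

awk -F"\t" '
# Line 1: remember the header of every column and how many columns there are
NR == 1 {ncol = NF
         for (i=1; i<=NF; i++)  COL[i] = $i
        }
# Every line (header included): store non-empty fields under (line number, header)
        {for (i=1; i<=NF; i++) if ($i != "") OUT[NR, COL[i]] = $i
        }
# END: print each stored line, emitting every distinct header only once
END     {for (n=1; n<=NR; n++)  {split ("", DUP)
                                 for (i=1; i<=ncol; i++)  if (!DUP[COL[i]]++) printf "%s" FS, OUT[n,COL[i]]
                                 printf "%s", RS
                                }
        }
' file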
A   B   C   
1   5   4   
3   1   2   
2   2   1   
1       3   
3       2   
1       4

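This saves the column headers for later use as a partial index, then, for each line, collects the values into an array indexed by the line number and the column header. The END section prints that array in the original line order while taking care of duplicate column headers.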
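To make the (line number, header) indexing concrete, here is a minimal sketch on a made-up three-line input. The sample data and the names `sample`, `merged`, `ncol` and `sep` are assumptions for illustration only. It also assumes that empty fields should never overwrite a value already stored for the same header, and it prints the separator between fields rather than after each one.

```shell
# Hypothetical sample: header A appears twice, its values are spread over both A columns.
sample=$(printf 'A\tB\tA\n1\t5\t\n\t\t7\n')

merged=$(printf '%s\n' "$sample" | awk -F'\t' '
NR == 1 { ncol = NF; for (i = 1; i <= NF; i++) COL[i] = $i }
        { for (i = 1; i <= NF; i++) if ($i != "") OUT[NR, COL[i]] = $i }
END     { for (n = 1; n <= NR; n++) {
              split("", DUP); sep = ""
              for (i = 1; i <= ncol; i++)
                  if (!DUP[COL[i]]++) { printf "%s%s", sep, OUT[n, COL[i]]; sep = FS }
              print ""
          }
        }')
printf '%s\n' "$merged"
```

Both A columns end up under the single key "A", so the value 7 from the second A column lands in the first output column of the last line, and that line's B column stays empty.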

With more complex file structures, handling the duplicates could take considerably more effort.

Question 2

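For tab-separated input.

Read the headers and their column numbers into an array, in the order in which they appear in the input file. Then split the input file by column into files named headerName.txt, so that columns with the same headerName go to the same file. At the end, paste the files together; the column command is used to beautify the output.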

awk -F'\t' '
    ## record the header name of every column in the `h` array:
    ## the key is the column number and the value is the header name. for example,
    ## if the header "A" is in columns 1 and 4, then h[1] and h[4] are both "A"
    NR==1{ while (++i<=NF) h[i]=$i; next; }

         { for (i=1; i<=NF; i++) {

    ## append the field to the file named after the header of its column.
    ## for example, a field in column 1 belongs to header "A" (h[1]), so it is
    ## written to "A.txt", but only if the field is not empty.
               if ($i!=""){ print $i > (h[i] ".txt") };
         }; }

    ## at the end close the files so everything is flushed, then paste them together
    ## and beautify the output with the `column` command. the number of .txt files
    ## is limited to the number of unique headers in the input. \011 is the octal
    ## escape for a tab, which the shell treats as plain whitespace here.
END{ for (i in h) close(h[i] ".txt"); system("paste *.txt |column \011 -tn") }' infile

The command without comments:

awk -F'\t' '
    NR==1{ while (++i<=NF) h[i]=$i; next; }
         { for (i=1; i<=NF; i++) {
               if ($i!=""){ print $i > (h[i] ".txt") };
         }; }
END{ for (i in h) close(h[i] ".txt"); system("paste *.txt |column \011 -tn") }' infile
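The splitting step can be tried in isolation with the sketch below. It is not part of the answer: the sample data, the scratch directory created with `mktemp -d`, and the variable names `workdir` and `pasted` are assumptions for illustration. It passes the file names to `paste` explicitly instead of using `*.txt`, so unrelated .txt files cannot leak into the result, and it leaves out the `column` formatting.

```shell
# Work in a scratch directory so only files created here are involved.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample: header A is used for columns 1 and 3.
printf 'A\tB\tA\n1\t5\t\n\t6\t2\n' > infile

awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) h[i] = $i; next }
            { for (i = 1; i <= NF; i++)
                  if ($i != "") print $i > (h[i] ".txt") }
    END     { for (i in h) close(h[i] ".txt") }
' infile

# A.txt now holds the values 1 and 2, B.txt holds 5 and 6.
pasted=$(paste A.txt B.txt)
printf '%s\n' "$pasted"

# Leave the scratch directory and remove it.
cd - >/dev/null || exit 1
rm -r "$workdir"
```

Because each file only receives non-empty values, the rows line up after `paste` only when every header contributes exactly one value per input line, as in this sample.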

Question 3

A slightly different approach that does not need to "buffer" the entire file.

The AWK script colmerge.awk:

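FNR==1{
    for (i=1; i<=NF; i++)
    {
    hdr[i]=$i;
    # first occurrence of a header: remember its position and print it once,
    # preceded by OFS unless it is the first header printed
    if (map[$i]==0) {map[$i]=i; uniq_hdr[++u]=$i; printf("%s%s",(u>1 ? OFS : ""),$i);}
    }
    printf("%s",ORS);
}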

FNR>1{
    delete linemap;
    for (i=1; i<=NF; i++) if ($i!="") linemap[hdr[i]]=$i;
    for (i=1; i<=u; i++)
    {
    printf("%s",linemap[uniq_hdr[i]]);
    if (i==u) printf("%s",ORS); else printf("%s",OFS);
    }
}

Use it as

awk -F'\t' -v OFS='\t' -f colmerge.awk file

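This collects all the headers and identifies the "unique" headers and their first occurrence on line 1. For each subsequent line it then builds a map from header to non-empty value and prints the values in the order of the "unique" headers, as determined while processing the first line.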
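As a quick illustration of the steps just described, the following sketch runs a compact inline variant of the same logic on a made-up input. The sample data and the names `merged` and `seen` are assumptions, and unlike colmerge.awk this variant prints the header line through the same per-line map as the data lines.

```shell
# Hypothetical input: headers A and C occur twice, values are split between the copies.
merged=$(printf 'A\tB\tC\tA\tC\n1\t5\t4\t\t\n\t\t\t3\t2\n' |
awk -F'\t' -v OFS='\t' '
FNR == 1 {
    for (i = 1; i <= NF; i++) {
        hdr[i] = $i
        if (!($i in seen)) { seen[$i] = 1; uniq_hdr[++u] = $i }
    }
}
{
    # Map header -> non-empty value for this line; on line 1 the values are the headers.
    delete linemap
    for (i = 1; i <= NF; i++) if ($i != "") linemap[hdr[i]] = $i
    for (i = 1; i <= u; i++) printf "%s%s", linemap[uniq_hdr[i]], (i == u ? ORS : OFS)
}')
printf '%s\n' "$merged"
```

Each input line is printed as soon as it has been read, so memory use depends on the number of columns rather than on the number of lines.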

However, this only works if the input file is tab-separated, because that is the only way to reliably detect "empty" fields.
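The difference is easy to see by counting fields. In this small sketch (the sample line and the variable names are made up for illustration), the same line is split once with a tab as the field separator and once with awk's default whitespace splitting:

```shell
line=$(printf '1\t\t4')   # three tab-separated fields, the middle one empty

with_tab_fs=$(printf '%s\n' "$line" | awk -F'\t' '{ print NF }')
with_default_fs=$(printf '%s\n' "$line" | awk '{ print NF }')

echo "FS=tab: $with_tab_fs fields, default FS: $with_default_fs fields"
```

With the default separator, runs of whitespace collapse into a single delimiter, so the empty middle field disappears and the values after it shift into the wrong columns.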

The delete statement applied to the entire array linemap may not be supported by all awk implementations (it should work in gawk, mawk and nawk, though).
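If that is a concern, a commonly used alternative is to empty the array with `split("", array)`, the same idiom the first answer uses for its DUP array. The sketch below only demonstrates that the idiom clears an array; the variable name `count` is made up for illustration.

```shell
count=$(awk 'BEGIN {
    linemap["A"] = 1; linemap["B"] = 2
    split("", linemap)              # splitting an empty string leaves the array empty
    n = 0; for (k in linemap) n++   # count the remaining elements
    print n
}')
echo "elements left after split: $count"
```

In colmerge.awk this would mean replacing `delete linemap;` with `split("", linemap);`.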
