How do I manipulate specific fields?

Question 1

Learning awk is an admirable goal, but it has no built-in mechanism for parsing true CSV files (in particular, fields that may contain escaped or quoted delimiters), and its time functions are GNU-specific rather than portable.
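
To make the quoted-delimiter problem concrete, here is a minimal sketch in Python (chosen only for illustration; the sample line is made up). It compares a plain comma split, which is what awk -F, does, with a quote-aware CSV parser:

```python
import csv
import io

# Made-up sample line; the quoted last field contains a comma.
line = '2019-12-06T13:04:46Z,-15.2838,-175.1193,"100km ENE of Hihifo, Tonga"'

# Plain split on every comma: the quoted field is cut in two.
naive = line.split(",")

# Quote-aware parsing: the quoted field stays in one piece.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive), naive[-2:])   # 5 ['"100km ENE of Hihifo', ' Tonga"']
print(len(parsed), parsed[-1])  # 4 100km ENE of Hihifo, Tonga
```

The naive split yields five fields instead of four, so every field number after the quoted one would be off by one.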

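For these reasons you may want to consider Perl (with the Text::CSV module), Python, or my current favorite for this kind of task, Miller. It provides true CSV parsing as well as a proper strptime function, whereas even with GNU awk you would have to parse and assemble the datespec argument of mktime by hand.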

For example, in Miller you could do:

mlr --csv \
  put -S '
    s = strptime($time,"%Y-%m-%dT%H:%M:%SZ") + 3*3600; 
    $date = strftime(s,"%d.%m.%Y"); 
    $time = strftime(s,"%H:%M:%S"); 
    $place =~ "(.* of |)([^,]*),(.*)$" { $place = "\2" }
  ' then cut -o -f date,time,latitude,longitude,depth,mag,place input.csv

If you want whitespace-delimited output columns, change --csv to --icsv --opprint ("pretty-printed" tabular output, with headers) or to --icsv --onidx (simple space-delimited output).

For example:

$ mlr --icsv --opprint   put -S '
    s = strptime($time,"%Y-%m-%dT%H:%M:%SZ") + 3*3600; 
    $date = strftime(s,"%d.%m.%Y"); 
    $time = strftime(s,"%H:%M:%S"); 
    $place =~ "(.* of |)([^,]*),(.*)$" { $place = "\2" }
  ' then cut -o -f date,time,latitude,longitude,depth,mag,place input.csv
date       time     latitude longitude depth mag place
06.12.2019 16:04:46 -15.2838 -175.1193 10    6   Hihifo
04.12.2019 23:10:03 -19.0515 169.5628  266   6   Isangel
03.12.2019 11:46:36 -18.5597 -70.6504  32.44 6   Arica
02.12.2019 08:01:54 51.3218  -178.2425 27.33 6   Amatignak Island
27.11.2019 10:23:42 35.7272  23.2673   71.76 6   Platanos
26.11.2019 05:54:12 41.5112  19.5151   20    6.4 Mamurras
24.11.2019 03:54:01 51.3809  -175.5108 20    6.3 Adak
23.11.2019 15:11:16 1.6286   132.7854  10    6.1 Papua region
21.11.2019 02:50:43 19.4533  101.3558  10    6.2 Chaloem Phra Kiat

Miller is available from the Ubuntu universe repository.
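
For comparison, the Python route mentioned above might look roughly like the following. This is a sketch under assumptions, not a drop-in replacement: the sample row is made up (only the column names come from the Miller command), the timestamps are assumed to have no fractional seconds, and convert is a hypothetical helper name.

```python
import csv
import io
import re
from datetime import datetime, timedelta

# Hypothetical input; a real file would be opened with open("input.csv", newline="").
SAMPLE = '''time,latitude,longitude,depth,mag,place
2019-12-06T13:04:46Z,-15.2838,-175.1193,10,6,"100km ENE of Hihifo, Tonga"
'''

# Same pattern as in the Miller expression: group 2 is the town name.
PLACE_RE = re.compile(r"(.* of |)([^,]*),(.*)$")
COLUMNS = ["date", "time", "latitude", "longitude", "depth", "mag", "place"]


def convert(row, hours=3):
    """Shift the UTC timestamp by a fixed offset and reduce place to the town."""
    shifted = datetime.strptime(row["time"], "%Y-%m-%dT%H:%M:%SZ") + timedelta(hours=hours)
    out = dict(row)
    out["date"] = shifted.strftime("%d.%m.%Y")
    out["time"] = shifted.strftime("%H:%M:%S")
    match = PLACE_RE.match(row["place"])
    if match:  # leave place untouched when it has no comma
        out["place"] = match.group(2)
    return {name: out[name] for name in COLUMNS}


rows = [convert(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
for r in rows:
    print(" ".join(r[name] for name in COLUMNS))
# 06.12.2019 16:04:46 -15.2838 -175.1193 10 6 Hihifo
```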

Question 2

First, the CSV input needs to be pre-processed so that embedded commas are handled properly. Then the AWK part can be broken into functional chunks.

$ cat preprocess.sed
#!/bin/sed -f
:start   # loop back to here
/"/{  # for any line that has a double quote
  h   # copy to the hold buffer
  s/[^"]*"\([^"]*\).*/\1/  # what is between the first pair of dquotes
  s/,/@@/g    # replace comma with '@@'
  G   # append the hold buffer to the pattern buffer
      # so we get what was in dquotes followed by a newline followed by the
      # original line
  s/\(.*\)\n\([^"]*\)"\([^"]*\)"\(.*\)/\2\1\4/
      # replace the quoted part with its comma-free version, dropping the quotes
  t start   # go back to 'start'
}

This rewrites each quoted field ".*,.*" as .*@@.* (without the quotes), which makes the AWK step easier.

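The effect of the preprocessing step can be sketched in a few lines of Python. This is an illustration of the same idea rather than a translation of the sed script; escape_quoted_commas is a hypothetical helper and the sample line is made up.

```python
import re


def escape_quoted_commas(line, marker="@@"):
    """Drop the quotes around each quoted field and replace its commas with marker."""
    return re.sub(r'"([^"]*)"', lambda m: m.group(1).replace(",", marker), line)


# Made-up sample line with one quoted field containing a comma.
line = '2019-12-06T13:04:46Z,-15.2838,"100km ENE of Hihifo, Tonga",earthquake'
print(escape_quoted_commas(line))
# 2019-12-06T13:04:46Z,-15.2838,100km ENE of Hihifo@@ Tonga,earthquake
```

After this step every remaining comma is a real field separator, so a plain split on commas is safe.
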
To convert just the timestamp to the new time zone, rewrite the first field of each line:

$ cat change_date.sh
#!/bin/sh
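# POSIX TZ offsets are sign-inverted: UTC-3 means three hours ahead of UTC.
userTZ="${1:-UTC-3}"
# turn the first comma into a space so that read can split off the timestamp
sed 's/,/ /' |
    while read -r datestr rest; do
        if [ "${datestr}" = time ]; then
            newdate="${datestr}"    # header line: keep as is
        else
            # date -d is a GNU extension
            newdate=$(TZ="${userTZ}" date -d "${datestr}" "+%d %m %Y %H:%M:%S")
        fi
        # rejoin with a comma so the AWK step can still split on commas
        echo "${newdate},${rest}"
    done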
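
For readers without GNU date, the same fixed-offset conversion can be sketched in Python. This is an illustration only, not part of the original pipeline; to_offset is a hypothetical helper, and a fixed offset of three hours ahead of UTC is assumed.

```python
from datetime import datetime, timedelta, timezone


def to_offset(timestamp, hours=3):
    """Convert an ISO 8601 UTC timestamp ending in Z to a fixed-offset local time."""
    utc = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    local = utc.astimezone(timezone(timedelta(hours=hours)))
    return local.strftime("%d %m %Y %H:%M:%S")


print(to_offset("2019-12-06T13:04:46Z"))  # 06 12 2019 16:04:46
```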

The AWK script looks like this:

$ cat reformat.awk
#!/usr/bin/awk -f
BEGIN {FS=","}  # comma separated fields (the awk variable is FS, not IFS)
NR==1 {print; next;}  # print the header and do nothing more with it
{   # get just the "town" from the place field
    sub(/.* of /,"",$14)  # strip up to the " of "
    sub(/@@ .*/,"",$14)   # strip after the embedded comma (now '@@')
}
{
    printf("%s %8.3f %8.3f %8.3fs %8.3f %s\n", $1, $2, $3, $4, $5, $14)
}

Make sure all three scripts are executable, then run:

./preprocess.sed sample.csv | ./change_date.sh | ./reformat.awk

Or as a one-liner (POSIX TZ offsets are sign-inverted, so TZ=UTC-3 means three hours ahead of UTC):

sed ':start;/"/{;h;s/[^"]*"\([^"]*\).*/\1/;s/,/@@/g;G;s/\(.*\)\n\([^"]*\)"\([^"]*\)"\(.*\)/\2\1\4/;t start;};s/,/ /' test.csv | while read -r datestr rest; do if [ "$datestr" = "time" ]; then newdate="${datestr}"; else newdate=$(TZ=UTC-3 date -d "$datestr" "+%d %m %Y %H:%M:%S"); fi; echo "${newdate},${rest}"; done | awk -F, 'NR==1 {print;next} {sub(/.* of /,"",$14);sub(/@@ .*/,"",$14)} {printf("%s %8.3f %8.3f %8.3fs %8.3f %s\n", $1, $2, $3, $4, $5, $14)}'

Answer