
I have a file with two fields. The first field has the format "%FT%T".
Sample data:
2019-01-01T00:00:00 4.8
2019-01-01T01:00:00 5.1
2019-01-01T02:00:00 5.4
2019-01-01T03:00:00 5.7
2019-01-01T04:00:00 5.8
2019-01-01T05:00:00 5.4
2019-01-01T06:00:00 5
2019-01-01T07:00:00 4.4
2019-01-01T08:00:00 3.8
2019-01-01T09:00:00 3.7
2019-01-01T10:00:00 3.8
2019-01-01T11:00:00 4.1
2019-01-01T12:00:00 5
2019-01-01T13:00:00 6.7
2019-01-01T14:00:00 8.4
2019-01-01T15:00:00 9.1
2019-01-01T16:00:00 8.6
2019-01-01T17:00:00 8.5
2019-01-01T18:00:00 8.6
2019-01-01T19:00:00 8.1
2019-01-01T20:00:00 8
2019-01-01T21:00:00 6.9
2019-01-01T22:00:00 5.6
2019-01-01T23:00:00 5.2
2019-01-02T00:00:00 5.2
2019-01-02T01:00:00 5.3
2019-01-02T02:00:00 5.8
2019-01-02T03:00:00 6
2019-01-02T04:00:00 5.7
2019-01-02T05:00:00 5.4
2019-01-02T06:00:00 5.7
2019-01-02T07:00:00 5.3
2019-01-02T08:00:00 4.8
2019-01-02T09:00:00 4.3
2019-01-02T10:00:00 3.6
2019-01-02T11:00:00 2.8
2019-01-02T12:00:00 3.2
2019-01-02T13:00:00 4.2
2019-01-02T14:00:00 4.9
2019-01-02T15:00:00 5.4
2019-01-02T16:00:00 5.9
2019-01-02T17:00:00 6.5
2019-01-02T18:00:00 6.7
2019-01-02T19:00:00 7.1
2019-01-02T20:00:00 5.7
2019-01-02T21:00:00 4.4
2019-01-02T22:00:00 4.1
2019-01-02T23:00:00 3.8
2019-01-03T00:00:00 4
2019-01-03T01:00:00 3.5
2019-01-03T02:00:00 3.6
2019-01-03T03:00:00 4
2019-01-03T04:00:00 4.2
2019-01-03T05:00:00 3.9
2019-01-03T06:00:00 3.7
2019-01-03T07:00:00 3.8
2019-01-03T08:00:00 3.7
2019-01-03T09:00:00 3.7
2019-01-03T10:00:00 4
2019-01-03T11:00:00 4.7
2019-01-03T12:00:00 5.4
2019-01-03T13:00:00 6.5
2019-01-03T14:00:00 7.6
2019-01-03T15:00:00 7.7
2019-01-03T16:00:00 7.3
2019-01-03T17:00:00 7.4
2019-01-03T18:00:00 8
2019-01-03T19:00:00 8.5
2019-01-03T20:00:00 8.1
2019-01-03T21:00:00 6.5
2019-01-03T22:00:00 5.6
2019-01-03T23:00:00 5.6
I want to calculate the daily average of the second column.
The output should look like this:
01-01-2019 6.1
02-01-2019 5.1
03-01-2019 5.5
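Each expected value is the plain arithmetic mean of that day's 24 hourly readings, rounded to one decimal place. A small sketch that reproduces the first line from the sample values (the helper name mean_of is hypothetical, not part of any answer):

```shell
#!/bin/sh
# mean_of: hypothetical helper; reads one number per line on stdin and
# prints the count, the sum and the mean, each rounded to one decimal.
mean_of() {
    awk '{ sum += $1; n++ }
         END { printf "%d values, sum %.1f, mean %.1f\n", n, sum, sum / n }'
}

# The 24 hourly readings for 2019-01-01, copied from the sample data
printf '%s\n' 4.8 5.1 5.4 5.7 5.8 5.4 5 4.4 3.8 3.7 3.8 4.1 \
              5 6.7 8.4 9.1 8.6 8.5 8.6 8.1 8 6.9 5.6 5.2 | mean_of
# prints: 24 values, sum 145.7, mean 6.1
```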
Answer 1
An awk approach:
$ awk '{
    date = substr($1, 1, 10);
    tot[date] += $2;
    num[date]++
}
END {
    for (date in tot) {
        printf "%s %.1f\n", date, tot[date]/num[date]
    }
}' file
2019-01-01 6.1
2019-01-02 5.1
2019-01-03 5.5
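The awk program above prints dates as YYYY-MM-DD rather than the requested DD-MM-YYYY, and POSIX awk leaves the iteration order of "for (date in tot)" unspecified, so the sorted output is not guaranteed. A minimal single-pass sketch in POSIX awk, assuming the file is sorted by timestamp as the sample is (so each day's lines are contiguous); the shell function name daily_mean is hypothetical:

```shell
#!/bin/sh
# daily_mean: hypothetical helper. Prints "DD-MM-YYYY mean" per day.
# Assumes input is sorted by timestamp, so one day's lines are contiguous.
daily_mean() {
    awk '
    {
        day = substr($1, 1, 10)          # "YYYY-MM-DD" part of the timestamp
        if (day != cur) {                # first line of a new day
            if (n) flush()               # emit the previous day, if any
            cur = day; sum = 0; n = 0
        }
        sum += $2; n++
    }
    function flush(   d) {              # d is a local array
        split(cur, d, "-")               # d[1]=YYYY, d[2]=MM, d[3]=DD
        printf "%s-%s-%s %.1f\n", d[3], d[2], d[1], sum / n
    }
    END { if (n) flush() }               # emit the last day
    ' "$@"
}
```

Usage: daily_mean file. It keeps only the current day's running sum in memory and emits days in input order. If the input is not sorted, sort it first (sort -k1,1 file | daily_mean); ISO 8601 timestamps sort chronologically as plain strings.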
Answer 2
Using Miller:
$ mlr --nidx --repifs put '
$1 = strftime(strptime($1,"%FT%T"),"%d-%m-%Y")
' then stats1 -a mean -f 2 -g 1 file
01-01-2019 6.070833
02-01-2019 5.075000
03-01-2019 5.458333
Formatting the results seems to be an area where Miller is somewhat lacking, so if necessary I'd suggest piping the output through numfmt, e.g.:
$ mlr --nidx --repifs put '
$1 = strftime(strptime($1,"%FT%T"),"%d-%m-%Y")
' then stats1 -a mean -f 2 -g 1 file | numfmt --field=2 --format='%.1f'
01-01-2019 6.1
02-01-2019 5.1
03-01-2019 5.5
Alternatively, with a sufficiently recent version of GNU awk, use mktime to index the sum and count arrays by the epoch time of each date:
gawk '
{
    split($1, dt, "[-T:]");
    k = mktime(sprintf("%04d %02d %02d 00 00 00", dt[1], dt[2], dt[3]));
    sum[k] += $2; count[k] += 1;
}
END {
    PROCINFO["sorted_in"] = "@ind_num_asc";
    for (k in count) printf "%s %.1f\n", strftime("%d-%m-%Y", k), sum[k]/count[k];
}
' file
Another alternative uses csvsql / csvformat from the Python-based csvkit:
$ csvsql -d ' ' -HS --query '
SELECT strftime("%d-%m-%Y",date(a)) AS [Day], round(avg(b),1) AS [Avg] FROM file GROUP BY date(a)
' file | csvformat -T
/usr/lib/python3/dist-packages/agate/table/from_csv.py:88: RuntimeWarning: Column names not specified. "('a', 'b')" will be used as names.
Day Avg
01-01-2019 6.1
02-01-2019 5.1
03-01-2019 5.5