Calculating hourly averages for multiple data columns

Answer 1

#!/usr/bin/perl

use strict;
use warnings;

my $prev = '';
my (@sums,@avg) = ();
my $count = 0;

while(<>) {
  chomp;
  if (m/^Timestamp/) {
    my @headers = split /,/;
    # insert "Ave_" at start of each header
    @headers = map { "Ave_" . $_ } @headers;
    # replace Timestamp header with Date,Hour headers.
    splice @headers,0,1,qw(Date Hour);
    print join(",",@headers), "\n";
    next;
  };

  my (@data) = split /,/;
  # extract and remove date and hour from first element of @data
  (my $current = shift @data) =~  s/^(.*) (\d\d):.*$/$1,$2/;

  if ($count == 0 || $current eq $prev) {
    # add each field in @data to the same field in @sums
    foreach my $i (0..$#data) { $sums[$i] += $data[$i] };
    $prev = $current;
    $count++;
    next unless eof;
  };

  # calculate and print the averages for the previous hour
  foreach my $i (0..$#sums) { $avg[$i] = $sums[$i] / $count };
  print join(",", $prev, @avg), "\n";

  # special case handling for when there's a new date/hour on the
  # last line of file (otherwise it wouldn't get printed)
  if (eof && $prev ne $current) {
    print join(",", $current, @data), "\n";
  };

  @sums = @data;
  @avg = ();
  $prev = $current;
  $count = 1;
};

This should work with any number of data columns.
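The script never hard-codes the number of columns. Once the timestamp is shifted off the front, @data holds however many fields the row has, and the loops run over 0..$#data. Below is a small sketch of that first step on a single row. The row is made up, and the "YYYY MM DD hh:mm:ss" timestamp layout is an assumption inferred from the regex and the sample output.

```perl
use strict;
use warnings;

# Hypothetical input row with three data columns; the "YYYY MM DD hh:mm:ss"
# timestamp layout is an assumption, not taken from the question
my $line = "2018 07 16 13:25:00,24.8,453,7";

my @data = split /,/, $line;
# Same substitution as the script: "YYYY MM DD hh:mm:ss" -> "YYYY MM DD,hh"
(my $current = shift @data) =~ s/^(.*) (\d\d):.*$/$1,$2/;

print "$current\n";           # 2018 07 16,13
print scalar(@data), "\n";    # 3 data fields left in @data
```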

Save it as, e.g., average.pl, make it executable with chmod +x average.pl, and run it like this:

$ ./average.pl input.csv 
Date,Hour,Ave_data1,Ave_data2
2018 07 16,13,24.8,453
2018 07 16,14,18,457
2018 07 16,15,234,459
2018 07 16,17,23,845
2018 07 16,18,239,453
2018 07 17,10,29,452
2018 07 18,13,49,451
2018 07 19,13,28,456

Some extra stuff about perl, loops, iterators, and `map` that's (IMO) interesting:

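FYI, the foreach my $i ... loops could be rewritten to use perl's map function (see perldoc -f map). In short, map iterates over a list, performs an operation on each element, and returns either the newly generated list or, in scalar context, the number of elements in that generated list. This is more idiomatic Perl, but it may be harder for new Perl programmers to understand. For example,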

     foreach my $i (0..$#data) { $sums[$i] += $data[$i] };

could be written as:

     @sums = map { $sums[$_] + $data[$_] } 0..$#data;

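Both of these iterate over the indices of the @data array (0..$#data). The for loop creates/modifies the elements of @sums directly, while map returns a new array of sums, which is then assigned to the @sums array.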
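Here is a quick check that the two forms build the same @sums. The numbers are made up. One caveat: if you run the map form under use warnings, it warns on the very first data row, while @sums is still empty. Plain + on an undef value triggers an "uninitialized" warning, but += is exempt from that warning, so the foreach form stays quiet.

```perl
use strict;
use warnings;

my @data = (25, 453);    # hypothetical row of data

# foreach form: adds to the elements of @sums_loop in place
my @sums_loop = (10, 100);
foreach my $i (0..$#data) { $sums_loop[$i] += $data[$i] };

# map form: builds a new list of sums, then assigns it back
my @sums_map = (10, 100);
@sums_map = map { $sums_map[$_] + $data[$_] } 0..$#data;

print "@sums_loop\n";    # 35 553
print "@sums_map\n";     # 35 553
```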

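Instead of using a $i iterator variable, map automatically creates and uses a (localised) scalar variable named $_. $_ is used everywhere in perl, and it is the implicit (i.e. default) argument for most functions when no argument is supplied. For example, print with no arguments is actually print $_, and split /,/ is actually split /,/, $_. It's also implicit for the pattern-matching operators, e.g. s/foo/bar/ is actually $_ =~ s/foo/bar/.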
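A minimal sketch of those implicit-$_ defaults. The string is made up, and each comment shows the explicit form that the statement is equivalent to.

```perl
use strict;
use warnings;

$_ = "foo,bar\n";

chomp;                   # same as: chomp $_
my @fields = split /,/;  # same as: split /,/, $_
s/foo/baz/;              # same as: $_ =~ s/foo/baz/
print;                   # same as: print $_   (prints "baz,bar")
print "\n";
```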

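Similarly, while (<>) is actually something like while (defined($_ = <>)). That is, it reads a line from the input file(s) or standard input. If there was anything to read, it assigns the line to $_ and evaluates as true. Otherwise it evaluates as false and ends the while loop.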
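Here is a sketch showing that the short and the explicit loops read exactly the same lines. As an assumption for the sake of a self-contained example, it uses an in-memory filehandle (open on a reference to a string) in place of <>, which reads the files named on the command line or STDIN. The same implicit assignment to $_ applies whenever a readline like <$fh> is the only thing in a while condition.

```perl
use strict;
use warnings;

my $text = "line one\nline two\n";    # made-up input

# Short form: each line is implicitly assigned to $_ and tested with defined()
open my $fh, '<', \$text or die "can't open in-memory file: $!";
my @short;
while (<$fh>) { chomp; push @short, $_ }
close $fh;

# Explicit form that the short form is equivalent to
open $fh, '<', \$text or die "can't open in-memory file: $!";
my @long;
while (defined($_ = <$fh>)) { chomp; push @long, $_ }
close $fh;

print join('|', @short), "\n";    # line one|line two
print join('|', @long), "\n";     # line one|line two
```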

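$_ is often informally referred to as "the current thing" or just "the thing". See man perlvar and search for \$_ for more details. There's also an equivalent array, @_, which holds the arguments passed to a subroutine.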
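A minimal sketch of @_ in use. The mean() subroutine is a hypothetical helper for illustration, not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: averages whatever numbers are passed to it.
# The arguments arrive in @_, much as map's current element arrives in $_.
sub mean {
    return unless @_;           # no arguments: return undef, avoid dividing by zero
    my $total = 0;
    $total += $_ for @_;        # statement-modifier for also sets $_
    return $total / @_;         # @_ in numeric context = number of arguments
}

my $m = mean(24, 26, 28);
print "$m\n";    # 26
```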

  foreach my $i (0..$#sums) { $avg[$i] = $sums[$i] / $count };

could be written as:

  @avg = map { $_ / $count } @sums;

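Here, the foreach loop iterates over the indices of @sums (0..$#sums), while map iterates over the values of the @sums array. Again, the foreach loop modifies each element of the @avg array directly, while map returns a new array that is assigned to @avg.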

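Both forms produce the same output in this script, and both are useful. However, Perl programmers tend to use map more and more over time, for three reasons. It's a general-purpose tool for iterating over any kind of list. It's shorter to type than a for/foreach loop that does the same thing. And after a while, thinking about data in terms of lists, arrays, and hashes becomes natural.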

It's often used to transform an array into a hash (or a hash's values or keys into an array).
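A sketch of both directions. The header names are made up to resemble the question's CSV header.

```perl
use strict;
use warnings;

my @headers = qw(Timestamp data1 data2);

# Array -> hash: map each header name to its column index
my %col = map { $headers[$_] => $_ } 0..$#headers;

# Array -> hash: the common "set membership" idiom
my %is_header = map { $_ => 1 } @headers;

# Hash -> array: turn the hash keys (ordered by column index, skipping
# the Timestamp column) into "Ave_" header names
my @ave = map  { "Ave_$_" }
          grep { $col{$_} > 0 }
          sort { $col{$a} <=> $col{$b} } keys %col;

print "$col{data2}\n";    # 2
print "@ave\n";           # Ave_data1 Ave_data2
```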

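BTW, map doesn't have to return an array. The code block inside { ... } can do anything that perl code can do. The return value can be discarded, or, if it is assigned to a scalar variable, it is the number of elements in the generated list.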
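Here is a sketch of map in scalar context. The numbers are made up. The count is of the elements generated, not of the input items, so a block that returns two elements per item (or none) changes the result.

```perl
use strict;
use warnings;

my @sums = (50, 906);    # made-up numbers

# In scalar context, map evaluates to the number of elements it generated.
# Each block call returns two elements, so two inputs -> 4 elements.
my $n = map { ($_, $_ * 2) } @sums;

# A block may also return an empty list, generating nothing for that item
my $big = map { $_ > 100 ? ($_) : () } @sums;

print "$n $big\n";    # 4 1
```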

For example, the first foreach loop could also be written as:

map { $sums[$_] += $data[$_] } 0..$#data;

This modifies the @sums array directly (just like the foreach loop), and any return value is discarded (i.e. not assigned to any variable).
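When map is run only for its side effect, some Perl programmers prefer a statement-modifier for instead, because the return value is unwanted anyway. That is a style choice, not a correctness issue. Here is a sketch comparing the two; the numbers are made up.

```perl
use strict;
use warnings;

my @data = (25, 453);    # hypothetical row of data

# Void-context map: called only for its side effect on @sums_map
my @sums_map = (10, 100);
map { $sums_map[$_] += $data[$_] } 0..$#data;

# Statement-modifier for: the same side effect, written as a loop
my @sums_for = (10, 100);
$sums_for[$_] += $data[$_] for 0..$#data;

print "@sums_map\n";    # 35 553
print "@sums_for\n";    # 35 553
```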

And, of course, the second foreach loop could also be written as:

map { $avg[$_] = $sums[$_] / $count } 0..$#sums;

Answer 2

Using GNU awk:

#!/usr/bin/awk -f
# Needs GNU awk (gawk): mktime() is a gawk extension
BEGIN {
    FS=OFS=","
}

NR == 1 {
    # Build the header here
    for (i = 2; i <= NF; i++) oh = oh OFS "Ave_" $i

    print "Date", "Hour" oh
    next
}

{
    # Split date and time and build a timestamp with it.
    # Set MM and SS to 0 to aggregate data from the same hour
    split($1, a, " ")
    sub(/:.*/, "", a[4])
    ct = mktime(a[1] " " a[2] " " a[3] " " a[4] " 00 00")

    # If the 'current time' differs from the 'old time', then
    # compute the averages for the old hour and print the line
    if (ct != ot && ot) {
        print_avg()
        saved = 0
    }

    # Accumulate each data column; nc ends up as the number of columns
    nc = 0
    for (i = 2; i <= NF; i++) {
        avg[nc] += $i
        cnt[nc++] += 1
    }

    # Save the date and hour only when a new hour starts
    if (!saved) {
        saved = 1
        ot = ct
        cd = a[1] " " a[2] " " a[3]
        ch = a[4]
    }
}

END {
    # Anything left over? Print it
    if (saved) print_avg()
}

# Print the date, hour and per-column averages, then reset the sums.
# A numeric loop is used instead of "for (i in avg)" because awk does
# not guarantee any particular order for "for (i in ...)"
function print_avg(    i, avg_h) {
    avg_h = cd OFS ch
    for (i = 0; i < nc; i++) {
        avg_h = avg_h OFS (avg[i] / cnt[i])
        delete avg[i]
        delete cnt[i]
    }
    print avg_h
}

Run as: ./script.awk data
