¿Cómo calcular el valor promedio (por columna) de los datos que tienen el mismo ID en la primera columna?

¿Cómo calcular el valor promedio (por columna) de los datos que tienen el mismo ID en la primera columna?

Quiero calcular el valor promedio de mis datos cuando tienen el mismo ID en la primera columna. Quiero calcular los valores promedio para las columnas restantes que tienen el mismo valor en la primera columna. Sé cómo hacerlo para una columna, pero tengo dificultades para hacer el promedio de todas las columnas que tengo.

Aquí están mis datos de muestra Nombre de archivo: abraham

year    1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
1929    7.5 0   7.78    3.695   1.139   0.186   -0.074  -0.205  -0.323  -0.434  -0.539  -0.639  -0.735  -0.916  -1.085  -1.241  -1.387  -1.522  -1.818  -2.058  -2.247  -2.389  -2.488  -2.551  -2.582  -2.521  -2.404  -2.262  -2.118  -1.987  -1.872  -1.775  -1.691  -1.628  -1.578  -1.484  -1.395  -1.311  -1.229  -1.151  -1.075  -1.002  -0.859  -0.721  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.308   0.445   0.559
1929    6.11    0   6.112   3.536   1.227   0.252   -0.063  -0.201  -0.32   -0.43   -0.535  -0.634  -0.73   -0.91   -1.078  -1.234  -1.379  -1.512  -1.806  -2.046  -2.236  -2.378  -2.478  -2.54   -2.573  -2.515  -2.402  -2.262  -2.119  -1.988  -1.873  -1.776  -1.692  -1.629  -1.578  -1.484  -1.395  -1.311  -1.229  -1.151  -1.075  -1.002  -0.859  -0.721  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.308   0.445   0.559
1929    2.22    0   2.227   2.542   1.022   0.222   -0.058  -0.198  -0.317  -0.427  -0.531  -0.63   -0.725  -0.905  -1.072  -1.227  -1.371  -1.505  -1.798  -2.038  -2.227  -2.37   -2.471  -2.533  -2.567  -2.512  -2.4    -2.262  -2.12   -1.989  -1.874  -1.776  -1.693  -1.63   -1.579  -1.485  -1.395  -1.311  -1.229  -1.151  -1.075  -1.002  -0.859  -0.721  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.308   0.445   0.559
1929    2.78    0   2.779   1.993   0.819   0.18    -0.056  -0.195  -0.314  -0.424  -0.527  -0.626  -0.721  -0.9    -1.066  -1.221  -1.364  -1.498  -1.79   -2.03   -2.219  -2.362  -2.464  -2.526  -2.562  -2.509  -2.399  -2.262  -2.12   -1.99   -1.875  -1.777  -1.693  -1.63   -1.579  -1.485  -1.396  -1.311  -1.229  -1.151  -1.075  -1.002  -0.859  -0.721  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.308   0.445   0.559
1930    0.83    0   0.833   0.85    0.354   0.073   -0.052  -0.187  -0.305  -0.414  -0.517  -0.614  -0.708  -0.885  -1.049  -1.202  -1.344  -1.477  -1.768  -2.006  -2.196  -2.34   -2.442  -2.508  -2.549  -2.5    -2.395  -2.261  -2.122  -1.992  -1.877  -1.779  -1.695  -1.631  -1.58   -1.485  -1.396  -1.311  -1.23   -1.151  -1.076  -1.002  -0.859  -0.721  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.308   0.445   0.559
1930    -0.28   0   -0.278  0.461   0.242   0.049   -0.052  -0.185  -0.302  -0.411  -0.513  -0.61   -0.704  -0.88   -1.044  -1.196  -1.338  -1.47   -1.761  -1.999  -2.188  -2.332  -2.435  -2.502  -2.544  -2.498  -2.394  -2.261  -2.122  -1.992  -1.878  -1.779  -1.695  -1.631  -1.581  -1.486  -1.396  -1.312  -1.23   -1.152  -1.076  -1.002  -0.859  -0.721  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.308   0.445   0.559
1930    0.56    0.002   0.511   0.348   0.165   0.033   -0.052  -0.183  -0.299  -0.407  -0.509  -0.607  -0.7    -0.875  -1.038  -1.19   -1.332  -1.463  -1.753  -1.991  -2.18   -2.325  -2.429  -2.496  -2.54   -2.495  -2.393  -2.261  -2.123  -1.993  -1.878  -1.78   -1.696  -1.632  -1.581  -1.486  -1.396  -1.312  -1.23   -1.152  -1.076  -1.002  -0.859  -0.721  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.308   0.445   0.559
1930    3.89    0   3.884   0.96    0.243   0.041   -0.052  -0.181  -0.297  -0.404  -0.506  -0.603  -0.696  -0.87   -1.033  -1.184  -1.325  -1.457  -1.746  -1.983  -2.173  -2.318  -2.422  -2.49   -2.536  -2.492  -2.391  -2.26   -2.123  -1.994  -1.879  -1.78   -1.696  -1.632  -1.581  -1.486  -1.397  -1.312  -1.23   -1.152  -1.076  -1.002  -0.86   -0.721  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.308   0.445   0.558
1931    -7.22   0.057   -4.563  -0.299  0.035   0.007   -0.054  -0.175  -0.286  -0.391  -0.49   -0.585  -0.676  -0.847  -1.007  -1.156  -1.295  -1.425  -1.711  -1.947  -2.136  -2.282  -2.389  -2.46   -2.514  -2.478  -2.384  -2.259  -2.125  -1.996  -1.882  -1.783  -1.698  -1.633  -1.582  -1.487  -1.397  -1.312  -1.231  -1.152  -1.076  -1.003  -0.86   -0.722  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.307   0.444   0.558
1931    -6.11   0.057   -4.1    -0.663  0.004   -0.001  -0.055  -0.174  -0.284  -0.388  -0.487  -0.581  -0.672  -0.843  -1.002  -1.151  -1.289  -1.419  -1.704  -1.94   -2.129  -2.275  -2.382  -2.454  -2.51   -2.476  -2.383  -2.258  -2.125  -1.997  -1.883  -1.783  -1.698  -1.633  -1.583  -1.487  -1.397  -1.312  -1.231  -1.152  -1.076  -1.003  -0.86   -0.722  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.307   0.444   0.558
1931    -10.28  0.057   -6.362  -1.527  0   -0.001  -0.055  -0.173  -0.283  -0.386  -0.484  -0.578  -0.668  -0.839  -0.997  -1.145  -1.283  -1.413  -1.697  -1.933  -2.122  -2.268  -2.376  -2.448  -2.506  -2.473  -2.382  -2.258  -2.125  -1.998  -1.883  -1.784  -1.698  -1.633  -1.583  -1.487  -1.398  -1.312  -1.231  -1.152  -1.076  -1.003  -0.86   -0.722  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.307   0.444   0.558
1931    -10 0.057   -7.018  -2.259  -0.001  -0.001  -0.056  -0.172  -0.281  -0.384  -0.481  -0.575  -0.665  -0.834  -0.992  -1.14   -1.278  -1.406  -1.691  -1.926  -2.115  -2.261  -2.369  -2.443  -2.501  -2.47   -2.38   -2.258  -2.125  -1.998  -1.884  -1.784  -1.699  -1.634  -1.583  -1.487  -1.398  -1.313  -1.231  -1.152  -1.076  -1.003  -0.86   -0.722  -0.588  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.307   0.444   0.558

salida deseada

year    1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
1929    4.6525  0   4.7245  2.9415  1.05175 0.21    -0.06275    -0.19975    -0.3185 -0.42875    -0.533  -0.63225    -0.72775    -0.90775    -1.07525    -1.23075    -1.37525    -1.50925    -1.803  -2.043  -2.23225    -2.37475    -2.47525    -2.5375 -2.571  -2.51425    -2.40125    -2.262  -2.11925    -1.9885 -1.8735 -1.776  -1.69225    -1.62925    -1.5785 -1.4845 -1.39525    -1.311  -1.229  -1.151  -1.075  -1.002  -0.859  -0.721  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.308   0.445   0.559
1930    1.25    0.0005  1.2375  0.65475 0.251   0.049   -0.052  -0.184  -0.30075    -0.409  -0.51125    -0.6085 -0.702  -0.8775 -1.041  -1.193  -1.33475    -1.46675    -1.757  -1.99475    -2.18425    -2.32875    -2.432  -2.499  -2.54225    -2.49625    -2.39325    -2.26075    -2.1225 -1.99275    -1.878  -1.7795 -1.6955 -1.6315 -1.58075    -1.48575    -1.39625    -1.31175    -1.23   -1.15175    -1.076  -1.002  -0.85925    -0.721  -0.587  -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.308   0.445   0.55875
1931    -8.4025 0.057   -5.51075    -1.187  0.0095  0.001   -0.055  -0.1735 -0.2835 -0.38725    -0.4855 -0.57975    -0.67025    -0.84075    -0.9995 -1.148  -1.28625    -1.41575    -1.70075    -1.9365 -2.1255 -2.2715 -2.379  -2.45125    -2.50775    -2.47425    -2.38225    -2.25825    -2.125  -1.99725    -1.883  -1.7835 -1.69825    -1.63325    -1.58275    -1.487  -1.3975 -1.31225    -1.231  -1.152  -1.076  -1.003  -0.86   -0.722  -0.58725    -0.456  -0.328  -0.171  -0.017  0.068   0.151   0.307   0.444   0.558

Respuesta1

Aquí hay una solución Perl. No espera que se ordenen los años, pero sí necesita leer el archivo completo (bueno, al menos las sumas de cada columna) en la memoria. Siempre que su archivo no sea demasiado grande, debería hacer lo que desee:

#!/usr/bin/env perl

## Read the input file, line by line
while (<>){
    ## Print and skip the 1st line
    if($.==1){
    print; 
    next
    }; 
    ## Split the fileds into the array @F
    @F=split(/\s+/);

    ## $k (%k) is a hash of hashes. The first element of 
    ## $F ($F[0]) is the year, so $k{$year}{tot} will be 
    ## the nuber of lines for each year.
    $k{$F[0]}{tot}++;

    ## Go through each field add its value
    ## to $k{$year}{field number}
    for($i=1;$i<=$#F;$i++){
    $k{$F[0]}{$i}+=$F[$i];
    }
 }
## Now print the data for each year
foreach $y (sort keys(%k)){
    ## $y is the year
    print "$y "; 
    ## $k{$y}{$i} is the sum of the values for this column/year
    ## and $k{$y}{tot} is the number of lines for this year
    for($i=1;$i<=$#F;$i++){
    print " " . $k{$y}{$i}/$k{$y}{tot};

    }
    print "\n";
}

Y aquí está el mismo guión en una sola línea (más bien larga):

$ perl -ane '$.==1 && do {print; next}; $k{$F[0]}{tot}++;
           for($i=1;$i<=$#F;$i++){$k{$F[0]}{$i}+=$F[$i];}
           END{foreach $y (sort keys(%k)){
                    print "$y "; 
                    for($i=1;$i<=$#F;$i++){
                      print " " . $k{$y}{$i}/$k{$y}{tot};
                    } print "\n";} }' abraham 
year    1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
1929  4.6525 0 4.7245 2.9415 1.05175 0.21 -0.06275 -0.19975 -0.3185 -0.42875 -0.533 -0.63225 -0.72775 -0.90775 -1.07525 -1.23075 -1.37525 -1.50925 -1.803 -2.043 -2.23225 -2.37475 -2.47525 -2.5375 -2.571 -2.51425 -2.40125 -2.262 -2.11925 -1.9885 -1.8735 -1.776 -1.69225 -1.62925 -1.5785 -1.4845 -1.39525 -1.311 -1.229 -1.151 -1.075 -1.002 -0.859 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.559
1930  1.25 0.0005 1.2375 0.65475 0.251 0.049 -0.052 -0.184 -0.30075 -0.409 -0.51125 -0.6085 -0.702 -0.8775 -1.041 -1.193 -1.33475 -1.46675 -1.757 -1.99475 -2.18425 -2.32875 -2.432 -2.499 -2.54225 -2.49625 -2.39325 -2.26075 -2.1225 -1.99275 -1.878 -1.7795 -1.6955 -1.6315 -1.58075 -1.48575 -1.39625 -1.31175 -1.23 -1.15175 -1.076 -1.002 -0.85925 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.55875
1931  -8.4025 0.057 -5.51075 -1.187 0.0095 0.001 -0.055 -0.1735 -0.2835 -0.38725 -0.4855 -0.57975 -0.67025 -0.84075 -0.9995 -1.148 -1.28625 -1.41575 -1.70075 -1.9365 -2.1255 -2.2715 -2.379 -2.45125 -2.50775 -2.47425 -2.38225 -2.25825 -2.125 -1.99725 -1.883 -1.7835 -1.69825 -1.63325 -1.58275 -1.487 -1.3975 -1.31225 -1.231 -1.152 -1.076 -1.003 -0.86 -0.722 -0.58725 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.307 0.444 0.558

Respuesta2

Puede usar una matriz para almacenar una suma de los valores de cada columna y, cuando $1 cambie, obtenga el promedio a partir de ahí.

# print header
NR==1 {
    print
    next
}

# print average of each column per year
#  then, reset columns sums and number of lines
function print_sum() {
    printf prev
    # needs GNU awk, for length of array
    for (i=2; i < length(sum) + 2; i++) {
            printf FS sum[i]/nlines
            sum[i] = 0
    }
    printf ORS
    nlines = 0
}

# print average when $1 changes, but not the first time
# also, on end of script
NR>2 && prev!=$1 { print_sum() }
END              { print_sum() }

# for every line with the same $1, sum column values, increment number of lines
{
    prev=$1;
    nlines++
    for (i=2; i <= NF; i++) {
            sum[i]+=$i
    }
}

Respuesta3

Congnu datamash:

{ head -n 1; datamash -s -g 1 $(printf 'mean %s ' {2..54}); } <infile

¿Dónde 54está el número de columnas? Esto supone que sus campos están separados por tabulaciones. Si están separados por espacios, ejecute

{ head -n 1; datamash -t ' ' -s -g 1 $(printf 'mean %s ' {2..54}); } <infile

o si están separados por varios espacios en blanco (espacios/tabulaciones):

{ head -n 1; datamash -W -s -g 1 $(printf 'mean %s ' {2..54}); } <infile

Tenga en cuenta que la salida se ordenará por la primera columna.

información relacionada