Quiero calcular el valor promedio de mis datos cuando tienen el mismo ID en la primera columna. Quiero calcular los valores promedio para las columnas restantes que tienen el mismo valor en la primera columna. Sé cómo hacerlo para una columna, pero tengo dificultades para hacer el promedio de todas las columnas que tengo.
Aquí están mis datos de muestra Nombre de archivo: abraham
year 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
1929 7.5 0 7.78 3.695 1.139 0.186 -0.074 -0.205 -0.323 -0.434 -0.539 -0.639 -0.735 -0.916 -1.085 -1.241 -1.387 -1.522 -1.818 -2.058 -2.247 -2.389 -2.488 -2.551 -2.582 -2.521 -2.404 -2.262 -2.118 -1.987 -1.872 -1.775 -1.691 -1.628 -1.578 -1.484 -1.395 -1.311 -1.229 -1.151 -1.075 -1.002 -0.859 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.559
1929 6.11 0 6.112 3.536 1.227 0.252 -0.063 -0.201 -0.32 -0.43 -0.535 -0.634 -0.73 -0.91 -1.078 -1.234 -1.379 -1.512 -1.806 -2.046 -2.236 -2.378 -2.478 -2.54 -2.573 -2.515 -2.402 -2.262 -2.119 -1.988 -1.873 -1.776 -1.692 -1.629 -1.578 -1.484 -1.395 -1.311 -1.229 -1.151 -1.075 -1.002 -0.859 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.559
1929 2.22 0 2.227 2.542 1.022 0.222 -0.058 -0.198 -0.317 -0.427 -0.531 -0.63 -0.725 -0.905 -1.072 -1.227 -1.371 -1.505 -1.798 -2.038 -2.227 -2.37 -2.471 -2.533 -2.567 -2.512 -2.4 -2.262 -2.12 -1.989 -1.874 -1.776 -1.693 -1.63 -1.579 -1.485 -1.395 -1.311 -1.229 -1.151 -1.075 -1.002 -0.859 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.559
1929 2.78 0 2.779 1.993 0.819 0.18 -0.056 -0.195 -0.314 -0.424 -0.527 -0.626 -0.721 -0.9 -1.066 -1.221 -1.364 -1.498 -1.79 -2.03 -2.219 -2.362 -2.464 -2.526 -2.562 -2.509 -2.399 -2.262 -2.12 -1.99 -1.875 -1.777 -1.693 -1.63 -1.579 -1.485 -1.396 -1.311 -1.229 -1.151 -1.075 -1.002 -0.859 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.559
1930 0.83 0 0.833 0.85 0.354 0.073 -0.052 -0.187 -0.305 -0.414 -0.517 -0.614 -0.708 -0.885 -1.049 -1.202 -1.344 -1.477 -1.768 -2.006 -2.196 -2.34 -2.442 -2.508 -2.549 -2.5 -2.395 -2.261 -2.122 -1.992 -1.877 -1.779 -1.695 -1.631 -1.58 -1.485 -1.396 -1.311 -1.23 -1.151 -1.076 -1.002 -0.859 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.559
1930 -0.28 0 -0.278 0.461 0.242 0.049 -0.052 -0.185 -0.302 -0.411 -0.513 -0.61 -0.704 -0.88 -1.044 -1.196 -1.338 -1.47 -1.761 -1.999 -2.188 -2.332 -2.435 -2.502 -2.544 -2.498 -2.394 -2.261 -2.122 -1.992 -1.878 -1.779 -1.695 -1.631 -1.581 -1.486 -1.396 -1.312 -1.23 -1.152 -1.076 -1.002 -0.859 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.559
1930 0.56 0.002 0.511 0.348 0.165 0.033 -0.052 -0.183 -0.299 -0.407 -0.509 -0.607 -0.7 -0.875 -1.038 -1.19 -1.332 -1.463 -1.753 -1.991 -2.18 -2.325 -2.429 -2.496 -2.54 -2.495 -2.393 -2.261 -2.123 -1.993 -1.878 -1.78 -1.696 -1.632 -1.581 -1.486 -1.396 -1.312 -1.23 -1.152 -1.076 -1.002 -0.859 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.559
1930 3.89 0 3.884 0.96 0.243 0.041 -0.052 -0.181 -0.297 -0.404 -0.506 -0.603 -0.696 -0.87 -1.033 -1.184 -1.325 -1.457 -1.746 -1.983 -2.173 -2.318 -2.422 -2.49 -2.536 -2.492 -2.391 -2.26 -2.123 -1.994 -1.879 -1.78 -1.696 -1.632 -1.581 -1.486 -1.397 -1.312 -1.23 -1.152 -1.076 -1.002 -0.86 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.558
1931 -7.22 0.057 -4.563 -0.299 0.035 0.007 -0.054 -0.175 -0.286 -0.391 -0.49 -0.585 -0.676 -0.847 -1.007 -1.156 -1.295 -1.425 -1.711 -1.947 -2.136 -2.282 -2.389 -2.46 -2.514 -2.478 -2.384 -2.259 -2.125 -1.996 -1.882 -1.783 -1.698 -1.633 -1.582 -1.487 -1.397 -1.312 -1.231 -1.152 -1.076 -1.003 -0.86 -0.722 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.307 0.444 0.558
1931 -6.11 0.057 -4.1 -0.663 0.004 -0.001 -0.055 -0.174 -0.284 -0.388 -0.487 -0.581 -0.672 -0.843 -1.002 -1.151 -1.289 -1.419 -1.704 -1.94 -2.129 -2.275 -2.382 -2.454 -2.51 -2.476 -2.383 -2.258 -2.125 -1.997 -1.883 -1.783 -1.698 -1.633 -1.583 -1.487 -1.397 -1.312 -1.231 -1.152 -1.076 -1.003 -0.86 -0.722 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.307 0.444 0.558
1931 -10.28 0.057 -6.362 -1.527 0 -0.001 -0.055 -0.173 -0.283 -0.386 -0.484 -0.578 -0.668 -0.839 -0.997 -1.145 -1.283 -1.413 -1.697 -1.933 -2.122 -2.268 -2.376 -2.448 -2.506 -2.473 -2.382 -2.258 -2.125 -1.998 -1.883 -1.784 -1.698 -1.633 -1.583 -1.487 -1.398 -1.312 -1.231 -1.152 -1.076 -1.003 -0.86 -0.722 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.307 0.444 0.558
1931 -10 0.057 -7.018 -2.259 -0.001 -0.001 -0.056 -0.172 -0.281 -0.384 -0.481 -0.575 -0.665 -0.834 -0.992 -1.14 -1.278 -1.406 -1.691 -1.926 -2.115 -2.261 -2.369 -2.443 -2.501 -2.47 -2.38 -2.258 -2.125 -1.998 -1.884 -1.784 -1.699 -1.634 -1.583 -1.487 -1.398 -1.313 -1.231 -1.152 -1.076 -1.003 -0.86 -0.722 -0.588 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.307 0.444 0.558
salida deseada
year 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
1929 4.6525 0 4.7245 2.9415 1.05175 0.21 -0.06275 -0.19975 -0.3185 -0.42875 -0.533 -0.63225 -0.72775 -0.90775 -1.07525 -1.23075 -1.37525 -1.50925 -1.803 -2.043 -2.23225 -2.37475 -2.47525 -2.5375 -2.571 -2.51425 -2.40125 -2.262 -2.11925 -1.9885 -1.8735 -1.776 -1.69225 -1.62925 -1.5785 -1.4845 -1.39525 -1.311 -1.229 -1.151 -1.075 -1.002 -0.859 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.559
1930 1.25 0.0005 1.2375 0.65475 0.251 0.049 -0.052 -0.184 -0.30075 -0.409 -0.51125 -0.6085 -0.702 -0.8775 -1.041 -1.193 -1.33475 -1.46675 -1.757 -1.99475 -2.18425 -2.32875 -2.432 -2.499 -2.54225 -2.49625 -2.39325 -2.26075 -2.1225 -1.99275 -1.878 -1.7795 -1.6955 -1.6315 -1.58075 -1.48575 -1.39625 -1.31175 -1.23 -1.15175 -1.076 -1.002 -0.85925 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.55875
1931 -8.4025 0.057 -5.51075 -1.187 0.0095 0.001 -0.055 -0.1735 -0.2835 -0.38725 -0.4855 -0.57975 -0.67025 -0.84075 -0.9995 -1.148 -1.28625 -1.41575 -1.70075 -1.9365 -2.1255 -2.2715 -2.379 -2.45125 -2.50775 -2.47425 -2.38225 -2.25825 -2.125 -1.99725 -1.883 -1.7835 -1.69825 -1.63325 -1.58275 -1.487 -1.3975 -1.31225 -1.231 -1.152 -1.076 -1.003 -0.86 -0.722 -0.58725 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.307 0.444 0.558
Respuesta1
Aquí hay una solución Perl. No espera que se ordenen los años, pero sí necesita leer el archivo completo (bueno, al menos las sumas de cada columna) en la memoria. Siempre que su archivo no sea demasiado grande, debería hacer lo que desee:
#!/usr/bin/env perl
## Read the input file, line by line
while (<>){
## Print and skip the 1st line
if($.==1){
print;
next
};
## Split the fileds into the array @F
@F=split(/\s+/);
## $k (%k) is a hash of hashes. The first element of
## $F ($F[0]) is the year, so $k{$year}{tot} will be
## the nuber of lines for each year.
$k{$F[0]}{tot}++;
## Go through each field add its value
## to $k{$year}{field number}
for($i=1;$i<=$#F;$i++){
$k{$F[0]}{$i}+=$F[$i];
}
}
## Now print the data for each year
foreach $y (sort keys(%k)){
## $y is the year
print "$y ";
## $k{$y}{$i} is the sum of the values for this column/year
## and $k{$y}{tot} is the number of lines for this year
for($i=1;$i<=$#F;$i++){
print " " . $k{$y}{$i}/$k{$y}{tot};
}
print "\n";
}
Y aquí está el mismo guión en una sola línea (más bien larga):
$ perl -ane '$.==1 && do {print; next}; $k{$F[0]}{tot}++;
for($i=1;$i<=$#F;$i++){$k{$F[0]}{$i}+=$F[$i];}
END{foreach $y (sort keys(%k)){
print "$y ";
for($i=1;$i<=$#F;$i++){
print " " . $k{$y}{$i}/$k{$y}{tot};
} print "\n";} }' abraham
year 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
1929 4.6525 0 4.7245 2.9415 1.05175 0.21 -0.06275 -0.19975 -0.3185 -0.42875 -0.533 -0.63225 -0.72775 -0.90775 -1.07525 -1.23075 -1.37525 -1.50925 -1.803 -2.043 -2.23225 -2.37475 -2.47525 -2.5375 -2.571 -2.51425 -2.40125 -2.262 -2.11925 -1.9885 -1.8735 -1.776 -1.69225 -1.62925 -1.5785 -1.4845 -1.39525 -1.311 -1.229 -1.151 -1.075 -1.002 -0.859 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.559
1930 1.25 0.0005 1.2375 0.65475 0.251 0.049 -0.052 -0.184 -0.30075 -0.409 -0.51125 -0.6085 -0.702 -0.8775 -1.041 -1.193 -1.33475 -1.46675 -1.757 -1.99475 -2.18425 -2.32875 -2.432 -2.499 -2.54225 -2.49625 -2.39325 -2.26075 -2.1225 -1.99275 -1.878 -1.7795 -1.6955 -1.6315 -1.58075 -1.48575 -1.39625 -1.31175 -1.23 -1.15175 -1.076 -1.002 -0.85925 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.55875
1931 -8.4025 0.057 -5.51075 -1.187 0.0095 0.001 -0.055 -0.1735 -0.2835 -0.38725 -0.4855 -0.57975 -0.67025 -0.84075 -0.9995 -1.148 -1.28625 -1.41575 -1.70075 -1.9365 -2.1255 -2.2715 -2.379 -2.45125 -2.50775 -2.47425 -2.38225 -2.25825 -2.125 -1.99725 -1.883 -1.7835 -1.69825 -1.63325 -1.58275 -1.487 -1.3975 -1.31225 -1.231 -1.152 -1.076 -1.003 -0.86 -0.722 -0.58725 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.307 0.444 0.558
Respuesta2
Puede usar una matriz para almacenar una suma de los valores de cada columna y, cuando $1 cambie, obtenga el promedio a partir de ahí.
# print header
NR==1 {
print
next
}
# print average of each column per year
# then, reset columns sums and number of lines
function print_sum() {
printf prev
# needs GNU awk, for length of array
for (i=2; i < length(sum) + 2; i++) {
printf FS sum[i]/nlines
sum[i] = 0
}
printf ORS
nlines = 0
}
# print average when $1 changes, but not the first time
# also, on end of script
NR>2 && prev!=$1 { print_sum() }
END { print_sum() }
# for every line with the same $1, sum column values, increment number of lines
{
prev=$1;
nlines++
for (i=2; i <= NF; i++) {
sum[i]+=$i
}
}
Respuesta3
Congnu
datamash
:
{ head -n 1; datamash -s -g 1 $(printf 'mean %s ' {2..54}); } <infile
¿Dónde 54
está el número de columnas? Esto supone que sus campos están separados por tabulaciones. Si están separados por espacios, ejecute
{ head -n 1; datamash -t ' ' -s -g 1 $(printf 'mean %s ' {2..54}); } <infile
o si están separados por varios espacios en blanco (espacios/tabulaciones):
{ head -n 1; datamash -W -s -g 1 $(printf 'mean %s ' {2..54}); } <infile
Tenga en cuenta que la salida se ordenará por la primera columna.