Which of these two ways of getting perf stats for a multi-threaded process is correct?

2024-6-11

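I am studying the performance of a multi-threaded database server. There is a particular workload that takes about 61 seconds to run on a particular machine. When I ran perf against the workload, the pid of the database process was 79894.

Besides the software threads in the database server, there are a number of Linux-related threads that are normally asleep on an idle system but become active while my workload is running. So I want to use perf's -a option together with the -p option.

I ran perf in two ways, and each way gives somewhat different results.

In the first method, I run the following perf command in one window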

perf stat -p 79894 -a

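and immediately run the database workload in another window. When the database workload finishes, I press Ctrl-C to exit perf and get the following results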

    Performance counter stats for process id '79894':

              1,842,359.55 msec cpu-clock                 #   30.061 CPUs utilized          
                 3,798,673      context-switches          #    0.002 M/sec                  
                   153,995      cpu-migrations            #    0.084 K/sec                  
                16,038,992      page-faults               #    0.009 M/sec                  
         4,939,131,149,436      cycles                    #    2.681 GHz                    
         3,924,220,386,428      stalled-cycles-frontend   #   79.45% frontend cycles idle   
         3,418,137,943,654      instructions              #    0.69  insn per cycle         
                                                          #    1.15  stalled cycles per insn
           402,389,588,237      branches                  #  218.410 M/sec                 
             5,137,510,170      branch-misses             #    1.28% of all branches  


     61.28834199 seconds time elapsed

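The second method is to run

perf stat -a sleep 61

and immediately run the database workload in another window. After 61 seconds both perf and the workload finish, and perf produces the following results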

 Performance counter stats for 'system wide':

      4,880,317.67 msec cpu-clock                 #   79.964 CPUs utilized          
         8,274,996      context-switches          #    0.002 M/sec                  
           202,832      cpu-migrations            #    0.042 K/sec                  
        14,605,246      page-faults               #    0.003 M/sec                  
 5,022,298,186,711      cycles                    #    1.029 GHz                    
 7,599,517,323,727      stalled-cycles-frontend   #  151.32% frontend cycles idle   
 3,421,512,233,294      instructions              #    0.68  insn per cycle         
                                                  #    2.22  stalled cycles per insn
   402,726,487,019      branches                  #   82.521 M/sec                  
     5,124,543,680      branch-misses             #    1.27% of all branches        

      61.031494851 seconds time elapsed

Because I used -a in both versions, I expected to get roughly the same results.

But with sleep,

cpu-clock is about 2.6 times what I get with the -p version,
context-switches are roughly double what I get with the -p version,
and most of the other values are more or less the same
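
To put numbers on that comparison, here is a minimal Python sketch that divides each counter from the sleep run by the same counter from the -p run. The values are copied by hand from the two outputs above; the names `P_RUN`, `SLEEP_RUN` and `ratios` are just mine for illustration.

```python
# Counter values copied by hand from the two perf stat outputs above.
# P_RUN, SLEEP_RUN and ratios() are illustrative names, not part of perf.
P_RUN = {  # perf stat -p <pid> -a
    "cpu-clock (msec)": 1_842_359.55,
    "context-switches": 3_798_673,
    "cpu-migrations": 153_995,
    "page-faults": 16_038_992,
    "cycles": 4_939_131_149_436,
    "stalled-cycles-frontend": 3_924_220_386_428,
    "instructions": 3_418_137_943_654,
    "branches": 402_389_588_237,
    "branch-misses": 5_137_510_170,
}

SLEEP_RUN = {  # perf stat -a sleep 61
    "cpu-clock (msec)": 4_880_317.67,
    "context-switches": 8_274_996,
    "cpu-migrations": 202_832,
    "page-faults": 14_605_246,
    "cycles": 5_022_298_186_711,
    "stalled-cycles-frontend": 7_599_517_323_727,
    "instructions": 3_421_512_233_294,
    "branches": 402_726_487_019,
    "branch-misses": 5_124_543_680,
}


def ratios(baseline, other):
    """Return other[name] / baseline[name] for every counter in baseline."""
    return {name: other[name] / baseline[name] for name in baseline}


if __name__ == "__main__":
    # Print how many times larger each counter is in the sleep run.
    for name, r in ratios(P_RUN, SLEEP_RUN).items():
        print(f"{name:26s} {r:5.2f}x")
```

Run this way, cpu-clock comes out at roughly 2.6x and context-switches at roughly 2.2x, while cycles, instructions, branches and branch-misses stay within a couple of percent of each other.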

So, 2 questions:

    (1) which set of results do I believe?
and 
    (2) how can there be more stalled-cycles-frontend than cycles in the sleep version?
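
For question (2), it may help to see where the percentages come from. The sketch below assumes the annotations in the right-hand column are plain ratios of the printed counters. That assumption reproduces the printed figures, but it is my reading of the output, not something taken from the perf source. The function name `derived` and its parameter names are made up for this example.

```python
def derived(cpu_clock_msec, elapsed_sec, cycles, stalled_frontend, instructions):
    """Recompute the right-hand-column annotations of perf stat.

    Assumption: each annotation is a simple ratio of two printed values.
    """
    cpu_sec = cpu_clock_msec / 1000.0
    return {
        "cpus_utilized": cpu_sec / elapsed_sec,
        "ghz": cycles / cpu_sec / 1e9,
        "frontend_idle_pct": 100.0 * stalled_frontend / cycles,
        "insn_per_cycle": instructions / cycles,
        "stalled_per_insn": stalled_frontend / instructions,
    }


# Values copied by hand from the 'perf stat -a sleep 61' output above.
sleep_run = derived(
    cpu_clock_msec=4_880_317.67,
    elapsed_sec=61.031494851,
    cycles=5_022_298_186_711,
    stalled_frontend=7_599_517_323_727,
    instructions=3_421_512_233_294,
)

if __name__ == "__main__":
    for name, value in sleep_run.items():
        print(f"{name:18s} {value:10.3f}")
```

Under that assumption, the 151.32% figure is just stalled-cycles-frontend divided by cycles, so it goes above 100% whenever the stalled-cycles-frontend count is larger than the cycles count, which is exactly what the sleep run reports.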
