
I'm using a tool called StarCluster (http://star.mit.edu/cluster) to boot up an SGE-configured cluster in the Amazon cloud. The problem is that it doesn't seem to be configured with any default consumable resources except SLOTS, and I don't seem to be able to request slots directly with qsub -l slots=X. Each time I boot a cluster I may ask for a different type of EC2 node, so the fact that the slots resource is preconfigured is very nice. I can request a certain number of slots using the preconfigured parallel environment, but the problem is that it was set up for MPI, so requesting slots through that parallel environment sometimes grants a job slots spread across multiple compute nodes.
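For concreteness, the two submission forms mentioned above would look roughly like this (a sketch; job.sh stands in for an actual job script):

# Requesting the slots complex directly; this does not seem to work
qsub -l slots=2 -cwd job.sh
# Requesting slots through the preconfigured MPI parallel environment;
# the 2 slots may end up spread across multiple compute nodes
qsub -pe orte 2 -cwd job.sh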
Is there a way to either 1) create a parallel environment that takes advantage of the existing preconfigured HOST=X slots settings that StarCluster sets up, so that slots are requested on a single node, or 2) use some kind of resource that SGE is automatically aware of? Running qhost makes me think that even though NCPU and MEMTOT are not defined anywhere I can see, SGE is somehow aware of those resources. Is there a setting that would let me make those resources requestable without explicitly defining how much of each is available?
Thanks for your time!
qhost output:
HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
-------------------------------------------------------------------------------
global - - - - - - -
master linux-x64 2 0.01 7.3G 167.4M 0.0 0.0
node001 linux-x64 2 0.01 7.3G 139.6M 0.0 0.0
qconf -mc output:
#name shortcut type relop requestable consumable default urgency
#----------------------------------------------------------------------------------------
arch a RESTRING == YES NO NONE 0
calendar c RESTRING == YES NO NONE 0
cpu cpu DOUBLE >= YES NO 0 0
display_win_gui dwg BOOL == YES NO 0 0
h_core h_core MEMORY <= YES NO 0 0
h_cpu h_cpu TIME <= YES NO 0:0:0 0
h_data h_data MEMORY <= YES NO 0 0
h_fsize h_fsize MEMORY <= YES NO 0 0
h_rss h_rss MEMORY <= YES NO 0 0
h_rt h_rt TIME <= YES NO 0:0:0 0
h_stack h_stack MEMORY <= YES NO 0 0
h_vmem h_vmem MEMORY <= YES NO 0 0
hostname h HOST == YES NO NONE 0
load_avg la DOUBLE >= NO NO 0 0
load_long ll DOUBLE >= NO NO 0 0
load_medium lm DOUBLE >= NO NO 0 0
load_short ls DOUBLE >= NO NO 0 0
m_core core INT <= YES NO 0 0
m_socket socket INT <= YES NO 0 0
m_topology topo RESTRING == YES NO NONE 0
m_topology_inuse utopo RESTRING == YES NO NONE 0
mem_free mf MEMORY <= YES NO 0 0
mem_total mt MEMORY <= YES NO 0 0
mem_used mu MEMORY >= YES NO 0 0
min_cpu_interval mci TIME <= NO NO 0:0:0 0
np_load_avg nla DOUBLE >= NO NO 0 0
np_load_long nll DOUBLE >= NO NO 0 0
np_load_medium nlm DOUBLE >= NO NO 0 0
np_load_short nls DOUBLE >= NO NO 0 0
num_proc p INT == YES NO 0 0
qname q RESTRING == YES NO NONE 0
rerun re BOOL == NO NO 0 0
s_core s_core MEMORY <= YES NO 0 0
s_cpu s_cpu TIME <= YES NO 0:0:0 0
s_data s_data MEMORY <= YES NO 0 0
s_fsize s_fsize MEMORY <= YES NO 0 0
s_rss s_rss MEMORY <= YES NO 0 0
s_rt s_rt TIME <= YES NO 0:0:0 0
s_stack s_stack MEMORY <= YES NO 0 0
s_vmem s_vmem MEMORY <= YES NO 0 0
seq_no seq INT == NO NO 0 0
slots s INT <= YES YES 1 1000
swap_free sf MEMORY <= YES NO 0 0
swap_rate sr MEMORY >= YES NO 0 0
swap_rsvd srsv MEMORY >= YES NO 0 0
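(Looking at this list, values such as mem_free and num_proc are marked requestable YES even though they are not consumables, so presumably per-job requests along these lines are possible without defining any capacities; job.sh is a placeholder:)

# Only match hosts currently reporting at least 4G of free memory
qsub -l mem_free=4G -cwd job.sh
# Only match hosts reporting exactly 2 processors (the relop for num_proc is ==)
qsub -l num_proc=2 -cwd job.sh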
qconf -me master output (one of the hosts, as an example):
hostname master
load_scaling NONE
complex_values NONE
user_lists NONE
xuser_lists NONE
projects NONE
xprojects NONE
usage_scaling NONE
report_variables NONE
qconf -msconf output:
algorithm default
schedule_interval 0:0:15
maxujobs 0
queue_sort_method load
job_load_adjustments np_load_avg=0.50
load_adjustment_decay_time 0:7:30
load_formula np_load_avg
schedd_job_info false
flush_submit_sec 0
flush_finish_sec 0
params none
reprioritize_interval 0:0:0
halftime 168
usage_weight_list cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor 5.000000
weight_user 0.250000
weight_project 0.250000
weight_department 0.250000
weight_job 0.250000
weight_tickets_functional 0
weight_tickets_share 0
share_override_tickets TRUE
share_functional_shares TRUE
max_functional_jobs_to_schedule 200
report_pjob_tickets TRUE
max_pending_tasks_per_job 50
halflife_decay_list none
policy_hierarchy OFS
weight_ticket 0.010000
weight_waiting_time 0.000000
weight_deadline 3600000.000000
weight_urgency 0.100000
weight_priority 1.000000
max_reservation 0
default_duration INFINITY
qconf -mq all.q output:
qname all.q
hostlist @allhosts
seq_no 0
load_thresholds np_load_avg=1.75
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 0
min_cpu_interval 00:05:00
processors UNDEFINED
qtype BATCH INTERACTIVE
ckpt_list NONE
pe_list make orte
rerun FALSE
slots 1,[master=2],[node001=2]
tmpdir /tmp
shell /bin/bash
prolog NONE
epilog NONE
shell_start_mode posix_compliant
starter_method NONE
suspend_method NONE
resume_method NONE
terminate_method NONE
notify 00:00:60
owner_list NONE
user_lists NONE
xuser_lists NONE
subordinate_list NONE
complex_values NONE
projects NONE
xprojects NONE
calendar NONE
initial_state default
s_rt INFINITY
h_rt INFINITY
s_cpu INFINITY
h_cpu INFINITY
s_fsize INFINITY
h_fsize INFINITY
s_data INFINITY
h_data INFINITY
s_stack INFINITY
h_stack INFINITY
s_core INFINITY
h_core INFINITY
s_rss INFINITY
Answer 1
The solution I found was to create a new parallel environment with the $pe_slots allocation rule (see man sge_pe). I set the number of slots available to that parallel environment to a huge maximum, because $pe_slots already caps a job's slot usage at a single node, and since StarCluster sets the per-host slot counts at cluster boot time, this seems to do the trick nicely. You also need to add the new parallel environment to the queue. So, to make this dead simple:
qconf -ap by_node
Here is what the file looked like after I edited it:
pe_name by_node
slots 9999999
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $pe_slots
control_slaves TRUE
job_is_first_task TRUE
urgency_slots min
accounting_summary FALSE
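You can print the new parallel environment back to confirm it saved correctly:

qconf -sp by_node    # show the by_node parallel environment configuration
qconf -spl           # list all defined parallel environments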
Also modify the queue (StarCluster calls it all.q) to add this new parallel environment to its list:
qconf -mq all.q
and change this line:
pe_list make orte
to this:
pe_list make orte by_node
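A quick way to verify the change took effect (qconf -sq prints the saved queue configuration):

qconf -sq all.q | grep pe_list    # should now show: make orte by_node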
I was worried that jobs spawned from a given job would be limited to a single node, but that does not seem to be the case. I have a cluster with two nodes, each with two slots.
I made a test file like this:
#!/bin/bash
# From inside this job, submit a second two-slot job, then keep this job busy
qsub -b y -pe by_node 2 -cwd sleep 100
sleep 100
and executed it like this:
qsub -V -pe by_node 2 test.sh
After a moment, qstat showed the two jobs running on different nodes:
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
25 0.55500 test root r 10/17/2012 21:42:57 all.q@master 2
26 0.55500 sleep root r 10/17/2012 21:43:12 all.q@node001 2
I also tested submitting 3 jobs at once, each requesting the same number of slots on a single node, and only two ran at a time, one per node. So this seems to be set up correctly!
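For reference, that last test amounted to something like this (a sketch; the sleep payload just stands in for real work):

#!/bin/bash
# Submit three two-slot by_node jobs at once; with only two slots per host,
# two jobs start immediately (one per node) and the third waits in qw state
for i in 1 2 3; do
    qsub -b y -pe by_node 2 -cwd sleep 100
done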