forked from jcftang/slurm
-
Notifications
You must be signed in to change notification settings - Fork 0
/
RELEASE_NOTES
166 lines (127 loc) · 5.97 KB
/
RELEASE_NOTES
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
RELEASE NOTES FOR SLURM VERSION 2.3
10 January 2011
IMPORTANT NOTE:
If using the slurmdbd (SLURM DataBase Daemon) you must update this first.
The 2.3 slurmdbd will work with SLURM daemons of version 2.1.3 and above.
You will not need to update all clusters at the same time, but it is very
important to update slurmdbd first and having it running before updating
any other clusters making use of it. No real harm will come from updating
your systems before the slurmdbd, but they will not talk to each other
until you do. Also at least the first time running the slurmdbd you need to
make sure your my.cnf file has innodb_buffer_pool_size equal to at least 64M.
You can accomplish this by adding the line
innodb_buffer_pool_size=64M
under the [mysqld] reference in the my.cnf file and restarting the mysqld.
This is needed when converting large tables over to the new database schema.
SLURM can be upgraded from version 2.2 to version 2.3 without loss of jobs or
other state information.
HIGHLIGHTS
==========
* Support is provided for Cray XT and XE computers
* Support is provided for BlueGene/Q computers.
* For architectures where the slurmd daemon executes on front end nodes (Cray
and BlueGene systems) more than one slurmd daemon may be executed using more
than one front end node for improved fault-tolerance and performance.
NOTE: The slurmctld daemon will report the lack of a front_end_state file
as an error when first started in this configuration.
* The ability to expand running jobs was added
CONFIGURATION FILE CHANGES (see "man slurm.conf" for details)
=============================================================
* In order to support more than one front end node, new parameters have been
added to support a new data structure: FrontendName, FrontendAddr, Port,
State and Reason.
* DebugFlags of Frontend added
* Added new configuration parameter MaxJobId. Use with FirstJobId to limit
range of job ID values.
* Added new configuration parameter MaxStepCount to limit the effect of
bad batch scripts. The default value is 40,000 steps per job.
* Change node configuration parameter from "Procs" to "CPUs". Both parameters
will be supported for now.
COMMAND CHANGES (see man pages for details)
===========================================
* scontrol has the ability to get and set front end node state.
* scontrol has the ability to set slurmctld's DebugFlags.
* Add new scontrol option of "show aliases" to report every NodeName that is
associated with a given NodeHostName when running multiple slurmd daemons
per compute node (typically used for testing purposes).
* A reservation flag of "License_Only" has been added for use by the sview and
scontrol commands. If set, then jobs using the reservation may use the
licenses associated with it plus any compute nodes. Otherwise the job is
limited to the compute nodes associated with the reservation.
* The dependency option of "expand" has been added. This option identifies a
job whose resource allocation is intended to be used to expand the allocation
of another job. See https://computing.llnl.gov/linux/slurm//faq.html#job_size
for a description of it's use.
BLUEGENE SPECIFIC CHANGES
=========================
BGQ support added.
OTHER CHANGES
=============
API CHANGES
===========
Changed members of the following structs
========================================
block_info_t
Added job_list
Added used_mp_inx
Added used_mp_str
node_cnt -> cnode_cnt
ionodes -> ionode_str
nodes -> mp_str
bp_inx -> mp_inx
job_step_info_t
Added select_jobinfo
Added the following struct definitions
======================================
front_end_info_msg_t entirely new structure
front_end_info_t entirely new structure
job_info_t
batch_host name of the host running the batch script
preempt_time time that a job become preempted
job_step_create_response_msg_t
select_jobinfo data needed from the select plugin for a step
partition_info_t
grace_time preempted job's grace time in seconds
slurm_ctl_conf
max_job_id maximum supported job id before starting over
with first_job_id
max_step_count maximum number of job steps permitted per job
slurm_step_layout
front_end name of front end host running the step
slurmdb_qos_rec_t
grace_time preempted job's grace time in seconds
update_front_end_msg_t entirely new structure
Changed the following enums and #defines
========================================
job_state_reason
FAIL_BANK_ACCOUNT -> FAIL_ACCOUNT
FAIL_QOS /* invalid QOS */
WAIT_QOS_THRES /* required QOS threshold has been breached */
select_jobdata_type
SELECT_JOBDATA_PTR /* data-> select_jobinfo_t *jobinfo */
SELECT_JOBDATA_BLOCK_PTR /* data-> bg_record_t *bg_record */
SELECT_JOBDATA_DIM_CNT /* data-> uint16_t dim_cnt */
SELECT_JOBDATA_BLOCK_NODE_CNT /* data-> uint32_t block_cnode_cnt */
SELECT_JOBDATA_START_LOC /* data-> uint16_t
* start_loc[SYSTEM_DIMENSIONS] */
select_nodedata_type
SELECT_NODEDATA_PTR /* data-> select_nodeinfo_t *nodeinfo */
select_print_mode
SELECT_PRINT_START_LOC /* Print just the start location */
select_type_plugin_info is no longer and it's contents are now mostly #defines
DEBUG_FLAG_FRONT_END added DebugFlags of Frontend
JOB_PREEMPTED added new job termination state to indicated
job termination was due to preemption
RESERVE_FLAG_LIC_ONLY reserve licenses only, use any nodes
TRIGGER_RES_TYPE_FRONT_END added trigger for frontend state changes
Added the following API's
=========================
slurm_free_front_end_info_msg free front end state information
slurm_init_update_front_end_msg initialize data structure for front end update
slurm_load_front_end() load front end state information
slurm_print_front_end_info_msg print all front end state information
slurm_print_front_end_table print state information for one front end node
slurm_sprint_front_end_table output state information for one front end node
slurm_update_front_end update state of front end node
Changed the following API's
===========================