Thursday, 4 June 2015

PBS Pro Tutorial

Posted by Krishna Arutwar
What is PBS Pro?
Portable Batch System (PBS) is software used in cluster computing to schedule jobs on multiple nodes. PBS started as a contract project by NASA.
            PBS is available in three different versions, as listed below:
1) Torque: Terascale Open-source Resource and QUEue Manager (Torque) is developed from OpenPBS. It is developed and maintained by Adaptive Computing Enterprises. It is used as a distributed resource manager and can perform well when integrated with the Maui cluster scheduler to improve performance.
2) PBS Professional (PBS Pro): the commercial version of PBS, offered by Altair Engineering.
3) OpenPBS: the original open-source version, released in 1998 and developed for NASA. It is no longer actively developed.
In this article we are going to concentrate on PBS Pro; it is similar to Torque to some extent.
Fig. 1.1 PBS complex with eight execution hosts
PBS contains three basic units: the server, MoM (execution host) and scheduler.
1) Server: It is the heart of PBS, with an executable named “pbs_server”. It uses an IP network to communicate with the MoMs. The PBS server creates batch jobs and modifies jobs as requested, and it keeps track of all resources available and assigned in the PBS complex, as reported by the different MoMs. It also monitors the PBS license for jobs; if your license expires it will throw an error.
2) Scheduler: The PBS scheduler uses various algorithms to decide when a job should be executed, and on which node or vnode, using the resource details provided by the server. Its executable is “pbs_sched”.
3) MoM: MoM is the mother of all execution jobs, with the executable “pbs_mom”. When a MoM gets a job from the server it actually executes that job on its host. Each node must have a MoM running in order to participate in execution.

Installation and Setting up of environment (cluster with multiple nodes)
Extract the compressed PBS Pro software and go to the path of the extracted folder; it contains an “INSTALL” file. Make that file executable, for example with the command “chmod +x ./INSTALL”. As shown in the image below, run this executable. It will ask for the “execution directory”, where you want to store the executables (such as qsub, pbsnodes, qdel, etc.) used for the different PBS operations, and the “home directory”, which contains the different configuration files. Keep both as default for simplicity.
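For example, assuming the software came as a gzipped tarball (the file name below is only a placeholder) and that you are running as root, the steps look roughly like this:

tar -xzf PBSPro_<version>.tar.gz    # extract the compressed software
cd PBSPro_<version>                 # go to the extracted folder containing INSTALL
chmod +x ./INSTALL                  # make the installer executable
./INSTALL                           # run it and answer the prompts for both directories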
            There are three kinds of installation available, as shown in the figure:
Fig. 1.2 PBS installation
1) Server node: PBS server, scheduler, MoM and the commands are installed on this node. The PBS server keeps track of all execution MoMs present in the cluster and schedules jobs on those execution nodes. Since MoM and the commands are also installed on the server node, it can be used to submit and execute jobs as well.
2) Execution node: This type installs MoM and the commands. These nodes are added as available execution nodes in the cluster. They are also allowed to submit jobs to the server, with specific permission granted by the server, as we will see below. They are not involved in scheduling. This kind of installation asks for the PBS server hostname, which is then used to submit jobs, get the status of jobs, etc.
3) Client node: These nodes are only allowed to submit PBS jobs to the server, with specific permission from the server, and to see the status of jobs. They are not involved in execution or scheduling.

Creating vnodes in PBS Pro:
We can create multiple vnodes on a single node, each containing a part of the node's resources. Jobs can then be executed on these vnodes using the resources allocated to them. Vnodes are created with the qmgr command, which is the command-line interface to the PBS server. The command given below creates vnodes using qmgr.

Qmgr: create node Vnode1,Vnode2 resources_available.ncpus=8, resources_available.mem=10gb, resources_available.ngpus=1, sharing=default_excl

The command above creates two vnodes named Vnode1 and Vnode2, each with 8 CPU cores, 10 GB of memory and 1 GPU, and with the sharing mode default_excl, which means a vnode executes only one job at a time exclusively, regardless of how many resources are still free. The sharing mode can instead be default_shared, which means any number of jobs can run on that vnode until all its resources are busy. All the attributes that can be used during vnode creation are listed in the PBS Pro reference guide.
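Once created, the vnodes can be inspected and adjusted from the same qmgr interface. For example (the vnode name and the new CPU count below are only illustrative):
To list the attributes of a vnode:
Qmgr: list node Vnode1
To change an attribute later:
Qmgr: set node Vnode1 resources_available.ncpus=4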

You can also create a file in the "/var/spool/PBS/mom_priv/config.d/" folder, with any name you want (I prefer hostname-vnode); a sample is given below, where hostname stands for the actual hostname of the execution node. PBS reads every file in this folder, even temporary files (ending with ~), and replaces the configuration for the same vnode, so delete unnecessary files to get a proper vnode configuration.
e.g.

$configversion 2
hostname: resources_available.ncpus=0
hostname: resources_available.mem=0
hostname: resources_available.ngpus=0
hostname[0]: resources_available.ncpus=8
hostname[0]: resources_available.mem=16gb
hostname[0]: resources_available.ngpus=1
hostname[0]: sharing=default_excl
hostname[1]: resources_available.ncpus=8
hostname[1]: resources_available.mem=16gb
hostname[1]: resources_available.ngpus=1
hostname[1]: sharing=default_excl
hostname[2]: resources_available.ncpus=8
hostname[2]: resources_available.mem=16gb
hostname[2]: resources_available.ngpus=1
hostname[2]: sharing=default_excl
hostname[3]: resources_available.ncpus=8
hostname[3]: resources_available.mem=16gb
hostname[3]: resources_available.ngpus=1
hostname[3]: sharing=default_excl
In this example we set the resources_available values of the default (host-level) node to 0 because, by default, PBS detects all of the host's resources and allocates them to that default node, with its sharing attribute set to default_shared. That causes a problem: all jobs would by default be scheduled onto that default vnode, since its sharing type is default_shared. If you want jobs to be scheduled on your customized vnodes, set resources_available to 0 on the default vnode. Unlike vnodes created manually on the command line, the vnodes defined in this file are re-created every time PBS is restarted.
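Note that MoM reads these configuration files when it starts, so after adding or editing a file in config.d you should restart PBS (or at least the pbs_mom daemon) on that execution host, for example with the same command used later in this tutorial:

service pbs restart

You can then confirm the new vnodes with "pbsnodes -av", described in the next section.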

Getting status in PBS:
Get status of jobs:
qstat gives details about jobs, their states, etc.
Useful options:
To print details of all jobs which are running or in the hold state: qstat -a
To print details of subjobs in a job array which are running or in the hold state: qstat -ta
To print all finished jobs: qstat -x
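To print full details of one particular job, pass its JobID to qstat -f (the JobID below is only an example): qstat -f 1492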

Get status of PBS nodes and vnodes:
The "pbsnodes -a" command lists all nodes present in the PBS complex, with their resources available, resources assigned, status, etc.
To get details of all nodes and vnodes you created, use the "pbsnodes -av" command.
You can also specify node or vnode name to get detail information of that specific node or vnode.
e.g.
pbsnodes wolverine (here wolverine is the hostname of a node in the PBS complex, mapped to its IP address in the /etc/hosts file)
Job submission (qsub):
Jobs are submitted from the MoM or client nodes to the PBS server. The server maintains queues of jobs; by default all jobs are submitted to the default queue, named “workq”. You may create multiple queues using the “qmgr” command, which is the administrator interface mainly used to create, delete and modify queues and vnodes. The PBS server decides which job should be scheduled on which node or vnode, based on the scheduling policy and the privileges set by the user. To schedule jobs, the server continuously pings all MoMs in the PBS complex to get details of the resources available and assigned. PBS assigns a unique identifier to each and every job, called the JobID.
For job submission PBS provides the “qsub” command. Its syntax is shown below:
qsub script
Here script may be a shell (sh, csh, tcsh, ksh, bash) script. PBS by default uses /bin/sh. You may refer to the simple script given below:
#!/bin/sh
echo "This is PBS job"
sleep 100
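The qsub options described below can also be embedded in the script itself as "#PBS" directives, so they do not have to be typed on the command line every time. A minimal sketch, assuming the script is saved as demo.sh and using example values for the job name and resource request:

#!/bin/sh
#PBS -N demoJob
#PBS -l select=1:ncpus=2:mem=1gb
# $PBS_O_WORKDIR is set by PBS to the directory from which the job was submitted
cd $PBS_O_WORKDIR
echo "This is a PBS job with embedded directives"
sleep 100

It can then be submitted with just "qsub ./demo.sh".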

When PBS completes the execution of a job, it stores any errors in a file named JobName.e{JobID}, e.g. Job1.e1492,
and the output in a file named JobName.o{JobID}, e.g. Job1.o1492.
By default these files are stored in the current working directory from which the job was submitted (shown by the pwd command). You can change this location by giving a path with the -o option.
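For example, to send both the output file and the error file to a chosen directory (the paths below are only illustrative), combine -o with the corresponding -e option:

qsub -o /home/user1/out.log -e /home/user1/err.log ./test.sh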

You may specify the job name with the -N option while submitting the job:
qsub -N firstJob ./test.sh

If you don't specify a job name, the script name is used in place of JobName.
e.g. qsub ./test.sh will store the results in files test.sh.e1493 and test.sh.o1493 in the current working directory.
 OR
qsub -N firstJob -o /home/user1/ ./test.sh will name the files firstJob.e1493 and firstJob.o1493 and place the output file in the /home/user1/ directory.
If a submitted job terminates abnormally (errors in the job itself are not abnormal; those errors are stored in the JobName.e{JobID} file), its error and output files are stored in the "/var/spool/PBS/undelivered/" folder.

Useful Options:
Select resources:
qsub -l select="chunks":ncpus=3:ngpus=1:mem=2gb script 

e.g.
qsub -l select=2:ncpus=3:ngpus=1:mem=2gb /home/titan/PBS/scripts/in.sh

This job selects 2 chunks, each with 3 CPUs, 1 GPU and 2 GB of memory, which means it will use 6 CPUs, 2 GPUs and 4 GB of RAM in total.

qsub -l nodes=megamind:ncpus=3 /home/titan/PBS/input/in.sh

This job will run on the single node specified by its hostname.
To select multiple nodes you may use the command given below:
qsub -l nodes=megamind+titan:ncpus=3 /home/titan/PBS/input/in.sh
Submit multiple jobs with the same script (Job Array):
qsub -J 1-20 script

If you specify resources for a job array, each subjob will require the specified resources. Each subjob is identified by the array ID plus its index, e.g. for the range 1-20 above the subjobs are JobArrayID[1], JobArrayID[2], ..., JobArrayID[20].
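Inside each subjob, PBS Pro sets the PBS_ARRAY_INDEX environment variable to that subjob's index, which is the usual way to make each subjob work on different data. A small sketch (the input file naming and the process_data command are only illustrative):

#!/bin/sh
# each subjob picks its own input file based on its index in the array
echo "Running subjob $PBS_ARRAY_INDEX"
./process_data input_${PBS_ARRAY_INDEX}.dat

Submitting this script with qsub -J 1-20 runs it once for every index from 1 to 20.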

Submit dependent jobs:
In some cases you may need a job to run only after the successful or unsuccessful completion of some other specified jobs; for this, PBS provides options such as the one below.

qsub -W depend=afterok:316.megamind /home/titan/PBS/input/in.sh


The specified job will start only after the successful completion of the job with JobID "316.megamind". Like afterok, PBS has other dependency options such as beforeok, beforenotok and afternotok. You can find all these details in the man page of qsub.
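Since qsub prints the JobID of the newly submitted job on standard output, the dependency can also be wired up from a small shell snippet (the script name first.sh is only illustrative):

FIRST=$(qsub /home/titan/PBS/input/first.sh)
# FIRST now holds the new JobID, e.g. something like 317.megamind
qsub -W depend=afterok:$FIRST /home/titan/PBS/input/in.sh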

Submit a job with priority:
There are two ways to set the priority of jobs that are going to execute.
1) Using a single queue, with different priorities for different jobs:
       To change the sequence of jobs queued in an execution queue, open the "$PBS_HOME/sched_priv/sched_config" file; normally $PBS_HOME is "/var/spool/PBS/". Open this file and uncomment the line below if it is present, otherwise add it.
job_sort_key : "job_priority HIGH"
After saving this file you will need to restart the pbs_sched daemon on the head node; you may use the command below:
service pbs restart
After completing this task, submit your jobs with the -p option to specify the priority of the job within the queue. This value may range from -1024 to 1023, where -1024 is the lowest priority and 1023 is the highest priority in the queue.
e.g.
qsub -p 100 ./X.sh
qsub -p 101 ./Y.sh
qsub -p 102 ./Z.sh 
In this case PBS will execute the jobs as explained in the diagram given below.

2) Using different queues with specified priorities: we are going to discuss this point in the PBS Queue section.

In this example all jobs in queue 2 will complete first, then those in queue 3, then those in queue 1, since the priority of queue 2 > queue 3 > queue 1.
Because of this, the job execution flow is as shown below:

J4 => J5 => J6 => J7 => J8 => J9 => J1 => J2 => J3
PBS Queues:
PBS Pro can manage multiple queues as per the users' requirements. By default every job is queued in "workq" for execution. There are two types of queues: execution and routing. Jobs in an execution queue are taken by the PBS server for execution. Jobs in a routing queue cannot be executed; they can be redirected to an execution queue or to another routing queue using the qmove command. By default, the queue "workq" is an execution queue. The sequence of jobs in a queue may be changed using the priority defined at job submission, as specified above in the job submission section.
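For example, a job waiting in a routing queue can be moved into the execution queue workq with a command like the one below (the JobID is only illustrative):

qmove workq 1492.server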

Useful qmgr commands:
First type qmgr, which opens the manager interface of PBS Pro.
To create a queue:
Qmgr: create queue test2


To set the type of the queue you created:
Qmgr: set queue test2 queue_type=execution

OR
Qmgr: set queue test2 queue_type=route


To enable the queue:
Qmgr: set queue test2 enabled=True


To set the priority of a queue:
Qmgr: set queue test2 priority=50

Jobs in the queue with higher priority get preference. Only after all jobs in the higher-priority queue have completed are the jobs in the lower-priority queue scheduled. There is therefore a high probability of job starvation in queues with lower priority.
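Once a queue such as test2 has been created and enabled, jobs are submitted to it with the -q option of qsub, for example:

qsub -q test2 ./test.sh

Without -q, jobs go to the server's default queue, which is workq unless it has been changed.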

To start the queue:
Qmgr: set queue test2 started = True


To make all queues (at the default server) active, so that subsequent commands apply to all of them:
Qmgr: active queue @default


To restrict a queue to specified users: you need to set the acl_user_enable attribute to true, which tells PBS to allow only users present in the acl_users list to submit jobs to the queue.
 Qmgr: set queue test2 acl_user_enable=True


To set the users permitted to submit jobs to the queue:
Qmgr: set queue test2 acl_users="user1@..,user2@..,user3@.."

(in place of "..", specify the hostname of a submission node in the PBS complex. A user name given without a hostname allows users with that name to submit jobs from all nodes in the PBS complex that are permitted to submit jobs.)

To delete a queue we created:
Qmgr: delete queue test2


To see the status of all queues:
qstat -Q 

You may also specify a particular queue name: qstat -Q test2
To see full details of all queues: qstat -Q -f
You may also specify a particular queue name: qstat -Q -f test2
