IBM LoadLeveler is the workload manager on URSA. Users can access it directly by logging in to ursa.gsu.edu via SSH, or they can submit their jobs through the EnginFrame portal.
The portal lets users submit, monitor, and control jobs and data files through a simple web interface. There are services for the most frequently used applications: Gaussian, Amber, and SAS. Additionally, there is a service that lets you build your own submission script from a template and upload any data files for your job.

Use

Basic user commands:

llsubmit command_script - submits a job described in command_script

llcancel job_id - cancels a job

llclass - displays classes (queues) existing on the system and their descriptions

llq - lists queued jobs

llhold job_id - places a job on hold (llhold -r job_id releases it)
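
The commands above are typically used in sequence. The following is a minimal sketch of one submit, monitor, hold, and cancel cycle, not a prescribed procedure. The `run` wrapper and the job ID `ursa.1234.0` are assumptions made for illustration: `run` only prints a command when it is not installed, and the real job ID is whatever llsubmit reports for your job.

```shell
#!/bin/sh
# Sketch of a submit / monitor / hold / cancel cycle.

# run: hypothetical helper (not part of LoadLeveler). It executes the
# command if it is installed and otherwise prints it as a dry run.
run() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "(dry run) $*"
    fi
}

JOB_ID="ursa.1234.0"           # hypothetical job ID; use the one llsubmit prints

run llclass                    # see which classes (queues) exist
run llsubmit command_script    # submit the job described in command_script
run llq                        # list queued jobs
run llhold "$JOB_ID"           # place the job on hold
run llhold -r "$JOB_ID"        # release the hold
run llcancel "$JOB_ID"         # cancel the job
```

On a machine without LoadLeveler the script only echoes each command, so it can be read through safely before trying it on URSA.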

Find an online book with a wealth of information regarding LoadLeveler here.

A generic command script for a parallel job:

#!/bin/ksh
#####################################################
# This is a generic submission script for a parallel
# job.
#####################################################
# Standard error and standard output files for the job.
# @ error = default.$(jobid).err
# @ output = default.$(jobid).out
# Class (queue) to run in; see the Queues section.
# @ class = verylong
# @ job_type = parallel
# @ env_copy = all
# @ network.MPI = sn_single,shared,us
######################################################
# Working directory for the job; fill in before submitting.
# @ initialdir =
######################################################
# @ restart = no
# Request 1 node and 2 tasks in total.
# @ node = 1
# @ total_tasks = 2
# @ queue
######################################################
######################################################
your_executable
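
For comparison, a serial job needs far fewer keywords than the parallel script above. The sketch below writes a minimal serial command script to a file. The file name `serial_job.cmd`, the choice of the `small` class, and the `serial.` output prefix are assumptions for illustration only; `your_executable` remains a placeholder, as in the parallel script.

```shell
#!/bin/sh
# Write a minimal serial command script to serial_job.cmd.
# The quoted 'EOF' keeps $(jobid) literal so LoadLeveler expands it,
# not the shell.
cat > serial_job.cmd <<'EOF'
#!/bin/ksh
# @ job_type = serial
# @ class = small
# @ output = serial.$(jobid).out
# @ error = serial.$(jobid).err
# @ queue
your_executable
EOF

echo "Wrote serial_job.cmd; submit it with: llsubmit serial_job.cmd"
```

The script deliberately stops short of calling llsubmit, so the generated file can be inspected and edited first.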

-- JaroKlc - 19 Jun 2008

Queues

Below are the queues (known as 'classes' in LoadLeveler) currently configured on the cluster:

| *Name* | *MaxJobCPU (d+hh:mm:ss)* | *Max Slots* | *Description* |
| small | 00:15:00 | 504 | 15 minute runtime limit queue |
| medium | 02:00:00 | 504 | 2 hour runtime limit queue |
| large | 1+00:00:00 | 496 | 24 hour runtime limit queue |
| large5 | 5+00:00:00 | 480 | 5 day runtime limit queue |
| large10 | 10+00:00:00 | 480 | 10 day runtime limit queue |
| large15 | 15+00:00:00 | 480 | 15 day runtime limit queue |
| large20 | 20+00:00:00 | 480 | 20 day runtime limit queue |
| large25 | 25+00:00:00 | 480 | 25 day runtime limit queue |
| verylong | 30+00:00:00 | 480 | 30 day runtime limit queue |
| largememory64 | 30+04:00:00 | 16 | 30 day queue for the expanded memory node, 64GB |
| largememory256 | 30+04:00:00 | 16 | 30 day queue for the expanded memory node, 256GB |
| preemptME | unlimited | 568 | Unlimited runtime, however this class runs at lowest priority and will ALWAYS be preempted by any other queue |
| dev_small | 00:15:00 | 16 | 15 minute runtime limit development queue |
| dev_medium | 02:00:00 | 16 | 2 hour runtime limit development queue |
| sura_small | 00:15:00 | 32 | 15 minute runtime limit SURAgrid queue |
| sura_large | 5+00:00:00 | 32 | 5 day runtime limit SURAgrid queue |
| sura_priority | 5+00:00:00 | 32 | 5 day runtime limit SURAgrid queue - preempts other SURA classes |
| georgia_southern | unlimited | 16 | Georgia Southern University restricted access queue; 16 cpus, Node 36 |
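
One way to use the table is to pick the shortest general-purpose class whose limit still covers the expected runtime. The sketch below encodes the limits of the small through verylong classes from the table. `pick_class` is a hypothetical helper, not a LoadLeveler command, and it assumes the limit can be treated as a runtime limit in whole minutes, as the Description column suggests.

```shell
#!/bin/sh
# pick_class: hypothetical helper (not part of LoadLeveler).
# Prints the shortest general-purpose class whose limit covers the
# requested runtime, given in whole minutes.
pick_class() {
    minutes=$1
    if   [ "$minutes" -le 15 ];    then echo small      # 00:15:00
    elif [ "$minutes" -le 120 ];   then echo medium     # 02:00:00
    elif [ "$minutes" -le 1440 ];  then echo large      # 1+00:00:00
    elif [ "$minutes" -le 7200 ];  then echo large5     # 5+00:00:00
    elif [ "$minutes" -le 14400 ]; then echo large10    # 10+00:00:00
    elif [ "$minutes" -le 21600 ]; then echo large15    # 15+00:00:00
    elif [ "$minutes" -le 28800 ]; then echo large20    # 20+00:00:00
    elif [ "$minutes" -le 36000 ]; then echo large25    # 25+00:00:00
    elif [ "$minutes" -le 43200 ]; then echo verylong   # 30+00:00:00
    else
        echo "no timed class allows more than 30 days" >&2
        return 1
    fi
}

pick_class 90     # prints: medium
pick_class 3000   # prints: large5
```

The printed name is what would go on the `# @ class =` line of a command script. The restricted, development, SURAgrid, and large-memory classes are left out because choosing them depends on access and hardware needs rather than runtime alone.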

-- VictorBolet - 12 Aug 2009

Topic revision: r2 - 12 Aug 2009 - 16:24:39 - VictorBolet
 