
Scientific Computing

If you want to use the scientific computing infrastructure of the JKU, you need to get an account, log in via ssh, configure your environment, compile your software if necessary, and finally submit your jobs. If at any point you get stuck, contact us.
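
For example, once your account has been created, you log in via ssh to the machine you want to use (the user name here is a placeholder):

ssh yourusername@alex.jku.austriangrid.at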

We are running several machines which cover various requirements on parallelism, memory and performance:

  • Mach (mach.jku.at) - a big SMP system with 2048 CPU cores and 16 TB RAM. Use this if you need massive parallelism or huge amounts of RAM for your jobs. Unfortunately, the size of the system also means that it is not that well suited for lots of smaller jobs, so for those you might want to use our cluster instead:
  • Alex (alex.jku.austriangrid.at) - a cluster of 48 nodes with 8 CPU cores and either 96 or 48 GB RAM each. If this is enough for your jobs you should run them here: it is more robust and reliable, more readily available, and, being partitioned into many smaller units instead of a single huge one, it even has a slight performance edge. You can also run jobs that don't fit on a single node using MPI, but that may carry a perceptible performance penalty - please test on both systems and pick the one offering better performance and reliability. While Alex can still be used to good effect for existing projects, it is nevertheless a legacy system, and for new projects you should preferably use Lise.
  • Lise (lise.jku.austriangrid.at) - like Alex, this is an Altix ICE 8200, but bigger and with a faster interconnect. There are 128 compute nodes (with a total of 1024 CPU cores), and the interconnect is dual-rail InfiniBand, which can be used to its full extent via a specifically configured Mvapich2 MPI implementation. The software infrastructure on Lise is developed and implemented entirely in-house by the department for scientific computing and almost exclusively uses free software (Debian GNU/Linux as the OS, Torque/Maui as batch system and scheduler, Mvapich2 for MPI, etc.). If you need to choose between Alex and Lise, pick Lise.
  • Lilli (lilli.edvz.uni-linz.ac.at) - This system is not available any more. We are working on providing a replacement. In the meantime, please use one of our other computing resources. We'll be glad to support you in implementing your calculations on Mach, Alex or Lise.

Contact

You can contact us using the e-mail address wradmin(at)jku.at or contact:
DI Johann Messner (johann.messner(at)jku.at, Tel +43 732 2468 8203), or
Faruk Kujundžić (faruk.kujundzic(at)jku.at, Tel. +43 732 2468 8613).

PBS Pro

PBS is a batch system used for scheduling and management of jobs. Users submit jobs to PBS, which then distributes them according to hardware requirements, availability, priority etc.

When submitting jobs on Alex it is important to remember that the nodes have 8 physical cores each, but due to hyperthreading 16 cores are displayed. Hyperthreading is useful for certain types of jobs, while in other cases it might even hurt performance. Therefore PBS is set to allow requests with a maximum of 8 cores per node, in order to prevent hyperthreading from being used unintentionally. Those who need hyperthreading can still use it by adapting their job to run more threads than the number of cores requested.
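
As a sketch, assuming an OpenMP program (the executable path and thread count are placeholders): request the 8 physical cores as usual and deliberately start 16 threads if your code actually benefits from hyperthreading.

#!/bin/bash
#PBS -l ncpus=8
#PBS -l walltime=2:00:00
# 8 physical cores are requested above; oversubscribe them with 16 threads
export OMP_NUM_THREADS=16
/path/to/your/openmp_executable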

For jobs on Mach it is very important not to underestimate your memory needs. Jobs are placed inside cpusets which contain only the amount of memory and the CPU cores you request. When you use up your allocated memory the system has to swap, which, besides being silly on a machine equipped with 16 terabytes of RAM, is catastrophic for performance. This can slow the machine to a crawl or even cause an outright crash.
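
A minimal sketch of a Mach job that states its memory needs explicitly (the values are placeholders; mem is a standard PBS resource, but ask us if you are unsure about the exact syntax or limits on Mach):

#!/bin/bash
#PBS -l ncpus=32
# placeholder value - request what your job will really use
#PBS -l mem=64gb
#PBS -l walltime=12:00:00
/path/to/your/executable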

qsub

The command qsub is the primary way of submitting jobs to PBS. It takes the name of a script as an argument, or you can pipe the command you want to execute to it:


echo "/path/to/your/executable" | qsub

Executed this way, your job will write its stdout to the file STDIN.o<job-id> and its stderr to STDIN.e<job-id>. The name STDIN comes from the fact that qsub received the job via its stdin, so don't let that confuse you.
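
If you prefer a more descriptive name, you can set one with qsub's -N option, for example:

echo "/path/to/your/executable" | qsub -N myjob

The output files are then named myjob.o<job-id> and myjob.e<job-id>.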

Creating a PBS script allows you to specify more options and to retain them for later execution; it is in fact the standard way of submitting jobs. An example script could look like this:

#!/bin/bash
#PBS -o myjob.out
#PBS -j oe
#PBS -l ncpus=8
#PBS -l walltime=1:30:00
#PBS -M your.email@domain.org
<your executable>

The lines starting with #PBS are not comments; they are used to pass options to qsub. The options listed put the output into the file myjob.out, merge stdout and stderr, request 8 CPUs and a walltime of 1.5 hours for the job, and instruct PBS to mail you at your.email@domain.org if the job is terminated by the batch system. The last line contains the command you want to execute.

Please note that you don't necessarily need to write a script - since the lines starting with #PBS are merely a notation to save options to a file you can also simply pass them to qsub directly. Running qsub with the above script is equivalent to calling it like this:

qsub -o myjob.out -j oe -l ncpus=8 -l walltime=1:30:00 -M your.email@domain.org -- <your executable>

The double dash preceding the executable lets qsub know that the end of the options is reached. Everything after this point is interpreted as a command that is to be run in the job.

Please also note the resource request - this is quite important for helping PBS queue jobs more efficiently. Specifying a walltime - the length of time you expect your job to run - enables PBS to "fit in" smaller jobs while waiting for resources to become free for others. For example, if PBS has to allocate 4 entire nodes for a job with a high scheduling priority, nodes are kept idle until enough of them are free to run the job. However, if it will take 3 hours until enough nodes are free, PBS can run jobs requiring less than that in the meantime - even if those jobs have a much lower scheduling priority. Specifying the walltime allows PBS to make better scheduling decisions, lets the resources be used more fully, and will probably help your job get executed sooner.

When requesting walltime, take care not to request too much (which will reduce your job's chances of being scheduled early), but above all do not request too little. Once the requested walltime is over, the job will be terminated - regardless of whether the calculation is complete or not. So please try to estimate a realistic value and request 10-20% more as a safety margin. You can use the utility hvlt_pbstimestats to get an estimate of your average job runtime.

qsub and MPI (SGI MPT)

To run an MPI job via PBS, use the following example script as a template:


#!/bin/bash
#PBS -o myjob.out
#PBS -j oe
#PBS -l select=2:ncpus=6:mpiprocs=6
#PBS -l walltime=11:30:00
#PBS -M your.email@domain.org
mpiexec <your executable>

As you can see, the resource request is slightly different: it uses a select statement along with ncpus and mpiprocs. The above example requests two nodes with 6 CPU cores and 6 MPI processes each, for a total of 12 MPI processes. Instead of using 'mpirun' as you would normally, use 'mpiexec' - this utility is intended specifically for use with PBS and will honour your resource requests automagically.

In case you want to use a specific MPI implementation, you can still do so with PBS. Use the appropriate option of your MPI implementation to pass it the machinefile whose name is stored in the environment variable $PBS_NODEFILE. This file is generated automatically by PBS and placed on the execution hosts.
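
A minimal sketch, assuming an MPICH-style mpirun that accepts a -machinefile option (check the documentation of your MPI implementation for the exact flag; the executable path is a placeholder):

# one MPI process per line in the nodefile
NP=$(wc -l < $PBS_NODEFILE)
mpirun -np $NP -machinefile $PBS_NODEFILE /path/to/your/mpi_executable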

qstat

The command qstat displays the schedule, with running and queued jobs. If you use it with the option -f you'll see detailed output; use this if, for instance, you want to see on which node(s) your task is currently executing.

The output from qstat looks like this:


horrovac@service0:~/src/c> qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
41158.service0   STDIN            agpXXXXX         00:00:21 R workq
41160.service0   STDIN            agpXXXXX         00:00:00 R workq
45228.service0   STDIN            kXXXXXX          00:00:00 R workq
47490.service0   STDIN            horrovac         00:00:00 R workq

Probably the most useful info you get from this is the job id, which you can use to specify your job to other PBS utilities.
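
For example, to see the full details of the last job in the listing above:

qstat -f 47490.service0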

On Alex you can also use the command hvlt_qstat, a JKU-specific rewrite of qstat which aims to offer a somewhat more user-friendly interface:

hvlt_qstat

qdel

The command qdel lets you delete a job in case you don’t need it any more, or if you made a mistake and don’t want the job to complete. First, list the jobs with qstat, and then use the job identifier as an argument to qdel. You can only delete your own jobs.
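
Using the job id from the qstat listing above, deleting a job looks like this:

qdel 47490.service0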

GNU module

Serving a multitude of users, the cluster needs to provide several different environments, which sometimes overlap or exclude one another. We address this with the GNU module package. It allows the user to load or unload preset environment settings as needed, without having to deal with the complexity of the environment or its pitfalls, which should already have been caught by the admin while testing. This makes it possible to use a variety of compilers or libraries without interference. It also removes the complexity of catering for different shells: module uses its own syntax and does the right thing for each shell.

Loading and unloading modules

Using module is pretty easy. To load a module you simply issue the command:


module add icc

...to add the icc module. Your software will then be compiled using the Intel compiler. If this doesn't work, you can revert to the system default compiler with:

module delete icc

...this will remove all the modifications made to the shell environment by loading the module - something that is much more complex when using the software-supplied initialisation scripts.
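
If you want to see exactly what a module would change before loading it, the show subcommand prints the modifications it makes to the environment:

module show icc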

Viewing the currently loaded modules

To see which modules you currently have loaded, use the command:


<username>@service0:~> module list
Currently Loaded Modulefiles:
   1) icc/11.0            3) git/1.6.4.1
   2) subversion/1.6.11   4) extralibs/1.0

This is the list of modules you see right after logging in. Some modules are deemed useful enough to be loaded by default: icc because the Intel compiler is optimised for the hardware and usually delivers better performance, subversion and git because they are widely used version management systems, and the special module extralibs as a catch-all for additional packages which are sometimes required but not significant enough to warrant a module of their own. An additional benefit is that this configuration keeps the software management of the cluster simple and robust. Basically, you just don't need to worry about this module - it is always there, and that's how it should be. :)

Viewing the available modules

To see which modules are available, type:


horrovac@service0:~> module avail

------------------------------ /usr/share/modules ------------------------------
3.1.6                         modulefiles/java/1.6.0_18
modulefiles/NAMD/1.66         modulefiles/module-cvs
modulefiles/R/2.10.0          modulefiles/module-info
modulefiles/R/2.10.1          modulefiles/modules
modulefiles/acml/4.4.0        modulefiles/mpich-ch-p4
modulefiles/charmm/c35b3      modulefiles/mpich-ch-p4mpd
modulefiles/cilk/5.4.6        modulefiles/mpich-g2/1.2.7p1
modulefiles/dot               modulefiles/mpt/1.23
modulefiles/extralibs/1.0     modulefiles/null
modulefiles/fftw3/3.2.2       modulefiles/numpy/1.4.1
modulefiles/gcc/4.4.2         modulefiles/perfcatcher
modulefiles/git/1.6.4.1       modulefiles/sqlite3/3.6.17
modulefiles/gromacs/4.0.5     modulefiles/subversion/1.6.11
modulefiles/gromacs/4.0.7     modulefiles/subversion/1.6.5
modulefiles/icc/11.0          modulefiles/use.own

------------------------ /usr/share/modules/modulefiles ------------------------
NAMD/1.66            module-cvs
R/2.10.0             module-info
R/2.10.1             modules
acml/4.4.0           mpich-ch-p4
charmm/c35b3         mpich-ch-p4mpd
cilk/5.4.6           mpich-g2/1.2.7p1
dot                  mpt/1.23
extralibs/1.0        null
fftw3/3.2.2          numpy/1.4.1
gcc/4.4.2            perfcatcher
git/1.6.4.1          sqlite3/3.6.17
gromacs/4.0.5        subversion/1.6.11(default)
gromacs/4.0.7        subversion/1.6.5
icc/11.0             use.own
java/1.6.0_18

Module versions

In the above list you will note there are some modules with several versions. These versions correspond to the software versions they provide you with. For example if you load the subversion module, you’ll be using version 1.6.11, which has been marked default by the administrator. If instead you want to use version 1.6.5 you can use the module command like this:


module add subversion/1.6.5

If no modulefile has been marked default, then the highest lexicographically sorted modulefile in the directory (modules are directories, versions are files in those directories) will be used. If you think the module loaded automatically is the wrong one, ask the administrator to correct this.

Compiling software

First of all: please don't just start compiling right away. Look around first - the software you need may be installed already. The central location for additional software is /apps, so you might find it there. If there is a directory for your software, there is probably also a module to make it available (consult the section on GNU module). If you need special libraries, just try using them - additional libraries are installed in /apps/extralibs, which is included in the relevant environment variables by default, so it might just work™.
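
A quick way to check, sketched here with a placeholder package name (module avail prints to stderr, hence the redirection):

ls /apps
module avail 2>&1 | grep -i <software-name>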

If you still don't find what you need - either because it's missing entirely or because you need a newer version - please don't start compiling quite yet; talk to us first. If you need it, others might need it too, so it is better if we can make it available system-wide instead of everyone growing their own. If the installed software or library is too old, remind us to update it. We have already built it at least once and keep records on how to do it (in case it's buggy or quirky), so it will probably cost us less trouble than it would cost you, and more people can benefit.

Last but not least, we try to use the latest and greatest compilers with the highest level of optimisation and the combinations of libraries promising the best performance. So talk to us; it often pays off.

With that out of the way, sometimes compiling yourself is unavoidable or simply better. For this you have a choice of the Intel compiler suite (icc, icpc, ifort and so on) or the GNU compiler collection (gcc, g++, gfortran, ...). If you want to use gcc, make sure you load an appropriate module first, or you might end up using an old version; newer versions are available by loading a specific environment module.

After you load the module, the build system (such as autotools) should be able to detect and use the proper compiler and libraries. Always inspect the output of configure or build scripts, as some might ignore environment variables and need to be modified manually. Adding header or library search paths is not required, as this is done by the module.
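
A minimal sketch of such a build, assuming an autotools-based package (the install prefix is a placeholder, and the explicit compiler variables are only needed if configure does not pick up the Intel compilers on its own):

module add icc
# only needed if configure ignores the loaded module
export CC=icc CXX=icpc FC=ifort
./configure --prefix=$HOME/sw/mypackage
make && make install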

For arithmetic operations you will get the highest performance by using the Intel MKL interfaces for BLAS and LAPACK. Include them in your CFLAGS or configure parameters like this:


-lmkl_blas95_lp64 -lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -liomp5


If you would like to try the highest possible optimisation, use the flags:

-O3 -ipo -axSSE4.2 -msse4.2 -parallel

This is the highest available optimisation level on all nodes, so programs compiled with these will work everywhere (except on the login node).
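
Putting the flags together, a compile line might look like this (a sketch only; the source and output names are placeholders, and the MKL part assumes the library paths are already provided by the loaded modules):

# optimisation flags from above plus the threaded MKL link line
icc -O3 -ipo -axSSE4.2 -msse4.2 -parallel -o myprog myprog.c \
    -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -liomp5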

Memory topology of Altix UV 1000

Even though the Altix UV is a single system image, access to memory is not uniformly fast. The machine consists of individual nodes interconnected by NUMAlink to create a single memory image. How fast a memory access is depends on how "close" the accessing node is to the memory being accessed. The graph shows access latencies to memory by its location. Note that a NUMA node consists of one CPU socket, so one physical Altix node consists of two NUMA nodes. The lowest latency is achieved for access to a node's own memory (10); it is higher (13) for the neighbouring CPU socket, rises again (48, 55 and 62) for sockets in the same IRU (blade enclosure), and is highest (63, 70 and 78) for sockets that are not directly connected.
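
You can inspect this topology yourself from within a job: if the numactl utility is available (an assumption - ask us if it is missing), it prints exactly this kind of distance matrix:

numactl --hardware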

Tabular representation of the above. The full matrix covers 48 NUMA nodes and repeats the same block pattern throughout; the excerpt below shows nodes 0-15, with the distances 63, 70 and 78 mentioned above appearing between sockets that are not directly connected:

      0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
  0  10 13 40 40 62 62 55 55 62 62 55 55 62 62 55 55
  1  13 10 40 40 62 62 55 55 62 62 55 55 62 62 55 55
  2  40 40 10 13 55 55 48 48 55 55 48 48 55 55 48 48
  3  40 40 13 10 55 55 48 48 55 55 48 48 55 55 48 48
  4  62 62 55 55 10 13 40 40 62 62 55 55 62 62 55 55
  5  62 62 55 55 13 10 40 40 62 62 55 55 62 62 55 55
  6  55 55 48 48 40 40 10 13 55 55 48 48 55 55 48 48
  7  55 55 48 48 40 40 13 10 55 55 48 48 55 55 48 48
  8  62 62 55 55 62 62 55 55 10 13 40 40 62 62 55 55
  9  62 62 55 55 62 62 55 55 13 10 40 40 62 62 55 55
 10  55 55 48 48 55 55 48 48 40 40 10 13 55 55 48 48
 11  55 55 48 48 55 55 48 48 40 40 13 10 55 55 48 48
 12  62 62 55 55 62 62 55 55 62 62 55 55 10 13 40 40
 13  62 62 55 55 62 62 55 55 62 62 55 55 13 10 40 40
 14  55 55 48 48 55 55 48 48 55 55 48 48 40 40 10 13
 15  55 55 48 48 55 55 48 48 55 55 48 48 40 40 13 10

FAQ

Q: My job is suspended and another job is being executed instead, even though there are free nodes available - why is my job suspended?
A: This is due to the way PBS works and the impossibility of moving jobs between nodes. The job that has suspended yours had a higher priority (probably because of fairshare ranking), and at the time it was started there were no other nodes available. Once a job is started it has to finish on the nodes it was given; PBS controls scheduling and execution, but it can't move jobs to other nodes, so your job has to wait despite the free resources.
     In case your job writes out its data periodically and can resume execution from that point, you can cut the waiting by deleting the job and re-submitting it. For example, gromacs writes checkpoints every couple of minutes and will automatically resume from the last saved point when restarted. One usually loses a bit of computing time, but it may be worth it. There is no way to tell how long you may be required to wait.

Q: My job has spent quite a while in the queue, there are free nodes available, yet my job is not starting - why?
A: Use qstat with the -f switch to generate the full output and look for indications of why the job is not starting. In most cases it will be a matter of requesting too many resources, and in that case you will find a line like "comment = Not Running: No available resources on nodes" in the output of qstat -f <your_job_id>. Then inspect the resource requests - the lines beginning with "Resource_List" - to see what's wrong. In the case of my test job, this is:


     Resource_List.ncpus = 16
     Resource_List.nodect = 1
     Resource_List.place = free
     Resource_List.select = 1:ncpus=16

...and there we have it: I have requested 1 node with 16 cores, which can't work (the maximum is 8 per node). I would have to either modify the job to request 2 nodes with 8 cores each, or modify my command to make the job run with 16 threads after requesting 8 cores.
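
The corrected resource request could look like this (a sketch assuming an MPI job; for the threaded variant you would instead keep ncpus=8 and start 16 threads as described in the PBS Pro section):

#PBS -l select=2:ncpus=8:mpiprocs=8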

If however you can't find anything wrong with your request, talk to us.