
GridWay


This showcase describes several usage scenarios for submitting, monitoring and controlling jobs using the GridWay metascheduler. It shows how to use the GridWay CLI, DRMAA API, and remote BES interface, as well as an alternative way of monitoring jobs and resources through a Web interface.

  1. Single Jobs

  • Create your proxy, e.g.:

$ grid-proxy-init

  • Create a simple Job Template (called jt), such as:

$ cat jt
EXECUTABLE = /bin/ls
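
  • A job template can set further attributes; the ones below all appear later in this page (the values are only illustrative):

EXECUTABLE = /bin/ls
ARGUMENTS = -l -a /tmp
STDOUT_FILE = stdout.${JOB_ID}
STDERR_FILE = stderr.${JOB_ID}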

  • Use the gwsubmit command to submit the job, specifying the job template just created:

$ gwsubmit -t jt

  • Use the GridWay CLI to monitor jobs and resources. Alternatively, the GridWay Web interface (see Section 7) can be used for monitoring.

  • Use the gwhost command to see the available resources:


$ gwhost
HID PRI OS             ARCH   MHZ  %CPU MEM(F/T)  DISK(F/T) NODES(U/F/T) LRMS         HOSTNAME        
0   1  Linux2.6.32.27- x86_64 2533 400 2007/2007     71G/71G   0/2/2  jobmanager-sge  gt5-ige.drg.lrz.de
1   1  Linux2.6.18-238 x86_64 1995 600 2007/2007   927G/927G 0/44/44  jobmanager-pbs  udo-gt01.grid.tu-dortmund.de
2   1  Linux2.6.18-194 x86_64 2993 800 2011/2011 3670M/3670M   0/8/8  jobmanager-fork ve.nikhef.nl             
3   1  Linux2.6.18-194 x86_64 1600 100 1024/1024     40G/40G   0/1/1  jobmanager-fork gt1.epcc.ed.ac.uk
4   1  Linux2.6.18-194 x86_64 2328 400   512/512     95G/95G   0/1/1  jobmanager-pbs  gt01.ige.psnc.pl
5   1  Linux2.6.32-36  x86_64 2000 400 2009/2009 4250G/4250G   0/4/4  jobmanager-fork gt-ige.utcluj.ro


  • and get more detailed information by specifying a <HOST_ID>:

$ gwhost 0
HID PRI OS              ARCH   MHZ  %CPU MEM(F/T)  DISK(F/T) NODES(U/F/T) LRMS         HOSTNAME        
0   1   Linux2.6.32.27- x86_64 2533 400 2007/2007    71G/71G    0/2/2  jobmanager-sge  gt5-ige.drg.lrz.de

     

QUEUENAME        SL(F/T) WALLT CPUT  COUNT MAXR  MAXQ  STATUS   DISPATCH   PRIORITY
all.q                2/2     0    0      2    2     2       1      Batch Inte         

  • Check the resources that match the job's requirements with gwhost -m <JOB_ID>:


$ gwhost -m 0

HID QNAME     RANK    PRI   SLOTS   HOSTNAME
0   all.q      0      1       2     gt5-ige.drg.lrz.de  
1   dgiseq     0      1      44     udo-gt01.grid.tu-dortmund.de
1   dgipar     0      1      44     udo-gt01.grid.tu-dortmund.de
2   default    0      1       8     ve.nikhef.nl        
3   default    0      1       1     gt1.epcc.ed.ac.uk   
4   globus     0      1       1     gt01.ige.psnc.pl    
5   batch      0      1       4     gt-ige.utcluj.ro


  • Follow the evolution of the job with the gwps command:

$ gwps
USER     JID DM   EM   START     END      EXEC    XFER   EXIT NAME    HOST                      
user:0     0 done ---- 10:38:24 10:39:04 0:00:21 0:00:08 0    jt      gt5-ige.drg.lrz.de/jobmanager-sge
user:0     1 done ---- 10:46:05 10:46:39 0:00:11 0:00:08 0    jt      udo-gt01.grid.tu-dortmund.de/jobmanager-pbs
user:0     2 wrap actv 10:48:39 --:--:-- 0:00:39 0:00:03 --   jt      gt5-ige.drg.lrz.de/jobmanager-sge


  • Use gwps -c <seconds> for continuous output.
  • See the job history with the gwhistory command, specifying a <JOB_ID>:

$ gwhistory 0
HID START    END      PROLOG  WRAPPER EPILOG  MIGR    REASON QUEUE    HOST        
0   10:38:35 10:39:04 0:00:03 0:00:21 0:00:05 0:00:00 ----   all.q    gt5-ige.drg.lrz.de/jobmanager-sge

  • Once execution is done, it is time to check the results. By default, the job's standard output is written to stdout.<JOB_ID>.


$ ls -l
total 8
-rw-r--r-- 1 user user 21 2011-05-04 16:47 jt
-rw-r--r-- 1 user user 0  2011-05-05 10:39 stderr.0
-rw-r--r-- 1 user user 72 2011-05-05 10:39 stdout.0

$ cat stdout.0
job.env
stderr.execution
stderr.wrapper
stdout.execution
stdout.wrapper

  2. Array Jobs

  • Array jobs are illustrated in this section with the π example.
  • π equals the integral of f(x) = 4/(1+x²) over the interval [0,1]. To compute the whole integral, the interval can be divided into several sections that are computed separately; the more sections, the more accurate the result. Each task is therefore given one section of the interval to compute.
  • Here is the serial source code. Create a file called pi.c with the following contents:


#include <stdio.h>
#include <stdlib.h>

int main (int argc, char** args)
{
    int task_id;
    int total_tasks;
    long long int n;
    long long int i;
    double l_sum, x, h;

    task_id = atoi(args[1]);
    total_tasks = atoi(args[2]);
    n = atoll(args[3]);

    fprintf(stderr, "task_id=%d total_tasks=%d n=%lld\n", task_id, total_tasks, n);

    h = 1.0/n;
    l_sum = 0.0;

    for (i = task_id; i < n; i += total_tasks)
    {
        x = (i + 0.5)*h;
        l_sum += 4.0/(1.0 + x*x);
    }

    l_sum *= h;
    printf("%0.12g\n", l_sum);

    return 0;
}


  • The program takes three arguments:

    • Task ID: The identifier of the current task.

    • Total tasks: The number of tasks the computation should be divided into.

    • Integral intervals: The number of intervals over which the integral is being evaluated.
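
  • In formula form, pi.c implements the composite midpoint rule with step h = 1/n; task t of the T tasks sums every T-th term, and the partial sums add up to the full approximation (sketched here in LaTeX notation):

\pi = \int_0^1 \frac{4}{1+x^2}\,dx
    \approx h \sum_{i=0}^{n-1} \frac{4}{1 + ((i+0.5)\,h)^2}
    = \sum_{t=0}^{T-1} \, h \sum_{i \equiv t \,(\mathrm{mod}\, T)} \frac{4}{1 + ((i+0.5)\,h)^2}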


  • Compile the serial code:

$ gcc -O3 pi.c -o pi
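
  • To sanity-check the binary locally before submitting, a single task can be run over all the intervals; stdout should show a value very close to π:

$ ./pi 0 1 100000
task_id=0 total_tasks=1 n=100000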

  • Create a job template (pi.jt), such as:

EXECUTABLE = pi
ARGUMENTS = $(TASK_ID) $(TOTAL_TASKS) 100000
STDOUT_FILE = stdout_file.$(TASK_ID)
STDERR_FILE = stderr_file.$(TASK_ID)
RANK = CPU_MHZ


  • Submit the array with the gwsubmit command, specifying the number of tasks with the -n option:


$ gwsubmit -v -t pi.jt -n 4
ARRAY ID: 0
TASK JOB
0     3
1     4
2     5
3     6

  • Use the gwwait command to wait for the jobs to complete. The <ARRAY_ID> is passed with the -A option.


$ gwwait -v -A 0
0    : 0
1    : 0
2    : 0
3    : 0

  • At the end, the execution of these jobs has produced one output file per task, each holding a partial result:


stdout_file.0
stdout_file.1
stdout_file.2
stdout_file.3

  • Sum the partial results to get the value of π:

$ awk 'BEGIN {sum=0} {sum+=$1} END {printf "Pi is %0.12g\n", sum}' stdout_file.*

Pi is 3.1415926536

  3. MPI Jobs

  • The previous problem can also be computed as an MPI job.
  • Here is the MPI application for computing π. Create a file called mpi.c with the following contents:


#include "mpi.h"

#include <stdio.h>

#include <math.h>


int main( int argc, char *argv[])

{

        int done = 0, n, myid, numprocs, i;
        double PI25DT = 3.141592653589793238462643;
        double mypi, pi, h, sum, x;
        double startwtime = 0.0, endwtime;
        int namelen;
        char processor_name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc,&argv);
        MPI_Comm_size(MPI_COMM_WORLD,&numprocs);

 MPI_Comm_rank(MPI_COMM_WORLD,&myid);

        MPI_Get_processor_name(processor_name,&namelen);

        printf("Process %d on %s\n", myid, processor_name);

        n = 100000000;

        startwtime = MPI_Wtime();

        h   = 1.0 / (double) n;
        sum = 0.0;
        for (i = myid + 1; i <= n; i += numprocs)
        {
               x = h * ((double)i - 0.5);
               sum += 4.0 / (1.0 + x*x);
        }
        mypi = h * sum;

 MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);


        if (myid == 0)
        {

printf("pi is approximately %.16f, Error is %.16f\n", pi, fabs(pi -

PI25DT));

endwtime = MPI_Wtime();

printf("wall clock time = %f\n", endwtime-startwtime);         

        }

 MPI_Finalize();


        return 0;

}


  • Use mpicc to compile it:

$ mpicc -O3 mpi.c -o mpi
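
  • Optionally, the binary can first be tested locally, assuming a local MPI runtime (e.g. Open MPI) provides mpirun; it should print one "Process ..." line per rank and the π approximation:

$ mpirun -np 2 ./mpi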

  • Create a job template (mpi.jt) for the MPI job:

EXECUTABLE = mpi
STDOUT_FILE = stdout.${JOB_ID}
STDERR_FILE = stderr.${JOB_ID}
RANK = CPU_MHZ
TYPE = "mpi"
NP = 2


  • and then submit it to GridWay like any other job:

$ gwsubmit -t mpi.jt

  • Check the results in stdout.${JOB_ID}.

  4. Workflow Jobs

  • GridWay can handle workflows with the following functionality:

    • Sequence, parallelism, branching and looping structures.

    • The workflow can be described in an abstract form without referring to specific resources for task execution.

    • Quality of service constraints and fault tolerance are defined at task level.

  • Job dependencies are specified with the -d option of the gwsubmit command.

  • Next, an example illustrates how to work with workflow jobs. Consider the following job structure:


    [Figure: workflow example. Job A feeds jobs B and C, which run in parallel and both feed job D.]
  • Create the following job templates:

$ cat A.jt
EXECUTABLE=/bin/echo
ARGUMENTS="$RANDOM"
STDOUT_FILE=out.A

$ cat B.jt
EXECUTABLE=sum
ARGUMENTS=out.A 1
INPUT_FILES=out.A
STDOUT_FILE=out.B

$ cat C.jt
EXECUTABLE=sum
ARGUMENTS=out.A 1
INPUT_FILES=out.A
STDOUT_FILE=out.C

$ cat D.jt
EXECUTABLE=sum
ARGUMENTS=out.B out.C
INPUT_FILES=out.B, out.C
STDOUT_FILE=out.D
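
  • The sum executable used by jobs B, C and D is not shipped with GridWay. A minimal sketch in C, assuming each argument is either a file holding a number or a numeric literal (compile with gcc -o sum sum.c):

#include <stdio.h>
#include <stdlib.h>

/* Read a number from the file named by arg, or parse arg itself
   as a numeric literal if no such file exists. */
static double operand(const char *arg)
{
    double v;
    FILE *f = fopen(arg, "r");
    if (f != NULL)
    {
        if (fscanf(f, "%lf", &v) != 1)
            v = 0.0;
        fclose(f);
        return v;
    }
    return atof(arg);
}

int main(int argc, char **argv)
{
    if (argc != 3)
    {
        fprintf(stderr, "usage: %s <file|number> <file|number>\n", argv[0]);
        return 1;
    }
    printf("%g\n", operand(argv[1]) + operand(argv[2]));
    return 0;
}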

  • Submit job A:

$ gwsubmit -v -t A.jt
JOB ID: 5

  • Submit jobs B and C, which depend on A, whose <JOB_ID> is 5:

$ gwsubmit -v -t B.jt -d "5"
JOB ID: 6

$ gwsubmit -v -t C.jt -d "5"
JOB ID: 7

  • Finally, submit job D, which depends on B and C, whose IDs are 6 and 7:

$ gwsubmit -t D.jt -d "6 7"
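
  • As with arrays, gwwait can block until D, and therefore the whole workflow, has finished. The job ID 8 below is hypothetical; add -v to the gwsubmit call above to print the actual ID:

$ gwwait 8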

  • Check the partial results in out.A, out.B and out.C, and the final result in out.D.

  5. DRMAA API

  • Users can submit jobs to GridWay, and control and monitor their execution, through the DRMAA API. GridWay supports DRMAA v1, and GridWay 5.12 comes with a technology preview of DRMAA v2.

  • Create a DRMAA application, such as this one (ls_drmaa.c), which for the sake of simplicity does not include error checking:

#include <stdio.h>
#include <string.h>
#include <drmaa.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    drmaa_job_template_t   *jt;
    drmaa_attr_values_t    *rusage;
    char                   error[DRMAA_ERROR_STRING_BUFFER];
    char                   job_id[DRMAA_JOBNAME_BUFFER];
    char                   job_id_out[DRMAA_JOBNAME_BUFFER];
    char                   attr_value[DRMAA_ATTR_BUFFER];
    const char             *args[3] = {"-l", "-a", NULL};
    int                    stat;

    /* ---- Init DRMAA ---- */
    drmaa_init(NULL, error, DRMAA_ERROR_STRING_BUFFER-1);

    /* ---- Create the job template ---- */
    drmaa_allocate_job_template(&jt, error, DRMAA_ERROR_STRING_BUFFER);
    drmaa_set_attribute(jt, DRMAA_JOB_NAME, "ht2", error, DRMAA_ERROR_STRING_BUFFER);
    drmaa_set_attribute(jt, DRMAA_REMOTE_COMMAND, "/bin/ls", error, DRMAA_ERROR_STRING_BUFFER);
    drmaa_set_vector_attribute(jt, DRMAA_V_ARGV, args, error, DRMAA_ERROR_STRING_BUFFER);
    drmaa_set_attribute(jt, DRMAA_OUTPUT_PATH, "stdout."DRMAA_GW_JOB_ID, error, DRMAA_ERROR_STRING_BUFFER);
    drmaa_set_attribute(jt, DRMAA_ERROR_PATH, "stderr."DRMAA_GW_JOB_ID, error, DRMAA_ERROR_STRING_BUFFER);

    /* ---- Run a single job ---- */
    drmaa_run_job(job_id, DRMAA_JOBNAME_BUFFER-1, jt, error, DRMAA_ERROR_STRING_BUFFER-1);
    fprintf(stderr,"Job submitted ID: %s\n",job_id);

    /* ---- Wait until job execution ends ---- */
    drmaa_wait(job_id, job_id_out, DRMAA_JOBNAME_BUFFER-1, &stat, DRMAA_TIMEOUT_WAIT_FOREVER, &rusage, error, DRMAA_ERROR_STRING_BUFFER-1);
    drmaa_wexitstatus(&stat, stat, error, DRMAA_ERROR_STRING_BUFFER);
    fprintf(stderr,"Job finished with exit code %i, usage: %s\n", stat, job_id);

    /* ---- Get values from DRMAA string vector (start time, end time, etc.) ---- */
    while (drmaa_get_next_attr_value(rusage, attr_value, DRMAA_ATTR_BUFFER-1) != DRMAA_ERRNO_NO_MORE_ELEMENTS)
        fprintf(stderr,"\t%s\n", attr_value);

    /* ---- Destroy objects ---- */
    drmaa_release_attr_values(rusage);
    drmaa_delete_job_template(jt, error, DRMAA_ERROR_STRING_BUFFER-1);
    
    /* ---- Finalize DRMAA ---- */
    drmaa_exit(error, DRMAA_ERROR_STRING_BUFFER-1);

    return 0;
}
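
  • In real applications, every DRMAA call returns a code that should be checked against DRMAA_ERRNO_SUCCESS; a minimal sketch of the pattern, shown for drmaa_init():

    char error[DRMAA_ERROR_STRING_BUFFER];
    int  rc;

    rc = drmaa_init(NULL, error, DRMAA_ERROR_STRING_BUFFER-1);
    if (rc != DRMAA_ERRNO_SUCCESS)
    {
        fprintf(stderr, "drmaa_init() failed: %s\n", error);
        return 1;
    }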


  • Compile the code against the DRMAA v1 library:

$ gcc -L $GW_LOCATION/lib -I $GW_LOCATION/include -o ls_drmaa ls_drmaa.c -ldrmaa

  • Run the executable:

$ ./ls_drmaa
 Job submitted ID: 6
 Job finished with exit code 0, usage: 6
     start_time=12:53:54
     exit_time=12:54:24
     cpu_time=00:00:29
     xfr_time=00:00:00

  • Check the results in stdout.${JOB_ID}, as usual.

  • GridWay supports DRMAA v2 since version 5.12. Here is an example DRMAA2 program (ls_drmaa2.c):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "drmaa2.h"
#include <unistd.h>

int main(int argc, char *argv[])
{
    char          cwd[80]; 
    drmaa2_string jid;
    drmaa2_jstate jstate;
    drmaa2_string substate = (char*) malloc(50);
    drmaa2_string statestr;

    printf("==== Create a job session with given session name.\n");
    drmaa2_jsession js = drmaa2_create_jsession("mysession", NULL);

    printf("==== Creating the job template.\n");
    drmaa2_jtemplate jt = drmaa2_jtemplate_create();
    jt->jobName = strdup("ht");
    jt->remoteCommand = strdup("/bin/ls");
    getcwd(cwd, sizeof(cwd));
    jt->workingDirectory = strdup(cwd);
  
    jt->args=drmaa2_list_create(DRMAA2_STRINGLIST,DRMAA2_UNSET_CALLBACK);
    drmaa2_list_add(jt->args,"-l");
    drmaa2_list_add(jt->args,"-a");
    drmaa2_list_add(jt->args,"/tmp");

    jt->outputPath=strdup("stdout."DRMAA2_GW_JOB_ID);
    jt->errorPath =strdup("stderr."DRMAA2_GW_JOB_ID);

    printf("==== Submitting the job.\n");
    drmaa2_j job = drmaa2_jsession_run_job(js, jt); 
    jid = drmaa2_j_get_id(job);

    drmaa2_jinfo jinfo;  /* filled in by drmaa2_j_get_info() below */
    jstate = drmaa2_j_get_state(job,&substate);
    statestr = drmaa2_gw_strstatus(jstate);
    printf("     Job %s released.\n", jid);
    printf("     Job DRMAA2 state is: %s\n", statestr);
    printf("     Job Gridway substate is: %s\n",substate);
    printf("     Wait for job %s to finish.\n", jid);

    drmaa2_j_wait_terminated(job, DRMAA2_INFINITE_TIME);
    jinfo = drmaa2_j_get_info(job);
    printf("     Info about the job %s\n", jid);
    printf("\tjob->jobId=%s\n", jinfo->jobId);
    printf("\tjob->exitStatus=%d\n", jinfo->exitStatus);
    printf("\tjob->queueName=%s\n", jinfo->queueName);
    printf("\tjob->wallclockTime=%lld\n", (long long)jinfo->wallclockTime);
    printf("\tjob->cpuTime=%lld\n", jinfo->cpuTime);
    printf("\tjob->submissionTime=%lld\n", (long long)jinfo->submissionTime);
    printf("\tjob->dispatchTime=%lld\n", (long long)jinfo->dispatchTime);
    printf("\tjob->finishTime=%lld\n", (long long)jinfo->finishTime);

    drmaa2_jinfo_free(&jinfo);
    printf("==== Destroying job template and job session.\n");
    drmaa2_jtemplate_free(&jt);
    drmaa2_destroy_jsession("mysession");
    drmaa2_jsession_free(&js);
    printf("==== Exiting now.\n");
    return 0;
}
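
  • The example above also omits error handling. DRMAA2 C-binding calls signal failure through their return values (e.g. NULL pointers), and the standard binding specifies drmaa2_lasterror() / drmaa2_lasterror_text() for the reason; assuming GridWay's technology preview implements them, the pattern looks like:

    drmaa2_jsession js = drmaa2_create_jsession("mysession", NULL);
    if (js == NULL)
    {
        /* drmaa2_lasterror_text() comes from the DRMAA2 C binding spec */
        fprintf(stderr, "create_jsession failed: %s\n", drmaa2_lasterror_text());
        return 1;
    }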

  • Compile the code against the DRMAA v2 library:

$ gcc -L $GW_LOCATION/lib -I $GW_LOCATION/include -o ls_drmaa2 ls_drmaa2.c -ldrmaa2 

  • Run the executable:

$ ./ls_drmaa2
==== Create a job session with given session name.
==== Creating the job template.
==== Submitting the job.   
==== Your job has been submitted with id: 0
     Job 0 released.    
     Job DRMAA2 state is: DRMAA2_QUEUED
     Job Gridway substate is: GW_JOB_STATE_PENDING
     Wait for job 0 to finish.
     Info about the job 0.
        job->jobId=0
        job->exitStatus=0
        job->queueName=default
        job->wallclockTime=16
        job->cpuTime=2
        job->submissionTime=1353581783
        job->dispatchTime=1353581783
        job->finishTime=1353581799
==== Destroying job template and job session.
==== Exiting now.

  • Check the results in stdout.${JOB_ID}, as usual.

  6. BES Interface

  • Users can submit jobs to GridWay, and control and monitor their execution, through a BES interface, which is based on GridSAM.

  • First, delegate your credentials to a MyProxy server.
  • Create a simple JSDL document (called ls.jsdl), such as:

<jsdl:JobDefinition xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
                    xmlns:jsdl-posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
                    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <jsdl:JobDescription>
  <jsdl:JobIdentification>
   <jsdl:JobName>ls</jsdl:JobName>
  </jsdl:JobIdentification>
  <jsdl:Application>
   <jsdl-posix:POSIXApplication>
    <jsdl-posix:Executable>/bin/ls</jsdl-posix:Executable>
    <jsdl-posix:Output>stdout</jsdl-posix:Output>
    <jsdl-posix:Error>stderr</jsdl-posix:Error>
   </jsdl-posix:POSIXApplication>
  </jsdl:Application>
  <jsdl:DataStaging>
   <jsdl:FileName>stdout</jsdl:FileName>
   <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
   <jsdl:Target>
    <jsdl:URI>gsiftp://gridway.fdi.ucm.es/<path_to_your_home>/output.txt</jsdl:URI>
   </jsdl:Target>
  </jsdl:DataStaging>
  <jsdl:DataStaging>
   <jsdl:FileName>stderr</jsdl:FileName>
   <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
   <jsdl:Target>
    <jsdl:URI>gsiftp://gridway.fdi.ucm.es/<path_to_your_home>/error.txt</jsdl:URI>
   </jsdl:Target>
  </jsdl:DataStaging>
 </jsdl:JobDescription>

 <MyProxy xmlns="urn:gridsam:myproxy">
  <ProxyServer>gridway.fdi.ucm.es</ProxyServer>
  <ProxyServerPort>7512</ProxyServerPort>
  <ProxyServerUserName>your_username</ProxyServerUserName>
  <ProxyServerPassPhrase>your_pass</ProxyServerPassPhrase>
  <ProxyServerLifetime>10000</ProxyServerLifetime>

 </MyProxy>

</jsdl:JobDefinition>


  • Users just need a BES client, such as the one provided by GridSAM.
  • Submit a single job:

$ ./gridsam.sh BESCreateActivity -s "https://gridway.fdi.ucm.es:8443/gridsam/services/bes" -j ls.jsdl > job_id

$ cat job_id
  <?xml version="1.0" encoding="UTF-8"?>
  <EndpointReference xmlns="http://www.w3.org/2005/08/addressing">
  ...<ID>urn:gridsam:13e0999c3848481f0138484838b80001</ID>...
  </EndpointReference>

  • Follow the evolution of the job:

$ watch -n 5 ./gridsam.sh BESGetActivityStatuses -s "https://gridway.fdi.ucm.es:8443/gridsam/services/bes" -file job_id
  ...
  <?xml version="1.0" encoding="UTF-8"?>
   <GetActivityStatusesResponse xmlns="http://schemas.ggf.org/bes/2006/08/bes-factory">
   ...<ActivityStatus state="Running"/>...</GetActivityStatusesResponse>

  • Once execution is done, it is time to check the results, which the JSDL above stages out to output.txt and error.txt.

  7. Monitoring Web Interface

  • Jobs and resources can also be monitored through the GridWay Web interface.

  • Monitoring information about the IGE GridWay instance is available at http://gridway.fdi.ucm.es:8080/gwmap.

  • Information about hosts can be shown by clicking on the “Hosts” tab of the Web interface.

  • Information about jobs can be shown by clicking on the “Jobs” tab.

  • The Web interface also shows job histories and registered users under the “History” and “Users” tabs, respectively.

  8. Live demos

  • Submitting an MPI job through the GridWay CLI:

GridWay on IGE CLI demo


  • Monitoring resources and jobs through the GridWay Web interface:

GridWay on IGE web demo