
Transformation System Tutorial

Daniela Bauer edited this page Jul 10, 2018 · 60 revisions

This tutorial illustrates how the Transformation System can be used to execute a workflow composed of several steps in a fully data-driven manner. The workflow used here (see section 1.) is relatively simple, but more complex workflows can be handled by applying the same main concepts. This tutorial is a practical, user-oriented guide; for a more detailed description of the Transformation System architecture, service installation and other usage examples, we suggest reading the following documentation:

http://dirac.readthedocs.io/en/latest/AdministratorGuide/Systems/Transformation/index.html

In section 2., you will find all practical instructions needed for the tutorial.

1. Workflow description

The workflow to be executed in this tutorial is based on the mandelbrot application, which creates bitmap images of a Mandelbrot set.

The workflow is composed of several steps, and the final result is a black and white bitmap image of 7680 x 4200 pixels representing a Mandelbrot set. Each step of the workflow is realized by a transformation, which consists of several jobs. Here is a brief description of the different steps.

1. The first transformation (image slices production) creates several jobs, each one producing an image slice of 200 lines. Producing the whole image (4200 lines) therefore requires 4200 / 200 = 21 jobs. These jobs execute the mandelbrot application with identical parameters except the line number parameter -L, which varies from 1 to 21:

    ./mandelbrot.py -P 0.0005 -M 1000 -L 00i -N 200

where:

  • P is the "precision"
  • M is the number of iterations
  • L is the first line of the image
  • N is the number of lines to compute in the job

Each job produces a data_00i*200.txt ASCII file, which is registered in the File Catalog.

2. The second transformation (image slices merging) merges the results of the first transformation, grouping the files by 7 and thus producing 3 merged files (21 / 7 = 3):

    ./merge_data.py

Input files are automatically downloaded at the beginning of each job, and the application merges all files available in the current directory. Each job produces a merged_data_*.txt file.

3. The third transformation (image building) produces the final bitmap image starting from the merged files produced in the previous step:

    ./build_merged_img.py

Input files are automatically downloaded at the beginning of each job, and the application builds an image from all files available in the current directory. The output image is in bitmap format.

4. Finally, the fourth transformation (removal) removes the intermediate files produced by the transformations in steps 1. and 2.
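The numbers used in the four steps fit together as follows. This is an illustrative sketch, not part of the tutorial material: it builds the 21 argument strings for the slice jobs and derives how many merged files the second step produces. Reading the `00i` placeholder as the slice index zero-padded to three digits is an assumption.

```python
# Illustrative sketch; the parameter values are taken from the workflow description.
IMAGE_HEIGHT = 4200    # total lines in the final image
LINES_PER_JOB = 200    # value passed with -N
FILES_PER_MERGE = 7    # step 2 groups slice files by 7


def slice_arguments(precision=0.0005, max_iter=1000):
    """Return one mandelbrot.py argument string per image slice (step 1)."""
    if IMAGE_HEIGHT % LINES_PER_JOB:
        raise ValueError("IMAGE_HEIGHT must be a multiple of LINES_PER_JOB")
    n_jobs = IMAGE_HEIGHT // LINES_PER_JOB
    # Assumption: the "00i" placeholder is the slice index padded to 3 digits.
    return ["-P %s -M %d -L %03d -N %d" % (precision, max_iter, i, LINES_PER_JOB)
            for i in range(1, n_jobs + 1)]


def workflow_sizes():
    """Return (slice jobs, merged files, lines per merged file)."""
    n_slices = IMAGE_HEIGHT // LINES_PER_JOB             # jobs in step 1
    n_merged = n_slices // FILES_PER_MERGE               # jobs in step 2
    lines_per_merged = FILES_PER_MERGE * LINES_PER_JOB   # height of a merged file
    return n_slices, n_merged, lines_per_merged


if __name__ == "__main__":
    args = slice_arguments()
    print(len(args), args[0])
    print(workflow_sizes())
```

Running it prints the number of slice jobs with the first argument string, then `(21, 3, 1400)`: 21 slices, 3 merged files, and 1400 lines per merged file, which is the image_height used later in the image building query.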

2. Practical information

For this tutorial we will use a Testbed DIRAC instance. To access this instance, install the user certificate that you received at the beginning of the tutorial both on your client and in your browser, then follow the instructions below. The web portal for Job and Transformation monitoring is:

https://cctbdirac01.in2p3.fr/DIRAC/

2.1 Client installation

  • Retrieve the tutorial material from:

      git clone https://github.com/arrabito/DIRAC_TS_Tutorial
    
  • Convert the .p12 user certificate that you have received, using the dirac-cert-convert.sh script.

  • Install the dirac client:

      wget --no-check-certificate https://github.com/DIRACGrid/DIRAC/raw/master/Core/scripts/dirac-install.py
      python dirac-install.py -r v6r19p20 -v --no-lcg-bundle -e COMDIRAC
      source bashrc # (or source cshrc)
      dirac-proxy-init -x
      dirac-configure -S Dirac-Test -C dips://cctbdirac01.in2p3.fr:9135/Configuration/Server
      dirac-proxy-init
    
  • To use COMDIRAC, customize the dcommands.conf file and copy it into your $HOME/.dirac directory (see COMDIRAC).

2.2 Transformations creation and monitoring

Before creating the actual transformations you can submit a simple mandelbrot job and inspect the result:

    python submit_wms.py 1

This job is similar to those that will be created by the first transformation. Now you can start creating the transformations to execute the whole workflow. Thanks to the data-driven mechanism, you can create the first 3 transformations all together, without waiting for the previous one to be completed. Only the last one, which removes all the intermediate data, must of course be launched after the third transformation has completed (all tasks 'Done'). You should slightly customize the submission scripts submit_ts_step1.py to submit_ts_step4.py, setting the owner variable to your DIRAC username, e.g.:

    ########################################
    # Modify here with your username
    owner = 'user02'
    ########################################

This is needed to distinguish the data produced by the different participants during the tutorial. For the same reason, we have introduced a special "owner" metadata field to be used in the File Catalog queries associated with the transformations.
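Why the first three transformations can be created at once can be illustrated with a toy model: each transformation watches for new files matching its input query and creates a task as soon as it has a full group of inputs. The class below is hypothetical plain Python, not the DIRAC API; the group size of 7 mirrors the merging step, and the file names are illustrative only.

```python
class ToyTransformation:
    """Hypothetical model of a data-driven transformation (not the DIRAC API)."""

    def __init__(self, name, group_size):
        self.name = name
        self.group_size = group_size
        self.pending = []   # input files not yet assigned to a task
        self.tasks = []     # each task is a list of input files

    def add_input(self, lfn):
        """Called when a new file matching the input query appears."""
        self.pending.append(lfn)
        # Create a task as soon as a full group of inputs is available.
        if len(self.pending) == self.group_size:
            self.tasks.append(self.pending)
            self.pending = []


def simulate(n_slices=21, group_size=7):
    """Feed slice files one by one and record the task count after each."""
    merge = ToyTransformation("image slices merging", group_size)
    history = []
    for i in range(1, n_slices + 1):
        # Step-1 jobs finish one by one; each output becomes visible to step 2.
        merge.add_input("data_%03d.txt" % i)
        history.append(len(merge.tasks))
    return merge, history


if __name__ == "__main__":
    merge, history = simulate()
    print(history)
    print([len(task) for task in merge.tasks])
```

Running it shows that the first merge task appears as soon as the seventh slice file is available, long before all 21 slice jobs have finished. How a real grouping plugin treats an incomplete final group is not modelled here.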

2.2.1 Image slices production

  • Edit submit_ts_step1.py, change the owner variable and look at the different sections (Job description, Transformation definition and submission). Observe the metadata characterising the output data:

    outputMetadata = json.dumps( {"application":"mandelbrot","image_format":"ascii", "image_width":7680, "image_height":200, "owner":owner} )
    
  • Submit the transformation:

    python submit_ts_step1.py
    
  • Go to the TransformationMonitor on the web portal: https://cctbdirac01.in2p3.fr/DIRAC/. You should see your transformation (and also those of the other participants). The transformation is created, but there are no associated jobs yet. Click on the transformation and open Action/Extend from the context menu. Here you can choose how many jobs your transformation will be composed of: extend the transformation by 21. Observe the status changes in the different columns of your transformation (refresh by clicking the Submit button). When tasks are in Submitted status, you can also click on Show Jobs to display the individual jobs. Note that, since jobs are submitted with the Production Shifter identity, you should remove the 'Owner' selection from the JobMonitor to display the jobs.

2.2.2 Image slices merging

  • Edit submit_ts_step2.py and observe how input data are attached to the transformation:

    inputMetaquery = json.dumps( {"application":"mandelbrot","image_format":"ascii", "image_width":7680, "image_height":200, "owner":owner} )
    t.setFileMask(inputMetaquery)
    

    Also observe which metadata are attached to the output data.

  • Submit the transformation:

    python submit_ts_step2.py 
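The chaining between steps relies on the output metadata of one transformation matching the input metaquery of the next. The sketch below models this with exact equality on every key of the query only (File Catalog metaqueries can express more than that); the dictionaries are copied from the submission scripts for steps 1 to 3, with owner set to 'user02' as in the example, and `matches` is a hypothetical helper.

```python
import json

owner = "user02"

# Output metadata of step 1 and input metaqueries of steps 2 and 3,
# encoded as JSON strings like in submit_ts_step1/2/3.py.
step1_output = json.dumps({"application": "mandelbrot", "image_format": "ascii",
                           "image_width": 7680, "image_height": 200, "owner": owner})
step2_query = json.dumps({"application": "mandelbrot", "image_format": "ascii",
                          "image_width": 7680, "image_height": 200, "owner": owner})
step3_query = json.dumps({"application": "mandelbrot", "image_format": "ascii",
                          "image_width": 7680, "image_height": 1400, "owner": owner})


def matches(file_metadata, metaquery):
    """True if every key of the query has the same value in the file metadata.

    Simplified model: only equality is checked, no ranges or operators.
    """
    meta = json.loads(file_metadata)
    query = json.loads(metaquery)
    return all(meta.get(key) == value for key, value in query.items())


if __name__ == "__main__":
    print(matches(step1_output, step2_query))   # slice files feed step 2
    print(matches(step1_output, step3_query))   # but not step 3 (height differs)
    other = json.loads(step1_output)
    other["owner"] = "user03"                   # another participant's file
    print(matches(json.dumps(other), step2_query))
```

The last check shows the role of the "owner" key: without it, every participant's transformation would also pick up the slice files produced by the others.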
    

2.2.3 Image building

  • Edit submit_ts_step3.py and observe how input data are attached to the transformation:

    inputMetaquery = json.dumps( {"application":"mandelbrot","image_format":"ascii", "image_width":7680, "image_height":1400, "owner":owner} )
    t.setFileMask(inputMetaquery) 
    

    Also observe which metadata are attached to the output data.

  • Submit the transformation:

    python submit_ts_step3.py 
    

2.2.4 Monitoring and result retrieval

  • Monitor the progress of the 3 transformations from the TransformationMonitor (refresh by clicking the Submit button). You may need to increase the number of transformations shown per page (25 by default) and/or sort the table by id, so that newer transformations, which have higher ids, are shown at the top.

  • Browse the File Catalog to look at your produced files (using COMDIRAC or the File Catalog client directly):

    dls mandelbrot/images/
    
  • Observe the metadata associated with your produced files:

    dmeta ls mandelbrot/images/raw
    !image_width : 7680
    !image_height : 200
    !application : mandelbrot
    !owner : user02
    !image_format : ascii
    
  • When the Image building transformation (step 3) is completed you can retrieve the final image:

    dget mandelbrot/images/final/merged_image.bmp
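Once dget has downloaded merged_image.bmp, you may want to check its size before opening it. A minimal standard-library sketch reads the width and height from the BMP header; it assumes a BITMAPINFOHEADER (or later) header, where both values are little-endian signed 32-bit integers at byte offsets 18 and 22. The expected size 7680 x 4200 comes from the workflow description, and the script name check_bmp.py is hypothetical.

```python
import struct
import sys


def parse_bmp_header(header):
    """Return (width, height) from the first 26 bytes of a BMP file.

    Assumes a BITMAPINFOHEADER (or later) DIB header, where width and
    height are little-endian signed 32-bit integers at offsets 18 and 22.
    """
    if len(header) < 26 or header[:2] != b"BM":
        raise ValueError("not a BMP file")
    width, height = struct.unpack("<ii", header[18:26])
    # A negative height means the rows are stored top-down.
    return width, abs(height)


def bmp_size(path):
    """Read the header of the BMP file at `path` and return its size."""
    with open(path, "rb") as f:
        return parse_bmp_header(f.read(26))


if __name__ == "__main__":
    # Usage: python check_bmp.py merged_image.bmp
    for path in sys.argv[1:]:
        size = bmp_size(path)
        status = "OK" if size == (7680, 4200) else "unexpected size"
        print(path, size, status)
```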
    

2.2.5 Remove intermediate files

  • Edit submit_ts_step4.py and observe the Type and the Body of the transformation. In this case, no jobs are submitted to the WMS; instead, removal requests are submitted to the Request Management System.

  • Submit the transformation:

    python submit_ts_step4.py 
    
  • When this transformation is completed (all tasks 'Done'), check the File Catalog again.
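Which files the removal step should target can be reasoned about with the metadata shown earlier: both intermediate steps produce ASCII files, while the final image is a bitmap. The sketch below is a hypothetical illustration, not the query actually used in submit_ts_step4.py: the catalog entries, the mandelbrot/images/merged path and the final image's image_format value are assumptions. Leaving image_height out of the query selects the outputs of both step 1 and step 2.

```python
owner = "user02"

# Hypothetical catalog content as (LFN, metadata) pairs; the merged-file path
# and the final image's "image_format" value are assumptions for illustration.
catalog = [
    ("mandelbrot/images/raw/data_001.txt",
     {"application": "mandelbrot", "image_format": "ascii", "image_width": 7680,
      "image_height": 200, "owner": owner}),
    ("mandelbrot/images/merged/merged_data_1.txt",
     {"application": "mandelbrot", "image_format": "ascii", "image_width": 7680,
      "image_height": 1400, "owner": owner}),
    ("mandelbrot/images/final/merged_image.bmp",
     {"application": "mandelbrot", "image_format": "bmp", "image_width": 7680,
      "image_height": 4200, "owner": owner}),
]

# No image_height key: matches slice files (200) and merged files (1400) alike.
removal_query = {"application": "mandelbrot", "image_format": "ascii", "owner": owner}


def select(entries, query):
    """Return the LFNs whose metadata equal the query on every query key."""
    return [lfn for lfn, meta in entries
            if all(meta.get(key) == value for key, value in query.items())]


print(select(catalog, removal_query))
```

After the removal transformation completes, only files outside such a selection (here, the final bitmap) should remain in your part of the catalog.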

3. Setting up your own server to run this tutorial on

This is described on a separate wiki page.
