Skip to content

The GridFTP benchmark, test and transfer script.

License

Notifications You must be signed in to change notification settings

fscheiner/tgftp

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

61 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Tgftp - The GridFTP benchmark, test and transfer script

Contents

  1. Synopsis
  2. Features
    2.1 Logfiles
    2.2 Batchmode testing
    2.3 Pre/post commands
    2.4 Auto-tuning
    2.5 Connection test
  3. tgftp usage examples

(1) Synopsis

The tgftp tool is a wrapper script for globus-url-copy (guc) to ease testing and documentation of GridFTP performance for specific connections. The generated guc command, its output and the reached performance are logged to a file named like the following:

yyyymmdd_H:Mh_$$_testgftp.sh.log

$$ expands to the PID of the tgftp command.

Additionally the output of the guc command is put to screen.

NOTICE: The performance (transfer rate) is printed out to screen. Because this tool includes the time needed for establishing the connection (total time needed for the transfer), the transfer rates may vary from the rates that the guc command prints out - especially for short transfers. Tgftp either uses the given length/size of the transfer or determines it itself, converts it to MByte and divides this by the time the generated guc command takes to complete. tgftp uses integer devision, hence the result may be inaccurate by +/- 1 MByte. But this precision should be enough for 1 GBit/s or 10 GBit/s networks.

Detailed help is provided by the help output and the manpage of the script.


(2) Features

(2.1) Logfiles

Tgftp logs the following attributes of a test:

  • the created guc command
  • the output of the guc command
  • the start/end timestamp for a test
  • the transfer rate for a test (if successful)
  • the error message (if not successful)

Additionally a comment can be inserted for each test. The logfiles basically look like XML files, as they use XML like tags that enclose the attribute values. An example logfile output is shown below in (3).

The tgftp distribution also includes a small helper script that can extract attribute values from tgftp logfiles for later evaluation and can be used for automatic evaluations. It's called tgftp_log and detailed help is provided by the help output of the script and its manpage.

(2.2) Batchmode testing

Tgftp supports batch mode testing, meaning predefined test cases are tested one after another. A batchfile is a table containing multiple lines with tgftp parameters. The first line has to contain a shebang of the following form:

#%tgftp%v<VERSION_NUMBER>

<VERSION_NUMBER> is the three digits (separated by two .s) number tgftp will print out if called with --version. This shebang has to match the version of tgftp that is used.

Each non empty line not starting with a # is evaluated as follows, from left to right (each field separated by a ;):

source Enter a valid source address for GridFTP.

target Enter a valid target address for GridFTP.

gsiftp-params Determine the guc parameters to be used for the test.

connection-test Enter "yes" to activate and "no" to deactivate. This overwrites all other parameters (except source and target with the default values for a connection test).

timeout Enter the time in seconds after tgftp kills the guc command.

log-filename Enter the filename for the logfile.

log-comment Enter a comment for the logfile. This can also be a command like cat PRE_COMMAND_OUTPUT.txt, which enables to include output of the pre-command in the logfile (for example network params of the target system or traceroute output, etc.). As the string is evaluated by the shell, text only comments - meaning no command should be called, have to be preceded by #!

pre-command Enter the filename of the command to be executed before the test (command must be executable and path must be included).

post-command Enter the filename of the command to be executed after the test (command must be executable and path must be included).

execs Determine how often the test should be executed.

EXAMPLE:

#%tgftp%v0.5.0
#  source;target;gsiftp-params;connection-test;timeout;log-filename;log-comment;pre-command;post-command;execs
#  GridFTP tests on GridFTP sites
#  GridFTP from Site-a to Site-b
gsiftp://gridftp-host.site-a/dev/zero;gsiftp://gridftp-host.site-b/dev/null;-vb -p 16 -tcp-bs 4M -len 4G;no;30;;"# Using the internet for the transfer: Site-a to Site-b";;;
[...]

Lines started with # are evaluated as comments, hence this can be used for infile documentation.

For detailed help on tgftp's batchmode use the --help-batchfile parameter. An exemplary batchfile is shipped with the tgftp distribution. When used in conjunction with --batchfile tgftp will parse each line and initiate one test per line by default. Each test gets its own logfile named like the following:

<BATCHFILE_NAME>_#<TEST>_<EXEC>.log

With <TEST> being the number of the test and <EXEC> being the number of the execution run. If the field execs is set to a value greater than 1, a test is executed multiple times.

(2.3) Pre/post commands

With the --pre-command/--post-command parameters one can provide a command or multiple commands in a script that will be executed before/after a test. In combination with the --log-comment parameter it's possible to include test specific information in a logfile. For example network parameters, host names etc.. This is done by specifying a command as log-comment that for example cats a file created by a pre-command. The mentioned script/command after the --pre-command/--post-command parameters has to be executable and also has to include the full path to the script/command - if not in $PATH. tgftp's help output provides details about the usage of these parameters.

(2.4) Auto-tuning

Based on the batchmode functionality tgftp provides a function to automatically test a specific connection and determine the best performing guc parameter configuration for this specific connection. In total tgftp will execute 370 tests with 37 different parameter configurations (10 tests each) and after finished, print out the best performing parameter configuration. During each test 4 GByte of binary zeroes are moved from the source's memory to the target's memory.

NOTICE: Due to the boundary conditions, like workload on GridFTP machines, network utilization and especially solar flares, repeated auto tuning tests might yield different parameter configurations. This is also because an auto tuning test lasts very long (depending on the performance up to ~3 hours!) and one cannot use the network exclusively. Additionally some parameters have a smaller effect on performance than others. If multiple parameter configurations yield very similar results (+/- 19 MB/s), the first one wins, as another configuration is only evaluated as better, if it yields at least 20 MB/s more performance. It's likely that parameter configurations that yield similar performance are not independent of each other, meaning the differing parameter has only a small effect on performance. Hence the resulting best performing parameter configuration might vary between these dependent configurations. In addition this function only tests memory to memory transfers, file to file transfers can look different but should basically benefit from the same guc parameters. The recommendation is to make some educated guesses and test these manually in batch mode. Depending on the experience of the tester this could yield similar performance but require less time.

(2.5) Connection test

tgftp can also initiate so-called connection tests. During these tests only one byte is transferred from source to target and all debug output for guc is enabled. This feature comes in handy in case of firewall and port range related problems with GridFTP transfers.


(3) tgftp usage examples

Help and usage output is provided by issuing:

$ tgftp [--help]

EXAMPLE(S):

#1 This example is a memory to memory (mem2mem) test on a local machine that will transfer 4 GBytes of zeroes from memory to memory via a local GridFTP service listening on port 2812.

$ tgftp -s file:///dev/zero -t gsiftp://localhost:2812/dev/null -- -len 4G

87 MB/s

Please see "20110112_08:42h_10051_testgftp.sh.log" for details.

The created logfile looks like the following:

<GSIFTP_TRANSFER_LOG_COMMENT>

</GSIFTP_TRANSFER_LOG_COMMENT>
<GSIFTP_TRANSFER_COMMAND>
globus-url-copy -len 4G file:///dev/zero gsiftp://localhost:2812/dev/null
</GSIFTP_TRANSFER_COMMAND>
<GSIFTP_TRANSFER_COMMAND_OUTPUT>

</GSIFTP_TRANSFER_COMMAND_OUTPUT>
<GSIFTP_TRANSFER_START>
2011-01-12_08:42:37
</GSIFTP_TRANSFER_START>
<GSIFTP_TRANSFER_END>
2011-01-12_08:43:24
</GSIFTP_TRANSFER_END>
<GSIFTP_TRANSFER_RATE>
87 MB/s
</GSIFTP_TRANSFER_RATE>

GSIFTP_TRANSFER_COMMAND_OUTPUT is empty, as no -vb parameter was specified and guc then only prints out errors. The same goes for GSIFTP_TRANSFER_LOG_COMMENT as there was no comment specified for the logfile with --log-comment. To later retrieve the transfer rate from a tgftp logfile, one can use the tgftp_log tool as follows:

$ tgftp_log --get-attribute GSIFTP_TRANSFER_RATE --file 20110112_08:42h_10051_testgftp.sh.log

87 MB/s

tgftp_log can extract data from each valid attribute. To extract the contents of a specific attribute use the corresponding tag (without <, > or /). The possible attributes are listed in the help output of tgftp_log.

#2 This example will determine the congestion protocol that is used locally (on a Linux system!) before starting the test and include the output of the pre-command in the comment of the logfile.

$ tgftp \
-s file:///dev/zero \
-t gsiftp://gridftp-host.domain:2811/dev/null \
--log-comment 'cat PRE_COMMAND_OUTPUT.txt' \
--pre-command 'sysctl -n net.ipv4.tcp_congestion_control > PRE_COMMAND_OUTPUT.txt' \
-- \
-len 1M

The created logfile looks like the following:

<GSIFTP_TRANSFER_LOG_COMMENT>
cubic
</GSIFTP_TRANSFER_LOG_COMMENT>
<GSIFTP_TRANSFER_COMMAND>
globus-url-copy -len 1M file:///dev/zero gsiftp://gridftp-host.domain:2811/dev/null
</GSIFTP_TRANSFER_COMMAND>
<GSIFTP_TRANSFER_COMMAND_OUTPUT>

</GSIFTP_TRANSFER_COMMAND_OUTPUT>
<GSIFTP_TRANSFER_START>
2011-01-12_09:33:28
</GSIFTP_TRANSFER_START>
<GSIFTP_TRANSFER_END>
2011-01-12_09:33:31
</GSIFTP_TRANSFER_END>
<GSIFTP_TRANSFER_RATE>
0 MB/s
</GSIFTP_TRANSFER_RATE>

This example shows, how so-called pre-commands can be used to create specific comments for the logfiles (this comes especially handy when used in tgftp's batchmode). The example's logfile also shows that the transfer of small amounts of data leads to unexact values for the transfer rate. In this example run the transfer rate could have been more than 0 MB/s and less than 1 MB/s.

#3 This example will initiate multiple tests (one after another) with the parameter values included in tgftp_tests.csv.

$ tgftp -f tgftp_tests.csv

Please follow (2.2) for details about batchmode and batchfiles.

About

The GridFTP benchmark, test and transfer script.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Shell 91.1%
  • Roff 8.9%