This software provides a string token parser, useful in cases where a fixed a priori template string is to be resolved at run time by some process.
pftag
is a simple app that is both a stand alone client as well as a
python module. Its main purpose is to parse template strings. A
template string is one where sub-parts of the string are tokenized by
a token marker. These tokens are resolved at execution time.
From a taxonomy perspective, pftag
is an example of a string-based
(somewhat opinionated) SGMLish parser.
For on the metal installations, pip
it:
pip install pftag
docker pull fnndsc/pftag
To use pftag
in script mode simply call the script with appropriate
CLI arguments
pftag --tag "run-%timestamp-on-%platform-%arch.log"
run-2023-03-10T13:41:58.921660-05:00-on-Linux-64bit-ELF.log
There are several ways to use pftag
in python module mode. Perhaps
the simplest is just to declare an object and instantiate with an empty
dictionary, and then call the object with the tag
to process.
If additional values need to be set in the declaration, use an
appropriate dictionary. The dictionary keys are identical to the
script CLI keys (sans the leading --
):
from pftag import pftag
str_tag:str = r'run-%timestamp-on-%platform-%arch.log'
tagger:pftag.Pftag = pftag.Pftag({})
d_tag:dict = tagger(str_tag)
# The result is in the
print(d_tag['results'])
The set of CLI arguments can also be passed in a dictionary of
{
"CLIkey1": "value1",
"CLIkey2": "value2",
}
--tag <tagString>
The tag string to process.
[--lookupDictAdd <listOfDictionaryString>]
A string list of additional named lookup dictionary tags and values to
add.
[--tagMarker <mark>]
The marker string that identifies a tag (default "%")
[--funcMarker <mark>]
The marker string that pre- and post marks a function (default "_").
[--funcArgMarker <mark>]
The marker string between function arguments and also between arg list
and function (default "|").
[--funcSep <mark>]
The marker string separating successive function/argument constructs
(default ",").
[--inputdir <inputdir>]
An optional input directory specifier. Reserverd for future use.
[--outputdir <outputdir>]
An optional output directory specifier. Reserved for future use.
[--man]
If specified, show this help page and quit.
[--verbosity <level>]
Set the verbosity level. The app is currently chatty at level 0 and level 1
provides even more information.
[--debug]
If specified, toggle internal debugging. This will break at any breakpoints
specified with 'Env.set_trace()'
[--debugTermsize <253,62>]
Debugging is via telnet session. This specifies the <cols>,<rows> size of
the terminal.
[--debugHost <0.0.0.0>]
Debugging is via telnet session. This specifies the host to which to connect.
[--debugPort <7900>]
Debugging is via telnet session. This specifies the port on which the telnet
session is listening.
Additional tag lookup structures can be added with either the CLI or directly using the python API, for example:
# CLI pftag --lookupDictAdd '[{"credentials": {"user": "Jack Johnson", "password": "123456"}}]' \ --tag "At time %timestamp, user '%user' has password '%password'."
or equivalently in python:
from pftag import pftag
# Declare the tag processor
tagger:pftag.Pftag = pftag.Pftag({})
# Add the "credentials" lookup
status:bool = tagger.lookupDict_add(
[
{"credentials":
{
"user": "Jack Johnson",
"password": "1234567"
}
}
]
)
str_tag:str = r"At time %timestamp, user '%user' has password '%password'."
# and... run it!
d_tag:dict = tagger.run(str_tag)
if d_tag['status']: print(d_tag['result'])
both should result in something similar to:
At time 2023-04-28T11:36:19.448559-04:00, user 'Jack Johnson' has password '1234567'.
For kicks, let’s hash the password to 10 chars:
# CLI pftag --lookupDictAdd '[{"credentials": {"user": "Jack Johnson", "password": "123456"}}]' \ --tag "At time %timestamp, user '%user' has password '%password' with password hash %password_md5|10_."
resulting in
At time 2023-04-28T11:57:45.217532-04:00, user 'Jack Johnson' has password '123456' with password hash e10adc3949.
The following tags are internal/reserved:
%literal : simply replace the tag with the word 'literal'. This tag is only useful in conjunction with the 'echo' function and together they provide a means to inject arbitary text typically for md5 hashing. %name : return the os.name %platform : return the platform.system() %release : return the platform.release() %machine : return the platform.machine() %arch : return the '%s' % platform.architecture() %timestamp : return the current timestamp
The lookup from any tagged string can be further processed by the following functions
md5|<chars> : perform an md5hash on the upstream, limit result to <chars> characters eg: "%timestamp_md5|4_" replace the %timestamp in the input string with an md5 hash of 4 chars of the actual timestamp. chrplc|<t>|<n> : replace <t> with <n> in the upstream input. eg: "%timestamp_chrplc|:|-_" replace the %timestamp in the input string with the actual timestamp where all ":" are replaced with "-". strmsk|<mask> : for each '*' in mask pattern use ups tream char otherwise replace with <mask> char . eg: "%platform_strmsk|l****_" replace the %platform in the input string with a string that starts with an 'l' and don't change the subsequent 4 characters. If the %platform has more than 4 characters, only return the 5 chars as masked. dcmname|<s>|<tail> : replace any upstream %VAR with a DICOM formatted name. If <s> is passed, the seed the faker module with <s> (any string) -- this guarantees that calls with that same <s> result in the same name. If <tail> is passed, then append <tail> to the name. eg: %NAME_dcmname_ may produce "BROOKS^JOHN". Each call will have a different name. However, %NAME_dcmname|foobar_ will always generate "SCHWARTZ^THOMAS". While %NAME_dcmname|foobar|^ANON will generate "SCHWARTZ^THOMAS^ANON" echo|<something> : Best used with the %literal tag for legibility, will replace the tag with <something>. Be careful of commas in the <something>. If they are to be preserved you will need to set --funcSep to something other than a comma. %literal_echo|why-are-we-here?_ will replace the %literal with "why-are-we-here". This is most useful when literal data is to obscured in a template. For instance: %literal_echo|Subject12345,md5|5_ where say "Subject12345" is privileged information but important to add to the string. In this case, we can add and then hash that literal string. In future, if we know all the privileged strings, we can easily hash and then and lookup in any `pftag` generated strings to resolve which hashes belong to which subjects.
In addition to performing a lookup on a template string token, this package can also process the lookup value in various ways. These process functions follow a Reverse Polish Notation (RPN) schema of
tag func1(args1) func2(args2) func3(args3) ...
which reading from left to right is taken as a heap from top to bottom:
tag func1(args1) func2(args2) func3(args3)
where first the <tag>
is looked up, then this lookup is processed by
<func1>
. The result is then processed by <func2>
, and so on and
so forth, each functional optionally with a set a arguments. This RPN
approach also mirrors the standard UNIX piping schema.
A function (or function list) that is to be applied to a <tag>
is
connected to the tag with a <funcMarker>
string, usually ’_’. The
final function should end with the same <funcMarker>
, so
%tag_func1,func2,...,funcN_
will apply the function list in order to the tag value lookup called “tag”; each successive evaluation consuming the result of its predecessor as input.
Some functions can accept arguments. Arguments are passed to a function
with a <funcArgMarker>
string, typically |
, that also separates
arguments:
%tag_func|a1|a2|a3_
will pass a1
, a2
, and a3
as parameters to “func”.
Finally, several functions can be chained within the _
…_
by
separating the <func>|<argList>
constructs with commas, so
pedantically
%tag_func1|a1|a2|a3,func2|b1|b2|b3_
All these special characters (tag marker, function pre- and post, arg separation, function separation) can be overriden. For instance, with a selection of
--tagMarker "@" --funcMarker "[" --funcArgMarker "," --funcSep "|"
strings can be specified as
@tag[func,a1,a2,a3|func2,b1,b2,b3[
where preference/legibilty is left to the user.
To debug, the simplest mechanism is to trigger the internal remote
telnet session with the --debug
CLI. Then, in the code, simply add
Env.set_trace()
calls where appropriate. These can remain in the
codebase (i.e. you don’t need to delete/comment them out) since they are
only live when a --debug
flag is passed.
Run unit tests using pytest
.
# In repo root dir:
pytest
-30-