High-performance filtering of to-be-logged XML. Reads, filters, formats and writes XML in a single step, drastically increasing throughput. Typical use-cases:
- Filter sensitive values from logs
  - technical details such as passwords
  - sensitive personal information, e.g. for GDPR compliance
- Remove big elements (e.g. base64-encoded binary data) from logs
  - low or no informational value
  - unnecessary consumption of log accumulation tool resources
In a typical bare-bones system, this could translate to something like 5-10% overall performance improvement.
Features:
- High-performance filtering of XML
- Max text and/or CDATA node sizes
- Anonymization of element and/or attribute contents
- Removal of subtrees
- Indenting (pretty-printing) for use in testing
- Custom processors
- SOAP header filter
- Support for popular frameworks
  - CXF
  - JAX-RS
- Examples
  - Spring
  - CXF
The processors have all been validated to handle valid documents using the latest W3C XML test suite.
Bugs, feature suggestions and help requests can be filed in the issue tracker.
The project is built with Maven and is available on the central Maven repository.
<dependency>
    <groupId>com.github.skjolber.xml-log-filter</groupId>
    <artifactId>xml-log-filter-core</artifactId>
    <version>1.0.8</version>
</dependency>
See individual sub-modules for detailed usage instructions and examples.
Configuring
factory.setMaxTextNodeLength(1024);
factory.setMaxCDATANodeLength(1024);
yields output like (at a smaller max length)
<parent>
<child><![CDATA[QUJDREVGR0hJSktMTU5PUFFSU1...[TRUNCATED BY 46]]]></child>
</parent>
for CDATA and correspondingly for text nodes.
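The shape of the truncation marker can be illustrated with a minimal, standalone sketch. It assumes the limit counts Java `char`s and that the number in the marker is the count of removed characters, which is read off the sample output rather than confirmed by the project; `TruncateSketch` and `truncate` are hypothetical names, not part of the library.

```java
final class TruncateSketch {

    // Hypothetical helper: keeps at most maxLength chars and appends a marker
    // with the number of chars dropped (assumed meaning of the marker).
    static String truncate(String text, int maxLength) {
        if (text.length() <= maxLength) {
            return text; // short values pass through untouched
        }
        int removed = text.length() - maxLength;
        return text.substring(0, maxLength) + "...[TRUNCATED BY " + removed + "]";
    }

    public static void main(String[] args) {
        System.out.println(truncate("ABCDEFGHIJ", 4));
    }
}
```

With these assumptions, `truncate("ABCDEFGHIJ", 4)` returns `ABCD...[TRUNCATED BY 6]`, while a value at or below the limit is returned unchanged.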
Configuring
factory.setAnonymizeFilters(new String[]{"/parent/child"}); // multiple paths supported
results in
<parent>
<child>[*****]</child>
</parent>
See below for supported XPath syntax.
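For comparison, the same transformation can be sketched with stock JDK DOM and XPath classes. This is not the project's implementation; it is the kind of slower, feature-complete baseline the project positions itself against. `DomAnonymizeSketch` and `anonymize` are hypothetical names, and unlike the filters described here, this sketch uses full XPath and does not ignore namespace prefixes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomAnonymizeSketch {

    // Hypothetical helper: replaces the content of every node matched by xpath.
    static String anonymize(String xml, String xpath) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations, since logged payloads may be untrusted.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, document, XPathConstants.NODESET);
            for (int i = 0; i < hits.getLength(); i++) {
                // Works for both elements and attributes.
                hits.item(i).setTextContent("[*****]");
            }

            // Identity transform serializes the modified tree back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(anonymize("<parent><child>secret</child></parent>", "/parent/child"));
    }
}
```

The sketch builds a full tree and re-serializes it, which is exactly the overhead a single-pass filter avoids.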
Configuring
factory.setPruneFilters(new String[]{"/parent/child"}); // multiple paths supported
results in
<parent>
<child><!-- [SUBTREE REMOVED] --></child>
</parent>
See below for supported XPath syntax.
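Pruning in a single streaming pass can be sketched with the stock JDK StAX API. This is an illustration of the idea only, not the project's code: `StaxPruneSketch` and `prune` are hypothetical names, only exact `/a/b` style paths are handled, and namespace declarations, comments and processing instructions from the input are dropped for brevity.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

final class StaxPruneSketch {

    // Hypothetical helper: copies xml, replacing the children of the element
    // at the given path (local names only) with a marker comment.
    static String prune(String xml, String path) {
        try {
            XMLInputFactory inputFactory = XMLInputFactory.newInstance();
            inputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLStreamReader reader = inputFactory.createXMLStreamReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            StringBuilder current = new StringBuilder(); // path of the open element

            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    current.append('/').append(reader.getLocalName());
                    writer.writeStartElement(reader.getLocalName());
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        writer.writeAttribute(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
                    }
                    if (path.contentEquals(current)) {
                        writer.writeComment(" [SUBTREE REMOVED] ");
                        skipToEndOfElement(reader);
                        writer.writeEndElement();
                        current.setLength(current.lastIndexOf("/"));
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    writer.writeEndElement();
                    current.setLength(current.lastIndexOf("/"));
                } else if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) {
                    writer.writeCharacters(reader.getText());
                }
            }
            writer.flush();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    // Consumes events up to and including the end tag of the current element.
    private static void skipToEndOfElement(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }
}
```

Skipped content is never buffered or written, which is why pruning large base64 payloads is cheap in a streaming design.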
Only a small subset of the XPath syntax is supported; however, multiple expressions can be used at once. Namespace prefixes in the XML are ignored; only local names are used to determine a match. Expressions are case-sensitive.
Supported syntax for anonymize filters:
/my/xml/element
/my/xml/@attribute
with support for wildcards:
/my/xml/*
/my/xml/@*
or a simple any-level element search
//myElement
which cannot target attributes.
Supported syntax for prune filters (elements only):
/my/xml/element
with support for wildcards:
/my/xml/*
or a simple any-level element search
//myElement
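The matching rules for element expressions can be sketched as a small standalone function. This is a hedged illustration of the rules as described above, not the project's matcher: `PathMatchSketch`, `matches` and `localName` are hypothetical names, attribute expressions are left out, and the sketch accepts `*` at any step, which is more permissive than the forms listed.

```java
final class PathMatchSketch {

    // Hypothetical helper: does the expression match the element whose
    // ancestors-and-self local names (root first) are given in path?
    static boolean matches(String expression, String... path) {
        if (expression.startsWith("//")) {
            // Any-level search: only the element's own name matters.
            return path.length > 0 && path[path.length - 1].equals(expression.substring(2));
        }
        String[] steps = expression.substring(1).split("/");
        if (steps.length != path.length) {
            return false; // absolute paths must match at the same depth
        }
        for (int i = 0; i < steps.length; i++) {
            // Case-sensitive comparison; "*" matches any name.
            if (!steps[i].equals("*") && !steps[i].equals(path[i])) {
                return false;
            }
        }
        return true;
    }

    // Strips a namespace prefix, so "ns:child" is compared as "child".
    static String localName(String qualifiedName) {
        int colon = qualifiedName.indexOf(':');
        return colon < 0 ? qualifiedName : qualifiedName.substring(colon + 1);
    }

    public static void main(String[] args) {
        System.out.println(matches("/my/xml/*", "my", "xml", localName("ns:element")));
    }
}
```

Under these assumptions, `/my/xml/element` does not match an element named `Element`, and `//myElement` matches at any depth.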
The processors within this project are much faster than stock processors. This is expected as parser/serializer features have been traded for performance.
The project includes DOM- and StAX-based equivalents for feature and performance comparison. Depending on the implementation, benchmarks show throughput of approximately 5x-10x that of stock processors.
Memory use will be approximately two times the XML string size.
See this visualization and the JMH module for running detailed benchmarks.
The project is intended as a complementary tool for use alongside XML frameworks, such as SOAP- or XML-based REST stacks. Its primary use-case is processing to-be-logged XML. The project relies on the fact that such frameworks have very good error handling, like schema validation, to apply a simplified view of the XML syntax, basically handling only the happy case of a well-formed document. The frameworks themselves detect invalid documents and handle them as raw content.
See projects
- xml-formatter for additional indenting/formatting of inner XML.
- json-log-filter for filtering of JSON.
- 1.0.8: Maintenance release
- 1.0.5: Support for CXF 3.4, with good help from TomEvers. CXF 3.3 users: Use version 1.0.4.
- 1.0.4: Maintenance release
- 1.0.3: Maintenance release
- 1.0.2: Initial Java 11 (modules) support.