An Apache NiFi processor to encode and decode data using Google Protocol Buffers schemas.
- Encode/decode Protocol Buffer messages from/to JSON format
- Read a compiled schema file (.desc) from disk
- Use a raw .proto schema file directly, from disk or embedded in a property
- Handle embedded .proto files at processor level (as a processor property) or directly in a flowfile property
- Support dependencies in proto files (see below)
- Provide a ready-to-use Docker image of Apache NiFi with the NiFi Protobuf Processor
A pre-packaged version of NiFi with the processor installed is available on Docker Hub. To run it just type:
docker run -p 8080:8080 whiver/nifi-protobuf:latest
Note that the -p option publishes port 8080, used by NiFi, to the host, so that you can access the UI directly at http://localhost:8080/nifi.
Grab the latest release from the releases page and copy the .nar file into the Apache NiFi lib folder.
Clone this project and build the processor nar file using Maven:
mvn compile
mvn nifi-nar:nar
Then simply copy the generated nar file into the Apache NiFi lib folder.
The project also includes a Dockerfile to easily build a Docker image of the project. In fact you just need to run:
mvn package
and everything should be fine! :)
See the installation section to learn how to integrate this processor into Apache NiFi. This project adds two new processors to NiFi:
- ProtobufDecoder, which decodes a Protobuf-encoded payload to different kinds of structured formats;
- ProtobufEncoder, which encodes a payload in a structured format using a Protobuf schema.
In both processors, you have to specify a schema file to use for data encoding/decoding. You can do so either
processor-wide (meaning that every incoming flowfile will be processed using the same schema) or per-flowfile. In both
cases, it is done by writing the absolute schema file path in the protobuf.schemaPath property of the flowfile or
processor. Note that if the property is set in the flowfile, it overrides the one from the processor.
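The override rule above can be sketched as follows. This is a minimal Java illustration, not the processor's actual code: the SchemaPathResolver class and its resolve method are hypothetical names, and plain maps stand in for the processor properties and flowfile properties.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper illustrating the precedence rule; not part of the processor. */
public class SchemaPathResolver {
    /** Property name taken from this README. */
    public static final String SCHEMA_PATH = "protobuf.schemaPath";

    /** Returns the flowfile-level path when set, otherwise the processor-level one. */
    public static Optional<String> resolve(Map<String, String> processorProperties,
                                           Map<String, String> flowfileProperties) {
        String fromFlowfile = flowfileProperties.get(SCHEMA_PATH);
        if (fromFlowfile != null) {
            return Optional.of(fromFlowfile);
        }
        return Optional.ofNullable(processorProperties.get(SCHEMA_PATH));
    }

    public static void main(String[] args) {
        // Example paths are made up for the demonstration.
        Map<String, String> processor = Map.of(SCHEMA_PATH, "/schemas/default.desc");
        Map<String, String> flowfile = Map.of(SCHEMA_PATH, "/schemas/override.desc");

        // The flowfile value wins when both are set.
        System.out.println(resolve(processor, flowfile).orElse("none"));
        // Without a flowfile value, the processor value is used.
        System.out.println(resolve(processor, Map.of()).orElse("none"));
    }
}
```

Running main prints /schemas/override.desc and then /schemas/default.desc.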
I strongly recommend using a compiled .desc file whenever possible, for performance reasons. This file can be
obtained by compiling the .proto file with Google's protoc.
However, if you cannot compile your .proto file, you can set it directly as a schema file and set the
protobuf.compileSchema property of the processor to tell it to compile the schema dynamically.
Important: The processor allows you to import only one schema file, so you need to package all your dependencies into one file. To do so, compile your main .proto file using the --include_imports option of the protoc compiler. If you are using a raw .proto file, you need to bundle all imports inside the file.
Note: if you don't have a compiled .desc file yet, you should take a look at protoc, the Protobuf compiler from Google.
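As an illustration, a self-contained descriptor is typically produced with a command along the lines of protoc --include_imports --descriptor_set_out=main.desc main.proto. The sketch below assembles that same invocation from Java, in case the compilation is driven from a build step. The ProtocCommand class and the file names main.proto and main.desc are hypothetical, and protoc is assumed to be on the PATH when the command is actually run.

```java
import java.util.List;

/** Hypothetical helper that builds a protoc invocation; not part of the processor. */
public class ProtocCommand {
    /** Builds the arguments for a protoc run that writes a descriptor set including all imports. */
    public static List<String> descriptorCommand(String protoFile, String descFile) {
        return List.of(
            "protoc",
            "--include_imports",                 // bundle every imported .proto into the output
            "--descriptor_set_out=" + descFile,  // write a compiled descriptor set (.desc)
            protoFile);
    }

    public static void main(String[] args) throws Exception {
        List<String> command = descriptorCommand("main.proto", "main.desc");
        System.out.println(String.join(" ", command));

        // Only launch protoc when explicitly asked to, since it may not be installed.
        if (args.length > 0 && args[0].equals("--run")) {
            Process process = new ProcessBuilder(command).inheritIO().start();
            System.exit(process.waitFor());
        }
    }
}
```

If your imports live in other directories, you will probably also need protoc's --proto_path option so that the compiler can locate them.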
For now, the only structured format the processors can handle is JSON. More formats should become available in the future (XML and flowfile properties are expected).
By design, this processor cannot use precompiled code to handle messages (otherwise you would have already generated it
and wouldn't be here). Instead, it uses the runtime part of the Protobuf library, which dynamically parses the data
given a compiled schema (.desc).
For convenience, the processor also accepts a raw .proto file, but it must be compiled before it can be used, so this
is the first thing the processor does. To avoid unnecessary recompilation, the compiled result is cached.
If you specified the schema in the processor configuration (and not in the flowfile properties), it is reused directly
for each operation, which also avoids reading the schema from disk.
So, if you can, specify the schema at the processor level to get the best performance.
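The compile-once-and-reuse idea can be sketched as below. This is only an illustration of the general technique, not the processor's implementation: the SchemaCache class is hypothetical, keying the cache by schema path is an assumption, and a simple function stands in for the real read-and-compile step.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical cache that loads each schema at most once per path. */
public class SchemaCache<S> {
    private final Map<String, S> cache = new ConcurrentHashMap<>();
    private final Function<String, S> loader;

    /** The loader stands in for reading the schema from disk and compiling it. */
    public SchemaCache(Function<String, S> loader) {
        this.loader = loader;
    }

    /** Returns the cached schema, invoking the loader only on the first request for a path. */
    public S get(String schemaPath) {
        return cache.computeIfAbsent(schemaPath, loader);
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        SchemaCache<String> schemas = new SchemaCache<>(path -> {
            loads.incrementAndGet();      // count how often the expensive step runs
            return "compiled:" + path;    // placeholder for a compiled schema
        });

        schemas.get("/schemas/person.proto");
        schemas.get("/schemas/person.proto");
        System.out.println(loads.get()); // prints 1: the second call hit the cache
    }
}
```

A schema set at the processor level always resolves to the same entry, which matches the recommendation above.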
This project is Free as in Freedom, so feel free to contribute by posting bug reports or pull requests!
This project is licensed under the MIT license. The terms of this license can be found in the LICENSE file.