How to set pipeline foreach processor ? #1809

Closed
ThibautSF opened this issue Oct 16, 2020 · 1 comment

ThibautSF commented Oct 16, 2020

Hi,

By reading the documentation (and picking up some further information from other issues), I was able to understand how to implement most of the pipeline processors.

For example, creating a pipeline that calls ingest-attachment and then deletes the binary content:

use Elastica\Pipeline;
use Elastica\Processor\Attachment;
use Elastica\Processor\Remove;

// Add a pipeline for ingest-attachment ($client is an existing Elastica client)
$pipeline = new Pipeline($client);
$pipeline->setId('fileattachment')->setDescription('Extract attachment information');

// Attachment processor: read the file binaries from 'contentBinary', with no limit on indexed characters
$attachproc = new Attachment('contentBinary');
$attachproc->setIndexedChars(-1);

// Remove processor: drop 'contentBinary' once it has been processed (reduces document size)
$removeprc = new Remove('contentBinary');

// Add the processors to the pipeline in the correct order
$pipeline->addProcessor($attachproc);
$pipeline->addProcessor($removeprc);
// Create the pipeline
$response = $pipeline->create();

But now I would like to use the "foreach" processor, which I couldn't find in Elastica\Processor alongside Attachment and Remove, as in
https://www.elastic.co/guide/en/elasticsearch/plugins/current/ingest-attachment-with-arrays.html:

PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information from arrays",
  "processors" : [
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "target_field": "_ingest._value.attachment",
            "field": "_ingest._value.data"
          }
        }
      }
    }
  ]
}

Maybe with setRawProcessors(array $processors)?

Edit: I figured out how to use the setRawProcessors method:

$pipeline->setRawProcessors([
    'processors' => [
        'foreach' => [
            'field' => 'attachments',
            'processor' => [
                'attachment' => [
                    'target_field' => '_ingest._value.attachment',
                    'field' => '_ingest._value.data',
                ],
            ],
        ],
    ],
]);
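
For comparison, here is a minimal sketch in plain PHP (no Elastica classes involved) of how the nested array above corresponds to the JSON body from the Elasticsearch documentation. The variable names `$foreachProcessor` and `$expectedBody` are illustrative only. It assumes the final request body needs "processors" to be a JSON list, as in the documented example; how Elastica wraps the 'processors' entry before sending may depend on the installed version, so the printed output is only a reference to compare against.

```php
<?php
// Plain-PHP sketch: no Elastica classes are used here.
// The foreach processor from the documentation example, as a nested PHP array.
$foreachProcessor = [
    'foreach' => [
        'field' => 'attachments',
        'processor' => [
            'attachment' => [
                'target_field' => '_ingest._value.attachment',
                'field' => '_ingest._value.data',
            ],
        ],
    ],
];

// The body that PUT _ingest/pipeline/attachment expects:
// "processors" is a list holding one object per processor.
$expectedBody = [
    'description' => 'Extract attachment information from arrays',
    'processors' => [$foreachProcessor],
];

// Print the JSON so it can be compared with the documented request body.
echo json_encode($expectedBody, JSON_PRETTY_PRINT), PHP_EOL;
```

Comparing this output with the pipeline returned by GET _ingest/pipeline/attachment is one way to check that the raw array was sent in the intended shape.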

ruflin commented Oct 21, 2020

@ThibautSF Thanks for updating the issue. It seems you figured it out, and #1812 is the follow-up issue. Should we close this one or keep it open?
