Ability to define analyzer (and not tokenizers) in the parameter of the fulltext search. #57

philippe-levan · 2019-03-25T17:43:08Z

Elasticsearch already ships ready-made analyzers for many languages (Italian, French, ...), and we can easily define more analyzers in Elasticsearch if we want to.

https://www.elastic.co/guide/en/elasticsearch/reference/6.2/analysis-lang-analyzer.html

However, the parameters of the fulltextSearch only let us change the tokenizer, not the analyzer.

An analyzer (called "analyzer" in the code) is defined in the file lib/Service/IndexMappingService.php, in the method generateGlobalMap. This analyzer is used to analyze the content and the combined fields.

I believe it would be better to let the user define the analyzer (and not the tokenizer) in the parameters, and then use this user-provided analyzer name when analyzing the document contents.
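To make the difference concrete, here is a minimal sketch in Python of the two kinds of index body being discussed. It is not the app's actual code: the helper `build_index_body`, the `lowercase` filter, and the field names `content` and `combined` are assumptions for illustration only. The settings and mapping keys follow the usual Elasticsearch index-creation layout.

```python
from typing import Optional


def build_index_body(tokenizer: str = "standard",
                     analyzer_name: Optional[str] = None) -> dict:
    """Return an Elasticsearch index body as a plain dict.

    Hypothetical helper for illustration; not the app's real code.
    """
    if analyzer_name is None:
        # Behaviour described above: a custom analyzer named "analyzer"
        # is defined, and only its tokenizer can be configured.
        body = {
            "settings": {
                "analysis": {
                    "analyzer": {
                        "analyzer": {
                            "type": "custom",
                            "tokenizer": tokenizer,
                            "filter": ["lowercase"],  # assumed filter chain
                        }
                    }
                }
            }
        }
        field_analyzer = "analyzer"
    else:
        # Proposed behaviour: the fields reference an analyzer by name,
        # for example a built-in language analyzer such as "french".
        body = {}
        field_analyzer = analyzer_name

    # Assumed field names: the content field and the combined field.
    body["mappings"] = {
        "properties": {
            "content": {"type": "text", "analyzer": field_analyzer},
            "combined": {"type": "text", "analyzer": field_analyzer},
        }
    }
    return body


tokenizer_based = build_index_body(tokenizer="standard")
french = build_index_body(analyzer_name="french")
print(french["mappings"]["properties"]["content"])
```

A built-in name such as `french` needs no extra settings. An analyzer defined by the user would still have to be declared in the index settings (or in an index template) before the mapping can reference it.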

Warnings:

This introduces a BC break for people who use a custom tokenizer.
I haven't managed to get my code working on my development Nextcloud, so I can't do real development on this for the moment.

So this is not a real PR. It is more of an issue, with a code example to make my point clear.

Best regards,
Philippe


r-fujikura · 2021-06-28T04:39:20Z

Hi @philippe-levan @adsworth @rullzer @Gomez @SunboX,
Sorry for the sudden question. If you have a free moment, I would like to ask for your help, as I am having trouble with a Nextcloud setup.
I am currently working on enabling full-text search of files in Nextcloud using Elasticsearch and Fulltextsearch.
However, I am facing one problem.
I want to use a custom analyzer (defined by myself) in Fulltextsearch, but when I set up the custom analyzer and generate the index, the indexed content is missing.
So I modified the source code as per the commits in this PR, but that did not change the situation.
Search using kuromoji_tokenizer works fine.
I would appreciate your help in finding out why the content is missing when the custom analyzer is used.
Thank you very much for your support.

Environment:

Nextcloud 20.0.9
Elasticsearch 7.4.2
Full text search 20.0.0
Full text search - Elasticsearch Platform 20.0.1
Full text search - Files 20.0.1
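
One way to narrow down a problem like this is to ask Elasticsearch directly what a given analyzer produces, using its `_analyze` API. The Python sketch below only builds the request and does not send it. The index name `my_index` and the analyzer name `my_custom_analyzer` are placeholders, not values from this setup.

```python
import json


def build_analyze_request(index: str, analyzer: str, text: str) -> tuple:
    """Return (method, path, body) for a request to the _analyze API.

    The request is only constructed here, not sent.
    """
    body = json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False)
    return "POST", f"/{index}/_analyze", body


# Placeholder index and analyzer names; replace with the real ones.
method, path, body = build_analyze_request(
    "my_index", "my_custom_analyzer", "全文検索のテスト"
)
print(method, path)
print(body)
```

If the response contains an empty `tokens` array, or the request fails because the analyzer is unknown to that index, the analyzer definition rather than the indexing step is a likely place to look.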

philippe-levan added 2 commits March 25, 2019 18:21

can use a user provided analyzer

dc6159c

fix call of parameter

da25c7c

R0Wi pushed a commit to R0Wi/fulltextsearch_elasticsearch that referenced this pull request Aug 8, 2020

Merge pull request nextcloud#57 from nextcloud/backport/56/master

2cbd93b

[master] add options to fully ignore external storage
