Ability to define analyzer (and not tokenizers) in the parameter of the fulltext search. #57

Open
wants to merge 2 commits into
base: master

philippe-levan

Elasticsearch already ships ready-made analyzers for many languages (Italian, French, ...), and we can easily define more analyzers in Elasticsearch if we want to.

https://www.elastic.co/guide/en/elasticsearch/reference/6.2/analysis-lang-analyzer.html

However, the fulltextSearch parameters only let us change the tokenizer, not the analyzer.

An analyzer (called "analyzer" in the code) is defined in the file lib/Service/IndexMappingService.php, in the method generateGlobalMap. This analyzer is used to analyze the content and the combined fields.

I believe it would be better to let the user define the analyzer (rather than the tokenizer) in the parameters, and then use this user-provided analyzer name when analyzing the document contents.
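To make the idea concrete, here is a minimal sketch in Python (not the app's PHP code) of what an index body could look like when the fields point at a user-chosen analyzer name. The function name `build_index_body`, the field names `content` and `combined`, and the typeless Elasticsearch 7 mapping shape are assumptions for illustration only; the real map built by generateGlobalMap may differ.

```python
import json


def build_index_body(analyzer_name, custom_analyzers=None):
    """Build a hypothetical index body whose text fields use `analyzer_name`.

    `analyzer_name` can be a built-in analyzer such as "french", which needs
    no definition, or a key of `custom_analyzers`, which is copied into
    settings.analysis.analyzer so that the mapping can refer to it.
    """
    return {
        "settings": {
            "analysis": {
                # Custom analyzers must be declared here before use.
                "analyzer": dict(custom_analyzers or {}),
            }
        },
        "mappings": {
            "properties": {
                # Field names are assumptions based on the description above.
                "content": {"type": "text", "analyzer": analyzer_name},
                "combined": {"type": "text", "analyzer": analyzer_name},
            }
        },
    }


if __name__ == "__main__":
    # Built-in language analyzer: nothing to declare in the settings.
    print(json.dumps(build_index_body("french"), indent=2))

    # User-defined analyzer (hypothetical name "my_folding").
    custom = {
        "my_folding": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "asciifolding"],
        }
    }
    print(json.dumps(build_index_body("my_folding", custom), indent=2))
```

With this shape, switching language only means changing the configured analyzer name, while a custom tokenizer can still be used by wrapping it in a custom analyzer.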

Warnings:

  • this introduces a BC break for people who use a custom tokenizer
  • I have not managed to make my code work on my development Nextcloud instance (so I can't do real development on this for the moment)

So this is not a real PR. It is more of an issue, with a code example to make the problem clear.

Best regards,
Philippe

R0Wi pushed a commit to R0Wi/fulltextsearch_elasticsearch that referenced this pull request Aug 8, 2020
[master] add options to fully ignore external storage
@r-fujikura

Hi, @philippe-levan @adsworth @rullzer @Gomez @SunboX.
Sorry for the sudden question. If you have a free moment, I would appreciate your help with a problem in my Nextcloud setup.
I am currently working on enabling full-text search of files in Nextcloud using Elasticsearch and Fulltextsearch.
However, I have run into one problem.
I want to use a custom analyzer (defined by myself) in Fulltextsearch, but when I set up the custom analyzer and generate the index, the index content is missing.
I modified the source code as per the commits in this pull request, but that did not change the situation.
Search using kuromoji_tokenizer works fine.
I would like your help finding the cause of the missing content when the custom analyzer is used.
Thank you very much for your support.

Environment:

  • Nextcloud 20.0.9
  • Elasticsearch 7.4.2
  • Full text search 20.0.0
  • Full text search - Elasticsearch Platform 20.0.1
  • Full text search - Files 20.0.1
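
One way to narrow this kind of problem down, sketched here under assumptions, is to ask Elasticsearch's `_analyze` API which tokens the custom analyzer produces for a sample text. The base URL, the index name `my_index`, the analyzer name `my_analyzer`, and the helper names are all hypothetical. The sketch only builds the request and parses a response, so it does not depend on a running cluster.

```python
import json
import urllib.request


def build_analyze_request(base_url, index, analyzer, text):
    """Build a POST request for <base_url>/<index>/_analyze."""
    url = f"{base_url.rstrip('/')}/{index}/_analyze"
    payload = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_strings(analyze_response):
    """Extract the token texts from a parsed _analyze response."""
    return [t["token"] for t in analyze_response.get("tokens", [])]


if __name__ == "__main__":
    # Hypothetical values; adjust to the actual cluster and index.
    req = build_analyze_request(
        "http://localhost:9200", "my_index", "my_analyzer", "sample text"
    )
    with urllib.request.urlopen(req) as resp:
        print(token_strings(json.load(resp)))
```

An empty token list for text that should be searchable would point at the analyzer definition, while an error response would suggest that the analyzer is not defined on that index.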
