Catch index already exists exception #326

Open

gsoyka wants to merge 1 commit into master

Conversation

@gsoyka commented Oct 8, 2018

In our deployment, the schema container fails on the create_index job if the index already exists in Elasticsearch. This patch resolves that issue and breaks the cycle of k8s constantly attempting to restart the job.
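
A minimal sketch of the kind of change involved, assuming the legacy `elasticsearch` Node.js client; the index name, schema path, and error matching are placeholders, not the actual diff:

```js
const elasticsearch = require('elasticsearch');
const schema = require('../schema'); // hypothetical path to the generated schema

const client = new elasticsearch.Client({ host: 'localhost:9200' });

client.indices.create({ index: 'pelias', body: schema }, (err) => {
  if (err && /already_exists_exception/.test(err.message)) {
    // the index is already there: treat it as success so the k8s job
    // completes instead of restarting forever
    console.log('index already exists, nothing to do');
    process.exit(0);
  } else if (err) {
    console.error(err);
    process.exit(1);
  } else {
    console.log('index created');
  }
});
```

The regex is deliberately loose: older Elasticsearch versions report `index_already_exists_exception`, newer ones `resource_already_exists_exception`.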

@orangejulius (Member) commented Oct 9, 2018

Hey @gsoyka,
Are you using the pelias/kubernetes helm charts? Our schema build job has an option to drop the existing schema.

I'm a little hesitant to merge this because I believe failure to drop the schema or failure to create a schema are conditions that should be errors. For example, if you change the version of the schema you're using, a failure to create the new schema means your changes won't be applied; catching that exception makes the problem easier to miss.

I'm on the fence, what do you think?

@gsoyka (Author) commented Oct 10, 2018

I am using the pelias/kubernetes helm charts. Please forgive my ignorance around Elasticsearch, but is it safe to drop and recreate an index, or will that result in data loss? My main concern here is breaking the endless loop of Kubernetes retrying the schema creation job when the index already exists, but I'm completely open to other ways of accomplishing that.

@orangejulius (Member) commented

Dropping the index will delete all the data, so that's a valid concern. (Among other things, "index" is the word Elasticsearch uses the way Postgres or MySQL use the word "database", so it's analogous to dropping a database in an RDBMS.) I admit I've spent a bit of time breaking that endless loop myself.

However, since the schema create job is only used to start a Pelias build, and it's usually best to start each build with a fresh index, losing that data has generally not been a concern.
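
For reference, the drop-and-recreate step looks roughly like this with the same legacy client (`ignore: [404]` tells the client not to treat a missing index as an error):

```js
// Delete the old index if present, then create a fresh one. Everything
// stored in the old index is gone afterwards, much like DROP DATABASE.
client.indices.delete({ index: 'pelias', ignore: [404] }, (err) => {
  if (err) { console.error(err); process.exit(1); }
  client.indices.create({ index: 'pelias', body: schema }, (createErr) => {
    if (createErr) { console.error(createErr); process.exit(1); }
    console.log('fresh index created');
  });
});
```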

Let me know if that matches your workflow. I don't want to assume everyone does things the same way.

@gsoyka (Author) commented Oct 10, 2018

That makes sense. I figured dropping the index would have a similar effect to dropping a database, but I wasn't able to confirm that.

In terms of workflow, I ran into this issue after restarting from a failed build, which is item #1 in the docs you linked listing situations where you don't need to delete the index. I'm working with the entire planet, so restarting from scratch after one data source fails isn't ideal time-wise. Is there a recommendation for what to do when one data source fails and the others succeed?

@orangejulius (Member) commented

Okay. One importer failing when you want to re-run just that one is definitely on the list of cases where you want to avoid dropping the index.

How about this? Let's add a configuration option in pelias.json that controls whether or not an existing index is a fatal error. That increases the complexity here slightly, and it would require a similar PR to pelias/kubernetes.
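
A rough sketch of how `./bin/create_index` could consult such an option, with a hypothetical flag name (nothing like it exists in pelias.json today):

```js
const peliasConfig = require('pelias-config').generate();

// hypothetical option; default to treating an existing index as fatal,
// which preserves the current behavior
const existingIndexIsFatal =
  (peliasConfig.schema || {}).existingIndexIsFatal !== false;

client.indices.create({ index: 'pelias', body: schema }, (err) => {
  if (err && /already_exists_exception/.test(err.message)) {
    console.log('index already exists');
    process.exit(existingIndexIsFatal ? 1 : 0);
  }
});
```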

By the way, which importer was failing? We have been working on making that less likely with PRs such as pelias/openaddresses#405, so if we can help fix something, we'd like to know.

@arne-cl (Member) commented May 27, 2019

I just ran into the same problem on Kubernetes (I'm not using Helm, but deployed the Pelias build jobs/pods manually).

Could the code for ./bin/create_index be changed to only throw an error if the index exists AND it does not match the schema that Pelias needs? That way, the code can always run without "causing harm".
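
A sketch of that check, assuming the legacy client and assuming the mapping Elasticsearch returns is directly comparable to the one we send (the reply below notes that may not hold):

```js
const _ = require('lodash');

client.indices.getMapping({ index: 'pelias' }, (err, response) => {
  if (err) { console.error(err); process.exit(1); }
  // the response is keyed by index name: { pelias: { mappings: { ... } } }
  const existing = response.pelias.mappings;
  if (_.isEqual(existing, schema.mappings)) {
    console.log('existing index already matches the schema');
    process.exit(0);
  }
  console.error('existing index does not match the required schema');
  process.exit(1);
});
```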

@orangejulius (Member) commented

Hey @arne-cl,
That's a really good idea.

I'm not sure the schema JSON that we generate and send to Elasticsearch in this module is identical to what comes back if we ask Elasticsearch for the existing schema. But if you want to play around with it and see how much work it would be, that's definitely a feature we'd merge, and it would solve this issue nicely.
