Dedupe on Couchbase for real time streaming json (flink) #262

Open
ashubitm opened this issue Nov 3, 2018 · 4 comments

Comments

ashubitm commented Nov 3, 2018

Hi, I am trying to dedupe real-time streaming JSON with Couchbase as the destination. I am trying to call the dedupe from Flink but have not been able to get it working.
Can you please help with a config file for Couchbase and with how to call larsga/Dedupe from Flink?

uderline commented Nov 5, 2018

Hi,
There is no Couchbase data source; you could write your own based on the existing data sources.
I am not familiar with Couchbase or Flink. Can you query Couchbase and then run the deduplication on the returned JSON?
To call the deduplication function:

  • Make the config (properties, match listener, data source ...)
  • Start the processor
  • Make the records
  • Call the dedup function
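
The four steps above can be sketched roughly as follows. This is a language-neutral illustration written in Python, not the library's actual API: `Config`, `Processor`, `deduplicate`, and the listener callback are hypothetical names, and `difflib.SequenceMatcher` only stands in for a real comparator.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Config:
    """Step 1: the configuration (hypothetical, not the library's own class)."""
    properties: list          # record fields to compare
    threshold: float = 0.85   # mean similarity needed to call a match
    listeners: list = field(default_factory=list)


class Processor:
    """Step 2: the processor holds the config and the records seen so far."""

    def __init__(self, config):
        self.config = config
        self.seen = []

    def _similarity(self, a, b):
        # Average string similarity over the configured properties.
        scores = [
            SequenceMatcher(None, str(a.get(p, "")), str(b.get(p, ""))).ratio()
            for p in self.config.properties
        ]
        return sum(scores) / len(scores)

    def deduplicate(self, records):
        """Step 4: compare each record against everything seen so far."""
        for record in records:
            for candidate in self.seen:
                score = self._similarity(record, candidate)
                if score >= self.config.threshold:
                    for listener in self.config.listeners:
                        listener(record, candidate, score)
            self.seen.append(record)


matches = []
config = Config(
    properties=["name", "city"],
    listeners=[lambda r1, r2, score: matches.append((r1["id"], r2["id"]))],
)
processor = Processor(config)

# Step 3: the records, here plain dicts parsed from JSON.
records = [
    {"id": 1, "name": "Jon Smith", "city": "Oslo"},
    {"id": 2, "name": "John Smith", "city": "Oslo"},
    {"id": 3, "name": "Maria Garcia", "city": "Madrid"},
]
processor.deduplicate(records)
```

After the run, `matches` holds the pair of ids that the listener was notified about; records that matched nothing are only kept in the processor's in-memory list.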

ashubitm commented Nov 5, 2018

Thanks,
What I am trying to do is process a stream of JSON documents (source) against a Couchbase DB (destination).
Pulling the JSON from Couchbase may not be a great idea from a performance point of view, and how many records to pull is another question.
If MongoDB can be a direct destination, why not Couchbase? Any help with this would be great.
My goal is that if the stream contains duplicates of records already present in the destination, I should not save them.
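
In its simplest exact-match form, the goal described here (dropping stream records that already exist in the destination) might look like the sketch below. It assumes each JSON document carries a field that identifies it (`id` is a made-up field name), uses a plain dict as a stand-in for the Couchbase bucket, and calls no real Couchbase, Flink, or dedupe-library API. Fuzzy matching would need a dedupe engine in place of the key lookup.

```python
import json


def dedupe_key(doc):
    # Hypothetical key: assumes an "id" field identifies a document.
    return str(doc["id"])


def filter_new(stream, destination):
    """Yield only documents whose key is not yet in the destination.

    `stream` is an iterable of JSON strings; `destination` is any mapping
    keyed by document key (a dict standing in for the real store).
    """
    for line in stream:
        doc = json.loads(line)
        key = dedupe_key(doc)
        if key in destination:
            continue  # already stored, or seen earlier in this stream
        destination[key] = doc  # save, so later repeats are dropped too
        yield doc


destination = {"1": {"id": 1, "name": "already stored"}}
stream = [
    '{"id": 1, "name": "duplicate of stored doc"}',
    '{"id": 2, "name": "new"}',
    '{"id": 2, "name": "repeat within the stream"}',
]
saved = list(filter_new(stream, destination))
```

A per-key lookup like this avoids pulling records from the destination in bulk, which is the performance concern raised above.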

uderline commented Nov 5, 2018

Sorry, I misread your first post. JSON streaming is no problem if you're using Flink or any other tool designed for streaming.
MongoDB has a data source connection but not a destination connection. Matching records are always saved in the match listener, whereas records with no match are not saved. You will need to write the destination connection yourself.
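
One way to read this is that the destination write belongs in a listener callback for records that matched nothing. The sketch below illustrates that shape with hypothetical names (`StoringListener`, `matches`, `no_match_for`, and a list standing in for the destination). It is not the library's listener interface, and the exact-equality matcher is only a placeholder for real matching.

```python
class StoringListener:
    """Hypothetical listener: records matches, saves non-matching records."""

    def __init__(self, destination):
        self.destination = destination  # stand-in for a destination connection
        self.duplicates = []

    def matches(self, record, existing):
        # A duplicate was found: note it, but do not write it again.
        self.duplicates.append((record, existing))

    def no_match_for(self, record):
        # No duplicate found: this is where the destination write goes.
        self.destination.append(record)


def process(records, listener, is_match):
    """Compare each record with the destination and notify the listener."""
    for record in records:
        hit = next((d for d in listener.destination if is_match(record, d)), None)
        if hit is not None:
            listener.matches(record, hit)
        else:
            listener.no_match_for(record)


destination = [{"email": "a@example.com"}]
listener = StoringListener(destination)
incoming = [{"email": "a@example.com"}, {"email": "b@example.com"}]
process(incoming, listener, lambda r, d: r["email"] == d["email"])
```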

@kyriehan89

Hi @ashubitm,

I have the same case and need to deduplicate from Couchbase. Have you found a solution?

Thanks


3 participants