Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add <class ThreadsStreams> object from Mashey tap-slack variant #9

Open
WayneBanksy opened this issue Mar 10, 2022 · 4 comments
Open

Comments

@WayneBanksy
Copy link

The current version of tap-slack fails to sync a parent message with all of its children. The method to execute this request from the API can be found in tap-slack/streams.py file (lines 153 - 172).

The mashey tap-slack variant (updated 16 months ago) has these lines of code under the same class object in the streams.py file lines 359 - 401 and should be added to the meltano-labs variant. It creates a full-table that has the sync of the parent message with all of its replies.

@tayloramurphy not sure who else should be tagged here!

@tayloramurphy
Copy link

@pnadolny13 @aaronsteers y'all know more about the SDK at this point than I do. Since this is basically about adding a new parent / child stream, the code would probably be much simpler than what @WayneBanksy is proposing here, right?

@aaronsteers
Copy link

@WayneBanksy - Can you explain this a bit more for me?:

The current version of tap-slack fails to sync a parent message with all of its children...
It creates a full-table that has the sync of the parent message with all of its replies.

I'm not sure I follow - can you add an example with an explanation of what is currently not included that should be? Thanks!

@aaronsteers
Copy link

aaronsteers commented Mar 11, 2022

Doing a little more digging, I see @tayloramurphy's point above. Some of the code here could in theory be replaced with a simple parent-child relationship, specifically that would be replacing the direct invocation pattern from the parent Messages stream:

Cc @stkbailey - Who probably has more context on the design decision and any "gotchas" he ran into during dev:

The threads stream is directly invoked by the Messages stream, but not via
standard parent-child relationship. Instead, parsed messages that have a
more recent "last_reply_at" timestamp will have a FULL_TABLE sync performed.

@stkbailey
Copy link
Collaborator

stkbailey commented Mar 11, 2022

I'm rusty on the specifics, but the design was inspired very much by Airbyte, which actually has a good explanation on the strategy here: https://github.com/airbytehq/airbyte/blob/e8eb7dbe0678149c0e1cc2d1e5ed4938601dbb82/airbyte-integrations/connectors/source-slack/source_slack/source.py#L204

The problem with the parent-child relationship is that threads can continue to have messages for days after the sync of the top-level message. So to determine whether any children need to be synced, you have to scrape through a few days of Slack messages once again.

But @WayneBanksy are you saying you aren't getting any data from the Threads endpoint? Have you tried extending the thread_lookback_days parameter?

th.Property(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants