Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: check for and enforce RFC3339 start_date input format #1792

Open
kgpayne opened this issue Jun 27, 2023 Discussed in #855 · 2 comments
Open

feat: check for and enforce RFC3339 start_date input format #1792

kgpayne opened this issue Jun 27, 2023 Discussed in #855 · 2 comments
Milestone

Comments

@kgpayne
Copy link
Contributor

kgpayne commented Jun 27, 2023

Discussed in #855

Originally posted by aaronsteers July 23, 2022
I read "should" as a hard requirement and I have some strong opinions as of now about allowing integer datetime values for the standard start_date config input. I did not know we had any cases in the wild already where start_date was epoch/unix timestamp based and my preference is to unwind these or implement the custom fixes/overrides there in those taps.

Again, because start_date format is actually dictated by the spec (according to my reading, at least), orchestrators and users would reasonably expect to be able to send a consistent stringified date value to all taps in a project. If we diverge on this point and encourage tap developers to diverge, then we break assumptions in a way that will create silent failures and buggy behavior for our end users. For instance, trying to do an alpha-based greater-than-or-less-than comparison between a stringified integer and a stringified datetime will not fail but it will more likely just produce a wrong result. Many non-SDK taps which we cannot control are simply running a naive string comparison, which would work for the suggested standard and fail silently when mixed and matched with an epoch/integer format.

I'm not going to say tap developers can't go against the standard, but I think the SDK should keep with the safe prescriptive guidance here, along a path that provides the most consistent experience for users.

Sidebar: in a past life my own team's CI/CD environments calculated dynamic date values of "yesterday" or "48 hours ago" for faster EL tests and we sent the same input to all taps in our project. I also have return a stack overflow answer prescribing the same and giving sample code to create an environment variable containing the correct string format using Linux or Mac. In theory, this is 100% safe according to spec but our solution would have broken for any cases where the expected input format differed across taps.

I think there's value in preserving ability to have a "project-wide" and Singer-wide filtering capability using a consistent convention - again for users and also for orchestrators.

Last point, there may be intuitions to tightly couple the tap config interface to the upstream API's behavior and I would strongly discourage us from that approach. For one, it implies that the API behaviors are monolithic, where in fact each stream in a tap (or even future-added streams to the tap) is allowed to have a completely different internal data type expectation. By keeping with a standard recommendation on the config/UX side, we keep the external user interface the same regardless of internals of the upstream APIs.

Secondly, the tap users (and downstream targets!) should be reasonably isolated from internal storage preferences of the distinct quirks and idioms of every upstream API. If something should arrive and be stored as a datetime, we should send it to the target as such.

Lastly, to be clear, I'm not saying the standard stringified date format is better or more preferable to epoch or Unix time - but given that the choice is already made, and given there's a high cost of supporting different taps on different standards, I do not think we should accommodate or support a divergence in this area.

@kgpayne kgpayne added this to the v1.0 Release milestone Jun 27, 2023
@kgpayne
Copy link
Contributor Author

kgpayne commented Jun 27, 2023

Related to #759 and #1734

@kgpayne kgpayne changed the title Should the SDK accommodate for differing start_date input formats feat: check for and enforce RFC3339 start_date input format Jun 27, 2023
@edgarrmondragon
Copy link
Collaborator

Might be indirectly addressed by #2132

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants