You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Originally posted by aaronsteers July 23, 2022
I read "should" as a hard requirement and I have some strong opinions as of now about allowing integer datetime values for the standard start_date config input. I did not know we had any cases in the wild already where start_date was epoch/unix timestamp based and my preference is to unwind these or implement the custom fixes/overrides there in those taps.
Again, because start_date format is actually dictated by the spec (according to my reading, at least), orchestrators and users would reasonably expect to be able to send a consistent stringified date value to all taps in a project. If we diverge on this point and encourage tap developers to diverge, then we break assumptions in a way that will create silent failures and buggy behavior for our end users. For instance, trying to do an alpha-based greater-than-or-less-than comparison between a stringified integer and a stringified datetime will not fail but it will more likely just produce a wrong result. Many non-SDK taps which we cannot control are simply running a naive string comparison, which would work for the suggested standard and fail silently when mixed and matched with an epoch/integer format.
I'm not going to say tap developers can't go against the standard, but I think the SDK should keep with the safe prescriptive guidance here, along a path that provides the most consistent experience for users.
Sidebar: in a past life my own team's CI/CD environments calculated dynamic date values of "yesterday" or "48 hours ago" for faster EL tests and we sent the same input to all taps in our project. I also have return a stack overflow answer prescribing the same and giving sample code to create an environment variable containing the correct string format using Linux or Mac. In theory, this is 100% safe according to spec but our solution would have broken for any cases where the expected input format differed across taps.
I think there's value in preserving ability to have a "project-wide" and Singer-wide filtering capability using a consistent convention - again for users and also for orchestrators.
Last point, there may be intuitions to tightly couple the tap config interface to the upstream API's behavior and I would strongly discourage us from that approach. For one, it implies that the API behaviors are monolithic, where in fact each stream in a tap (or even future-added streams to the tap) is allowed to have a completely different internal data type expectation. By keeping with a standard recommendation on the config/UX side, we keep the external user interface the same regardless of internals of the upstream APIs.
Secondly, the tap users (and downstream targets!) should be reasonably isolated from internal storage preferences of the distinct quirks and idioms of every upstream API. If something should arrive and be stored as a datetime, we should send it to the target as such.
Lastly, to be clear, I'm not saying the standard stringified date format is better or more preferable to epoch or Unix time - but given that the choice is already made, and given there's a high cost of supporting different taps on different standards, I do not think we should accommodate or support a divergence in this area.
The text was updated successfully, but these errors were encountered:
kgpayne
changed the title
Should the SDK accommodate for differing start_date input formats
feat: check for and enforce RFC3339 start_date input format
Jun 27, 2023
Discussed in #855
Originally posted by aaronsteers July 23, 2022
I read "should" as a hard requirement and I have some strong opinions as of now about allowing integer datetime values for the standard
start_date
config input. I did not know we had any cases in the wild already where start_date was epoch/unix timestamp based and my preference is to unwind these or implement the custom fixes/overrides there in those taps.Again, because
start_date
format is actually dictated by the spec (according to my reading, at least), orchestrators and users would reasonably expect to be able to send a consistent stringified date value to all taps in a project. If we diverge on this point and encourage tap developers to diverge, then we break assumptions in a way that will create silent failures and buggy behavior for our end users. For instance, trying to do an alpha-based greater-than-or-less-than comparison between a stringified integer and a stringified datetime will not fail but it will more likely just produce a wrong result. Many non-SDK taps which we cannot control are simply running a naive string comparison, which would work for the suggested standard and fail silently when mixed and matched with an epoch/integer format.I'm not going to say tap developers can't go against the standard, but I think the SDK should keep with the safe prescriptive guidance here, along a path that provides the most consistent experience for users.
Sidebar: in a past life my own team's CI/CD environments calculated dynamic date values of "yesterday" or "48 hours ago" for faster EL tests and we sent the same input to all taps in our project. I also have return a stack overflow answer prescribing the same and giving sample code to create an environment variable containing the correct string format using Linux or Mac. In theory, this is 100% safe according to spec but our solution would have broken for any cases where the expected input format differed across taps.
I think there's value in preserving ability to have a "project-wide" and Singer-wide filtering capability using a consistent convention - again for users and also for orchestrators.
Last point, there may be intuitions to tightly couple the tap config interface to the upstream API's behavior and I would strongly discourage us from that approach. For one, it implies that the API behaviors are monolithic, where in fact each stream in a tap (or even future-added streams to the tap) is allowed to have a completely different internal data type expectation. By keeping with a standard recommendation on the config/UX side, we keep the external user interface the same regardless of internals of the upstream APIs.
Secondly, the tap users (and downstream targets!) should be reasonably isolated from internal storage preferences of the distinct quirks and idioms of every upstream API. If something should arrive and be stored as a datetime, we should send it to the target as such.
Lastly, to be clear, I'm not saying the standard stringified date format is better or more preferable to epoch or Unix time - but given that the choice is already made, and given there's a high cost of supporting different taps on different standards, I do not think we should accommodate or support a divergence in this area.
The text was updated successfully, but these errors were encountered: