-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CustomTimeZone
is too unstructured
#5533
Comments
I think I made my arguments in terms of the old API (@sffc it would be really nice if that could be deleted, I still don't understand which code will stay around and which won't). The equivalent to /// Marker for resolving the time zone input field.
type TimeZoneInput: Into<Option<CustomTimeZone>>; which actually is very tightly coupled to |
#5534 to address observation 3 and decouple |
`CustomTimeZone` is what it says on the lid, `icu::timezone`'s custom timezone type. It should not be used internally by `icu::datetime`, this PR relegates it to an input-only type and instead uses the four fields offset, id, metazone, and zone variant as neo fields. I further propose removing the metazone and zone variant fields in #5533.
I've mocked this up in #5535 |
The philosophy we've been using in datetime input, which has been working well, is that it is the input type's responsibility to transform itself into the inputs the formatter needs. This applies to calendar systems and time zones the same was as all other fields. A third-party type that doesn't have metazone information baked into it could use MetazoneCalculator inside of its A potential counter-argument is that datetime requires WeekCalculator internally. But, it's been a bit of a pain to deal with this extra external data marker, and we have a clean way to decouple metazone, so I'd really prefer to keep it decoupled. Actually, maybe we even want to decouple week numbering. |
But it can't, because in the |
The third-party type would use compiled data and |
What do you think about the pattern I'm suggesting for IANA -> BCP-47 mapping? |
Two things about this statement I don't think I agree with:
|
Similarly to how I don't think datetime should "manage" the MetazoneCalculator, I don't think it should manage a TimeZoneIdMapper. |
No conclusion yet. |
My preference, as discussed in #5248, is to replace the input traits with a concrete struct parameterized by |
One thing I find unfortunate about the datetime format design is that it requires extracting all of the fieldset information up front, and this applies doubly so if we switch to structs. Some of this information can be expensive to calculate. We've optimized for this, but it would be nice if datetime only fetched fieldsets when needed (so no need for prefetching week-of info when week-of isn't in the symbols). I think the current design is better here since it won't fetch stuff known to be unused at compile time, but I'm not sure if it's perfect. The flipside is that using too many traits too deeply in the impl code will lead to a lot of codesize bloat. |
But this is exactly what the type parameter lets us do and is exactly what I'm proposing. My parameterized struct suggestion does not change which fields get calculated relative to the GetField design. It just moves it to the other side of the membrane. |
I think the type parameter has gotten us the significant win of "we don't request week info if we don't need it" but it doesn't extend to not requesting info if not needed by the specific pattern as resolved by data. I think. Which is fine, it's a microoptimization.
I know, but traits let us lazily fetch if we decide to. Which I guess isn't that important in the neo design since we're already doing minimal field getting? |
The current behavior, in both 1.x and 2.0 datetime formatter, is that the fields are eagerly evaluated. There was a bug from a while back to make them lazy, but I think the type parameter solves it more cleanly.
This is accurate. |
I will wait until the legacy datetime API is deleted before commenting here. |
TimeZoneInput
is too unstructuredCustomTimeZone
is too unstructured
struct TimeZone(TimeZoneBcp47Id, Option<UtcOffset>);
struct ResolvedTimeZone(TimeZone, MetazoneId, Option<ZoneVariant>); // Formattable
struct OffsetDateTime(DateTime, UtcOffset); // Formattable, IXDTF-parseable (errors when there's no offset)
struct ZonedDateTime(DateTime, TimeZone); // IXDTF-parseable (errors when there's no IANA)
struct ResolvedZonedDateTime(ZonedDateTime, MetazoneId, Option<ZoneVariant>); // Formattable
Conclusion: Needs more work. |
Does this look right?
As a reminder, the time zone formatting styles are:
(the "partial" style is not supported yet) I think we could do this as so: /// NOT formattable
pub struct LocationTimeZone {
pub bcp47_id: TimeZoneBcp47Id,
}
/// NOT formattable
pub struct OffsetOnlyTimeZone {
pub utc_offset: UtcOffset,
}
/// NOT formattable
pub struct LocationAndOffsetTimeZone {
pub utc_offset: UtcOffset,
pub bcp47_id: TimeZoneBcp47Id,
}
pub trait TimeZoneLike {
// this is mostly a marker trait with no methods
}
impl TimeZoneLike for LocationTimeZone {}
impl TimeZoneLike for OffsetOnlyTimeZone {}
impl TimeZoneLike for LocationAndOffsetTimeZone {}
/// Formattable
pub struct ResolvedTimeZone<T: TimeZoneLike> {
// either fields determined by T or optional fields; some subset of:
utc_offset: Option<UtcOffset>,
bcp47_id: Option<TimeZoneBcp47Id>,
metazone_id: Option<MetazoneId>,
zone_variant: Option<ZoneVariant>,
}
/// Formattable
pub struct ZonedDateTime<A: AsCalendar, T: TimeZoneLike> {
pub date: Date<A>,
pub time: Time,
pub zone: ResolvedTimeZone<T>,
} |
@nordzilla @justingrant @leftmostcat for feedback on my above post about time zone data model design. |
I just read through this whole thread and I definitely support this new design. This feels much more focused on ICU4X end-user API ergonomics, as it should be. My old design mentioned in the top comment, with everything being optional, was more driven by me reading through UTS-35 Part 4 and trying to ensure that we could support all the cases without imposing unnecessary requirements on any of the cases. It looks like we have the proper calculators/mappings to support @sffc's proposed data structs. I think the functionality split among having an offset, a location, or both is a good differentiator for users. There's been a lot of activity around ICU4X time zones recently. I feel both humbled and grateful. Thank you all for your continued hard work on this project. |
Thanks @sffc for including me. I'm pleased to see the careful thought that you're putting into this design. I'll add a few comments and questions based on my (admittedly limited) understanding of ICU4X design and (better, but still not perfect) understanding of the use cases for time zones in computing, based on our work designing Temporal's ZonedDateTime type. Please feel free to correct me if I have wrong assumptions below!
What are the cases where a caller would supply both? Is this only needed for the formatting case where the time zone name is different depending on DST or non-DST? Also, what happens if the offset is not valid for this location?
Are the three structs above intended to be mainly used as inputs to methods that return a I'm asking because having 4 different structs with the same
Would it be possible to provide a bit more context about what fields and methods
How will users transform what they'll usually start with (an IANA name) to one of these structs? And how will a programmer convert back to IANA? Speaking of conversion, one of the promises that Temporal makes is that an input IANA zone is not canonicalized. If I create a ZDT with Asia/Calcutta as input, then I get Asia/Calcutta back when I ask the ZDT for its time zone ID, even though the canonical IANA name is Asia/Kolkata. If an ICU4X ZDT always canonicalizes IANA IDs to BCP-47 IDs, then how are you expecting that callers like Temporal implementations handle the need to retain the original case-normalized IANA ID?
Will this type have methods that allow arithmetic (e.g. add one month) and other means of creating derived values (e.g. the same time, but on Jan 1 2025) ? Or will creating derived values only be supported for location-based time zones which are DST-safe when creating derived values? If the former, then I worry that the proposed design is not opinionated enough to dissuade developers from making a very common timezone-related programming error: assuming that if you derive values from an offset-only ZDT, then the results will be accurate for the same location that the original timestamp was measured. (Even though the developer doesn't actually know the location, only the offset!). In Temporal, we handle this issue by allowing ZDTs to be created with either location-based or offset-only zones, but we don't allow parsing ISO 8601/RFC 3339 strings into a ZonedDateTime. A user who wants a ZDT with an offset-only zone needs to jump through more hoops, by first parsing into an Instant and then providing a time zone to convert to a ZDT. This added hurdle makes it less likely that a programmer will accidentally run into the footgun of deriving values from an offset-only ZDT. Another option that may be more appropriate for ICU4X could be to create three types:
Another, simpler option could be to rename Another option could be to make sure that any method that can create a ZDT using an offset-only time zone has a scary name, e.g. |
Thanks as usual for your thoughtful reply! A few things:
So a key piece of this proposal, which I realize I didn't make super clear, is what set of data we have available. The proposal is that we have enough data to map from a location-based (IANA) time zone and an offset to all types of display names, without needing the TZDB at runtime. This allows us to offload the TZDB to some other library. So, yes, the caller needs to supply both if they want all formatting styles available. If they have only one but not the other, a subset of formatting styles will be available, listed in the table in #5533 (comment).
There is no CLDR time zone style that falls back between an offset-based format and a location-based format. The "specific" style uses both pieces of data, and it will fall back to an offset-based format if there is no match for the given offset and location.
Yes. We should reconsider how things are named; for now, I'm just trying to get the data models right. Maybe we don't need these structs at all, if they are always just provided as arguments to constructors that return ResolvedTimeZone. But, they need to exist in the type system, so I thought we might as well use them as input structs.
Sure, I edited the post above. The trait is mostly used in the type system so that the formatting functions can raise compile errors if a time zone with the wrong subset of resolved fields is passed to an incompatible formatter.
We already have TimeZoneIdMapper for this. We could definitely make more helper functions. It also handles canonicalization. I think I had you review it at some point a while ago. If clients need to retain the original IANA IDs, they can… do so externally. It's just a single string. I don't think there's much ICU4X could do internally to make that any more efficient.
It was not my intention to support arithmetic in We do have a separate plain
That's a good point and I didn't think of that before. I think this new model forces us to at least have a different parse function for each of the three time zone types. We can name them like:
And the parse functions will fail if the time zone contained in the string does not match the function signature's expectation.
Based on the decision in the formatting API design (#5269 (comment)), I prefer having one type with a type parameter in order to reduce the number of functions we need to write on the formatters. A design with a similar data model but a more user-centric design might be: // Marker types (can't be constructed)
pub enum Location {}
pub enum OffsetOnly {}
pub enum LocationAndOffset {}
// Trait for the marker types
pub trait TimeZoneLike {}
impl TimeZoneLike for Location {}
impl TimeZoneLike for OffsetOnly {}
impl TimeZoneLike for LocationAndOffset {}
/// Formattable
pub struct TimeZone<T: TimeZoneLike> {
utc_offset: Option<UtcOffset>,
bcp47_id: Option<TimeZoneBcp47Id>,
metazone_id: Option<MetazoneId>,
zone_variant: Option<ZoneVariant>,
}
impl<T: TimeZoneLike> TimeZone<T> {
/// Creates a new time zone based on a UTC offset.
pub fn new_offset_only(offset: UtcOffset) -> Self { ... }
}
pub struct TimeZoneFactory {
data: DataPayload<MetazoneAndZoneVariantData>
}
impl TimeZoneFactory {
/// Creates a new time zone based on a location and UTC offset.
/// Most clients should use this time zone constructor if possible.
pub fn get_time_zone(offset: UtcOffset, bcp47_id: TimeZoneBcp47id) -> TimeZone<LocationAndOffset> { ... }
/// Creates a new time zone based on a location only.
pub fn get_location_only_time_zone(bcp47_id: TimeZoneBcp47Id) -> TimeZone<LocationOnly> { ... }
}
/// Formattable
pub struct ZonedDateTime<A: AsCalendar, T: TimeZoneLike> {
pub date: Date<A>,
pub time: Time,
pub zone: TimeZone<T>,
} |
@sffc thanks so much for your thorough explanations above. It's much clearer to me now what you're trying to accomplish, and overall this seems like a good direction.
If this remains a formatting-only type, I'd very strongly suggest renaming it to something like ZonedDateTimeFormatter or something like that to make its role super-clear. Otherwise, any developer who's familiar with ECMAScript might assume that the ICU4X ZDT performs the same jobs as the Temporal ZDT, which would be a mistaken assumption.
In Rust where optimization is paramount, storing as a variable-length string would be a bad decision given that the (case-normalized) strings can be represented as a 10-bit unsigned integer index into a list of ~600 IANA IDs that will, at current rates of expansion, not grow above 10 bits for 100+ years. IMO there should be some way for callers to turn an IANA ID into a 10-bit (16-bit is probably fine) value and then to reconstitute that value back into a string. Asking every IANA-using caller to do that on their own seems wasteful, esp. given that callers won't have the full list of strings without calling ICU4X to get it. Should ICU4X help with this string=>index=>string transformation for IANA names? Or is there a better way to handle this problem? |
Hmm, I like calling it
Made a separate issue for this feature request: #5610 |
I think the case you want to avoid is someone Googling for
Out of curiosity why To me the former name connotes "A superset of ZDT that is also formattable" (because "formattable" is an adjective that modifies the noun "ZonedDateTime") while the latter connotes "A formatter that uses ZDT as its data model" (because "ZonedDateTime" is the adjective-ish thng that modifies the noun "Formatter"). I assume that the latter is closer to your intent with this type?
Thanks! I'll comment over there. |
I feel like good docs can solve this problem, recommending a different crate if you need arithmetic. And ICU4X's ZonedDateTime is not wrong; it just doesn't support that one piece of functionality. |
My strong opinion is that we should not use Also, for developers who are looking for i18n, having something in the type name that's clearly formatting-related will also help with discovery. I guess I don't understand the reason why we'd want to call it ZonedDateTime. The only advantage I can think of is to increase traffic to the ICU4C ZDT-formatting type to help drive developer awareness that time-zone-aware formatting is possible in Rust. But I think we get almost all of that benefit from a type called ZonedDateTimeFormatter, without the downsides. What am I missing? |
Well, we have already been shipping I'm not aware of another high-level Rust ecosystem type that plans to support all of Temporal ZonedDateTime. There is Maybe eventually ICU4X A "formatter" is something that owns formatting data and maps "formattable inputs" to "formatted outputs". ZonedDateTime is not a formatter because it doesn't own formatting data. It is a formattable input to a formatter. |
Just to highlight this in case it's not obvious: everything we're talking about is namespaced.
So there's no ambiguity that you are using ICU4X's "ZonedDateTime", which is self-evidently the i18n-forward type. Even if you put the imports at the top of your file, you are still being explicit when you import the type. One direction I'm not necessarily opposed to is emphasizing the namespace in the type name: IntlZonedDateTime, for instance. That's mostly a novel concept for us, though. |
ICU4X's ZonedDateTime supports more than just formatting; it also supports IXDTF and calendar conversions. It's only "add 1 month" type operations that it doesn't support yet, and that's extremely obvious: there is no method for it. |
Just to list the bikeshed names in one place:
|
Other than arithmetic, are there any other gaps between Temporal.ZonedDateTime and ICU4X's corresponding type? Don't need every method, mostly just trying to understand the use cases covered and not covered.
Could you explain this a bit? Why is it nice? I'm not suggesting this is a bad choice, just trying to understand better. Also, for callers who want to implement arithmetic for non-Gregorian calendars, is there a Rust crate that does this today? If not, what's the plan for arithmetic for these calendars?
What would be the output of these parsing methods if parsing succeeds? A ZDT instance?
This is the one I'd be a bit skeptical about, because wouldn't it effectively foreclose a future ability to add arithmetic cleanly? Another choice could be to create a separate This also might make it clearer to callers that there's a difference between a time zone (something that can be used to derive other ZDTs and localize into a wide range of formats, including the most familiar formats for most end users like "Eastern Standard Time") and an offset (something that can be used to parse an Instant string into its constituent parts and localize into limited set of formats that exclude familiar formats that most end-users prefer) IMO one of the best decisions we made in Temporal was to align type naming with data model. Would it be good to do this in ICU4X too? Or are traits the idiomatic way to do this in Rust?
Unlike for offsets where it's not possible to determine the location time zone from an offset, it *is* possible to determine an offset from a location-only time zone. (Using a disambiguation strategy, either hardcoded like |
Another model: struct TimeZone(TimeZoneBcp47Id);
// "[Europe/Zurich]" -> TimeZone("chzrh")
// "[Future/Zone]" -> TimeZone("unk")
struct CalculatedTimeZone(TimeZoneBcp47Id, UtcOffset);
// "+01:00[Europe/Zurich]" -> CalculatedTimeZone("chzrh", +1)
// "+01:00[Future/Zone]" -> CalculatedTimeZone("unk", +1)
// "+01:00" -> CalculatedTimeZone("unk", +1)
struct Metazone(Option<MetazoneId>, TimeZone);
// TimeZone("chzrh") + timestamp --MetazoneCalculator--> Metazone(Some("cet"), TimeZone("chzrh"))
// TimeZone("unk") + timestamp --MetazoneCalculator-> Metazone(None, TimeZone("unk"))
struct CalculatedMetazone(Option<MetazoneId>, ZoneVariant, CalculatedTimeZone);
// CalculatedTimeZone("chzrh", +1) + timestamp --MetazoneCalculator--> CalculatedMetazone(Some("cet"), Standard, TimeZone("chzrh"))
// CalculatedTimeZone("unk", +1) + timestamp --MetazoneCalculator--> CalculatedMetazone(None, Standard, TimeZone("unk")) The tzdb can be used to go from |
Hmm actually the |
Of the 4 fields, all are ones the user may or may not have by the time they get to ICU4X, with some more likely than others. Some example input sets:
Any anything that says "time zone" could also be an unknown future time zone. The output sets, after processing with CLDR data and timestamp, are
(Even ignoring the metazone inputs, it seems that there are enough weird combinations that the simplistic model that CustomTimeZone currently supports might be just fine? Maybe with some improvements to ergonomics and making it harder for fields to drift out of sync.) |
Time Zone Focus MeetingKey Questions:
References:
Does ICU4X offer timezone calculations?Pros and cons of including TZDB:
Discussion:
@sffc's Proposal:
Approval: @robertbastian, @sffc, @nekevss @justingrant @Manishearth Can ICU4X
|
Some more discussion with @sffc and @robertbastian pub struct TimeZoneInfo {
/// An ID or Etc/Unknown
pub id: TimeZoneBcp47Id,
/// The offset from UTC, if known
pub offset: Option<UtcOffset>,
/// The daylight/standard variant, if known
pub variant: Option<ZoneVariant>,
/// The local datetime for name lookup calculations, if known
pub local_time: Option<(Date<Iso>, Time)>,
}
pub(crate) struct ResolvedTimeZone {
/// From input
pub id: TimeZoneBcp47Id,
/// From input
pub offset: Option<UtcOffset>,
/// From input, or calculated if not present
pub variant: ZoneVariant,
/// Calculated (None if there is no metazone)
pub metazone_id: Option<MetazoneId>,
} // actually not needed if we go from TimeZoneInfo to ExtractedInput directly
pub(crate) struct ExtractedInput {
// ...
/// None if input type doesn't have time zone
pub id: Option<TimeZoneBcp47Id>,
/// None if input type doesn't have time zone
/// Some(None) if there is a time zone but offset is not known
pub offset: Option<Option<UtcOffset>>,
/// None if input type has neither the variant nor the 3-tuple time zone, local time, and offset
pub variant: Option<ZoneVariant>,
/// None if input type doesn't have time zone or local time
/// Some(None) if there is a time zone but the time zone has no metazone at the local time
pub metazone_id: Option<Option<MetazoneId>>,
} Error cases:
@robertbastian: the fields required should be for the whole fallback chain, because otherwise you get data-driven exceptions @robertbastian: Location and Offset don't need a ResolvedTimeZone; they could accept TimeZoneInfo. The other three formats should check for missing inputs if we do this. |
The
TimeZoneInput
trait combines too much information, and using it as the time zone inZonedDateTimeFormatter
yields an unusable API.Obervation 1
Datetime libraries deal only with IANA names and offsets:
jiff::Timezone
:iana()
,to_offset(Timestamp)
chrono_tz::Tz
:name()
,offset_from_utc_datetime(DateTime<Utc>)
(chrono doesn't have a timestamp type)-06:00
,+5:00[America/Vancouver]
,[America/Vancouver]
The concept of metazones, and zone variants is a CLDR-internal concept.1
Observation 2
The BCP47 ID is superior to the IANA ID. It is fixed width and alias-free. Using it internally is the right choice.
Observation 3
The formatters currently use
ExtractedTimeZoneInput
andCustomTimeZone
interchangeably, as those types have the same fields. This should be cleaned up (it should beExtractedTimeZoneInput
where it isExtractedDate/TimeInput
for the other fields). For the sake of this discussion, please assume that this cleanup happened.Proposal
I propose changing the
TimeZoneInput
trait to replacetime_zone_id
bylocation
, with a different signature, and remove themetazone_id
andzone_variant
methods.Passing an optional
&TimeZoneIdMapper
, which is managed by the formatter (and enabled with a method likewith_iana_support[_unstable]
) to thelocation
methods removes the need for trait implementers to manage their ownTimeZoneIdMapper
. Currently implementingTimeZoneInput
on, for example,jiff::Timezone
is not possible; it would require a newtype that wraps(jiff::Timezone, &TimezoneIdMapper)
, and a way to construct these fromjiff:Timezone
s, some kind ofTimeZoneIdMapperHolder
. Yuck. This is a problem for most implementers, see observation 1.The second change is the removal of
metazone_id
andzone_variant
. These fields need aMetazoneCalculator
, and, more importantly, aDateTime
to be computed. If they only needed the metazone calculator, we could follow the same approach as with the timezone mapper, but because they need to be computed at a datetime, this is very display-oriented data and not really part of the "identity" of a timezone. In addition to this, the concept of a metazone is not universal, it was invented by CLDR to deduplicate non-location display names. Exposing this directly to users/trait implementers requires more CLDR knowledge than necessary.Instead, the formatter should load a
MetazoneCalculator
whenever it loads a non-location style. The metazone calculation can then be done just for the lookup in the non-location display names.This new trait is implementable by time zone abstractions in the ecosystem (
jiff::Timezone
,chrono_tz::Tz
).We could then remove the
metazone
andzone_variant
fields fromicu::timezone::CustomTimeZone
as well. They are already not populated when parsing an IXDTF string, which I consider the main use case ofCustomTimeZone
, and need to be manually computed from aMetazoneCalculator
and a datetime. I also support this change.Footnotes
TZDB has a DST flag, but this is not generally useful, as Ireland for example sets it in the winter. It just means "this is the other offset for this location", and is more of a zone-calculation implementation detail. ↩
The text was updated successfully, but these errors were encountered: