-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce a bitstype just for IANA Variable TimeZones #335
base: master
Are you sure you want to change the base?
Conversation
@@ -46,7 +46,10 @@ function TimeZone(str::AbstractString, mask::Class=Class(:DEFAULT)) | |||
tz, class = get!(TIME_ZONE_CACHE, str) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@omus input on the best way to rewrite this function would apprecated.
name(a::IANATimeZone) = name(backing_timezone(a)) | ||
transitions(tz::IANATimeZone) = transitions(backing_timezone(tz)) | ||
|
||
# TODO: should i just make this check the fields of VariableTimeZone and just delegate all? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be addressed
13, 5, 43, 3, 355, 12, 168, 148, 90, 179, | ||
4, 1, 315, 3, 3, 3, 2, 37, 5, 65, | ||
22, 320, 245, 1, 1681, 1681, 1681, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should i rewrap this?
10 per line?
Up to the line limit?
1 per line?
|
||
transits = transitions(tz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be premature optimization to pull this out to be done before.
Idea is to avoid relooking up the backing_timezone
for IANATimeZone
multiple times
@@ -163,3 +163,7 @@ function last_valid(local_dt::DateTime, tz::VariableTimeZone) | |||
possible = interpret(local_dt, tz, Local) | |||
return isempty(possible) ? first(shift_gap(local_dt, tz)) : last(possible) | |||
end | |||
|
|||
|
|||
first_valid(dt::DateTime, tz::IANATimeZone, args...) = _do_and_rewrap(first_valid, dt, tz, args...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probnably _do_and_rewrap
needs to be renamed
|
||
const IANA_TIMEZONES = Vector{VariableTimeZone}(undef, IANA_TABLE_SIZE) | ||
|
||
# TODO: maybe fill this during build(), probably by generating a julia file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe it is fine, since it only queries what files exist, using timezone_names
rather than reading them.
Maybe timezone_names
(or __init__
) should do the validation that TimeZones.jl has build properly and has the TZDATA.COMPILED_DIR
.
Which would maybe clean up the TimeZone
constructor a bit.
src/types/ianatimezone.jl
Outdated
len >= 19 && (hval += asso_values[units[19]]) | ||
len >= 12 && (hval += asso_values[units[12]]) | ||
len >= 11 && (hval += asso_values[units[11]]) | ||
len >= 9 && (hval += asso_values[units[9] + 1]) | ||
len >= 8 && (hval += asso_values[units[8]]) | ||
len >= 6 && (hval += asso_values[units[6] + 1]) | ||
len >= 4 && (hval += asso_values[units[4]]) | ||
len >= 2 && (hval += asso_values[units[2] + 1]) | ||
len >= 1 && (hval += asso_values[units[1]]) | ||
len > 0 && (hval += asso_values[units[end]]) # add the last |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
len >= 19 && (hval += asso_values[units[19]]) | |
len >= 12 && (hval += asso_values[units[12]]) | |
len >= 11 && (hval += asso_values[units[11]]) | |
len >= 9 && (hval += asso_values[units[9] + 1]) | |
len >= 8 && (hval += asso_values[units[8]]) | |
len >= 6 && (hval += asso_values[units[6] + 1]) | |
len >= 4 && (hval += asso_values[units[4]]) | |
len >= 2 && (hval += asso_values[units[2] + 1]) | |
len >= 1 && (hval += asso_values[units[1]]) | |
len > 0 && (hval += asso_values[units[end]]) # add the last | |
@inbounds begin | |
len >= 19 && (hval += asso_values[units[19]]) | |
len >= 12 && (hval += asso_values[units[12]]) | |
len >= 11 && (hval += asso_values[units[11]]) | |
len >= 9 && (hval += asso_values[units[9] + 1]) | |
len >= 8 && (hval += asso_values[units[8]]) | |
len >= 6 && (hval += asso_values[units[6] + 1]) | |
len >= 4 && (hval += asso_values[units[4]]) | |
len >= 2 && (hval += asso_values[units[2] + 1]) | |
len >= 1 && (hval += asso_values[units[1]]) | |
len > 0 && (hval += asso_values[units[end]]) # add the last | |
end |
This saves some time and simplifies the native code but relies on each code unit being in the range 1:125
.
Might be worth it just to pad the array of asso_values
to cover the domain of UInt8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This saves some time and simplifies the native code but relies on each code unit being in the range 1:125
I tried this and didn't notice any improvement.
How were you benchmarking it?
Might be worth it just to pad the array of asso_values to cover the domain of UInt8
I tried this and it made it way worse. I think because the tuple gets too lange for julia to be willing to do some optimizations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
julia> @btime for x in ["America/Winnipeg", "Etc/UTC", "Canada/Central"]
perfect_hash(x)
end
71.438 ns (1 allocation: 112 bytes)
julia> @btime for x in ["America/Winnipeg", "Etc/UTC", "Canada/Central"]
perfect_hash_inbounds(x)
end
66.137 ns (1 allocation: 112 bytes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
running @btime
on the whole testset in test/types/ianatimezone.jl
With the benchmark
@btime for x in ("America/Winnipeg", "Etc/UTC", "Canada/Central")
TimeZones.perfect_hash(x)
end
Noting that i got rid of the vector in the benchmark code since that was dominating.
In julia 1.6
- current code without the inbounds annotation|: 38.097 ns (0 allocations: 0 bytes)
- with inbounds (not safe): 33.217 ns (0 allocations: 0 bytes)
- with inbounds + make assoc_values bigger: 79.305 ns (0 allocations: 0 bytes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- with inbounds and assoc to 255: 44.585 ns (0 allocations: 0 bytes)
growing assoc to 127 seems to help: fikkiwubg wutgh tgat:
- with just assoc to 127 : 28.334 ns (0 allocations: 0 bytes)
- with inbounds ((Not safe): 23.236 ns (0 allocations: 0 bytes)
- with inbounds and checking based on bitmask: 47.035 ns (0 allocations: 0 bytes)
- with inbounds and checking based in <: 36.359 ns (0 allocations: 0 bytes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm i don't trust my benchmarks
Timing for 127 + checking based on <: sometimes is 36ns sometimes is 48ns.
I think we should do that though, and i will push that code
This is yet another PR relating to #271
It doesn't solve it totally,
It needs #327 to handling FixedTimeZones, and #332 to make the ZonedDateTime parametric.
But with those 3 it would solve everything.
It is most similar to #287 / #323
but with rather than a dynamically created lookup table as timezones are used, it has a static (but lazy) look up table.
This table always maps the same integer id, to the same timezone.
This means we don't need to worry about serialization.
Here is the C code for the perfect hashing that
gperf
generated.Right now this code is kinda gross.
Especially around loading it.
We have 2 caches, and idk we probably only need 1.