src/timezone/README

This is a PostgreSQL adapted version of the IANA timezone library from

    https://www.iana.org/time-zones

The latest version of the timezone data and library source code is
available right from that page. It's best to get the merged file
tzdb-NNNNX.tar.lz, since the other archive formats omit tzdata.zi.
Historical versions, as well as release announcements, can be found
elsewhere on the site.

Since time zone rules change frequently in some parts of the world,
we should endeavor to update the data files before each PostgreSQL
release. The code need not be updated as often, but we must track
changes that might affect interpretation of the data files.

Time Zone data
==============

We distribute the time zone source data as-is under src/timezone/data/.
Currently, we distribute just the abbreviated single-file format
"tzdata.zi", to reduce the size of our tarballs as well as churn
in our git repo. Feeding that file to zic produces the same compiled
output as feeding the bulkier individual data files would do.

While data/tzdata.zi can simply be copied over when updating, manual effort
is needed to update the time zone abbreviation lists under tznames/.
These need to be changed whenever new abbreviations are invented or the
UTC offset associated with an existing abbreviation changes. To detect
whether this has happened, after installing new files under data/ do
    make abbrevs.txt
which will produce a file showing all abbreviations that are in current
use according to the data/ files. Compare this to known_abbrevs.txt,
which is the list that existed last time the tznames/ files were updated.
Update tznames/ as seems appropriate, then replace known_abbrevs.txt
in the same commit. Usually, if a known abbreviation has changed meaning,
the appropriate fix is to make it refer to a long-form zone name instead
of a fixed GMT offset.

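The comparison step described above can be sketched as a small shell
helper. This is only an illustration: the function name check_abbrevs is
hypothetical and not part of the PostgreSQL tree, and it assumes a POSIX
shell with diff available.

```shell
# Sketch only: check_abbrevs is a hypothetical helper name.
# $1 = previously recorded list (e.g. known_abbrevs.txt)
# $2 = freshly generated list   (e.g. abbrevs.txt)
check_abbrevs() {
    diff -u "$1" "$2"
    case $? in
        0) echo "no abbreviation changes" ;;
        1) echo "abbreviations changed; review tznames/" ;;
        *) echo "diff failed" >&2; return 2 ;;
    esac
}
```

Assuming the current directory holds both files, a run might look like
"make abbrevs.txt && check_abbrevs known_abbrevs.txt abbrevs.txt". Any
lines reported by diff are the candidates for manual review of tznames/.
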
The core regression test suite does some simple validation of the zone
data and abbreviations data (notably by checking that the pg_timezone_names
and pg_timezone_abbrevs views don't throw errors). It's worth running it
as a cross-check on proposed updates.

When there has been a new release of Windows (probably including Service
Packs), the list of matching timezones needs to be updated. Run the
script src/tools/win32tzlist.pl on a Windows machine running this new
release and apply any new timezones that it detects. Never remove any
mappings, even if they have been removed in Windows, since we still need
to match properly on the old version.

Time Zone code
==============

The code in this directory is currently synced with tzcode release 2020d.
There are many cosmetic (and not so cosmetic) differences from the
original tzcode library, but diffs in the upstream version should usually
be propagated to our version. Here are some notes about that.

For the most part we want to use the upstream code as-is, but there are
several considerations preventing an exact match:

* For readability/maintainability we reformat the code to match our own
conventions; this includes pgindent'ing it and getting rid of upstream's
overuse of "register" declarations. (It used to include conversion of
old-style function declarations to C89 style, but thank goodness they
fixed that.)

* We need the code to follow Postgres' portability conventions; this
includes relying on configure's results rather than hand-hacked
#defines (see private.h in particular).

* Similarly, avoid relying on <stdint.h> features that may not exist on old
systems. In particular this means using Postgres' definitions of the int32
and int64 typedefs, not int_fast32_t/int_fast64_t. Likewise we use
PG_INT32_MIN/MAX not INT32_MIN/MAX. (Once we desupport all PG versions
that don't require C99, it'd be practical to rely on <stdint.h> and remove
this set of diffs; but that day is not yet.)

* Since Postgres is typically built on a system that has its own copy
of the <time.h> functions, we must avoid conflicting with those. This
mandates renaming typedef time_t to pg_time_t, and similarly for most
other exposed names.

* zic.c's typedef "lineno" is renamed to "lineno_t", because having
"lineno" in our typedefs list would cause unfortunate pgindent behavior
in some other files where we have variables named that.

* We have exposed the tzload() and tzparse() internal functions, and
slightly modified the API of the former, in part because it now relies
on our own pg_open_tzfile() rather than opening files for itself.

* tzparse() is adjusted to never try to load the TZDEFRULES zone.

* There's a fair amount of code we don't need and have removed,
including all the nonstandard optional APIs. We have also added
a few functions of our own at the bottom of localtime.c.

* In zic.c, we have added support for a -P (print_abbrevs) switch, which
is used to create the "abbrevs.txt" summary of currently-in-use zone
abbreviations that was described above.

The most convenient way to compare a new tzcode release to our code is
to first run the tzcode source files through a sed filter like this:

    sed -r \
        -e 's/^([ \t]*)\*\*([ \t])/\1 *\2/' \
        -e 's/^([ \t]*)\*\*$/\1 */' \
        -e 's|^\*/| */|' \
        -e 's/\bregister[ \t]//g' \
        -e 's/\bATTRIBUTE_PURE[ \t]//g' \
        -e 's/int_fast32_t/int32/g' \
        -e 's/int_fast64_t/int64/g' \
        -e 's/intmax_t/int64/g' \
        -e 's/INT32_MIN/PG_INT32_MIN/g' \
        -e 's/INT32_MAX/PG_INT32_MAX/g' \
        -e 's/INTMAX_MIN/PG_INT64_MIN/g' \
        -e 's/INTMAX_MAX/PG_INT64_MAX/g' \
        -e 's/struct[ \t]+tm\b/struct pg_tm/g' \
        -e 's/\btime_t\b/pg_time_t/g' \
        -e 's/lineno/lineno_t/g' \

and then run them through pgindent. (The first three sed patterns deal
with conversion of their block comment style to something pgindent
won't make a hash of; the remainder address other points noted above.)
After that, the files can be diff'd directly against our corresponding
files. Also, it's typically helpful to diff against the previous tzcode
release (after processing that the same way), and then try to apply the
diff to our files. This will take care of most of the changes
mechanically.

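The filter-then-diff workflow above could be scripted roughly as follows.
Treat this as a sketch under stated assumptions: tz_filter and
tz_filter_tree are hypothetical helper names, the sed list is abbreviated
to three of the patterns shown above, the pgindent step is left out, and
GNU sed is assumed (for -r and \b).

```shell
# Sketch only; helper names are hypothetical, not part of the PostgreSQL tree.

# Apply an abbreviated version of the sed filter to stdin.
# (A real run would use the complete pattern list from this README.)
tz_filter() {
    sed -r \
        -e 's/\bregister[ \t]//g' \
        -e 's/struct[ \t]+tm\b/struct pg_tm/g' \
        -e 's/\btime_t\b/pg_time_t/g'
}

# Filter every .c and .h file in directory $1 into directory $2.
tz_filter_tree() {
    mkdir -p "$2" || return 1
    for f in "$1"/*.c "$1"/*.h; do
        [ -f "$f" ] || continue      # skip glob patterns that matched nothing
        tz_filter < "$f" > "$2/$(basename "$f")" || return 1
    done
}
```

For example, "tz_filter_tree tzcode-old old.filtered" and
"tz_filter_tree tzcode-new new.filtered" (directory names are placeholders),
followed by pgindent on both trees and "diff -u old.filtered new.filtered",
would yield a diff that one can then try to apply to our files.
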