They were macros previously, but recent callsite additions made Coverity
complain about one of the assertions being always true. This change
could have been made a long time ago, but the Coverity complain broke
the inertia.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Discussion: https://postgr.es/m/202203241021.uts52sczx3al@alvherre.pgsql
Up to now, the planner estimated the size of a recursive query's
worktable as 10 times the size of the non-recursive term. It's hard
to see how to do significantly better than that automatically, but
we can give users control over the multiplier to allow tuning for
specific use-cases. The default behavior remains the same.
Simon Riggs
Discussion: https://postgr.es/m/CANbhV-EuaLm4H3g0+BSTYHEGxJj3Kht0R+rJ8vT57Dejnh=_nA@mail.gmail.com
Bildfarm member prairiedog reported a pgbench TAP test failure after
commit: 4a39f87acd. This commit attempts
to fix some copy&paste errors introduced in the previous commit.
Author: Yugo Nagata
Reported-by: Tom Lane
Discussion: https://postgr.es/m/2775989.1648060014%40sss.pgh.pa.us
hba.c is growing big, and more contents are planned for it. In order to
prepare for this future work, this commit moves all the code related to
the system function processing the contents of pg_hba.conf,
pg_hba_file_rules() to a new file called hbafuncs.c, which will be used
as the location for the SQL portion of the authentication file parsing.
While on it, HbaToken, the structure holding a string token lexed from a
configuration file related to authentication, is renamed to a more
generic AuthToken, as it gets used not only for pg_hba.conf, but also
for pg_ident.conf. TokenizedLine is now named TokenizedAuthLine.
The size of hba.c is reduced by ~12%.
Author: Julien Rouhaud
Reviewed-by: Aleksander Alekseev, Michael Paquier
Discussion: https://postgr.es/m/20220223045959.35ipdsvbxcstrhya@jrouhaud
It seems possible for the condition being tested to be true in
production, and nobody would never know (except when some data
eventually becomes corrupt?).
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m//202109040001.zky3wgv2qeqg@alvherre.pgsql
Commit ffd53659c4 messed up the
mechanism that was being used to pass parameters to LogStreamerMain()
on Windows. It worked on Linux because only Windows was using threads.
Repair by moving the additional parameters added by that commit into
the 'logstreamer_param' struct.
Along the way, fix a compiler warning on builds without HAVE_LIBZ.
Discussion: http://postgr.es/m/CA+TgmoY5=AmWOtMj3v+cySP2rR=Bt6EGyF_joAq4CfczMddKtw@mail.gmail.com
Invalidate abortedRecPtr and missingContrecPtr after a missing
continuation record is successfully skipped on a standby. This fixes a
PANIC caused when a recently promoted standby attempts to write an
OVERWRITE_RECORD with an LSN of the previously read aborted record.
Backpatch to 10 (all stable versions).
Author: Sami Imseih <simseih@amazon.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/44D259DE-7542-49C4-8A52-2AB01534DCA9@amazon.com
There are more compression parameters that can be specified than just
an integer compression level, so rename the new COMPRESSION_LEVEL
option to COMPRESSION_DETAIL before it gets released. Introduce a
flexible syntax for that option to allow arbitrary options to be
specified without needing to adjust the main replication grammar,
and common code to parse it that is shared between the client and
the server.
This commit doesn't actually add any new compression parameters,
so the only user-visible change is that you can now type something
like pg_basebackup --compress gzip:level=5 instead of writing just
pg_basebackup --compress gzip:5. However, it should make it easy to
add new options. If for example gzip starts offering fries, we can
support pg_basebackup --compress gzip:level=5,fries=true for the
benefit of users who want fries with that.
Along the way, this fixes a few things in pg_basebackup so that the
pg_basebackup can be used with a server-side compression algorithm
that pg_basebackup itself does not understand. For example,
pg_basebackup --compress server-lz4 could still succeed even if
only the server and not the client has LZ4 support, provided that
the other options to pg_basebackup don't require the client to
decompress the archive.
Patch by me. Reviewed by Justin Pryzby and Dagfinn Ilmari Mannsåker.
Discussion: http://postgr.es/m/CA+TgmoYvpetyRAbbg1M8b3-iHsaN4nsgmWPjOENu5-doHuJ7fA@mail.gmail.com
When serialization or deadlock errors are reported by backend, allow
to retry and continue the benchmarking. For this purpose new options
"--max-tries", "--failures-detailed" and "--verbose-errors" are added.
Transactions with serialization errors or deadlock errors will be
repeated after rollbacks until they complete successfully or reach the
maximum number of tries (specified by the --max-tries option), or the
maximum time of tries (specified by the --latency-limit option).
These options can be specified at the same time. It is not possible to
use an unlimited number of tries (--max-tries=0) without the
--latency-limit option or the --time option. By default the option
--max-tries is set to 1, which means transactions with
serialization/deadlock errors are not retried. If the last try fails,
this transaction will be reported as failed, and the client variables
will be set as they were before the first run of this transaction.
Statistics on retries and failures are printed in the progress,
transaction / aggregation logs and in the end with other results (all
and for each script). Also retries and failures are printed
per-command with average latency by using option
(--report-per-command, -r).
Option --failures-detailed prints group failures by basic types
(serialization failures / deadlock failures).
Option --verbose-errors prints distinct reports on errors and failures
(errors without retrying) by type with detailed information like which
limit for retries was violated and how far it was exceeded for the
serialization/deadlock failures.
Patch originally written by Marina Polyakova then Yugo Nagata
inherited the discussion and heavily modified the patch to make it
commitable.
Authors: Yugo Nagata, Marina Polyakova
Reviewed-by: Fabien Coelho, Tatsuo Ishii, Alvaro Herrera, Kevin Grittner, Andres Freund, Arthur Zakirov, Alexander Korotkov, Teodor Sigaev, Ildus Kurbangaliev
Discussion: https://postgr.es/m/flat/72a0d590d6ba06f242d75c2e641820ec%40postgrespro.ru
This introduces some of the building blocks used by the SQL/JSON
constructor and query functions. Specifically, it provides node
executor and grammar support for the FORMAT JSON [ENCODING foo]
clause, and values decorated with it, and for the RETURNING clause.
The following SQL/JSON patches will leverage these.
Nikita Glukhov (who probably deserves an award for perseverance).
Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup. Erik Rijkers, Zihong Yu and
Himanshu Upadhyaya.
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Commit 90efa2f556 caused some issues with EXEC_BACKEND builds and with
force_parallel_mode = regress setups. For the first issue we no longer
test if the module has been preloaded, and in fact we don't preload it,
but simply LOAD it in the test script. For the second issue we suppress
error messages emanating from parallel workers.
Mark Dilger
Discussion: https://postgr.es/m/7f6d54a1-4024-3b6e-e3ec-26cd394aac9e@dunslane.net
When cross-building to windows, or building with mingw on windows, the build
could fail with
x86_64-w64-mingw32-gcc: error: win32ver.o: No such file or director
because pg_dumpall didn't depend on WIN32RES, but it's recipe references
it. The build nevertheless succeeded most of the time, due to
pg_dump/pg_restore having the required dependency, causing win32ver.o to be
built.
Reported-By: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CA+hUKGJeekpUPWW6yCVdf9=oBAcCp86RrBivo4Y4cwazAzGPng@mail.gmail.com
Backpatch: 10-, omission present on all live branches
This includes tests of both the newly added name type object access
hooks and the older Oid type hooks, and provides a useful example
of how to use the hooks.
Mark Dilger, based on some code from Joshua Brindle.
Discussion: https://postgr.es/m/47F87A0E-C0E5-43A6-89F6-D403F2B45175@enterprisedb.com
A security invoker view checks permissions for accessing its
underlying base relations using the privileges of the user of the
view, rather than the privileges of the view owner. Additionally, if
any of the base relations are tables with RLS enabled, the policies of
the user of the view are applied, rather than those of the view owner.
This allows views to be defined without giving away additional
privileges on the underlying base relations, and matches a similar
feature available in other database systems.
It also allows views to operate more naturally with RLS, without
affecting the assignments of policies to users.
Christoph Heiss, with some additional hacking by me. Reviewed by
Laurenz Albe and Wolfgang Walther.
Discussion: https://postgr.es/m/b66dd6d6-ad3e-c6f2-8b90-47be773da240%40cybertec.at
This issue is environment-sensitive, where the SSL tests could fail in
various way by feeding on defaults provided by sslcert, sslkey,
sslrootkey, sslrootcert, sslcrl and sslcrldir coming from a local setup,
as of ~/.postgresql/ by default. Horiguchi-san has reported two
failures, but more advanced testing from me (aka inclusion of garbage
SSL configuration in ~/.postgresql/ for all the configuration
parameters) has showed dozens of failures that can be triggered in the
whole test suite.
History has showed that we are not good when it comes to address such
issues, fixing them locally like in dd87799, and such problems keep
appearing. This commit strengthens the entire test suite to put an end
to this set of problems by embedding invalid default values in all the
connection strings used in the tests. The invalid values are prefixed
in each connection string, relying on the follow-up values passed in the
connection string to enforce any invalid value previously set. Note
that two tests related to CRLs are required to fail with certain pre-set
configurations, but we can rely on enforcing an empty value instead
after the invalid set of values.
Reported-by: Kyotaro Horiguchi
Reviewed-by: Andrew Dunstan, Daniel Gustafsson, Kyotaro Horiguchi
Discussion: https://postgr.es/m/20220316.163658.1122740600489097632.horikyota.ntt@gmail.com
backpatch-through: 10
This feature allows skipping the transaction on subscriber nodes.
If incoming change violates any constraint, logical replication stops
until it's resolved. Currently, users need to either manually resolve the
conflict by updating a subscriber-side database or by using function
pg_replication_origin_advance() to skip the conflicting transaction. This
commit introduces a simpler way to skip the conflicting transactions.
The user can specify LSN by ALTER SUBSCRIPTION ... SKIP (lsn = XXX),
which allows the apply worker to skip the transaction finished at
specified LSN. The apply worker skips all data modification changes within
the transaction.
Author: Masahiko Sawada
Reviewed-by: Takamichi Osumi, Hou Zhijie, Peter Eisentraut, Amit Kapila, Shi Yu, Vignesh C, Greg Nancarrow, Haiying Tang, Euler Taveira
Discussion: https://postgr.es/m/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK=30xJfUVihNZDA@mail.gmail.com
Now that 13619598f1 has split pgstat up into multiple files it isn't quite as
hard to come up with a sensible order for pgstat.[ch]. Inconsistent naming
makes it still not quite right looking, but that's work for another commit.
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
The planner needs to treat GroupingFunc like Aggref for many purposes,
in particular with respect to processing of the argument expressions,
which are not to be evaluated at runtime. A few places hadn't gotten
that memo, notably including subselect.c's processing of outer-level
aggregates. This resulted in assertion failures or wrong plans for
cases in which a GROUPING() construct references an outer aggregation
level.
Also fix missing special cases for GroupingFunc in cost_qual_eval
(resulting in wrong cost estimates for GROUPING(), although it's
not clear that that would affect plan shapes in practice) and in
ruleutils.c (resulting in excess parentheses in pretty-print mode).
Per bug #17088 from Yaoguang Chen. Back-patch to all supported
branches.
Richard Guo, Tom Lane
Discussion: https://postgr.es/m/17088-e33882b387de7f5c@postgresql.org
pgstat.c is very long, and it's hard to find an order that makes sense and is
likely to be maintained over time. Splitting the different pieces into
separate files makes that a lot easier.
With a few exceptions, this commit just moves code around. Those exceptions
are:
- adding file headers for new files
- removing 'static' from functions
- adapting pgstat_assert_is_up() to work across TUs
- minor comment adjustments
git diff --color-moved=dimmed-zebra is very helpful separating code movement
from code changes.
The next commit in this series will reorder pgstat.[ch] contents to be a bit
more coherent.
Earlier revisions of this patch had "global" statistics (archiver, bgwriter,
checkpointer, replication slots, SLRU, WAL) in one file, because each seemed
small enough. However later commits will increase their size and their
aggregate size is not insubstantial. It also just seems easier to split each
type of statistic into its own file.
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
For GENERATED columns, we record all dependencies of the generation
expression as AUTO dependencies of the column itself. This means
that the generated column is silently dropped if any dependency
is removed, even if CASCADE wasn't specified. This is at least
a POLA violation, but I think it's actually based on a misreading
of the standard. The standard does say that you can't drop a
dependent GENERATED column in RESTRICT mode; but that's buried down
in a subparagraph, on a different page from some pseudocode that
makes it look like an AUTO drop is being suggested.
Change this to be more like the way that we handle regular default
expressions, ie record the dependencies as NORMAL dependencies of
the pg_attrdef entry. Also, make the pg_attrdef entry's dependency
on the column itself be INTERNAL not AUTO. That has two effects:
* the column will go away, not just lose its default, if any
dependency of the expression is dropped with CASCADE. So we
don't need any special mechanism to make that happen.
* it provides an additional cross-check preventing someone from
dropping the default expression without dropping the column.
catversion bump because of change in the contents of pg_depend
(which also requires a change in one information_schema view).
Per bug #17439 from Kevin Humphreys. Although this is a longstanding
bug, it seems impractical to back-patch because of the need for
catalog contents changes.
Discussion: https://postgr.es/m/17439-7df4421197e928f0@postgresql.org
This is a pure refactoring commit: there isn't (I hope) any functional
change.
StoreAttrDefault and RemoveAttrDefault[ById] are moved from heap.c,
reducing the size of that overly-large file by about 300 lines.
I took the opportunity to trim unused #includes from heap.c, too.
Two new functions for translating between a pg_attrdef OID and the
relid/attnum of the owning column are created by extracting ad-hoc
code from objectaddress.c. This already removes one copy of said
code, and a follow-on bug fix will create more callers.
The only other function directly manipulating pg_attrdef is
AttrDefaultFetch. I judged it was better to leave that in relcache.c,
since it shares special concerns about recursion and error handling
with the rest of that module.
Discussion: https://postgr.es/m/651168.1647451676@sss.pgh.pa.us
DROP INDEX needs to lock the index's table before the index itself,
else it will deadlock against ordinary queries that acquire the
relation locks in that order. This is correctly mechanized for
plain indexes by RangeVarCallbackForDropRelation; but in the case of
a partitioned index, we neglected to lock the child tables in advance
of locking the child indexes. We can fix that by traversing the
inheritance tree and acquiring the needed locks in RemoveRelations,
after we have acquired our locks on the parent partitioned table and
index.
While at it, do some refactoring to eliminate confusion between
the actual and expected relkind in RangeVarCallbackForDropRelation.
We can save a couple of syscache lookups too, by having that function
pass back info that RemoveRelations will need.
Back-patch to v11 where partitioned indexes were added.
Jimmy Yih, Gaurab Dey, Tom Lane
Discussion: https://postgr.es/m/BYAPR05MB645402330042E17D91A70C12BD5F9@BYAPR05MB6454.namprd05.prod.outlook.com
The old name was overly generic. An upcoming commit moves relation stats
handling into its own file, making pgstat_initstats() look even more out of
place.
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
A later commit will make the check more complicated than the
current (rel)->pgstat_info != NULL. It also just seems nicer to have a central
copy of the logic, even while still simple.
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
Valgrind animal skink shows a crash in this new code. I couldn't
reproduce the problem locally, but going by blind code inspection,
initializing insert_destrel should be sufficient to fix the problem.
When an update on a partitioned table referenced in foreign key
constraints causes a row to move from one partition to another,
the fact that the move is implemented as a delete followed by an insert
on the target partition causes the foreign key triggers to have
surprising behavior. For example, a given foreign key's delete trigger
which implements the ON DELETE CASCADE clause of that key will delete
any referencing rows when triggered for that internal DELETE, although
it should not, because the referenced row is simply being moved from one
partition of the referenced root partitioned table into another, not
being deleted from it.
This commit teaches trigger.c to skip queuing such delete trigger events
on the leaf partitions in favor of an UPDATE event fired on the root
target relation. Doing so is sensible because both the old and the new
tuple "logically" belong to the root relation.
The after trigger event queuing interface now allows passing the source
and the target partitions of a particular cross-partition update when
registering the update event for the root partitioned table. Along with
the two ctids of the old and the new tuple, the after trigger event now
also stores the OIDs of those partitions. The tuples fetched from the
source and the target partitions are converted into the root table
format, if necessary, before they are passed to the trigger function.
The implementation currently has a limitation that only the foreign keys
pointing into the query's target relation are considered, not those of
its sub-partitioned partitions. That seems like a reasonable
limitation, because it sounds rare to have distinct foreign keys
pointing to sub-partitioned partitions instead of to the root table.
This misbehavior stems from commit f56f8f8da6 (which added support for
foreign keys to reference partitioned tables) not paying sufficient
attention to commit 2f17844104 (which had introduced cross-partition
updates a year earlier). Even though the former commit goes back to
Postgres 12, we're not backpatching this fix at this time for fear of
destabilizing things too much, and because there are a few ABI breaks in
it that we'd have to work around in older branches. It also depends on
commit f4566345cf, which had its own share of backpatchability issues
as well.
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Eduard Català <eduard.catala@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFvkBCmfwkQX_yBqv2Wz8ugUGiBDxum8=WvVbfU1TXaNg@mail.gmail.com
Discussion: https://postgr.es/m/CAL54xNZsLwEM1XCk5yW9EqaRzsZYHuWsHQkA2L5MOSKXAwviCQ@mail.gmail.com
createdb() didn't check for collation attributes validity, which has
to be done explicitly on ICU < 54. It also forgot to close the ICU collator
opened during the check which leaks some memory.
To fix both, add a new check_icu_locale() that does all the appropriate
verification and close the ICU collator.
initdb also had some partial check for ICU < 54. To have consistent error
reporting across major ICU versions, and get rid of the need to include ucol.h,
remove the partial check there. The backend will report an error if needed
during the post-boostrap iniitialization phase.
Author: Julien Rouhaud <julien.rouhaud@free.fr>
Discussion: https://www.postgresql.org/message-id/20220319041459.qqqiqh335sga5ezj@jrouhaud
A later commit will move the handling of the different kinds of stats into
separate files. By splitting out WAL handling in this commit that later move
will just move code around without other changes.
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
pgstat_report_stat() handles several types of stats, yet relation stats have
so far been handled directly in pgstat_report_stat().
A later commit will move the handling of the different kinds of stats into
separate files. By splitting out relation handling in this commit that later
move will just move code around without other changes.
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
b048326 has added support for SET ACCESS METHOD in ALTER TABLE, but it
has missed a few things for materialized views:
- No documentation for this clause on the ALTER MATERIALIZED VIEW page.
- psql tab completion missing.
- No regression tests.
This commit closes the gap on all the points listed above.
Author: Yugo Nagata
Discussion: https://postgr.es/m/20220316133337.5dc9740abfa24c25ec9f67f5@sraoss.co.jp
The clauses SET TABLESPACE and ALL IN TABLESPACE are supported in ALTER
MATERIALIZED VIEW for a long time, and they behave mostly like ALTER
TABLE by reusing the same code paths, but there were zero tests for
them. This commit closes the gap with new tests in tablespace.sql.
Author: Yugo Nagata
Discussion: https://postgr.es/m/20220316133337.5dc9740abfa24c25ec9f67f5@sraoss.co.jp
The output of table_to_xmlschema() and allied functions includes
a regex describing valid values for these types ... but the regex
was itself invalid, as it failed to escape a literal "+" sign.
Report and fix by Renan Soares Lopes. Back-patch to all
supported branches.
Discussion: https://postgr.es/m/7f6fabaa-3f8f-49ab-89ca-59fbfe633105@me.com
Otherwise, the database encoding varies depending on the user's
environment, and so the test might fail depending on whether ICU
likes the encoding. In particular, the test fails completely
if the prevailing locale is C.
Teach xlogreader.c to decode the WAL into a circular buffer. This will
support optimizations based on looking ahead, to follow in a later
commit.
* XLogReadRecord() works as before, decoding records one by one, and
allowing them to be examined via the traditional XLogRecGetXXX()
macros and certain traditional members like xlogreader->ReadRecPtr.
* An alternative new interface XLogReadAhead()/XLogNextRecord() is
added that returns pointers to DecodedXLogRecord objects so that it's
now possible to look ahead in the WAL stream while replaying.
* In order to be able to use the new interface effectively while
streaming data, support is added for the page_read() callback to
respond to a new nonblocking mode with XLREAD_WOULDBLOCK instead of
waiting for more data to arrive.
No direct user of the new interface is included in this commit, though
XLogReadRecord() uses it internally. Existing code doesn't need to
change, except in a few places where it was accessing reader internals
directly and now needs to go through accessor macros.
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions)
Discussion: https://postgr.es/m/CA+hUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq=AovOddfHpA@mail.gmail.com
lz4frame.h was getting declared after the headers specific to Postgres,
but it needs to be included between postgres_fe.h and the internal
headers.
Issue introduced by babbbb5.
Reported-by: Justin Prysby
Discussion: https://postgr.es/m/20220317111220.GI28503@telsasoft.com
If a RowExpr is marked as returning a named composite type, we aren't
going to consult its colnames list; we'll use the attribute names
shown for the type in pg_attribute. Hence, skip storing that list,
to save a few nanoseconds when copying the expression tree around.
Discussion: https://postgr.es/m/2950001.1638729947@sss.pgh.pa.us
In commit bf7ca1587, I had the bright idea that we could make the
result of a whole-row Var (that is, foo.*) track any column aliases
that had been applied to the FROM entry the Var refers to. However,
that's not terribly logically consistent, because now the output of
the Var is no longer of the named composite type that the Var claims
to emit. bf7ca1587 tried to handle that by changing the output
tuple values to be labeled with a blessed RECORD type, but that's
really pretty disastrous: we can wind up storing such tuples onto
disk, whereupon they're not readable by other sessions.
The only practical fix I can see is to give up on what bf7ca1587
tried to do, and say that the column names of tuples produced by
a whole-row Var are always those of the underlying named composite
type, query aliases or no. While this introduces some inconsistencies,
it removes others, so it's not that awful in the abstract. What *is*
kind of awful is to make such a behavioral change in a back-patched
bug fix. But corrupt data is worse, so back-patched it will be.
(A workaround available to anyone who's unhappy about this is to
introduce an extra level of sub-SELECT, so that the whole-row Var is
referring to the sub-SELECT's output and not to a named table type.
Then the Var is of type RECORD to begin with and there's no issue.)
Per report from Miles Delahunty. The faulty commit dates to 9.5,
so back-patch to all supported branches.
Discussion: https://postgr.es/m/2950001.1638729947@sss.pgh.pa.us
Restructure things so that the functions which update the global
variables shared_map and local_map are separate from the functions
which just read and write relation map files without touching any
global variables.
In the new structure of things, write_relmap_file() writes a relmap
file but no longer performs global variable updates. A symmetric
function read_relmap_file() that just reads a file without changing
any global variables is added, and load_relmap_file(), which does
change the global variables, uses it as a subroutine.
Because write_relmap_file() no longer updates shared_map and
local_map, that logic is moved to perform_relmap_update(). However,
no similar logic is added to relmap_redo() even though it also calls
write_relmap_file(). That's because recovery must not rely on the
contents of the relation map, and therefore there is no need to
initialize it. In fact, doing so seems like a mistake, because we
might then manage to rely on the in-memory map where we shouldn't.
Patch by me, based on earlier work by Dilip Kumar. Reviewed by
Ashutosh Sharma.
Discussion: http://postgr.es/m/CA+TgmobQLgrt4AXsc0ru7aFFkzv=9fS-Q_yO69=k9WY67RCctg@mail.gmail.com
When publishing changes through a artition root, we should use the row
filter for the top-most ancestor. The relation may be added to multiple
publications, using different ancestors, and 52e4f0cd47 handled this
incorrectly. With c91f71b9dc we find the correct top-most ancestor, but
the code tried to fetch the row filter from all publications, including
those using a different ancestor etc. No row filter can be found for
such publications, which was treated as replicating all rows.
Similarly to c91f71b9dc, this seems to be a rare issue in practice. It
requires multiple publications including the same partitioned relation,
through different ancestors.
Fixed by only passing publications containing the top-most ancestor to
pgoutput_row_filter_init(), so that treating a missing row filter as
replicating all rows is correct.
Report and fix by me, test case by Hou zj. Reviews and improvements by
Amit Kapila.
Author: Tomas Vondra, Hou zj, Amit Kapila
Reviewed-by: Amit Kapila, Hou zj
Discussion: https://postgr.es/m/d26d24dd-2fab-3c48-0162-2b7f84a9c893%40enterprisedb.com
Create subroutines ExecUpdatePrologue / ExecUpdateAct /
ExecUpdateEpilogue, and similar for ExecDelete.
Introduce a new struct to be used internally in nodeModifyTable.c,
dubbed ModifyTableContext, which contains all context information needed
to perform these operations, as well as ExecInsert and others.
This allows using a different schedule and a different way of evaluating
the results of these operations, which can be exploited by a later
commit introducing support for MERGE. It also makes ExecUpdate and
ExecDelete proper shorter and (hopefully) simpler.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Discussion: https://postgr.es/m/202202271724.4z7xv3cf46kv@alvherre.pgsql
This adds the option to use ICU as the default locale provider for
either the whole cluster or a database. New options for initdb,
createdb, and CREATE DATABASE are used to select this.
Since some (legacy) code still uses the libc locale facilities
directly, we still need to set the libc global locale settings even if
ICU is otherwise selected. So pg_database now has three
locale-related fields: the existing datcollate and datctype, which are
always set, and a new daticulocale, which is only set if ICU is
selected. A similar change is made in pg_collation for consistency,
but in that case, only the libc-related fields or the ICU-related
field is set, never both.
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5e756dd6-0e91-d778-96fd-b1bcb06c161a%402ndquadrant.com
Using this system function with an in-place tablespace (created when
allow_in_place_tablespaces is enabled by specifying an empty string as
location) caused a failure when using readlink(), as the tablespace is,
in this case, not a symbolic link in pg_tblspc/ but a directory.
Rather than getting a failure, the commit changes
pg_tablespace_location() so as a relative path to the data directory is
returned for in-place tablespaces, to make a difference between
tablespaces created when allow_in_place_tablespaces is enabled or not.
Getting a path rather than an empty string that would match the CREATE
TABLESPACE command in this case is more useful for tests that would like
to rely on this function.
While on it, a regression test is added for this case. This is simple
to add in the main regression test suite thanks to regexp_replace() to
mask the part of the tablespace location dependent on its OID.
Author: Michael Paquier
Reviewed-by: Kyotaro Horiguchi, Thomas Munro
Discussion: https://postgr.es/m/YiG1RleON1WBcLnX@paquier.xyz
Commit 83fd4532a7 allowed publishing of changes via ancestors, for
publications defined with publish_via_partition_root. But the way
the ancestor was determined in get_rel_sync_entry() was incorrect,
simply updating the same variable. So with multiple publications,
replicating different ancestors, the outcome depended on the order
of publications in the list - the value from the last loop was used,
even if it wasn't the top-most ancestor.
This is a probably rare situation, as in most cases publications do
not overlap, so each partition has exactly one candidate ancestor
to replicate as and there's no ambiguity.
Fixed by tracking the "ancestor level" for each publication, and
picking the top-most ancestor. Adds a test case, verifying the
correct ancestor is used for publishing the changes and that this
does not depend on order of publications in the list.
Older releases have another bug in this loop - once all actions are
replicated, the loop is terminated, on the assumption that inspecting
additional publications is unecessary. But that misses the fact that
those additional applications may replicate different ancestors.
Fixed by removal of this break condition. We might still terminate the
loop in some cases (e.g. when replicating all actions and the ancestor
is the partition root).
Backpatch to 13, where publish_via_partition_root was introduced.
Initial report and fix by me, test added by Hou zj. Reviews and
improvements by Amit Kapila.
Author: Tomas Vondra, Hou zj, Amit Kapila
Reviewed-by: Amit Kapila, Hou zj
Discussion: https://postgr.es/m/d26d24dd-2fab-3c48-0162-2b7f84a9c893%40enterprisedb.com
Commands like ALTER TABLE SET TABLESPACE may leave files for the next
checkpoint to clean up. If such files are not removed by the time DROP
TABLESPACE is called, we request a checkpoint so that they are deleted.
However, there is presently a window before checkpoint start where new
unlink requests won't be scheduled until the following checkpoint. This
means that the checkpoint forced by DROP TABLESPACE might not remove the
files we expect it to remove, and the following ERROR will be emitted:
ERROR: tablespace "mytblspc" is not empty
To fix, add a call to AbsorbSyncRequests() just before advancing the
unlink cycle counter. This ensures that any unlink requests forwarded
prior to checkpoint start (i.e., when ckpt_started is incremented) will
be processed by the current checkpoint. Since AbsorbSyncRequests()
performs memory allocations, it cannot be called within a critical
section, so we also need to move SyncPreCheckpoint() to before
CreateCheckPoint()'s critical section.
This is an old bug, so back-patch to all supported versions.
Author: Nathan Bossart <nathandbossart@gmail.com>
Reported-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220215235845.GA2665318%40nathanxps13
If we run out of space in the checkpointer sync request queue (which is
hopefully rare on real systems, but common with very small buffer pool),
we wait for it to drain. While waiting, we should report that as a wait
event so that users know what is going on, and also handle postmaster
death, since otherwise the loop might never terminate if the
checkpointer has exited.
Back-patch to 12. Although the problem exists in earlier releases too,
the code is structured differently before 12 so I haven't gone any
further for now, in the absence of field complaints.
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220226213942.nb7uvb2pamyu26dj%40alap3.anarazel.de
The checkpointer shouldn't ignore its latch. Other backends may be
waiting for it to drain the request queue. Hopefully real systems don't
have a full queue often, but the condition is reached easily when
shared_buffers is small.
This involves defining a new wait event, which will appear in the
pg_stat_activity view often due to spread checkpoints.
Back-patch only to 14. Even though the problem exists in earlier
branches too, it's hard to hit there. In 14 we stopped using signal
handlers for latches on Linux, *BSD and macOS, which were previously
hiding this problem by interrupting the sleep (though not reliably, as
the signal could arrive before the sleep begins; precisely the problem
latches address).
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220226213942.nb7uvb2pamyu26dj%40alap3.anarazel.de
Commit 3500ccc39b allowed for base backup
targets, meaning that we could do something with the backup other than
send it to the client, but all of those targets had to be baked in to
the core code. This commit makes it possible for extensions to define
additional backup targets.
Patch by me, reviewed by Abhijit Menon-Sen.
Discussion: http://postgr.es/m/CA+TgmoaqvdT-u3nt+_kkZ7bgDAyqDB0i-+XOMmr5JN2Rd37hxw@mail.gmail.com
These tests were added recently, but older code tests USE_LZ4 rathr
than HAVE_LIBLZ4, so let's follow the established precedent. It
also seems more consistent with the intent of the configure tests,
since I think that the USE_* symbols are intended to correspond to
what the user requested, and the HAVE_* symbols to what configure
found while probing.
Discussion: http://postgr.es/m/CA+Tgmoap+hTD2-QNPJLH4tffeFE8MX5+xkbFKMU3FKBy=ZSNKA@mail.gmail.com
This system function was being triggered once in the main regression
test suite to check its SRF configuration, and more in other test
modules but nothing checked the behavior of the options missing_ok and
include_dot_dirs. This commit adds some tests for both options, to
avoid mistakes if this code is manipulated in the future.
Extracted from a larger patch by the same author, with a few tweaks by
me.
Author: Justin Pryzby
Discussion: https://postgr.es/m/20191227170220.GE12890@telsasoft.com
Previously, pg_basebackup from a cluster that contained an 'in-place'
tablespace, as introduced by commit 7170f215, would produce a harmless
warning on Unix and fail completely on Windows.
Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20220304.165449.1200020258723305904.horikyota.ntt%40gmail.com
The upper case versions "OF", "TZH", and "TZM" are already supported,
and all other format codes that are supported in upper case are also
supported in lower case, so we should support these as well for
consistency.
Nitin Jadhav, with a tiny cosmetic change by me. Reviewed by Suraj
Kharage and David Zhang.
Discussion: http://postgr.es/m/CAMm1aWZ-oZyKd75+8D=VJ0sAoSwtdXWLP-MAWD4D8R1Dgandzw@mail.gmail.com
Logical replication apply workers for a subscription can easily get stuck
in an infinite loop of attempting to apply a change, triggering an error
(such as a constraint violation), exiting with the error written to the
subscription server log, and restarting.
To partially remedy the situation, this patch adds a new subscription
option named 'disable_on_error'. To be consistent with old behavior, this
option defaults to false. When true, both the tablesync worker and apply
worker catch any errors thrown and disable the subscription in order to
break the loop. The error is still also written in the logs.
Once the subscription is disabled, users can either manually resolve the
conflict/error or skip the conflicting transaction by using
pg_replication_origin_advance() function. After resolving the conflict,
users need to enable the subscription to allow apply process to proceed.
Author: Osumi Takamichi and Mark Dilger
Reviewed-by: Greg Nancarrow, Vignesh C, Amit Kapila, Wang wei, Tang Haiying, Peter Smith, Masahiko Sawada, Shi Yu
Discussion : https://postgr.es/m/DB35438F-9356-4841-89A0-412709EBD3AB%40enterprisedb.com
Commit 872770fd6c taught VACUUM VERBOSE and autovacuum logging to
display the total number of pages scanned by VACUUM. This information
was also displayed as a percentage of rel_pages in parenthesis, which
makes it easy to spot trends over time and across tables.
The instrumentation displayed "0 scanned (0.00% of total)" for totally
empty tables. Tweak the instrumentation: have it show "0 scanned
(100.00% of total)" for empty tables instead. This approach is clearer
and more consistent.
Starting in cc50080a82 create_index test fails when run with
synchronous_commit=off. synchronous_commit=off delays when hint bits may be
set. Some plans change depending on the number of all-visible pages, which in
turn can be influenced by the delayed hint bits.
Force synchronous_commit to `on` in test_setup.sql. Not very satisfying, but
there's no obvious alternative.
Reported-By: Aleksander Alekseev <aleksander@timescale.com>
Author: Andres Freund <andres@anarazel.de>
Author: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/flat/CAJ7c6TPJNof1Q+vJsy3QebgbPgXdu2ErPvYkBdhD6_Ckv5EZRg@mail.gmail.com
VACUUM's rel_pages field indicates the size of the target heap rel just
after the table_relation_vacuum() operation began. There are specific
expectations around how rel_pages can be related to other nearby state.
In particular, the range of rel_pages must contain every tuple in the
relation whose tuple headers might contain an XID < OldestXmin.
Consistently refer to the field as rel_pages to make this clearer and
more discoverable.
This is follow-up work to commit 73f6ec3d from earlier today.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220311031351.sbge5m2bpvy2ttxg@alap3.anarazel.de
Explain the relationship between vacuumlazy.c's vistest and OldestXmin
cutoffs. These closely related cutoffs are different in subtle but
important ways. Also document a closely related rule: we must establish
rel_pages _after_ OldestXmin to ensure that no XID < OldestXmin can be
missed by lazy_scan_heap().
It's easier to explain these issues by initializing everything together,
so consolidate initialization of vacrel state. Now almost every vacrel
field is initialized by heap_vacuum_rel(). The only remaining exception
is the dead_items array, which is still managed by lazy_scan_heap() due
to interactions with how we initialize parallel VACUUM.
Also move the process that updates pg_class entries for each index into
heap_vacuum_rel(), and adjust related assertions. All pg_class updates
now take place after lazy_scan_heap() returns, which seems clearer.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20211211045710.ljtuu4gfloh754rs@alap3.anarazel.de
Discussion: https://postgr.es/m/CAH2-WznYsUxVT156rCQ+q=YD4S4=1M37hWvvHLz-H1pwSM8-Ew@mail.gmail.com
We called the argument totally_frozen in its function prototype as well
as in code comments, even though totally_frozen_p was used in the
function definition. Standardize on totally_frozen.
Commit 8b069ef5d changed this function to look at pg_constraint.conindid
rather than searching pg_depend. That was a good performance improvement,
but it failed to preserve the exact semantics. The old code would only
return an index that was "owned by" (internally dependent on) the
specified constraint, whereas the new code will also return indexes that
are just referenced by foreign key constraints. This confuses ALTER
TABLE, which was implicitly expecting the previous semantics, into
failing with errors like
ERROR: relation 146621 has multiple clustered indexes
or
ERROR: "pk_attbl" is not an index for table "atref"
We can fix this without reverting the performance improvement by adding
a contype check in get_constraint_index(). Another way could be to
make ALTER TABLE check it, but I'm worried that extension code could
also have subtle dependencies on the old semantics.
Tom Lane and Japin Li, per bug #17409 from Holly Roberts.
Back-patch to v14 where the error crept in.
Discussion: https://postgr.es/m/17409-52871dda8b5741cb@postgresql.org
Fail with a suitable error message instead. We can't inject the backup
manifest into the output tarfile without decompressing it, and if
we did that, we'd have to recompress the tarfile afterwards to produce
the result the user is expecting. While we have enough infrastructure
in pg_basebackup now to accomplish that whole series of steps without
much additional code, it seems like excessively surprising behavior.
The user probably did not select server-side compression with the idea
that the client was going to end up decompressing it and then
recompressing.
Report from Justin Pryzby. Fix by me.
Discussion: http://postgr.es/m/CA+Tgmob6Rnjz-Qv32h3yJn8nnUkLhrtQDAS4y5AtsgtorAFHRA@mail.gmail.com
wal_compression gains a new value, "zstd", to allow the compression of
full-page images using the compression method of the same name.
Compression is done using the default level recommended by the library,
as of ZSTD_CLEVEL_DEFAULT = 3. Some benchmarking has shown that it
could make sense to use a level lower for the FPI compression, like 1 or
2, as the compression rate did not change much with a bit less CPU
consumed, but any tests done would only cover few scenarios so it is
hard to come to a clear conclusion. Anyway, there is no reason to not
use the default level instead, which is the level recommended by the
library so it should be fine for most cases.
zstd outclasses easily pglz, and is better than LZ4 where one wants to
have more compression at the cost of extra CPU but both are good enough
in their own scenarios, so the choice between one or the other of these
comes to a study of the workload patterns and the schema involved,
mainly.
This commit relies heavily on 4035cd5, that reshaped the code creating
and restoring full-page writes to be aware of the compression type,
making this integration straight-forward.
This patch borrows some early work from Andrey Borodin, though the patch
got a complete rewrite.
Author: Justin Pryzby
Discussion: https://postgr.es/m/20220222231948.GJ9008@telsasoft.com
Per project policy, all system and library headers need to be declared
in the backend code after "postgres.h" and before the internal headers,
but 4035cd5 broke this policy when adding support for LZ4 in
wal_compression.
Noticed while reviewing the patch to add support for zstd in this area.
This only impacts HEAD, so there is no need for a back-patch.