Commit Graph

710 Commits

Author SHA1 Message Date
Tom Lane de447bb8e6 Suppress warning about stack_base_ptr with late-model GCC.
GCC 12 complains that set_stack_base is storing the address of
a local variable in a long-lived pointer.  This is an entirely
reasonable warning (indeed, it just helped us find a bug);
but that behavior is intentional here.  We can work around it
by using __builtin_frame_address(0) instead of a specific local
variable; that produces an address a dozen or so bytes different,
in my testing, but we don't care about such a small difference.
Maybe someday a compiler lacking that function will start to issue
a similar warning, but we'll worry about that when it happens.

Patch by me, per a suggestion from Andres Freund.  Back-patch to
v12, which is as far back as the patch will go without some pain.
(Recently-established project policy would permit a back-patch as
far as 9.2, but I'm disinclined to expend the work until GCC 12
is much more widespread.)

Discussion: https://postgr.es/m/3773792.1645141467@sss.pgh.pa.us
2022-02-17 22:46:01 -05:00
Peter Geoghegan 8f388f6f55 Increase hash_mem_multiplier default to 2.0.
Double the default setting for hash_mem_multiplier, from 1.0 to 2.0.
This setting makes hash-based executor nodes use twice the usual
work_mem limit.

The PostgreSQL 15 release notes should have a compatibility note about
this change.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wzndc_ROk6CY-bC6p9O53q974Y0Ey4WX8jcPbuTZYM4Q3A@mail.gmail.com
2022-02-16 18:41:52 -08:00
Andres Freund 2f6501fa3c Move replication slot release to before_shmem_exit().
Previously, replication slots were released in ProcKill() on error, resulting
in reporting replication slot drop of ephemeral slots after the stats
subsystem was already shut down.

To fix this problem, move replication slot release to a before_shmem_exit()
hook that is called before the stats collector shuts down. There wasn't really
a good reason for the slot handling to be in ProcKill() anyway.

Patch by Masahiko Sawada, with very minor polishing by me.

I, Andres, wrote a test for dropping slots during process exit, but there may
be some OS dependent issues around the number of times FATAL error messages
are displayed due to a still debated libpq issue. So that test will be
committed separately / later.

Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoDAeEpAbZEyYJsPZJUmSPaRicVSBObaL7sPaofnKz+9zg@mail.gmail.com
2022-02-14 17:08:17 -08:00
Peter Eisentraut 37851a8b83 Database-level collation version tracking
This adds to database objects the same version tracking that collation
objects have.  There is a new pg_database column datcollversion that
stores the version, a new function
pg_database_collation_actual_version() to get the version from the
operating system, and a new subcommand ALTER DATABASE ... REFRESH
COLLATION VERSION.

This was not originally added together with pg_collation.collversion,
since originally version tracking was only supported for ICU, and ICU
on a database-level is not currently supported.  But we now have
version tracking for glibc (since PG13), FreeBSD (since PG14), and
Windows (since PG13), so this is useful to have now.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/f0ff3190-29a3-5b39-a179-fa32eee57db6%40enterprisedb.com
2022-02-14 08:27:26 +01:00
Robert Haas aa64f23b02 Remove MaxBackends variable in favor of GetMaxBackends() function.
Previously, it was really easy to write code that accessed MaxBackends
before we'd actually initialized it, especially when coding up an
extension. To make this less error-prune, introduce a new function
GetMaxBackends() which should be used to obtain the correct value.
This will ERROR if called too early. Demote the global variable to
a file-level static, so that nobody can peak at it directly.

Nathan Bossart. Idea by Andres Freund. Review by Greg Sabino Mullane,
by Michael Paquier (who had doubts about the approach), and by me.

Discussion: http://postgr.es/m/20210802224204.bckcikl45uezv5e4@alap3.anarazel.de
2022-02-08 15:53:19 -05:00
Robert Haas 5ef1eefd76 Allow archiving via loadable modules.
Running a shell command for each file to be archived has a lot of
overhead and may not offer as much error checking as you want, or the
exact semantics that you want. So, offer the option to call a loadable
module for each file to be archived, rather than running a shell command.

Also, add a 'basic_archive' contrib module as an example implementation
that archives to a local directory.

Nathan Bossart, with a little bit of kibitzing by me.

Discussion: http://postgr.es/m/20220202224433.GA1036711@nathanxps13
2022-02-03 14:05:02 -05:00
Alvaro Herrera b3d7d6e462
Remove xloginsert.h from xlog.h
xlog.h is directly and indirectly #included in a lot of places.  With
this change, xloginsert.h is no longer unnecessarily included in the
large number of them that don't need it.

Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/CALj2ACVe-W+WM5P44N7eG9C2_FmaeM8Dq5aCnD3fHt0Ba=WR6w@mail.gmail.com
2022-01-30 12:25:24 -03:00
Michael Paquier 5ecd0183fb Fix comments about bgworker registration before MaxBackends initialization
Since 6bc8ef0b, InitializeMaxBackends() has used max_worker_processes
instead of adapting MaxBackends to the number of background workers
registered by modules loaded in shared_preload_libraries (at this time,
bgworkers were only static, but gained dynamic capabilities as a matter
of supporting parallel queries meaning that a control cap was
necessary).

Some comments referred to the past registration logic, making them
confusing and incorrect, so fix these.

Some of the out-of-core modules that could be loaded in this path
sometimes like to manipulate dynamically some of the resource-related
GUCs for their own needs, this commit adds a note about that.

Author: Nathan Bossart
Discussion: https://postgr.es/m/20220127181815.GA551692@nathanxps13
2022-01-29 10:47:36 +09:00
Peter Eisentraut 54637508f8 Change collate and ctype fields to type text
This changes the data type of the catalog fields datcollate, datctype,
collcollate, and collctype from name to text.  There wasn't ever a
really good reason for them to be of type name; presumably this was
just carried over from when they were fixed-size fields in pg_control,
first into the corresponding pg_database fields, and then to
pg_collation.  The values are not identifiers or object names, and we
don't ever look them up that way.

Changing to type text saves space in the typical case, since locale
names are typically only a few bytes long.  But it is also possible
that an ICU locale name with several customization options appended
could be longer than 63 bytes, so this also enables that case, which
was previously probably broken.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5e756dd6-0e91-d778-96fd-b1bcb06c161a@2ndquadrant.com
2022-01-27 08:54:25 +01:00
Bruce Momjian 27b77ecf9f Update copyright for 2022
Backpatch-through: 10
2022-01-07 19:04:57 -05:00
Robert Haas fa0e03c15a Remove InitXLOGAccess().
It's not great that RecoveryInProgress() calls InitXLOGAccess(),
because a status inquiry function typically shouldn't have the side
effect of performing initializations. We could fix that by calling
InitXLOGAccess() from some other place, but instead, let's remove it
altogether.

One thing InitXLogAccess() did is initialize wal_segment_size, but it
doesn't need to do that. In the postmaster, PostmasterMain() calls
LocalProcessControlFile(), and all child processes will inherit that
value -- except in EXEC_BACKEND bulds, but then each backend runs
SubPostmasterMain() which also calls LocalProcessControlFile().

The other thing InitXLOGAccess() did is update RedoRecPtr and
doPageWrites, but that's not critical, because all code that uses
them will just retry if it turns out that they've changed. The
only difference is that most code will now see an initial value that
is definitely invalid instead of one that might have just been way
out of date, but that will only happen once per backend lifetime,
so it shouldn't be a big deal.

Patch by me, reviewed by Nathan Bossart, Michael Paquier, Andres
Freund, Heikki Linnakangas, and Álvaro Herrera.

Discussion: http://postgr.es/m/CA+TgmoY7b65qRjzHN_tWUk8B4sJqk1vj1d31uepVzmgPnZKeLg@mail.gmail.com
2021-12-13 09:58:36 -05:00
Andres Freund 3b34645678 Initialize backend status reporting during bootstrap.
This allows a later commit to reduce the number of branches in performance
sensitive functions during normal running, compared to a very minor saving
during bootstrapping.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAAKRu_Yeg+vh6SHNEo1+=O7e-BPX35cU0XQM=YwQRnkFyv_y+w@mail.gmail.com
2021-11-19 08:43:12 -08:00
Robert Haas e51c46991f Move InitXLogInsert() call from InitXLOGAccess() to BaseInit().
At present, there is an undocumented coding rule that you must call
RecoveryInProgress(), or do something else that results in a call
to InitXLogInsert(), before trying to write WAL. Otherwise, the
WAL construction buffers won't be initialized, resulting in
failures.

Since it's not good to rely on a status inquiry function like
RecoveryInProgress() having the side effect of initializing
critical data structures, instead do the initialization eariler,
when the backend first starts up.

Patch by me. Reviewed by Nathan Bossart and Michael Paquier.

Discussion: http://postgr.es/m/CA+TgmoY7b65qRjzHN_tWUk8B4sJqk1vj1d31uepVzmgPnZKeLg@mail.gmail.com
2021-11-16 09:43:17 -05:00
Andres Freund 1d5135f004 Fix typo.
Reported-By: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/YRIlNQhLNfx555Nx@paquier.xyz
2021-08-13 05:44:03 -07:00
Andres Freund 675c945394 Move temporary file cleanup to before_shmem_exit().
As reported by a few OSX buildfarm animals there exist at least one path where
temporary files exist during AtProcExit_Files() processing. As temporary file
cleanup causes pgstat reporting, the assertions added in ee3f8d3d3a caused
failures.

This is not an OSX specific issue, we were just lucky that timing on OSX
reliably triggered the problem.  The known way to cause this is a FATAL error
during perform_base_backup() with a MANIFEST used - adding an elog(FATAL)
after InitializeBackupManifest() reliably reproduces the problem in isolation.

The problem is that the temporary file created in InitializeBackupManifest()
is not cleaned up via resource owner cleanup as WalSndResourceCleanup()
currently is only used for non-FATAL errors. That then allows to reach
AtProcExit_Files() with existing temporary files, causing the assertion
failure.

To fix this problem, move temporary file cleanup to a before_shmem_exit() hook
and add assertions ensuring that no temporary files are created before / after
temporary file management has been initialized / shut down. The cleanest way
to do so seems to be to split fd.c initialization into two, one for plain file
access and one for temporary file access.

Right now there's no need to perform further fd.c cleanup during process exit,
so I just renamed AtProcExit_Files() to BeforeShmemExit_Files(). Alternatively
we could perform another pass through the files to check that no temporary
files exist, but the added assertions seem to provide enough protection
against that.

It might turn out that the assertions added in ee3f8d3d3a will cause too much
noise - in that case we'll have to downgrade them to a WARNING, at least
temporarily.

This commit is not necessarily the best approach to address this issue, but it
should resolve the buildfarm failures. We can revise later.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210807190131.2bm24acbebl4wl6i@alap3.anarazel.de
2021-08-07 19:20:47 -07:00
Andres Freund a1bb3d5dbe Schedule ShutdownXLOG() in single user mode using before_shmem_exit().
Previously on_shmem_exit() was used. The upcoming shared memory stats patch
uses DSM segments to store stats, which can not be used after the
dsm_backend_shutdown() call in shmem_exit().  There does not seem to be any
reason to do ShutdownXLOG() via on_shmem_exit(), so change it.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210405092914.mmxqe7j56lsjfsej@alap3.anarazel.de
Discussion: https://postgr.es/m/20210803023612.iziacxk5syn2r4ut@alap3.anarazel.de
2021-08-06 19:10:32 -07:00
Andres Freund ee3f8d3d3a pgstat: Bring up pgstat in BaseInit() to fix uninitialized use of pgstat by AV.
Previously pgstat_initialize() was called in InitPostgres() and
AuxiliaryProcessMain(). As it turns out there was at least one case where we
reported stats before pgstat_initialize() was called, see
AutoVacWorkerMain()'s intentionally early call to pgstat_report_autovac().

This turns out to not be a problem with the current pgstat implementation as
pgstat_initialize() only registers a shutdown callback. But in the shared
memory based stats implementation we are working towards pgstat_initialize()
has to do more work.

After b406478b87 BaseInit() is a central place where initialization shared by
normal backends and auxiliary backends can be put. Obviously BaseInit() is
called before InitPostgres() registers ShutdownPostgres. Previously
ShutdownPostgres was the first before_shmem_exit callback, now that's commonly
pgstats. That should be fine.

Previously pgstat_initialize() was not called in bootstrap mode, but there
does not appear to be a need for that. It's now done unconditionally.

To detect future issues like this, assertions are added to a few places
verifying that the pgstat subsystem is initialized and not yet shut down.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210405092914.mmxqe7j56lsjfsej@alap3.anarazel.de
Discussion: https://postgr.es/m/20210802164124.ufo5buo4apl6yuvs@alap3.anarazel.de
2021-08-06 19:05:59 -07:00
Andres Freund b406478b87 process startup: Always call Init[Auxiliary]Process() before BaseInit().
For EXEC_BACKEND InitProcess()/InitAuxiliaryProcess() needs to have been
called well before we call BaseInit(), as SubPostmasterMain() needs LWLocks to
work. Having the order of initialization differ between platforms makes it
unnecessarily hard to understand the system and to add initialization points
for new subsystems without a lot of duplication.

To be able to change the order, BaseInit() cannot trigger
CreateSharedMemoryAndSemaphores() anymore - obviously that needs to have
happened before we can call InitProcess(). It seems cleaner to create shared
memory explicitly in single user/bootstrap mode anyway.

After this change the separation of bufmgr initialization into
InitBufferPoolAccess() / InitBufferPoolBackend() is not meaningful anymore so
the latter is removed.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20210802164124.ufo5buo4apl6yuvs@alap3.anarazel.de
2021-08-05 15:36:59 -07:00
Andres Freund 07bf378509 process startup: Centralize pgwin32_signal_initialize() calls.
For one, the existing location lead to somewhat awkward code in main(). For
another, the new location is easier to understand anyway.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/20210802164124.ufo5buo4apl6yuvs@alap3.anarazel.de
2021-08-05 12:36:06 -07:00
Fujii Masao 43620e3286 Add function to log the memory contexts of specified backend process.
Commit 3e98c0bafb added pg_backend_memory_contexts view to display
the memory contexts of the backend process. However its target process
is limited to the backend that is accessing to the view. So this is
not so convenient when investigating the local memory bloat of other
backend process. To improve this situation, this commit adds
pg_log_backend_memory_contexts() function that requests to log
the memory contexts of the specified backend process.

This information can be also collected by calling
MemoryContextStats(TopMemoryContext) via a debugger. But
this technique cannot be used in some environments because no debugger
is available there. So, pg_log_backend_memory_contexts() allows us to
see the memory contexts of specified backend more easily.

Only superusers are allowed to request to log the memory contexts
because allowing any users to issue this request at an unbounded rate
would cause lots of log messages and which can lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its memory contexts at LOG_SERVER_ONLY level,
so that these memory contexts will appear in the server log but not
be sent to the client. It logs one message per memory context.
Because if it buffers all memory contexts into StringInfo to log them
as one message, which may require the buffer to be enlarged very much
and lead to OOM error since there can be a large number of memory
contexts in a backend.

When a backend process is consuming huge memory, logging all its
memory contexts might overrun available disk space. To prevent this,
now this patch limits the number of child contexts to log per parent
to 100. As with MemoryContextStats(), it supposes that practical cases
where the log gets long will typically be huge numbers of siblings
under the same parent context; while the additional debugging value
from seeing details about individual siblings beyond 100 will not be large.

There was another proposed patch to add the function to return
the memory contexts of specified backend as the result sets,
instead of logging them, in the discussion. However that patch is
not included in this commit because it had several issues to address.

Thanks to Tatsuhito Kasahara, Andres Freund, Tom Lane, Tomas Vondra,
Michael Paquier, Kyotaro Horiguchi and Zhihong Yu for the discussion.

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Kyotaro Horiguchi, Zhihong Yu, Fujii Masao
Discussion: https://postgr.es/m/0271f440ac77f2a4180e0e56ebd944d1@oss.nttdata.com
2021-04-06 13:44:15 +09:00
Andres Freund e1025044cd Split backend status and progress related functionality out of pgstat.c.
Backend status (supporting pg_stat_activity) and command
progress (supporting pg_stat_progress*) related code is largely
independent from the rest of pgstat.[ch] (supporting views like
pg_stat_all_tables that accumulate data over time). See also
a333476b92.

This commit doesn't rename the function names to make the distinction
from the rest of pgstat_ clearer - that'd be more invasive and not
clearly beneficial. If we were to decide to do such a rename at some
point, it's better done separately from moving the code as well.

Robert's review was of an earlier version.

Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/20210316195440.twxmlov24rr2nxrg@alap3.anarazel.de
2021-04-03 11:42:52 -07:00
Thomas Munro c30f54ad73 Detect POLLHUP/POLLRDHUP while running queries.
Provide a new GUC check_client_connection_interval that can be used to
check whether the client connection has gone away, while running very
long queries.  It is disabled by default.

For now this uses a non-standard Linux extension (also adopted by at
least one other OS).  POLLRDHUP is not defined by POSIX, and other OSes
don't have a reliable way to know if a connection was closed without
actually trying to read or write.

In future we might consider trying to send a no-op/heartbeat message
instead, but that could require protocol changes.

Author: Sergey Cherkashin <s.cherkashin@postgrespro.ru>
Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Tatsuo Ishii <ishii@sraoss.co.jp>
Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Maksim Milyutin <milyutinma@gmail.com>
Reviewed-by: Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (much earlier version)
Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
2021-04-03 09:02:41 +13:00
Peter Geoghegan 9f3665fbfc Don't consider newly inserted tuples in nbtree VACUUM.
Remove the entire idea of "stale stats" within nbtree VACUUM (stop
caring about stats involving the number of inserted tuples).  Also
remove the vacuum_cleanup_index_scale_factor GUC/param on the master
branch (though just disable them on postgres 13).

The vacuum_cleanup_index_scale_factor/stats interface made the nbtree AM
partially responsible for deciding when pg_class.reltuples stats needed
to be updated.  This seems contrary to the spirit of the index AM API,
though -- it is not actually necessary for an index AM's bulk delete and
cleanup callbacks to provide accurate stats when it happens to be
inconvenient.  The core code owns that.  (Index AMs have the authority
to perform or not perform certain kinds of deferred cleanup based on
their own considerations, such as page deletion and recycling, but that
has little to do with pg_class.reltuples/num_index_tuples.)

This issue was fairly harmless until the introduction of the
autovacuum_vacuum_insert_threshold feature by commit b07642db, which had
an undesirable interaction with the vacuum_cleanup_index_scale_factor
mechanism: it made insert-driven autovacuums perform full index scans,
even though there is no real benefit to doing so.  This has been tied to
a regression with an append-only insert benchmark [1].

Also have remaining cases that perform a full scan of an index during a
cleanup-only nbtree VACUUM indicate that the final tuple count is only
an estimate.  This prevents vacuumlazy.c from setting the index's
pg_class.reltuples in those cases (it will now only update pg_class when
vacuumlazy.c had TIDs for nbtree to bulk delete).  This arguably fixes
an oversight in deduplication-related bugfix commit 48e12913.

[1] https://smalldatum.blogspot.com/2021/01/insert-benchmark-postgres-is-still.html

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoA4WHthN5uU6+WScZ7+J_RcEjmcuH94qcoUPuB42ShXzg@mail.gmail.com
Backpatch: 13-, where autovacuum_vacuum_insert_threshold was added.
2021-03-10 16:27:01 -08:00
Michael Paquier f9264d1524 Remove support for SSL compression
PostgreSQL disabled compression as of e3bdb2d and the documentation
recommends against using it since.  Additionally, SSL compression has
been disabled in OpenSSL since version 1.1.0, and was disabled in many
distributions long before that.  The most recent TLS version, TLSv1.3,
disallows compression at the protocol level.

This commit removes the feature itself, removing support for the libpq
parameter sslcompression (parameter still listed for compatibility
reasons with existing connection strings, just ignored), and removes
the equivalent field in pg_stat_ssl and de facto PgBackendSSLStatus.

Note that, on top of removing the ability to activate compression by
configuration, compression is actively disabled in both frontend and
backend to avoid overrides from local configurations.

A TAP test is added for deprecated SSL parameters to check after
backwards compatibility.

Bump catalog version.

Author: Daniel Gustafsson
Reviewed-by: Peter Eisentraut, Magnus Hagander, Michael Paquier
Discussion:  https://postgr.es/m/7E384D48-11C5-441B-9EC3-F7DB1F8518F6@yesql.se
2021-03-09 11:16:47 +09:00
Thomas Munro a042ba2ba7 Introduce symbolic names for FeBeWaitSet positions.
Previously we used 0 and 1 to refer to the socket and latch in far flung
parts of the tree, without any explanation.  Also use PGINVALID_SOCKET
rather than -1 in a couple of places that didn't already do that.

Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
2021-03-01 16:10:16 +13:00
Thomas Munro 6a2a70a020 Use signalfd(2) for epoll latches.
Cut down on system calls and other overheads by reading from a signalfd
instead of using a signal handler and self-pipe.  Affects Linux sytems,
and possibly others including illumos that implement the Linux epoll and
signalfd interfaces.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
2021-03-01 14:12:02 +13:00
Peter Eisentraut 0e392fcc0d Use errmsg_internal for debug messages
An inconsistent set of debug-level messages was not using
errmsg_internal(), thus uselessly exposing the messages to translation
work.  Fix those.
2021-02-17 11:33:25 +01:00
Peter Geoghegan e19594c5c0 Reduce the default value of vacuum_cost_page_miss.
When commit f425b605 introduced cost based vacuum delays back in 2004,
the defaults reflected then-current trends in hardware, as well as
certain historical limitations in PostgreSQL.  There have been enormous
improvements in both areas since that time.  The cost limit GUC defaults
finally became much more representative of current trends following
commit cbccac37, which decreased autovacuum_vacuum_cost_delay's default
by 10x for PostgreSQL 12 (it went from 20ms to only 2ms).

The relative costs have shifted too.  This should also be accounted for
by the defaults.  More specifically, the relative importance of avoiding
dirtying pages within VACUUM has greatly increased, primarily due to
main memory capacity scaling and trends in flash storage.  Within
Postgres itself, improvements like sequential access during index
vacuuming (at least in nbtree and GiST indexes) have also been
contributing factors.

To reflect all this, decrease the default of vacuum_cost_page_miss to 2.
Since the default of vacuum_cost_page_dirty remains 20, dirtying a page
is now considered 10x "costlier" than a page miss by default.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzmLPFnkWT8xMjmcsm7YS3+_Qi3iRWAb2+_Bc8UhVyHfuA@mail.gmail.com
2021-01-27 15:11:13 -08:00
Tom Lane 9877374bef Add idle_session_timeout.
This GUC variable works much like idle_in_transaction_session_timeout,
in that it kills sessions that have waited too long for a new client
query.  But it applies when we're not in a transaction, rather than
when we are.

Li Japin, reviewed by David Johnston and Hayato Kuroda, some
fixes by me

Discussion: https://postgr.es/m/763A0689-F189-459E-946F-F0EC4458980B@hotmail.com
2021-01-06 18:28:52 -05:00
Bruce Momjian ca3b37487b Update copyright for 2021
Backpatch-through: 9.5
2021-01-02 13:06:25 -05:00
Tom Lane 622ae4621e Fix assorted issues in backend's GSSAPI encryption support.
Unrecoverable errors detected by GSSAPI encryption can't just be
reported with elog(ERROR) or elog(FATAL), because attempting to
send the error report to the client is likely to lead to infinite
recursion or loss of protocol sync.  Instead make this code do what
the SSL encryption code has long done, which is to just report any
such failure to the server log (with elevel COMMERROR), then pretend
we've lost the connection by returning errno = ECONNRESET.

Along the way, fix confusion about whether message translation is done
by pg_GSS_error() or its callers (the latter should do it), and make
the backend version of that function work more like the frontend
version.

Avoid allocating the port->gss struct until it's needed; we surely
don't need to allocate it in the postmaster.

Improve logging of "connection authorized" messages with GSS enabled.
(As part of this, I back-patched the code changes from dc11f31a1.)

Make BackendStatusShmemSize() account for the GSS-related space that
will be allocated by CreateSharedBackendStatus().  This omission
could possibly cause out-of-shared-memory problems with very high
max_connections settings.

Remove arbitrary, pointless restriction that only GSS authentication
can be used on a GSS-encrypted connection.

Improve documentation; notably, document the fact that libpq now
prefers GSS encryption over SSL encryption if both are possible.

Per report from Mikael Gustavsson.  Back-patch to v12 where
this code was introduced.

Discussion: https://postgr.es/m/e5b0b6ed05764324a2f3fe7acfc766d5@smhi.se
2020-12-28 17:44:17 -05:00
Stephen Frost dc11f31a1a Add GSS information to connection authorized log message
GSS information (if used) such as if the connection was authorized using
GSS or if it was encrypted using GSS, and perhaps most importantly, what
the GSS principal used for the authentication was, is extremely useful
but wasn't being included in the connection authorized log message.

Therefore, add to the connection authorized log message that
information, in a similar manner to how we log SSL information when SSL
is used for a connection.

Author: Vignesh C
Reviewed-by: Bharath Rupireddy
Discussion: https://www.postgresql.org/message-id/CALDaNm2N1385_Ltoo%3DS7VGT-ESu_bRQa-sC1wg6ikrM2L2Z49w%40mail.gmail.com
2020-12-02 14:41:53 -05:00
Peter Eisentraut bdc4edbea6 Move catalog index declarations
Move the system catalog index declarations from catalog/indexing.h to
the respective parent tables' catalog/pg_*.h files.  The original
reason for having it split was that the old genbki system produced the
output in the order of the catalog files it read, so all the indexing
stuff needed to come separately.  But this is no longer the case, and
keeping it together makes more sense.

Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/c7cc82d6-f976-75d6-2e3e-b03d2cab26bb@2ndquadrant.com
2020-11-07 12:26:24 +01:00
Tom Lane 44fc6e259b Centralize setup of SIGQUIT handling for postmaster child processes.
We decided that the policy established in commit 7634bd4f6 for
the bgwriter, checkpointer, walwriter, and walreceiver processes,
namely that they should accept SIGQUIT at all times, really ought
to apply uniformly to all postmaster children.  Therefore, get
rid of the duplicative and inconsistent per-process code for
establishing that signal handler and removing SIGQUIT from BlockSig.
Instead, make InitPostmasterChild do it.

The handler set up by InitPostmasterChild is SignalHandlerForCrashExit,
which just summarily does _exit(2).  In interactive backends, we
almost immediately replace that with quickdie, since we would prefer
to try to tell the client that we're dying.  However, this patch is
changing the behavior of autovacuum (both launcher and workers), as
well as walsenders.  Those processes formerly also used quickdie,
but AFAICS that was just mindless copy-and-paste: they don't have
any interactive client that's likely to benefit from being told this.

The stats collector continues to be an outlier, in that it thinks
SIGQUIT means normal exit.  That should probably be changed for
consistency, but there's another patch set where that's being
dealt with, so I didn't do so here.

Discussion: https://postgr.es/m/644875.1599933441@sss.pgh.pa.us
2020-09-16 16:04:36 -04:00
Tom Lane 0038f94387 Fix postmaster's behavior during smart shutdown.
Up to now, upon receipt of a SIGTERM ("smart shutdown" command), the
postmaster has immediately killed all "optional" background processes,
and subsequently refused to launch new ones while it's waiting for
foreground client processes to exit.  No doubt this seemed like an OK
policy at some point; but it's a pretty bad one now, because it makes
for a seriously degraded environment for the remaining clients:

* Parallel queries are killed, and new ones fail to launch. (And our
parallel-query infrastructure utterly fails to deal with the case
in a reasonable way --- it just hangs waiting for workers that are
not going to arrive.  There is more work needed in that area IMO.)

* Autovacuum ceases to function.  We can tolerate that for awhile,
but if bulk-update queries continue to run in the surviving client
sessions, there's eventually going to be a mess.  In the worst case
the system could reach a forced shutdown to prevent XID wraparound.

* The bgwriter and walwriter are also stopped immediately, likely
resulting in performance degradation.

Hence, let's rearrange things so that the only immediate change in
behavior is refusing to let in new normal connections.  Once the last
normal connection is gone, shut everything down as though we'd received
a "fast" shutdown.  To implement this, remove the PM_WAIT_BACKUP and
PM_WAIT_READONLY states, instead staying in PM_RUN or PM_HOT_STANDBY
while normal connections remain.  A subsidiary state variable tracks
whether or not we're letting in new connections in those states.

This also allows having just one copy of the logic for killing child
processes in smart and fast shutdown modes.  I moved that logic into
PostmasterStateMachine() by inventing a new state PM_STOP_BACKENDS.

Back-patch to 9.6 where parallel query was added.  In principle
this'd be a good idea in 9.5 as well, but the risk/reward ratio
is not as good there, since lack of autovacuum is not a problem
during typical uses of smart shutdown.

Per report from Bharath Rupireddy.

Patch by me, reviewed by Thomas Munro

Discussion: https://postgr.es/m/CALj2ACXAZ5vKxT9P7P89D87i3MDO9bfS+_bjMHgnWJs8uwUOOw@mail.gmail.com
2020-08-14 13:26:57 -04:00
Andres Freund dc7420c2c9 snapshot scalability: Don't compute global horizons while building snapshots.
To make GetSnapshotData() more scalable, it cannot not look at at each proc's
xmin: While snapshot contents do not need to change whenever a read-only
transaction commits or a snapshot is released, a proc's xmin is modified in
those cases. The frequency of xmin modifications leads to, particularly on
higher core count systems, many cache misses inside GetSnapshotData(), despite
the data underlying a snapshot not changing. That is the most
significant source of GetSnapshotData() scaling poorly on larger systems.

Without accessing xmins, GetSnapshotData() cannot calculate accurate horizons /
thresholds as it has so far. But we don't really have to: The horizons don't
actually change that much between GetSnapshotData() calls. Nor are the horizons
actually used every time a snapshot is built.

The trick this commit introduces is to delay computation of accurate horizons
until there use and using horizon boundaries to determine whether accurate
horizons need to be computed.

The use of RecentGlobal[Data]Xmin to decide whether a row version could be
removed has been replaces with new GlobalVisTest* functions.  These use two
thresholds to determine whether a row can be pruned:
1) definitely_needed, indicating that rows deleted by XIDs >= definitely_needed
   are definitely still visible.
2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can
   definitely be removed
GetSnapshotData() updates definitely_needed to be the xmin of the computed
snapshot.

When testing whether a row can be removed (with GlobalVisTestIsRemovableXid())
and the tested XID falls in between the two (i.e. XID >= maybe_needed && XID <
definitely_needed) the boundaries can be recomputed to be more accurate. As it
is not cheap to compute accurate boundaries, we limit the number of times that
happens in short succession.  As the boundaries used by
GlobalVisTestIsRemovableXid() are never reset (with maybe_needed updated by
GetSnapshotData()), it is likely that further test can benefit from an earlier
computation of accurate horizons.

To avoid regressing performance when old_snapshot_threshold is set (as that
requires an accurate horizon to be computed), heap_page_prune_opt() doesn't
unconditionally call TransactionIdLimitedForOldSnapshots() anymore. Both the
computation of the limited horizon, and the triggering of errors (with
SetOldSnapshotThresholdTimestamp()) is now only done when necessary to remove
tuples.

This commit just removes the accesses to PGXACT->xmin from
GetSnapshotData(), but other members of PGXACT residing in the same
cache line are accessed. Therefore this in itself does not result in a
significant improvement. Subsequent commits will take advantage of the
fact that GetSnapshotData() now does not need to access xmins anymore.

Note: This contains a workaround in heap_page_prune_opt() to keep the
snapshot_too_old tests working. While that workaround is ugly, the tests
currently are not meaningful, and it seems best to address them separately.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
2020-08-12 16:03:49 -07:00
Thomas Munro 3347c982ba Use a long lived WaitEventSet for WaitLatch().
Create LatchWaitSet at backend startup time, and use it to implement
WaitLatch().  This avoids repeated epoll/kqueue setup and teardown
system calls.

Reorder SubPostmasterMain() slightly so that we restore the postmaster
pipe and Windows signal emulation before we reach InitPostmasterChild(),
to make this work in EXEC_BACKEND builds.

Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
2020-07-30 17:40:00 +12:00
Peter Geoghegan d6c08e29e7 Add hash_mem_multiplier GUC.
Add a GUC that acts as a multiplier on work_mem.  It gets applied when
sizing executor node hash tables that were previously size constrained
using work_mem alone.

The new GUC can be used to preferentially give hash-based nodes more
memory than the generic work_mem limit.  It is intended to enable admin
tuning of the executor's memory usage.  Overall system throughput and
system responsiveness can be improved by giving hash-based executor
nodes more memory (especially over sort-based alternatives, which are
often much less sensitive to being memory constrained).

The default value for hash_mem_multiplier is 1.0, which is also the
minimum valid value.  This means that hash-based nodes continue to apply
work_mem in the traditional way by default.

hash_mem_multiplier is generally useful.  However, it is being added now
due to concerns about hash aggregate performance stability for users
that upgrade to Postgres 13 (which added disk-based hash aggregation in
commit 1f39bce0).  While the old hash aggregate behavior risked
out-of-memory errors, it is nevertheless likely that many users actually
benefited.  Hash agg's previous indifference to work_mem during query
execution was not just faster; it also accidentally made aggregation
resilient to grouping estimate problems (at least in cases where this
didn't create destabilizing memory pressure).

hash_mem_multiplier can provide a certain kind of continuity with the
behavior of Postgres 12 hash aggregates in cases where the planner
incorrectly estimates that all groups (plus related allocations) will
fit in work_mem/hash_mem.  This seems necessary because hash-based
aggregation is usually much slower when only a small fraction of all
groups can fit.  Even when it isn't possible to totally avoid hash
aggregates that spill, giving hash aggregation more memory will reliably
improve performance (the same cannot be said for external sort
operations, which appear to be almost unaffected by memory availability
provided it's at least possible to get a single merge pass).

The PostgreSQL 13 release notes should advise users that increasing
hash_mem_multiplier can help with performance regressions associated
with hash aggregation.  That can be taken care of by a later commit.

Author: Peter Geoghegan
Reviewed-By: Álvaro Herrera, Jeff Davis
Discussion: https://postgr.es/m/20200625203629.7m6yvut7eqblgmfo@alap3.anarazel.de
Discussion: https://postgr.es/m/CAH2-WzmD%2Bi1pG6rc1%2BCjc4V6EaFJ_qSuKCCHVnH%3DoruqD-zqow%40mail.gmail.com
Backpatch: 13-, where disk-based hash aggregation was introduced.
2020-07-29 14:14:58 -07:00
Andres Freund e07633646a code: replace 'master' with 'leader' where appropriate.
Leader already is the more widely used terminology, but a few places
didn't get the message.

Author: Andres Freund
Reviewed-By: David Steele
Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue@alap3.anarazel.de
2020-07-08 12:58:32 -07:00
Peter Geoghegan 947456a823 Initialize work_mem using current guc.c default.
Do the same for the maintenance_work_mem global variable.

Oversight in commit 848ae330a4, which increased the previous defaults
for work_mem and maintenance_work_mem by 4X.
2020-07-02 16:34:54 -07:00
Tom Lane 5cbfce562f Initial pgindent and pgperltidy run for v13.
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.

Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
2020-05-14 13:06:50 -04:00
Andres Freund fc3f4453a2 Recompute stack base in forked postmaster children.
This is for the benefit of running postgres under the rr
debugger. When using rr signal handlers running while a syscall is
active use an alternative stack. As e.g. bgworkers are started from
within signal handlers, the forked backend then has a different stack
base than postmaster. Previously that subsequently lead to those
processes triggering spurious "stack depth limit exceeded" errors.

Discussion: https://postgr.es/m/20200327182217.ubrrl32lyfhxfwk5@alap3.anarazel.de
2020-04-05 18:23:30 -07:00
Peter Eisentraut 8e8a0becb3 Unify several ways to tracking backend type
Add a new global variable MyBackendType that uses the same BackendType
enum that was previously only used by the stats collector.  That way
several duplicate ways of checking what type a particular process is
can be simplified.  Since it's no longer just for stats, move to
miscinit.c and rename existing functions to match the expanded
purpose.

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/c65e5196-4f04-4ead-9353-6088c19615a3@2ndquadrant.com
2020-03-13 14:01:10 +01:00
Peter Eisentraut bf68b79e50 Refactor ps_status.c API
The init_ps_display() arguments were mostly lies by now, so to match
typical usage, just use one argument and let the caller assemble it
from multiple sources if necessary.  The only user of the additional
arguments is BackendInitialize(), which was already doing string
assembly on the caller side anyway.

Remove the second argument of set_ps_display() ("force") and just
handle that in init_ps_display() internally.

BackendInitialize() also used to set the initial status as
"authentication", but that was very far from where authentication
actually happened.  So now it's set to "initializing" and then
"authentication" just before the actual call to
ClientAuthentication().

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/c65e5196-4f04-4ead-9353-6088c19615a3@2ndquadrant.com
2020-03-11 16:38:31 +01:00
Peter Eisentraut 1c91838181 Clean up order in miscinit.c a bit
The code around InitPostmasterChild() from commit 31c453165b somehow
ended up in the middle of a block of code related to "User ID state".
Move it into its own block instead.
2020-03-11 13:51:55 +01:00
Tom Lane 481c8e9232 Assume that we have utime() and <utime.h>.
These are required by POSIX since SUSv2, and no live platforms fail
to provide them.  On Windows, utime() exists and we bring our own
<utime.h>, so we're good there too.  So remove the configure probes
and ad-hoc substitute code.  We don't need to check for utimes()
anymore either, since that was only used as a substitute.

In passing, make the Windows build include <sys/utime.h> only where
we need it, not everywhere.

This is part of a series of commits to get rid of no-longer-relevant
configure checks and dead src/port/ code.  I'm committing them separately
to make it easier to back out individual changes if they prove less
portable than I expect.

Discussion: https://postgr.es/m/15379.1582221614@sss.pgh.pa.us
2020-02-21 14:30:47 -05:00
Thomas Munro 701a51fd4e Use pg_pwrite() in more places.
This removes some lseek() system calls.

Author: Thomas Munro
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CA%2BhUKGJ%2BoHhnvqjn3%3DHro7xu-YDR8FPr0FL6LF35kHRX%3D_bUzg%40mail.gmail.com
2020-02-11 17:50:22 +13:00
Alvaro Herrera 15d13e8291 Make vacuum buffer counters 64 bits wide
Using 32 bit counters means they can now realistically wrap around when
vacuuming extremely large tables.  Because they're signed integers,
stats printed by vacuum look very odd when they do.

We'd love to backpatch this, but refrain because the variables are
exported and could cause third-party code to break.

Reviewed-by: Julien Rouhaud, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/20200131205926.GA16367@alvherre.pgsql
2020-02-05 16:59:29 -03:00
Bruce Momjian 7559d8ebfa Update copyrights for 2020
Backpatch-through: update all files in master, backpatch legal files through 9.4
2020-01-01 12:21:45 -05:00
Michael Paquier 7854e07f25 Revert "Rename files and headers related to index AM"
This follows multiple complains from Peter Geoghegan, Andres Freund and
Alvaro Herrera that this issue ought to be dug more before actually
happening, if it happens.

Discussion: https://postgr.es/m/20191226144606.GA5659@alvherre.pgsql
2019-12-27 08:09:00 +09:00
Michael Paquier 8ce3aa9b59 Rename files and headers related to index AM
The following renaming is done so as source files related to index
access methods are more consistent with table access methods (the
original names used for index AMs ware too generic, and could be
confused as including features related to table AMs):
- amapi.h -> indexam.h.
- amapi.c -> indexamapi.c.  Here we have an equivalent with
backend/access/table/tableamapi.c.
- amvalidate.c -> indexamvalidate.c.
- amvalidate.h -> indexamvalidate.h.
- genam.c -> indexgenam.c.
- genam.h -> indexgenam.h.

This has been discussed during the development of v12 when table AM was
worked on, but the renaming never happened.

Author: Michael Paquier
Reviewed-by: Fabien Coelho, Julien Rouhaud
Discussion: https://postgr.es/m/20191223053434.GF34339@paquier.xyz
2019-12-25 10:23:39 +09:00
Robert Haas 16a4e4aecd Extend the ProcSignal mechanism to support barriers.
A new function EmitProcSignalBarrier() can be used to emit a global
barrier which all backends that participate in the ProcSignal
mechanism must absorb, and a new function WaitForProcSignalBarrier()
can be used to wait until all relevant backends have in fact
absorbed the barrier.

This can be used to coordinate global state changes, such as turning
checksums on while the system is running.

There's no real client of this mechanism yet, although two are
proposed, but an enum has to have at least one element, so this
includes a placeholder type (PROCSIGNAL_BARRIER_PLACEHOLDER) which
should be replaced by the first real client of this mechanism to
get committed.

Andres Freund and Robert Haas, reviewed by Daniel Gustafsson and,
in earlier versions, by Magnus Hagander.

Discussion: http://postgr.es/m/CA+TgmoZwDk=BguVDVa+qdA6SBKef=PKbaKDQALTC_9qoz1mJqg@mail.gmail.com
2019-12-19 14:56:20 -05:00
Robert Haas 7dbfea3c45 Partially deduplicate interrupt handling for background processes.
Where possible, share signal handler code and main loop interrupt
checking. This saves quite a bit of code and should simplify
maintenance, too.

This commit intends not to change the way anything works, even
though that might allow more code to be unified. It does unify
a bunch of individual variables into a ShutdownRequestPending
flag that has is now used by a bunch of different process types,
though.

Patch by me, reviewed by Andres Freund and Daniel Gustafsson.

Discussion: http://postgr.es/m/CA+TgmoZwDk=BguVDVa+qdA6SBKef=PKbaKDQALTC_9qoz1mJqg@mail.gmail.com
2019-12-17 13:14:28 -05:00
Amit Kapila 2d0fdfacce Fix typos in miscinit.c.
Commit f13ea95f9e moved the description of postmaster.pid file contents
from miscadmin.h to pidfile.h, but missed to update the comments in
miscinit.c.

Author: Hadi Moshayedi
Reviewed-by: Amit Kapila
Backpatch-through: 10
Discussion: https://postgr.es/m/CAK=1=WpYEM9x3LGkaxgXaxeYQjnkdW8XLsxrYRTE2Gq-H83FMw@mail.gmail.com
2019-12-09 08:39:34 +05:30
Amit Kapila 14aec03502 Make the order of the header file includes consistent in backend modules.
Similar to commits 7e735035f2 and dddf4cdc33, this commit makes the order
of header file inclusion consistent for backend modules.

In the passing, removed a couple of duplicate inclusions.

Author: Vignesh C
Reviewed-by: Kuntal Ghosh and Amit Kapila
Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
2019-11-12 08:30:16 +05:30
Andres Freund 01368e5d9d Split all OBJS style lines in makefiles into one-line-per-entry style.
When maintaining or merging patches, one of the most common sources
for conflicts are the list of objects in makefiles. Especially when
the split across lines has been changed on both sides, which is
somewhat common due to attempting to stay below 80 columns, those
conflicts are unnecessarily laborious to resolve.

By splitting, and alphabetically sorting, OBJS style lines into one
object per line, conflicts should be less frequent, and easier to
resolve when they still occur.

Author: Andres Freund
Discussion: https://postgr.es/m/20191029200901.vww4idgcxv74cwes@alap3.anarazel.de
2019-11-05 14:41:07 -08:00
Michael Paquier 9555cc8d2b Revert hooks for session start and end, take two
The location of the session end hook has been chosen so as it is
possible to allow modules to do their own transactions, however any
trying to any any subsystem which went through before_shmem_exit()
would cause issues, limiting the pluggability of the hook.

Per discussion with Tom Lane and Andres Freund.

Discussion: https://postgr.es/m/18722.1569906636@sss.pgh.pa.us
2019-10-02 09:55:27 +09:00
Michael Paquier e788bd924c Add hooks for session start and session end, take two
These hooks can be used in loadable modules.  A simple test module is
included.

The first attempt was done with cd8ce3a but we lacked handling for
NO_INSTALLCHECK in the MSVC scripts (problem solved afterwards by
431f1599) so the buildfarm got angry.  This also fixes a couple of
issues noticed upon review compared to the first attempt, so the code
has slightly changed, resulting in a more simple test module.

Author: Fabrízio de Royes Mello, Yugo Nagata
Reviewed-by: Andrew Dunstan, Michael Paquier, Aleksandr Parfenov
Discussion: https://postgr.es/m/20170720204733.40f2b7eb.nagata@sraoss.co.jp
Discussion: https://postgr.es/m/20190823042602.GB5275@paquier.xyz
2019-10-01 12:15:25 +09:00
Tom Lane 7de19fbc0b Use data directory inode number, not port, to select SysV resource keys.
This approach provides a much tighter binding between a data directory
and the associated SysV shared memory block (and SysV or named-POSIX
semaphores, if we're using those).  Key collisions are still possible,
but only between data directories stored on different filesystems,
so the situation should be negligible in practice.  More importantly,
restarting the postmaster with a different port number no longer
risks failing to identify a relevant shared memory block, even when
postmaster.pid has been removed.  A standalone backend is likewise
much more certain to detect conflicting leftover backends.

(In the longer term, we might now think about deprecating the port as
a cluster-wide value, so that one postmaster could support sockets
with varying port numbers.  But that's for another day.)

The hazards fixed here apply only on Unix systems; our Windows code
paths already use identifiers derived from the data directory path
name rather than the port.

src/test/recovery/t/017_shm.pl, which intends to test key-collision
cases, has been substantially rewritten since it can no longer use
two postmasters with identical port numbers to trigger the case.
Instead, use Perl's IPC::SharedMem module to create a conflicting
shmem segment directly.  The test script will be skipped if that
module is not available.  (This means that some older buildfarm
members won't run it, but I don't think that that results in any
meaningful coverage loss.)

Patch by me; thanks to Noah Misch and Peter Eisentraut for discussion
and review.

Discussion: https://postgr.es/m/16908.1557521200@sss.pgh.pa.us
2019-09-05 13:31:46 -04:00
Tom Lane 1cff1b95ab Represent Lists as expansible arrays, not chains of cons-cells.
Originally, Postgres Lists were a more or less exact reimplementation of
Lisp lists, which consist of chains of separately-allocated cons cells,
each having a value and a next-cell link.  We'd hacked that once before
(commit d0b4399d8) to add a separate List header, but the data was still
in cons cells.  That makes some operations -- notably list_nth() -- O(N),
and it's bulky because of the next-cell pointers and per-cell palloc
overhead, and it's very cache-unfriendly if the cons cells end up
scattered around rather than being adjacent.

In this rewrite, we still have List headers, but the data is in a
resizable array of values, with no next-cell links.  Now we need at
most two palloc's per List, and often only one, since we can allocate
some values in the same palloc call as the List header.  (Of course,
extending an existing List may require repalloc's to enlarge the array.
But this involves just O(log N) allocations not O(N).)

Of course this is not without downsides.  The key difficulty is that
addition or deletion of a list entry may now cause other entries to
move, which it did not before.

For example, that breaks foreach() and sister macros, which historically
used a pointer to the current cons-cell as loop state.  We can repair
those macros transparently by making their actual loop state be an
integer list index; the exposed "ListCell *" pointer is no longer state
carried across loop iterations, but is just a derived value.  (In
practice, modern compilers can optimize things back to having just one
loop state value, at least for simple cases with inline loop bodies.)
In principle, this is a semantics change for cases where the loop body
inserts or deletes list entries ahead of the current loop index; but
I found no such cases in the Postgres code.

The change is not at all transparent for code that doesn't use foreach()
but chases lists "by hand" using lnext().  The largest share of such
code in the backend is in loops that were maintaining "prev" and "next"
variables in addition to the current-cell pointer, in order to delete
list cells efficiently using list_delete_cell().  However, we no longer
need a previous-cell pointer to delete a list cell efficiently.  Keeping
a next-cell pointer doesn't work, as explained above, but we can improve
matters by changing such code to use a regular foreach() loop and then
using the new macro foreach_delete_current() to delete the current cell.
(This macro knows how to update the associated foreach loop's state so
that no cells will be missed in the traversal.)

There remains a nontrivial risk of code assuming that a ListCell *
pointer will remain good over an operation that could now move the list
contents.  To help catch such errors, list.c can be compiled with a new
define symbol DEBUG_LIST_MEMORY_USAGE that forcibly moves list contents
whenever that could possibly happen.  This makes list operations
significantly more expensive so it's not normally turned on (though it
is on by default if USE_VALGRIND is on).

There are two notable API differences from the previous code:

* lnext() now requires the List's header pointer in addition to the
current cell's address.

* list_delete_cell() no longer requires a previous-cell argument.

These changes are somewhat unfortunate, but on the other hand code using
either function needs inspection to see if it is assuming anything
it shouldn't, so it's not all bad.

Programmers should be aware of these significant performance changes:

* list_nth() and related functions are now O(1); so there's no
major access-speed difference between a list and an array.

* Inserting or deleting a list element now takes time proportional to
the distance to the end of the list, due to moving the array elements.
(However, it typically *doesn't* require palloc or pfree, so except in
long lists it's probably still faster than before.)  Notably, lcons()
used to be about the same cost as lappend(), but that's no longer true
if the list is long.  Code that uses lcons() and list_delete_first()
to maintain a stack might usefully be rewritten to push and pop at the
end of the list rather than the beginning.

* There are now list_insert_nth...() and list_delete_nth...() functions
that add or remove a list cell identified by index.  These have the
data-movement penalty explained above, but there's no search penalty.

* list_concat() and variants now copy the second list's data into
storage belonging to the first list, so there is no longer any
sharing of cells between the input lists.  The second argument is
now declared "const List *" to reflect that it isn't changed.

This patch just does the minimum needed to get the new implementation
in place and fix bugs exposed by the regression tests.  As suggested
by the foregoing, there's a fair amount of followup work remaining to
do.

Also, the ENABLE_LIST_COMPAT macros are finally removed in this
commit.  Code using those should have been gone a dozen years ago.

Patch by me; thanks to David Rowley, Jesper Pedersen, and others
for review.

Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
2019-07-15 13:41:58 -04:00
Tom Lane be76af171c Initial pgindent run for v12.
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.

Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
2019-05-22 12:55:34 -04:00
Noah Misch c098509927 Consistently test for in-use shared memory.
postmaster startup scrutinizes any shared memory segment recorded in
postmaster.pid, exiting if that segment matches the current data
directory and has an attached process.  When the postmaster.pid file was
missing, a starting postmaster used weaker checks.  Change to use the
same checks in both scenarios.  This increases the chance of a startup
failure, in lieu of data corruption, if the DBA does "kill -9 `head -n1
postmaster.pid` && rm postmaster.pid && pg_ctl -w start".  A postmaster
will no longer stop if shmat() of an old segment fails with EACCES.  A
postmaster will no longer recycle segments pertaining to other data
directories.  That's good for production, but it's bad for integration
tests that crash a postmaster and immediately delete its data directory.
Such a test now leaks a segment indefinitely.  No "make check-world"
test does that.  win32_shmem.c already avoided all these problems.  In
9.6 and later, enhance PostgresNode to facilitate testing.  Back-patch
to 9.4 (all supported versions).

Reviewed (in earlier versions) by Daniel Gustafsson and Kyotaro HORIGUCHI.

Discussion: https://postgr.es/m/20190408064141.GA2016666@rfd.leadboat.com
2019-04-12 22:36:38 -07:00
Noah Misch 82150a05be Revert "Consistently test for in-use shared memory."
This reverts commits 2f932f71d9,
16ee6eaf80 and
6f0e190056.  The buildfarm has revealed
several bugs.  Back-patch like the original commits.

Discussion: https://postgr.es/m/20190404145319.GA1720877@rfd.leadboat.com
2019-04-05 00:00:52 -07:00
Thomas Munro 3eb77eba5a Refactor the fsync queue for wider use.
Previously, md.c and checkpointer.c were tightly integrated so that
fsync calls could be handed off and processed in the background.
Introduce a system of callbacks and file tags, so that other modules
can hand off fsync work in the same way.

For now only md.c uses the new interface, but other users are being
proposed.  Since there may be use cases that are not strictly SMGR
implementations, use a new function table for sync handlers rather
than extending the traditional SMGR one.

Instead of using a bitmapset of segment numbers for each RelFileNode
in the checkpointer's hash table, make the segment number part of the
key.  This requires sending explicit "forget" requests for every
segment individually when relations are dropped, but suits the file
layout schemes of proposed future users better (ie sparse or high
segment numbers).

Author: Shawn Debnath and Thomas Munro
Reviewed-by: Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/CAEepm=2gTANm=e3ARnJT=n0h8hf88wqmaZxk0JYkxw+b21fNrw@mail.gmail.com
2019-04-04 23:38:38 +13:00
Noah Misch 2f932f71d9 Consistently test for in-use shared memory.
postmaster startup scrutinizes any shared memory segment recorded in
postmaster.pid, exiting if that segment matches the current data
directory and has an attached process.  When the postmaster.pid file was
missing, a starting postmaster used weaker checks.  Change to use the
same checks in both scenarios.  This increases the chance of a startup
failure, in lieu of data corruption, if the DBA does "kill -9 `head -n1
postmaster.pid` && rm postmaster.pid && pg_ctl -w start".  A postmaster
will no longer recycle segments pertaining to other data directories.
That's good for production, but it's bad for integration tests that
crash a postmaster and immediately delete its data directory.  Such a
test now leaks a segment indefinitely.  No "make check-world" test does
that.  win32_shmem.c already avoided all these problems.  In 9.6 and
later, enhance PostgresNode to facilitate testing.  Back-patch to 9.4
(all supported versions).

Reviewed by Daniel Gustafsson and Kyotaro HORIGUCHI.

Discussion: https://postgr.es/m/20130911033341.GD225735@tornado.leadboat.com
2019-04-03 17:03:46 -07:00
Noah Misch 5a907404b5 Update HINT for pre-existing shared memory block.
One should almost always terminate an old process, not use a manual
removal tool like ipcrm.  Removal of the ipcclean script eleven years
ago (39627b1ae6) and its non-replacement
corroborate that manual shm removal is now a niche goal.  Back-patch to
9.4 (all supported versions).

Reviewed by Daniel Gustafsson and Kyotaro HORIGUCHI.

Discussion: https://postgr.es/m/20180812064815.GB2301738@rfd.leadboat.com
2019-03-31 19:32:48 -07:00
Andres Freund c2fe139c20 tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:

1) Heap scans need to be generalized into table scans. Do this by
   introducing TableScanDesc, which will be the "base class" for
   individual AMs. This contains the AM independent fields from
   HeapScanDesc.

   The previous heap_{beginscan,rescan,endscan} et al. have been
   replaced with a table_ version.

   There's no direct replacement for heap_getnext(), as that returned
   a HeapTuple, which is undesirable for a other AMs. Instead there's
   table_scan_getnextslot().  But note that heap_getnext() lives on,
   it's still used widely to access catalog tables.

   This is achieved by new scan_begin, scan_end, scan_rescan,
   scan_getnextslot callbacks.

2) The portion of parallel scans that's shared between backends need
   to be able to do so without the user doing per-AM work. To achieve
   that new parallelscan_{estimate, initialize, reinitialize}
   callbacks are introduced, which operate on a new
   ParallelTableScanDesc, which again can be subclassed by AMs.

   As it is likely that several AMs are going to be block oriented,
   block oriented callbacks that can be shared between such AMs are
   provided and used by heap. table_block_parallelscan_{estimate,
   intiialize, reinitialize} as callbacks, and
   table_block_parallelscan_{nextpage, init} for use in AMs. These
   operate on a ParallelBlockTableScanDesc.

3) Index scans need to be able to access tables to return a tuple, and
   there needs to be state across individual accesses to the heap to
   store state like buffers. That's now handled by introducing a
   sort-of-scan IndexFetchTable, which again is intended to be
   subclassed by individual AMs (for heap IndexFetchHeap).

   The relevant callbacks for an AM are index_fetch_{end, begin,
   reset} to create the necessary state, and index_fetch_tuple to
   retrieve an indexed tuple.  Note that index_fetch_tuple
   implementations need to be smarter than just blindly fetching the
   tuples for AMs that have optimizations similar to heap's HOT - the
   currently alive tuple in the update chain needs to be fetched if
   appropriate.

   Similar to table_scan_getnextslot(), it's undesirable to continue
   to return HeapTuples. Thus index_fetch_heap (might want to rename
   that later) now accepts a slot as an argument. Core code doesn't
   have a lot of call sites performing index scans without going
   through the systable_* API (in contrast to loads of heap_getnext
   calls and working directly with HeapTuples).

   Index scans now store the result of a search in
   IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
   target is not generally a HeapTuple anymore that seems cleaner.

To be able to sensible adapt code to use the above, two further
callbacks have been introduced:

a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
   slots capable of holding a tuple of the AMs
   type. table_slot_callbacks() and table_slot_create() are based
   upon that, but have additional logic to deal with views, foreign
   tables, etc.

   While this change could have been done separately, nearly all the
   call sites that needed to be adapted for the rest of this commit
   also would have been needed to be adapted for
   table_slot_callbacks(), making separation not worthwhile.

b) tuple_satisfies_snapshot checks whether the tuple in a slot is
   currently visible according to a snapshot. That's required as a few
   places now don't have a buffer + HeapTuple around, but a
   slot (which in heap's case internally has that information).

Additionally a few infrastructure changes were needed:

I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
   internally uses a slot to keep track of tuples. While
   systable_getnext() still returns HeapTuples, and will so for the
   foreseeable future, the index API (see 1) above) now only deals with
   slots.

The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.

Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 12:46:41 -07:00
Tom Lane 52985e4fea Revert "Increase the default vacuum_cost_limit from 200 to 2000"
This reverts commit bd09503e63.

Per discussion, it seems like what we should do instead is to
reduce the default value of autovacuum_vacuum_cost_delay by the
same factor.  That's functionally equivalent as long as the
platform can accurately service the smaller delay request, which
should be true on anything released in the last 10 years or more.
And smaller, more-closely-spaced delays are better in terms of
providing a steady I/O load.

Discussion: https://postgr.es/m/28720.1552101086@sss.pgh.pa.us
2019-03-10 15:05:25 -04:00
Tom Lane caf626b2cd Convert [autovacuum_]vacuum_cost_delay into floating-point GUCs.
This change makes it possible to specify sub-millisecond delays,
which work well on most modern platforms, though that was not true
when the cost-delay feature was designed.

To support this without breaking existing configuration entries,
improve guc.c to allow floating-point GUCs to have units.  Also,
allow "us" (microseconds) as an input/output unit for time-unit GUCs.
(It's not allowed as a base unit, at least not yet.)

Likewise change the autovacuum_vacuum_cost_delay reloption to be
floating-point; this forces a catversion bump because the layout of
StdRdOptions changes.

This patch doesn't in itself change the default values or allowed
ranges for these parameters, and it should not affect the behavior
for any already-allowed setting for them.

Discussion: https://postgr.es/m/1798.1552165479@sss.pgh.pa.us
2019-03-10 15:01:39 -04:00
Andrew Dunstan bd09503e63 Increase the default vacuum_cost_limit from 200 to 2000
The original 200 default value was set back in f425b605f4 when the cost
delay settings were first added.  Hardware has improved quite a bit since
then and we've also made improvements such as sorting buffers during
checkpoints (9cd00c457e) which should result in less random writes.

This low default value was reportedly causing problems with badly
configured servers and in the absence of a native method to remove
excessive bloat from tables without incurring an AccessExclusiveLock, this
often made cleaning up the damage caused by badly configured auto-vacuums
difficult.

It seems more likely that someone will notice that auto-vacuum is running
too quickly than too slowly, so let's go all out and multiple the default
value for the setting by 10.  With the default vacuum_cost_page_dirty and
autovacuum_vacuum_cost_delay (assuming a page size of 8192 bytes), this
allows autovacuum a theoretical maximum dirty write rate of around 39MB/s
instead of just 3.9MB/s.

Author: David Rowley

Discussion: https://postgr.es/m/CAKJS1f_YbXC2qTMPyCbmsPiKvZYwpuQNQMohiRXLj1r=8_rYvw@mail.gmail.com
2019-03-06 09:10:12 -05:00
Michael Paquier ea92368cd1 Move max_wal_senders out of max_connections for connection slot handling
Since its introduction, max_wal_senders is counted as part of
max_connections when it comes to define how many connection slots can be
used for replication connections with a WAL sender context.  This can
lead to confusion for some users, as it could be possible to block a
base backup or replication from happening because other backend sessions
are already taken for other purposes by an application, and
superuser-only connection slots are not a correct solution to handle
that case.

This commit makes max_wal_senders independent of max_connections for its
handling of PGPROC entries in ProcGlobal, meaning that connection slots
for WAL senders are handled using their own free queue, like autovacuum
workers and bgworkers.

One compatibility issue that this change creates is that a standby now
requires to have a value of max_wal_senders at least equal to its
primary.  So, if a standby created enforces the value of
max_wal_senders to be lower than that, then this could break failovers.
Normally this should not be an issue though, as any settings of a
standby are inherited from its primary as postgresql.conf gets normally
copied as part of a base backup, so parameters would be consistent.

Author: Alexander Kukushkin
Reviewed-by: Kyotaro Horiguchi, Petr Jelínek, Masahiko Sawada, Oleksii
Kliukin
Discussion: https://postgr.es/m/CAFh8B=nBzHQeYAu0b8fjK-AF1X4+_p6GRtwG+cCgs6Vci2uRuQ@mail.gmail.com
2019-02-12 10:07:56 +09:00
Andres Freund e7cc78ad43 Remove superfluous tqual.h includes.
Most of these had been obsoleted by 568d4138c / the SnapshotNow
removal.

This is is preparation for moving most of tqual.[ch] into either
snapmgr.h or heapam.h, which in turn is in preparation for pluggable
table AMs.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-01-21 12:15:02 -08:00
Andres Freund e0c4ec0728 Replace uses of heap_open et al with the corresponding table_* function.
Author: Andres Freund
Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
2019-01-21 10:51:37 -08:00
Andres Freund 0944ec54de Don't include genam.h from execnodes.h and relscan.h anymore.
This is the genam.h equivalent of 4c850ecec6 (which removed
heapam.h from a lot of other headers).  There's still a few header
includes of genam.h, but not from central headers anymore.

As a few headers are not indirectly included anymore, execnodes.h and
relscan.h need a few additional includes. Some of the depended on
types were replacable by using the underlying structs, but e.g. for
Snapshot in execnodes.h that'd have gotten more invasive than
reasonable in this commit.

Like the aforementioned commit 4c850ecec6, this requires adding new
genam.h includes to a number of backend files, which likely is also
required in a few external projects.

Author: Andres Freund
Discussion: https://postgr.es/m/20190114000701.y4ttcb74jpskkcfb@alap3.anarazel.de
2019-01-14 17:02:12 -08:00
Bruce Momjian 97c39498e5 Update copyright for 2019
Backpatch-through: certain files through 9.4
2019-01-02 12:44:25 -05:00
Andres Freund 578b229718 Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.

This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row.  Neither pg_dump nor COPY included the contents of the
oid column by default.

The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.

WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.

Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
  WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
  issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
  restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
  OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
  plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.

The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.

The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such.  This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.

The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.

Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).

The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.

While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.

Catversion bump, for obvious reasons.

Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-20 16:00:17 -08:00
Thomas Munro 197e4af9d5 Refactor pid, random seed and start time initialization.
Background workers, including parallel workers, were generating
the same sequence of numbers in random().  This showed up as DSM
handle collisions when Parallel Hash created multiple segments,
but any code that calls random() in background workers could be
affected if it cares about different backends generating different
numbers.

Repair by making sure that all new processes initialize the seed
at the same time as they set MyProcPid and MyStartTime in a new
function InitProcessGlobals(), called by the postmaster, its
children and also standalone processes.  Also add a new high
resolution MyStartTimestamp as a potentially useful by-product,
and remove SessionStartTime from struct Port as it is now
redundant.

No back-patch for now, as the known consequences so far are just
a bunch of harmless shm_open(O_EXCL) collisions.

Author: Thomas Munro
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAEepm%3D2eJj_6%3DB%2B2tEpGu2nf1BjthCf9nXXUouYvJJ4C5WSwhg%40mail.gmail.com
2018-10-19 13:59:28 +13:00
Stephen Frost 8bddc86400 Add application_name to connection authorized msg
The connection authorized message has quite a bit of useful information
in it, but didn't include the application_name (when provided), so let's
add that as it can be very useful.

Note that at the point where we're emitting the connection authorized
message, we haven't processed GUCs, so it's not possible to get this by
using log_line_prefix (which pulls from the GUC).  There's also
something to be said for having this included in the connection
authorized message and then not needing to repeat it for every line, as
having it in log_line_prefix would do.

The GUC cleans the application name to pure-ascii, so do that here too,
but pull out the logic for cleaning up a string into its own function
in common and re-use it from those places, and check_cluster_name which
was doing the same thing.

Author: Don Seiler <don@seiler.us>
Discussion: https://postgr.es/m/CAHJZqBB_Pxv8HRfoh%2BAB4KxSQQuPVvtYCzMg7woNR3r7dfmopw%40mail.gmail.com
2018-09-28 19:04:50 -04:00
Michael Paquier ba16aade33 Switch flags tracking pending interrupts to sig_atomic_t
Those previously used bool, which should be safe on any modern
platforms, however the C standard is clear that it is better to use
sig_atomic_t for variables manipulated in signal handlers.  This commit
adds at the same time PGDLLIMPORT to ClientConnectionLost.

Author: Michael Paquier
Reviewed-by: Tom Lane, Chris Travers, Andres Freund
Discussion: https://postgr.es/m/20180925011311.GD1354@paquier.xyz
2018-09-27 07:47:20 +09:00
Noah Misch d18f6674bd Initialize random() in bootstrap/stand-alone postgres and in initdb.
This removes a difference between the standard IsUnderPostmaster
execution environment and that of --boot and --single.  In a stand-alone
backend, "SELECT random()" always started at the same seed.

On a system capable of using posix shared memory, initdb could still
conclude "selecting dynamic shared memory implementation ... sysv".
Crashed --boot or --single postgres processes orphaned shared memory
objects having names that collided with the not-actually-random names
that initdb probed.  The sysv fallback appeared after ten crashes of
--boot or --single postgres.  Since --boot and --single are rare in
production use, systems used for PostgreSQL development are the
principal candidate to notice this symptom.

Back-patch to 9.3 (all supported versions).  PostgreSQL 9.4 introduced
dynamic shared memory, but 9.3 does share the "SELECT random()" problem.

Reviewed by Tom Lane and Kyotaro HORIGUCHI.

Discussion: https://postgr.es/m/20180915221546.GA3159382@rfd.leadboat.com
2018-09-23 22:56:39 -07:00
Tom Lane 3cb646264e Use a ResourceOwner to track buffer pins in all cases.
Historically, we've allowed auxiliary processes to take buffer pins without
tracking them in a ResourceOwner.  However, that creates problems for error
recovery.  In particular, we've seen multiple reports of assertion crashes
in the startup process when it gets an error while holding a buffer pin,
as for example if it gets ENOSPC during a write.  In a non-assert build,
the process would simply exit without releasing the pin at all.  We've
gotten away with that so far just because a failure exit of the startup
process translates to a database crash anyhow; but any similar behavior
in other aux processes could result in stuck pins and subsequent problems
in vacuum.

To improve this, institute a policy that we must *always* have a resowner
backing any attempt to pin a buffer, which we can enforce just by removing
the previous special-case code in resowner.c.  Add infrastructure to make
it easy to create a process-lifespan AuxProcessResourceOwner and clear
out its contents at appropriate times.  Replace existing ad-hoc resowner
management in bgwriter.c and other aux processes with that.  (Thus, while
the startup process gains a resowner where it had none at all before, some
other aux process types are replacing an ad-hoc resowner with this code.)
Also use the AuxProcessResourceOwner to manage buffer pins taken during
StartupXLOG and ShutdownXLOG, even when those are being run in a bootstrap
process or a standalone backend rather than a true auxiliary process.

In passing, remove some other ad-hoc resource owner creations that had
gotten cargo-culted into various other places.  As far as I can tell
that was all unnecessary, and if it had been necessary it was incomplete,
due to lacking any provision for clearing those resowners later.
(Also worth noting in this connection is that a process that hasn't called
InitBufferPoolBackend has no business accessing buffers; so there's more
to do than just add the resowner if we want to touch buffers in processes
not covered by this patch.)

Although this fixes a very old bug, no back-patch, because there's no
evidence of any significant problem in non-assert builds.

Patch by me, pursuant to a report from Justin Pryzby.  Thanks to
Robert Haas and Kyotaro Horiguchi for reviews.

Discussion: https://postgr.es/m/20180627233939.GA10276@telsasoft.com
2018-07-18 12:15:16 -04:00
Thomas Munro e8d9caa436 Accept invalidation messages in InitializeSessionUserId().
If the authentication method modified the system catalogs through a
separate database connection (say, to create a new role on the fly),
make sure syscache sees the changes before we try to find the user.

Author: Thomas Munro
Reviewed-by: Tom Lane, Andres Freund
Discussion: https://postgr.es/m/CAEepm%3D3_h0_cgmz5PMyab4xk_OFrg6G5VCN%3DnF4chFXM9iFOqA%40mail.gmail.com
2018-07-13 16:19:15 +12:00
Thomas Munro 9f09529952 Use signals for postmaster death on Linux.
Linux provides a way to ask for a signal when your parent process dies.
Use that to make PostmasterIsAlive() very cheap.

Based on a suggestion from Andres Freund.

Author: Thomas Munro, Heikki Linnakangas
Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083b5e%40iki.fi
Discussion: https://postgr.es/m/20180411002643.6buofht4ranhei7k%40alap3.anarazel.de
2018-07-11 12:47:06 +12:00
Tom Lane 1d96c1b91a Fix incorrect ordering of operations in pg_resetwal and pg_rewind.
Commit c37b3d08c dropped its added GetDataDirectoryCreatePerm call into
the wrong place in pg_resetwal.c, namely after the chdir to DataDir.
That broke invocations using a relative path, as reported by Tushar Ahuja.
We could have left it where it was and changed the argument to be ".",
but that'd result in a rather confusing error message in event of a
failure, so re-ordering seems like a better solution.

Similarly reorder operations in pg_rewind.c.  The issue there is that
it doesn't seem like a good idea to do any actual operations before the
not-root check (on Unix) or the restricted token acquisition (on Windows).
I don't know that this is an actual bug, but I'm definitely not convinced
that it isn't, either.

Assorted other code review for c37b3d08c and da9b580d8: fix some
misspelled or otherwise badly worded comments, put the #include for
<sys/stat.h> where it actually belongs, etc.

Discussion: https://postgr.es/m/aeb9c3a7-3c3f-a57f-1a18-c8d4fcdc2a1f@enterprisedb.com
2018-05-23 10:59:55 -04:00
Stephen Frost e2b83ff556 Fix for globals.c- c.h must come first
Commit da9b580 mistakenly put a system header before postgres.h (which
includes c.h).  That can cause portability issues and broke (at least)
builds with older Windows compilers.

Discovered by Mark Dilger.

Discussion: https://postgr.es/m/BF04A27A-D132-4927-A80A-BAD18695E954@gmail.com
2018-05-18 21:20:27 -04:00
Stephen Frost c37b3d08ca Allow group access on PGDATA
Allow the cluster to be optionally init'd with read access for the
group.

This means a relatively non-privileged user can perform a backup of the
cluster without requiring write privileges, which enhances security.

The mode of PGDATA is used to determine whether group permissions are
enabled for directory and file creates.  This method was chosen as it's
simple and works well for the various utilities that write into PGDATA.

Changing the mode of PGDATA manually will not automatically change the
mode of all the files contained therein.  If the user would like to
enable group access on an existing cluster then changing the mode of all
the existing files will be required.  Note that pg_upgrade will
automatically change the mode of all migrated files if the new cluster
is init'd with the -g option.

Tests are included for the backend and all the utilities which operate
on the PG data directory to ensure that the correct mode is set based on
the data directory permissions.

Author: David Steele <david@pgmasters.net>
Reviewed-By: Michael Paquier, with discussion amongst many others.
Discussion: https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net
2018-04-07 17:45:39 -04:00
Stephen Frost da9b580d89 Refactor dir/file permissions
Consolidate directory and file create permissions for tools which work
with the PG data directory by adding a new module (common/file_perm.c)
that contains variables (pg_file_create_mode, pg_dir_create_mode) and
constants to initialize them (0600 for files and 0700 for directories).

Convert mkdir() calls in the backend to MakePGDirectory() if the
original call used default permissions (always the case for regular PG
directories).

Add tests to make sure permissions in PGDATA are set correctly by the
tools which modify the PG data directory.

Authors: David Steele <david@pgmasters.net>,
         Adam Brightwell <adam.brightwell@crunchydata.com>
Reviewed-By: Michael Paquier, with discussion amongst many others.
Discussion: https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net
2018-04-07 17:45:39 -04:00
Magnus Hagander eed1ce72e1 Allow background workers to bypass datallowconn
THis adds a "flags" field to the BackgroundWorkerInitializeConnection()
and BackgroundWorkerInitializeConnectionByOid(). For now only one flag,
BGWORKER_BYPASS_ALLOWCONN, is defined, which allows the worker to ignore
datallowconn.
2018-04-05 19:02:45 +02:00
Teodor Sigaev 857f9c36cd Skip full index scan during cleanup of B-tree indexes when possible
Vacuum of index consists from two stages: multiple (zero of more) ambulkdelete
calls and one amvacuumcleanup call. When workload on particular table
is append-only, then autovacuum isn't intended to touch this table. However,
user may run vacuum manually in order to fill visibility map and get benefits
of index-only scans. Then ambulkdelete wouldn't be called for indexes
of such table (because no heap tuples were deleted), only amvacuumcleanup would
be called In this case, amvacuumcleanup would perform full index scan for
two objectives: put recyclable pages into free space map and update index
statistics.

This patch allows btvacuumclanup to skip full index scan when two conditions
are satisfied: no pages are going to be put into free space map and index
statistics isn't stalled. In order to check first condition, we store
oldest btpo_xact in the meta-page. When it's precedes RecentGlobalXmin, then
there are some recyclable pages. In order to check second condition we store
number of heap tuples observed during previous full index scan by cleanup.
If fraction of newly inserted tuples is less than
vacuum_cleanup_index_scale_factor, then statistics isn't considered to be
stalled. vacuum_cleanup_index_scale_factor can be defined as both reloption and GUC (default).

This patch bumps B-tree meta-page version. Upgrade of meta-page is performed
"on the fly": during VACUUM meta-page is rewritten with new version. No special
handling in pg_upgrade is required.

Author: Masahiko Sawada, Alexander Korotkov
Review by: Peter Geoghegan, Kyotaro Horiguchi, Alexander Korotkov, Yura Sokolov
Discussion: https://www.postgresql.org/message-id/flat/CAD21AoAX+d2oD_nrd9O2YkpzHaFr=uQeGr9s1rKC3O4ENc568g@mail.gmail.com
2018-04-04 19:29:00 +03:00
Tom Lane 4e0c743c18 Fix cross-checking of ReservedBackends/max_wal_senders/MaxConnections.
We were independently checking ReservedBackends < MaxConnections and
max_wal_senders < MaxConnections, but because walsenders aren't allowed
to use superuser-reserved connections, that's really the wrong thing.
Correct behavior is to insist on ReservedBackends + max_wal_senders being
less than MaxConnections.  Fix the code and associated documentation.

This has been wrong for a long time, but since the situation probably
hardly ever arises in the field (especially pre-v10, when the default
for max_wal_senders was zero), no back-patch.

Discussion: https://postgr.es/m/28271.1520195491@sss.pgh.pa.us
2018-03-08 11:25:26 -05:00
Robert Haas 9da0cc3528 Support parallel btree index builds.
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds.  Testing
to date shows that this can often be 2-3x faster than a serial
index build.

The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature.  We can
refine it as we get more experience.

Peter Geoghegan with some help from Rushabh Lathia.  While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature.  Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.

Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
2018-02-02 13:32:44 -05:00
Peter Eisentraut c1869542b3 Use abstracted SSL API in server connection log messages
The existing "connection authorized" server log messages used OpenSSL
API calls directly, even though similar abstracted API calls exist.
Change to use the latter instead.

Change the function prototype for the functions that return the TLS
version and the cipher to return const char * directly instead of
copying into a buffer.  That makes them slightly easier to use.

Add bits= to the message.  psql shows that, so we might as well show the
same information on the client and server.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-01-26 09:50:46 -05:00
Bruce Momjian 9d4649ca49 Update copyright for 2018
Backpatch-through: certain files through 9.3
2018-01-02 23:30:12 -05:00
Andrew Dunstan 98d54bb779 Back out the session_start and session_end hooks feature.
It's become apparent during testing that there are problems with at
least the testing regime. I don't think we should have it without a
working test regime, and the difficulties might indicate implementation
problems anyway, so I'm backing out the whole thing until that's sorted
out.

This reverts commits 7459484 9989f92 cd8ce3a
2017-11-16 11:35:02 -05:00
Andrew Dunstan cd8ce3a22c Add hooks for session start and session end
These hooks can be used in loadable modules. A simple test module is
included.

Discussion:  https://postgr.es/m/20170720204733.40f2b7eb.nagata@sraoss.co.jp

Fabrízio de Royes Mello  and Yugo Nagata
Reviewed by Michael Paquier and Aleksandr Parfenov
2017-11-15 10:16:34 -05:00
Peter Eisentraut 2eb4a831e5 Change TRUE/FALSE to true/false
The lower case spellings are C and C++ standard and are used in most
parts of the PostgreSQL sources.  The upper case spellings are only used
in some files/modules.  So standardize on the standard spellings.

The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so
those are left as is when using those APIs.

In code comments, we use the lower-case spelling for the C concepts and
keep the upper-case spelling for the SQL concepts.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-11-08 11:37:28 -05:00
Robert Haas 8b304b8b72 Remove replacement selection sort.
At the time replacement_sort_tuples was introduced, there were still
cases where replacement selection sort noticeably outperformed using
quicksort even for the first run.  However, those cases seem to have
evaporated as a result of further improvements made since that time
(and perhaps also advances in CPU technology).  So remove replacement
selection and the controlling GUC entirely.  This makes tuplesort.c
noticeably simpler and probably paves the way for further
optimizations someone might want to do later.

Peter Geoghegan, with review and testing by Tomas Vondra and me.

Discussion: https://postgr.es/m/CAH2-WzmmNjG_K0R9nqYwMq3zjyJJK+hCbiZYNGhAy-Zyjs64GQ@mail.gmail.com
2017-09-29 10:25:44 -04:00
Andres Freund cc5f81366c Add support for coordinating record typmods among parallel workers.
Tuples can have type RECORDOID and a typmod number that identifies a blessed
TupleDesc in a backend-private cache.  To support the sharing of such tuples
through shared memory and temporary files, provide a typmod registry in
shared memory.

To achieve that, introduce per-session DSM segments, created on demand when a
backend first runs a parallel query.  The per-session DSM segment has a
table-of-contents just like the per-query DSM segment, and initially the
contents are a shared record typmod registry and a DSA area to provide the
space it needs to grow.

State relating to the current session is accessed via a Session object
reached through global variable CurrentSession that may require significant
redesign further down the road as we figure out what else needs to be shared
or remodelled.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
2017-09-14 19:59:21 -07:00
Tom Lane f13ea95f9e Change pg_ctl to detect server-ready by watching status in postmaster.pid.
Traditionally, "pg_ctl start -w" has waited for the server to become
ready to accept connections by attempting a connection once per second.
That has the major problem that connection issues (for instance, a
kernel packet filter blocking traffic) can't be reliably told apart
from server startup issues, and the minor problem that if server startup
isn't quick, we accumulate "the database system is starting up" spam
in the server log.  We've hacked around many of the possible connection
issues, but it resulted in ugly and complicated code in pg_ctl.c.

In commit c61559ec3, I changed the probe rate to every tenth of a second.
That prompted Jeff Janes to complain that the log-spam problem had become
much worse.  In the ensuing discussion, Andres Freund pointed out that
we could dispense with connection attempts altogether if the postmaster
were changed to report its status in postmaster.pid, which "pg_ctl start"
already relies on being able to read.  This patch implements that, teaching
postmaster.c to report a status string into the pidfile at the same
state-change points already identified as being of interest for systemd
status reporting (cf commit 7d17e683f).  pg_ctl no longer needs to link
with libpq at all; all its functions now depend on reading server files.

In support of this, teach AddToDataDirLockFile() to allow addition of
postmaster.pid lines in not-necessarily-sequential order.  This is needed
on Windows where the SHMEM_KEY line will never be written at all.  We still
have the restriction that we don't want to truncate the pidfile; document
the reasons for that a bit better.

Also, fix the pg_ctl TAP tests so they'll notice if "start -w" mode
is broken --- before, they'd just wait out the sixty seconds until
the loop gives up, and then report success anyway.  (Yes, I found that
out the hard way.)

While at it, arrange for pg_ctl to not need to #include miscadmin.h;
as a rather low-level backend header, requiring that to be compilable
client-side is pretty dubious.  This requires moving the #define's
associated with the pidfile into a new header file, and moving
PG_BACKEND_VERSIONSTR someplace else.  For lack of a clearly better
"someplace else", I put it into port.h, beside the declaration of
find_other_exec(), since most users of that macro are passing the value to
find_other_exec().  (initdb still depends on miscadmin.h, but at least
pg_ctl and pg_upgrade no longer do.)

In passing, fix main.c so that PG_BACKEND_VERSIONSTR actually defines the
output of "postgres -V", which remarkably it had never done before.

Discussion: https://postgr.es/m/CAMkU=1xJW8e+CTotojOMBd-yzUvD0e_JZu2xHo=MnuZ4__m7Pg@mail.gmail.com
2017-06-28 17:31:32 -04:00
Tom Lane 382ceffdf7 Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.

By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis.  However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent.  That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.

This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:35:54 -04:00