Harden postgres_fdw tests against unexpected cache flushes.

postgres_fdw will close its remote session if an sinval cache reset
occurs, since it's possible that that means some FDW parameters
changed.  We had two tests that were trying to ensure that the
session remains alive by setting debug_discard_caches = 0; but
that's not sufficient.  Even though the tests seem stable enough
in the buildfarm, they flap a lot under CI.

In the first test, which is checking the ability to recover from
a lost connection, we can stabilize the results by just not
caring whether pg_terminate_backend() finds a victim backend.
If a reset did happen, there won't be a session to terminate
anymore, but the test can proceed anyway.  (Arguably, we are
then not testing the unintentional-disconnect case, but as long
as that scenario is exercised in most runs I think it's fine;
testing the reset-driven case is of value too.)

In the second test, which is trying to verify the application_name
displayed in pg_stat_activity by a remote session, we had a race
condition in that the remote session might go away before we can
fetch its pg_stat_activity entry.  We can close that race and make
the test more certainly test what it intends to by arranging things
so that the remote session itself fetches its pg_stat_activity entry
(based on PID rather than a somewhat-circular assumption about the
application name).

Both tests now demonstrably pass under debug_discard_caches = 1,
so we can remove that hack.

Back-patch into relevant back branches.

Discussion: https://postgr.es/m/20230226194340.u44bkfgyz64c67i6@awork3.anarazel.de
This commit is contained in:
Tom Lane 2023-02-27 16:29:51 -05:00
parent 46647cc4b8
commit 53fe7e6cb8
2 changed files with 64 additions and 81 deletions

View File

@ -9926,11 +9926,6 @@ WARNING: there is no transaction in progress
-- Change application_name of remote connection to special one
-- so that we can easily terminate the connection later.
ALTER SERVER loopback OPTIONS (application_name 'fdw_retry_check');
-- If debug_discard_caches is active, it results in
-- dropping remote connections after every transaction, making it
-- impossible to test termination meaningfully. So turn that off
-- for this test.
SET debug_discard_caches = 0;
-- Make sure we have a remote connection.
SELECT 1 FROM ft1 LIMIT 1;
?column?
@ -9939,13 +9934,12 @@ SELECT 1 FROM ft1 LIMIT 1;
(1 row)
-- Terminate the remote connection and wait for the termination to complete.
SELECT pg_terminate_backend(pid, 180000) FROM pg_stat_activity
-- (If a cache flush happens, the remote connection might have already been
-- dropped; so code this step in a way that doesn't fail if no connection.)
DO $$ BEGIN
PERFORM pg_terminate_backend(pid, 180000) FROM pg_stat_activity
WHERE application_name = 'fdw_retry_check';
pg_terminate_backend
----------------------
t
(1 row)
END $$;
-- This query should detect the broken connection when starting new remote
-- transaction, reestablish new connection, and then succeed.
BEGIN;
@ -9958,13 +9952,10 @@ SELECT 1 FROM ft1 LIMIT 1;
-- If we detect the broken connection when starting a new remote
-- subtransaction, we should fail instead of establishing a new connection.
-- Terminate the remote connection and wait for the termination to complete.
SELECT pg_terminate_backend(pid, 180000) FROM pg_stat_activity
DO $$ BEGIN
PERFORM pg_terminate_backend(pid, 180000) FROM pg_stat_activity
WHERE application_name = 'fdw_retry_check';
pg_terminate_backend
----------------------
t
(1 row)
END $$;
SAVEPOINT s;
-- The text of the error might vary across platforms, so only show SQLSTATE.
\set VERBOSITY sqlstate
@ -9972,7 +9963,6 @@ SELECT 1 FROM ft1 LIMIT 1; -- should fail
ERROR: 08006
\set VERBOSITY default
COMMIT;
RESET debug_discard_caches;
-- =============================================================================
-- test connection invalidation cases and postgres_fdw_get_connections function
-- =============================================================================
@ -11629,74 +11619,66 @@ HINT: There are no valid options in this context.
-- ===================================================================
-- test postgres_fdw.application_name GUC
-- ===================================================================
--- Turn debug_discard_caches off for this test to make sure that
--- the remote connection is alive when checking its application_name.
SET debug_discard_caches = 0;
-- To avoid race conditions in checking the remote session's application_name,
-- use this view to make the remote session itself read its application_name.
CREATE VIEW my_application_name AS
SELECT application_name FROM pg_stat_activity WHERE pid = pg_backend_pid();
CREATE FOREIGN TABLE remote_application_name (application_name text)
SERVER loopback2
OPTIONS (schema_name 'public', table_name 'my_application_name');
SELECT count(*) FROM remote_application_name;
count
-------
1
(1 row)
-- Specify escape sequences in application_name option of a server
-- object so as to test that they are replaced with status information
-- expectedly.
-- expectedly. Note that we are also relying on ALTER SERVER to force
-- the remote session to be restarted with its new application name.
--
-- Since pg_stat_activity.application_name may be truncated to less than
-- NAMEDATALEN characters, note that substring() needs to be used
-- at the condition of test query to make sure that the string consisting
-- of database name and process ID is also less than that.
ALTER SERVER loopback2 OPTIONS (application_name 'fdw_%d%p');
SELECT 1 FROM ft6 LIMIT 1;
?column?
----------
1
(1 row)
SELECT pg_terminate_backend(pid, 180000) FROM pg_stat_activity
SELECT count(*) FROM remote_application_name
WHERE application_name =
substring('fdw_' || current_database() || pg_backend_pid() for
current_setting('max_identifier_length')::int);
pg_terminate_backend
----------------------
t
count
-------
1
(1 row)
-- postgres_fdw.application_name overrides application_name option
-- of a server object if both settings are present.
ALTER SERVER loopback2 OPTIONS (SET application_name 'fdw_wrong');
SET postgres_fdw.application_name TO 'fdw_%a%u%%';
SELECT 1 FROM ft6 LIMIT 1;
?column?
----------
1
(1 row)
SELECT pg_terminate_backend(pid, 180000) FROM pg_stat_activity
SELECT count(*) FROM remote_application_name
WHERE application_name =
substring('fdw_' || current_setting('application_name') ||
CURRENT_USER || '%' for current_setting('max_identifier_length')::int);
pg_terminate_backend
----------------------
t
count
-------
1
(1 row)
RESET postgres_fdw.application_name;
-- Test %c (session ID) and %C (cluster name) escape sequences.
SET postgres_fdw.application_name TO 'fdw_%C%c';
SELECT 1 FROM ft6 LIMIT 1;
?column?
----------
1
(1 row)
SELECT pg_terminate_backend(pid, 180000) FROM pg_stat_activity
ALTER SERVER loopback2 OPTIONS (SET application_name 'fdw_%C%c');
SELECT count(*) FROM remote_application_name
WHERE application_name =
substring('fdw_' || current_setting('cluster_name') ||
to_hex(trunc(EXTRACT(EPOCH FROM (SELECT backend_start FROM
pg_stat_get_activity(pg_backend_pid()))))::integer) || '.' ||
to_hex(pg_backend_pid())
for current_setting('max_identifier_length')::int);
pg_terminate_backend
----------------------
t
count
-------
1
(1 row)
--Clean up
RESET postgres_fdw.application_name;
RESET debug_discard_caches;
-- ===================================================================
-- test parallel commit
-- ===================================================================

View File

@ -3100,18 +3100,16 @@ ROLLBACK;
-- so that we can easily terminate the connection later.
ALTER SERVER loopback OPTIONS (application_name 'fdw_retry_check');
-- If debug_discard_caches is active, it results in
-- dropping remote connections after every transaction, making it
-- impossible to test termination meaningfully. So turn that off
-- for this test.
SET debug_discard_caches = 0;
-- Make sure we have a remote connection.
SELECT 1 FROM ft1 LIMIT 1;
-- Terminate the remote connection and wait for the termination to complete.
SELECT pg_terminate_backend(pid, 180000) FROM pg_stat_activity
-- (If a cache flush happens, the remote connection might have already been
-- dropped; so code this step in a way that doesn't fail if no connection.)
DO $$ BEGIN
PERFORM pg_terminate_backend(pid, 180000) FROM pg_stat_activity
WHERE application_name = 'fdw_retry_check';
END $$;
-- This query should detect the broken connection when starting new remote
-- transaction, reestablish new connection, and then succeed.
@ -3121,8 +3119,10 @@ SELECT 1 FROM ft1 LIMIT 1;
-- If we detect the broken connection when starting a new remote
-- subtransaction, we should fail instead of establishing a new connection.
-- Terminate the remote connection and wait for the termination to complete.
SELECT pg_terminate_backend(pid, 180000) FROM pg_stat_activity
DO $$ BEGIN
PERFORM pg_terminate_backend(pid, 180000) FROM pg_stat_activity
WHERE application_name = 'fdw_retry_check';
END $$;
SAVEPOINT s;
-- The text of the error might vary across platforms, so only show SQLSTATE.
\set VERBOSITY sqlstate
@ -3130,8 +3130,6 @@ SELECT 1 FROM ft1 LIMIT 1; -- should fail
\set VERBOSITY default
COMMIT;
RESET debug_discard_caches;
-- =============================================================================
-- test connection invalidation cases and postgres_fdw_get_connections function
-- =============================================================================
@ -3850,38 +3848,45 @@ ALTER FOREIGN DATA WRAPPER postgres_fdw OPTIONS (nonexistent 'fdw');
-- ===================================================================
-- test postgres_fdw.application_name GUC
-- ===================================================================
--- Turn debug_discard_caches off for this test to make sure that
--- the remote connection is alive when checking its application_name.
SET debug_discard_caches = 0;
-- To avoid race conditions in checking the remote session's application_name,
-- use this view to make the remote session itself read its application_name.
CREATE VIEW my_application_name AS
SELECT application_name FROM pg_stat_activity WHERE pid = pg_backend_pid();
CREATE FOREIGN TABLE remote_application_name (application_name text)
SERVER loopback2
OPTIONS (schema_name 'public', table_name 'my_application_name');
SELECT count(*) FROM remote_application_name;
-- Specify escape sequences in application_name option of a server
-- object so as to test that they are replaced with status information
-- expectedly.
-- expectedly. Note that we are also relying on ALTER SERVER to force
-- the remote session to be restarted with its new application name.
--
-- Since pg_stat_activity.application_name may be truncated to less than
-- NAMEDATALEN characters, note that substring() needs to be used
-- at the condition of test query to make sure that the string consisting
-- of database name and process ID is also less than that.
ALTER SERVER loopback2 OPTIONS (application_name 'fdw_%d%p');
SELECT 1 FROM ft6 LIMIT 1;
SELECT pg_terminate_backend(pid, 180000) FROM pg_stat_activity
SELECT count(*) FROM remote_application_name
WHERE application_name =
substring('fdw_' || current_database() || pg_backend_pid() for
current_setting('max_identifier_length')::int);
-- postgres_fdw.application_name overrides application_name option
-- of a server object if both settings are present.
ALTER SERVER loopback2 OPTIONS (SET application_name 'fdw_wrong');
SET postgres_fdw.application_name TO 'fdw_%a%u%%';
SELECT 1 FROM ft6 LIMIT 1;
SELECT pg_terminate_backend(pid, 180000) FROM pg_stat_activity
SELECT count(*) FROM remote_application_name
WHERE application_name =
substring('fdw_' || current_setting('application_name') ||
CURRENT_USER || '%' for current_setting('max_identifier_length')::int);
RESET postgres_fdw.application_name;
-- Test %c (session ID) and %C (cluster name) escape sequences.
SET postgres_fdw.application_name TO 'fdw_%C%c';
SELECT 1 FROM ft6 LIMIT 1;
SELECT pg_terminate_backend(pid, 180000) FROM pg_stat_activity
ALTER SERVER loopback2 OPTIONS (SET application_name 'fdw_%C%c');
SELECT count(*) FROM remote_application_name
WHERE application_name =
substring('fdw_' || current_setting('cluster_name') ||
to_hex(trunc(EXTRACT(EPOCH FROM (SELECT backend_start FROM
@ -3889,10 +3894,6 @@ SELECT pg_terminate_backend(pid, 180000) FROM pg_stat_activity
to_hex(pg_backend_pid())
for current_setting('max_identifier_length')::int);
--Clean up
RESET postgres_fdw.application_name;
RESET debug_discard_caches;
-- ===================================================================
-- test parallel commit
-- ===================================================================