Improve isolationtester's timeout management.

isolationtester.c had a hard-wired limit of 3 minutes per test step.
It now emerges that this isn't quite enough for some of the slowest
buildfarm animals.  This isn't the first time we've had to raise
this limit (cf. 1db439ad4), so let's make it configurable.  This
patch raises the default to 5 minutes, and introduces an environment
variable PGISOLATIONTIMEOUT that can be set if more time is needed,
following the precedent of PGCTLTIMEOUT.

Also, modify isolationtester so that when the timeout is hit,
it explicitly reports having sent a cancel.  This makes the regression
failure log considerably more intelligible.  (In the worst case, a
timed-out test might actually be reported as "passing" without this
extra output, so arguably this is a bug fix in itself.)

In passing, update the README file, which had apparently not gotten
touched when we added "make check" support here.

Back-patch to 9.6; older versions don't have comparable timeout logic.

Discussion: https://postgr.es/m/22964.1575842935@sss.pgh.pa.us
This commit is contained in:
Tom Lane 2019-12-09 14:31:57 -05:00
parent a7472d2b7a
commit eadfafda04
2 changed files with 48 additions and 14 deletions

View File

@ -11,18 +11,26 @@ comes from the fact that the original motivation was to test the
serializable isolation level; but tests for other sorts of concurrent serializable isolation level; but tests for other sorts of concurrent
behaviors have been added as well. behaviors have been added as well.
To run the tests, you need to have a server running at the default port You can run the tests against the current build tree by typing
expected by libpq. (You can set PGPORT and so forth in your environment make check
to control this.) Then run Alternatively, you can run against an existing installation by typing
make installcheck make installcheck
To run just specific test(s), you can do something like (This will contact a server at the default port expected by libpq.
You can set PGPORT and so forth in your environment to control this.)
To run just specific test(s) against an installed server,
you can do something like
./pg_isolation_regress fk-contention fk-deadlock ./pg_isolation_regress fk-contention fk-deadlock
(look into the specs/ subdirectory to see the available tests). (look into the specs/ subdirectory to see the available tests).
The prepared-transactions test requires the server's The prepared-transactions test requires the server's
max_prepared_transactions parameter to be set to at least 3; therefore it max_prepared_transactions parameter to be set to at least 3; therefore it
is not run by default. To include it in the test run, use is not run by default. To include it in the test run, use
make check-prepared-txns
or
make installcheck-prepared-txns make installcheck-prepared-txns
after making sure the server configuration is correct (see TEMP_CONFIG
to adjust this in the "check" case).
To define tests with overlapping transactions, we use test specification To define tests with overlapping transactions, we use test specification
files with a custom syntax, which is described in the next section. To add files with a custom syntax, which is described in the next section. To add
@ -39,6 +47,11 @@ pg_isolation_regress is a tool similar to pg_regress, but instead of using
psql to execute a test, it uses isolationtester. It accepts all the same psql to execute a test, it uses isolationtester. It accepts all the same
command-line arguments as pg_regress. command-line arguments as pg_regress.
By default, isolationtester will wait at most 300 seconds (5 minutes)
for any one test step to complete. If you need to adjust this, set
the environment variable PGISOLATIONTIMEOUT to the desired timeout
in seconds.
Test specification Test specification
================== ==================
@ -108,10 +121,10 @@ Each step may contain commands that block until further action has been taken
deadlock). A test that uses this ability must manually specify valid deadlock). A test that uses this ability must manually specify valid
permutations, i.e. those that would not expect a blocked session to execute a permutations, i.e. those that would not expect a blocked session to execute a
command. If a test fails to follow that rule, isolationtester will cancel it command. If a test fails to follow that rule, isolationtester will cancel it
after 180 seconds. If the cancel doesn't work, isolationtester will exit after PGISOLATIONTIMEOUT seconds. If the cancel doesn't work, isolationtester
uncleanly after a total of 200 seconds of wait time. Testing invalid will exit uncleanly after a total of twice PGISOLATIONTIMEOUT. Testing
permutations should be avoided because they can make the isolation tests take invalid permutations should be avoided because they can make the isolation
a very long time to run, and they serve no useful testing purpose. tests take a very long time to run, and they serve no useful testing purpose.
Note that isolationtester recognizes that a command has blocked by looking Note that isolationtester recognizes that a command has blocked by looking
to see if it is shown as waiting in the pg_locks view; therefore, only to see if it is shown as waiting in the pg_locks view; therefore, only

View File

@ -34,6 +34,10 @@ static int nconns = 0;
/* In dry run only output permutations to be run by the tester. */ /* In dry run only output permutations to be run by the tester. */
static int dry_run = false; static int dry_run = false;
/* Maximum time to wait before giving up on a step (in usec) */
static int64 max_step_wait = 300 * USECS_PER_SEC;
static void run_testspec(TestSpec *testspec); static void run_testspec(TestSpec *testspec);
static void run_all_permutations(TestSpec *testspec); static void run_all_permutations(TestSpec *testspec);
static void run_all_permutations_recurse(TestSpec *testspec, int nsteps, static void run_all_permutations_recurse(TestSpec *testspec, int nsteps,
@ -66,6 +70,7 @@ int
main(int argc, char **argv) main(int argc, char **argv)
{ {
const char *conninfo; const char *conninfo;
const char *env_wait;
TestSpec *testspec; TestSpec *testspec;
int i, int i,
j; j;
@ -110,6 +115,14 @@ main(int argc, char **argv)
else else
conninfo = "dbname = postgres"; conninfo = "dbname = postgres";
/*
* If PGISOLATIONTIMEOUT is set in the environment, adopt its value (given
* in seconds) as the max time to wait for any one step to complete.
*/
env_wait = getenv("PGISOLATIONTIMEOUT");
if (env_wait != NULL)
max_step_wait = ((int64) atoi(env_wait)) * USECS_PER_SEC;
/* Read the test spec from stdin */ /* Read the test spec from stdin */
spec_yyparse(); spec_yyparse();
testspec = &parseresult; testspec = &parseresult;
@ -779,7 +792,7 @@ try_complete_step(TestSpec *testspec, Step *step, int flags)
td += (int64) current_time.tv_usec - (int64) start_time.tv_usec; td += (int64) current_time.tv_usec - (int64) start_time.tv_usec;
/* /*
* After 180 seconds, try to cancel the query. * After max_step_wait microseconds, try to cancel the query.
* *
* If the user tries to test an invalid permutation, we don't want * If the user tries to test an invalid permutation, we don't want
* to hang forever, especially when this is running in the * to hang forever, especially when this is running in the
@ -787,7 +800,7 @@ try_complete_step(TestSpec *testspec, Step *step, int flags)
* failing, but remaining permutations and tests should still be * failing, but remaining permutations and tests should still be
* OK. * OK.
*/ */
if (td > 180 * USECS_PER_SEC && !canceled) if (td > max_step_wait && !canceled)
{ {
PGcancel *cancel = PQgetCancel(conn); PGcancel *cancel = PQgetCancel(conn);
@ -796,7 +809,15 @@ try_complete_step(TestSpec *testspec, Step *step, int flags)
char buf[256]; char buf[256];
if (PQcancel(cancel, buf, sizeof(buf))) if (PQcancel(cancel, buf, sizeof(buf)))
{
/*
* print to stdout not stderr, as this should appear
* in the test case's results
*/
printf("isolationtester: canceling step %s after %d seconds\n",
step->name, (int) (td / USECS_PER_SEC));
canceled = true; canceled = true;
}
else else
fprintf(stderr, "PQcancel failed: %s\n", buf); fprintf(stderr, "PQcancel failed: %s\n", buf);
PQfreeCancel(cancel); PQfreeCancel(cancel);
@ -804,16 +825,16 @@ try_complete_step(TestSpec *testspec, Step *step, int flags)
} }
/* /*
* After 200 seconds, just give up and die. * After twice max_step_wait, just give up and die.
* *
* Since cleanup steps won't be run in this case, this may cause * Since cleanup steps won't be run in this case, this may cause
* later tests to fail. That stinks, but it's better than waiting * later tests to fail. That stinks, but it's better than waiting
* forever for the server to respond to the cancel. * forever for the server to respond to the cancel.
*/ */
if (td > 200 * USECS_PER_SEC) if (td > 2 * max_step_wait)
{ {
fprintf(stderr, "step %s timed out after 200 seconds\n", fprintf(stderr, "step %s timed out after %d seconds\n",
step->name); step->name, (int) (td / USECS_PER_SEC));
exit(1); exit(1);
} }
} }