Commit Graph

4212 Commits

Author SHA1 Message Date
Peter Eisentraut
4d7e5a5db0 Remove use of deprecated Autoconf define
Change from HAVE_TM_ZONE to HAVE_STRUCT_TM_TM_ZONE.
2019-10-07 16:47:23 +02:00
Michael Paquier
41a6de41ed Fix confusing error caused by connection parameter channel_binding
When using a client compiled without channel binding support (linking to
OpenSSL 1.0.1 or older) to connect to a server which supports channel
binding (linking to OpenSSL 1.0.2 or newer), libpq would generate a
confusing error message with channel_binding=require for an SSL
connection, where the server sends back SCRAM-SHA-256-PLUS:
"channel binding is required, but server did not offer an authentication
method that supports channel binding."

This is confusing because the server did send a SASL mechanism able to
support channel binding, but libpq was not able to detect that
properly.

The situation can be summarized as followed for the case described in
the previous paragraph for the SASL mechanisms used with the various
modes of channel_binding:
1) Client supports channel binding.
1-1) channel_binding = disable => OK, with SCRAM-SHA-256.
1-2) channel_binding = prefer => OK, with SCRAM-SHA-256-PLUS.
1-3) channel_binding = require => OK, with SCRAM-SHA-256-PLUS.
2) Client does not support channel binding.
2-1) channel_binding = disable => OK, with SCRAM-SHA-256.
2-2) channel_binding = prefer => OK, with SCRAM-SHA-256.
2-3) channel_binding = require => failure with new error message,
instead of the confusing one.
This commit updates case 2-3 to generate a better error message.  Note
that the SSL TAP tests are not impacted as it is not possible to test
with mixed versions of OpenSSL for the backend and libpq.

Reported-by: Tom Lane
Author: Michael Paquier
Reviewed-by: Jeff Davis, Tom Lane
Discussion: https://postgr.es/m/24857.1569775891@sss.pgh.pa.us
2019-10-01 10:56:27 +09:00
Tom Lane
2c97f73468 Fix bogus order of error checks in new channel_binding code.
Coverity pointed out that it's pretty silly to check for a null pointer
after we've already dereferenced the pointer.  To fix, just swap the
order of the two error checks.  Oversight in commit d6e612f83.
2019-09-29 12:35:53 -04:00
Michael Paquier
55282fa20f Remove code relevant to OpenSSL 0.9.6 in be/fe-secure-openssl.c
HEAD supports OpenSSL 0.9.8 and newer versions, and this code likely got
forgotten as its surrounding comments mention an incorrect version
number.

Author: Michael Paquier
Reviewed-by: Peter Eisentraut
Discussion: https://postgr.es/m/20190927032311.GB8485@paquier.xyz
2019-09-28 15:22:49 +09:00
Jeff Davis
d6e612f837 Add libpq parameter 'channel_binding'.
Allow clients to require channel binding to enhance security against
untrusted servers.

Author: Jeff Davis
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/227015d8417f2b4fef03f8966dbfa5cbcc4f44da.camel%40j-davis.com
2019-09-23 14:03:35 -07:00
Peter Eisentraut
72c48c3fc3 Remove removed file from nls.mk
part of revert "Add DECLARE STATEMENT support to ECPG."
2019-09-21 23:23:51 +02:00
Tom Lane
96b6c82c9d Revert "Add DECLARE STATEMENT support to ECPG."
This reverts commit bd7c95f0c1,
along with assorted follow-on fixes.  There are some questions
about the definition and implementation of that statement, and
we don't have time to resolve them before v13 release.  Rather
than ship the feature and then have backwards-compatibility
concerns constraining any redesign, let's remove it for now
and try again later.

Discussion: https://postgr.es/m/TY2PR01MB2443EC8286995378AEB7D9F8F5B10@TY2PR01MB2443.jpnprd01.prod.outlook.com
2019-09-20 12:47:37 -04:00
Peter Eisentraut
e1c8743e6c GSSAPI error message improvements
Make the error messages around GSSAPI encryption a bit clearer.  Tweak
some messages to avoid plural problems.

Also make a code change for clarity.  Using "conf" for "confidential"
is quite confusing.  Using "conf_state" is perhaps not much better but
that's what the GSSAPI documentation uses, so there is at least some
hope of understanding it.
2019-09-19 15:09:49 +02:00
Peter Eisentraut
74f2a8aa27 Revert change of ecpglib major version
The major version of ecpglib was changed in
bd7c95f0c1, apparently without
justification.  Revert this, since nothing has changed in this library
except some added functions.

Discussion: https://www.postgresql.org/message-id/flat/48ee4c56-e1df-b39d-2cad-c7d80b120eb5%402ndquadrant.com
2019-09-19 09:04:20 +02:00
Alvaro Herrera
b438e7e7a1 Restructure libpq code to remove some duplicity
There was some duplicate code to run SHOW transaction_read_only to
determine whether the server is read-write or read-only.  Reduce it by
adding another state to the state machine.

Author: Hari Babu Kommi
Reviewed-by: Takayuki Tsunakawa, Álvaro Herrera
Discussion: https://postgr.es/m/CAJrrPGe_qgdbbN+yBgEVpd+YLHXXjTruzk6RmTMhqrFig+32ag@mail.gmail.com
2019-09-10 12:14:24 -03:00
Peter Eisentraut
c5bc7050af Message style fixes 2019-09-06 22:54:02 +02:00
Peter Eisentraut
5599f40d25 libpq: ccache -> credential cache
The term "ccache" is overloaded.  Let's be more clear, in case someone
other than a Kerberos wizard has to read this code.
2019-09-06 09:15:35 +02:00
Peter Eisentraut
74a308cf52 Use explicit_bzero
Use the explicit_bzero() function in places where it is important that
security information such as passwords is cleared from memory.  There
might be other places where it could be useful; this is just an
initial collection.

For platforms that don't have explicit_bzero(), provide various
fallback implementations.  (explicit_bzero() itself isn't standard,
but as Linux/glibc, FreeBSD, and OpenBSD have it, it's the most common
spelling, so it makes sense to make that the invocation point.)

Discussion: https://www.postgresql.org/message-id/flat/42d26bde-5d5b-c90d-87ae-6cab875f73be%402ndquadrant.com
2019-09-05 08:30:42 +02:00
Tom Lane
b61a5e6a1f Cosmetic improvements for options-handling code in ECPGconnect().
The comment describing the string format was a lie.  Make it agree with
reality, add/improve some other comments, fix coding style for loops with
empty bodies.  Also add an Assert that we counted parameters correctly,
because the spread-out logic for that looks pretty fragile.

No actual bugs fixed here, so no need to back-patch.

Discussion: https://postgr.es/m/848B1649C8A6274AA527C4472CA11EDD5FC70CBE@G01JPEXMBYT02
2019-08-31 13:37:10 -04:00
Tom Lane
b1907d6882 Set application_name per-test in isolation and ecpg tests.
Commit a4327296d taught pg_regress proper to do this, but
missed the opportunity to do likewise in the isolationtester
and ecpg variants of pg_regress.  Seems like this might be
helpful for tracking down issues exposed by those tests.
2019-08-27 19:49:09 -04:00
Michael Paquier
c96581abe4 Fix inconsistencies and typos in the tree, take 11
This fixes various typos in docs and comments, and removes some orphaned
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/5da8e325-c665-da95-21e0-c8a99ea61fbf@gmail.com
2019-08-19 16:21:39 +09:00
Tom Lane
927f34ce8a Avoid conflicts with library versions of inet_net_ntop() and friends.
Prefix inet_net_ntop and sibling routines with "pg_" to ensure that
they aren't mistaken for C-library functions.  This fixes warnings
from cpluspluscheck on some platforms, and should help reduce reader
confusion everywhere, since our functions aren't exactly interchangeable
with the library versions (they may have different ideas about address
family codes).

This shouldn't be fixing any actual bugs, unless somebody's linker
is misbehaving, so no need to back-patch.

Discussion: https://postgr.es/m/20518.1559494394@sss.pgh.pa.us
2019-08-18 19:27:23 -04:00
Tom Lane
5c66e99178 Fix failure-to-compile-standalone in ecpg's dt.h.
This has to have <time.h>, or the references to "struct tm" don't
mean what they should.

We have some other recently-introduced issues of the same ilk,
but this one seems old.  No backpatch though, as it's only a
latent problem for most purposes.
2019-08-18 17:51:35 -04:00
Michael Paquier
66bde49d96 Fix inconsistencies and typos in the tree, take 10
This addresses some issues with unnecessary code comments, fixes various
typos in docs and comments, and removes some orphaned structures and
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/9aabc775-5494-b372-8bcb-4dfc0bd37c68@gmail.com
2019-08-13 13:53:41 +09:00
Michael Paquier
b8f2da0ac5 Refactor logic to remove trailing CR/LF characters from strings
b654714 has reworked the way trailing CR/LF characters are removed from
strings.  This commit introduces a new routine in common/string.c and
refactors the code so as the logic is in a single place, mostly.

Author: Michael Paquier
Reviewed-by: Bruce Momjian
Discussion: https://postgr.es/m/20190801031820.GF29334@paquier.xyz
2019-08-09 11:05:14 +09:00
Michael Paquier
8548ddc61b Fix inconsistencies and typos in the tree, take 9
This addresses more issues with code comments, variable names and
unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/7ab243e0-116d-3e44-d120-76b3df7abefd@gmail.com
2019-08-05 12:14:58 +09:00
Michael Paquier
a9f301df0e Fix format truncation issue from ECPG test
This fixes one warning generated by GCC and present in the test case
array part of ECPG.  This likely got missed in past fixes like 3a4b891
because the compilation of those tests is not done by default.

Reported-by: Sergei Kornilov
Discussion: https://postgr.es/m/14951331562847675@sas2-a1efad875d04.qloud-c.yandex.net
2019-08-02 09:51:12 +09:00
Michael Paquier
eb43f3d193 Fix inconsistencies and typos in the tree
This is numbered take 8, and addresses again a set of issues with code
comments, variable names and unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/b137b5eb-9c95-9c2f-586e-38aba7d59788@gmail.com
2019-07-29 12:28:30 +09:00
Tom Lane
b654714f9b Fix failures to ignore \r when reading Windows-style newlines.
libpq failed to ignore Windows-style newlines in connection service files.
This normally wasn't a problem on Windows itself, because fgets() would
convert \r\n to just \n.  But if libpq were running inside a program that
changes the default fopen mode to binary, it would see the \r's and think
they were data.  In any case, it's project policy to ignore \r in text
files unconditionally, because people sometimes try to use files with
DOS-style newlines on Unix machines, where the C library won't hide that
from us.

Hence, adjust parseServiceFile() to ignore \r as well as \n at the end of
the line.  In HEAD, go a little further and make it ignore all trailing
whitespace, to match what it's always done with leading whitespace.

In HEAD, also run around and fix up everyplace where we have
newline-chomping code to make all those places look consistent and
uniformly drop \r.  It is not clear whether any of those changes are
fixing live bugs.  Most of the non-cosmetic changes are in places that
are reading popen output, and the jury is still out as to whether popen
on Windows can return \r\n.  (The Windows-specific code in pipe_read_line
seems to think so, but our lack of support for this elsewhere suggests
maybe it's not a problem in practice.)  Hence, I desisted from applying
those changes to back branches, except in run_ssl_passphrase_command()
which is new enough and little-tested enough that we'd probably not have
heard about any problems there.

Tom Lane and Michael Paquier, per bug #15827 from Jorge Gustavo Rocha.
Back-patch the parseServiceFile() change to all supported branches,
and the run_ssl_passphrase_command() change to v11 where that was added.

Discussion: https://postgr.es/m/15827-e6ba53a3a7ed543c@postgresql.org
2019-07-25 12:11:17 -04:00
Michael Paquier
23bccc823d Fix inconsistencies and typos in the tree
This is numbered take 7, and addresses a set of issues with code
comments, variable names and unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/dff75442-2468-f74f-568c-6006e141062f@gmail.com
2019-07-22 10:01:50 +09:00
Michael Paquier
0896ae561b Fix inconsistencies and typos in the tree
This is numbered take 7, and addresses a set of issues around:
- Fixes for typos and incorrect reference names.
- Removal of unneeded comments.
- Removal of unreferenced functions and structures.
- Fixes regarding variable name consistency.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/10bfd4ac-3e7c-40ab-2b2e-355ed15495e8@gmail.com
2019-07-16 13:23:53 +09:00
Michael Paquier
6b8548964b Fix inconsistencies in the code
This addresses a couple of issues in the code:
- Typos and inconsistencies in comments and function declarations.
- Removal of unreferenced function declarations.
- Removal of unnecessary compile flags.
- A cleanup error in regressplans.sh.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/0c991fdf-2670-1997-c027-772a420c4604@gmail.com
2019-07-08 13:15:09 +09:00
Peter Eisentraut
7e9a4c5c3d Use consistent style for checking return from system calls
Use

    if (something() != 0)
        error ...

instead of just

    if (something)
        error ...

The latter is not incorrect, but it's a bit confusing and not the
common style.

Discussion: https://www.postgresql.org/message-id/flat/5de61b6b-8be9-7771-0048-860328efe027%402ndquadrant.com
2019-07-07 15:28:49 +02:00
Michael Paquier
d1a040543b Remove more unreferenced function declarations
Author: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoDuAYsRb3Q9aobkFZ6DZMWxsyg4HOmgkwgeWNfSkTwGxw@mail.gmail.com
2019-07-07 09:58:33 +09:00
Peter Eisentraut
b33283c364 Add missing source files to nls.mk 2019-07-06 15:03:21 +02:00
Peter Eisentraut
d5ab9df777 Remove unnecessary casts from size_t to int
We can use the %zu format specifier directly, no need to cast to int.
2019-07-04 17:02:13 +02:00
Michael Paquier
cfc40d384a Introduce safer encoding and decoding routines for base64.c
This is a follow-up refactoring after 09ec55b and b674211, which has
proved that the encoding and decoding routines used by SCRAM have a
poor interface when it comes to check after buffer overflows.  This adds
an extra argument in the shape of the length of the result buffer for
each routine, which is used for overflow checks when encoding or
decoding an input string.  The original idea comes from Tom Lane.

As a result of that, the encoding routine can now fail, so all its
callers are adjusted to generate proper error messages in case of
problems.

On failure, the result buffer gets zeroed.

Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20190623132535.GB1628@paquier.xyz
2019-07-04 16:08:09 +09:00
David Rowley
8abc13a889 Use appendStringInfoString and appendPQExpBufferStr where possible
This changes various places where appendPQExpBuffer was used in places
where it was possible to use appendPQExpBufferStr, and likewise for
appendStringInfo and appendStringInfoString.  This is really just a
stylistic improvement, but there are also small performance gains to be
had from doing this.

Discussion: http://postgr.es/m/CAKJS1f9P=M-3ULmPvr8iCno8yvfDViHibJjpriHU8+SXUgeZ=w@mail.gmail.com
2019-07-04 13:01:13 +12:00
Michael Meskes
e72489e101 Fix small memory leak in ecpglib ecpg_update_declare_statement() is called the
second time.

Author: "Zhang, Jie" <zhangjie2@cn.fujitsu.com>
2019-07-03 03:17:06 +02:00
Michael Meskes
8372e3c98f Use strtoint() instead of strtol() in pgtypeslib where the result is stored in
an int variable.

Author: Yang Xiao <YangX92@hotmail.com>
2019-07-03 03:17:06 +02:00
Michael Meskes
75220fb62b Made ecpg compatibility mode and run-time behaviour options case insensitive. 2019-07-03 03:17:06 +02:00
Tom Lane
615cebc94b Stamp HEAD as 13devel.
Let the hacking begin ...
2019-07-01 12:50:55 -04:00
Michael Paquier
c74d49d41c Fix many typos and inconsistencies
Author: Alexander Lakhin
Discussion: https://postgr.es/m/af27d1b3-a128-9d62-46e0-88f424397f44@gmail.com
2019-07-01 10:00:23 +09:00
Peter Eisentraut
666cbae16d Remove explicit error handling for obsolete date/time values
The date/time values 'current', 'invalid', and 'undefined' were
removed a long time ago, but the code still contains explicit error
handling for the transition.  To simplify the code and avoid having to
handle these values everywhere, just remove the recognition of these
tokens altogether now.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
2019-06-30 10:27:35 +02:00
Peter Eisentraut
91acff7a53 Translation updates
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 1a710c413ce4c4cd081843e563cde256bb95f490
2019-06-17 15:30:20 +02:00
Michael Paquier
b674211788 Fix buffer overflow when processing SCRAM final message in libpq
When a client connects to a rogue server sending specifically-crafted
messages, this can suffice to execute arbitrary code as the operating
system account used by the client.

While on it, fix one error handling when decoding an incorrect salt
included in the first message received from server.

Author: Michael Paquier
Reviewed-by: Jonathan Katz, Heikki Linnakangas
Security: CVE-2019-10164
Backpatch-through: 10
2019-06-17 22:13:57 +09:00
Alvaro Herrera
313f56ce2d Tweak libpq's PQhost, PQhostaddr, and psql's \connect
Fixes some problems introduced by 6e5f8d489a:

* When reusing conninfo data from the previous connection in \connect,
  the host address should only be reused if it was specified as
  hostaddr; if it wasn't, then 'host' is resolved afresh.  We were
  reusing the same IP address, which ignores a possible DNS change
  as well as any other addresses that the name resolves to than the
  one that was used in the original connection.

* PQhost, PQhostaddr: Don't present user-specified hostaddr when we have
  an inet_net_ntop-produced equivalent address.  The latter has been
  put in canonical format, which is cleaner (so it produces "127.0.0.1"
  when given "host=2130706433", for example).

* Document the hostaddr-reusing aspect of \connect.

* Fix some code comments

Author: Fabien Coelho
Reported-by: Noah Misch
Discussion: https://postgr.es/m/20190527203713.GA58392@gust.leadboat.com
2019-06-14 18:02:26 -04:00
Alvaro Herrera
b976845815 Fix double-word typos
Discussion: https://postgr.es/m/20190612184527.GA24266@alvherre.pgsql
Reviewed-by: Michaël Paquier
2019-06-13 10:03:56 -04:00
Noah Misch
f2c71cb71f Stop using spelling "nonexistant".
The documentation used "nonexistent" exclusively, and the source tree
used it three times as often as "nonexistant".
2019-06-08 10:12:26 -07:00
Noah Misch
31d250e049 Update stale comments, and fix comment typos. 2019-06-08 10:12:26 -07:00
Tom Lane
10a53cae99 Un-break ecpg tests for Windows.
Declaring a function "inline" still doesn't work with Windows compilers
(C99? what's that?), unless the macro provided by pg_config.h is
in-scope, which it is not in our ECPG test programs.  So the workaround
I tried to use in commit 7640f9312 doesn't work for Windows.  Revert
the change in printf_hack.h, and instead just blacklist that file
in cpluspluscheck --- since it's a not-installed test file, we don't
really need to verify its C++ cleanliness anyway.
2019-06-02 11:07:54 -04:00
Tom Lane
ec66ec8b21 Adjust ecpg expected-results files for commit 7640f9312.
Mea culpa for not rechecking check-world at the last step :-(
Per buildfarm.
2019-05-31 12:47:19 -04:00
Tom Lane
4f67858d3f Fix C++ incompatibilities in ecpg/preproc/ header files.
There's probably no need to back-patch this, since it seems unlikely
that anybody would be inserting C++ code into ecpg's preprocessor.

Discussion: https://postgr.es/m/b517ec3918d645eb950505eac8dd434e@gaz-is.ru
2019-05-31 12:38:53 -04:00
Tom Lane
7640f93123 Fix assorted header files that failed to compile standalone.
We have a longstanding project convention that all .h files should
be includable with no prerequisites other than postgres.h.  This is
tested/relied-on by cpluspluscheck.  However, cpluspluscheck has not
historically been applied to most headers outside the src/include
tree, with the predictable consequence that some of them don't work.
Fix that, usually by adding missing #include dependencies.

The change in printf_hack.h might require some explanation: without
it, my C++ compiler whines that the function is unused.  There's
not so many call sites that "inline" is going to cost much, and
besides all the callers are in test code that we really don't care
about the size of.

There's no actual bugs being fixed here, so I see no need to back-patch.

Discussion: https://postgr.es/m/b517ec3918d645eb950505eac8dd434e@gaz-is.ru
2019-05-31 11:45:33 -04:00
Tom Lane
39fe881d3c Fix more thinkos in new ECPG "PREPARE AS" code.
ecpg_build_params() failed to check for ecpg_alloc failure in one
newly-added code path, and leaked a temporary string in another path.
Errors in commit a1dc6ab46, spotted by Coverity.
2019-05-26 10:39:11 -04:00
Tom Lane
331695a4d9 Fix thinko in new ECPG "PREPARE AS" code.
ecpg_register_prepared_stmt() is pretty obviously checking the wrong
variable while trying to detect malloc failure.  Error in commit
a1dc6ab46, spotted by Coverity.
2019-05-26 10:06:37 -04:00
Amit Kapila
9679345f3c Fix typos.
Reported-by: Alexander Lakhin
Author: Alexander Lakhin
Reviewed-by: Amit Kapila and Tom Lane
Discussion: https://postgr.es/m/7208de98-add8-8537-91c0-f8b089e2928c@gmail.com
2019-05-26 18:28:18 +05:30
Thomas Munro
4c9210f34c Update copyright year.
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CA%2BhUKGJFWXmtYo6Frd77RR8YXCHz7hJ2mRy5aHV%3D7fJOqDnBHA%40mail.gmail.com
2019-05-24 12:03:32 +12:00
Tom Lane
db6e2b4c52 Initial pgperltidy run for v12.
Make all the perl code look nice, too (for some value of "nice").
2019-05-22 13:36:19 -04:00
Tom Lane
8255c7a5ee Phase 2 pgindent run for v12.
Switch to 2.1 version of pg_bsd_indent.  This formats
multiline function declarations "correctly", that is with
additional lines of parameter declarations indented to match
where the first line's left parenthesis is.

Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
2019-05-22 13:04:48 -04:00
Tom Lane
be76af171c Initial pgindent run for v12.
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.

Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
2019-05-22 12:55:34 -04:00
Tom Lane
4a657ab260 Add .gitignore entries for new ecpg test case.
Oversight in commit a1dc6ab465.
2019-05-22 10:42:24 -04:00
Michael Meskes
a1dc6ab465 Implement PREPARE AS statement for ECPG.
Besides implementing the new statement this change fix some issues with the
parsing of PREPARE and EXECUTE statements. The different forms of these
statements are now all handled in a ujnified way.

Author: Matsumura-san <matsumura.ryo@jp.fujitsu.com>
2019-05-22 04:58:29 +02:00
Peter Eisentraut
3c439a58df Translation updates
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: a20bf6b8a5b4e32450967055eb5b07cee4704edd
2019-05-20 16:00:53 +02:00
Tom Lane
fc9a62af3f Move logging.h and logging.c from src/fe_utils/ to src/common/.
The original placement of this module in src/fe_utils/ is ill-considered,
because several src/common/ modules have dependencies on it, meaning that
libpgcommon and libpgfeutils now have mutual dependencies.  That makes it
pointless to have distinct libraries at all.  The intended design is that
libpgcommon is lower-level than libpgfeutils, so only dependencies from
the latter to the former are acceptable.

We already have the precedent that fe_memutils and a couple of other
modules in src/common/ are frontend-only, so it's not stretching anything
out of whack to treat logging.c as a frontend-only module in src/common/.
To the extent that such modules help provide a common frontend/backend
environment for the rest of common/ to use, it's a reasonable design.
(logging.c does not yet provide an ereport() emulation, but one can
dream.)

Hence, move these files over, and revert basically all of the build-system
changes made by commit cc8d41511.  There are no places that need to grow
new dependencies on libpgcommon, further reinforcing the idea that this
is the right solution.

Discussion: https://postgr.es/m/a912ffff-f6e4-778a-c86a-cf5c47a12933@2ndquadrant.com
2019-05-14 14:20:10 -04:00
Tom Lane
ddf927fb13 Fix misuse of an integer as a bool.
pgtls_read_pending is declared to return bool, but what the underlying
SSL_pending function returns is a count of available bytes.

This is actually somewhat harmless if we're using C99 bools, but in
the back branches it's a live bug: if the available-bytes count happened
to be a multiple of 256, it would get converted to a zero char value.
On machines where char is signed, counts of 128 and up could misbehave
as well.  The net effect is that when using SSL, libpq might block
waiting for data even though some has already been received.

Broken by careless refactoring in commit 4e86f1b16, so back-patch
to 9.5 where that came in.

Per bug #15802 from David Binderman.

Discussion: https://postgr.es/m/15802-f0911a97f0346526@postgresql.org
2019-05-13 10:53:19 -04:00
Stephen Frost
eb882a1b71 GSSAPI: Improve documentation and tests
The GSSAPI encryption patch neglected to update the protocol
documentation to describe how to set up a GSSAPI encrypted connection
from a client to the server, so fix that by adding the appropriate
documentation to protocol.sgml.

The tests added for encryption support were overly long and couldn't be
run in parallel due to race conditions; this was largely because each
test was setting up its own KDC to perform the tests.  Instead, merge
the authentication tests and the encryption tests into the original
test, where we only create one KDC to run the tests with.  Also, have
the tests check what the server's opinion is of the connection and if it
was GSS authenticated or encrypted using the pg_stat_gssapi view.

In passing, fix the libpq label for GSSENC-Mode to be consistent with
the "PGGSSENCMODE" environment variable.

Missing protocol documentation pointed out by Michael Paquier.
Issues with the tests pointed out by Tom Lane and Peter Eisentraut.

Refactored tests and added documentation by me.

Reviewed by Robbie Harwood (protocol documentation) and Michael Paquier
(rework of the tests).
2019-04-19 21:22:22 -04:00
Michael Paquier
148266fa35 Fix collection of typos and grammar mistakes in docs and comments
Author: Justin Pryzby
Discussion: https://postgr.es/m/20190330224333.GQ5815@telsasoft.com
2019-04-19 16:57:40 +09:00
Tom Lane
8cde7f4948 Fix assorted minor bogosity in GSSAPI transport error messages.
I noted that some buildfarm members were complaining about %ld being
used to format values that are (probably) declared size_t.  Use %zu
instead, and insert a cast just in case some versions of the GSSAPI
API declare the length field differently.  While at it, clean up
gratuitous differences in wording of equivalent messages, show
the complained-of length in all relevant messages not just some,
include trailing newline where needed, adjust random deviations
from project-standard code layout and message style, etc.
2019-04-17 17:06:50 -04:00
Michael Meskes
ed16ba3248 Fix off-by-one check that can lead to a memory overflow in ecpg.
Patch by Liu Huailing <liuhuailing@cn.fujitsu.com>
2019-04-11 20:56:17 +02:00
Michael Paquier
249d649996 Add support TCP user timeout in libpq and the backend server
Similarly to the set of parameters for keepalive, a connection parameter
for libpq is added as well as a backend GUC, called tcp_user_timeout.

Increasing the TCP user timeout is useful to allow a connection to
survive extended periods without end-to-end connection, and decreasing
it allows application to fail faster.  By default, the parameter is 0,
which makes the connection use the system default, and follows a logic
close to the keepalive parameters in its handling.  When connecting
through a Unix-socket domain, the parameters have no effect.

Author: Ryohei Nagaura
Reviewed-by: Fabien Coelho, Robert Haas, Kyotaro Horiguchi, Kirk
Jamison, Mikalai Keida, Takayuki Tsunakawa, Andrei Yahorau
Discussion: https://postgr.es/m/EDA4195584F5064680D8130B1CA91C45367328@G01JPEXMBYT04
2019-04-06 15:23:37 +09:00
Stephen Frost
c46c85d459 Handle errors during GSSAPI startup better
There was some confusion over the format of the error message returned
from the server during GSSAPI startup; specifically, it was expected
that a length would be returned when, in reality, at this early stage in
the startup sequence, no length is returned from the server as part of
an error message.

Correct the client-side code for dealing with error messages sent by the
server during startup by simply reading what's available into our
buffer, after we've discovered it's an error message, and then reporting
back what was returned.

In passing, also add in documentation of the environment variable
PGGSSENCMODE which was missed previously, and adjust the code to look
for the PGGSSENCMODE variable (the environment variable change was
missed in the prior GSSMODE -> GSSENCMODE commit).

Error-handling issue discovered by Peter Eisentraut, the rest were items
discovered during testing of the error handling.
2019-04-04 22:52:42 -04:00
Tom Lane
7bac3acab4 Add a "SQLSTATE-only" error verbosity option to libpq and psql.
This is intended for use mostly in test scripts for external tools,
which could do without cross-PG-version variations in error message
wording.  Of course, the SQLSTATE isn't guaranteed stable either, but
it should be more so than the error message text.

Note: there's a bit of an ABI change for libpq here, but it seems
OK because if somebody compiles against a newer version of libpq-fe.h,
and then tries to pass PQERRORS_SQLSTATE to PQsetErrorVerbosity()
of an older libpq library, it will be accepted and then act like
PQERRORS_DEFAULT, thanks to the way the tests in pqBuildErrorMessage3
have historically been phrased.  That seems acceptable.

Didier Gautheron, reviewed by Dagfinn Ilmari Mannsåker

Discussion: https://postgr.es/m/CAJRYxuKyj4zA+JGVrtx8OWAuBfE-_wN4sUMK4H49EuPed=mOBw@mail.gmail.com
2019-04-04 17:22:02 -04:00
Stephen Frost
b0b39f72b9 GSSAPI encryption support
On both the frontend and backend, prepare for GSSAPI encryption
support by moving common code for error handling into a separate file.
Fix a TODO for handling multiple status messages in the process.
Eliminate the OIDs, which have not been needed for some time.

Add frontend and backend encryption support functions.  Keep the
context initiation for authentication-only separate on both the
frontend and backend in order to avoid concerns about changing the
requested flags to include encryption support.

In postmaster, pull GSSAPI authorization checking into a shared
function.  Also share the initiator name between the encryption and
non-encryption codepaths.

For HBA, add "hostgssenc" and "hostnogssenc" entries that behave
similarly to their SSL counterparts.  "hostgssenc" requires either
"gss", "trust", or "reject" for its authentication.

Similarly, add a "gssencmode" parameter to libpq.  Supported values are
"disable", "require", and "prefer".  Notably, negotiation will only be
attempted if credentials can be acquired.  Move credential acquisition
into its own function to support this behavior.

Add a simple pg_stat_gssapi view similar to pg_stat_ssl, for monitoring
if GSSAPI authentication was used, what principal was used, and if
encryption is being used on the connection.

Finally, add documentation for everything new, and update existing
documentation on connection security.

Thanks to Michael Paquier for the Windows fixes.

Author: Robbie Harwood, with changes to the read/write functions by me.
Reviewed in various forms and at different times by: Michael Paquier,
   Andres Freund, David Steele.
Discussion: https://www.postgresql.org/message-id/flat/jlg1tgq1ktm.fsf@thriss.redhat.com
2019-04-03 15:02:33 -04:00
Peter Eisentraut
cc8d415117 Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.

Features:

- Program name is automatically prefixed.

- Message string does not end with newline.  This removes a common
  source of inconsistencies and omissions.

- Additionally, a final newline is automatically stripped, simplifying
  use of PQerrorMessage() etc., another common source of mistakes.

- I converted error message strings to use %m where possible.

- As a result of the above several points, more translatable message
  strings can be shared between different components and between
  frontends and backend, without gratuitous punctuation or whitespace
  differences.

- There is support for setting a "log level".  This is not meant to be
  user-facing, but can be used internally to implement debug or
  verbose modes.

- Lazy argument evaluation, so no significant overhead if logging at
  some level is disabled.

- Some color in the messages, similar to gcc and clang.  Set
  PG_COLOR=auto to try it out.  Some colors are predefined, but can be
  customized by setting PG_COLORS.

- Common files (common/, fe_utils/, etc.) can handle logging much more
  simply by just using one API without worrying too much about the
  context of the calling program, requiring callbacks, or having to
  pass "progname" around everywhere.

- Some programs called setvbuf() to make sure that stderr is
  unbuffered, even on Windows.  But not all programs did that.  This
  is now done centrally.

Soft goals:

- Reduces vertical space use and visual complexity of error reporting
  in the source code.

- Encourages more deliberate classification of messages.  For example,
  in some cases it wasn't clear without analyzing the surrounding code
  whether a message was meant as an error or just an info.

- Concepts and terms are vaguely aligned with popular logging
  frameworks such as log4j and Python logging.

This is all just about printing stuff out.  Nothing affects program
flow (e.g., fatal exits).  The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.

I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded.  One significant change is that
pg_rewind used to write all error messages to stdout.  That is now
changed to stderr.

Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 20:01:35 +02:00
Tom Lane
1f39a1c064 Restructure libpq's handling of send failures.
Originally, if libpq got a failure (e.g., ECONNRESET) while trying to
send data to the server, it would just report that and wash its hands
of the matter.  It was soon found that that wasn't a very pleasant way
of coping with server-initiated disconnections, so we introduced a hack
(pqHandleSendFailure) in the code that sends queries to make it peek
ahead for server error reports before reporting the send failure.

It now emerges that related cases can occur during connection setup;
in particular, as of TLS 1.3 it's unsafe to assume that SSL connection
failures will be reported by SSL_connect rather than during our first
send attempt.  We could have fixed that in a hacky way by applying
pqHandleSendFailure after a startup packet send failure, but
(a) pqHandleSendFailure explicitly disclaims suitability for use in any
state except query startup, and (b) the problem still potentially exists
for other send attempts in libpq.

Instead, let's fix this in a more general fashion by eliminating
pqHandleSendFailure altogether, and instead arranging to postpone
all reports of send failures in libpq until after we've made an
attempt to read and process server messages.  The send failure won't
be reported at all if we find a server message or detect input EOF.

(Note: this removes one of the reasons why libpq typically overwrites,
rather than appending to, conn->errorMessage: pqHandleSendFailure needed
that behavior so that the send failure report would be replaced if we
got a server message or read failure report.  Eventually I'd like to get
rid of that overwrite behavior altogether, but today is not that day.
For the moment, pqSendSome is assuming that its callees will overwrite
not append to conn->errorMessage.)

Possibly this change should get back-patched someday; but it needs
testing first, so let's not consider that till after v12 beta.

Discussion: https://postgr.es/m/CAEepm=2n6Nv+5tFfe8YnkUm1fXgvxR0Mm1FoD+QKG-vLNGLyKg@mail.gmail.com
2019-03-19 16:20:28 -04:00
Michael Meskes
c21d6033f7 Use correct connection name variable in ecpglib.
Fixed-by: Kuroda-san <kuroda.hayato@jp.fujitsu.com>
2019-03-16 04:01:06 +01:00
Michael Meskes
08cecfaf60 Fix potential memory access violation in ecpg if filename of include file is
shorter than 2 characters.

Patch by: "Wu, Fei" <wufei.fnst@cn.fujitsu.com>
2019-03-11 16:11:16 +01:00
Michael Meskes
98bdaab0d9 Fix ecpglib regression that made it impossible to close a cursor that was
opened in a prepared statement.

Patch by: "Kuroda, Hayato" <kuroda.hayato@jp.fujitsu.com>
2019-03-11 16:00:13 +01:00
Peter Eisentraut
aad21d4c3c Fix whitespace 2019-03-01 20:56:53 +01:00
Michael Paquier
87c346a35e Fix SCRAM authentication via SSL when mixing versions of OpenSSL
When using a libpq client linked with OpenSSL 1.0.1 or older to connect
to a backend linked with OpenSSL 1.0.2 or newer, the server would send
SCRAM-SHA-256-PLUS and SCRAM-SHA-256 as valid mechanisms for the SASL
exchange, and the client would choose SCRAM-SHA-256-PLUS even if it does
not support channel binding, leading to a confusing error.  In this
case, what the client ought to do is switch to SCRAM-SHA-256 so as the
authentication can move on and succeed.

So for a SCRAM authentication over SSL, here are all the cases present
and how we deal with them using libpq:
1) Server supports channel binding, it sends SCRAM-SHA-256-PLUS and
SCRAM-SHA-256 as allowed mechanisms.
1-1) Client supports channel binding, chooses SCRAM-SHA-256-PLUS.
1-2) Client does not support channel binding, chooses SCRAM-SHA-256.
2) Server does not support channel binding, sends SCRAM-SHA-256 as
allowed mechanism.
2-1) Client supports channel binding, still it has no choice but to
choose SCRAM-SHA-256.
2-2) Client does not support channel binding, it chooses SCRAM-SHA-256.
In all these scenarios the connection should succeed, and the one which
was handled incorrectly prior this commit is 1-2), causing the
connection attempt to fail because client chose SCRAM-SHA-256-PLUS over
SCRAM-SHA-256.

Reported-by: Hugh Ranalli
Diagnosed-by: Peter Eisentraut
Author: Michael Paquier
Reviewed-by: Peter Eisentraut
Discussion: https://postgr.es/m/CAAhbUMO89SqUk-5mMY+OapgWf-twF2NA5sCucbHEzMfGbvcepA@mail.gmail.com
Backpatch-through: 11
2019-02-28 09:40:28 +09:00
Michael Meskes
42ccbe4351 Free memory in ecpg bytea regression test.
While not really a problem it's easier to run tools like valgrind against it
when fixed.
2019-02-26 11:59:35 +01:00
Michael Meskes
0cc0507940 Hopefully fixing memory handling issues in ecpglib that Coverity found. 2019-02-26 10:56:54 +01:00
Tom Lane
9e138a401d Fix ecpg bugs caused by missing semicolons in the backend grammar.
The Bison documentation clearly states that a semicolon is required
after every grammar rule, and our scripts that generate ecpg's
grammar from the backend's implicitly assumed this is true.  But it
turns out that only ancient versions of Bison actually enforce that.
There have been a couple of rules without trailing semicolons in
gram.y for some time, and as a consequence, ecpg's grammar was faulty
and produced wrong output for the affected statements.

To fix, add the missing semis, and add some cross-checks to ecpg's
scripts so that they'll bleat if we mess this up again.

The cases that were broken were:
* "SET variable = DEFAULT" (but not "SET variable TO DEFAULT"),
  as well as allied syntaxes such as ALTER SYSTEM SET ... DEFAULT.
  These produced syntactically invalid output that the server
  would reject.
* Multiple type names in DROP TYPE/DOMAIN commands.  Only the
  first type name would be listed in the emitted command.

Per report from Daisuke Higuchi.  Back-patch to all supported versions.

Discussion: https://postgr.es/m/1803D792815FC24D871C00D17AE95905DB51CE@g01jpexmbkw24
2019-02-24 12:51:50 -05:00
Peter Eisentraut
1373ba55c9 Add const qualifier
New code introduced in 050710b369.  The
lack of const is not currently a compiler warning, but it's nice to
have for consistency with surrounding code.
2019-02-22 09:01:19 +01:00
Michael Paquier
554ca6954e Remove duplicate variable declaration in fe-connect.c
The same variables are declared twice when checking if a connection is
writable, which is useless.

Author: Haribabu Kommi
Discussion: https://postgr.es/m/CAJrrPGf=rcALB54w_Tg1_hx3y+cgSWaERY-uYSQzGc3Zt5XN4g@mail.gmail.com
2019-02-22 13:16:47 +09:00
Tom Lane
315dcffb94 Fix omissions in ecpg/test/sql/.gitignore.
Oversights in commits 050710b36 and e81f0e311.
2019-02-18 21:24:38 -05:00
Michael Meskes
8e6ab9f801 Properly end string to make sure ecpglib does not read beyond its boundaries. 2019-02-18 12:52:53 +01:00
Michael Meskes
e81f0e3113 Sync ECPG's CREATE TABLE AS statement with backend's.
Author: Higuchi-san ("Higuchi, Daisuke" <higuchi.daisuke@jp.fujitsu.com>)
2019-02-18 11:57:34 +01:00
Michael Meskes
050710b369 Add bytea datatype to ECPG.
So far ECPG programs had to treat binary data for bytea column as 'char' type.
But this meant converting from/to escaped format with PQunescapeBytea/
PQescapeBytea() and therefore forcing users to add unnecessary code and cost
for the conversion in runtime. By adding a dedicated datatype for bytea most of
this special handling is no longer needed.

Author: Matsumura-san ("Matsumura, Ryo" <matsumura.ryo@jp.fujitsu.com>)

Discussion: https://postgr.es/m/flat/03040DFF97E6E54E88D3BFEE5F5480F737A141F9@G01JPEXMBYT04
2019-02-18 10:20:31 +01:00
Michael Meskes
bd7c95f0c1 Add DECLARE STATEMENT support to ECPG.
DECLARE STATEMENT is a statement that lets users declare an identifier
pointing at a connection.  This identifier will be used in other embedded
dynamic SQL statement such as PREPARE, EXECUTE, DECLARE CURSOR and so on.
When connecting to a non-default connection, the AT clause can be used in
a DECLARE STATEMENT once and is no longer needed in every dynamic SQL
statement.  This makes ECPG applications easier and more efficient.  Moreover,
writing code without designating connection explicitly improves portability.

Authors: Ideriha-san ("Ideriha, Takeshi" <ideriha.takeshi@jp.fujitsu.com>)
         Kuroda-san ("Kuroda, Hayato" <kuroda.hayato@jp.fujitsu.com>)

Discussion: https://postgr.es/m4E72940DA2BF16479384A86D54D0988A565669DF@G01JPEXMBKW04
2019-02-16 11:05:54 +01:00
Peter Eisentraut
37d9916020 More unconstify use
Replace casts whose only purpose is to cast away const with the
unconstify() macro.

Discussion: https://www.postgresql.org/message-id/flat/53a28052-f9f3-1808-fed9-460fd43035ab%402ndquadrant.com
2019-02-13 11:50:16 +01:00
Michael Meskes
7ea38f045d Change error handling of out of scope variables in ecpg.
The function called can result in an out of memory error that subsequently was
disregarded. Instead it should set the appropriate SQL error variables and be
checked by whatever whenever statement is defined.
2019-01-30 14:35:52 +01:00
Michael Meskes
e2f731cdba Make some ecpg test cases more robust against unexpected errors that happen
during development. Test cases themselves should not hang or segfault.
2019-01-30 10:39:32 +01:00
Michael Meskes
5c04630ad0 Make sure that ecpglib's statement variable has a defined value no matter what. 2019-01-30 10:39:32 +01:00
Tom Lane
e3565fd61c Remove _configthreadlocale() calls in ecpg test suite.
This essentially reverts commits a772624b1 and 04fbe0e45, which
added "_configthreadlocale(_ENABLE_PER_THREAD_LOCALE)" calls to the
thread-related ecpg test programs.  That was nothing but a hack,
because we shouldn't expect that ecpg-using applications have
done that for us; and now that we've inserted such calls into
ecpglib, the tests should still pass without it.

(If they don't, it would be good to know that.)

HEAD only; there seems no big need to change this in the
back branches.

Discussion: https://postgr.es/m/22937.1548307384@sss.pgh.pa.us
2019-01-24 17:02:09 -05:00
Tom Lane
d5a1fde397 Remove infinite-loop hazards in ecpg test suite.
A report from Andrew Dunstan showed that an ecpglib breakage that
causes repeated query failures could lead to infinite loops in some
ecpg test scripts, because they contain "while(1)" loops with no
exit condition other than successful test completion.  That might
be all right for manual testing, but it seems entirely unacceptable
for automated test environments such as our buildfarm.  We don't
want buildfarm owners to have to intervene manually when a test
goes wrong.

To fix, just change all those while(1) loops to exit after at most
100 iterations (which is more than any of them expect to iterate).
This seems sufficient since we'd see discrepancies in the test output
if any loop executed the wrong number of times.

I tested this by dint of intentionally breaking ecpg_do_prologue
to always fail, and verifying that the tests still got to completion.

Back-patch to all supported branches, since the whole point of this
exercise is to protect the buildfarm against future mistakes.

Discussion: https://postgr.es/m/18693.1548302004@sss.pgh.pa.us
2019-01-24 16:47:06 -05:00
Tom Lane
2cf91ccb73 Blind attempt to fix _configthreadlocale() failures on MinGW.
Apparently, some builds of MinGW contain a version of
_configthreadlocale() that always returns -1, indicating failure.
Rather than treating that as a curl-up-and-die condition, soldier on
as though the function didn't exist.  This leaves us without thread
safety on such MinGW versions, but we didn't have it anyway.

Discussion: https://postgr.es/m/d06a16bc-52d6-9f0d-2379-21242d7dbe81@2ndQuadrant.com
2019-01-23 22:46:45 -05:00
Tom Lane
ee27584c4a Second try at fixing ecpglib thread-safety problem.
While Windows (allegedly) has _configthreadlocale() pretty far back,
it seems MinGW didn't acquire support for that till more recently.
Fortunately, we can use an autoconf probe on that toolchain,
instead of guessing whether it's there.  (Hm, I wonder whether Cygwin
will need this also.)

Per buildfarm.

Discussion: https://postgr.es/m/20190121193512.tdmcnic2yjxlufaw@alap3.anarazel.de
2019-01-21 16:17:10 -05:00
Tom Lane
8eb4a9312c Avoid thread-safety problem in ecpglib.
ecpglib attempts to force the LC_NUMERIC locale to "C" while reading
server output, to avoid problems with strtod() and related functions.
Historically it's just issued setlocale() calls to do that, but that
has major problems if we're in a threaded application.  setlocale()
itself is not required by POSIX to be thread-safe (and indeed is not,
on recent OpenBSD).  Moreover, its effects are process-wide, so that
we could cause unexpected results in other threads, or another thread
could change our setting.

On platforms having uselocale(), which is required by POSIX:2008,
we can avoid these problems by using uselocale() instead.  Windows
goes its own way as usual, but we can make it safe by using
_configthreadlocale().  Platforms having neither continue to use the
old code, but that should be pretty much nobody among current systems.

This should get back-patched, but let's see what the buildfarm
thinks of it first.

Michael Meskes and Tom Lane; thanks also to Takayuki Tsunakawa.

Discussion: https://postgr.es/m/31420.1547783697@sss.pgh.pa.us
2019-01-21 12:07:02 -05:00
Tomas Vondra
31f3817402 Allow COPY FROM to filter data using WHERE conditions
Extends the COPY FROM command with a WHERE condition, which allows doing
various types of filtering while importing the data (random sampling,
condition on a data column, etc.).  Until now such filtering required
either preprocessing of the input data, or importing all data and then
filtering in the database. COPY FROM ... WHERE is an easy-to-use and
low-overhead alternative for most simple cases.

Author: Surafel Temesgen
Reviewed-by: Tomas Vondra, Masahiko Sawada, Lim Myungkyu
Discussion: https://www.postgresql.org/message-id/flat/CALAY4q_DdpWDuB5-Zyi-oTtO2uSk8pmy+dupiRe3AvAc++1imA@mail.gmail.com
2019-01-20 00:22:14 +01:00
Magnus Hagander
0301db623d Replace @postgresql.org with @lists.postgresql.org for mailinglists
Commit c0d0e54084 replaced the ones in the documentation, but missed out
on the ones in the code. Replace those as well, but unlike c0d0e54084,
don't backpatch the code changes to avoid breaking translations.
2019-01-19 19:06:35 +01:00
Tom Lane
f6cddbd4d7 Add .gitignore entry for a derived file created by "make distprep".
I chanced to notice that "make distprep" leaves a state of the
tree that git complains about.  It's been like this for awhile,
but given the lack of complaints it probably doesn't need
back-patching.
2019-01-10 14:20:28 -05:00
Tom Lane
c64d0cd5ce Use perfect hashing, instead of binary search, for keyword lookup.
We've been speculating for a long time that hash-based keyword lookup
ought to be faster than binary search, but up to now we hadn't found
a suitable tool for generating the hash function.  Joerg Sonnenberger
provided the inspiration, and sample code, to show us that rolling our
own generator wasn't a ridiculous idea.  Hence, do that.

The method used here requires a lookup table of approximately 4 bytes
per keyword, but that's less than what we saved in the predecessor commit
afb0d0712, so it's not a big problem.  The time savings is indeed
significant: preliminary testing suggests that the total time for raw
parsing (flex + bison phases) drops by ~20%.

Patch by me, but it owes its existence to Joerg Sonnenberger;
thanks also to John Naylor for review.

Discussion: https://postgr.es/m/20190103163340.GA15803@britannica.bec.de
2019-01-09 19:47:46 -05:00
Tom Lane
afb0d0712f Replace the data structure used for keyword lookup.
Previously, ScanKeywordLookup was passed an array of string pointers.
This had some performance deficiencies: the strings themselves might
be scattered all over the place depending on the compiler (and some
quick checking shows that at least with gcc-on-Linux, they indeed
weren't reliably close together).  That led to very cache-unfriendly
behavior as the binary search touched strings in many different pages.
Also, depending on the platform, the string pointers might need to
be adjusted at program start, so that they couldn't be simple constant
data.  And the ScanKeyword struct had been designed with an eye to
32-bit machines originally; on 64-bit it requires 16 bytes per
keyword, making it even more cache-unfriendly.

Redesign so that the keyword strings themselves are allocated
consecutively (as part of one big char-string constant), thereby
eliminating the touch-lots-of-unrelated-pages syndrome.  And get
rid of the ScanKeyword array in favor of three separate arrays:
uint16 offsets into the keyword array, uint16 token codes, and
uint8 keyword categories.  That reduces the overhead per keyword
to 5 bytes instead of 16 (even less in programs that only need
one of the token codes and categories); moreover, the binary search
only touches the offsets array, further reducing its cache footprint.
This also lets us put the token codes somewhere else than the
keyword strings are, which avoids some unpleasant build dependencies.

While we're at it, wrap the data used by ScanKeywordLookup into
a struct that can be treated as an opaque type by most callers.
That doesn't change things much right now, but it will make it
less painful to switch to a hash-based lookup method, as is being
discussed in the mailing list thread.

Most of the change here is associated with adding a generator
script that can build the new data structure from the same
list-of-PG_KEYWORD header representation we used before.
The PG_KEYWORD lists that plpgsql and ecpg used to embed in
their scanner .c files have to be moved into headers, and the
Makefiles have to be taught to invoke the generator script.
This work is also necessary if we're to consider hash-based lookup,
since the generator script is what would be responsible for
constructing a hash table.

Aside from saving a few kilobytes in each program that includes
the keyword table, this seems to speed up raw parsing (flex+bison)
by a few percent.  So it's worth doing even as it stands, though
we think we can gain even more with a follow-on patch to switch
to hash-based lookup.

John Naylor, with further hacking by me

Discussion: https://postgr.es/m/CAJVSVGXdFVU2sgym89XPL=Lv1zOS5=EHHQ8XWNzFL=mTXkKMLw@mail.gmail.com
2019-01-06 17:02:57 -05:00