Commit Graph

135 Commits

Author SHA1 Message Date
Tom Lane 1279414bc6 Fix misbehavior in contrib/pg_trgm with an unsatisfiable regex.
If the regex compiler can see that a regex is unsatisfiable
(for example, '$foo') then it may emit an NFA having no arcs.
pg_trgm's packGraph function did the wrong thing in this case;
it would access off the end of a work array, and with bad luck
could produce a corrupted output data structure causing more
problems later.  This could end with wrong answers or crashes
in queries using a pg_trgm GIN or GiST index with such a regex.

Fix by not trying to de-duplicate if there aren't at least 2 arcs.

Per bug #17830 from Alexander Lakhin.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/17830-57ff5f89bdb02b09@postgresql.org
2023-03-11 12:15:41 -05:00
Andrew Gierth 7f69ed4aeb pg_trgm: fix crash in 2-item picksplit
Whether from size overflow in gistSplit or from secondary splits,
picksplit is (rarely) called with exactly two items to split.

Formerly, due to special-case handling of the last item, this would
lead to access to an uninitialized cache entry; prior to PG 13 this
might have been harmless or at worst led to an incorrect union datum,
but in 13 onwards it can cause a backend crash from using an
uninitialized pointer.

Repair by removing the special case, which was deemed not to have been
appropriate anyway. Backpatch all the way, because this bug has
existed since pg_trgm was added.

Per report on IRC from user "ftzdomino". Analysis and testing by me,
patch from Alexander Korotkov.

Discussion: https://postgr.es/m/87k0usfdxg.fsf@news-spur.riddles.org.uk
2020-11-12 14:56:58 +00:00
Tom Lane 3ba9670847 Make contrib modules' installation scripts more secure.
Hostile objects located within the installation-time search_path could
capture references in an extension's installation or upgrade script.
If the extension is being installed with superuser privileges, this
opens the door to privilege escalation.  While such hazards have existed
all along, their urgency increases with the v13 "trusted extensions"
feature, because that lets a non-superuser control the installation path
for a superuser-privileged script.  Therefore, make a number of changes
to make such situations more secure:

* Tweak the construction of the installation-time search_path to ensure
that references to objects in pg_catalog can't be subverted; and
explicitly add pg_temp to the end of the path to prevent attacks using
temporary objects.

* Disable check_function_bodies within installation/upgrade scripts,
so that any security gaps in SQL-language or PL-language function bodies
cannot create a risk of unwanted installation-time code execution.

* Adjust lookup of type input/receive functions and join estimator
functions to complain if there are multiple candidate functions.  This
prevents capture of references to functions whose signature is not the
first one checked; and it's arguably more user-friendly anyway.

* Modify various contrib upgrade scripts to ensure that catalog
modification queries are executed with secure search paths.  (These
are in-place modifications with no extension version changes, since
it is the update process itself that is at issue, not the end result.)

Extensions that depend on other extensions cannot be made fully secure
by these methods alone; therefore, revert the "trusted" marking that
commit eb67623c9 applied to earthdistance and hstore_plperl, pending
some better solution to that set of issues.

Also add documentation around these issues, to help extension authors
write secure installation scripts.

Patch by me, following an observation by Andres Freund; thanks
to Noah Misch for review.

Security: CVE-2020-14350
2020-08-10 10:44:42 -04:00
Tom Lane c08da32f13 Get rid of trailing semicolons in C macro definitions.
Writing a trailing semicolon in a macro is almost never the right thing,
because you almost always want to write a semicolon after each macro
call instead.  (Even if there was some reason to prefer not to, pgindent
would probably make a hash of code formatted that way; so within PG the
rule should basically be "don't do it".)  Thus, if we have a semi inside
the macro, the compiler sees "something;;".  Much of the time the extra
empty statement is harmless, but it could lead to mysterious syntax
errors at call sites.  In perhaps an overabundance of neatnik-ism, let's
run around and get rid of the excess semicolons whereever possible.

The only thing worse than a mysterious syntax error is a mysterious
syntax error that only happens in the back branches; therefore,
backpatch these changes where relevant, which is most of them because
most of these mistakes are old.  (The lack of reported problems shows
that this is largely a hypothetical issue, but still, it could bite
us in some future patch.)

John Naylor and Tom Lane

Discussion: https://postgr.es/m/CACPNZCs0qWTqJ2QUSGJ07B7uvAvzMb-KbG2q+oo+J3tsWN5cqw@mail.gmail.com
2020-05-01 17:28:00 -04:00
Michael Paquier c74d49d41c Fix many typos and inconsistencies
Author: Alexander Lakhin
Discussion: https://postgr.es/m/af27d1b3-a128-9d62-46e0-88f424397f44@gmail.com
2019-07-01 10:00:23 +09:00
Michael Paquier 3412030205 Fix more typos and inconsistencies in the tree
Author: Alexander Lakhin
Discussion: https://postgr.es/m/0a5419ea-1452-a4e6-72ff-545b1a5a8076@gmail.com
2019-06-17 16:13:16 +09:00
Alexander Korotkov b6987a885b Fix operator naming in pg_trgm GUC option descriptions
Descriptions of pg_trgm GUC options have % replaced with %% like it was
a printf-like format.  But that's not needed since they are just plain strings.
This commit fixed that.  Backpatch to last supported version since this error
present from the beginning.

Reported-by: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoAgPKODUsu9gqUFiNqEOAqedStxJ-a0sapsJXWWAVp%3Dxg%40mail.gmail.com
Backpatch-through: 9.4
2019-06-10 20:14:19 +03:00
Peter Eisentraut 41205719d3 Change Graphviz file extension
Change extension for Graphviz files from .dot to .gv.  The latter
appears to be the generally preferred one nowadays.

Discussion: https://www.postgresql.org/message-id/flat/71fe76d2-c7d7-2acc-6762-bbf9e61c566e%402ndquadrant.com
2019-05-26 08:08:05 +02:00
Tom Lane 8255c7a5ee Phase 2 pgindent run for v12.
Switch to 2.1 version of pg_bsd_indent.  This formats
multiline function declarations "correctly", that is with
additional lines of parameter declarations indented to match
where the first line's left parenthesis is.

Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
2019-05-22 13:04:48 -04:00
Tom Lane 02a6a54ecd Make use of compiler builtins and/or assembly for CLZ, CTZ, POPCNT.
Test for the compiler builtins __builtin_clz, __builtin_ctz, and
__builtin_popcount, and make use of these in preference to
handwritten C code if they're available.  Create src/port
infrastructure for "leftmost one", "rightmost one", and "popcount"
so as to centralize these decisions.

On x86_64, __builtin_popcount generally won't make use of the POPCNT
opcode because that's not universally supported yet.  Provide code
that checks CPUID and then calls POPCNT via asm() if available.
This requires indirecting through a function pointer, which is
an annoying amount of overhead for a one-instruction operation,
but it's probably not worth working harder than this for our
current use-cases.

I'm not sure we've found all the existing places that could profit
from this new infrastructure; but we at least touched all the
ones that used copied-and-pasted versions of the bitmapset.c code,
and got rid of multiple copies of the associated constant arrays.

While at it, replace c-compiler.m4's one-per-builtin-function
macros with a single one that can handle all the cases we need
to worry about so far.  Also, because I'm paranoid, make those
checks into AC_LINK checks rather than just AC_COMPILE; the
former coding failed to verify that libgcc has support for the
builtin, in cases where it's not inline code.

David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane

Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
2019-02-15 23:22:33 -05:00
Andrew Gierth 02ddd49932 Change floating-point output format for improved performance.
Previously, floating-point output was done by rounding to a specific
decimal precision; by default, to 6 or 15 decimal digits (losing
information) or as requested using extra_float_digits. Drivers that
wanted exact float values, and applications like pg_dump that must
preserve values exactly, set extra_float_digits=3 (or sometimes 2 for
historical reasons, though this isn't enough for float4).

Unfortunately, decimal rounded output is slow enough to become a
noticable bottleneck when dealing with large result sets or COPY of
large tables when many floating-point values are involved.

Floating-point output can be done much faster when the output is not
rounded to a specific decimal length, but rather is chosen as the
shortest decimal representation that is closer to the original float
value than to any other value representable in the same precision. The
recently published Ryu algorithm by Ulf Adams is both relatively
simple and remarkably fast.

Accordingly, change float4out/float8out to output shortest decimal
representations if extra_float_digits is greater than 0, and make that
the new default. Applications that need rounded output can set
extra_float_digits back to 0 or below, and take the resulting
performance hit.

We make one concession to portability for systems with buggy
floating-point input: we do not output decimal values that fall
exactly halfway between adjacent representable binary values (which
would rely on the reader doing round-to-nearest-even correctly). This
is known to be a problem at least for VS2013 on Windows.

Our version of the Ryu code originates from
https://github.com/ulfjack/ryu/ at commit c9c3fb1979, but with the
following (significant) modifications:

 - Output format is changed to use fixed-point notation for small
   exponents, as printf would, and also to use lowercase 'e', a
   minimum of 2 exponent digits, and a mandatory sign on the exponent,
   to keep the formatting as close as possible to previous output.

 - The output of exact midpoint values is disabled as noted above.

 - The integer fast-path code is changed somewhat (since we have
   fixed-point output and the upstream did not).

 - Our project style has been largely applied to the code with the
   exception of C99 declaration-after-statement, which has been
   retained as an exception to our present policy.

 - Most of upstream's debugging and conditionals are removed, and we
   use our own configure tests to determine things like uint128
   availability.

Changing the float output format obviously affects a number of
regression tests. This patch uses an explicit setting of
extra_float_digits=0 for test output that is not expected to be
exactly reproducible (e.g. due to numerical instability or differing
algorithms for transcendental functions).

Conversions from floats to numeric are unchanged by this patch. These
may appear in index expressions and it is not yet clear whether any
change should be made, so that can be left for another day.

This patch assumes that the only supported floating point format is
now IEEE format, and the documentation is updated to reflect that.

Code by me, adapting the work of Ulf Adams and other contributors.

References:
https://dl.acm.org/citation.cfm?id=3192369

Reviewed-by: Tom Lane, Andres Freund, Donald Dong
Discussion: https://postgr.es/m/87r2el1bx6.fsf@news-spur.riddles.org.uk
2019-02-13 15:20:33 +00:00
Bruce Momjian 97c39498e5 Update copyright for 2019
Backpatch-through: certain files through 9.4
2019-01-02 12:44:25 -05:00
Tom Lane bdf46af748 Post-feature-freeze pgindent run.
Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us
2018-04-26 14:47:16 -04:00
Tom Lane f83bf385c1 Preliminary work for pgindent run.
Update typedefs.list from current buildfarm results.  Adjust pgindent's
typedef blacklist to block some more unfortunate typedef names that have
snuck in since last time.  Manually tweak a few places where I didn't
like the initial results of pgindent'ing.
2018-04-26 14:45:04 -04:00
Teodor Sigaev be8a7a6866 Add strict_word_similarity to pg_trgm module
strict_word_similarity is similar to existing word_similarity function but
it takes into account word boundaries to compute similarity.

Author: Alexander Korotkov
Review by: David Steele, Liudmila Mantrova, me
Discussion: https://www.postgresql.org/message-id/flat/CY4PR17MB13207ED8310F847CF117EED0D85A0@CY4PR17MB1320.namprd17.prod.outlook.com
2018-03-21 14:57:42 +03:00
Teodor Sigaev aea7c17e86 Rework word_similarity documentation, make it close to actual algorithm.
word_similarity before claimed as returning similarity of closest word in
string, but, actually it returns similarity of substring. Also fix mistyped
comments.

Author: Alexander Korotkov
Review by: David Steele, Liudmila Mantrova
Discussionis:
https://www.postgresql.org/message-id/flat/CY4PR17MB13207ED8310F847CF117EED0D85A0@CY4PR17MB1320.namprd17.prod.outlook.com
https://www.postgresql.org/message-id/flat/f43b242d-000c-f4c8-cb8b-d37e9752cd93%40postgrespro.ru
2018-03-21 14:35:56 +03:00
Bruce Momjian 9d4649ca49 Update copyright for 2018
Backpatch-through: certain files through 9.3
2018-01-02 23:30:12 -05:00
Teodor Sigaev c28aa157b8 Make pg_trgm tests independ from standard_conforming_string. Tests uses
regular expression which contains backslash.
2017-12-12 14:59:27 +03:00
Peter Eisentraut 2eb4a831e5 Change TRUE/FALSE to true/false
The lower case spellings are C and C++ standard and are used in most
parts of the PostgreSQL sources.  The upper case spellings are only used
in some files/modules.  So standardize on the standard spellings.

The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so
those are left as is when using those APIs.

In code comments, we use the lower-case spelling for the C concepts and
keep the upper-case spelling for the SQL concepts.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2017-11-08 11:37:28 -05:00
Tom Lane c7b8998ebb Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.

Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code.  The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there.  BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs.  So the
net result is that in about half the cases, such comments are placed
one tab stop left of before.  This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.

Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:19:25 -04:00
Tom Lane e3860ffa4d Initial pgindent run with pg_bsd_indent version 2.0.
The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:

* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
  sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
  well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
  with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
  than the expected column 33.

On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list.  This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.

There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses.  I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 14:39:04 -04:00
Tom Lane 1dffabed49 Further fix pg_trgm's extraction of trigrams from regular expressions.
Commit 9e43e8714 turns out to have been insufficient: not only is it
necessary to track tentative parent links while considering a set of
arc removals, but it's necessary to track tentative flag additions
as well.  This is because we always merge arc target states into
arc source states; therefore, when considering a merge of the final
state with some other, it is the other state that will acquire a new
TSTATE_FIN bit.  If there's another arc for the same color trigram
that would cause merging of that state with the initial state, we
failed to recognize the problem.  The test cases for the prior commit
evidently only exercised situations where a tentative merge with the
initial state occurs before one with the final state.  If it goes the
other way around, we'll happily merge the initial and final states,
either producing a broken final graph that would never match anything,
or triggering the Assert added by the prior commit.

It's tempting to consider switching the merge direction when the merge
involves the final state, but I lack the time to analyze that idea in
detail.  Instead just keep track of the flag changes that would result
from proposed merges, in the same way that the prior commit tracked
proposed parent links.

Along the way, add some more debugging support, because I'm not entirely
confident that this is the last bug here.  And tweak matters so that
the transformed.dot file uses small integers rather than pointer values
to identify states; that makes it more readable if you're just eyeballing
it rather than fooling with Graphviz.  And rename a couple of identically
named struct fields to reduce confusion.

Per report from Corey Csuhta.  Add a test case based on his example.
(Note: this case does not trigger the bug under 9.3, apparently because
its different measurement of costs causes it to stop merging states before
it hits the failure.  I spent some time trying to find a variant that would
fail in 9.3, without success; but I'm sure such cases exist.)

Like the previous patch, back-patch to 9.3 where this code was added.

Report: https://postgr.es/m/E2B01A4B-4530-406B-8D17-2F67CF9A16BA@csuhta.com
2017-04-14 14:52:21 -04:00
Tom Lane 6cfaffc0dd Fix regexport.c to behave sanely with lookaround constraints.
regexport.c thought it could just ignore LACON arcs, but the correct
behavior is to treat them as satisfiable while consuming zero input
(rather reminiscently of commit 9f1e642d5).  Otherwise, the emitted
simplified-NFA representation may contain no paths leading from initial
to final state, which unsurprisingly confuses pg_trgm, as seen in
bug #14623 from Jeff Janes.

Since regexport's output representation has no concept of an arc that
consumes zero input, recurse internally to find the next normal arc(s)
after any LACON transitions.  We'd be forced into changing that
representation if a LACON could be the last arc reaching the final
state, but fortunately the regex library never builds NFAs with such
a configuration, so there always is a next normal arc.

Back-patch to 9.3 where this logic was introduced.

Discussion: https://postgr.es/m/20170413180503.25948.94871@wrigleys.postgresql.org
2017-04-13 17:18:35 -04:00
Noah Misch 3a0d473192 Use wrappers of PG_DETOAST_DATUM_PACKED() more.
This makes almost all core code follow the policy introduced in the
previous commit.  Specific decisions:

- Text search support functions with char* and length arguments, such as
  prsstart and lexize, may receive unaligned strings.  I doubt
  maintainers of non-core text search code will notice.

- Use plain VARDATA() on values detoasted or synthesized earlier in the
  same function.  Use VARDATA_ANY() on varlenas sourced outside the
  function, even if they happen to always have four-byte headers.  As an
  exception, retain the universal practice of using VARDATA() on return
  values of SendFunctionCall().

- Retain PG_GETARG_BYTEA_P() in pageinspect.  (Page images are too large
  for a one-byte header, so this misses no optimization.)  Sites that do
  not call get_page_from_raw() typically need the four-byte alignment.

- For now, do not change btree_gist.  Its use of four-byte headers in
  memory is partly entangled with storage of 4-byte headers inside
  GBT_VARKEY, on disk.

- For now, do not change gtrgm_consistent() or gtrgm_distance().  They
  incorporate the varlena header into a cache, and there are multiple
  credible implementation strategies to consider.
2017-03-12 19:35:34 -04:00
Tom Lane 9e43e8714c Fix contrib/pg_trgm's extraction of trigrams from regular expressions.
The logic for removing excess trigrams from the result was faulty.
It intends to avoid merging the initial and final states of the NFA,
which is necessary, but in testing whether removal of a specific trigram
would cause that, it failed to consider the combined effects of all the
state merges that that trigram's removal would cause.  This could result
in a broken final graph that would never match anything, leading to GIN
or GiST indexscans not finding anything.

To fix, add a "tentParent" field that is used only within this loop,
and set it to show state merges that we are tentatively going to do.
While examining a particular arc, we must chase up through tentParent
links as well as regular parent links (the former can only appear atop
the latter), and we must account for state init/fin flag merges that
haven't actually been done yet.

To simplify the latter, combine the separate init and fin bool fields
into a bitmap flags field.  I also chose to get rid of the "children"
state list, which seems entirely inessential.

Per bug #14563 from Alexey Isayko, which the added test cases are based on.
Back-patch to 9.3 where this code was added.

Report: https://postgr.es/m/20170222111446.1256.67547@wrigleys.postgresql.org
Discussion: https://postgr.es/m/8816.1487787594@sss.pgh.pa.us
2017-02-22 15:04:26 -05:00
Heikki Linnakangas 181bdb90ba Fix typos in comments.
Backpatch to all supported versions, where applicable, to make backpatching
of future fixes go more smoothly.

Josh Soref

Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com
2017-02-06 11:33:58 +02:00
Bruce Momjian 1d25779284 Update copyright via script for 2017 2017-01-03 13:48:53 -05:00
Tom Lane ade49c605f Test all contrib-created operator classes with amvalidate.
I'd supposed that people would do this manually when creating new operator
classes, but the folly of that was exposed today.  The tests seem fast
enough that we can just apply them during the normal regression tests.

contrib/isn fails the checks for lack of complete sets of cross-type
operators.  That's a nice-to-have policy rather than a functional
requirement, so leave it as-is, but insert ORDER BY in the query to
ensure consistent cross-platform output.

Discussion: https://postgr.es/m/7076.1480446837@sss.pgh.pa.us
2016-11-29 15:05:22 -05:00
Tom Lane ea268cdc9a Add macros to make AllocSetContextCreate() calls simpler and safer.
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
had typos in the context-sizing parameters.  While none of these led to
especially significant problems, they did create minor inefficiencies,
and it's now clear that expecting people to copy-and-paste those calls
accurately is not a great idea.  Let's reduce the risk of future errors
by introducing single macros that encapsulate the common use-cases.
Three such macros are enough to cover all but two special-purpose contexts;
those two calls can be left as-is, I think.

While this patch doesn't in itself improve matters for third-party
extensions, it doesn't break anything for them either, and they can
gradually adopt the simplified notation over time.

In passing, change TopMemoryContext to use the default allocation
parameters.  Formerly it could only be extended 8K at a time.  That was
probably reasonable when this code was written; but nowadays we create
many more contexts than we did then, so that it's not unusual to have a
couple hundred K in TopMemoryContext, even without considering various
dubious code that sticks other things there.  There seems no good reason
not to let it use growing blocks like most other contexts.

Back-patch to 9.6, mostly because that's still close enough to HEAD that
it's easy to do so, and keeping the branches in sync can be expected to
avoid some future back-patching pain.  The bugs fixed by these changes
don't seem to be significant enough to justify fixing them further back.

Discussion: <21072.1472321324@sss.pgh.pa.us>
2016-08-27 17:50:38 -04:00
Tom Lane e611515dd6 pg_trgm's set_limit() function is parallel unsafe, not parallel restricted.
Per buildfarm.  Fortunately, it's not quite too late to squeeze this fix
into the pg_trgm 1.3 update.
2016-06-20 11:29:54 -04:00
Tom Lane 9c852566a3 Fix comparison of similarity to threshold in GIST trigram searches.
There was some very strange code here, dating to commit b525bf77, that
purported to work around an ancient gcc bug by forcing a float4 comparison
to be done as int instead.  Commit 5871b8848 broke that when it changed
one side of the comparison to "double" but left the comparison code alone.
Commit f576b17cd doubled down on the weirdness by introducing a "volatile"
marker, which had nothing to do with the actual problem.

Guess that the gcc bug, even if it's still present in the wild, was
triggered by comparison of float4's and can be avoided if we store the
result of cnt_sml() into a double before comparing to the double "nlimit".
This will at least work correctly on non-broken compilers, and it's way
more readable.

Per bug #14202 from Greg Navis.  Add a regression test based on his
example.

Report: <20160620115321.5792.10766@wrigleys.postgresql.org>
2016-06-20 10:49:19 -04:00
Robert Haas 2910fc8239 Update extensions with GIN/GIST support for parallel query.
Commit 749a787c5b bumped the extension
version on all of these extensions already, and we haven't had a
release since then, so we can make further changes without bumping the
extension version again.  Take this opportunity to mark all of the
functions exported by these modules PARALLEL SAFE -- except for
pg_trgm's set_limit().  Mark that one PARALLEL RESTRICTED, because it
makes a persistent change to a GUC value.

Note that some of the markings added by this commit don't have any
effect; for example, gseg_picksplit() isn't likely to be mentioned
explicitly in a query and therefore it's parallel-safety marking will
never be consulted.  But this commit just marks everything for
consistency: if it were somehow used in a query, that would be fine as
far as parallel query is concerned, since it does not consult any
backend-private state, attempt to write data, etc.

Andreas Karlsson, with a few revisions by me.
2016-06-14 13:34:37 -04:00
Robert Haas 4bc424b968 pgindent run for 9.6 2016-06-09 18:02:36 -04:00
Tom Lane 749a787c5b Handle contrib's GIN/GIST support function signature changes honestly.
In commits 9ff60273e3 and dbe2328959 I (tgl) fixed the
signatures of a bunch of contrib's GIN and GIST support functions so that
they would pass validation by the recently-added amvalidate functions.
The backend does not actually consult or check those signatures otherwise,
so I figured this was basically cosmetic and did not require an extension
version bump.  However, Alexander Korotkov pointed out that that would
leave us in a pretty messy situation if we ever wanted to redefine those
functions later, because there wouldn't be a unique way to name them.
Since we're going to be bumping these extensions' versions anyway for
parallel-query cleanups, let's take care of this now.

Andreas Karlsson, adjusted for more search-path-safety by me
2016-06-09 16:44:25 -04:00
Robert Haas 2d8a1e22b1 Various minor corrections of and improvements to comments.
Aleksander Alekseev
2016-03-18 09:38:59 -04:00
Teodor Sigaev aa698d7535 pg_trgm's set_limit() now uses SetConfigOption()
Deprecated set_limit() is modified to use SetConfigOption() to set
similarity_threshold which is actually an instance of
pg_trgm.similarity_threshold GUC variable. Previous coding directly sets
similarity_threshold what could cause an inconsistency between states of
actual variable and GUC representation.

Per gripe from Tom Lane
2016-03-18 12:26:27 +03:00
Teodor Sigaev e4b523e5b5 Add files forgotten in f576b17cd6 2016-03-16 19:23:41 +03:00
Teodor Sigaev f576b17cd6 Add word_similarity to pg_trgm contrib module.
Patch introduces a concept of similarity over string and just a word from
another string.

Version of extension is not changed because 1.2 was already introduced in 9.6
release cycle, so, there wasn't a public version.

Author: Alexander Korotkov, Artur Zakirov
2016-03-16 18:59:21 +03:00
Teodor Sigaev 5871b88487 GUC variable pg_trgm.similarity_threshold insead of set_limit()
Use GUC variable pg_trgm.similarity_threshold insead of
set_limit()/show_limit() which was introduced when defining GUC varuables
by modules was absent.

Author: Artur Zakirov
2016-03-16 17:44:58 +03:00
Tom Lane 9ff60273e3 Fix assorted inconsistencies in GiST opclass support function declarations.
The conventions specified by the GiST SGML documentation were widely
ignored.  For example, the strategy-number argument for "consistent" and
"distance" functions is specified to be a smallint, but most of the
built-in support functions declared it as an integer, and for that matter
the core code passed it using Int32GetDatum not Int16GetDatum.  None of
that makes any real difference at runtime, but it's quite confusing for
newcomers to the code, and it makes it very hard to write an amvalidate()
function that checks support function signatures.  So let's try to instill
some consistency here.

Another similar issue is that the "query" argument is not of a single
well-defined type, but could have different types depending on the strategy
(corresponding to search operators with different righthand-side argument
types).  Some of the functions threw up their hands and declared the query
argument as being of "internal" type, which surely isn't right ("any" would
have been more appropriate); but the majority position seemed to be to
declare it as being of the indexed data type, corresponding to a search
operator with both input types the same.  So I've specified a convention
that that's what to do always.

Also, the result of the "union" support function actually must be of the
index's storage type, but the documentation suggested declaring it to
return "internal", and some of the functions followed that.  Standardize
on telling the truth, instead.

Similarly, standardize on declaring the "same" function's inputs as
being of the storage type, not "internal".

Also, somebody had forgotten to add the "recheck" argument to both
the documentation of the "distance" support function and all of their
SQL declarations, even though the C code was happily using that argument.
Clean that up too.

Fix up some other omissions in the docs too, such as documenting that
union's second input argument is vestigial.

So far as the errors in core function declarations go, we can just fix
pg_proc.h and bump catversion.  Adjusting the erroneous declarations in
contrib modules is more debatable: in principle any change in those
scripts should involve an extension version bump, which is a pain.
However, since these changes are purely cosmetic and make no functional
difference, I think we can get away without doing that.
2016-01-19 12:04:36 -05:00
Bruce Momjian ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Teodor Sigaev 25bfa7efd0 Improve the gin index scan performance in pg_trgm.
Previous coding assumes too pessimistic upper bound of similarity
in consistent method of GIN.

Author: Fornaroli Christophe with comments by me.
2015-12-25 13:05:13 +03:00
Teodor Sigaev 97f3014647 This supports the triconsistent function for pg_trgm GIN opclass
to make it faster to implement indexed queries where some keys are
common and some are rare.

Patch by Jeff Janes
2015-07-20 18:18:48 +03:00
Alvaro Herrera 26df7066cc Move strategy numbers to include/access/stratnum.h
For upcoming BRIN opclasses, it's convenient to have strategy numbers
defined in a single place.  Since there's nothing appropriate, create
it.  The StrategyNumber typedef now lives there, as well as existing
strategy numbers for B-trees (from skey.h) and R-tree-and-friends (from
gist.h).  skey.h is forced to include stratnum.h because of the
StrategyNumber typedef, but gist.h is not; extensions that currently
rely on gist.h for rtree strategy numbers might need to add a new

A few .c files can stop including skey.h and/or gist.h, which is a nice
side benefit.

Per discussion:
https://www.postgresql.org/message-id/20150514232132.GZ2523@alvh.no-ip.org

Authored by Emre Hasegeli and Álvaro.

(It's not clear to me why bootscanner.l has any #include lines at all.)
2015-05-15 17:03:16 -03:00
Heikki Linnakangas 4f700bcd20 Reorganize our CRC source files again.
Now that we use CRC-32C in WAL and the control file, the "traditional" and
"legacy" CRC-32 variants are not used in any frontend programs anymore.
Move the code for those back from src/common to src/backend/utils/hash.

Also move the slicing-by-8 implementation (back) to src/port. This is in
preparation for next patch that will add another implementation that uses
Intel SSE 4.2 instructions to calculate CRC-32C, where available.
2015-04-14 17:03:42 +03:00
Tom Lane 09d8d110a6 Use FLEXIBLE_ARRAY_MEMBER in a bunch more places.
Replace some bogus "x[1]" declarations with "x[FLEXIBLE_ARRAY_MEMBER]".
Aside from being more self-documenting, this should help prevent bogus
warnings from static code analyzers and perhaps compiler misoptimizations.

This patch is just a down payment on eliminating the whole problem, but
it gets rid of a lot of easy-to-fix cases.

Note that the main problem with doing this is that one must no longer rely
on computing sizeof(the containing struct), since the result would be
compiler-dependent.  Instead use offsetof(struct, lastfield).  Autoconf
also warns against spelling that offsetof(struct, lastfield[0]).

Michael Paquier, review and additional fixes by me.
2015-02-20 00:11:42 -05:00
Tom Lane 586dd5d6a5 Replace a bunch more uses of strncpy() with safer coding.
strncpy() has a well-deserved reputation for being unsafe, so make an
effort to get rid of nearly all occurrences in HEAD.

A large fraction of the remaining uses were passing length less than or
equal to the known strlen() of the source, in which case no null-padding
can occur and the behavior is equivalent to memcpy(), though doubtless
slower and certainly harder to reason about.  So just use memcpy() in
these cases.

In other cases, use either StrNCpy() or strlcpy() as appropriate (depending
on whether padding to the full length of the destination buffer seems
useful).

I left a few strncpy() calls alone in the src/timezone/ code, to keep it
in sync with upstream (the IANA tzcode distribution).  There are also a
few such calls in ecpg that could possibly do with more analysis.

AFAICT, none of these changes are more than cosmetic, except for the four
occurrences in fe-secure-openssl.c, which are in fact buggy: an overlength
source leads to a non-null-terminated destination buffer and ensuing
misbehavior.  These don't seem like security issues, first because no stack
clobber is possible and second because if your values of sslcert etc are
coming from untrusted sources then you've got problems way worse than this.
Still, it's undesirable to have unpredictable behavior for overlength
inputs, so back-patch those four changes to all active branches.
2015-01-24 13:05:42 -05:00
Bruce Momjian 4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Tom Lane 4a14f13a0a Improve hash_create's API for selecting simple-binary-key hash functions.
Previously, if you wanted anything besides C-string hash keys, you had to
specify a custom hashing function to hash_create().  Nearly all such
callers were specifying tag_hash or oid_hash; which is tedious, and rather
error-prone, since a caller could easily miss the opportunity to optimize
by using hash_uint32 when appropriate.  Replace this with a design whereby
callers using simple binary-data keys just specify HASH_BLOBS and don't
need to mess with specific support functions.  hash_create() itself will
take care of optimizing when the key size is four bytes.

This nets out saving a few hundred bytes of code space, and offers
a measurable performance improvement in tidbitmap.c (which was not
exploiting the opportunity to use hash_uint32 for its 4-byte keys).
There might be some wins elsewhere too, I didn't analyze closely.

In future we could look into offering a similar optimized hashing function
for 8-byte keys.  Under this design that could be done in a centralized
and machine-independent fashion, whereas getting it right for keys of
platform-dependent sizes would've been notationally painful before.

For the moment, the old way still works fine, so as not to break source
code compatibility for loadable modules.  Eventually we might want to
remove tag_hash and friends from the exported API altogether, since there's
no real need for them to be explicitly referenced from outside dynahash.c.

Teodor Sigaev and Tom Lane
2014-12-18 13:36:36 -05:00
Tom Lane 66c029c842 Fix volatility markings of some contrib I/O functions.
In general, datatype I/O functions are supposed to be immutable or at
worst stable.  Some contrib I/O functions were, through oversight, not
marked with any volatility property at all, which made them VOLATILE.
Since (most of) these functions actually behave immutably, the erroneous
marking isn't terribly harmful; but it can be user-visible in certain
circumstances, as per a recent bug report from Joe Van Dyk in which a
cast to text was disallowed in an expression index definition.

To fix, just adjust the declarations in the extension SQL scripts.  If we
were being very fussy about this, we'd bump the extension version numbers,
but that seems like more trouble (for both developers and users) than the
problem is worth.

A fly in the ointment is that chkpass_in actually is volatile, because
of its use of random() to generate a fresh salt when presented with a
not-yet-encrypted password.  This is bad because of the general assumption
that I/O functions aren't volatile: the consequence is that records or
arrays containing chkpass elements may have input behavior a bit different
from a bare chkpass column.  But there seems no way to fix this without
breaking existing usage patterns for chkpass, and the consequences of the
inconsistency don't seem bad enough to justify that.  So for the moment,
just document it in a comment.

Since we're not bumping version numbers, there seems no harm in
back-patching these fixes; at least future installations will get the
functions marked correctly.
2014-11-05 11:34:11 -05:00