Commit Graph

265 Commits

Author SHA1 Message Date
Tom Lane e3860ffa4d Initial pgindent run with pg_bsd_indent version 2.0.
The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:

* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
  sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
  well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
  with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
  than the expected column 33.

On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list.  This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.

There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses.  I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 14:39:04 -04:00
Tom Lane a69dfe5f40 Don't downcase entries within shared_preload_libraries et al.
load_libraries(), which processes the various xxx_preload_libraries GUCs,
was parsing them using SplitIdentifierString() which isn't really
appropriate for values that could be path names: it downcases unquoted
text, and it doesn't allow embedded whitespace unless quoted.
Use SplitDirectoriesString() instead.  That also allows us to simplify
load_libraries() a bit, since canonicalize_path() is now done for it.

While this definitely seems like a bug fix, it has the potential to
break configuration settings that accidentally worked before because
of the downcasing behavior.  Also, there's an easy workaround for the
bug, namely to double-quote troublesome text.  Hence, no back-patch.

QL Zhuo, tweaked a bit by me

Discussion: https://postgr.es/m/CAB-oJtxHVDc3H+Km3CjB9mY1VDzuyaVH_ZYSz7iXcRqCtb93Ew@mail.gmail.com
2017-06-20 13:03:29 -04:00
Peter Eisentraut 41839b7abc Fix ICU collation use on Windows
Windows uses a separate code path for libc locales.  The code previously
ended up there also if an ICU collation should be used, leading to a
crash.

Reported-by: Ashutosh Sharma <ashu.coek88@gmail.com>
2017-06-16 10:08:54 -04:00
Tom Lane 9ae2661fe1 Tighten checks for whitespace in functions that parse identifiers etc.
This patch replaces isspace() calls with scanner_isspace() in functions
that are likely to be presented with non-ASCII input.  isspace() has
the small advantage that it will correctly recognize no-break space
in single-byte encodings (such as LATIN1); but it cannot work successfully
for any multibyte character, and depending on platform it might return
false positive results for some fragments of multibyte characters.  That's
disastrous for functions that are trying to discard whitespace between
valid strings, as noted in bug #14662 from Justin Muise.  Even treating
no-break space as whitespace is pretty questionable for the usages touched
here, because the core scanner would think it is an identifier character.

Affected functions are parse_ident(), parseNameAndArgTypes (underlying
regprocedurein() and siblings), SplitIdentifierString (used for parsing
GUCs and options that are qualified names or lists of names), and
SplitDirectoriesString (used for parsing GUCs that are lists of
directories).

All the functions adjusted here are parsing SQL identifiers and similar
constructs, so it's reasonable to insist that their definition of
whitespace match the core scanner.  So we can hope that this won't cause
many backwards-compatibility problems.  I've left alone isspace() calls
in places that aren't really expecting any non-ASCII input characters,
such as float8in().

Back-patch to all supported branches.

Discussion: https://postgr.es/m/10129.1495302480@sss.pgh.pa.us
2017-05-24 15:28:34 -04:00
Bruce Momjian a6fd7b7a5f Post-PG 10 beta1 pgindent run
perltidy run not included.
2017-05-17 16:31:56 -04:00
Tom Lane 23c6eb0336 Remove create_singleton_array(), hard-coding the case in its sole caller.
create_singleton_array() was not really as useful as we perhaps thought
when we added it.  It had never accreted more than one call site, and is
only saving a dozen lines of code at that one, which is considerably less
bulk than the function itself.  Moreover, because of its insistence on
using the caller's fn_extra cache space, it's arguably a coding hazard.
text_to_array_internal() does not currently use fn_extra in any other way,
but if it did it would be subtly broken, since the conflicting fn_extra
uses could be needed within a single query, in the seldom-tested case that
the field separator varies during the query.  The same objection seems
likely to apply to any other potential caller.

The replacement code is a bit uglier, because it hardwires knowledge of
the storage parameters of type TEXT, but it's not like we haven't got
dozens or hundreds of other places that do the same.  Uglier seems like
a good tradeoff for smaller, faster, and safer.

Per discussion with Neha Khatri.

Discussion: https://postgr.es/m/CAFO0U+_fS5SRhzq6uPG+4fbERhoA9N2+nPrtvaC9mmeWivxbsA@mail.gmail.com
2017-05-02 20:41:37 -04:00
Peter Eisentraut 066e3a68ae Fix locale pointer use in WIN32 code path
Author: David Rowley <david.rowley@2ndquadrant.com>
2017-03-25 00:38:12 -04:00
Peter Eisentraut eccfef81e1 ICU support
Add a column collprovider to pg_collation that determines which library
provides the collation data.  The existing choices are default and libc,
and this adds an icu choice, which uses the ICU4C library.

The pg_locale_t type is changed to a union that contains the
provider-specific locale handles.  Users of locale information are
changed to look into that struct for the appropriate handle to use.

Also add a collversion column that records the version of the collation
when it is created, and check at run time whether it is still the same.
This detects potentially incompatible library upgrades that can corrupt
indexes and other structures.  This is currently only supported by
ICU-provided collations.

initdb initializes the default collation set as before from the `locale
-a` output but also adds all available ICU locales with a "-x-icu"
appended.

Currently, ICU-provided collations can only be explicitly named
collations.  The global database locales are still always libc-provided.

ICU support is enabled by configure --with-icu.

Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
2017-03-23 15:28:48 -04:00
Noah Misch 3a0d473192 Use wrappers of PG_DETOAST_DATUM_PACKED() more.
This makes almost all core code follow the policy introduced in the
previous commit.  Specific decisions:

- Text search support functions with char* and length arguments, such as
  prsstart and lexize, may receive unaligned strings.  I doubt
  maintainers of non-core text search code will notice.

- Use plain VARDATA() on values detoasted or synthesized earlier in the
  same function.  Use VARDATA_ANY() on varlenas sourced outside the
  function, even if they happen to always have four-byte headers.  As an
  exception, retain the universal practice of using VARDATA() on return
  values of SendFunctionCall().

- Retain PG_GETARG_BYTEA_P() in pageinspect.  (Page images are too large
  for a one-byte header, so this misses no optimization.)  Sites that do
  not call get_page_from_raw() typically need the four-byte alignment.

- For now, do not change btree_gist.  Its use of four-byte headers in
  memory is partly entangled with storage of 4-byte headers inside
  GBT_VARKEY, on disk.

- For now, do not change gtrgm_consistent() or gtrgm_distance().  They
  incorporate the varlena header into a cache, and there are multiple
  credible implementation strategies to consider.
2017-03-12 19:35:34 -04:00
Fujii Masao e14ec7d346 Fix typo in comment.
neha khatri
2017-02-22 03:39:45 +09:00
Peter Eisentraut f21a563d25 Move some things from builtins.h to new header files
This avoids that builtins.h has to include additional header files.
2017-01-20 20:29:53 -05:00
Alvaro Herrera 9a34123bc3 Make messages mentioning type names more uniform
This avoids additional translatable strings for each distinct type, as
well as making our quoting style around type names more consistent
(namely, that we don't quote type names).  This continues what started
as f402b99501.

Discussion: https://postgr.es/m/20160401170642.GA57509@alvherre.pgsql
2017-01-18 16:08:20 -03:00
Bruce Momjian 1d25779284 Update copyright via script for 2017 2017-01-03 13:48:53 -05:00
Tom Lane da6c4f6ca8 Refer to OS X as "macOS", except for the port name which is still "darwin".
We weren't terribly consistent about whether to call Apple's OS "OS X"
or "Mac OS X", and the former is probably confusing to people who aren't
Apple users.  Now that Apple has rebranded it "macOS", follow their lead
to establish a consistent naming pattern.  Also, avoid the use of the
ancient project name "Darwin", except as the port code name which does not
seem desirable to change.  (In short, this patch touches documentation and
comments, but no actual code.)

I didn't touch contrib/start-scripts/osx/, either.  I suspect those are
obsolete and due for a rewrite, anyway.

I dithered about whether to apply this edit to old release notes, but
those were responsible for quite a lot of the inconsistencies, so I ended
up changing them too.  Anyway, Apple's being ahistorical about this,
so why shouldn't we be?
2016-09-25 15:40:57 -04:00
Heikki Linnakangas ec136d19b2 Move code shared between libpq and backend from backend/libpq/ to common/.
When building libpq, ip.c and md5.c were symlinked or copied from
src/backend/libpq into src/interfaces/libpq, but now that we have a
directory specifically for routines that are shared between the server and
client binaries, src/common/, move them there.

Some routines in ip.c were only used in the backend. Keep those in
src/backend/libpq, but rename to ifaddr.c to avoid confusion with the file
that's now in common.

Fix the comment in src/common/Makefile to reflect how libpq actually links
those files.

There are two more files that libpq symlinks directly from src/backend:
encnames.c and wchar.c. I don't feel compelled to move those right now,
though.

Patch by Michael Paquier, with some changes by me.

Discussion: <69938195-9c76-8523-0af8-eb718ea5b36e@iki.fi>
2016-09-02 13:49:59 +03:00
Peter Eisentraut 8a56d4e361 Make format() error messages consistent again
07d25a964 changed only one occurrence.  Change the other one as well.
2016-08-08 08:15:41 -04:00
Robert Haas 4bc424b968 pgindent run for 9.6 2016-06-09 18:02:36 -04:00
Robert Haas 3df9c374e2 Disable abbreviated keys for string-sorting in non-C locales.
Unfortunately, every version of glibc thus far tested has bugs whereby
strcoll() ordering does not match strxfrm() ordering as required by
the standard.  This can result in, for example, corrupted indexes.
Disabling abbreviated keys in these cases slows down non-C-collation
string sorting considerably, but there seems to be no practical
alternative.  Users who are confident that their libc implementations
are solid in this regard can re-enable the optimization by compiling
with TRUST_STRXFRM.

Users who have built indexes using PostgreSQL 9.5 or PostgreSQL 9.5.1
should REINDEX if there is a possibility that they may have been
affected by this problem.

Report by Marc-Olaf Jaschke.  Investigation mostly by Tom Lane, with
help from Peter Geoghegan, Noah Misch, Stephen Frost, and me.  Patch
by me, reviewed by Peter Geoghegan and Tom Lane.
2016-03-23 16:03:13 -04:00
Teodor Sigaev 07d25a964b Improve error reporting in format()
Clarify invalid format conversion type error message and add hint.

Author: Jim Nasby
2016-02-11 18:11:11 +03:00
Tom Lane 0231f83856 Re-pgindent varlena.c.
Just to make sure previous commit worked ...
2016-02-08 15:17:40 -05:00
Tom Lane 58e797216f Rename typedef "string" to "VarString".
Since pgindent treats typedef names as global, the original coding of
b47b4dbf68 would have had rather nasty effects on the formatting
of other files in which "string" is used as a variable or field name.
Use a less generic name for this typedef, and rename some other
identifiers to match.

Peter Geoghegan, per gripe from me
2016-02-08 15:15:56 -05:00
Robert Haas 63f39b9148 Fix small goof in comment.
Peter Geoghegan
2016-02-05 08:04:48 -05:00
Robert Haas b47b4dbf68 Extend sortsupport for text to more opclasses.
Have varlena.c expose an interface that allows the char(n), bytea, and
bpchar types to piggyback on a now-generalized SortSupport for text.
This pushes a little more knowledge of the bpchar/char(n) type into
varlena.c than might be preferred, but that seems like the approach
that creates least friction.  Also speed things up for index builds
that use text_pattern_ops or varchar_pattern_ops.

This patch does quite a bit of renaming, but it seems likely to be
worth it, so as to avoid future confusion about the fact that this code
is now more generally used than the old names might have suggested.

Peter Geoghegan, reviewed by Álvaro Herrera and Andreas Karlsson,
with small tweaks by me.
2016-02-03 14:29:53 -05:00
Bruce Momjian ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Peter Eisentraut 30c0c4bf12 Remove unnecessary escaping in C character literals
'\"' is more commonly written simply as '"'.
2015-12-22 22:43:46 -05:00
Robert Haas 0279f62fdc Correct tiny inaccuracy in strxfrm cache comment.
Peter Geoghegan
2015-11-03 08:32:22 -05:00
Robert Haas 5be94a9eb1 Be a bit more rigorous about how we cache strcoll and strxfrm results.
Commit 0e57b4d8bd contained some clever
logic that attempted to make sure that we couldn't get confused about
whether the last thing we cached was a strcoll() result or a strxfrm()
result, but it wasn't quite clever enough, because we can perform
further abbreviations after having already performed some comparisons.
Introduce an explicit flag in the hopes of making this watertight.

Peter Geoghegan, reviewed by me.
2015-10-20 09:27:50 -04:00
Robert Haas d53f808e7e Remove obsolete comment.
Peter Geoghegan
2015-10-20 09:15:13 -04:00
Robert Haas 0e57b4d8bd Speed up text sorts where the same strings occur multiple times.
Cache strxfrm() blobs across calls made to the text SortSupport
abbreviation routine.  This can speed up sorting if the same string
needs to be abbreviated many times in a row.

Also, cache the result of the previous strcoll() comparison, so that
if we're asked to compare the same strings agin, we do need to call
strcoll() again.

Perhaps surprisingly, these optimizations don't seem to hurt even when
they don't help.  memcmp() is really cheap compared to strcoll() or
strxfrm().

Peter Geoghegan, reviewed by me.
2015-10-09 19:03:44 -04:00
Robert Haas bfb54ff15a Make abbreviated key comparisons for text a bit cheaper.
If we do some byte-swapping while abbreviating, we can do comparisons
using integer arithmetic rather than memcmp.

Peter Geoghegan, reviewed and slightly revised by me.
2015-10-09 15:06:06 -04:00
Robert Haas b48ecf862b In bttext_abbrev_convert, move pfree to the right place.
Without this, we might access memory that's already been freed, or
leak memory if in the C locale.

Peter Geoghegan
2015-06-29 23:53:05 -04:00
Bruce Momjian 807b9e0dff pgindent run for 9.5 2015-05-23 21:35:49 -04:00
Heikki Linnakangas 4fc72cc7bb Collection of typo fixes.
Use "a" and "an" correctly, mostly in comments. Two error messages were
also fixed (they were just elogs, so no translation work required). Two
function comments in pg_proc.h were also fixed. Etsuro Fujita reported one
of these, but I found a lot more with grep.

Also fix a few other typos spotted while grepping for the a/an typos.
For example, "consists out of ..." -> "consists of ...". Plus a "though"/
"through" mixup reported by Euler Taveira.

Many of these typos were in old code, which would be nice to backpatch to
make future backpatching easier. But much of the code was new, and I didn't
feel like crafting separate patches for each branch. So no backpatching.
2015-05-20 16:56:22 +03:00
Robert Haas aea652abd3 Make trace_sort control abbreviation debug output for the text opclass.
This is consistent with what the new numeric suppor for abbreviated keys
now does, and seems much more convenient than having a separate compiler
define to control this debug output.

Peter Geoghegan
2015-04-07 22:45:17 -04:00
Robert Haas f85155e18c Change the way we decide whether to give up on abbreviated text keys.
Be more aggressive about aborting early on if it looks like it's not
helping, but be less aggressive about aborting later on, since it's
more expensive at that point, and also since we're currently aborting
in some cases where abbreviation can still deliver a substantial win.

Peter Geoghegan. Extensive testing by Tomas Vondra.
2015-04-03 08:32:05 -04:00
Robert Haas c02ef232c1 Add missing calls to DatumGetUInt32.
These were inadvertently ommitted from the commit that introduced
abbreviated keys, commit 4ea51cdfe8.

Peter Geoghegan
2015-04-02 11:57:35 -04:00
Robert Haas 168a809d4b Re-enable abbreviated keys on Windows.
Commit 1be4eb1b2d disabled this, but I
think the real problem here was fixed by commit
b181a91981 and commit
d060e07fa9.  So let's try re-enabling
it now and see what happens.
2015-01-26 14:28:14 -05:00
Tom Lane 586dd5d6a5 Replace a bunch more uses of strncpy() with safer coding.
strncpy() has a well-deserved reputation for being unsafe, so make an
effort to get rid of nearly all occurrences in HEAD.

A large fraction of the remaining uses were passing length less than or
equal to the known strlen() of the source, in which case no null-padding
can occur and the behavior is equivalent to memcpy(), though doubtless
slower and certainly harder to reason about.  So just use memcpy() in
these cases.

In other cases, use either StrNCpy() or strlcpy() as appropriate (depending
on whether padding to the full length of the destination buffer seems
useful).

I left a few strncpy() calls alone in the src/timezone/ code, to keep it
in sync with upstream (the IANA tzcode distribution).  There are also a
few such calls in ecpg that could possibly do with more analysis.

AFAICT, none of these changes are more than cosmetic, except for the four
occurrences in fe-secure-openssl.c, which are in fact buggy: an overlength
source leads to a non-null-terminated destination buffer and ensuing
misbehavior.  These don't seem like security issues, first because no stack
clobber is possible and second because if your values of sslcert etc are
coming from untrusted sources then you've got problems way worse than this.
Still, it's undesirable to have unpredictable behavior for overlength
inputs, so back-patch those four changes to all active branches.
2015-01-24 13:05:42 -05:00
Robert Haas d1747571b6 Fix typos, update README.
Peter Geoghegan
2015-01-23 15:06:53 -05:00
Robert Haas d060e07fa9 Repair brain fade in commit b181a91981.
The split between which things need to happen in the C-locale case and
which needed to happen in the locale-aware case was a few bricks short
of a load.  Try to fix that.
2015-01-22 12:51:20 -05:00
Robert Haas b181a91981 More fixes for abbreviated keys infrastructure.
First, when LC_COLLATE = C, bttext_abbrev_convert should use memcpy()
rather than strxfrm() to construct the abbreviated key, because the
authoritative comparator uses memcpy().  If we do anything else here,
we might get inconsistent answers, and the buildfarm says this risk
is not theoretical.  It should be faster this way, too.

Second, while I'm looking at bttext_abbrev_convert, convert a needless
use of goto into the loop it's trying to implement into an actual
loop.

Both of the above problems date to the original commit of abbreviated
keys, commit 4ea51cdfe8.

Third, fix a bogus assignment to tss->locale before tss is set up.
That's a new goof in commit b529b65d1b.
2015-01-22 11:58:58 -05:00
Robert Haas b529b65d1b Heavily refactor btsortsupport_worker.
Prior to commit 4ea51cdfe8, this function
only had one job, which was to decide whether we could avoid trampolining
through the fmgr layer when performing sort comparisons.  As of that
commit, it has a second job, which is to decide whether we can use
abbreviated keys.  Unfortunately, those two tasks are somewhat intertwined
in the existing coding, which is likely why neither Peter Geoghegan nor
I noticed prior to commit that this calls pg_newlocale_from_collation() in
cases where it didn't previously.  The buildfarm noticed, though.

To fix, rewrite the logic so that the decision as to which comparator to
use is more cleanly separated from the decision about abbreviation.
2015-01-22 10:54:16 -05:00
Robert Haas 1be4eb1b2d Disable abbreviated keys on Windows.
Most of the Windows buildfarm members (bowerbird, hamerkop, currawong,
jacana, brolga) are unhappy with yesterday's abbreviated keys patch,
although there are some (narwhal, frogmouth) that seem OK with it.
Since there's no obvious pattern to explain why some are working and
others are failing, just disable this across-the-board on Windows for
now.  This is a bit unfortunate since the optimization will be a big
win in some cases, but we can't leave the buildfarm broken.
2015-01-20 20:32:21 -05:00
Robert Haas 4ea51cdfe8 Use abbreviated keys for faster sorting of text datums.
This commit extends the SortSupport infrastructure to allow operator
classes the option to provide abbreviated representations of Datums;
in the case of text, we abbreviate by taking the first few characters
of the strxfrm() blob.  If the abbreviated comparison is insufficent
to resolve the comparison, we fall back on the normal comparator.
This can be much faster than the old way of doing sorting if the
first few bytes of the string are usually sufficient to resolve the
comparison.

There is the potential for a performance regression if all of the
strings to be sorted are identical for the first 8+ characters and
differ only in later positions; therefore, the SortSupport machinery
now provides an infrastructure to abort the use of abbreviation if
it appears that abbreviation is producing comparatively few distinct
keys.  HyperLogLog, a streaming cardinality estimator, is included in
this commit and used to make that determination for text.

Peter Geoghegan, reviewed by me.
2015-01-19 15:28:27 -05:00
Bruce Momjian 4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Robert Haas c0828b78e9 Move the guts of our Levenshtein implementation into core.
The hope is that we can use this to produce better diagnostics in
some cases.

Peter Geoghegan, reviewed by Michael Paquier, with some further
changes by me.
2014-11-13 12:33:26 -05:00
Robert Haas e246b3d6ea Add a fast pre-check for equality of equal-length strings.
Testing reveals that that doing a memcmp() before the strcoll() costs
practically nothing, at least on the systems we tested, and it speeds
up sorts containing many equal strings significatly.

Peter Geoghegan.  Review by myself and Heikki Linnakangas.  Comments
rewritten by me.
2014-09-19 12:39:00 -04:00
Robert Haas 9522ec3e70 Fix typo in b34e37bfef.
Spotted by Peter Geoghegan.
2014-08-26 15:58:50 -04:00
Robert Haas b34e37bfef Add sortsupport routines for text.
This provides a small but worthwhile speedup when sorting text, at least
in cases to which the sortsupport machinery applies.

Robert Haas and Peter Geoghegan
2014-08-14 12:09:52 -04:00
Bruce Momjian 0a78320057 pgindent run for 9.4
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not effect backpatching.
2014-05-06 12:12:18 -04:00