postgresql

Commit Graph

Author	SHA1	Message	Date
Bruce Momjian	927d61eeff	Run pgindent on 9.2 source tree in preparation for first 9.3 commit-fest.	2012-06-10 15:20:04 -04:00
Heikki Linnakangas	d2495f272c	Fix bug in to_tsquery(). We were using memcpy() to copy to a possibly overlapping memory region, which is a no-no. Use memmove() instead.	2012-05-15 19:27:34 +03:00
Peter Eisentraut	e9605a039b	Even more duplicate word removal, in the spirit of the season	2012-05-02 20:56:03 +03:00
Robert Haas	85efd5f065	Reduce hash size for compute_array_stats, compute_tsvector_stats. The size is only a hint, but a big hint chews up a lot of memory without apparently improving performance much. Analysis and patch by Noah Misch.	2012-04-23 22:05:41 -04:00
Tom Lane	0e5e167aae	Collect and use element-frequency statistics for arrays. This patch improves selectivity estimation for the array <@, &&, and @> (containment and overlaps) operators. It enables collection of statistics about individual array element values by ANALYZE, and introduces operator-specific estimators that use these stats. In addition, ScalarArrayOpExpr constructs of the forms "const = ANY/ALL (array_column)" and "const <> ANY/ALL (array_column)" are estimated by treating them as variants of the containment operators. Since we still collect scalar-style stats about the array values as a whole, the pg_stats view is expanded to show both these stats and the array-style stats in separate columns. This creates an incompatible change in how stats for tsvector columns are displayed in pg_stats: the stats about lexemes are now displayed in the array-related columns instead of the original scalar-related columns. There are a few loose ends here, notably that it'd be nice to be able to suppress either the scalar-style stats or the array-element stats for columns for which they're not useful. But the patch is in good enough shape to commit for wider testing. Alexander Korotkov, reviewed by Noah Misch and Nathan Boley	2012-03-03 20:20:57 -05:00
Bruce Momjian	e126958c2e	Update copyright notices for year 2012.	2012-01-01 18:01:58 -05:00
Peter Eisentraut	1b81c2fe6e	Remove many -Wcast-qual warnings. This addresses only those cases that are easy to fix by adding or moving a const qualifier or removing an unnecessary cast. There are many more complicated cases remaining.	2011-09-11 21:54:32 +03:00
Bruce Momjian	6416a82a62	Remove unnecessary #include references, per pgrminclude script.	2011-09-01 10:04:27 -04:00
Bruce Momjian	6560407c7d	Pgindent run before 9.1 beta2.	2011-06-09 14:32:50 -04:00
Tom Lane	6755558b92	Improve aset.c's space management in contexts with small maxBlockSize. The previous coding would allow requests up to half of maxBlockSize to be treated as "chunks", but when that actually did happen, we'd waste nearly half of the space in the malloc block containing the chunk, if no smaller requests came along to fill it. Avoid this scenario by limiting the maximum size of a chunk to 1/8th maxBlockSize, so that we can waste no more than 1/8th of the allocated space. This will not change the behavior at all for the default context size parameters (with large maxBlockSize), but it will change the behavior when using ALLOCSET_SMALL_MAXSIZE. In particular, there's no longer a need for spell.c to be overly concerned about the request size parameters it uses, so remove a rather unhelpful comment about that. Merlin Moncure, per an idea of Tom Lane's	2011-05-02 12:08:08 -04:00
Tom Lane	2ab0796d7a	Fix char2wchar/wchar2char to support collations properly. These functions should take a pg_locale_t, not a collation OID, and should call mbstowcs_l/wcstombs_l where available. Where those functions are not available, temporarily select the correct locale with uselocale(). This change removes the bogus assumption that all locales selectable in a given database have the same wide-character conversion method; in particular, the collate.linux.utf8 regression test now passes with LC_CTYPE=C, so long as the database encoding is UTF8. I decided to move the char2wchar/wchar2char functions out of mbutils.c and into pg_locale.c, because they work on wchar_t not pg_wchar_t and thus don't really belong with the mbutils.c functions. Keeping them where they were would have required importing pg_locale_t into pg_wchar.h somehow, which did not seem like a good plan.	2011-04-23 12:35:41 -04:00
Tom Lane	d64713df7e	Pass collations to functions in FunctionCallInfoData, not FmgrInfo. Since collation is effectively an argument, not a property of the function, FmgrInfo is really the wrong place for it; and this becomes critical in cases where a cached FmgrInfo is used for varying purposes that might need different collation settings. Fix by passing it in FunctionCallInfoData instead. In particular this allows a clean fix for bug #5970 (record_cmp not working). This requires touching a bit more code than the original method, but nobody ever thought that collations would not be an invasive patch...	2011-04-12 19:19:24 -04:00
Tom Lane	1e16a8107d	Teach regular expression operators to honor collations. This involves getting the character classification and case-folding functions in the regex library to use the collations infrastructure. Most of this work had been done already in connection with the upper/lower and LIKE logic, so it was a simple matter of transposition. While at it, split out these functions into a separate source file regc_pg_locale.c, so that they can be correctly labeled with the Postgres project's license rather than the Scriptics license. These functions are 100% Postgres-written code whereas what remains in regc_locale.c is still mostly not ours, so lumping them both under the same copyright notice was getting more and more misleading.	2011-04-10 18:03:09 -04:00
Bruce Momjian	bf50caf105	pgindent run before PG 9.1 beta 1.	2011-04-10 11:42:00 -04:00
Tom Lane	52b60530f2	Fix tsmatchsel() to account properly for null rows. ts_typanalyze.c computes MCE statistics as fractions of the non-null rows, which seems fairly reasonable, and anyway changing it in released versions wouldn't be a good idea. But then ts_selfuncs.c has to account for that. Failure to do so results in overestimates in columns with a significant fraction of null documents. Back-patch to 8.4 where this stuff was introduced. Jesper Krogh	2011-02-17 19:00:49 -05:00
Bruce Momjian	135724ec35	Fix "variable not used" warnings when USE_WIDE_UPPER_LOWER is not defined.	2011-02-10 16:58:02 -05:00
Peter Eisentraut	414c5a2ea6	Per-column collation support. This adds collation support for columns and domains, a COLLATE clause to override it per expression, and B-tree index support. Peter Eisentraut, reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch	2011-02-08 23:04:18 +02:00
Bruce Momjian	97116ca417	Rename macro DECIMAL to DECIMAL_T to help pgindent; this is already done for a few other macros in that file, for other reasons. I also remove pgindent/README mention of the file.	2011-02-06 10:48:17 -05:00
Bruce Momjian	5d950e3b0c	Stamp copyrights for year 2011.	2011-01-01 13:18:15 -05:00
Peter Eisentraut	fc946c39ae	Remove useless whitespace at end of lines	2010-11-23 22:34:55 +02:00
Robert Haas	5aa446c961	Cleanup various comparisons with the constant "true". Itagaki Takahiro, with slight modifications.	2010-11-14 21:03:48 -05:00
Tom Lane	3e5f9412d0	Reduce the memory requirement for large ispell dictionaries. This patch eliminates per-chunk palloc overhead for most small allocations needed in the representation of an ispell dictionary. This saves close to a factor of 2 on the current Czech ispell data. While it doesn't cover every last small allocation in the ispell code, we are at the point of diminishing returns, because about 95% of the allocations are covered already. Pavel Stehule, rather heavily revised by Tom	2010-10-06 19:31:05 -04:00
Tom Lane	9b910def24	Clean up temporary-memory management during ispell dictionary loading. Add explicit initialization and cleanup functions to spell.c, and keep all working state in the already-existing ISpellDict struct. This lets us get rid of a static variable along with some extremely shaky assumptions about usage of child memory contexts. This commit is just code beautification and has no impact on functionality or performance, but it opens the way to a less-grotty implementation of Pavel's memory-saving hack, which will follow shortly.	2010-10-06 15:15:15 -04:00
Magnus Hagander	9f2e211386	Remove cvs keywords from all files.	2010-09-20 22:08:53 +02:00
Peter Eisentraut	3f11971916	Remove extra newlines at end and beginning of files, add missing newlines at end of files.	2010-08-19 05:57:36 +00:00
Robert Haas	fd1843ff89	Standardize get_whatever_oid functions for other object types. - Rename TSParserGetPrsid to get_ts_parser_oid. - Rename TSDictionaryGetDictid to get_ts_dict_oid. - Rename TSTemplateGetTmplid to get_ts_template_oid. - Rename TSConfigGetCfgid to get_ts_config_oid. - Rename FindConversionByName to get_conversion_oid. - Rename GetConstraintName to get_constraint_oid. - Add new functions get_opclass_oid, get_opfamily_oid, get_rewrite_oid, get_rewrite_oid_without_relid, get_trigger_oid, and get_cast_oid. The name of each function matches the corresponding catalog. Thanks to KaiGai Kohei for the review.	2010-08-05 15:25:36 +00:00
Tom Lane	97532f7c29	Add some knowledge about prefix matches to tsmatchsel(). It's not terribly bright, but it beats assuming that a prefix match behaves identically to an exact match, which is what the code was doing before :-(. Noted while experimenting with Artur Dobrowski's example.	2010-08-01 21:31:08 +00:00
Tom Lane	b8c798ebc5	Tweak tsmatchsel() so that it examines the structure of the tsquery whenever possible (ie, whenever the tsquery is a constant), even when no statistics are available for the tsvector. For example, foo @@ 'a & b'::tsquery can be expected to be more selective than foo @@ 'a'::tsquery, whether or not we know anything about foo. We use DEFAULT_TS_MATCH_SEL as the assumed selectivity of individual query terms when no stats are available, then combine the terms according to the query's AND/OR structure as usual. Per experimentation with Artur Dabrowski's example. (The fact that there are no stats available in that example is a problem in itself, but nonetheless tsmatchsel should be smarter about the case.) Back-patch to 8.4 to keep all versions of tsmatchsel() in sync.	2010-07-31 03:27:40 +00:00
Bruce Momjian	239d769e7e	pgindent run for 9.0, second run	2010-07-06 19:19:02 +00:00
Tom Lane	bc0f080928	Fix misuse of Lossy Counting (LC) algorithm in compute_tsvector_stats(). We must filter out hashtable entries with frequencies less than those specified by the algorithm, else we risk emitting junk entries whose actual frequency is much less than other lexemes that did not get tabulated. This is bad enough by itself, but even worse is that tsquerysel() believes that the minimum frequency seen in pg_statistic is a hard upper bound for lexemes not included, and was thus underestimating the frequency of non-MCEs. Also, set the threshold frequency to something with a little bit of theory behind it, to wit assume that the input distribution is approximately Zipfian. This might need adjustment in future, but some preliminary experiments suggest that it's not too unreasonable. Back-patch to 8.4, where this code was introduced. Jan Urbanski, with some editorialization by Tom	2010-05-30 21:59:02 +00:00
Tom Lane	ed437e2b27	Adjust comments about avoiding use of printf's %.s. My initial impression that glibc was measuring the precision in characters (which is what the Linux man page says it does) was incorrect. It does take the precision to be in bytes, but it also tries to truncate the string at a character boundary. The bottom line remains the same: it will mess up if the string is not in the encoding it expects, so we need to avoid %.s anytime there's a significant risk of that. Previous code changes are still good, but adjust the comments to reflect this knowledge. Per research by Hernan Gonzalez.	2010-05-09 02:16:00 +00:00
Tom Lane	54cd4f0457	Work around a subtle portability problem in use of printf %s format. Depending on which spec you read, field widths and precisions in %s may be counted either in bytes or characters. Our code was assuming bytes, which is wrong at least for glibc's implementation, and in any case libc might have a different idea of the prevailing encoding than we do. Hence, for portable results we must avoid using anything more complex than just "%s" unless the string to be printed is known to be all-ASCII. This patch fixes the cases I could find, including the psql formatting failure reported by Hernan Gonzalez. In HEAD only, I also added comments to some places where it appears safe to continue using "%.*s".	2010-05-08 16:39:53 +00:00
Tom Lane	2c265adea3	Modify the built-in text search parser to handle URLs more nearly according to RFC 3986. In particular, these characters now terminate the path part of a URL: '"', '<', '>', '\', '^', '`', '{', '|', '}'. The previous behavior was inconsistent and depended on whether a "?" was present in the path. Per gripe from Donald Fraser and spec research by Kevin Grittner. This is a pre-existing bug, but not back-patching since the risks of breaking existing applications seem to outweigh the benefits.	2010-04-28 02:04:16 +00:00
Tom Lane	8f0ab2298f	Add missing newlines in WPARSER_TRACE output.	2010-04-26 17:10:18 +00:00
Bruce Momjian	89b0095ebd	Allow underscores in tsearch email addresses, per RFC 5322 and report by Dan O'Hara. Patch by Teodor Sigaev	2010-03-13 00:41:58 +00:00
Bruce Momjian	65e806cba1	pgindent run for 9.0	2010-02-26 02:01:40 +00:00
Tom Lane	40608e7f94	When estimating the selectivity of an inequality "column > constant" or "column < constant", and the comparison value is in the first or last histogram bin or outside the histogram entirely, try to fetch the actual column min or max value using an index scan (if there is an index on the column). If successful, replace the lower or upper histogram bound with that value before carrying on with the estimate. This limits the estimation error caused by moving min/max values when the comparison value is close to the min or max. Per a complaint from Josh Berkus. It is tempting to consider using this mechanism for mergejoinscansel as well, but that would inject index fetches into main-line join estimation not just endpoint cases. I'm refraining from that until we can get a better handle on the costs of doing this type of lookup.	2010-01-04 02:44:40 +00:00
Bruce Momjian	0239800893	Update copyright for the year 2010.	2010-01-02 16:58:17 +00:00
Tom Lane	21d11e7ee2	Avoid unnecessary copying of source string when generating a cloned TParser. For long source strings the copying results in O(N^2) behavior, and the multiplier can be significant if wide-char conversion is involved. Andres Freund, reviewed by Kevin Grittner.	2009-12-15 20:37:17 +00:00
Tom Lane	908854209b	Avoid core dump on empty thesaurus dictionary. Per report from Robert Gravsjö.	2009-11-30 16:38:31 +00:00
Peter Eisentraut	66363e8d6d	Make text search parser accept underscores in XML attributes (bug #5075)	2009-11-15 13:57:01 +00:00
Peter Eisentraut	f1c5247563	Simplify a few makefile rules since install-sh can now install multiple files in one run.	2009-10-26 21:33:01 +00:00
Tom Lane	dd6de24e69	Remove duplicate variable initializations identified by clang static checker. One of these represents a nontrivial bug (a promptly-leaked palloc), so backpatch. Greg Stark	2009-08-30 16:53:31 +00:00
Peter Eisentraut	9d182ef002	Update of install-sh, mkinstalldirs, and associated configury. Update install-sh to that from Autoconf 2.63, plus our Darwin-specific changes (which I simplified a bit). install-sh is now able to install multiple files in one run, so we could simplify our makefiles sometime. install-sh also now has a -d option to create directories, so we don't need mkinstalldirs anymore. Use AC_PROG_MKDIR_P in configure.in, so we can use mkdir -p when available instead of install-sh -d. For consistency with the rest of the world, the corresponding make variable has been renamed from $(mkinstalldirs) to $(MKDIR_P).	2009-08-26 22:24:44 +00:00
Teodor Sigaev	a88a48011c	Introduce filtering dictionary support to tsearch. Propagate --nolocale option to CREATE DATABASE command in pg_regress to allow correct checking of locale-sensitive contrib modules.	2009-08-18 10:30:41 +00:00
Teodor Sigaev	abd8c94ff9	Add prefix support for synonym dictionary	2009-08-14 14:53:20 +00:00
Peter Eisentraut	de160e2c00	Make backend header files C++ safe. This alters various incidental uses of C++ key words to use other similar identifiers, so that a C++ compiler won't choke outright. You still (probably) need extern "C" { }; around the inclusion of backend headers. Based on a patch by Kurt Harriman <harriman@acm.org>. Also add a script cpluspluscheck to check for C++ compatibility in the future. As of right now, this passes without error for me.	2009-07-16 06:33:46 +00:00
Bruce Momjian	d747140279	8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list provided by Andrew.	2009-06-11 14:49:15 +00:00
Tom Lane	a734979e0a	Fix tsquerysel() to not fail on an empty TSQuery. Per report from Tatsuo Ishii.	2009-06-03 18:42:13 +00:00
Teodor Sigaev	e43bb5beb7	Some languages have symbols with zero display width and/or vowels/signs that are not alphabetic characters but are not word-breakers either. So, treat them as part of a word. Per off-list discussion with Dibyendra Hyoju <dibyendra@gmail.com> and Bal Krishna Bal <balkrishna7bal@gmail.com> about Nepali language and Devanagari alphabet.	2009-03-11 16:03:40 +00:00
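
The to_tsquery() fix listed above (d2495f272c) swaps memcpy() for memmove() because the source and destination regions may overlap. As a rough illustration of that class of bug, here is a minimal sketch; `delete_range` is a hypothetical helper written for this note, not PostgreSQL code, and it only assumes the standard C library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from the PostgreSQL tree): remove `count`
 * bytes starting at offset `pos` from a buffer that holds `len` bytes,
 * closing the gap in place. Returns the new logical length.
 */
static size_t
delete_range(char *buf, size_t len, size_t pos, size_t count)
{
	assert(buf != NULL);

	if (pos >= len)
		return len;				/* nothing to delete */
	if (count > len - pos)
		count = len - pos;		/* clamp to the end of the buffer */

	/*
	 * The tail being shifted down can overlap the bytes it is written to
	 * (whenever the tail is longer than `count`). memcpy() has undefined
	 * behavior for overlapping regions; memmove() is specified to handle
	 * them, so it is the safe choice here.
	 */
	memmove(buf + pos, buf + pos + count, len - pos - count);

	return len - count;
}
```

Calling `delete_range(buf, 8, 2, 2)` on a buffer holding the eight bytes `abcdefgh` shifts `efgh` down by two positions, so the first six bytes become `abefgh`. The source and destination ranges share two bytes in that call, which is the overlapping case memcpy() does not permit.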

118 Commits