of different parsers having different YYSTYPE unions that they want to use
with it. I defined a new union core_YYSTYPE that is just the (very short)
list of semantic values returned by the core scanner. I had originally
worried that this would require an extra interface layer, but actually we can
have parser.c's base_yylex (formerly filtered_base_yylex) take care of that at
no extra cost. Names associated with the core scanner are now "core_yy_foo",
with "base_yy_foo" being used in the core Bison parser and the parser.c
interface layer.
This solves the last serious stumbling block to eliminating plpgsql's separate
lexer. One restriction that will still be present is that plpgsql and the
core will have to agree on the token numbers assigned to tokens that can be
returned by the core lexer. Since Bison doesn't seem willing to accept
external assignments of those numbers, we'll have to live with decreeing that
core and plpgsql grammars declare these tokens first and in the same order.
it works just as well to have them be ordinary identifiers, and this gets rid
of a number of ugly special cases. Plus we aren't interfering with non-rule
usage of these names.
catversion bump because the names change internally in stored rules.
Changes:
Pass in the keyword lookup array instead of having it be hardwired.
(This incidentally allows elimination of some duplicate coding in ecpg.)
Re-order the token declarations in gram.y so that non-keyword tokens have
numbers that won't change when keywords are added or removed.
Add ".." and ":=" to the set of tokens recognized by scan.l. (Since these
combinations are nowhere legal in core SQL, this does not change anything
except the precise wording of the error you get when you write this.)
of features added to flex and bison since this code was originally written.
This change doesn't in itself offer any new capability, but it's needed
infrastructure for planned improvements in plpgsql.
Another feature now available in flex is the ability to make it use palloc
instead of malloc, so do that to avoid possible memory leaks. (We should
at some point change the other lexers likewise, but this commit doesn't
touch them.)
distinction between the external API (parser.h) and declarations that only
need to be visible within the raw parser code (gramparse.h, which now is only
included by parser.c, gram.y, scan.l, and keywords.c). This is in preparation
for the upcoming change to a reentrant lexer, which will require referencing
YYSTYPE in the declarations of base_yylex and filtered_base_yylex, hence
gram.h will have to be included by gramparse.h. We don't want any more files
than absolutely necessary to depend on gram.h, so some cleanup is called for.
Instead put in a test to drop a NULL default at the last moment before
storing the catalog entry. This changes the behavior in a couple of ways:
* Specifying DEFAULT NULL when creating an inheritance child table will
successfully suppress inheritance of any default expression from the
parent's column, where formerly it failed to do so.
* Specifying DEFAULT NULL for a column of a domain type will correctly
override any default belonging to the domain; likewise for a sub-domain.
The latter change happens because by the time the clause is checked,
it won't be a simple null Const but a CoerceToDomain expression.
Personally I think this should be back-patched, but there doesn't seem to
be consensus for that on pgsql-hackers, so refraining.
JOIN, which I removed in a recent fit of over-optimism that we wouldn't
have any future use for it. Now it's needed to support disambiguating
WITH CHECK OPTION from WITH TIME ZONE. As proof of concept, add stub
grammar productions for WITH CHECK OPTION.
parser will allow "\'" to be used to represent a literal quote mark. The
"\'" representation has been deprecated for some time in favor of the
SQL-standard representation "''" (two single quote marks), but it has been
used often enough that just disallowing it immediately won't do. Hence
backslash_quote allows the settings "on", "off", and "safe_encoding",
the last meaning to allow "\'" only if client_encoding is a valid server
encoding. That is now the default, and the reason is that in encodings
such as SJIS that allow 0x5c (ASCII backslash) to be the last byte of a
multibyte character, accepting "\'" allows SQL-injection attacks as per
CVE-2006-2314 (further details will be published after release). The
"on" setting is available for backward compatibility, but it must not be
used with clients that are exposed to untrusted input.
Thanks to Akio Ishida and Yasuo Ohgaki for identifying this security issue.
throw warnings for 100%-SQL-standard constructs, clean up some minor
infelicities, try to un-break ecpg to the best of my ability. (It's not clear
how ecpg is going to find out the setting of standard_conforming_strings,
though.) I think pg_dump still needs work, too.
during parse analysis, not only errors detected in the flex/bison stages.
This is per my earlier proposal. This commit includes all the basic
infrastructure, but locations are only tracked and reported for errors
involving column references, function calls, and operators. More could
be done later but this seems like a good set to start with. I've also
moved the ReportSyntaxErrorPosition logic out of psql and into libpq,
which should make it available to more people --- even within psql this
is an improvement because warnings weren't handled by ReportSyntaxErrorPosition.
not likely ever to be implemented seeing it's been removed from SQL2003.
This allows getting rid of the 'filter' version of yylex() that we had in
parser.c, which should save at least a few microseconds in parsing.
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
that the types of untyped string-literal constants are deduced (ie,
when coerce_type is applied to 'em, that's what the type must be).
Remove the ancient hack of storing the input Param-types array as a
global variable, and put the info into ParseState instead. This touches
a lot of files because of adjustment of routine parameter lists, but
it's really not a large patch. Note: PREPARE statement still insists on
exact specification of parameter types, but that could easily be relaxed
now, if we wanted to do so.
I had inadvertently omitted it while rearranging things to support
length-counted incoming messages. Also, change the parser's API back
to accepting a 'char *' query string instead of 'StringInfo', as the
latter wasn't buying us anything except overhead. (I think when I put
it in I had some notion of making the parser API 8-bit-clean, but
seeing that flex depends on null-terminated input, that's not really
ever gonna happen.)
handled as special productions. This is needed to keep us honest about
user-schema type names that happen to coincide with system type names.
Per pghackers discussion 24-Apr. To avoid bloating the keyword list
too much, I removed the translations for datetime, timespan, and lztext,
all of which were slated for destruction several versions back anyway.
Use flex flags -CF. Pass the to-be-scanned string around as StringInfo
type, to avoid querying the length repeatedly. Clean up some code and
remove lex-compatibility cruft. Escape backslash sequences inline. Use
flex-provided yy_scan_buffer() function to set up input, rather than using
myinput().