postgresql/doc/src/sgml/ecpg.sgml

<Chapter>
<DocInfo>
<AuthorGroup>
<Author>
<FirstName>Linux</FirstName>
<Surname>Tolke</Surname>
</Author>
<Author>
<FirstName>Michael</FirstName>
<Surname>Meskes</Surname>
</Author>
</AuthorGroup>
<Copyright>
<Year>1996-1997</Year>
<Holder>Linus Tolke</Holder>
</Copyright>
<Copyright>
<Year>1998</Year>
<Holder>Michael Meskes</Holder>
</Copyright>
<Date>Transcribed 1998-02-12</Date>
</DocInfo>

<Title><Application>ecpg</Application> - Embedded <Acronym>SQL</Acronym> in <Acronym>C</Acronym></Title>

<Para>
This describes an embedded <Acronym>SQL</Acronym> in <Acronym>C</Acronym> package for <ProductName>Postgres</ProductName>.

It is written by <ULink url="mailto:linus@epact.se">Linus Tolke</ULink>
and <ULink url="mailto:meskes@debian.org">Michael Meskes</ULink>.

<Note>
<Para>
Permission is granted to copy and use in the same way as you are allowed
to copy and use the rest of the <ProductName>PostgreSQL</ProductName>.
</Para>
</Note>

<Sect1>
<Title>Why Embedded <Acronym>SQL</Acronym>?</Title>

<Para>
Embedded <Acronym>SQL</Acronym> has some small advantages over other ways to handle <Acronym>SQL</Acronym>
queries. It takes care of all the tedious moving of information to and
from variables in your <Acronym>C</Acronym> program. Many <Acronym>RDBMS</Acronym> packages
support this embedded language.

<Para>
There is an ANSI-standard describing how the embedded language should
work. Most embedded <Acronym>SQL</Acronym> preprocessors I have seen and heard of make
extensions so it is difficult to obtain portability between them
anyway. I have not read the standard but I hope that my implementation
does not deviate too much and that it would be possible to port programs
with embedded <Acronym>SQL</Acronym> written for other <Acronym>RDBMS</Acronym> packages 
to <ProductName>Postgres</ProductName> and thus
promoting the spirit of free software.

<Sect1>
<Title>The Concept</Title>

<Para>
You write your program in <Acronym>C</Acronym> with some special <Acronym>SQL</Acronym> things.
For declaring variables that can be used in <Acronym>SQL</Acronym> statements you need to
put them in a special declare section.
You use a special syntax for the <Acronym>SQL</Acronym> queries.

<Para>
Before compiling you run the file through the embedded <Acronym>SQL</Acronym> <Acronym>C</Acronym>
preprocessor and it converts the <Acronym>SQL</Acronym> statements you used to function
calls with the variables used as arguments. Both variables that are used
as input to the <Acronym>SQL</Acronym> statements and variables that will contain the
result are passed.

<Para>
Then you compile and at link time you link with a special library that
contains the functions used. These functions (actually it is mostly one
single function) fetches the information from the arguments, performs
the <Acronym>SQL</Acronym> query using the ordinary interface (<FileName>libpq</FileName>) and puts back
the result in the arguments dedicated for output.

<Para>
Then you run your program and when the control arrives to the <Acronym>SQL</Acronym>
statement the <Acronym>SQL</Acronym> statement is performed against the database and you
can continue with the result.


<Sect1>
<Title>How To Use <Application>egpc</Application></Title>

<Para>
This section describes how to use the <Application>egpc</Application> tool.

<Sect2>
<Title>Preprocessor

<Para>
The preprocessor is called <Application>ecpg</Application>. After installation it resides in
the <ProductName>Postgres</ProductName> <FileName>bin/</FileName> directory. 

<Sect2>
<Title>Library

<Para>
The <Application>ecpg</Application> library is called <FileName>libecpg.a</FileName> or
<FileName>libecpg.so</FileName>. Additionally, the library
uses the <FileName>libpq</FileName> library for communication to the 
<ProductName>Postgres</ProductName> server so you will
have to link your program with <Parameter>-lecpg -lpq</Parameter>.

<Para>
The library has some methods that are "hidden" but that could prove very
useful sometime.

<VariableList>
<VarListEntry>
<Term>ECPGdebug(int, FILE *stream)</Term>
<ListItem>
<Para>
If this is called, with the first argument non-zero, then debuglogging is turned
on. Debuglogging is done on <Function>stream</Function>. Most <Acronym>SQL</Acronym> statement logs its
arguments and result.

<Para>
The most important one (<Function>ECPGdo</Function>) that is called on all <Acronym>SQL</Acronym>
statements except <Command>EXEC SQL COMMIT</Command>, <Command>EXEC SQL ROLLBACK</Command>, 
<Command>EXEC SQL CONNECT</Command> logs both its expanded string, i.e. the string
with all the input variables inserted, and the result from the
<ProductName>Postgres</ProductName> server. This can be very useful when searching for errors
in your <Acronym>SQL</Acronym> statements.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>ECPGstatus()</Term>
<ListItem>
<Para>
This method returns TRUE if we are connected to a database and FALSE if not.
</Para>
</ListItem>
</VarListEntry>
</VariableList>

<Sect2>
<Title>Error handling

<Para>
To be able to detect errors from the <ProductName>Postgres</ProductName> server you include a line
like
<ProgramListing>
exec sql include sqlca;
</ProgramListing>
in the include section of your file. This will define a struct and a
variable with the name <Parameter>sqlca</Parameter> as following:
<ProgramListing>
struct sqlca {
    int sqlcode;
    struct {
        int sqlerrml;
        char sqlerrmc[1000];
    } sqlerrm;
} sqlca;
</ProgramListing>

<Para>
If an error occured in the last <Acronym>SQL</Acronym> statement then <Parameter>sqlca.sqlcode</Parameter>
will be non-zero. If <Parameter>sqlca.sqlcode</Parameter> is less that 0 then this is
some kind of serious error, like the database definition does not match
the query given. If it is bigger than 0 then this is a normal error like
the table did not contain the requested row.

<Para>
sqlca.sqlerrm.sqlerrmc will contain a string that describes the error.
The string ends with <Quote>line 23.</Quote> where the line is the line number
in the source file (actually the file generated by the preprocessor but
I hope I can fix this to be the line number in the input file.)

<Para>
List of errors that can occur:

<VariableList>
<VarListEntry>
<Term>-1, Unsupported type %s on line %d.</Term>
<ListItem>
<Para>
Does not normally occur. This is a sign that the preprocessor has
generated something that the library does not know about. Perhaps you
are running incompatible versions of the preprocessor and the library.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>-1, Too many arguments line %d.</Term>
<ListItem>
<Para>
The preprocessor has goofed up and generated some incorrect code.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>-1, Too few arguments line %d.</Term>
<ListItem>
<Para>
The preprocessor has goofed up and generated some incorrect code.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>-1, Error starting transaction line %d.</Term>
<ListItem>
<Para>
<ProductName>Postgres</ProductName> signalled to us that we cannot open the connection.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>-1, Postgres error: %s line %d.</Term>
<ListItem>
<Para>
Some <ProductName>Postgres</ProductName> error. The message contains the error message from the
<ProductName>Postgres</ProductName> backend.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>1, Data not found line %d.</Term>
<ListItem>
<Para>
This is a "normal" error that tells you that what you are quering cannot
be found or we have gone through the cursor.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>-1, To many matches line %d.</Term>
<ListItem>
<Para>
This means that the query has returned several lines. The <Command>SELECT</Command>
you made probably was not unique.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>-1, Not correctly formatted int type: %s line %d.</Term>
<ListItem>
<Para>
This means that the host variable is of an <Type>int</Type> type and the field
in the <ProductName>Postgres</ProductName> database is of another type and contains a value that
cannot be interpreted as an <Type>int</Type>. The library uses <Function>strtol</Function>
for this conversion.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>-1, Not correctly formatted unsigned type: %s line %d.</Term>
<ListItem>
<Para>
This means that the host variable is of an <Type>unsigned int</Type> type and
the field in the <ProductName>Postgres</ProductName> database is of another type and contains a
value that cannot be interpreted as an <Type>unsigned int</Type>. The library
uses <Function>strtoul</Function> for this conversion.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>-1, Not correctly formatted floating point type: %s line %d.</Term>
<ListItem>
<Para>
This means that the host variable is of an <Type>float</Type> type and
the field in the <ProductName>Postgres</ProductName> database is of another type and contains a
value that cannot be interpreted as an <Type>float</Type>. The library
uses <Function>strtod</Function> for this conversion.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>-1, Too few arguments line %d.</Term>
<ListItem>
<Para>
This means that <ProductName>Postgres</ProductName> has returned more records than we have
matching variables. Perhaps you have forgotten a couple of the host
variables in the <Command>INTO :var1,:var2</Command>-list.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>-1, Too many arguments line %d.</Term>
<ListItem>
<Para>
This means that <ProductName>Postgres</ProductName> has returned fewer records than we have
host variables. Perhaps you have to many host variables in the 
<Command>INTO :var1,:var2</Command>-list.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>-1, Empty query line %d.</Term>
<ListItem>
<Para>
<ProductName>Postgres</ProductName> returned PGRES_EMPTY_QUERY.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>-1, Error: %s line %d.</Term>
<ListItem>
<Para>
This means that <ProductName>Postgres</ProductName> returned on of the errors
PGRES_NONFATAL_ERROR, PGRES_FATAL_ERROR or PGRES_BAD_RESPONSE. Which one
and why is explained in the message.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>-1, Postgres error line %d.</Term>
<ListItem>
<Para>
<ProductName>Postgres</ProductName> returns something that the library does not know how to
handle. This is probably because the version of <ProductName>Postgres</ProductName> does not
match the version of the <Application>ecpg</Application> library.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>-1, Error committing line %d.</Term>
<ListItem>
<Para>
Error during <Command>COMMIT</Command>. <Command>EXEC SQL COMMIT</Command> is translated to an
<Command>end</Command> operation in <ProductName>Postgres</ProductName> and that is the operation that could
not be performed.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>-1, Error rolling back line %d.</Term>
<ListItem>
<Para>
Error during <Command>ROLLBACK</Command>. <Command>EXEC SQL ROLLBACK</Command> is translated to
an <Command>abort</Command> operation in <ProductName>Postgres</ProductName> and that is the operation that
could not be performed.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>-1, ECPGconnect: could not open database %s.</Term>
<ListItem>
<Para>
The connect to the database did not work.
</Para>
</ListItem>
</VarListEntry>
</VariableList>

</Sect2>

<Sect1>
<Title>Limitations</Title>

<Para>
What will never be included and why or what cannot be done with this
concept.

<VariableList>
<VarListEntry>
<Term>oracles single tasking possibility</Term>
<ListItem>
<Para>
Oracle version 7.0 on AIX 3 uses the OS-supported locks on the shared
memory segments and allows the application designer to link an
application in a so called single tasking way. Instead of starting one
client process per application process both the database part and the
application part is run in the same process. In later versions of oracle
this is no longer supported.

<Para>
This would require a total redesign of the <ProductName>Postgres</ProductName> access model and
that effort can not justify the performance gained.
</Para>
</ListItem>
</VarListEntry>
</VariableList>

<Sect1>
<Title>Porting From Other <Acronym>RDBMS</Acronym> Packages</Title>

<Para>
To be written by persons that knows the different <Acronym>RDBMS</Acronym> packages and that
actually does port something...

<Sect1>
<Title>Installation</Title>

<Para>
Since version 0.5 <Application>ecpg</Application> is distributed together with <ProductName>Postgres</ProductName>. So you
should get your precompiler, libraries and header files compiled and
installed on the fly.

<Sect1>
<Title>For the Developer</Title>

<Para>
This section is for those that wants to develop the <Application>ecpg</Application> interface. It
describes how the things work. The ambition is to make this section
contain things for those that want to have a look inside and the section
on How to use it should be enough for all normal questions.

So, read this before looking at the internals of the <Application>ecpg</Application>. If
you are not interested in how it really works, skip this section.

<Sect2>
<Title>ToDo List</Title>

<Para>
This version the preprocessor has some flaws:

<VariableList>
<VarListEntry>
<Term>Preprocessor output</Term>
<ListItem>
<Para>
The variables should be static.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>Preprocessor cannot do syntax checking on your <Acronym>SQL</Acronym> statements</Term>
<ListItem>
<Para>
Whatever you write is copied more or less exactly to the <ProductName>Postgres</ProductName> and
you will not be able to locate your errors until run-time.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>no restriction to strings only</Term>
<ListItem>
<Para>
The PQ interface, and most of all the PQexec function, that is used by
the <Application>ecpg</Application> relies on that the request is built up as a string. In some
cases, like when the data contains the null character, this will be a
serious problem.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>error codes</Term>
<ListItem>
<Para>
There should be different error numbers for the different errors instead
of just -1 for them all.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>library functions</Term>
<ListItem>
<Para>
to_date et al.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>records</Term>
<ListItem>
<Para>
Possibility to define records or structures in the declare section
in a way that the record can be filled from one row in the database.

<Para>
This is a simpler way to handle an entire row at a time.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>array operations</Term>
<ListItem>
<Para>
Oracle has array operations that enhances speed. When implementing it in
<Application>ecpg</Application> it is done for compatibility reasons only. For them to
improve speed would require a lot more insight in the <ProductName>Postgres</ProductName> internal
mechanisms than I possess.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>indicator variables</Term>
<ListItem>
<Para>
Oracle has indicator variables that tell if a value is <Type>null</Type> or if
it is empty. This largely simplifies array operations and provides for a
way to hack around some design flaws in the handling of <Type>VARCHAR2</Type>
(like that an empty string isn't distinguishable from a
<Type>null</Type> value). I am not sure if this is an Oracle extension or part
of the ANSI standard.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>typedefs</Term>
<ListItem>
<Para>
As well as complex types like records and arrays, typedefs would be
a good thing to take care of.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>conversion of scripts</Term>
<ListItem>
<Para>
To set up a database you need a few scripts with table definitions and
other configuration parameters. If you have these scripts for an old
database you would like to just apply them to get a <ProductName>Postgres</ProductName> database
that works in the same way.

<Para>
To set up a database you need a few scripts with table definitions and
The functionality could be accomplished with some conversion scripts.
Speed will never be accomplished in this way. To do this you need a
bigger insight in the database construction and the use of the database
than could be realised in a script.
</Para>
</ListItem>
</VarListEntry>
</VariableList>

<Sect2>
<Title>The Preprocessor</Title>

<Para>
First four lines are written to the output. Two comments and two include
lines necessary for the interface to the library. 

<Para>
Then the preprocessor works in one pass only reading the input file and
writing to the output as it goes along. Normally it just echoes
everything to the output without looking at it further.

<Para>
When it comes to an <Command>EXEC SQL</Command> statements it interviens and
changes them depending on what iit is. The <Command>EXEC SQL</Command> statement can
be one of these:

<VariableList>
<VarListEntry>
<Term>Declare sections</Term>
<ListItem>
<Para>
Declare sections begins with
<ProgramListing>
exec sql begin declare section;
</ProgramListing>
and ends with
<ProgramListing>
exec sql end declare section;
</ProgramListing>
In the section only variable declarations are allowed. Every variable
declare within this section is also entered in a list of variables
indexed on their name together with the corresponding type.

<Para>
The declaration is echoed to the file to make the variable a normal
C-variable also.

<Para>
The special types VARCHAR and VARCHAR2 are converted into a named struct
for every variable. A declaration like:
<ProgramListing>
VARCHAR var[180];
</ProgramListing>
is converted into
<ProgramListing>
struct varchar_var { int len; char arr[180]; } var;
</ProgramListing>
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>Include statements</Term>
<ListItem>
<Para>
An include statement looks like:
<ProgramListing>
exec sql include filename;
</ProgramListing>
It is converted into
<ProgramListing>
#include &lt;filename.h&gt;
</ProgramListing>
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>Connect statement</Term>
<ListItem>
<Para>
A connect statement looks like:
<ProgramListing>
exec sql connect '<Replaceable>database</Replaceable>';
</ProgramListing>
That statement is converted into
<ProgramListing>
ECPGconnect("<Replaceable>database</Replaceable>");
</ProgramListing>
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>Open cursor statement</Term>
<ListItem>
<Para>
An open cursor statement looks like:
<ProgramListing>
exec sql open <Replaceable>cursor</Replaceable>;
</ProgramListing>
and is ignore and not copied from the output.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>Commit statement</Term>
<ListItem>
<Para>
A commit statement looks like
<ProgramListing>
exec sql commit;
</ProgramListing>
and is translated on the output to
<ProgramListing>
ECPGcommit(__LINE__);
</ProgramListing>
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>Rollback statement</Term>
<ListItem>
<Para>
A rollback statement looks like
<ProgramListing>
exec sql rollback;
</ProgramListing>
and is translated on the output to
<ProgramListing>
ECPGrollback(__LINE__);
</ProgramListing>
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>Other statements</Term>
<ListItem>
<Para>
Other <Acronym>SQL</Acronym> statements are other statements that start with 
<Command>exec sql</Command> and ends with <Command>;</Command>. Everything inbetween is treated
as an <Acronym>SQL</Acronym> statement and parsed for variable substitution.

<Para>
Variable substitution occur when a symbol starts with a colon
(<Command>:</Command>). Then a variable with that name is found among the variables
that were previously declared within a declare section and depending on
whether or not the <Acronym>SQL</Acronym> statements knows it to be a variable for input or
output the pointers to the variables are written to the output to allow
for access by the function.

<Para>
For every variable that is part of the <Acronym>SQL</Acronym> request the function gets
another five arguments.

<SimpleList>
<Member>The type as a special symbol</Member>
<Member>A pointer to the value</Member>
<Member>The size of the variable if it is a varchar</Member>
<Member>Number of elements in the array (for array fetches)</Member>
<Member>The offset to the next element in the array (for array fetches)</Member>
</SimpleList>

<Para>
Since the array fetches are not implemented yet the two last arguments
are not really important. They could perhaps have been left out.
</Para>
</ListItem>
</VarListEntry>
</VariableList>

</Sect2>

<Sect2>
<Title>A Complete Example</Title>

<Para>
Here is a complete example describing the output of the preprocessor:
<ProgramListing>
exec sql begin declare section;
int index;
int result;
exec sql end declare section;
...
    exec sql select res into :result from mytable where index = :index;
</ProgramListing>
is translated into:
<ProgramListing>
/* These two include files are added by the preprocessor */
#include &lt;ecpgtype.h&gt;
#include &lt;ecpglib.h&gt;
/* exec sql begin declare section */

 int index;
 int result;
/* exec sql end declare section */

...
    ECPGdo(__LINE__, "select res from mytable where index = ;;", 
           ECPGt_int,&amp;index,0,0,sizeof(int), 
           ECPGt_EOIT, 
           ECPGt_int,&amp;result,0,0,sizeof(int), 
           ECPGt_EORT );
</ProgramListing>
(the indentation in this manual is added for readability and not
something that the preprocessor can do.)

<Sect2>
<Title>The Library</Title>

<Para>
The most important function in the library is the <Function>ECPGdo</Function>
function. It takes a variable amount of arguments. Hopefully we wont run
into machines with limits on the amount of variables that can be
accepted by a varchar function. This could easily add up to 50 or so
arguments.

<Para>
The arguments are:

<VariableList>
<VarListEntry>
<Term>A line number</Term>
<ListItem>
<Para>
This is a line number for the original line used in error messages only.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>A string</Term>
<ListItem>
<Para>
This is the <Acronym>SQL</Acronym> request that is to be issued. This request is modified
by the input variables, i.e. the variables that where not known at
compile time but are to be entered in the request. Where the variables
should go the string contains <Quote>;</Quote>.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>Input variables</Term>
<ListItem>
<Para>
As described in the section about the preprocessor every input variable
gets five arguments.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>ECPGt_EOIT</Term>
<ListItem>
<Para>
An enum telling that there are no more input variables.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>Output variables</Term>
<ListItem>
<Para>
As described in the section about the preprocessor every input variable
gets five arguments. These variables are filled by the function.
</Para>
</ListItem>
</VarListEntry>

<VarListEntry>
<Term>ECPGt_EORT</Term>
<ListItem>
<Para>
An enum telling that there are no more variables.
</Para>
</ListItem>
</VarListEntry>
</VariableList>

<Para>
All the <Acronym>SQL</Acronym> statements are performed in one transaction unless you issue
a commit transaction. This works so that the first transaction or the
first after a commit or rollback always begins a transaction.

<Para>
To be completed: entries describing the other entries.

</Chapter>