551 lines
22 KiB
Plaintext
551 lines
22 KiB
Plaintext
<!-- doc/src/sgml/fdwhandler.sgml -->
|
|
|
|
<chapter id="fdwhandler">
|
|
<title>Writing A Foreign Data Wrapper</title>
|
|
|
|
<indexterm zone="fdwhandler">
|
|
<primary>foreign data wrapper</primary>
|
|
<secondary>handler for</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
All operations on a foreign table are handled through its foreign data
|
|
wrapper, which consists of a set of functions that the core server
|
|
calls. The foreign data wrapper is responsible for fetching
|
|
data from the remote data source and returning it to the
|
|
<productname>PostgreSQL</productname> executor. This chapter outlines how
|
|
to write a new foreign data wrapper.
|
|
</para>
|
|
|
|
<para>
|
|
The foreign data wrappers included in the standard distribution are good
|
|
references when trying to write your own. Look into the
|
|
<filename>contrib/file_fdw</> subdirectory of the source tree.
|
|
The <xref linkend="sql-createforeigndatawrapper"> reference page also has
|
|
some useful details.
|
|
</para>
|
|
|
|
<note>
|
|
<para>
|
|
The SQL standard specifies an interface for writing foreign data wrappers.
|
|
However, PostgreSQL does not implement that API, because the effort to
|
|
accommodate it into PostgreSQL would be large, and the standard API hasn't
|
|
gained wide adoption anyway.
|
|
</para>
|
|
</note>
|
|
|
|
<sect1 id="fdw-functions">
|
|
<title>Foreign Data Wrapper Functions</title>
|
|
|
|
<para>
|
|
The FDW author needs to implement a handler function, and optionally
|
|
a validator function. Both functions must be written in a compiled
|
|
language such as C, using the version-1 interface.
|
|
For details on C language calling conventions and dynamic loading,
|
|
see <xref linkend="xfunc-c">.
|
|
</para>
|
|
|
|
<para>
|
|
The handler function simply returns a struct of function pointers to
|
|
callback functions that will be called by the planner, executor, and
|
|
various maintenance commands.
|
|
Most of the effort in writing an FDW is in implementing these callback
|
|
functions.
|
|
The handler function must be registered with
|
|
<productname>PostgreSQL</productname> as taking no arguments and
|
|
returning the special pseudo-type <type>fdw_handler</type>. The
|
|
callback functions are plain C functions and are not visible or
|
|
callable at the SQL level. The callback functions are described in
|
|
<xref linkend="fdw-callbacks">.
|
|
</para>
|
|
|
|
<para>
|
|
The validator function is responsible for validating options given in
|
|
<command>CREATE</command> and <command>ALTER</command> commands for its
|
|
foreign data wrapper, as well as foreign servers, user mappings, and
|
|
foreign tables using the wrapper.
|
|
The validator function must be registered as taking two arguments, a
|
|
text array containing the options to be validated, and an OID
|
|
representing the type of object the options are associated with (in
|
|
the form of the OID of the system catalog the object would be stored
|
|
in, either
|
|
<literal>ForeignDataWrapperRelationId</>,
|
|
<literal>ForeignServerRelationId</>,
|
|
<literal>UserMappingRelationId</>,
|
|
or <literal>ForeignTableRelationId</>).
|
|
If no validator function is supplied, options are not checked at object
|
|
creation time or object alteration time.
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="fdw-callbacks">
|
|
<title>Foreign Data Wrapper Callback Routines</title>
|
|
|
|
<para>
|
|
The FDW handler function returns a palloc'd <structname>FdwRoutine</>
|
|
struct containing pointers to the following callback functions:
|
|
</para>
|
|
|
|
<para>
|
|
<programlisting>
|
|
void
|
|
GetForeignRelSize (PlannerInfo *root,
|
|
RelOptInfo *baserel,
|
|
Oid foreigntableid);
|
|
</programlisting>
|
|
|
|
Obtain relation size estimates for a foreign table. This is called
|
|
at the beginning of planning for a query involving a foreign table.
|
|
<literal>root</> is the planner's global information about the query;
|
|
<literal>baserel</> is the planner's information about this table; and
|
|
<literal>foreigntableid</> is the <structname>pg_class</> OID of the
|
|
foreign table. (<literal>foreigntableid</> could be obtained from the
|
|
planner data structures, but it's passed explicitly to save effort.)
|
|
</para>
|
|
|
|
<para>
|
|
This function should update <literal>baserel->rows</> to be the
|
|
expected number of rows returned by the table scan, after accounting for
|
|
the filtering done by the restriction quals. The initial value of
|
|
<literal>baserel->rows</> is just a constant default estimate, which
|
|
should be replaced if at all possible. The function may also choose to
|
|
update <literal>baserel->width</> if it can compute a better estimate
|
|
of the average result row width.
|
|
</para>
|
|
|
|
<para>
|
|
See <xref linkend="fdw-planning"> for additional information.
|
|
</para>
|
|
|
|
<para>
|
|
<programlisting>
|
|
void
|
|
GetForeignPaths (PlannerInfo *root,
|
|
RelOptInfo *baserel,
|
|
Oid foreigntableid);
|
|
</programlisting>
|
|
|
|
Create possible access paths for a scan on a foreign table.
|
|
This is called during query planning.
|
|
The parameters are the same as for <function>GetForeignRelSize</>,
|
|
which has already been called.
|
|
</para>
|
|
|
|
<para>
|
|
This function must generate at least one access path
|
|
(<structname>ForeignPath</> node) for a scan on the foreign table and
|
|
must call <function>add_path</> to add each such path to
|
|
<literal>baserel->pathlist</>. It's recommended to use
|
|
<function>create_foreignscan_path</> to build the
|
|
<structname>ForeignPath</> nodes. The function can generate multiple
|
|
access paths, e.g., a path which has valid <literal>pathkeys</> to
|
|
represent a pre-sorted result. Each access path must contain cost
|
|
estimates, and can contain any FDW-private information that is needed to
|
|
identify the specific scan method intended.
|
|
</para>
|
|
|
|
<para>
|
|
See <xref linkend="fdw-planning"> for additional information.
|
|
</para>
|
|
|
|
<para>
|
|
<programlisting>
|
|
ForeignScan *
|
|
GetForeignPlan (PlannerInfo *root,
|
|
RelOptInfo *baserel,
|
|
Oid foreigntableid,
|
|
ForeignPath *best_path,
|
|
List *tlist,
|
|
List *scan_clauses);
|
|
</programlisting>
|
|
|
|
Create a <structname>ForeignScan</> plan node from the selected foreign
|
|
access path. This is called at the end of query planning.
|
|
The parameters are as for <function>GetForeignRelSize</>, plus
|
|
the selected <structname>ForeignPath</> (previously produced by
|
|
<function>GetForeignPaths</>), the target list to be emitted by the
|
|
plan node, and the restriction clauses to be enforced by the plan node.
|
|
</para>
|
|
|
|
<para>
|
|
This function must create and return a <structname>ForeignScan</> plan
|
|
node; it's recommended to use <function>make_foreignscan</> to build the
|
|
<structname>ForeignScan</> node.
|
|
</para>
|
|
|
|
<para>
|
|
See <xref linkend="fdw-planning"> for additional information.
|
|
</para>
|
|
|
|
<para>
|
|
<programlisting>
|
|
void
|
|
ExplainForeignScan (ForeignScanState *node,
|
|
ExplainState *es);
|
|
</programlisting>
|
|
|
|
Print additional <command>EXPLAIN</> output for a foreign table scan.
|
|
This can just return if there is no need to print anything.
|
|
Otherwise, it should call <function>ExplainPropertyText</> and
|
|
related functions to add fields to the <command>EXPLAIN</> output.
|
|
The flag fields in <literal>es</> can be used to determine what to
|
|
print, and the state of the <structname>ForeignScanState</> node
|
|
can be inspected to provide runtime statistics in the <command>EXPLAIN
|
|
ANALYZE</> case.
|
|
</para>
|
|
|
|
<para>
|
|
<programlisting>
|
|
void
|
|
BeginForeignScan (ForeignScanState *node,
|
|
int eflags);
|
|
</programlisting>
|
|
|
|
Begin executing a foreign scan. This is called during executor startup.
|
|
It should perform any initialization needed before the scan can start,
|
|
but not start executing the actual scan (that should be done upon the
|
|
first call to <function>IterateForeignScan</>).
|
|
The <structname>ForeignScanState</> node has already been created, but
|
|
its <structfield>fdw_state</> field is still NULL. Information about
|
|
the table to scan is accessible through the
|
|
<structname>ForeignScanState</> node (in particular, from the underlying
|
|
<structname>ForeignScan</> plan node, which contains any FDW-private
|
|
information provided by <function>GetForeignPlan</>).
|
|
</para>
|
|
|
|
<para>
|
|
Note that when <literal>(eflags & EXEC_FLAG_EXPLAIN_ONLY)</> is
|
|
true, this function should not perform any externally-visible actions;
|
|
it should only do the minimum required to make the node state valid
|
|
for <function>ExplainForeignScan</> and <function>EndForeignScan</>.
|
|
</para>
|
|
|
|
<para>
|
|
<programlisting>
|
|
TupleTableSlot *
|
|
IterateForeignScan (ForeignScanState *node);
|
|
</programlisting>
|
|
|
|
Fetch one row from the foreign source, returning it in a tuple table slot
|
|
(the node's <structfield>ScanTupleSlot</> should be used for this
|
|
purpose). Return NULL if no more rows are available. The tuple table
|
|
slot infrastructure allows either a physical or virtual tuple to be
|
|
returned; in most cases the latter choice is preferable from a
|
|
performance standpoint. Note that this is called in a short-lived memory
|
|
context that will be reset between invocations. Create a memory context
|
|
in <function>BeginForeignScan</> if you need longer-lived storage, or use
|
|
the <structfield>es_query_cxt</> of the node's <structname>EState</>.
|
|
</para>
|
|
|
|
<para>
|
|
The rows returned must match the column signature of the foreign table
|
|
being scanned. If you choose to optimize away fetching columns that
|
|
are not needed, you should insert nulls in those column positions.
|
|
</para>
|
|
|
|
<para>
|
|
Note that <productname>PostgreSQL</productname>'s executor doesn't care
|
|
whether the rows returned violate the <literal>NOT NULL</literal>
|
|
constraints which were defined on the foreign table columns - but the
|
|
planner does care, and may optimize queries incorrectly if
|
|
<literal>NULL</> values are present in a column declared not to contain
|
|
them. If a <literal>NULL</> value is encountered when the user has
|
|
declared that none should be present, it may be appropriate to raise an
|
|
error (just as you would need to do in the case of a data type mismatch).
|
|
</para>
|
|
|
|
<para>
|
|
<programlisting>
|
|
void
|
|
ReScanForeignScan (ForeignScanState *node);
|
|
</programlisting>
|
|
|
|
Restart the scan from the beginning. Note that any parameters the
|
|
scan depends on may have changed value, so the new scan does not
|
|
necessarily return exactly the same rows.
|
|
</para>
|
|
|
|
<para>
|
|
<programlisting>
|
|
void
|
|
EndForeignScan (ForeignScanState *node);
|
|
</programlisting>
|
|
|
|
End the scan and release resources. It is normally not important
|
|
to release palloc'd memory, but for example open files and connections
|
|
to remote servers should be cleaned up.
|
|
</para>
|
|
|
|
<para>
|
|
<programlisting>
|
|
bool
|
|
AnalyzeForeignTable (Relation relation,
|
|
AcquireSampleRowsFunc *func,
|
|
BlockNumber *totalpages);
|
|
</programlisting>
|
|
|
|
This function is called when <xref linkend="sql-analyze"> is executed on
|
|
a foreign table. If the FDW can collect statistics for this
|
|
foreign table, it should return <literal>true</>, and provide a pointer
|
|
to a function that will collect sample rows from the table in
|
|
<parameter>func</>, plus the estimated size of the table in pages in
|
|
<parameter>totalpages</>. Otherwise, return <literal>false</>.
|
|
If the FDW does not support collecting statistics for any tables, the
|
|
<function>AnalyzeForeignTable</> pointer can be set to <literal>NULL</>.
|
|
</para>
|
|
|
|
<para>
|
|
If provided, the sample collection function must have the signature
|
|
<programlisting>
|
|
int
|
|
AcquireSampleRowsFunc (Relation relation, int elevel,
|
|
HeapTuple *rows, int targrows,
|
|
double *totalrows,
|
|
double *totaldeadrows);
|
|
</programlisting>
|
|
|
|
A random sample of up to <parameter>targrows</> rows should be collected
|
|
from the table and stored into the caller-provided <parameter>rows</>
|
|
array. The actual number of rows collected must be returned. In
|
|
addition, store estimates of the total numbers of live and dead rows in
|
|
the table into the output parameters <parameter>totalrows</> and
|
|
<parameter>totaldeadrows</>. (Set <parameter>totaldeadrows</> to zero
|
|
if the FDW does not have any concept of dead rows.)
|
|
</para>
|
|
|
|
<para>
|
|
The <structname>FdwRoutine</> struct type is declared in
|
|
<filename>src/include/foreign/fdwapi.h</>, which see for additional
|
|
details.
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="fdw-helpers">
|
|
<title>Foreign Data Wrapper Helper Functions</title>
|
|
|
|
<para>
|
|
Several helper functions are exported from the core server so that
|
|
authors of foreign data wrappers can get easy access to attributes of
|
|
FDW-related objects, such as FDW options.
|
|
To use any of these functions, you need to include the header file
|
|
<filename>foreign/foreign.h</filename> in your source file.
|
|
That header also defines the struct types that are returned by
|
|
these functions.
|
|
</para>
|
|
|
|
<para>
|
|
<programlisting>
|
|
ForeignDataWrapper *
|
|
GetForeignDataWrapper(Oid fdwid);
|
|
</programlisting>
|
|
|
|
This function returns a <structname>ForeignDataWrapper</structname>
|
|
object for the foreign-data wrapper with the given OID. A
|
|
<structname>ForeignDataWrapper</structname> object contains properties
|
|
of the FDW (see <filename>foreign/foreign.h</filename> for details).
|
|
</para>
|
|
|
|
<para>
|
|
<programlisting>
|
|
ForeignServer *
|
|
GetForeignServer(Oid serverid);
|
|
</programlisting>
|
|
|
|
This function returns a <structname>ForeignServer</structname> object
|
|
for the foreign server with the given OID. A
|
|
<structname>ForeignServer</structname> object contains properties
|
|
of the server (see <filename>foreign/foreign.h</filename> for details).
|
|
</para>
|
|
|
|
<para>
|
|
<programlisting>
|
|
UserMapping *
|
|
GetUserMapping(Oid userid, Oid serverid);
|
|
</programlisting>
|
|
|
|
This function returns a <structname>UserMapping</structname> object for
|
|
the user mapping of the given role on the given server. (If there is no
|
|
mapping for the specific user, it will return the mapping for
|
|
<literal>PUBLIC</>, or throw error if there is none.) A
|
|
<structname>UserMapping</structname> object contains properties of the
|
|
user mapping (see <filename>foreign/foreign.h</filename> for details).
|
|
</para>
|
|
|
|
<para>
|
|
<programlisting>
|
|
ForeignTable *
|
|
GetForeignTable(Oid relid);
|
|
</programlisting>
|
|
|
|
This function returns a <structname>ForeignTable</structname> object for
|
|
the foreign table with the given OID. A
|
|
<structname>ForeignTable</structname> object contains properties of the
|
|
foreign table (see <filename>foreign/foreign.h</filename> for details).
|
|
</para>
|
|
|
|
<para>
|
|
<programlisting>
|
|
List *
|
|
GetForeignTableColumnOptions(Oid relid, AttrNumber attnum);
|
|
</programlisting>
|
|
|
|
This function returns the per-column FDW options for the column with the
|
|
given foreign table OID and attribute number, in the form of a list of
|
|
<structname>DefElem</structname>. NIL is returned if the column has no
|
|
options.
|
|
</para>
|
|
|
|
<para>
|
|
Some object types have name-based lookup functions in addition to the
|
|
OID-based ones:
|
|
</para>
|
|
|
|
<para>
|
|
<programlisting>
|
|
ForeignDataWrapper *
|
|
GetForeignDataWrapperByName(const char *name, bool missing_ok);
|
|
</programlisting>
|
|
|
|
This function returns a <structname>ForeignDataWrapper</structname>
|
|
object for the foreign-data wrapper with the given name. If the wrapper
|
|
is not found, return NULL if missing_ok is true, otherwise raise an
|
|
error.
|
|
</para>
|
|
|
|
<para>
|
|
<programlisting>
|
|
ForeignServer *
|
|
GetForeignServerByName(const char *name, bool missing_ok);
|
|
</programlisting>
|
|
|
|
This function returns a <structname>ForeignServer</structname> object
|
|
for the foreign server with the given name. If the server is not found,
|
|
return NULL if missing_ok is true, otherwise raise an error.
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="fdw-planning">
|
|
<title>Foreign Data Wrapper Query Planning</title>
|
|
|
|
<para>
|
|
The FDW callback functions <function>GetForeignRelSize</>,
|
|
<function>GetForeignPaths</>, and <function>GetForeignPlan</> must fit
|
|
into the workings of the <productname>PostgreSQL</> planner. Here are
|
|
some notes about what they must do.
|
|
</para>
|
|
|
|
<para>
|
|
The information in <literal>root</> and <literal>baserel</> can be used
|
|
to reduce the amount of information that has to be fetched from the
|
|
foreign table (and therefore reduce the cost).
|
|
<literal>baserel->baserestrictinfo</> is particularly interesting, as
|
|
it contains restriction quals (<literal>WHERE</> clauses) that should be
|
|
used to filter the rows to be fetched. (The FDW itself is not required
|
|
to enforce these quals, as the core executor can check them instead.)
|
|
<literal>baserel->reltargetlist</> can be used to determine which
|
|
columns need to be fetched; but note that it only lists columns that
|
|
have to be emitted by the <structname>ForeignScan</> plan node, not
|
|
columns that are used in qual evaluation but not output by the query.
|
|
</para>
|
|
|
|
<para>
|
|
Various private fields are available for the FDW planning functions to
|
|
keep information in. Generally, whatever you store in FDW private fields
|
|
should be palloc'd, so that it will be reclaimed at the end of planning.
|
|
</para>
|
|
|
|
<para>
|
|
<literal>baserel->fdw_private</> is a <type>void</> pointer that is
|
|
available for FDW planning functions to store information relevant to
|
|
the particular foreign table. The core planner does not touch it except
|
|
to initialize it to NULL when the <literal>baserel</> node is created.
|
|
It is useful for passing information forward from
|
|
<function>GetForeignRelSize</> to <function>GetForeignPaths</> and/or
|
|
<function>GetForeignPaths</> to <function>GetForeignPlan</>, thereby
|
|
avoiding recalculation.
|
|
</para>
|
|
|
|
<para>
|
|
<function>GetForeignPaths</> can identify the meaning of different
|
|
access paths by storing private information in the
|
|
<structfield>fdw_private</> field of <structname>ForeignPath</> nodes.
|
|
<structfield>fdw_private</> is declared as a <type>List</> pointer, but
|
|
could actually contain anything since the core planner does not touch
|
|
it. However, best practice is to use a representation that's dumpable
|
|
by <function>nodeToString</>, for use with debugging support available
|
|
in the backend.
|
|
</para>
|
|
|
|
<para>
|
|
<function>GetForeignPlan</> can examine the <structfield>fdw_private</>
|
|
field of the selected <structname>ForeignPath</> node, and can generate
|
|
<structfield>fdw_exprs</> and <structfield>fdw_private</> lists to be
|
|
placed in the <structname>ForeignScan</> plan node, where they will be
|
|
available at execution time. Both of these lists must be
|
|
represented in a form that <function>copyObject</> knows how to copy.
|
|
The <structfield>fdw_private</> list has no other restrictions and is
|
|
not interpreted by the core backend in any way. The
|
|
<structfield>fdw_exprs</> list, if not NIL, is expected to contain
|
|
expression trees that are intended to be executed at runtime. These
|
|
trees will undergo post-processing by the planner to make them fully
|
|
executable.
|
|
</para>
|
|
|
|
<para>
|
|
In <function>GetForeignPlan</>, generally the passed-in targetlist can
|
|
be copied into the plan node as-is. The passed scan_clauses list
|
|
contains the same clauses as <literal>baserel->baserestrictinfo</>,
|
|
but may be re-ordered for better execution efficiency. In simple cases
|
|
the FDW can just strip <structname>RestrictInfo</> nodes from the
|
|
scan_clauses list (using <function>extract_actual_clauses</>) and put
|
|
all the clauses into the plan node's qual list, which means that all the
|
|
clauses will be checked by the executor at runtime. More complex FDWs
|
|
may be able to check some of the clauses internally, in which case those
|
|
clauses can be removed from the plan node's qual list so that the
|
|
executor doesn't waste time rechecking them.
|
|
</para>
|
|
|
|
<para>
|
|
As an example, the FDW might identify some restriction clauses of the
|
|
form <replaceable>foreign_variable</> <literal>=</>
|
|
<replaceable>sub_expression</>, which it determines can be executed on
|
|
the remote server given the locally-evaluated value of the
|
|
<replaceable>sub_expression</>. The actual identification of such a
|
|
clause should happen during <function>GetForeignPaths</>, since it would
|
|
affect the cost estimate for the path. The path's
|
|
<structfield>fdw_private</> field would probably include a pointer to
|
|
the identified clause's <structname>RestrictInfo</> node. Then
|
|
<function>GetForeignPlan</> would remove that clause from scan_clauses,
|
|
but add the <replaceable>sub_expression</> to <structfield>fdw_exprs</>
|
|
to ensure that it gets massaged into executable form. It would probably
|
|
also put control information into the plan node's
|
|
<structfield>fdw_private</> field to tell the execution functions what
|
|
to do at runtime. The query transmitted to the remote server would
|
|
involve something like <literal>WHERE <replaceable>foreign_variable</> =
|
|
$1</literal>, with the parameter value obtained at runtime from
|
|
evaluation of the <structfield>fdw_exprs</> expression tree.
|
|
</para>
|
|
|
|
<para>
|
|
The FDW should always construct at least one path that depends only on
|
|
the table's restriction clauses. In join queries, it might also choose
|
|
to construct path(s) that depend on join clauses, for example
|
|
<replaceable>foreign_variable</> <literal>=</>
|
|
<replaceable>local_variable</>. Such clauses will not be found in
|
|
<literal>baserel->baserestrictinfo</> but must be sought in the
|
|
relation's join lists. A path using such a clause is called a
|
|
<quote>parameterized path</>. It must show the other relation(s) as
|
|
<literal>required_outer</> and list the specific join clause(s) in
|
|
<literal>param_clauses</>. In <function>GetForeignPlan</>, the
|
|
<replaceable>local_variable</> portion of the join clause would be added
|
|
to <structfield>fdw_exprs</>, and then at runtime the case works the
|
|
same as for an ordinary restriction clause.
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
</chapter>
|