2015-05-16 03:44:53 +02:00
|
|
|
<!-- doc/src/sgml/tsm-system-rows.sgml -->
|
|
|
|
|
|
|
|
<sect1 id="tsm-system-rows" xreflabel="tsm_system_rows">
|
|
|
|
<title>tsm_system_rows</title>
|
|
|
|
|
|
|
|
<indexterm zone="tsm-system-rows">
|
|
|
|
<primary>tsm_system_rows</primary>
|
|
|
|
</indexterm>
|
|
|
|
|
|
|
|
<para>
|
Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 20:39:00 +02:00
|
|
|
The <filename>tsm_system_rows</> module provides the table sampling method
|
|
|
|
<literal>SYSTEM_ROWS</literal>, which can be used in
|
|
|
|
the <literal>TABLESAMPLE</> clause of a <xref linkend="sql-select">
|
|
|
|
command.
|
2015-05-16 03:44:53 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 20:39:00 +02:00
|
|
|
This table sampling method accepts a single integer argument that is the
|
|
|
|
maximum number of rows to read. The resulting sample will always contain
|
|
|
|
exactly that many rows, unless the table does not contain enough rows, in
|
|
|
|
which case the whole table is selected.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Like the built-in <literal>SYSTEM</literal> sampling
|
|
|
|
method, <literal>SYSTEM_ROWS</literal> performs block-level sampling, so
|
|
|
|
that the sample is not completely random but may be subject to clustering
|
|
|
|
effects, especially if only a small number of rows are requested.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
<literal>SYSTEM_ROWS</literal> does not support
|
|
|
|
the <literal>REPEATABLE</literal> clause.
|
2015-05-16 03:44:53 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<sect2>
|
|
|
|
<title>Examples</title>
|
|
|
|
|
|
|
|
<para>
|
Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 20:39:00 +02:00
|
|
|
Here is an example of selecting a sample of a table with
|
|
|
|
<literal>SYSTEM_ROWS</>. First install the extension:
|
2015-05-16 03:44:53 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
CREATE EXTENSION tsm_system_rows;
|
|
|
|
</programlisting>
|
|
|
|
|
|
|
|
<para>
|
Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 20:39:00 +02:00
|
|
|
Then you can use it in a <command>SELECT</command> command, for instance:
|
2015-05-16 03:44:53 +02:00
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
SELECT * FROM my_table TABLESAMPLE SYSTEM_ROWS(100);
|
|
|
|
</programlisting>
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 20:39:00 +02:00
|
|
|
This command will return a sample of 100 rows from the
|
|
|
|
table <structname>my_table</> (unless the table does not have 100
|
|
|
|
visible rows, in which case all its rows are returned).
|
2015-05-16 03:44:53 +02:00
|
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
</sect1>
|