PL/Python - Python Procedural Language

The PL/Python procedural language allows PostgreSQL functions to be written in the Python language.

To install PL/Python in a particular database, use createlang plpythonu dbname (but see also ).

If a language is installed into template1, all subsequently created databases will have the language installed automatically.

As of PostgreSQL 7.4, PL/Python is only available as an untrusted language, meaning it does not offer any way of restricting what users can do in it. It has therefore been renamed to plpythonu. The trusted variant plpython might become available again in the future, if a new secure execution mechanism is developed in Python. The writer of a function in untrusted PL/Python must take care that the function cannot be used to do anything unwanted, since it will be able to do anything that could be done by a user logged in as the database administrator. Only superusers can create functions in untrusted languages such as plpythonu.

Users of source packages must explicitly enable the build of PL/Python during the installation process. (Refer to the installation instructions for more information.) Users of binary packages might find PL/Python in a separate subpackage.

Python 2 vs. Python 3

PL/Python supports both the Python 2 and Python 3 language variants. (The PostgreSQL installation instructions might contain more precise information about the exact supported minor versions of Python.) Because the Python 2 and Python 3 language variants are incompatible in some important aspects, the following naming and transitioning scheme is used by PL/Python to avoid mixing them:

The PostgreSQL language named plpython2u implements PL/Python based on the Python 2 language variant.

The PostgreSQL language named plpython3u implements PL/Python based on the Python 3 language variant.

The language named plpythonu implements PL/Python based on the default Python language variant, which is currently Python 2.
(This default is independent of what any local Python installations might consider to be their default, for example, what /usr/bin/python might be.) The default will probably be changed to Python 3 in a distant future release of PostgreSQL, depending on the progress of the migration to Python 3 in the Python community.

It depends on the build configuration or the installed packages whether PL/Python for Python 2 or Python 3 or both are available.

The built variant depends on which Python version was found during the installation or which version was explicitly set using the PYTHON environment variable; see . To make both variants of PL/Python available in one installation, the source tree has to be configured and built twice.

This results in the following usage and migration strategy:

Existing users and users who are currently not interested in Python 3 use the language name plpythonu and don't have to change anything for the foreseeable future. It is recommended to gradually future-proof the code via migration to Python 2.6/2.7 to simplify the eventual migration to Python 3. In practice, many PL/Python functions will migrate to Python 3 with few or no changes.

Users who know that they have heavily Python 2 dependent code and don't plan to ever change it can make use of the plpython2u language name. This will continue to work into the very distant future, until Python 2 support might be completely dropped by PostgreSQL.

Users who want to dive into Python 3 can use the plpython3u language name, which will keep working forever by today's standards. In the distant future, when Python 3 might become the default, they might like to remove the 3 for aesthetic reasons.

Daredevils, who want to build a Python-3-only operating system environment, can change the build scripts to make plpythonu be equivalent to plpython3u, keeping in mind that this would make their installation incompatible with most of the rest of the world.
See also the document What's New In Python 3.0 for more information about porting to Python 3.

It is not allowed to use PL/Python based on Python 2 and PL/Python based on Python 3 in the same session, because the symbols in the dynamic modules would clash, which could result in crashes of the PostgreSQL server process. There is a check that prevents mixing Python major versions in a session, which will abort the session if a mismatch is detected. It is possible, however, to use both PL/Python variants in the same database, from separate sessions.

PL/Python Functions

Functions in PL/Python are declared via the standard syntax:

    CREATE FUNCTION funcname (argument-list)
      RETURNS return-type
    AS $$
      # PL/Python function body
    $$ LANGUAGE plpythonu;

The body of a function is simply a Python script. When the function is called, its arguments are passed as elements of the list args; named arguments are also passed as ordinary variables to the Python script. Use of named arguments is usually more readable. The result is returned from the Python code in the usual way, with return or yield (in case of a result-set statement). If you do not provide a return value, Python returns the default None. PL/Python translates Python's None into the SQL null value.

For example, a function to return the greater of two integers can be defined as:

    CREATE FUNCTION pymax (a integer, b integer)
      RETURNS integer
    AS $$
      if a > b:
        return a
      return b
    $$ LANGUAGE plpythonu;

The Python code that is given as the body of the function definition is transformed into a Python function. For example, the above results in:

    def __plpython_procedure_pymax_23456():
      if a > b:
        return a
      return b

assuming that 23456 is the OID assigned to the function by PostgreSQL.

The arguments are set as global variables.
Because of the scoping rules of Python, this has the subtle consequence that an argument variable cannot be reassigned inside the function to the value of an expression that involves the variable name itself, unless the variable is redeclared as global in the block. For example, the following won't work:

    CREATE FUNCTION pystrip(x text)
      RETURNS text
    AS $$
      x = x.strip()  # error
      return x
    $$ LANGUAGE plpythonu;

because assigning to x makes x a local variable for the entire block, and so the x on the right-hand side of the assignment refers to a not-yet-assigned local variable x, not the PL/Python function parameter. Using the global statement, this can be made to work:

    CREATE FUNCTION pystrip(x text)
      RETURNS text
    AS $$
      global x
      x = x.strip()  # ok now
      return x
    $$ LANGUAGE plpythonu;

But it is advisable not to rely on this implementation detail of PL/Python. It is better to treat the function parameters as read-only.

Data Values

Generally speaking, the aim of PL/Python is to provide a natural mapping between the PostgreSQL and the Python worlds. This informs the data mapping rules described below.

Data Type Mapping

Function arguments are converted from their PostgreSQL type to a corresponding Python type:

PostgreSQL boolean is converted to Python bool.

PostgreSQL smallint and int are converted to Python int. PostgreSQL bigint is converted to long in Python 2 and to int in Python 3.

PostgreSQL real, double, and numeric are converted to Python float. Note that for the numeric this loses information and can lead to incorrect results. This might be fixed in a future release.

PostgreSQL bytea is converted to Python str in Python 2 and to bytes in Python 3. In Python 2, the string should be treated as a byte sequence without any character encoding.

All other data types, including the PostgreSQL character string types, are converted to a Python str. In Python 2, this string will be in the PostgreSQL server encoding; in Python 3, it will be a Unicode string like all strings.
For nonscalar data types, see below.

Function return values are converted to the declared PostgreSQL return data type as follows:

When the PostgreSQL return type is boolean, the return value will be evaluated for truth according to the Python rules. That is, 0 and the empty string are false, but notably 'f' is true.

When the PostgreSQL return type is bytea, the return value will be converted to a string (Python 2) or bytes (Python 3) using the respective Python builtins, with the result being converted to bytea.

For all other PostgreSQL return types, the returned Python value is converted to a string using the Python builtin str, and the result is passed to the input function of the PostgreSQL data type.

Strings in Python 2 are required to be in the PostgreSQL server encoding when they are passed to PostgreSQL. Strings that are not valid in the current server encoding will raise an error, but not all encoding mismatches can be detected, so garbage data can still result when this is not done correctly. Unicode strings are converted to the correct encoding automatically, so it can be safer and more convenient to use those. In Python 3, all strings are Unicode strings.

For nonscalar data types, see below.

Note that logical mismatches between the declared PostgreSQL return type and the Python data type of the actual return object are not flagged; the value will be converted in any case.

PL/Python functions cannot return either type RECORD or SETOF RECORD. A workaround is to write a PL/pgSQL function that creates a temporary table, have it call the PL/Python function to fill the table, and then have the PL/pgSQL function return the generic RECORD from the temporary table.

Null, None

If an SQL null value is passed to a function, the argument value will appear as None in Python. The pymax function definition shown earlier will return the wrong answer for null inputs.
We could add STRICT to the function definition to make PostgreSQL do something more reasonable: if a null value is passed, the function will not be called at all, but will just return a null result automatically. Alternatively, we could check for null inputs in the function body:

    CREATE FUNCTION pymax (a integer, b integer)
      RETURNS integer
    AS $$
      if (a is None) or (b is None):
        return None
      if a > b:
        return a
      return b
    $$ LANGUAGE plpythonu;

As shown above, to return an SQL null value from a PL/Python function, return the value None. This can be done whether the function is strict or not.

Arrays, Lists

SQL array values are passed into PL/Python as a Python list. To return an SQL array value out of a PL/Python function, return a Python sequence, for example a list or tuple:

    CREATE FUNCTION return_arr()
      RETURNS int[]
    AS $$
      return (1, 2, 3, 4, 5)
    $$ LANGUAGE plpythonu;

    SELECT return_arr();
     return_arr
    -------------
     {1,2,3,4,5}
    (1 row)

Note that in Python, strings are sequences, which can have undesirable effects that might be familiar to Python programmers:

    CREATE FUNCTION return_str_arr()
      RETURNS varchar[]
    AS $$
      return "hello"
    $$ LANGUAGE plpythonu;

    SELECT return_str_arr();
     return_str_arr
    ----------------
     {h,e,l,l,o}
    (1 row)

Composite Types

Composite-type arguments are passed to the function as Python mappings. The element names of the mapping are the attribute names of the composite type. If an attribute in the passed row has the null value, it has the value None in the mapping. Here is an example:

    CREATE TABLE employee (
      name text,
      salary integer,
      age integer
    );

    CREATE FUNCTION overpaid (e employee)
      RETURNS boolean
    AS $$
      if e["salary"] > 200000:
        return True
      if (e["age"] < 30) and (e["salary"] > 100000):
        return True
      return False
    $$ LANGUAGE plpythonu;

There are multiple ways to return row or composite types from a Python function.
The following examples assume we have:

    CREATE TYPE named_value AS (
      name   text,
      value  integer
    );

A composite result can be returned as a:

Sequence type (a tuple or list, but not a set because it is not indexable)

Returned sequence objects must have the same number of items as the composite result type has fields. The item with index 0 is assigned to the first field of the composite type, 1 to the second and so on. For example:

    CREATE FUNCTION make_pair (name text, value integer)
      RETURNS named_value
    AS $$
      return [ name, value ]
      # or alternatively, as tuple: return ( name, value )
    $$ LANGUAGE plpythonu;

To return a SQL null for any column, insert None at the corresponding position.

Mapping (dictionary)

The value for each result type column is retrieved from the mapping with the column name as key. Example:

    CREATE FUNCTION make_pair (name text, value integer)
      RETURNS named_value
    AS $$
      return { "name": name, "value": value }
    $$ LANGUAGE plpythonu;

Any extra dictionary key/value pairs are ignored. Missing keys are treated as errors. To return a SQL null value for any column, insert None with the corresponding column name as the key.

Object (any object providing method __getattr__)

This works the same as a mapping. Example:

    CREATE FUNCTION make_pair (name text, value integer)
      RETURNS named_value
    AS $$
      class named_value:
        def __init__ (self, n, v):
          self.name = n
          self.value = v
      return named_value(name, value)

      # or simply
      class nv: pass
      nv.name = name
      nv.value = value
      return nv
    $$ LANGUAGE plpythonu;

Set-Returning Functions

A PL/Python function can also return sets of scalar or composite types. There are several ways to achieve this because the returned object is internally turned into an iterator.
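As a rough illustration of what "turned into an iterator" means, the following plain-Python sketch (run outside PostgreSQL) consumes a sequence, a hand-written iterator, and a generator in the same way. The rows_from helper is hypothetical and only approximates how the language handler reads a set result; the class and function names are likewise made up for this sketch. It is written for Python 3, where the iterator method is spelled __next__ rather than next.

```python
# Plain-Python sketch, not PL/Python itself: rows_from is a hypothetical
# stand-in for the way a set-returning function's result is consumed.

def rows_from(result):
    """Turn any iterable return value into a list of rows."""
    return list(iter(result))


class Producer:
    """Hand-written iterator (Python 3 spelling: __next__, not next)."""

    def __init__(self, how, who):
        self.how = how
        self.who = who
        self.ndx = -1

    def __iter__(self):
        return self

    def __next__(self):
        self.ndx += 1
        if self.ndx == len(self.who):
            raise StopIteration
        return (self.how, self.who[self.ndx])


def generate(how, who):
    """Generator version: each yield produces one row."""
    for name in who:
        yield (how, name)


names = ["World", "PostgreSQL", "PL/Python"]

from_sequence = rows_from([("Hello", name) for name in names])
from_iterator = rows_from(Producer("Hello", names))
from_generator = rows_from(generate("Hello", names))
```

All three variables end up holding the same three rows, which is why the choice between a sequence, an iterator object, and a generator is largely a matter of convenience.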
The following examples assume we have a composite type:

    CREATE TYPE greeting AS (
      how text,
      who text
    );

A set result can be returned from a:

Sequence type (tuple, list, set)

    CREATE FUNCTION greet (how text)
      RETURNS SETOF greeting
    AS $$
      # return tuple containing lists as composite types
      # all other combinations work also
      return ( [ how, "World" ], [ how, "PostgreSQL" ], [ how, "PL/Python" ] )
    $$ LANGUAGE plpythonu;

Iterator (any object providing __iter__ and next methods)

    CREATE FUNCTION greet (how text)
      RETURNS SETOF greeting
    AS $$
      class producer:
        def __init__ (self, how, who):
          self.how = how
          self.who = who
          self.ndx = -1

        def __iter__ (self):
          return self

        def next (self):
          self.ndx += 1
          if self.ndx == len(self.who):
            raise StopIteration
          return ( self.how, self.who[self.ndx] )

      return producer(how, [ "World", "PostgreSQL", "PL/Python" ])
    $$ LANGUAGE plpythonu;

Generator (yield)

    CREATE FUNCTION greet (how text)
      RETURNS SETOF greeting
    AS $$
      for who in [ "World", "PostgreSQL", "PL/Python" ]:
        yield ( how, who )
    $$ LANGUAGE plpythonu;

Due to Python bug #1483133, some debug versions of Python 2.4 (configured and compiled with option --with-pydebug) are known to crash the PostgreSQL server when using an iterator to return a set result. Unpatched versions of Fedora 4 contain this bug. It does not happen in production versions of Python or on patched versions of Fedora 4.

Sharing Data

The global dictionary SD is available to store data between function calls. This variable is private static data. The global dictionary GD is public data, available to all Python functions within a session. Use with care.

Each function gets its own execution environment in the Python interpreter, so that global data and function arguments from myfunc are not available to myfunc2. The exception is the data in the GD dictionary, as mentioned above.
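The difference between the private SD and the shared GD can be sketched in plain Python, outside PostgreSQL. This is only an emulation under stated assumptions: make_function and counting_body are hypothetical names, and the SD and GD objects here are ordinary dictionaries standing in for the ones that PL/Python supplies automatically inside a real function body.

```python
# Plain-Python emulation only: in a real PL/Python function body, SD and GD
# already exist and are supplied by the language handler.

GD = {}  # stand-in for the session-wide dictionary shared by all functions


def make_function(body):
    """Hypothetical helper: bind a function body to its own private SD."""
    SD = {}  # stand-in for the per-function private dictionary

    def call(*args):
        return body(SD, GD, *args)

    return call


def counting_body(SD, GD, label):
    # Private counter: survives between calls of the same function only.
    SD["calls"] = SD.get("calls", 0) + 1
    # Shared counter: visible to every function in the session.
    GD["total_calls"] = GD.get("total_calls", 0) + 1
    return (label, SD["calls"], GD["total_calls"])


myfunc = make_function(counting_body)
myfunc2 = make_function(counting_body)

first = myfunc("myfunc")     # first call of myfunc
second = myfunc("myfunc")    # second call of myfunc
other = myfunc2("myfunc2")   # first call of myfunc2
```

Here the per-function count restarts at 1 for myfunc2, while the shared count keeps growing across both functions, which is the reason GD should be used with care.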
Anonymous Code Blocks

PL/Python also supports anonymous code blocks called with the statement:

    DO $$
        # PL/Python code
    $$ LANGUAGE plpythonu;

An anonymous code block receives no arguments, and whatever value it might return is discarded. Otherwise it behaves just like a function.

Trigger Functions

When a function is used as a trigger, the dictionary TD contains trigger-related values:

TD["event"] contains the event as a string: INSERT, UPDATE, DELETE, or TRUNCATE.

TD["when"] contains one of BEFORE, AFTER, or INSTEAD OF.

TD["level"] contains ROW or STATEMENT.

TD["new"], TD["old"]: for a row-level trigger, one or both of these fields contain the respective trigger rows, depending on the trigger event.

TD["name"] contains the trigger name.

TD["table_name"] contains the name of the table on which the trigger occurred.

TD["table_schema"] contains the schema of the table on which the trigger occurred.

TD["relid"] contains the OID of the table on which the trigger occurred.

TD["args"]: if the CREATE TRIGGER command included arguments, they are available in TD["args"][0] to TD["args"][n-1].

If TD["when"] is BEFORE or INSTEAD OF and TD["level"] is ROW, you can return None or "OK" from the Python function to indicate the row is unmodified, "SKIP" to abort the event, or if TD["event"] is INSERT or UPDATE you can return "MODIFY" to indicate you've modified the new row. Otherwise the return value is ignored.

Database Access

The PL/Python language module automatically imports a Python module called plpy. The functions and constants in this module are available to you in the Python code as plpy.foo.

The plpy module provides two functions called execute and prepare. Calling plpy.execute with a query string and an optional limit argument causes that query to be run and the result to be returned in a result object. The result object emulates a list or dictionary object. The result object can be accessed by row number and column name.
It has these additional methods: nrows which returns the number of rows returned by the query, and status which is the SPI_execute() return value. The result object can be modified.

For example:

    rv = plpy.execute("SELECT * FROM my_table", 5)

returns up to 5 rows from my_table. If my_table has a column my_column, it would be accessed as:

    foo = rv[i]["my_column"]

The second function, plpy.prepare, prepares the execution plan for a query. It is called with a query string and a list of parameter types, if you have parameter references in the query. For example:

    plan = plpy.prepare("SELECT last_name FROM my_users WHERE first_name = $1", [ "text" ])

text is the type of the variable you will be passing for $1. After preparing a statement, you use the function plpy.execute to run it:

    rv = plpy.execute(plan, [ "name" ], 5)

The third argument is the limit and is optional.

Query parameters and result row fields are converted between PostgreSQL and Python data types as described in . The exception is that composite types are currently not supported: they will be rejected as query parameters and are converted to strings when appearing in a query result. As a workaround for the latter problem, the query can sometimes be rewritten so that the composite type result appears as a result row rather than as a field of the result row. Alternatively, the resulting string could be parsed apart by hand, but this approach is not recommended because it is not future-proof.

When you prepare a plan using the PL/Python module it is automatically saved. Read the SPI documentation () for a description of what this means. In order to make effective use of this across function calls one needs to use one of the persistent storage dictionaries SD or GD (see ).
For example:

    CREATE FUNCTION usesavedplan() RETURNS trigger AS $$
        # reuse the plan cached in SD by an earlier call, if there is one
        if "plan" in SD:
            plan = SD["plan"]
        else:
            plan = plpy.prepare("SELECT 1")
            SD["plan"] = plan
        # rest of function
    $$ LANGUAGE plpythonu;

Utility Functions

The plpy module also provides the functions plpy.debug(msg), plpy.log(msg), plpy.info(msg), plpy.notice(msg), plpy.warning(msg), plpy.error(msg), and plpy.fatal(msg). plpy.error and plpy.fatal actually raise a Python exception which, if uncaught, propagates out to the calling query, causing the current transaction or subtransaction to be aborted. raise plpy.ERROR(msg) and raise plpy.FATAL(msg) are equivalent to calling plpy.error and plpy.fatal, respectively. The other functions only generate messages of different priority levels. Whether messages of a particular priority are reported to the client, written to the server log, or both is controlled by the and configuration variables. See for more information.

Environment Variables

Some of the environment variables that are accepted by the Python interpreter can also be used to affect PL/Python behavior. They would need to be set in the environment of the main PostgreSQL server process, for example in a start script. The available environment variables depend on the version of Python; see the Python documentation for details. At the time of this writing, the following environment variables have an effect on PL/Python, assuming an adequate Python version:

    PYTHONHOME
    PYTHONPATH
    PYTHONY2K
    PYTHONOPTIMIZE
    PYTHONDEBUG
    PYTHONVERBOSE
    PYTHONCASEOK
    PYTHONDONTWRITEBYTECODE
    PYTHONIOENCODING
    PYTHONUSERBASE

(It appears to be a Python implementation detail beyond the control of PL/Python that some of the environment variables listed on the python man page are only effective in a command-line interpreter and not an embedded Python interpreter.)
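Because these variables must be set in the environment of the server process rather than in the client's, it can be useful to check which of them a given process actually sees. The following is a minimal plain-Python sketch of such a check; python_env_report and PLPYTHON_ENV_VARS are hypothetical names introduced here, not part of PL/Python.

```python
import os

# Variable names copied from the list in this section.
PLPYTHON_ENV_VARS = [
    "PYTHONHOME", "PYTHONPATH", "PYTHONY2K", "PYTHONOPTIMIZE", "PYTHONDEBUG",
    "PYTHONVERBOSE", "PYTHONCASEOK", "PYTHONDONTWRITEBYTECODE",
    "PYTHONIOENCODING", "PYTHONUSERBASE",
]


def python_env_report(environ=None):
    """Return {name: value} for the listed variables that are set.

    environ defaults to the environment of the current process.
    """
    if environ is None:
        environ = os.environ
    return dict(
        (name, environ[name]) for name in PLPYTHON_ENV_VARS if name in environ
    )


# Example with an explicit mapping instead of the real environment.
report = python_env_report({"PYTHONPATH": "/opt/lib", "HOME": "/root"})
```

Run from a shell, this reports the client-side environment. One possible use, offered as an assumption rather than a documented recipe, is to place the same lookup in a PL/Python function body and pass the result to plpy.notice, so that the report reflects the server process instead.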