Split copy.c into four files.

Copy.c has grown really large. Split it into more manageable parts:

- copy.c now contains only a few functions that are common to COPY FROM
  and COPY TO.
- copyto.c contains code for COPY TO.
- copyfrom.c contains code for initializing COPY FROM, and inserting the
  tuples to the correct table.
- copyfromparse.c contains code for reading from the client/file/program,
  and parsing the input text/CSV/binary format into tuples.

All of these parts are fairly complicated, and fairly independent of each
other. There is a patch being discussed to implement parallel COPY FROM,
which will add a lot of new code to the COPY FROM path, and another patch
which would allow INSERTs to use the same multi-insert machinery as COPY
FROM, both of which will require refactoring that code. With those two
patches, there's going to be a lot of code churn in copy.c anyway, so now
seems like a good time to do this refactoring.

The CopyStateData struct is also split. All the formatting options, like
FORMAT, QUOTE, ESCAPE, are put in a new CopyFormatOptions struct, which
is used by both COPY FROM and TO. Other state data are kept in separate
CopyFromStateData and CopyToStateData structs.

Reviewed-by: Soumyadeep Chakraborty, Erik Rijkers, Vignesh C, Andres Freund
Discussion: https://www.postgresql.org/message-id/8e15b560-f387-7acc-ac90-763986617bfb%40iki.fi
2020-11-23 09:50:50 +01:00

/*-------------------------------------------------------------------------
 *
 * copyfrom_internal.h
 *	  Internal definitions for COPY FROM command.
 *
 *
 * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * src/include/commands/copyfrom_internal.h
 *
 *-------------------------------------------------------------------------
 */
#ifndef COPYFROM_INTERNAL_H
#define COPYFROM_INTERNAL_H

#include "commands/copy.h"
#include "commands/trigger.h"

/*
 * Represents the different source cases we need to worry about at
 * the bottom level
 */
typedef enum CopySource
{
	COPY_FILE,					/* from file (or a piped program) */
	COPY_FRONTEND,				/* from frontend */
	COPY_CALLBACK				/* from callback function */
} CopySource;

/*
 * Represents the end-of-line terminator type of the input
 */
typedef enum EolType
{
	EOL_UNKNOWN,
	EOL_NL,
	EOL_CR,
	EOL_CRNL
} EolType;

/*
 * Represents the heap insert method to be used during COPY FROM.
 */
typedef enum CopyInsertMethod
{
	CIM_SINGLE,					/* use table_tuple_insert or fdw routine */
	CIM_MULTI,					/* always use table_multi_insert */
	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
} CopyInsertMethod;

/*
 * This struct contains all the state variables used throughout a COPY FROM
 * operation.
 */
typedef struct CopyFromStateData
{
	/* low-level state data */
	CopySource	copy_src;		/* type of copy source */
	FILE	   *copy_file;		/* used if copy_src == COPY_FILE */
	StringInfo	fe_msgbuf;		/* used if copy_src == COPY_FRONTEND */

	EolType		eol_type;		/* EOL type of input */
	int			file_encoding;	/* file or remote side's character encoding */
	bool		need_transcoding;	/* file encoding diff from server? */
	Oid			conversion_proc;	/* encoding conversion function */

	/* parameters from the COPY command */
	Relation	rel;			/* relation to copy to */
	List	   *attnumlist;		/* integer list of attnums to copy */
	char	   *filename;		/* filename, or NULL for STDIN */
	bool		is_program;		/* is 'filename' a program to popen? */
	copy_data_source_cb data_source_cb; /* function for reading data */

	CopyFormatOptions opts;
	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
	Node	   *whereClause;	/* WHERE condition (or NULL) */

	/* these are just for error messages, see CopyFromErrorCallback */
	const char *cur_relname;	/* table name for error messages */
	uint64		cur_lineno;		/* line number for error messages */
	const char *cur_attname;	/* current att for error messages */
	const char *cur_attval;		/* current att value for error messages */

	/*
	 * Working state
	 */
	MemoryContext copycontext;	/* per-copy execution context */

	AttrNumber	num_defaults;
	FmgrInfo   *in_functions;	/* array of input functions, one per attribute */
	Oid		   *typioparams;	/* array of element types for in_functions */
	int		   *defmap;			/* array of default att numbers */
	ExprState **defexprs;		/* array of default att expressions */
	bool		volatile_defexprs;	/* is any of defexprs volatile? */
	List	   *range_table;
	ExprState  *qualexpr;

	TransitionCaptureState *transition_capture;

	/*
	 * These variables are used to reduce overhead in COPY FROM.
	 *
	 * attribute_buf holds the separated, de-escaped text for each field of
	 * the current line. The CopyReadAttributes functions return arrays of
	 * pointers into this buffer. We avoid palloc/pfree overhead by re-using
	 * the buffer on each cycle.
	 *
	 * In binary COPY FROM, attribute_buf holds the binary data for the
	 * current field, but the usage is otherwise similar.
	 */
	StringInfoData attribute_buf;

	/* field raw data pointers found by COPY FROM */

	int			max_fields;
	char	  **raw_fields;

	/*
	 * Similarly, line_buf holds the whole input line being processed. The
	 * input cycle is first to read the whole line into line_buf, and then
	 * extract the individual attribute fields into attribute_buf. line_buf
	 * is preserved unmodified so that we can display it in error messages if
	 * appropriate. (In binary mode, line_buf is not used.)
	 */
	StringInfoData line_buf;
	bool		line_buf_valid; /* contains the row being processed? */

	/*
	 * input_buf holds input data, already converted to database encoding.
	 *
	 * In text mode, CopyReadLine parses this data sufficiently to locate line
	 * boundaries, then transfers the data to line_buf. We guarantee that
	 * there is a \0 at input_buf[input_buf_len] at all times. (In binary
	 * mode, input_buf is not used.)
	 *
	 * If encoding conversion is not required, input_buf is not a separate
	 * buffer but points directly to raw_buf. In that case, input_buf_len
	 * tracks the number of bytes that have been verified as valid in the
	 * database encoding, and raw_buf_len is the total number of bytes stored
	 * in the buffer.
	 */
#define INPUT_BUF_SIZE 65536	/* we palloc INPUT_BUF_SIZE+1 bytes */
	char	   *input_buf;
	int			input_buf_index;	/* next byte to process */
	int			input_buf_len;	/* total # of bytes stored */
	bool		input_reached_eof;	/* true if we reached EOF */
	bool		input_reached_error;	/* true if a conversion error happened */
	/* Shorthand for number of unconsumed bytes available in input_buf */
#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index)

	/*
	 * raw_buf holds raw input data read from the data source (file or client
	 * connection), not yet converted to the database encoding. Like with
	 * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len].
	 */
#define RAW_BUF_SIZE 65536		/* we palloc RAW_BUF_SIZE+1 bytes */
	char	   *raw_buf;
	int			raw_buf_index;	/* next byte to process */
	int			raw_buf_len;	/* total # of bytes stored */
	bool		raw_reached_eof;	/* true if we reached EOF */

	/* Shorthand for number of unconsumed bytes available in raw_buf */
#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)

	uint64		bytes_processed;	/* number of bytes processed so far */
} CopyFromStateData;

extern void ReceiveCopyBegin(CopyFromState cstate);
extern void ReceiveCopyBinaryHeader(CopyFromState cstate);

#endif /* COPYFROM_INTERNAL_H */
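
The comments on raw_buf and input_buf describe a common convention: an index of the next byte to process, a count of bytes stored, a guaranteed \0 at buf[len], and a shorthand macro for the unconsumed byte count. The following is a minimal standalone sketch of that convention, not code from PostgreSQL. DemoBuf, DEMO_BUF_SIZE, DEMO_BUF_BYTES and the demo_buf_* helpers are hypothetical names invented for illustration, and the compaction step is only one plausible way to make room for more input.

```c
#include <assert.h>
#include <string.h>

#define DEMO_BUF_SIZE 16		/* hypothetical; the header above uses 65536 */

/* Hypothetical stand-in for the raw_buf / raw_buf_index / raw_buf_len trio */
typedef struct DemoBuf
{
	char		data[DEMO_BUF_SIZE + 1];	/* +1 leaves room for the \0 */
	int			index;			/* next byte to process */
	int			len;			/* total # of bytes stored */
} DemoBuf;

/* Same shape as RAW_BUF_BYTES / INPUT_BUF_BYTES: unconsumed byte count */
#define DEMO_BUF_BYTES(buf) ((buf)->len - (buf)->index)

/* Append up to n bytes from src; returns how many bytes were stored. */
static int
demo_buf_load(DemoBuf *buf, const char *src, int n)
{
	int			room = DEMO_BUF_SIZE - buf->len;

	if (n > room)
		n = room;
	memcpy(buf->data + buf->len, src, (size_t) n);
	buf->len += n;
	buf->data[buf->len] = '\0'; /* keep the sentinel at data[len] */
	return n;
}

/* Mark up to n bytes as processed; returns how many were consumed. */
static int
demo_buf_consume(DemoBuf *buf, int n)
{
	int			avail = DEMO_BUF_BYTES(buf);

	if (n > avail)
		n = avail;
	buf->index += n;
	assert(buf->index <= buf->len);
	return n;
}

/* Move the unconsumed tail to the front so more data can be loaded. */
static void
demo_buf_compact(DemoBuf *buf)
{
	int			nbytes = DEMO_BUF_BYTES(buf);

	memmove(buf->data, buf->data + buf->index, (size_t) nbytes);
	buf->index = 0;
	buf->len = nbytes;
	buf->data[buf->len] = '\0'; /* restore the sentinel */
}
```

Keeping the sentinel at data[len] lets a scanner treat the stored bytes as a C string without a separate bounds check on every byte, which is the property the comments above guarantee for both buffers.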