# COPY INTO <table>

`COPY` moves data between Databend tables and object storage systems (such as Amazon S3 and S3-compatible services).

`COPY INTO <table>` loads data from staged files into a table. The files must be staged in one of the following locations:

- Named internal stage, where files can be staged using the PUT to Stage API.
- Named external stage that references an external location (an S3-compatible object storage system).
- External location (an S3-compatible object storage system).
## Syntax
```sql
COPY INTO [<database>.]<table_name>
FROM { internalStage | externalStage | externalLocation }
[ FILES = ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
[ PATTERN = '<regex_pattern>' ]
[ FILE_FORMAT = ( TYPE = { CSV | JSON | PARQUET } [ formatTypeOptions ] ) ]
[ copyOptions ]
```
Where:
### internalStage

```
internalStage ::= @<internal_stage_name>[/<path>]
```
### externalStage

```
externalStage ::= @<external_stage_name>[/<path>]
```
### externalLocation (for Amazon S3-compatible storage)

```
externalLocation ::=
  's3://<bucket>[/<path>]'
  [ CREDENTIALS = ( AWS_KEY_ID = '<string>' AWS_SECRET_KEY = '<string>' ) ]
  [ ENDPOINT_URL = '<endpoint_url>' ]
```
| Parameters | Description | Required |
|---|---|---|
| `s3://<bucket>[/<path>]` | Files are in the specified external location (an S3-compatible bucket). | Yes |
| `CREDENTIALS = ( AWS_KEY_ID = '<string>' AWS_SECRET_KEY = '<string>' )` | The credentials for connecting to AWS and accessing the private/protected S3 bucket where the files to load are staged. | Optional |
| `ENDPOINT_URL = '<endpoint_url>'` | The endpoint URL of an S3-compatible service such as MinIO. Default: `https://s3.amazonaws.com`. | Optional |
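For example, a minimal sketch of loading from an S3-compatible service such as MinIO; the endpoint `http://127.0.0.1:9000`, the bucket name, and the keys are placeholder assumptions:

```sql
COPY INTO mytable
  FROM 's3://mybucket/data/'
  CREDENTIALS = (AWS_KEY_ID = '<your_key_id>' AWS_SECRET_KEY = '<your_secret_key>')
  ENDPOINT_URL = 'http://127.0.0.1:9000'
  FILE_FORMAT = (TYPE = 'PARQUET');
```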
### FILES = ( '<file_name>' [ , '<file_name>' ... ] )

Specifies a list of one or more file names (separated by commas) to be loaded.
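For instance, a minimal sketch that loads two explicitly named files; the stage and file names are placeholders:

```sql
COPY INTO mytable
  FROM '@my_internal_s1'
  FILES = ('books-1.parquet', 'books-2.parquet')
  FILE_FORMAT = (TYPE = 'PARQUET');
```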
### PATTERN = '<regex_pattern>'

A regular expression pattern string, enclosed in single quotes, specifying the file names to match.
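For example, this sketch matches every staged file whose name ends in `parquet` (stage name as in the examples below):

```sql
COPY INTO mytable
  FROM '@my_internal_s1'
  PATTERN = '.*parquet'
  FILE_FORMAT = (TYPE = 'PARQUET');
```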
### formatTypeOptions

```
formatTypeOptions ::=
  RECORD_DELIMITER = '<character>'
  FIELD_DELIMITER = '<character>'
  SKIP_HEADER = <integer>
  COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
```
#### RECORD_DELIMITER = '<character>'

Description: One character that separates records in an input file.

Default: `'\n'`
#### FIELD_DELIMITER = '<character>'

Description: One character that separates fields in an input file.

Default: `','` (comma)
#### SKIP_HEADER = <integer>

Description: Number of lines at the start of the file to skip.

Default: `0`
#### COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE

Description: String that represents the compression algorithm of the files to load.

Default: `NONE`

Values:

| Values | Notes |
|---|---|
| `AUTO` | Compression is detected automatically from the file extension. |
| `GZIP` | |
| `BZ2` | |
| `BROTLI` | Must be specified if loading/unloading Brotli-compressed files. |
| `ZSTD` | Zstandard v0.8 (and higher) is supported. |
| `DEFLATE` | Deflate-compressed files (with zlib header, RFC 1950). |
| `RAW_DEFLATE` | Deflate-compressed files (without any header, RFC 1951). |
| `NONE` | Indicates that the files have not been compressed. |
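Putting the format options together, a minimal sketch for a pipe-delimited CSV with one header line, compressed with gzip; the stage name is taken from the examples below and the delimiter choice is illustrative:

```sql
COPY INTO mytable
  FROM '@my_internal_s1'
  FILE_FORMAT = (
    TYPE = 'CSV'
    FIELD_DELIMITER = '|'
    RECORD_DELIMITER = '\n'
    SKIP_HEADER = 1
    COMPRESSION = GZIP
  );
```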
### copyOptions

```
copyOptions ::=
  [ SIZE_LIMIT = <num> ]
```

| Parameters | Description | Required |
|---|---|---|
| `SIZE_LIMIT = <num>` | Number (> 0) that specifies the maximum number of rows to be loaded by a given COPY statement. Default: `0` (no limit). | Optional |
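For example, a sketch that loads at most 100 rows from the staged files (stage name as in the examples below):

```sql
COPY INTO mytable
  FROM '@my_internal_s1'
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
  SIZE_LIMIT = 100;
```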
## Examples
### Loading Files from an Internal Stage

First, create a named internal stage:

```sql
CREATE STAGE my_internal_s1;
```

Then, PUT a local file to the `my_internal_s1` stage with the PUT to Stage API:

```shell
curl -H "stage_name:my_internal_s1" -F "upload=@books.parquet" -XPUT "http://localhost:8081/v1/upload_to_stage"
```

Finally, list the staged files and copy them into `mytable` from the `my_internal_s1` named internal stage:

```sql
LIST @my_internal_s1;
COPY INTO mytable FROM '@my_internal_s1' pattern = 'books.*parquet' file_format = (type = 'PARQUET');
```
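If the copy succeeds, a quick row count on `mytable` shows the loaded data (assuming the table was empty before the load):

```sql
SELECT COUNT(*) FROM mytable;
```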
### Loading Files from an External Stage

First, create a named external stage:

```sql
CREATE STAGE my_external_s1 url = 's3://testbucket/admin/data/' credentials=(aws_key_id='minioadmin' aws_secret_key='minioadmin');
```

Then, copy the files into `mytable` from the `my_external_s1` named external stage:

```sql
COPY INTO mytable FROM '@my_external_s1' pattern = 'books.*parquet' file_format = (type = 'PARQUET');
```
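As with an internal stage, you can list the files in the external stage first to preview what the pattern will match:

```sql
LIST @my_external_s1;
```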
### Loading Files Directly from an External Location

#### Amazon S3

Read at most 10 rows from a CSV file and insert them into `mytable`:

```sql
COPY INTO mytable
  FROM 's3://mybucket/data.csv'
  credentials=(aws_key_id='<AWS_ACCESS_KEY_ID>' aws_secret_key='<AWS_SECRET_ACCESS_KEY>')
  FILE_FORMAT = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 1) size_limit=10;
```
Load data from a gzip-compressed CSV file and insert it into `mytable`:

```sql
COPY INTO mytable
  FROM 's3://mybucket/data.csv.gz'
  credentials=(aws_key_id='<AWS_ACCESS_KEY_ID>' aws_secret_key='<AWS_SECRET_ACCESS_KEY>')
  FILE_FORMAT = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 1 compression = GZIP) size_limit=10;
```