A table-valued function (TVF) that lists the data files that make up a DuckLake table at a given snapshot, along with per-file metadata such as record count, file size, and per-column min/max statistics. Unlike READ_DUCKLAKE, it does not read the row data itself — it returns one row per Parquet data file.
LIST_DUCKLAKE_FILES is useful for inspecting a table’s physical layout, debugging pruning and partitioning, and understanding what Firebolt would scan for a query.
DuckLake support is experimental and disabled by default. Set enable_ducklake = true to use it, and expect the behavior to change. Only DuckLake catalogs hosted on PostgreSQL are supported.
Topics:
To read the actual row data of a DuckLake table, use READ_DUCKLAKE.
Syntax
-- Using a LOCATION object (catalog connection string and credentials stored in LOCATION)
LIST_DUCKLAKE_FILES (
LOCATION => '<location_name>'
[, SCHEMA => '<schema_name>' ]
[, TABLE => '<table_name>' ]
[, SNAPSHOT_ID => <snapshot_id> ]
)
-- Using individual TVF parameters
LIST_DUCKLAKE_FILES (
CATALOG => '<postgres_connection_string>',
SCHEMA => '<schema_name>',
TABLE => '<table_name>'
[, SNAPSHOT_ID => <snapshot_id> ]
[, S3_ENDPOINT => '<s3_endpoint>' ]
-- Parameters for Amazon S3
[, AWS_ACCESS_KEY_ID => '<aws_access_key_id>' ]
[, AWS_SECRET_ACCESS_KEY => '<aws_secret_access_key>' ]
[, AWS_SESSION_TOKEN => '<aws_session_token>' ]
[, AWS_ROLE_ARN => '<aws_role_arn>' ]
[, AWS_ROLE_EXTERNAL_ID => '<aws_role_external_id>' ]
)
The parameters are identical to those of READ_DUCKLAKE.
Parameters
Using a LOCATION object
| Parameter | Description | Supported input types |
|---|
LOCATION | The name of a location object that stores the DuckLake catalog connection string, the optional S3 endpoint, the optional storage credentials, and an optional default schema and table. LOCATION must be specified as a string literal. | TEXT |
SCHEMA | The catalog schema the table resides in. Optional. If the location object already defines a schema, the two must match. If neither specifies a schema, main is used. | TEXT |
TABLE | The name of the DuckLake table. Optional only if the location object already defines a table; otherwise required. | TEXT |
SNAPSHOT_ID | The DuckLake snapshot to inspect. Optional. The default is the latest snapshot in the catalog. | BIGINT |
Using individual TVF parameters
| Parameter | Description | Supported input types |
|---|
CATALOG | A PostgreSQL connection string pointing to the DuckLake catalog database, for example postgresql://user:password@host:5432/dbname. Required. | TEXT |
SCHEMA | The catalog schema the table resides in, for example main. Required for this call shape. | TEXT |
TABLE | The name of the DuckLake table. Required for this call shape. | TEXT |
SNAPSHOT_ID | The DuckLake snapshot to inspect. Optional. The default is the latest snapshot in the catalog. | BIGINT |
S3_ENDPOINT | The S3-compatible endpoint that serves the data files. Optional. Omit it when the data files live on a local filesystem. | TEXT |
Amazon S3 parameters
Use these parameters to authenticate to the object storage that holds the Parquet data files. They apply only to the individual-parameter call shape.
| Parameter | Description | Supported input types |
|---|
AWS_ACCESS_KEY_ID | The AWS access key ID. | TEXT |
AWS_SECRET_ACCESS_KEY | The AWS secret access key. | TEXT |
AWS_SESSION_TOKEN | The AWS session token. | TEXT |
AWS_ROLE_ARN | The AWS role ARN. | TEXT |
AWS_ROLE_EXTERNAL_ID | The AWS role external ID. | TEXT |
Return type
The result is a table with one row per Parquet data file in the requested snapshot. The leading columns are fixed; the trailing columns depend on the table’s schema and partitioning.
Fixed columns:
| Column | Type | Description |
|---|
file_path | TEXT | Full path to the data file. |
file_format | TEXT | File format of the data file (for example, parquet). |
record_count | BIGINT | Number of records in the file. |
file_size | BIGINT | File size in bytes. |
file_name | TEXT | The file name component of the path. |
file_bucket | TEXT | The storage bucket the file resides in. |
file_storage_api | TEXT | The storage API used to access the file. |
deleted_positions | BYTEA | A serialized roaring bitmap of row positions deleted from the file. |
Per-column statistics columns:
For every column in the table, two nullable columns are appended, holding the per-file minimum and maximum value:
min_<column_name>
max_<column_name>
Partition value columns:
For every partition column, a partition_value_<column_name>_<field_id> column holds the file’s partition value.
Enabling DuckLake
DuckLake support is disabled by default. Enable it for your session before you query DuckLake tables:
SET enable_ducklake = true;
LIST_DUCKLAKE_FILES runs on a user engine. It is not supported on the system engine.
Examples
Listing files using a LOCATION
SET enable_ducklake = true;
SELECT *
FROM LIST_DUCKLAKE_FILES(
LOCATION => 'my_ducklake_loc',
SCHEMA => 'main',
TABLE => 'my_first_ducklake_table'
);
Inspecting file sizes and record counts
SELECT file_name, record_count, file_size
FROM LIST_DUCKLAKE_FILES(
LOCATION => 'my_ducklake_loc',
TABLE => 'my_first_ducklake_table'
)
ORDER BY file_size DESC;
Listing files using individual parameters
SELECT file_path, record_count
FROM LIST_DUCKLAKE_FILES(
CATALOG => 'postgresql://dl_user:dl_pw@127.0.0.1:5432/dl_db',
SCHEMA => 'main',
TABLE => 'my_first_ducklake_table'
);
Limitations
- DuckLake support is disabled by default. Set
enable_ducklake = true to use it.
LIST_DUCKLAKE_FILES runs on a user engine; the system engine is not supported.
- Only PostgreSQL DuckLake catalogs are supported. The data files must be Parquet.
- The table’s column types must be readable by Firebolt. See Supported data types.