Skip to main content
A table-valued function (TVF) that lists the data files that make up a DuckLake table at a given snapshot, along with per-file metadata such as record count, file size, and per-column min/max statistics. Unlike READ_DUCKLAKE, it does not read the row data itself — it returns one row per Parquet data file. LIST_DUCKLAKE_FILES is useful for inspecting a table’s physical layout, debugging pruning and partitioning, and understanding what Firebolt would scan for a query.
DuckLake support is experimental and disabled by default. Set enable_ducklake = true to use it, and expect the behavior to change. Only DuckLake catalogs hosted on PostgreSQL are supported.
Topics: To read the actual row data of a DuckLake table, use READ_DUCKLAKE.

Syntax

-- Using a LOCATION object (catalog connection string and credentials stored in LOCATION)
LIST_DUCKLAKE_FILES (
  LOCATION => '<location_name>'
  [, SCHEMA => '<schema_name>' ]
  [, TABLE => '<table_name>' ]
  [, SNAPSHOT_ID => <snapshot_id> ]
)

-- Using individual TVF parameters
LIST_DUCKLAKE_FILES (
  CATALOG => '<postgres_connection_string>',
  SCHEMA => '<schema_name>',
  TABLE => '<table_name>'
  [, SNAPSHOT_ID => <snapshot_id> ]
  [, S3_ENDPOINT => '<s3_endpoint>' ]
  -- Parameters for Amazon S3
  [, AWS_ACCESS_KEY_ID => '<aws_access_key_id>' ]
  [, AWS_SECRET_ACCESS_KEY => '<aws_secret_access_key>' ]
  [, AWS_SESSION_TOKEN => '<aws_session_token>' ]
  [, AWS_ROLE_ARN => '<aws_role_arn>' ]
  [, AWS_ROLE_EXTERNAL_ID => '<aws_role_external_id>' ]
)
The parameters are identical to those of READ_DUCKLAKE.

Parameters

Using a LOCATION object

ParameterDescriptionSupported input types
LOCATIONThe name of a location object that stores the DuckLake catalog connection string, the optional S3 endpoint, the optional storage credentials, and an optional default schema and table. LOCATION must be specified as a string literal.TEXT
SCHEMAThe catalog schema the table resides in. Optional. If the location object already defines a schema, the two must match. If neither specifies a schema, main is used.TEXT
TABLEThe name of the DuckLake table. Optional only if the location object already defines a table; otherwise required.TEXT
SNAPSHOT_IDThe DuckLake snapshot to inspect. Optional. The default is the latest snapshot in the catalog.BIGINT

Using individual TVF parameters

ParameterDescriptionSupported input types
CATALOGA PostgreSQL connection string pointing to the DuckLake catalog database, for example postgresql://user:password@host:5432/dbname. Required.TEXT
SCHEMAThe catalog schema the table resides in, for example main. Required for this call shape.TEXT
TABLEThe name of the DuckLake table. Required for this call shape.TEXT
SNAPSHOT_IDThe DuckLake snapshot to inspect. Optional. The default is the latest snapshot in the catalog.BIGINT
S3_ENDPOINTThe S3-compatible endpoint that serves the data files. Optional. Omit it when the data files live on a local filesystem.TEXT

Amazon S3 parameters

Use these parameters to authenticate to the object storage that holds the Parquet data files. They apply only to the individual-parameter call shape.
ParameterDescriptionSupported input types
AWS_ACCESS_KEY_IDThe AWS access key ID.TEXT
AWS_SECRET_ACCESS_KEYThe AWS secret access key.TEXT
AWS_SESSION_TOKENThe AWS session token.TEXT
AWS_ROLE_ARNThe AWS role ARN.TEXT
AWS_ROLE_EXTERNAL_IDThe AWS role external ID.TEXT

Return type

The result is a table with one row per Parquet data file in the requested snapshot. The leading columns are fixed; the trailing columns depend on the table’s schema and partitioning. Fixed columns:
ColumnTypeDescription
file_pathTEXTFull path to the data file.
file_formatTEXTFile format of the data file (for example, parquet).
record_countBIGINTNumber of records in the file.
file_sizeBIGINTFile size in bytes.
file_nameTEXTThe file name component of the path.
file_bucketTEXTThe storage bucket the file resides in.
file_storage_apiTEXTThe storage API used to access the file.
deleted_positionsBYTEAA serialized roaring bitmap of row positions deleted from the file.
Per-column statistics columns: For every column in the table, two nullable columns are appended, holding the per-file minimum and maximum value:
  • min_<column_name>
  • max_<column_name>
Partition value columns: For every partition column, a partition_value_<column_name>_<field_id> column holds the file’s partition value.

Enabling DuckLake

DuckLake support is disabled by default. Enable it for your session before you query DuckLake tables:
SET enable_ducklake = true;
LIST_DUCKLAKE_FILES runs on a user engine. It is not supported on the system engine.

Examples

Listing files using a LOCATION

SET enable_ducklake = true;

SELECT *
FROM LIST_DUCKLAKE_FILES(
  LOCATION => 'my_ducklake_loc',
  SCHEMA => 'main',
  TABLE => 'my_first_ducklake_table'
);

Inspecting file sizes and record counts

SELECT file_name, record_count, file_size
FROM LIST_DUCKLAKE_FILES(
  LOCATION => 'my_ducklake_loc',
  TABLE => 'my_first_ducklake_table'
)
ORDER BY file_size DESC;

Listing files using individual parameters

SELECT file_path, record_count
FROM LIST_DUCKLAKE_FILES(
  CATALOG => 'postgresql://dl_user:dl_pw@127.0.0.1:5432/dl_db',
  SCHEMA => 'main',
  TABLE => 'my_first_ducklake_table'
);

Limitations

  • DuckLake support is disabled by default. Set enable_ducklake = true to use it.
  • LIST_DUCKLAKE_FILES runs on a user engine; the system engine is not supported.
  • Only PostgreSQL DuckLake catalogs are supported. The data files must be Parquet.
  • The table’s column types must be readable by Firebolt. See Supported data types.