LIST_DUCKLAKE_FILES

A table-valued function (TVF) that lists the data files that make up a DuckLake table at a given snapshot, along with per-file metadata such as record count, file size, and per-column min/max statistics. Unlike READ_DUCKLAKE, it does not read the row data itself — it returns one row per Parquet data file. LIST_DUCKLAKE_FILES is useful for inspecting a table’s physical layout, debugging pruning and partitioning, and understanding what Firebolt would scan for a query.

DuckLake support is experimental and may change. Only DuckLake catalogs hosted on PostgreSQL are supported.

To read the actual row data of a DuckLake table, use READ_DUCKLAKE.

Syntax

-- Using a LOCATION object (catalog connection string and credentials stored in LOCATION)
LIST_DUCKLAKE_FILES (
  LOCATION => '<location_name>'
  [, SCHEMA => '<schema_name>' ]
  [, TABLE => '<table_name>' ]
  [, SNAPSHOT_ID => <snapshot_id> ]
)

-- Using individual TVF parameters
LIST_DUCKLAKE_FILES (
  CATALOG => '<postgres_connection_string>',
  SCHEMA => '<schema_name>',
  TABLE => '<table_name>'
  [, SNAPSHOT_ID => <snapshot_id> ]
  [, S3_ENDPOINT => '<s3_endpoint>' ]
  -- Parameters for Amazon S3
  [, AWS_ACCESS_KEY_ID => '<aws_access_key_id>' ]
  [, AWS_SECRET_ACCESS_KEY => '<aws_secret_access_key>' ]
  [, AWS_SESSION_TOKEN => '<aws_session_token>' ]
  [, AWS_ROLE_ARN => '<aws_role_arn>' ]
  [, AWS_ROLE_EXTERNAL_ID => '<aws_role_external_id>' ]
)

When you pass AWS_ROLE_ARN, set the optional AWS_ROLE_EXTERNAL_ID to add a customer-controlled condition to your role’s trust policy.

For role-based AWS access you can additionally set an external ID. An external ID is a value you choose and control that AWS checks when Firebolt assumes your role, adding a second condition on top of your account’s unique IAM principal. Configuring one is a recommended best practice. See IAM roles.

The parameters are identical to those of READ_DUCKLAKE.

Parameters

Using a LOCATION object

Parameter	Description	Supported input types
`LOCATION`	The name of a location object that stores the DuckLake catalog connection string, the optional S3 endpoint, the optional storage credentials, and an optional default schema and table. `LOCATION` must be specified as a string literal.	`TEXT`
`SCHEMA`	The catalog schema the table resides in. Optional. If the location object already defines a schema, the two must match. If neither specifies a schema, `main` is used.	`TEXT`
`TABLE`	The name of the DuckLake table. Optional only if the location object already defines a table; otherwise required.	`TEXT`
`SNAPSHOT_ID`	The DuckLake snapshot to inspect. Optional. The default is the latest snapshot in the catalog.	`BIGINT`

Using individual TVF parameters

Parameter	Description	Supported input types
`CATALOG`	A PostgreSQL connection string pointing to the DuckLake catalog database, for example `postgresql://user:password@host:5432/dbname`. Required.	`TEXT`
`SCHEMA`	The catalog schema the table resides in, for example `main`. Required for this call shape.	`TEXT`
`TABLE`	The name of the DuckLake table. Required for this call shape.	`TEXT`
`SNAPSHOT_ID`	The DuckLake snapshot to inspect. Optional. The default is the latest snapshot in the catalog.	`BIGINT`
`S3_ENDPOINT`	The S3-compatible endpoint that serves the data files. Optional. Omit it when the data files live on a local filesystem.	`TEXT`

Amazon S3 parameters

Use these parameters to authenticate to the object storage that holds the Parquet data files. They apply only to the individual-parameter call shape.

Parameter	Description	Supported input types
`AWS_ACCESS_KEY_ID`	The AWS access key ID.	`TEXT`
`AWS_SECRET_ACCESS_KEY`	The AWS secret access key.	`TEXT`
`AWS_SESSION_TOKEN`	The AWS session token.	`TEXT`
`AWS_ROLE_ARN`	The AWS role ARN.	`TEXT`
`AWS_ROLE_EXTERNAL_ID`	The AWS role external ID.	`TEXT`

Return type

The result is a table with one row per Parquet data file in the requested snapshot. The leading columns are fixed; the trailing columns depend on the table’s schema and partitioning. Fixed columns:

Column	Type	Description
`file_path`	`TEXT`	Full path to the data file.
`file_format`	`TEXT`	File format of the data file (for example, `parquet`).
`record_count`	`BIGINT`	Number of records in the file.
`file_size`	`BIGINT`	File size in bytes.
`file_name`	`TEXT`	The file name component of the path.
`file_bucket`	`TEXT`	The storage bucket the file resides in.
`file_storage_api`	`TEXT`	The storage API used to access the file.
`deleted_positions`	`BYTEA`	A serialized roaring bitmap of row positions deleted from the file.

Per-column statistics columns: For every column in the table, two nullable columns are appended, holding the per-file minimum and maximum value:

min_<column_name>
max_<column_name>

Partition value columns: For every partition column, a partition_value_<column_name>_<field_id> column holds the file’s partition value.

Examples

Listing files using a LOCATION

SELECT *
FROM LIST_DUCKLAKE_FILES(
  LOCATION => 'my_ducklake_loc',
  SCHEMA => 'main',
  TABLE => 'my_first_ducklake_table'
);

Inspecting file sizes and record counts

SELECT file_name, record_count, file_size
FROM LIST_DUCKLAKE_FILES(
  LOCATION => 'my_ducklake_loc',
  TABLE => 'my_first_ducklake_table'
)
ORDER BY file_size DESC;

Listing files using individual parameters

SELECT file_path, record_count
FROM LIST_DUCKLAKE_FILES(
  CATALOG => 'postgresql://dl_user:dl_pw@127.0.0.1:5432/dl_db',
  SCHEMA => 'main',
  TABLE => 'my_first_ducklake_table'
);

Limitations

Only PostgreSQL DuckLake catalogs are supported. The data files must be Parquet.
The table’s column types must be readable by Firebolt. See Supported data types.

READ_DUCKLAKE — Read the row data of a DuckLake table.
CREATE LOCATION (DuckLake) — Store the catalog connection string and credentials in a reusable location object instead of passing them on every call.
DuckLake functions overview — All DuckLake functions in one place.
Query DuckLake tables with Firebolt Core — A step-by-step guide to setting up a DuckLake catalog and reading it from Firebolt.

Overview

Performance and Observability

Security

Self-Managed

Managed service

Guides

SQL reference

Release notes

API reference

Legal

Syntax

Parameters

Using a LOCATION object

Using individual TVF parameters

Amazon S3 parameters

Return type

Examples

Listing files using a LOCATION

Inspecting file sizes and record counts

Listing files using individual parameters

Limitations

​Syntax

​Parameters

​Using a LOCATION object

​Using individual TVF parameters

​Amazon S3 parameters

​Return type

​Examples

​Listing files using a LOCATION

​Inspecting file sizes and record counts

​Listing files using individual parameters

​Limitations

​Related

Syntax

Parameters

Using a LOCATION object

Using individual TVF parameters

Amazon S3 parameters

Return type

Examples

Listing files using a LOCATION

Inspecting file sizes and record counts

Listing files using individual parameters

Limitations

Related