A table-valued function (TVF) that reads data from DuckLake tables. DuckLake keeps table metadata in a SQL catalog database (PostgreSQL) and stores the underlying data as Parquet files in object storage or on a local filesystem. READ_DUCKLAKE connects to the catalog, resolves the requested table and snapshot, and returns a table with the data from that snapshot.
The function can use either a location object or individual TVF parameters to access the catalog and data.
DuckLake support is experimental and disabled by default. Set enable_ducklake = true to use it, and expect the behavior to change. Only DuckLake catalogs hosted on PostgreSQL are supported.
Topics:
To inspect the underlying data files of a DuckLake table without reading the row data, use LIST_DUCKLAKE_FILES.
Syntax
-- Using a LOCATION object (catalog connection string and credentials stored in LOCATION)
READ_DUCKLAKE (
LOCATION => '<location_name>'
[, SCHEMA => '<schema_name>' ]
[, TABLE => '<table_name>' ]
[, SNAPSHOT_ID => <snapshot_id> ]
)
-- Using individual TVF parameters
READ_DUCKLAKE (
CATALOG => '<postgres_connection_string>',
SCHEMA => '<schema_name>',
TABLE => '<table_name>'
[, SNAPSHOT_ID => <snapshot_id> ]
[, S3_ENDPOINT => '<s3_endpoint>' ]
-- Parameters for Amazon S3
[, AWS_ACCESS_KEY_ID => '<aws_access_key_id>' ]
[, AWS_SECRET_ACCESS_KEY => '<aws_secret_access_key>' ]
[, AWS_SESSION_TOKEN => '<aws_session_token>' ]
[, AWS_ROLE_ARN => '<aws_role_arn>' ]
[, AWS_ROLE_EXTERNAL_ID => '<aws_role_external_id>' ]
)
Parameters
Using a LOCATION object
| Parameter | Description | Supported input types |
|---|
LOCATION | The name of a location object that stores the DuckLake catalog connection string, the optional S3 endpoint, the optional storage credentials, and an optional default schema and table. Firebolt recommends using LOCATION to store the catalog connection string and credentials. LOCATION must be specified as a string literal (e.g., LOCATION => 'my_location'). | TEXT |
SCHEMA | The catalog schema the table resides in. Optional. If the location object already defines a schema, the two must match. If neither the location nor the call specifies a schema, main is used. | TEXT |
TABLE | The name of the DuckLake table to read. Optional only if the location object already defines a table; otherwise required. If both specify a table, the two must match. | TEXT |
SNAPSHOT_ID | The DuckLake snapshot to read. Optional. The default is the latest snapshot in the catalog. | BIGINT |
For syntax details, see CREATE LOCATION (DuckLake).
Using individual TVF parameters
| Parameter | Description | Supported input types |
|---|
CATALOG | A PostgreSQL connection string pointing to the DuckLake catalog database, for example postgresql://user:password@host:5432/dbname. Required. | TEXT |
SCHEMA | The catalog schema the table resides in, for example main. Required for this call shape. | TEXT |
TABLE | The name of the DuckLake table to read. Required for this call shape. | TEXT |
SNAPSHOT_ID | The DuckLake snapshot to read. Optional. The default is the latest snapshot in the catalog. | BIGINT |
S3_ENDPOINT | The S3-compatible endpoint that serves the data files, for example http://minio:9000 or https://s3.us-east-1.amazonaws.com. Optional. Omit it when the data files live on a local filesystem. | TEXT |
Amazon S3 parameters
Use these parameters to authenticate to the object storage that holds the Parquet data files. They apply only to the individual-parameter call shape; with a location object, store credentials in the location instead.
| Parameter | Description | Supported input types |
|---|
AWS_ACCESS_KEY_ID | The AWS access key ID. | TEXT |
AWS_SECRET_ACCESS_KEY | The AWS secret access key. | TEXT |
AWS_SESSION_TOKEN | The AWS session token. | TEXT |
AWS_ROLE_ARN | The AWS role ARN. | TEXT |
AWS_ROLE_EXTERNAL_ID | The AWS role external ID. | TEXT |
Return type
The result is a table with the data from the requested DuckLake snapshot. Columns are read using the data types defined in the DuckLake catalog, mapped to the corresponding Firebolt types. See Supported data types.
Enabling DuckLake
DuckLake support is disabled by default. Enable it for your session before you query DuckLake tables:
SET enable_ducklake = true;
READ_DUCKLAKE runs on a user engine. It is not supported on the system engine.
Best practices
- Use a
LOCATION object to store the catalog connection string and credentials. The connection string typically contains a password, and a location object centralizes credential management so you don’t repeat secrets in every query. See CREATE LOCATION (DuckLake).
- Use views to simplify repeated queries. Wrap
READ_DUCKLAKE in a view so you can query a DuckLake table without repeating the TVF call. See Simplifying queries with views.
- Pin a
SNAPSHOT_ID for reproducible reads. By default Firebolt reads the latest snapshot. Specify SNAPSHOT_ID when you need a stable, repeatable view of the data.
- Specify all parameters using the named-parameter syntax (
TABLE => 'my_table') rather than relying on parameter positions.
How inlined data is handled
DuckLake doesn’t always write new rows to Parquet files. To keep small or frequent writes cheap, it can inline them — storing the rows directly in the catalog database (PostgreSQL) instead of flushing them to a Parquet data file. DuckLake decides this when it writes the data, based on its data_inlining_row_limit setting, which you configure on the DuckLake side when you attach the catalog. Firebolt never writes to DuckLake, so it doesn’t control inlining; it only reads whatever the catalog describes.
READ_DUCKLAKE reads inlined rows transparently. A single call returns the complete table at the requested snapshot — both the rows in Parquet files and the rows inlined in the catalog — so you don’t query inlined tables any differently.
Two behaviors follow from this:
LIST_DUCKLAKE_FILES doesn’t report inlined rows. It lists Parquet data files, and inlined rows have no file. A table can therefore return more rows from READ_DUCKLAKE than the file record counts in LIST_DUCKLAKE_FILES add up to.
- Metadata pseudo-columns use sentinel values for inlined rows. Inlined rows have no source file, so for those rows
$source_file_name (and the other TEXT source pseudo-columns) returns 'inlined', and $source_file_row_number returns 0.
Firebolt currently reads at most one inlined data table per DuckLake table. Schema evolution can leave inlined rows spread across more than one catalog table (one per schema version); reading such a table fails with Expected exactly one inlined data table. Multiple inlined data tables are not supported yet.
Inspecting the plan
When a table has inlined data, READ_DUCKLAKE plans as a union of two branches:
- A Parquet branch —
read_from_s3 over the data files, with a positional-delete filter, fed by list_ducklake_files for the file metadata and statistics.
- An inlined-data branch —
read_ducklake_inlined_data, which reads the rows directly from the catalog database.
Use EXPLAIN (PHYSICAL) to see both branches. The output below is abbreviated — the read_from_s3 arguments, the list_ducklake_files column list, and the [Types] annotations are trimmed for readability:
EXPLAIN (PHYSICAL)
SELECT *
FROM READ_DUCKLAKE(LOCATION => 'my_ducklake_loc', TABLE => 't')
ORDER BY id;
[0] [Projection] id, name
\_[1] [Sort] OrderBy: [id Ascending Last]
\_[2] [MaybeCache]
\_[3] [Union]
\_[4] [Projection] id, name
| \_[5] [Filter] (not roaring_bitmap_contains(deleted_positions, ($source_file_row_number - 1)))
| \_[6] [TableFuncScan] id: $0.id, name: $0.name, $source_file_row_number: $0.$source_file_row_number
| | $0 = read_from_s3(..., type=Ducklake, ducklake_schema='main', ducklake_table='t', ducklake_snapshot_id=<snap>, ...)
| \_[7] [MaybeCache]
| \_[8] [TableFuncScan] file_format: $0.file_format, ..., min_name: $0.min_name, max_name: $0.max_name
| $0 = list_ducklake_files(catalog => '****', schema => 'main', table => 't', snapshot_id => <snap>)
\_[9] [MaybeCache]
\_[10] [TableFuncScan] id: $0.id, name: $0.name
$0 = read_ducklake_inlined_data(catalog => '****', tables => [ducklake_inlined_data_<tbl>])
The [Union] node (3) combines the two sources. The first branch (nodes 4–8) reads and delete-filters the Parquet files; the second branch (nodes 9–10) reads the inlined rows from PostgreSQL. The [MaybeCache] nodes cache the metadata and data subresults so that repeated reads of the same snapshot can skip the work. When a table has no inlined data, the [Union] and the read_ducklake_inlined_data branch don’t appear, and the plan is just the Parquet branch.
Examples
Reading using a LOCATION
The following example creates a location object that stores the catalog connection string and storage credentials, then reads from a table:
SET enable_ducklake = true;
CREATE LOCATION my_ducklake_loc WITH
SOURCE = DUCKLAKE
CATALOG = 'postgresql://dl_user:dl_pw@127.0.0.1:5432/dl_db';
SELECT *
FROM READ_DUCKLAKE(
LOCATION => 'my_ducklake_loc',
SCHEMA => 'main',
TABLE => 'my_first_ducklake_table'
)
LIMIT 5;
If the location object already defines the schema and table, you can omit them:
SELECT * FROM READ_DUCKLAKE(LOCATION => 'my_ducklake_loc') LIMIT 5;
For more examples of LOCATION, see CREATE LOCATION (DuckLake).
Reading a specific snapshot
SELECT *
FROM READ_DUCKLAKE(
LOCATION => 'my_ducklake_loc',
TABLE => 'my_first_ducklake_table',
SNAPSHOT_ID => 3
)
LIMIT 5;
Reading with individual parameters
When the data files live on a local filesystem (for example, a Firebolt Core deployment with on-disk Parquet files), pass the catalog connection string directly and omit the S3 parameters:
SELECT *
FROM READ_DUCKLAKE(
CATALOG => 'postgresql://dl_user:dl_pw@127.0.0.1:5432/dl_db',
SCHEMA => 'main',
TABLE => 'my_first_ducklake_table'
)
LIMIT 5;
When the data files are stored in S3-compatible object storage, provide the endpoint and credentials:
SELECT *
FROM READ_DUCKLAKE(
CATALOG => 'postgresql://dl_user:dl_pw@pg-host:5432/dl_db',
SCHEMA => 'main',
TABLE => 'users',
S3_ENDPOINT => 'http://minio:9000',
AWS_ACCESS_KEY_ID => 'my-access-key',
AWS_SECRET_ACCESS_KEY => 'my-secret-key'
)
ORDER BY id
LIMIT 5;
Simplifying queries with views
You can wrap READ_DUCKLAKE in a view for easier querying. This also works with LOCATION-based calls.
CREATE VIEW users AS
SELECT * FROM READ_DUCKLAKE(
LOCATION => 'my_ducklake_loc',
TABLE => 'users'
);
SELECT * FROM users LIMIT 10;
Once the view exists, you can use it in joins and complex queries without referencing READ_DUCKLAKE directly. For more information about views, see CREATE VIEW.
Inspecting the query plan
Use EXPLAIN (ANALYZE) to see how Firebolt reads the DuckLake table, including how metadata and data caching behave across repeated runs:
EXPLAIN (ANALYZE)
SELECT *
FROM READ_DUCKLAKE(
LOCATION => 'my_ducklake_loc',
TABLE => 'my_first_ducklake_table'
)
ORDER BY r DESC
LIMIT 5;
Supported data types
Firebolt maps DuckLake column types to Firebolt types as follows.
| DuckLake type | Firebolt type |
|---|
boolean | BOOLEAN |
int8, int16, int32 | INT |
int64 | BIGINT |
uint8, uint16 | INT |
uint32 | BIGINT |
float32 | REAL |
float64 | DOUBLE PRECISION |
decimal(p, s), numeric(p, s) | DECIMAL(p, s), NUMERIC(p, s) |
date | DATE |
timestamp | TIMESTAMP (microsecond precision, no time zone) |
timestamptz | TIMESTAMPTZ (microsecond precision, with time zone) |
varchar | TEXT |
blob | BYTEA |
list | ARRAY (of the supported element type) |
The following DuckLake types are currently not supported. Reading a table that contains one of these types fails with an error:
- Integers wider than Firebolt supports:
uint64, int128, uint128.
- Time and sub-second/extended timestamp types:
time, timetz, timestamp_s, timestamp_ms, timestamp_ns, interval.
json, uuid, and variant.
- Nested
struct and map types. (Nested list columns are supported and read as ARRAY.)
- Geometry types, including
point, linestring, polygon, multipoint, multilinestring, multipolygon, linestring_z, and geometrycollection.
Limitations
- DuckLake support is disabled by default. Set
enable_ducklake = true to use it.
READ_DUCKLAKE runs on a user engine; the system engine is not supported.
- Only PostgreSQL DuckLake catalogs are supported. The data files must be Parquet.
- Reads are read-only. Firebolt does not write to or modify DuckLake tables.
- See Supported data types for the column types that cannot currently be read.
- Firebolt reads at most one inlined data table per DuckLake table. See How inlined data is handled.