READ_DUCKLAKE

A table-valued function (TVF) that reads data from DuckLake tables. DuckLake keeps table metadata in a SQL catalog database (PostgreSQL) and stores the underlying data as Parquet files in object storage or on a local filesystem. READ_DUCKLAKE connects to the catalog, resolves the requested table and snapshot, and returns a table with the data from that snapshot. The function can use either a location object or individual TVF parameters to access the catalog and data.

DuckLake support is experimental and may change. Only DuckLake catalogs hosted on PostgreSQL are supported.

To inspect the underlying data files of a DuckLake table without reading the row data, use LIST_DUCKLAKE_FILES.

Syntax

-- Using a LOCATION object (catalog connection string and credentials stored in LOCATION)
READ_DUCKLAKE (
  LOCATION => '<location_name>'
  [, SCHEMA => '<schema_name>' ]
  [, TABLE => '<table_name>' ]
  [, SNAPSHOT_ID => <snapshot_id> ]
)

-- Using individual TVF parameters
READ_DUCKLAKE (
  CATALOG => '<postgres_connection_string>',
  SCHEMA => '<schema_name>',
  TABLE => '<table_name>'
  [, SNAPSHOT_ID => <snapshot_id> ]
  [, S3_ENDPOINT => '<s3_endpoint>' ]
  -- Parameters for Amazon S3
  [, AWS_ACCESS_KEY_ID => '<aws_access_key_id>' ]
  [, AWS_SECRET_ACCESS_KEY => '<aws_secret_access_key>' ]
  [, AWS_SESSION_TOKEN => '<aws_session_token>' ]
  [, AWS_ROLE_ARN => '<aws_role_arn>' ]
  [, AWS_ROLE_EXTERNAL_ID => '<aws_role_external_id>' ]
)

When you pass AWS_ROLE_ARN, set the optional AWS_ROLE_EXTERNAL_ID to add a customer-controlled condition to your role’s trust policy.

For role-based AWS access you can additionally set an external ID. An external ID is a value you choose and control that AWS checks when Firebolt assumes your role, adding a second condition on top of your account’s unique IAM principal. Configuring one is a recommended best practice. See IAM roles.

Parameters

Using a LOCATION object

Parameter	Description	Supported input types
`LOCATION`	The name of a location object that stores the DuckLake catalog connection string, the optional S3 endpoint, the optional storage credentials, and an optional default schema and table. Firebolt recommends using `LOCATION` to store the catalog connection string and credentials. `LOCATION` must be specified as a string literal (e.g., `LOCATION => 'my_location'`).	`TEXT`
`SCHEMA`	The catalog schema the table resides in. Optional. If the location object already defines a schema, the two must match. If neither the location nor the call specifies a schema, `main` is used.	`TEXT`
`TABLE`	The name of the DuckLake table to read. Optional only if the location object already defines a table; otherwise required. If both specify a table, the two must match.	`TEXT`
`SNAPSHOT_ID`	The DuckLake snapshot to read. Optional. The default is the latest snapshot in the catalog.	`BIGINT`

For syntax details, see CREATE LOCATION (DuckLake).

Using individual TVF parameters

Parameter	Description	Supported input types
`CATALOG`	A PostgreSQL connection string pointing to the DuckLake catalog database, for example `postgresql://user:password@host:5432/dbname`. Required.	`TEXT`
`SCHEMA`	The catalog schema the table resides in, for example `main`. Required for this call shape.	`TEXT`
`TABLE`	The name of the DuckLake table to read. Required for this call shape.	`TEXT`
`SNAPSHOT_ID`	The DuckLake snapshot to read. Optional. The default is the latest snapshot in the catalog.	`BIGINT`
`S3_ENDPOINT`	The S3-compatible endpoint that serves the data files, for example `http://minio:9000` or `https://s3.us-east-1.amazonaws.com`. Optional. Omit it when the data files live on a local filesystem.	`TEXT`

Amazon S3 parameters

Use these parameters to authenticate to the object storage that holds the Parquet data files. They apply only to the individual-parameter call shape; with a location object, store credentials in the location instead.

Parameter	Description	Supported input types
`AWS_ACCESS_KEY_ID`	The AWS access key ID.	`TEXT`
`AWS_SECRET_ACCESS_KEY`	The AWS secret access key.	`TEXT`
`AWS_SESSION_TOKEN`	The AWS session token.	`TEXT`
`AWS_ROLE_ARN`	The AWS role ARN.	`TEXT`
`AWS_ROLE_EXTERNAL_ID`	The AWS role external ID.	`TEXT`

Return type

The result is a table with the data from the requested DuckLake snapshot. Columns are read using the data types defined in the DuckLake catalog, mapped to the corresponding Firebolt types. See Supported data types.

Best practices

Use a LOCATION object to store the catalog connection string and credentials. The connection string typically contains a password, and a location object centralizes credential management so you don’t repeat secrets in every query. See CREATE LOCATION (DuckLake).
Use views to simplify repeated queries. Wrap READ_DUCKLAKE in a view so you can query a DuckLake table without repeating the TVF call. See Simplifying queries with views.
Pin a SNAPSHOT_ID for reproducible reads. By default Firebolt reads the latest snapshot. Specify SNAPSHOT_ID when you need a stable, repeatable view of the data.
Specify all parameters using the named-parameter syntax (TABLE => 'my_table') rather than relying on parameter positions.

How inlined data is handled

DuckLake doesn’t always write new rows to Parquet files. To keep small or frequent writes cheap, it can inline them — storing the rows directly in the catalog database (PostgreSQL) instead of flushing them to a Parquet data file. DuckLake decides this when it writes the data, based on its data_inlining_row_limit setting, which you configure on the DuckLake side when you attach the catalog. Firebolt never writes to DuckLake, so it doesn’t control inlining; it only reads whatever the catalog describes. READ_DUCKLAKE reads inlined rows transparently. A single call returns the complete table at the requested snapshot — both the rows in Parquet files and the rows inlined in the catalog — so you don’t query inlined tables any differently. Two behaviors follow from this:

LIST_DUCKLAKE_FILES doesn’t report inlined rows. It lists Parquet data files, and inlined rows have no file. A table can therefore return more rows from READ_DUCKLAKE than the file record counts in LIST_DUCKLAKE_FILES add up to.
Metadata pseudo-columns use sentinel values for inlined rows. Inlined rows have no source file, so for those rows $source_file_name (and the other TEXT source pseudo-columns) returns 'inlined', and $source_file_row_number returns 0.

Firebolt currently reads at most one inlined data table per DuckLake table. Schema evolution can leave inlined rows spread across more than one catalog table (one per schema version); reading such a table fails with Expected exactly one inlined data table. Multiple inlined data tables are not supported yet.

Inspecting the plan

When a table has inlined data, READ_DUCKLAKE plans as a union of two branches:

A Parquet branch — read_from_s3 over the data files, with a positional-delete filter, fed by list_ducklake_files for the file metadata and statistics.
An inlined-data branch — read_ducklake_inlined_data, which reads the rows directly from the catalog database.

Use EXPLAIN (PHYSICAL) to see both branches. The output below is abbreviated — the read_from_s3 arguments, the list_ducklake_files column list, and the [Types] annotations are trimmed for readability:

EXPLAIN (PHYSICAL)
SELECT *
FROM READ_DUCKLAKE(LOCATION => 'my_ducklake_loc', TABLE => 't')
ORDER BY id;

[0] [Projection] id, name
 \_[1] [Sort] OrderBy: [id Ascending Last]
    \_[2] [MaybeCache]
       \_[3] [Union]
          \_[4] [Projection] id, name
          |  \_[5] [Filter] (not roaring_bitmap_contains(deleted_positions, ($source_file_row_number - 1)))
          |     \_[6] [TableFuncScan] id: $0.id, name: $0.name, $source_file_row_number: $0.$source_file_row_number
          |       |   $0 = read_from_s3(..., type=Ducklake, ducklake_schema='main', ducklake_table='t', ducklake_snapshot_id=<snap>, ...)
          |        \_[7] [MaybeCache]
          |           \_[8] [TableFuncScan] file_format: $0.file_format, ..., min_name: $0.min_name, max_name: $0.max_name
          |                 $0 = list_ducklake_files(catalog => '****', schema => 'main', table => 't', snapshot_id => <snap>)
          \_[9] [MaybeCache]
             \_[10] [TableFuncScan] id: $0.id, name: $0.name
                   $0 = read_ducklake_inlined_data(catalog => '****', tables => [ducklake_inlined_data_<tbl>])

The [Union] node (3) combines the two sources. The first branch (nodes 4–8) reads and delete-filters the Parquet files; the second branch (nodes 9–10) reads the inlined rows from PostgreSQL. The [MaybeCache] nodes cache the metadata and data subresults so that repeated reads of the same snapshot can skip the work. When a table has no inlined data, the [Union] and the read_ducklake_inlined_data branch don’t appear, and the plan is just the Parquet branch.

Examples

Reading using a LOCATION

The following example creates a location object that stores the catalog connection string and storage credentials, then reads from a table:

CREATE LOCATION my_ducklake_loc WITH
  SOURCE = DUCKLAKE
  CATALOG = 'postgresql://dl_user:dl_pw@127.0.0.1:5432/dl_db';

SELECT *
FROM READ_DUCKLAKE(
  LOCATION => 'my_ducklake_loc',
  SCHEMA => 'main',
  TABLE => 'my_first_ducklake_table'
)
LIMIT 5;

If the location object already defines the schema and table, you can omit them:

SELECT * FROM READ_DUCKLAKE(LOCATION => 'my_ducklake_loc') LIMIT 5;

For more examples of LOCATION, see CREATE LOCATION (DuckLake).

Reading a specific snapshot

SELECT *
FROM READ_DUCKLAKE(
  LOCATION => 'my_ducklake_loc',
  TABLE => 'my_first_ducklake_table',
  SNAPSHOT_ID => 3
)
LIMIT 5;

Reading with individual parameters

When the data files live on a local filesystem (for example, a Firebolt Core deployment with on-disk Parquet files), pass the catalog connection string directly and omit the S3 parameters:

SELECT *
FROM READ_DUCKLAKE(
  CATALOG => 'postgresql://dl_user:dl_pw@127.0.0.1:5432/dl_db',
  SCHEMA => 'main',
  TABLE => 'my_first_ducklake_table'
)
LIMIT 5;

When the data files are stored in S3-compatible object storage, provide the endpoint and credentials:

SELECT *
FROM READ_DUCKLAKE(
  CATALOG => 'postgresql://dl_user:dl_pw@pg-host:5432/dl_db',
  SCHEMA => 'main',
  TABLE => 'users',
  S3_ENDPOINT => 'http://minio:9000',
  AWS_ACCESS_KEY_ID => 'my-access-key',
  AWS_SECRET_ACCESS_KEY => 'my-secret-key'
)
ORDER BY id
LIMIT 5;

Simplifying queries with views

You can wrap READ_DUCKLAKE in a view for easier querying. This also works with LOCATION-based calls.

CREATE VIEW users AS
  SELECT * FROM READ_DUCKLAKE(
    LOCATION => 'my_ducklake_loc',
    TABLE => 'users'
  );

SELECT * FROM users LIMIT 10;

Once the view exists, you can use it in joins and complex queries without referencing READ_DUCKLAKE directly. For more information about views, see CREATE VIEW.

Inspecting the query plan

Use EXPLAIN (ANALYZE) to see how Firebolt reads the DuckLake table, including how metadata and data caching behave across repeated runs:

EXPLAIN (ANALYZE)
SELECT *
FROM READ_DUCKLAKE(
  LOCATION => 'my_ducklake_loc',
  TABLE => 'my_first_ducklake_table'
)
ORDER BY r DESC
LIMIT 5;

Supported data types

Firebolt maps DuckLake column types to Firebolt types as follows.

DuckLake type	Firebolt type
`boolean`	`BOOLEAN`
`int8`, `int16`, `int32`	`INT`
`int64`	`BIGINT`
`uint8`, `uint16`	`INT`
`uint32`	`BIGINT`
`float32`	`REAL`
`float64`	`DOUBLE PRECISION`
`decimal(p, s)`, `numeric(p, s)`	`DECIMAL(p, s)`, `NUMERIC(p, s)`
`date`	`DATE`
`timestamp`	`TIMESTAMP` (microsecond precision, no time zone)
`timestamptz`	`TIMESTAMPTZ` (microsecond precision, with time zone)
`varchar`	`TEXT`
`blob`	`BYTEA`
`list`	`ARRAY` (of the supported element type)

The following DuckLake types are currently not supported. Reading a table that contains one of these types fails with an error:

Integers wider than Firebolt supports: uint64, int128, uint128.
Time and sub-second/extended timestamp types: time, timetz, timestamp_s, timestamp_ms, timestamp_ns, interval.
json, uuid, and variant.
Nested struct and map types. (Nested list columns are supported and read as ARRAY.)
Geometry types, including point, linestring, polygon, multipoint, multilinestring, multipolygon, linestring_z, and geometrycollection.

Limitations

Only PostgreSQL DuckLake catalogs are supported. The data files must be Parquet.
Reads are read-only. Firebolt does not write to or modify DuckLake tables.
See Supported data types for the column types that cannot currently be read.
Firebolt reads at most one inlined data table per DuckLake table. See How inlined data is handled.

CREATE LOCATION (DuckLake) — Store the catalog connection string and credentials in a reusable location object instead of passing them on every call.
LIST_DUCKLAKE_FILES — List the Parquet data files behind a DuckLake table, with per-file metadata and statistics.
DuckLake functions overview — All DuckLake functions in one place.
Query DuckLake tables with Firebolt Core — A step-by-step guide to setting up a DuckLake catalog and reading it from Firebolt.

Overview

Performance and Observability

Security

Self-Managed

Managed service

Guides

SQL reference

Release notes

API reference

Legal

Syntax

Parameters

Using a LOCATION object

Using individual TVF parameters

Amazon S3 parameters

Return type

Best practices

How inlined data is handled

Inspecting the plan

Examples

Reading using a LOCATION

Reading a specific snapshot

Reading with individual parameters

Simplifying queries with views

Inspecting the query plan

Supported data types

Limitations

​Syntax

​Parameters

​Using a LOCATION object

​Using individual TVF parameters

​Amazon S3 parameters

​Return type

​Best practices

​How inlined data is handled

​Inspecting the plan

​Examples

​Reading using a LOCATION

​Reading a specific snapshot

​Reading with individual parameters

​Simplifying queries with views

​Inspecting the query plan

​Supported data types

​Limitations

​Related

Syntax

Parameters

Using a LOCATION object

Using individual TVF parameters

Amazon S3 parameters

Return type

Best practices

How inlined data is handled

Inspecting the plan

Examples

Reading using a LOCATION

Reading a specific snapshot

Reading with individual parameters

Simplifying queries with views

Inspecting the query plan

Supported data types

Limitations

Related