Skip to main content
A table-valued function (TVF) that reads data from DuckLake tables. DuckLake keeps table metadata in a SQL catalog database (PostgreSQL) and stores the underlying data as Parquet files in object storage or on a local filesystem. READ_DUCKLAKE connects to the catalog, resolves the requested table and snapshot, and returns a table with the data from that snapshot. The function can use either a location object or individual TVF parameters to access the catalog and data.
DuckLake support is experimental and disabled by default. Set enable_ducklake = true to use it, and expect the behavior to change. Only DuckLake catalogs hosted on PostgreSQL are supported.
Topics: To inspect the underlying data files of a DuckLake table without reading the row data, use LIST_DUCKLAKE_FILES.

Syntax

-- Using a LOCATION object (catalog connection string and credentials stored in LOCATION)
READ_DUCKLAKE (
  LOCATION => '<location_name>'
  [, SCHEMA => '<schema_name>' ]
  [, TABLE => '<table_name>' ]
  [, SNAPSHOT_ID => <snapshot_id> ]
)

-- Using individual TVF parameters
READ_DUCKLAKE (
  CATALOG => '<postgres_connection_string>',
  SCHEMA => '<schema_name>',
  TABLE => '<table_name>'
  [, SNAPSHOT_ID => <snapshot_id> ]
  [, S3_ENDPOINT => '<s3_endpoint>' ]
  -- Parameters for Amazon S3
  [, AWS_ACCESS_KEY_ID => '<aws_access_key_id>' ]
  [, AWS_SECRET_ACCESS_KEY => '<aws_secret_access_key>' ]
  [, AWS_SESSION_TOKEN => '<aws_session_token>' ]
  [, AWS_ROLE_ARN => '<aws_role_arn>' ]
  [, AWS_ROLE_EXTERNAL_ID => '<aws_role_external_id>' ]
)

Parameters

Using a LOCATION object

ParameterDescriptionSupported input types
LOCATIONThe name of a location object that stores the DuckLake catalog connection string, the optional S3 endpoint, the optional storage credentials, and an optional default schema and table. Firebolt recommends using LOCATION to store the catalog connection string and credentials. LOCATION must be specified as a string literal (e.g., LOCATION => 'my_location').TEXT
SCHEMAThe catalog schema the table resides in. Optional. If the location object already defines a schema, the two must match. If neither the location nor the call specifies a schema, main is used.TEXT
TABLEThe name of the DuckLake table to read. Optional only if the location object already defines a table; otherwise required. If both specify a table, the two must match.TEXT
SNAPSHOT_IDThe DuckLake snapshot to read. Optional. The default is the latest snapshot in the catalog.BIGINT
For syntax details, see CREATE LOCATION (DuckLake).

Using individual TVF parameters

ParameterDescriptionSupported input types
CATALOGA PostgreSQL connection string pointing to the DuckLake catalog database, for example postgresql://user:password@host:5432/dbname. Required.TEXT
SCHEMAThe catalog schema the table resides in, for example main. Required for this call shape.TEXT
TABLEThe name of the DuckLake table to read. Required for this call shape.TEXT
SNAPSHOT_IDThe DuckLake snapshot to read. Optional. The default is the latest snapshot in the catalog.BIGINT
S3_ENDPOINTThe S3-compatible endpoint that serves the data files, for example http://minio:9000 or https://s3.us-east-1.amazonaws.com. Optional. Omit it when the data files live on a local filesystem.TEXT

Amazon S3 parameters

Use these parameters to authenticate to the object storage that holds the Parquet data files. They apply only to the individual-parameter call shape; with a location object, store credentials in the location instead.
ParameterDescriptionSupported input types
AWS_ACCESS_KEY_IDThe AWS access key ID.TEXT
AWS_SECRET_ACCESS_KEYThe AWS secret access key.TEXT
AWS_SESSION_TOKENThe AWS session token.TEXT
AWS_ROLE_ARNThe AWS role ARN.TEXT
AWS_ROLE_EXTERNAL_IDThe AWS role external ID.TEXT

Return type

The result is a table with the data from the requested DuckLake snapshot. Columns are read using the data types defined in the DuckLake catalog, mapped to the corresponding Firebolt types. See Supported data types.

Enabling DuckLake

DuckLake support is disabled by default. Enable it for your session before you query DuckLake tables:
SET enable_ducklake = true;
READ_DUCKLAKE runs on a user engine. It is not supported on the system engine.

Best practices

  • Use a LOCATION object to store the catalog connection string and credentials. The connection string typically contains a password, and a location object centralizes credential management so you don’t repeat secrets in every query. See CREATE LOCATION (DuckLake).
  • Use views to simplify repeated queries. Wrap READ_DUCKLAKE in a view so you can query a DuckLake table without repeating the TVF call. See Simplifying queries with views.
  • Pin a SNAPSHOT_ID for reproducible reads. By default Firebolt reads the latest snapshot. Specify SNAPSHOT_ID when you need a stable, repeatable view of the data.
  • Specify all parameters using the named-parameter syntax (TABLE => 'my_table') rather than relying on parameter positions.

How inlined data is handled

DuckLake doesn’t always write new rows to Parquet files. To keep small or frequent writes cheap, it can inline them — storing the rows directly in the catalog database (PostgreSQL) instead of flushing them to a Parquet data file. DuckLake decides this when it writes the data, based on its data_inlining_row_limit setting, which you configure on the DuckLake side when you attach the catalog. Firebolt never writes to DuckLake, so it doesn’t control inlining; it only reads whatever the catalog describes. READ_DUCKLAKE reads inlined rows transparently. A single call returns the complete table at the requested snapshot — both the rows in Parquet files and the rows inlined in the catalog — so you don’t query inlined tables any differently. Two behaviors follow from this:
  • LIST_DUCKLAKE_FILES doesn’t report inlined rows. It lists Parquet data files, and inlined rows have no file. A table can therefore return more rows from READ_DUCKLAKE than the file record counts in LIST_DUCKLAKE_FILES add up to.
  • Metadata pseudo-columns use sentinel values for inlined rows. Inlined rows have no source file, so for those rows $source_file_name (and the other TEXT source pseudo-columns) returns 'inlined', and $source_file_row_number returns 0.
Firebolt currently reads at most one inlined data table per DuckLake table. Schema evolution can leave inlined rows spread across more than one catalog table (one per schema version); reading such a table fails with Expected exactly one inlined data table. Multiple inlined data tables are not supported yet.

Inspecting the plan

When a table has inlined data, READ_DUCKLAKE plans as a union of two branches:
  • A Parquet branchread_from_s3 over the data files, with a positional-delete filter, fed by list_ducklake_files for the file metadata and statistics.
  • An inlined-data branchread_ducklake_inlined_data, which reads the rows directly from the catalog database.
Use EXPLAIN (PHYSICAL) to see both branches. The output below is abbreviated — the read_from_s3 arguments, the list_ducklake_files column list, and the [Types] annotations are trimmed for readability:
EXPLAIN (PHYSICAL)
SELECT *
FROM READ_DUCKLAKE(LOCATION => 'my_ducklake_loc', TABLE => 't')
ORDER BY id;
[0] [Projection] id, name
 \_[1] [Sort] OrderBy: [id Ascending Last]
    \_[2] [MaybeCache]
       \_[3] [Union]
          \_[4] [Projection] id, name
          |  \_[5] [Filter] (not roaring_bitmap_contains(deleted_positions, ($source_file_row_number - 1)))
          |     \_[6] [TableFuncScan] id: $0.id, name: $0.name, $source_file_row_number: $0.$source_file_row_number
          |       |   $0 = read_from_s3(..., type=Ducklake, ducklake_schema='main', ducklake_table='t', ducklake_snapshot_id=<snap>, ...)
          |        \_[7] [MaybeCache]
          |           \_[8] [TableFuncScan] file_format: $0.file_format, ..., min_name: $0.min_name, max_name: $0.max_name
          |                 $0 = list_ducklake_files(catalog => '****', schema => 'main', table => 't', snapshot_id => <snap>)
          \_[9] [MaybeCache]
             \_[10] [TableFuncScan] id: $0.id, name: $0.name
                   $0 = read_ducklake_inlined_data(catalog => '****', tables => [ducklake_inlined_data_<tbl>])
The [Union] node (3) combines the two sources. The first branch (nodes 4–8) reads and delete-filters the Parquet files; the second branch (nodes 9–10) reads the inlined rows from PostgreSQL. The [MaybeCache] nodes cache the metadata and data subresults so that repeated reads of the same snapshot can skip the work. When a table has no inlined data, the [Union] and the read_ducklake_inlined_data branch don’t appear, and the plan is just the Parquet branch.

Examples

Reading using a LOCATION

The following example creates a location object that stores the catalog connection string and storage credentials, then reads from a table:
SET enable_ducklake = true;

CREATE LOCATION my_ducklake_loc WITH
  SOURCE = DUCKLAKE
  CATALOG = 'postgresql://dl_user:dl_pw@127.0.0.1:5432/dl_db';

SELECT *
FROM READ_DUCKLAKE(
  LOCATION => 'my_ducklake_loc',
  SCHEMA => 'main',
  TABLE => 'my_first_ducklake_table'
)
LIMIT 5;
If the location object already defines the schema and table, you can omit them:
SELECT * FROM READ_DUCKLAKE(LOCATION => 'my_ducklake_loc') LIMIT 5;
For more examples of LOCATION, see CREATE LOCATION (DuckLake).

Reading a specific snapshot

SELECT *
FROM READ_DUCKLAKE(
  LOCATION => 'my_ducklake_loc',
  TABLE => 'my_first_ducklake_table',
  SNAPSHOT_ID => 3
)
LIMIT 5;

Reading with individual parameters

When the data files live on a local filesystem (for example, a Firebolt Core deployment with on-disk Parquet files), pass the catalog connection string directly and omit the S3 parameters:
SELECT *
FROM READ_DUCKLAKE(
  CATALOG => 'postgresql://dl_user:dl_pw@127.0.0.1:5432/dl_db',
  SCHEMA => 'main',
  TABLE => 'my_first_ducklake_table'
)
LIMIT 5;
When the data files are stored in S3-compatible object storage, provide the endpoint and credentials:
SELECT *
FROM READ_DUCKLAKE(
  CATALOG => 'postgresql://dl_user:dl_pw@pg-host:5432/dl_db',
  SCHEMA => 'main',
  TABLE => 'users',
  S3_ENDPOINT => 'http://minio:9000',
  AWS_ACCESS_KEY_ID => 'my-access-key',
  AWS_SECRET_ACCESS_KEY => 'my-secret-key'
)
ORDER BY id
LIMIT 5;

Simplifying queries with views

You can wrap READ_DUCKLAKE in a view for easier querying. This also works with LOCATION-based calls.
CREATE VIEW users AS
  SELECT * FROM READ_DUCKLAKE(
    LOCATION => 'my_ducklake_loc',
    TABLE => 'users'
  );

SELECT * FROM users LIMIT 10;
Once the view exists, you can use it in joins and complex queries without referencing READ_DUCKLAKE directly. For more information about views, see CREATE VIEW.

Inspecting the query plan

Use EXPLAIN (ANALYZE) to see how Firebolt reads the DuckLake table, including how metadata and data caching behave across repeated runs:
EXPLAIN (ANALYZE)
SELECT *
FROM READ_DUCKLAKE(
  LOCATION => 'my_ducklake_loc',
  TABLE => 'my_first_ducklake_table'
)
ORDER BY r DESC
LIMIT 5;

Supported data types

Firebolt maps DuckLake column types to Firebolt types as follows.
DuckLake typeFirebolt type
booleanBOOLEAN
int8, int16, int32INT
int64BIGINT
uint8, uint16INT
uint32BIGINT
float32REAL
float64DOUBLE PRECISION
decimal(p, s), numeric(p, s)DECIMAL(p, s), NUMERIC(p, s)
dateDATE
timestampTIMESTAMP (microsecond precision, no time zone)
timestamptzTIMESTAMPTZ (microsecond precision, with time zone)
varcharTEXT
blobBYTEA
listARRAY (of the supported element type)
The following DuckLake types are currently not supported. Reading a table that contains one of these types fails with an error:
  • Integers wider than Firebolt supports: uint64, int128, uint128.
  • Time and sub-second/extended timestamp types: time, timetz, timestamp_s, timestamp_ms, timestamp_ns, interval.
  • json, uuid, and variant.
  • Nested struct and map types. (Nested list columns are supported and read as ARRAY.)
  • Geometry types, including point, linestring, polygon, multipoint, multilinestring, multipolygon, linestring_z, and geometrycollection.

Limitations

  • DuckLake support is disabled by default. Set enable_ducklake = true to use it.
  • READ_DUCKLAKE runs on a user engine; the system engine is not supported.
  • Only PostgreSQL DuckLake catalogs are supported. The data files must be Parquet.
  • Reads are read-only. Firebolt does not write to or modify DuckLake tables.
  • See Supported data types for the column types that cannot currently be read.
  • Firebolt reads at most one inlined data table per DuckLake table. See How inlined data is handled.