APPROX_COUNT_DISTINCT

Counts the approximate number of unique values that are not NULL. Internally, APPROX_COUNT_DISTINCT uses HyperLogLog (HLL) sketches. Calling APPROX_COUNT_DISTINCT returns the same value as calling HLL_COUNT_DISTINCT with precision 17.

Syntax

APPROX_COUNT_DISTINCT(<expression>) [FILTER ([WHERE] <condition>)]

Parameters

Parameter	Description	Supported input types
`<expression>`	Expression on which to approximate the distinct count	Any type
`<condition>`	An optional boolean expression to filter rows used in aggregation	`BOOL`

Return Type

BIGINT

Example

When aggregating on few distinct values, APPROX_COUNT_DISTINCT has no estimation error and returns exact results:

SELECT
  APPROX_COUNT_DISTINCT(number) AS approximate,
  COUNT(DISTINCT number) AS exact
FROM generate_series(1, 1000) r(number);

approximate long	exact long
1000	1000

Rows: 1Execution time: 9.21ms

NULL values are ignored by APPROX_COUNT_DISTINCT. Including NULL in the input does not change the result:

SELECT
  APPROX_COUNT_DISTINCT(number) AS approximate,
  COUNT(DISTINCT number) AS exact
FROM (
  SELECT * FROM generate_series(1, 1000)
  UNION ALL
  SELECT NULL
) r(number);

approximate long	exact long
1000	1000

Rows: 1Execution time: 11.74ms

As the number of distinct values grows, the result becomes an approximation:

SELECT
  APPROX_COUNT_DISTINCT(number) AS approximate,
  COUNT(DISTINCT number) AS exact
FROM generate_series(1, 50000) r(number);

approximate long	exact long
50160	50000

Rows: 1Execution time: 16.67ms

APPROX_COUNT_DISTINCT also works for compound types such as arrays:

SELECT
  APPROX_COUNT_DISTINCT(arr) AS approximate,
  COUNT(DISTINCT arr) AS exact
FROM UNNEST(ARRAY[[1, 2], [3, 4], NULL, [NULL], [1, NULL]]) r(arr);

approximate long	exact long
4	4

Rows: 1Execution time: 7.19ms

Use in Aggregating Indexes

If you use APPROX_COUNT_DISTINCT in an aggregating index, every deletion from the base table requires a full table scan to update the index. Firebolt performs this automatically and transactionally consistent. However, to achieve better performance in the presence of regular deletions, you can use COUNTING_HLL_COUNT_DISTINCT instead, where no full table scan is required.

Overview

Performance and Observability

Security

Self-Managed

Managed service

Guides

SQL reference

Release notes

API reference

Legal

Syntax

Parameters

Return Type

Example

Use in Aggregating Indexes

​Syntax

​Parameters

​Return Type

​Example

​Use in Aggregating Indexes

Syntax

Parameters

Return Type

Example

Use in Aggregating Indexes