Skip to main content
Counts the approximate number of unique values that are not NULL. Internally, APPROX_COUNT_DISTINCT uses HyperLogLog (HLL) sketches. Calling APPROX_COUNT_DISTINCT returns the same value as calling HLL_COUNT_DISTINCT with precision 17.

Syntax

APPROX_COUNT_DISTINCT(<expression>) [FILTER ([WHERE] <condition>)]

Parameters

ParameterDescriptionSupported input types
<expression>Expression on which to approximate the distinct countAny type
<condition>An optional boolean expression to filter rows used in aggregationBOOL

Return Type

BIGINT

Example

When aggregating on few distinct values, APPROX_COUNT_DISTINCT has no estimation error and returns exact results:
SELECT
  APPROX_COUNT_DISTINCT(number) AS approximate,
  COUNT(DISTINCT number) AS exact
FROM generate_series(1, 1000) r(number);
approximate longexact long
10001000

Rows: 1Execution time: 9.21ms

NULL values are ignored by APPROX_COUNT_DISTINCT. Including NULL in the input does not change the result:
SELECT
  APPROX_COUNT_DISTINCT(number) AS approximate,
  COUNT(DISTINCT number) AS exact
FROM (
  SELECT * FROM generate_series(1, 1000)
  UNION ALL
  SELECT NULL
) r(number);
approximate longexact long
10001000

Rows: 1Execution time: 11.74ms

As the number of distinct values grows, the result becomes an approximation:
SELECT
  APPROX_COUNT_DISTINCT(number) AS approximate,
  COUNT(DISTINCT number) AS exact
FROM generate_series(1, 50000) r(number);
approximate longexact long
5016050000

Rows: 1Execution time: 16.67ms

APPROX_COUNT_DISTINCT also works for compound types such as arrays:
SELECT
  APPROX_COUNT_DISTINCT(arr) AS approximate,
  COUNT(DISTINCT arr) AS exact
FROM UNNEST(ARRAY[[1, 2], [3, 4], NULL, [NULL], [1, NULL]]) r(arr);
approximate longexact long
44

Rows: 1Execution time: 7.19ms

Use in Aggregating Indexes

If you use APPROX_COUNT_DISTINCT in an aggregating index, every deletion from the base table requires a full table scan to update the index. Firebolt performs this automatically and transactionally consistent. However, to achieve better performance in the presence of regular deletions, you can use COUNTING_HLL_COUNT_DISTINCT instead, where no full table scan is required.