Counts the approximate number of unique values that are not NULL.
Internally, APPROX_COUNT_DISTINCT uses HyperLogLog (HLL) sketches.
Calling APPROX_COUNT_DISTINCT returns the same value as calling HLL_COUNT_DISTINCT with precision 17.
Syntax
APPROX_COUNT_DISTINCT(<expression>) [FILTER ([WHERE] <condition>)]
Parameters
| Parameter | Description | Supported input types |
|---|
<expression> | Expression on which to approximate the distinct count | Any type |
<condition> | An optional boolean expression to filter rows used in aggregation | BOOL |
Return Type
BIGINT
Example
When aggregating on few distinct values, APPROX_COUNT_DISTINCT has no estimation error and returns exact results:
SELECT
APPROX_COUNT_DISTINCT(number) AS approximate,
COUNT(DISTINCT number) AS exact
FROM generate_series(1, 1000) r(number);
| approximate long | exact long |
|---|
| 1000 | 1000 |
Rows: 1Execution time: 9.21ms
NULL values are ignored by APPROX_COUNT_DISTINCT. Including NULL in the input does not change the result:
SELECT
APPROX_COUNT_DISTINCT(number) AS approximate,
COUNT(DISTINCT number) AS exact
FROM (
SELECT * FROM generate_series(1, 1000)
UNION ALL
SELECT NULL
) r(number);
| approximate long | exact long |
|---|
| 1000 | 1000 |
Rows: 1Execution time: 11.74ms
As the number of distinct values grows, the result becomes an approximation:
SELECT
APPROX_COUNT_DISTINCT(number) AS approximate,
COUNT(DISTINCT number) AS exact
FROM generate_series(1, 50000) r(number);
| approximate long | exact long |
|---|
| 50160 | 50000 |
Rows: 1Execution time: 16.67ms
APPROX_COUNT_DISTINCT also works for compound types such as arrays:
SELECT
APPROX_COUNT_DISTINCT(arr) AS approximate,
COUNT(DISTINCT arr) AS exact
FROM UNNEST(ARRAY[[1, 2], [3, 4], NULL, [NULL], [1, NULL]]) r(arr);
| approximate long | exact long |
|---|
| 4 | 4 |
Rows: 1Execution time: 7.19ms
Use in Aggregating Indexes
If you use APPROX_COUNT_DISTINCT in an aggregating index, every deletion from the base table requires a full table scan to update the index.
Firebolt performs this automatically and transactionally consistent.
However, to achieve better performance in the presence of regular deletions, you can use COUNTING_HLL_COUNT_DISTINCT instead, where no full table scan is required.