Skip to main content
This function takes an integer n and a text sequence, then splits the sequence into overlapping contiguous subsequences of length n.

Syntax

NGRAM( <n>, <text> )

Parameters

ParameterDescriptionDatatype
<n>An integer specifying the length of each n-gram.INTEGER
<text>The text sequence to split into n-grams.TEXT

Return Types

ARRAY(TEXT)
  • If any of the inputs is nullable, the result type is ARRAY(TEXT) NULL.

Behavior

The function splits the input text into overlapping contiguous subsequences of length n.
  • If n is smaller than the size of the input text, an array containing the single value of the input text is returned.
  • If n is smaller than 1, an error is thrown.
  • If any input is NULL, the result is NULL regardless of the other input value.

Errors

An error is thrown if n is smaller than 1.

Respect/Ignore Nulls

Propagates nulls: If any input is NULL, the result is NULL.

Examples

The following example generates 2-grams (bigrams) from the string β€˜hello world’:
SELECT NGRAM(2, 'hello world') AS result;
result array(text)
[β€˜he’, β€˜el’, β€˜ll’, β€˜lo’, β€˜o ’, ’ w’, β€˜wo’, β€˜or’, β€˜rl’, β€˜ld’]

Rows: 1Execution time: 5.56ms

The following example generates 3-grams (trigrams) from the string β€˜hello world’:
SELECT NGRAM(3, 'hello world') AS result;
result array(text)
[β€˜hel’, β€˜ell’, β€˜llo’, β€˜lo ’, β€˜o w’, ’ wo’, β€˜wor’, β€˜orl’, β€˜rld’]

Rows: 1Execution time: 5.68ms

The following example generates 1-grams (unigrams) from the string β€˜hello’:
SELECT NGRAM(1, 'hello') AS result;
result array(text)
[β€˜h’, β€˜e’, β€˜l’, β€˜l’, β€˜o’]

Rows: 1Execution time: 5.62ms

The following example generates 10-grams from the string β€˜hi’. Since the string length matches the n-gram size, the result contains the entire string:
SELECT NGRAM(10, 'hi') AS result;
result array(text)
[β€˜hi’]

Rows: 1Execution time: 5.08ms

The following example uses an n-gram size of 0, which is invalid and throws an error:
SELECT NGRAM(0, 'hi') AS result;
ERROR: Line 1, Column 8: Invalid n-gram size: 0. Must be greater than 0. Choose an n-gram size larger than 0 or NULL. The following example uses a negative n-gram size, which is invalid and throws an error:
SELECT NGRAM(-1, 'hi') AS result;
ERROR: Line 1, Column 8: Invalid n-gram size: -1. Must be greater than 0. Choose an n-gram size larger than 0 or NULL. The following example generates 2-grams (bigrams) from the Japanese string β€˜γ“γ‚“γ«γ‘γ―β€™:
SELECT NGRAM(2, 'こんにけは') AS result;
result array(text)
[β€˜γ“γ‚“β€™, β€˜γ‚“γ«β€™, β€˜γ«γ‘β€™, β€˜γ‘γ―β€™]

Rows: 1Execution time: 5.10ms

The following example generates 2-grams (bigrams) from the string of emojis β€™πŸ˜ŠπŸ‘πŸŽ‰β€˜:
SELECT NGRAM(2, 'πŸ˜ŠπŸ‘πŸŽ‰') AS result;
result array(text)
[β€™πŸ˜ŠπŸ‘β€™, β€™πŸ‘πŸŽ‰β€˜]

Rows: 1Execution time: 5.69ms