This function takes an integer n and a text sequence, then splits the sequence into
overlapping contiguous subsequences of length n.
Parameters
| Parameter | Description | Datatype |
|---|
<n> | An integer specifying the length of each n-gram. | INTEGER |
<text> | The text sequence to split into n-grams. | TEXT |
Return Types
ARRAY(TEXT)
- If any of the inputs is nullable, the result type is
ARRAY(TEXT) NULL.
Behavior
The function splits the input text into overlapping contiguous subsequences of length n.
- If
n is smaller than the size of the input text, an array containing the single value of the input text is returned.
- If
n is smaller than 1, an error is thrown.
- If any input is
NULL, the result is NULL regardless of the other input value.
An error is thrown if n is smaller than 1.
Respect/Ignore Nulls
Propagates nulls: If any input is NULL, the result is NULL.
Examples
The following example generates 2-grams (bigrams) from the string βhello worldβ:
SELECT NGRAM(2, 'hello world') AS result;
| result array(text) |
|---|
| [βheβ, βelβ, βllβ, βloβ, βo β, β wβ, βwoβ, βorβ, βrlβ, βldβ] |
Rows: 1Execution time: 5.56ms
The following example generates 3-grams (trigrams) from the string βhello worldβ:
SELECT NGRAM(3, 'hello world') AS result;
| result array(text) |
|---|
| [βhelβ, βellβ, βlloβ, βlo β, βo wβ, β woβ, βworβ, βorlβ, βrldβ] |
Rows: 1Execution time: 5.68ms
The following example generates 1-grams (unigrams) from the string βhelloβ:
SELECT NGRAM(1, 'hello') AS result;
| result array(text) |
|---|
| [βhβ, βeβ, βlβ, βlβ, βoβ] |
Rows: 1Execution time: 5.62ms
The following example generates 10-grams from the string βhiβ. Since the string length matches the n-gram size, the result contains the entire string:
SELECT NGRAM(10, 'hi') AS result;
| result array(text) |
|---|
| [βhiβ] |
Rows: 1Execution time: 5.08ms
The following example uses an n-gram size of 0, which is invalid and throws an error:
SELECT NGRAM(0, 'hi') AS result;
ERROR: Line 1, Column 8: Invalid n-gram size: 0. Must be greater than 0. Choose an n-gram size larger than 0 or NULL.
The following example uses a negative n-gram size, which is invalid and throws an error:
SELECT NGRAM(-1, 'hi') AS result;
ERROR: Line 1, Column 8: Invalid n-gram size: -1. Must be greater than 0. Choose an n-gram size larger than 0 or NULL.
The following example generates 2-grams (bigrams) from the Japanese string βγγγ«γ‘γ―β:
SELECT NGRAM(2, 'γγγ«γ‘γ―') AS result;
| result array(text) |
|---|
| [βγγβ, βγγ«β, βγ«γ‘β, βγ‘γ―β] |
Rows: 1Execution time: 5.10ms
The following example generates 2-grams (bigrams) from the string of emojis βπππβ:
SELECT NGRAM(2, 'πππ') AS result;
| result array(text) |
|---|
| [βππβ, βππβ] |
Rows: 1Execution time: 5.69ms