What is distinct values?

Distinct values is a feature that stores the unique string values of a column. It essentially runs the following SQL:

SELECT DISTINCT <column_name> FROM <table_name> LIMIT 50

It is useful when a column has a limited number of unique values, such as country names or product categories. Storing these values can improve the accuracy of your model and reduce memory usage.

Currently, the maximum number of distinct values that can be stored is 50, matching the LIMIT in the query above.
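The query above can be tried locally with a minimal sketch like the one below. It is not the feature's actual implementation: it assumes an in-memory SQLite database, and the helper names (`collect_distinct_values`, `quote_identifier`, `build_demo_connection`), the `salary` column, and the sample rows are all hypothetical, chosen only for illustration.

```python
import sqlite3

MAX_DISTINCT_VALUES = 50  # mirrors the limit stated in this document


def quote_identifier(name):
    """Quote a table or column name for SQLite.

    Identifiers cannot be bound as query parameters, so they are
    quoted and embedded in the SQL text instead.
    """
    return '"' + name.replace('"', '""') + '"'


def collect_distinct_values(conn, table, column, limit=MAX_DISTINCT_VALUES):
    """Return up to `limit` unique string values of `column` in `table`."""
    sql = (
        f"SELECT DISTINCT {quote_identifier(column)} "
        f"FROM {quote_identifier(table)} LIMIT ?"
    )
    rows = conn.execute(sql, (limit,)).fetchall()
    # Keep only string values, since the feature stores string values.
    return [value for (value,) in rows if isinstance(value, str)]


def build_demo_connection():
    """Create a small, hypothetical `salaries` table for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE salaries (country TEXT, salary REAL)")
    conn.executemany(
        "INSERT INTO salaries VALUES (?, ?)",
        [
            ("United States", 120000.0),
            ("United States", 135000.0),
            ("Canada", 98000.0),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_connection()
    print(sorted(collect_distinct_values(conn, "salaries", "country")))
```

Because the query has no ORDER BY, which 50 values are returned for a column with more than 50 unique values is up to the database, so this approach suits low-cardinality columns best.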

Example

If a user asks "What is the average salary of software engineers in the U.S.?", the AI model may generate the following SQL:

AI-generated SQL without distinct values:

SELECT * FROM salaries
WHERE country = 'U.S.'

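The AI model may not be aware that ‘U.S.’ and ‘United States’ refer to the same country, or which of the two spellings the column actually stores. If the column stores ‘United States’, the filter above matches no rows and the query does not return the correct results.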

AI-generated SQL with distinct values:

SELECT * FROM salaries
WHERE country = 'United States'

With distinct values, the AI model can see which values the column actually contains and use the stored spelling, which improves the accuracy of the WHERE clause.
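The difference between the two filters can be checked with a small sketch. It assumes an in-memory SQLite database with made-up rows; the `salary` column and the helpers `count_matches` and `build_column_hint` are hypothetical. In particular, `build_column_hint` only illustrates one way stored values could be shown to a model and is not a description of how the feature passes them along.

```python
import sqlite3


def build_demo_connection():
    """Create a hypothetical `salaries` table that stores 'United States'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE salaries (country TEXT, salary REAL)")
    conn.executemany(
        "INSERT INTO salaries VALUES (?, ?)",
        [
            ("United States", 120000.0),
            ("United States", 135000.0),
            ("Canada", 98000.0),
        ],
    )
    return conn


def count_matches(conn, country):
    """Count rows whose country equals the given literal exactly."""
    row = conn.execute(
        "SELECT COUNT(*) FROM salaries WHERE country = ?", (country,)
    ).fetchone()
    return row[0]


def build_column_hint(column, values):
    """Format stored values as a text hint (illustrative format only)."""
    listed = ", ".join(repr(value) for value in sorted(values))
    return f"Column {column} contains these values: {listed}"


if __name__ == "__main__":
    conn = build_demo_connection()
    # The spelling that is not stored in the column matches nothing.
    print(count_matches(conn, "U.S."))           # 0
    # The stored spelling matches the two United States rows.
    print(count_matches(conn, "United States"))  # 2
    print(build_column_hint("country", ["United States", "Canada"]))
```

The exact-match comparison is what makes the stored spelling matter: the query with 'U.S.' is valid SQL and runs without error, so the mistake shows up only as an empty or wrong result.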