Distinct values is a feature that stores unique string values of a column. It essentially run the following SQL:
Copy
SELECT DISTINCT <column_name> FROM <table_name> LIMIT 50
It is useful when you have a column with a limited number of unique values, such as a column with country names or product categories. By storing unique values, distinct values can improve the accuracy of your model and reduce the memory usage.
Currently the maximum number of distinct values that can be stored is 50.
If a user asks what is the average salary of software engineers in the U.S.?, the AI model may generate the following SQLAI-generated SQL without distinct values:
Copy
SELECT * FROM salariesWHERE country = 'U.S.'
The AI model may not be aware that ‘U.S.’ and ‘United States’ are the same
country. As a result, the query may not return the correct results.
AI-generated SQL with distinct values:
Copy
SELECT * FROM salariesWHERE country = 'United States'
By using distinct values, the AI model improves the accuracy of the where
clause.