Handling complex databases with many tables

With our current architectures, Brewit can support databases with hundreds of tables without LLM exceeding the token limit. We use RAG + LLM framework to find relevant tables metadata based on users’ questions.

FAQ

Filter string columns

Filtering stirng columns can be challenging with LLM without seeing the actual data format. To handle this, we have a feature called Distinct Values.

It essentially run the following SQL and store results in the vector store for retrieval:

SELECT DISTINCT <column_name> FROM <table_name> LIMIT 50

Example

If a user asks what is the average salary of software engineers in the U.S.?, the AI model may generate the following SQL

AI-generated SQL without distinct values:

SELECT * FROM salaries
WHERE country = 'U.S.'

The AI model may not be aware that ‘U.S.’ and ‘United States’ are the same country. As a result, the query may not return the correct results.

AI-generated SQL with distinct values:

SELECT * FROM salaries
WHERE country = 'United States'

By using distinct values, the AI model improves the accuracy of the where clause.

Suggested roll-out strategy

  1. Initial rollout: Start with a small subset of data power users and a subset of the database to test the system and gather feedback.
  2. Gradual expansion: Gradually increase the number of users and data scope to ensure the system can handle the load.
  3. Full rollout: Once the system has been thoroughly tested and optimized, roll it out to all users.