Best Practices
Learn the best practices for using Brewit to maximize its data analysis capabilities.
Handling complex databases with many tables
With our current architecture, Brewit can support databases with hundreds of tables without the LLM exceeding its token limit. We use a RAG + LLM framework to find relevant table metadata based on users' questions.
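As a toy illustration of the idea, the sketch below scores each table's metadata against the question and keeps only the top matches for the LLM. It uses simple keyword overlap in place of vector-store retrieval, and the function and table names are hypothetical, not Brewit's actual implementation:

```python
# Simplified sketch of RAG-style table selection: score each table's
# metadata against the user's question, then pass only the top-k tables
# to the LLM instead of the whole schema. Keyword overlap stands in for
# the embedding similarity a real vector store would use.

def top_k_tables(question, table_metadata, k=2):
    """Return the k tables whose metadata best matches the question."""
    q_tokens = set(question.lower().split())
    scored = []
    for table, description in table_metadata.items():
        overlap = len(q_tokens & set(description.lower().split()))
        scored.append((overlap, table))
    scored.sort(reverse=True)  # highest overlap first
    return [table for _, table in scored[:k]]

# Hypothetical schema metadata for three tables.
metadata = {
    "employees": "employee salary title country hire date",
    "orders": "order id customer total amount date",
    "inventory": "product stock quantity warehouse",
}

print(top_k_tables("what is the average salary of software engineers", metadata))
```

Only the selected tables' metadata is placed in the prompt, which is what keeps the token count bounded even for databases with hundreds of tables.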
FAQ
Is there a limit on the maximum number of tables?
No, there’s no limit on the number of tables in a database. Brewit can handle databases with hundreds of tables without any issues.
Is there a limit on the maximum number of columns in a table?
Currently, Brewit supports up to 150 to 200 columns per table.
How much data can Brewit handle?
We have tested databases and data warehouses with 100GB+ of data, and Brewit handles them without any issues. If a user asks for all data in all tables, Brewit will ask clarifying questions about the data they are actually looking for (exactly what a data analyst would do with such a request), in order to keep query costs down.
We are also planning to integrate PrestoDB to handle large data sets (1TB+).
Filter string columns
Filtering string columns can be challenging for an LLM that cannot see the actual data format. To handle this, we have a feature called Distinct Values.
It essentially runs the following SQL and stores the results in the vector store for retrieval:
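A sketch of the kind of query involved; the table and column names here are hypothetical and depend on the connected database:

```sql
-- Collect the distinct values of a string column so the model can
-- later see exactly how values are stored (e.g. 'United States').
SELECT DISTINCT country FROM employees;
```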
Example
If a user asks “What is the average salary of software engineers in the U.S.?”, the AI model may generate the following SQL.
AI-generated SQL without distinct values:
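A sketch of what such a query might look like, assuming a hypothetical `employees` table with `salary`, `title`, and `country` columns:

```sql
SELECT AVG(salary)
FROM employees
WHERE title = 'Software Engineer'
  AND country = 'U.S.';  -- fails to match rows stored as 'United States'
```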
The AI model may not be aware that ‘U.S.’ and ‘United States’ are the same country. As a result, the query may not return the correct results.
AI-generated SQL with distinct values:
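With the stored distinct values retrieved, the model can filter on the value as it actually appears in the data (same hypothetical schema as above):

```sql
SELECT AVG(salary)
FROM employees
WHERE title = 'Software Engineer'
  AND country = 'United States';  -- matches the stored distinct value
```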
By using distinct values, the AI model generates a more accurate WHERE clause.
Suggested roll-out strategy
- Initial rollout: Start with a small group of data power users and a subset of the database to test the system and gather feedback.
- Gradual expansion: Gradually increase the number of users and data scope to ensure the system can handle the load.
- Full rollout: Once the system has been thoroughly tested and optimized, roll it out to all users.