jasonthorsness 11 hours ago

During incidents I've often found that the issue was obvious or had been brewing for some time. An LLM-driven "smart" monitoring system for key services that can recognize problems and often take action could hopefully make that less common. I'll be looking at how this works to try something similar for my own company's services.

I think a key for this one is "I use preset SQL commands. I will never run destructive (even potentially destructive) commands against your database." If it's also locked down to informational queries only (and not leaking user table contents to the LLM providers), why not try it?

I do wonder about the cost of this at scale compared to the cost of the services being monitored. Hopefully an agent tax doesn't become another Datadog tax.

  • tudorg 11 hours ago

    > I do wonder about the cost of this at scale compared to the cost of the services being monitored. Hopefully an agent tax doesn't become another Datadog tax.

    One idea we want to experiment with is letting the model pick the next time it runs (between bounds). If the model has any reason for concern it runs more often; otherwise maybe once every couple of hours is enough.
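
    Roughly, the scheduler side could be as simple as clamping whatever the model asks for (a minimal sketch with made-up names, not the agent's actual API):

        // The model suggests when it wants to run next; we clamp that
        // to configured bounds before scheduling the next check.
        const MIN_INTERVAL_MS = 5 * 60 * 1000;      // at most every 5 minutes
        const MAX_INTERVAL_MS = 2 * 60 * 60 * 1000; // at least every 2 hours

        function nextRunDelay(suggestedMs: number): number {
          return Math.min(MAX_INTERVAL_MS, Math.max(MIN_INTERVAL_MS, suggestedMs));
        }

        async function monitorLoop(runCheck: () => Promise<number>): Promise<void> {
          while (true) {
            const suggestedMs = await runCheck(); // model returns its preferred next interval
            await new Promise((r) => setTimeout(r, nextRunDelay(suggestedMs)));
          }
        }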

arcticfox 4 hours ago

This is very cool! I wonder what's preventing the other cloud providers from being supported? Is the integration not just a connection string?

lelandfe 11 hours ago

> I support multiple models from OpenAI, Anthropic, and Deepseek.

Are there risks associated with sending DB info off to these third parties?

  • fforflo 11 hours ago

    Of course there are, but I'd guess this relies on the system views (pg_*), like most monitoring tools do, and those offer fine-grained access control if you create appropriate roles.
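
    For example (hypothetical role name, assuming node-postgres; run once as an admin), a monitoring-only role could look like:

        import { Client } from "pg";

        // Create a login role that can read the pg_* monitoring views via
        // the built-in pg_monitor role, but has no access to app tables.
        async function createMonitorRole(adminUrl: string): Promise<void> {
          const client = new Client({ connectionString: adminUrl });
          await client.connect();
          try {
            await client.query(`CREATE ROLE agent_monitor LOGIN PASSWORD 'change-me'`);
            await client.query(`GRANT pg_monitor TO agent_monitor`); // pg_stat_*, pg_settings, ...
            await client.query(`REVOKE ALL ON SCHEMA public FROM agent_monitor`);
          } finally {
            await client.end();
          }
        }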

  • cship2 11 hours ago

    You can also self-host via Ollama, assuming you have a proper GPU. Running it on a CPU can take minutes.
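
    Ollama exposes an OpenAI-compatible endpoint, so a client can be pointed at it locally (endpoint and model name below are just examples, assuming the openai npm package):

        import OpenAI from "openai";

        // Talk to a local Ollama server so DB info never leaves the machine.
        const client = new OpenAI({
          baseURL: "http://localhost:11434/v1", // Ollama's OpenAI-compatible API
          apiKey: "ollama",                     // required by the client, ignored by Ollama
        });

        const completion = await client.chat.completions.create({
          model: "llama3.1:8b",
          messages: [{ role: "user", content: "Interpret this pg_stat_activity snapshot: ..." }],
        });
        console.log(completion.choices[0].message.content);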

femiagbabiaka 10 hours ago

Interesting, might try it out at home.

The documentation asserts:

> I use preset SQL commands. I will never run destructive (even potentially destructive) commands against your database.

This is enforced by taking the responsibility for generating state-evaluation SQL out of the hands of the LLM. The LLM simply interprets the results of predetermined commands, guided by a set of prompts/playbooks: https://github.com/xataio/agent/blob/69329cede85d4bc920558c0...
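
In other words, something like this (hypothetical names, paraphrasing the approach rather than quoting the repo's actual code):

    // The LLM can only pick a command by name; the SQL itself is fixed.
    const PRESET_COMMANDS: Record<string, string> = {
      active_connections:
        `SELECT count(*) FROM pg_stat_activity WHERE state = 'active'`,
      long_running_queries:
        `SELECT pid, now() - query_start AS duration, query
         FROM pg_stat_activity
         WHERE state = 'active' AND now() - query_start > interval '5 minutes'`,
    };

    async function runPreset(
      name: string,
      query: (sql: string) => Promise<unknown>,
    ): Promise<unknown> {
      const sql = PRESET_COMMANDS[name];
      if (!sql) throw new Error(`Unknown preset command: ${name}`); // reject anything off-list
      return query(sql); // results go back to the LLM for interpretation only
    }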

  • esafak 5 hours ago

    I'd define the operations I allow in a composable language like Malloy and expose them to the LLM through MCP.

  • iLoveOncall 9 hours ago

    Until it hallucinates and doesn't run the command that you expect or doesn't run the command at all?

    This doesn't take anything out of the hands of the LLM.

    • femiagbabiaka 8 hours ago

      A hallucination where the model not only bypasses the predefined commands but also produces (a) arbitrary SQL that is (b) destructive seems unlikely. But as a best practice, the user the agent connects to the database with should of course be locked down.

    • tudorg 8 hours ago

      There are things that can go wrong, but not so wrong as to delete your data or cause outages, so there is a level of safety that I think is important at this moment. We do also want to allow executing generated SQL, but with an approval workflow.
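
      Roughly how that gate could work (a hypothetical sketch, not our actual design): generated SQL lands in a queue, and nothing runs until a human signs off:

          // Generated SQL is proposed, reviewed, and only then executed.
          type PendingQuery = { sql: string; reason: string; approved: boolean };

          const queue: PendingQuery[] = [];

          function propose(sql: string, reason: string): void {
            queue.push({ sql, reason, approved: false });
          }

          async function executeApproved(
            run: (sql: string) => Promise<unknown>,
          ): Promise<void> {
            for (const q of queue.filter((q) => q.approved)) {
              await run(q.sql); // only human-approved statements reach the database
            }
          }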

arjunlol 11 hours ago

Looks cool! This may actually save me a lot of manual DBA work.

iLoveOncall 9 hours ago

The title should really include the fact that it's an expert at MONITORING PostgreSQL. It's not for writing queries from natural language.

I'm extremely interested in the latter but not at all in the former.