jasonthorsness 11 hours ago

During incidents I've often found that the issue was obvious or had been brewing for some time. An LLM-driven "smart" monitoring system for key services that can recognize problems and often take action could hopefully make that less common. I'll be looking at how this works to try something similar for my own company's services.

I think a key for this one is "I use preset SQL commands. I will never run destructive (even potentially destructive) commands against your database." If it's also locked down to informational queries only (and not leaking user table contents to the LLM providers), why not try it?

I do wonder about the cost of this at scale compared to the cost of the services being monitored. Hopefully an agent tax doesn't become another Datadog tax.

  • tudorg 11 hours ago

    > I do wonder about the cost of this at scale compared to the cost of the services being monitored. Hopefully an agent tax doesn't become another Datadog tax.

    One idea we want to experiment with is letting the model pick the next time it runs (between bounds). If the model has any reason for concern it runs more often; otherwise maybe once every couple of hours is enough.
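
    Roughly, the scheduler side could be as simple as clamping whatever the model asks for (a minimal sketch with made-up names, not the agent's actual API):

        // The model suggests when it wants to run next; we clamp that
        // to configured bounds before scheduling the next check.
        const MIN_INTERVAL_MS = 5 * 60 * 1000;      // at most every 5 minutes
        const MAX_INTERVAL_MS = 2 * 60 * 60 * 1000; // at least every 2 hours

        function nextRunDelay(suggestedMs: number): number {
          return Math.min(MAX_INTERVAL_MS, Math.max(MIN_INTERVAL_MS, suggestedMs));
        }

        async function monitorLoop(runCheck: () => Promise<number>): Promise<void> {
          while (true) {
            const suggestedMs = await runCheck(); // model returns its preferred next interval
            await new Promise((r) => setTimeout(r, nextRunDelay(suggestedMs)));
          }
        }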

arcticfox 4 hours ago

This is very cool! I wonder what's preventing the other cloud providers from being supported? Is the integration not just a connection string?

lelandfe 11 hours ago

> I support multiple models from OpenAI, Anthropic, and Deepseek.

Are there risks associated with sending DB info off to these third parties?

  • fforflo 11 hours ago

    Of course there are, but I'd guess this relies on the system views (pg_*), like most monitoring tools do, and those offer fine-grained access control if you create appropriate roles.
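
    For example (hypothetical role name, assuming node-postgres; run once as an admin), a monitoring-only role could look like:

        import { Client } from "pg";

        // Create a login role that can read the pg_* monitoring views via
        // the built-in pg_monitor role, but has no access to app tables.
        async function createMonitorRole(adminUrl: string): Promise<void> {
          const client = new Client({ connectionString: adminUrl });
          await client.connect();
          try {
            await client.query(`CREATE ROLE agent_monitor LOGIN PASSWORD 'change-me'`);
            await client.query(`GRANT pg_monitor TO agent_monitor`); // pg_stat_*, pg_settings, ...
            await client.query(`REVOKE ALL ON SCHEMA public FROM agent_monitor`);
          } finally {
            await client.end();
          }
        }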

  • cship2 11 hours ago

    You can also self-host via Ollama, assuming you have a proper GPU. Running it on a CPU can take minutes.
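
    Ollama exposes an OpenAI-compatible endpoint, so a client can be pointed at it locally (endpoint and model name below are just examples, assuming the openai npm package):

        import OpenAI from "openai";

        // Talk to a local Ollama server so DB info never leaves the machine.
        const client = new OpenAI({
          baseURL: "http://localhost:11434/v1", // Ollama's OpenAI-compatible API
          apiKey: "ollama",                     // required by the client, ignored by Ollama
        });

        const completion = await client.chat.completions.create({
          model: "llama3.1:8b",
          messages: [{ role: "user", content: "Interpret this pg_stat_activity snapshot: ..." }],
        });
        console.log(completion.choices[0].message.content);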

femiagbabiaka 10 hours ago

Interesting, might try it out at home.

The documentation asserts:

> I use preset SQL commands. I will never run destructive (even potentially destructive) commands against your database.

This is enforced by taking the responsibility for generating state-evaluation SQL out of the hands of the LLM. The LLM simply interprets the results of predetermined commands, guided by a set of prompts/playbooks: https://github.com/xataio/agent/blob/69329cede85d4bc920558c0...
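
In other words, something like this (hypothetical names, paraphrasing the approach rather than quoting the repo's actual code):

    // The LLM can only pick a command by name; the SQL itself is fixed.
    const PRESET_COMMANDS: Record<string, string> = {
      active_connections:
        `SELECT count(*) FROM pg_stat_activity WHERE state = 'active'`,
      long_running_queries:
        `SELECT pid, now() - query_start AS duration, query
         FROM pg_stat_activity
         WHERE state = 'active' AND now() - query_start > interval '5 minutes'`,
    };

    async function runPreset(
      name: string,
      query: (sql: string) => Promise<unknown>,
    ): Promise<unknown> {
      const sql = PRESET_COMMANDS[name];
      if (!sql) throw new Error(`Unknown preset command: ${name}`); // reject anything off-list
      return query(sql); // results go back to the LLM for interpretation only
    }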

  • esafak 5 hours ago

    I'd define the operations I allow in a composable language like Malloy and expose them to the LLM through MCP.

  • iLoveOncall 9 hours ago

    Until it hallucinates and doesn't run the command that you expect or doesn't run the command at all?

    This doesn't take anything out of the hands of the LLM.

    • femiagbabiaka 8 hours ago

      A hallucination where the model not only bypasses the predefined commands but also produces (a) arbitrary SQL that is (b) destructive seems unlikely. But as a best practice, the user the agent connects to the database with should of course be locked down.

    • tudorg 8 hours ago

      There are things that can go wrong, but not so wrong as to delete your data or cause outages, so there is a level of safety that I think is important at this moment. We do also want to allow executing generated SQL, but with an approval workflow.
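
      Roughly how that gate could work (a hypothetical sketch, not our actual design): generated SQL lands in a queue, and nothing runs until a human signs off:

          // Generated SQL is proposed, reviewed, and only then executed.
          type PendingQuery = { sql: string; reason: string; approved: boolean };

          const queue: PendingQuery[] = [];

          function propose(sql: string, reason: string): void {
            queue.push({ sql, reason, approved: false });
          }

          async function executeApproved(
            run: (sql: string) => Promise<unknown>,
          ): Promise<void> {
            for (const q of queue.filter((q) => q.approved)) {
              await run(q.sql); // only human-approved statements reach the database
            }
          }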

arjunlol 11 hours ago

Looks cool! This may actually save me a lot of manual DBA work.

iLoveOncall 9 hours ago

The title should really include the fact that it's an expert at MONITORING PostgreSQL. It's not for writing queries from natural language.

I'm extremely interested in the latter but not at all in the former.