On the deep (uncurable?) vulnerability of MCPs

Published on July 19, 2025 2:50 AM GMT

My background: researcher in AI security.

This recent study demonstrates how a common AI-assisted developer setup can be exploited with prompt injection to leak private info. Practically speaking, AI coding tools are almost certainly going to stay, and the setup described in the study (Cursor + MCP tools with dev permissions) is probably used by millions as of today. The concept of prompt injection is not new. Yet, it’s interesting to see such a common software dev setup being so fragile. The software dev scenario is one of those use cases that depend on the LLMs knowing which parts of the text are instructions and which parts are data. I think it’s necessary for AI tools to condition follow-up actions on the meanings of the data read, but different parts of the context should be tagged with “permissions”.

A potential solution to this is to embed the concept of “entities with variable permissions” during training (the RL step). Current foundation models are trained as chatbots, where the input is from a single end user, whose instructions need to carry a lot of weight for the AI to be a helpful assistant. Yet people use these instruction-following chatbots to process unsanitized data.

Any suggestions on post training solutions for this?

Discuss

Leave a Comment Cancel Reply