Prompt injections through calendar invites

Let's continue our journey into prompt injection and see how an agent can be tricked into doing what it's not supposed to do with a simple calendar invite.

I wrote a bit about getting your agent to do things it’s not suppose to in the previous post Restrict agent access to a minimum, let’s take this a bit further and see how an agent, that is not publicly accessible, could also be attacked. To set the playfield we are assuming the following:

  • you are using an AI Agent in a chat
  • the agent has access to read calendar entries (see Restrict agent access to a minimum on how to prevent this)
  • the agent has access to send emails

The Agent is just an internal agent, what could possibly go wrong here?

this is fine meme, dog drinking coffee surrounded by fire in burning house.

Just because it’s internal, it’s not immune to external prompt injections

This is very important to remember! You are, in this case at least, receiving external data that could contain prompt injections from places that you might not have expected. The Chat Interface is not the only path where data can reach the agent. This has happened to the big ones like OpenAI, Anthropic, Google … they have leaked the systemprompt which have been collected on GitHub which means this is much more common as you might think.

Our example Workflow

Here is our example workflow, it can read calendars, create new appointments and also send emails. It’s a purely internal chat agent, no public access to the chat interface.

AI Agent with chat interface, and access to calendar and email.

With this knowledge of the workflow, and a bit of creativity, you can think of ways how data can reach the agent. We have to inputs into the agent. Yes, two. One is the chat, the second one is the Calendar read. But how can the calendar read be a public accessible thing?

Calendar invites 🤯

You can of course add a prompt injection into the calendar invite. The beauty of this is that nobody needs to even accept this invite. Just send it form your calendar and if the agent will read this invite as is, it will process it and do as told.

screenshot of a calendar invite with a prompt injection.

So now you have your internal only chat, which got an externally crafted calendar invite and you ask your agent about todays schedule.

chat from use with the agent that lies about the standup meeting.

As you can see, it lied about the standup meeting, told you that you are the recipient of the email and by this point it’s already to late to undo, your appointments have leaked. Now imagine if the Agent also has access to sensitive files. It’s already a GDPR nightmare if you have leaked the names, email addresses and phone number from your contacts, but if you have employee data or other data containing sensitive personal data, you’re looking forward for lots of fun with the data protection agency of your country.

How to prevent this

Well this one to be hones prevented itself because it can only create draft emails and will not send out the emails. But an approach here would be

  • to split up the agent, does the same agent really need to have access to both calendar and email?
  • put the calendar reading into a sub workflow (as demonstrated in Restrict agent access to a minimum) and add a Guardrail BEFORE sending the data from the calendar back to the agent.

This would look like this:

Create the subworkflow with a guardrail

Again just a simple sub workflow that fetches the calendar entries but passes them through a guardrail that is set to check for a jailbreak.

a sub workflow with a guardrail.

Then update the main workflow to call this subworkflow instead of direct access to the calendar:

adapted workflow not having access to read directly from the calendar

And now ask it once more:

demo chat of user asking for the schedule

While this might seem as an error at first, looking at the log will show that the guardrail did what it was supposed to:

log of ai agent showing that the guardrail was triggered

Takeaways

I can only repeat what I have said in Restrict agent access to a minimum to restrict the agent access, also add Guardrail and remember that even this won’t be bullet proof. But definitely don’t let your agent send mails without a human in the loop!

Photo by Rudi Endresen on Unsplash

essential