ChatGPT Hacks a Website
It is not every day (thank god) that I wake up to a new paper called “LLM Agents Can Autonomously Hack Websites”. But some days I do. And my p(doom) rockets so high I have to read some Gary Marcus to calm myself down. Surely this is a super-complicated system that normal people will be unable to use, let alone build, right?
As a result, these capabilities can be implemented in as few as 85 lines of code with standard tooling
Yikes. 85 lines of code and access to OpenAI’s tools is apparently all it takes to create an autonomous website-hacking agent. Welcome to the future. They even put a price on it: $9.81 per attempted hack. A year ago that would have cost hundreds or thousands of dollars, assuming you were willing to send a stranger some money over the Internet. Six months from now it’s going to be $0.20, after OpenAI announces a few more price drops.
What makes matters worse is the word “Autonomously”. Until now, the way to use GPT4 to find vulnerabilities in a website was to ask it a bunch of questions: what attacks exist, which queries to run, and so on, then execute those instructions yourself and go back to the model with the results. This requirement to have a human in the loop makes the whole process slow, clumsy and prone to failure (and frankly, not very sci-fi at all).
This has changed. Instead of just using GPT4 to find vulnerabilities, the authors built a simple system that uses GPT4 at its core but can do a bunch of things on its own: accessing websites, storing its findings and prompting itself for ideas. Oh yeah, it can also read documents to learn about cybersecurity, so it can get better at hacking mainframes.
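To make that concrete, here is a minimal sketch of what such a self-prompting loop might look like. To be clear, this is my guess at the general shape, not the authors’ code; the action protocol and the `browse` helper are made up for illustration:

```python
# A minimal sketch of an autonomous agent loop, assuming the OpenAI Python
# SDK. This is NOT the paper's implementation; the action names and the
# browse() helper are hypothetical.
from urllib.request import urlopen

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You are a web security tester. Reply with exactly one action per turn: "
    "BROWSE <url>, NOTE <finding>, or DONE <summary>."
)

def browse(url: str) -> str:
    # Stand-in for the paper's headless-browser module (here a plain GET).
    return urlopen(url).read().decode(errors="replace")[:4000]

def run_agent(target: str, max_steps: int = 20) -> list[str]:
    findings: list[str] = []  # the "storing its findings" part
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Target: {target}"},
    ]
    for _ in range(max_steps):
        reply = client.chat.completions.create(
            model="gpt-4", messages=messages
        ).choices[0].message.content
        messages.append({"role": "assistant", "content": reply})
        action, _, arg = reply.partition(" ")
        if action == "DONE":
            break
        elif action == "BROWSE":
            observation = browse(arg)
        elif action == "NOTE":
            findings.append(arg)
            observation = "Noted."
        else:
            observation = "Unrecognized action. Use BROWSE, NOTE or DONE."
        # Feeding each result back into the conversation is the
        # "prompting itself" part.
        messages.append({"role": "user", "content": observation})
    return findings
```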
These different “modules” (web browsing, document reading, etc.) are not trivial. To quote the paper: “All components are necessary for high performance, with the success rate dropping to 13% when removing components.”
If 13% is what GPT4 gets on its own, then what is its success rate as part of such a system? 73.3%. That’s a massive increase, and one which is hard to predict if you only have access to the base model. The obvious takeaway is that making GPT4 smarter isn’t the only way to get better (or more dangerous) AI. If embedding it into a system can yield such strong improvements when it comes to hacking websites, who knows what else is possible? Thanks to Meta and their open-source policies, we might find out the answer to that question the hard way. Is this how we get AGI? Will it cost less or more than $9.81?
Unfortunately for any wannabe hackers reading this, the authors didn’t release the specifics of the system. I don’t know the exact prompt they used, or which documents the agent read to learn about website vulnerabilities. What they do share is that the LLM had access to a Playwright-powered headless browser, a terminal and a Python code interpreter. It also used LangChain to handle things like context and memory, and utilised OpenAI’s Assistants API.
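Even without the specifics, we can sketch how those pieces plug together. Below is one plausible way to expose a Playwright headless browser to GPT4 through OpenAI’s function-calling interface; the Playwright and OpenAI calls are real, but the tool name, schema and target URL are my own assumptions:

```python
# Sketch: wiring a Playwright headless browser to GPT-4 as a callable tool.
# The "goto" tool definition is an assumption, not the paper's setup.
import json

from openai import OpenAI
from playwright.sync_api import sync_playwright

client = OpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "goto",
        "description": "Navigate the browser to a URL and return the page HTML.",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    },
}]

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()

    messages = [{"role": "user",
                 "content": "Inspect the login form at http://localhost:8080"}]
    msg = client.chat.completions.create(
        model="gpt-4", messages=messages, tools=TOOLS
    ).choices[0].message

    # If the model asked to call the browser tool, run it and hand the
    # page HTML back so the model can decide on its next step.
    if msg.tool_calls:
        call = msg.tool_calls[0]
        page.goto(json.loads(call.function.arguments)["url"])
        messages.append(msg)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": page.content()[:4000],  # truncated to fit the context
        })
        followup = client.chat.completions.create(
            model="gpt-4", messages=messages, tools=TOOLS
        )
        print(followup.choices[0].message.content)

    browser.close()
```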
On the bright side, we can at least look at their results:
Out of the 10 LLMs tested, only GPT4 showed any promise as a hacker. Even then, the model had to be run 5 times on each vulnerability to get above 70%. This is not as bad as it sounds, though, because an attacker looking for vulnerabilities doesn’t need to succeed every time; they can simply keep retrying. You only need to succeed once to steal some valuable data.
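The retry maths is worth spelling out: if a single attempt succeeds with probability p, and attempts are independent (a simplification, since real attempts surely correlate), then at least one of five succeeds with probability 1 - (1 - p)^5. A quick sanity check:

```python
# Back-of-the-envelope: how retries inflate the success rate, assuming
# independent attempts.
def pass_at_k(p: float, k: int) -> float:
    """Probability that at least one of k attempts succeeds."""
    return 1 - (1 - p) ** k

# Even a modest 25% per-attempt rate clears 76% after 5 tries.
for p in (0.10, 0.25, 0.40):
    print(f"p={p:.0%}  pass@5={pass_at_k(p, 5):.1%}")
```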
They explain the sharp drop in success rate with the size of the models: smaller models are simply less capable than GPT4. I would love to see this redone with Gemini Ultra thrown into the mix. I wouldn’t expect it to perform any worse than GPT4.
This is what a simple attacking scenario looks like:
That’s a lot of steps to carry out in succession, and given how prone GPT is to getting sidetracked, it’s not surprising that it takes 5 attempts to reach a 70% success rate. Still, I am very impressed that it can carry out such a complex attack on its own, without a human in the loop to nudge it in the right direction or offer ideas at key points.
They also give us the stats for how many function calls the GPT made for each attack:
48 calls for the Webhook XSS is incredible. And that’s the average. Imagine how mind-blowing it would have sounded, just a year ago, if someone had told you they had an autonomous system you could prompt in English, capable of figuring out which APIs and functions to call and then calling them 48 times in a coordinated way to exploit website vulnerabilities. And that this whole operation costs less than $10.
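Mechanically, “48 coordinated calls” just means a dispatch loop: execute whatever tool the model asks for, feed the result back, repeat until it stops asking. Here is the shape of that loop, generalising the single-call sketch above; the `EXECUTORS` registry is hypothetical:

```python
# Sketch of the agent's inner dispatch loop. EXECUTORS maps tool names to
# real functions (e.g. {"goto": ..., "run_python": ...}) and is assumed.
import json

from openai import OpenAI

client = OpenAI()
EXECUTORS: dict = {}  # register tool implementations here before running

def run_until_done(messages: list, tools: list, max_calls: int = 60) -> int:
    calls_made = 0
    while calls_made < max_calls:
        msg = client.chat.completions.create(
            model="gpt-4", messages=messages, tools=tools
        ).choices[0].message
        messages.append(msg)
        if not msg.tool_calls:       # the model produced a final answer
            break
        for call in msg.tool_calls:  # it may batch several calls per turn
            result = EXECUTORS[call.function.name](
                **json.loads(call.function.arguments)
            )
            messages.append({"role": "tool",
                             "tool_call_id": call.id,
                             "content": str(result)})
            calls_made += 1
    return calls_made
```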
There is another point I want to make here. These systems were built on a limited budget by a research team. They were also allowed to only run for 10 minutes before a run was considered a failure. How much stronger can such an agent get if built by a state-backed entity with a virtually unlimited budget? How many more capabilities are waiting to be unlocked by someone who introduces a smarter architecture, still using the same base LLM?
To be clear, the exploits in this paper are in no way revolutionary. What is revolutionary is the autonomous aspect and the price point. If someone is willing to spend $1000 per month, that’s roughly 100 attempted hacks. My guess is that there are plenty of horribly secured websites on the Internet waiting to fall prey to this. How many of them are storing your passwords or your credit card info right now? Imagine a government website for some third-world country. How much security do you think it has? Not much, would be my estimate. The good thing to come out of all this is that soon enough all such websites will be hacked and, as a result, average security will go up. The bad news is that this might lead to a lot of taxpayer information leaking along the way.
One of the many theories doing the rounds on Twitter is that LLMs are not smart at all, that they are just stochastic parrots which can’t come up with anything truly original, and are therefore not dangerous. I think there is a risk that gets overlooked here. Even if LLMs by themselves never become as smart as humans, that says nothing about the systems they will get embedded in. If using a GPT4-level LLM enables you to build a hacking system, what does GPT5 enable?
It feels like very soon the Internet will be crawling with intelligent entities that compensate for what they lack in smarts with the ability to ingest vast amounts of information, never sleep, and never get bored. A lot of websites will get hacked, and a lot of people will get scammed. And this feels like it’s just the beginning.