There are a number of factors that can weaken or void the confidentiality of a web browsing session. Notable examples include not using a secure channel (HTTPS) or using a misconfigured HTTPS endpoint, Cross-Site Scripting that can be used to intentionally transmit private information, session fixation, and hybrid attacks such as CRIME (which presupposes some control over the session as well as eavesdropping).
While the HTTP Referer (sic) header has its perks, predominantly within the realm of web usage metrics, those are offset by the security risks introduced by the mechanism.
Disabling the Referer header altogether in your browser does not tend to introduce any noticeable disruption of service from any half-decently programmed website.
While a small minority of contemporary web applications may have built Referer header values into their logic (and require them to function properly), the majority of these headers are created only to show up in Google Analytics reports and server log files, to generate redirection pages, or to be immediately forgotten upon transmission.
From the user’s perspective, it’s a rather useless affair.
Privacy proponents advocate ubiquitous use of HTTPS, even for small websites with only static content, so as to obscure the user’s navigation from anyone who has no business knowing it. I advocate this as well, but from a confidentiality standpoint it becomes rather meaningless if no active attempt is made to control which Referer data is conferred to whom, and at which points in the user’s navigational path.
Granted, Referer headers leak only the location of the present page, but these URLs more often than not summarize a page’s content in normal, human language, rather than indecipherable, robotic gibberish. Serving URLs composed of irreversible, randomized strings as page identifiers is technically quite feasible (to the server, the automaton, it makes no difference), and occasionally you see these around, but this approach rather defeats the URL’s common purpose, i.e., to enable humans to memorize and recognize page identifiers, pairing human semiotics with proper identification by automatons.
So far I’ve discussed the “passive interception of metadata”, but there’s more. URLs often include variables in the form of query parameters. Query parameters exposed to third parties usually don’t pose a security risk (beyond conferring more metadata), since a proper web application will respond with some variant of 403 Forbidden if a user without proper credentials (in web applications typically stored in client-side cookies) attempts to load a page by its correct path and query parameters. That is to say, if a certain user is authorized to view a certain file by retrieving http://www.website.com/?file_id=123, then that exact same URL should yield some kind of ‘forbidden’ or ‘not authorized’ page for any user who isn’t authorized. Most mature websites that I have reviewed adhere to this logic and Referer leaking of the URL has no further consequences.
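The access-control rule described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular framework’s API: the permission table `FILE_PERMISSIONS`, the user names, and the function `status_for` are hypothetical, and the session user is assumed to have been resolved from a cookie elsewhere.

```python
from http import HTTPStatus
from urllib.parse import urlsplit, parse_qs

# Hypothetical permission table: file id -> users allowed to view it.
FILE_PERMISSIONS = {"123": {"alice"}}

def status_for(url, session_user):
    """Decide the response status from the session's user, not from the URL alone."""
    file_id = parse_qs(urlsplit(url).query).get("file_id", [""])[0]
    if session_user in FILE_PERMISSIONS.get(file_id, set()):
        return HTTPStatus.OK
    return HTTPStatus.FORBIDDEN

url = "http://www.website.com/?file_id=123"
print(int(status_for(url, "alice")))  # authorized user
print(int(status_for(url, None)))     # same URL, no valid session
```

Because the decision depends on the session rather than on knowledge of the URL, a third party that learns the URL through a Referer header gains nothing beyond the metadata.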
However, if URLs contain secret values or tokens that are intended to be consumed only by the single legitimate recipient (the user browsing the page) and that act as a ticket to an otherwise forbidden action, i.e., if they act as ‘keys’ to something, then location leaks through the Referer header may completely void the security intent of such a mechanism.
CSRF tokens come to mind; the purpose of a CSRF token is to prove that an authorized action was initiated by the legitimate user. For CSRF tokens that are transferred to the server in a POST request, this generally works as expected. However, CSRF tokens passed in a GET request are reflected in the URL. Since the Referer header is a reflection of the current URL, GET-based CSRF tokens are automatically siphoned off to third parties whose assets are embedded in the current page, or to the external website when the user clicks a link to it. For instance, when a user visits http://www.website.com/?csrf_token=9udNpEow8I86, where the query component constitutes a GET-based CSRF token for website.com, the token is exposed to any external domain that website.com retrieves its assets from. Luckily, this particular construct isn’t seen very often either.
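To make the leak concrete, here is a small Python sketch of what a third-party asset host could do with the Referer value it receives. The helper name `leaked_query_params` is made up for illustration; the URL is the example from the text.

```python
from urllib.parse import urlsplit, parse_qs

def leaked_query_params(referer):
    """Return the query parameters a third party can read from a Referer value."""
    return parse_qs(urlsplit(referer).query)

# The Referer header a third-party host would see for an asset embedded in the page.
referer = "http://www.website.com/?csrf_token=9udNpEow8I86"
params = leaked_query_params(referer)
print(params["csrf_token"][0])  # prints 9udNpEow8I86
```

No special access is needed on the receiving side; parsing the header (or the server log that recorded it) is enough.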
A new type of attack
Another pattern, very common these days with the emergence of websites that let users share files with one another internally (i.e., on the same website or platform), is the following.
Consider a website that allows file transfers/sharing between its users. These files are private insofar as only the intended recipient or recipients should be able to view a file that another person uploaded, and other people shouldn’t. This might be a social networking site, a modern multi-user chat application, a dedicated cloud service, a website’s customer support, and so on. Very often, the application’s logic is built like this:
- User X wants to share a file with user Y. User X uploads the file and the application takes note of the desired permissions (only X and Y are allowed to see it).
- User Y sees that X sent a file and performs some action in order to view it.
- The application prepares by putting the file on cloud storage, for example Amazon S3. This procedure yields a special, secret URL on the domain of the cloud service provider. This URL is given to user Y.
- User Y’s browser retrieves the URL.
Sometimes, the file is marked as an attachment and this will show the user an ‘open/save as’ dialog window.
However, often the file is displayed in the browser if the browser has a handler for the file’s MIME type. In Firefox and Chrome, PDF files are rendered in-browser by default: Firefox uses its built-in PDF.js ( https://github.com/mozilla/ ), and Chrome ships its own built-in PDF viewer.
PDF files bear a semblance to regular web content (HTML) insofar as they generally contain human-readable text, and clickable hyperlinks to other resources. This is the culprit of the vulnerability I am describing; clicking hyperlinks means Referer stamping, and Referers reflect the current URL, which in this case should stay secret.
If the PDF document that user Y is viewing contains a hyperlink to http://www.website.com, and user Y decides to visit this link, http://www.website.com receives the secret URL through the Referer header. Website.com can now download the private document.
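On the receiving end, harvesting such URLs requires nothing more than reading the web server’s access log. The following Python sketch assumes the common “combined” log format (request, status, size, Referer, user agent); the sample line, the host storage.example.com, and the function name `referers` are invented for illustration.

```python
import re

# Matches the tail of a "combined" access log line:
# "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def referers(log_lines):
    """Yield every non-empty Referer value found in the given log lines."""
    for line in log_lines:
        match = COMBINED.search(line)
        if match and match.group("referer") not in ("", "-"):
            yield match.group("referer")

sample = [
    '203.0.113.7 - - [01/Jan/2015:12:00:00 +0000] "GET / HTTP/1.1" 200 512 '
    '"https://storage.example.com/files/report.pdf?signature=abc123" "Mozilla/5.0"',
    '203.0.113.8 - - [01/Jan/2015:12:00:05 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.40"',
]
print(list(referers(sample)))
```

Any entry whose Referer points at a storage provider and carries a signature-like query string is a candidate secret URL.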
In some of the cases I’ve analyzed, the ‘secret URL’ expires after a period of time, e.g., 15 minutes. This at least prevents external websites from accumulating an extended backlog of potentially secret files by perusing their server log files for URLs once their administrators become aware of this vulnerability.
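An expiring secret URL can be sketched as follows. This is a generic HMAC-based illustration and not the actual signing scheme of Amazon S3 or any other provider; the key, the host storage.example.com, the parameter names `expires` and `signature`, and all function names are assumptions made for the example.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode, urlsplit, parse_qs

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical server-side secret

def sign(path, expires):
    """Compute an HMAC over the path and expiry timestamp."""
    message = f"{path}:{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def make_secret_url(path, ttl_seconds=900, now=None):
    """Build a URL that is valid for ttl_seconds (15 minutes by default)."""
    now = int(time.time()) if now is None else now
    expires = now + ttl_seconds
    query = urlencode({"expires": expires, "signature": sign(path, expires)})
    return f"https://storage.example.com{path}?{query}"

def is_valid(url, now=None):
    """Accept the URL only if it is unexpired and its signature matches."""
    now = int(time.time()) if now is None else now
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    expected = sign(parts.path, expires)
    return now <= expires and hmac.compare_digest(signature.encode(), expected.encode())
```

Note what the check does not cover: within the validity window, `is_valid` accepts the URL from anyone who holds it, including a website that obtained it through a Referer header. Expiry narrows the window; it does not close it.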
With the popular combination of cloud storage (temporary or otherwise), these secret URLs (hitherto deemed safe), the PDF viewer built into the browsers, and the ancient Referer header, there is a significant likelihood that access to private documents is now gratuitously strewn around the web. I don’t have any empirical data that I can use to gauge the actual incidence of leaking, let alone the active exploitation of it. The whole process presupposes a human action (clicking a link in a document) that cannot be coerced by any technical means, but only occurs by the user’s choice or through social engineering. This action is, however, a logical thing to do when viewing an electronic document (hyperlinks invite clicking). As for the incidence of the actual vulnerability: I have found it in several fairly large websites (hundreds of thousands to millions of users), so there’s some reason to be vigilant over this issue.
Incidence of Referer leaks
As an exploratory effort to estimate the extent to which websites load assets from external websites, I accessed the front page of each of Alexa’s top 50 websites using an actual browser (as opposed to using curl or wget) and intercepted the totality of HTTP/HTTPS traffic between both endpoints using the excellent mitmproxy and a custom script to launch a fresh browser session for each website.
Using a small Python script I then parsed the traffic log for each website visited and extracted each and every Referer header from the complete set of HTTP(S) retrievals, along with the full URL of the request that carried it. From the resultant set of pairs (from, to) I purged all combinations wherein the requested URL’s domain was equal to the Referer domain, i.e., requests to http://www.google.com originating from http://www.google.com were removed.
Of the 50 websites accessed, 41 contacted another domain with the Referer header set. In some cases it was obvious that the external domain was part of the company’s CDN (for instance, yahoo.com leverages *.yimg.com to serve almost all of its static content), thereby constituting a company-wide closed circuit of content delivery, and I didn’t correct for these, but access to a range of servers used for advertisements and analytics is visible as well.
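The filtering step can be sketched like this. It is not the script used for the survey; it assumes the traffic log has already been reduced to (Referer URL, requested URL) pairs and that “domain” means an exact hostname match, and the function name `cross_domain_pairs` is made up.

```python
from urllib.parse import urlsplit

def cross_domain_pairs(pairs):
    """Keep only (referer_url, requested_url) pairs whose hostnames differ."""
    kept = []
    for referer_url, requested_url in pairs:
        if urlsplit(referer_url).hostname != urlsplit(requested_url).hostname:
            kept.append((referer_url, requested_url))
    return kept

pairs = [
    ("http://www.google.com/", "http://www.google.com/logo.png"),
    ("http://www.yahoo.com/", "http://l.yimg.com/style.css"),
]
print(cross_domain_pairs(pairs))
```

With exact hostname matching, a company’s own CDN on a different domain (such as the yimg.com case) still counts as external, which matches the uncorrected figures reported here.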
Click here to view the data.
Preventing Referer stamping can be achieved using several methods.
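First off, a page served over HTTPS by default only leaks Referer data to other HTTPS destinations; browsers omit the header when the request goes to a plain HTTP URL. This at least limits the exposure to other, specific websites rather than saturating public WiFi networks and other untrusted infrastructures with it.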
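The downgrade rule just described fits in one predicate. This Python sketch models only that rule (no Referer from an HTTPS page to a non-HTTPS destination) and ignores any explicit policy set by the page; the function name `referer_sent` and the example hosts are illustrative.

```python
from urllib.parse import urlsplit

def referer_sent(page_url, target_url):
    """Return False only for the HTTPS-to-non-HTTPS downgrade case."""
    page_scheme = urlsplit(page_url).scheme
    target_scheme = urlsplit(target_url).scheme
    return not (page_scheme == "https" and target_scheme != "https")

print(referer_sent("https://a.example/doc", "https://b.example/"))  # prints True
print(referer_sent("https://a.example/doc", "http://b.example/"))   # prints False
```

The first case is the one that matters for secret URLs: an HTTPS storage URL is still handed in full to any HTTPS site the document links to.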
On the hyperlink level, adding rel="noreferrer" to an <a> tag will prevent Referer stamping for that hyperlink. On the document level, you include <meta name="referrer" content="no-referrer"> in the document’s head.
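On the protocol level (content type agnostic), setting the referrer directive of the Content-Security-Policy header to no-referrer will achieve this. This directive has since been superseded by the dedicated Referrer-Policy response header, which accepts the same no-referrer value.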
Another option is to include the following header in the server’s response: Content-Disposition: attachment. Any sane browser will then prompt the user to either open or save the file to disk; viewing the file in-browser is thereby bypassed. However, this may incur a trade-off between security and usability.
As for content sharing patterns that involve a cloud service, the latter two options should suffice. If you have extended control over the serving platform, traditional solutions such as cookie-based authentication should be used in lieu of or in addition to secret URLs.
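As a rough sketch, the two header-based options could be combined when serving a private file. The function name `private_file_headers` is hypothetical, and the example uses the Referrer-Policy response header, the standardized successor to the Content-Security-Policy referrer directive; how these headers are attached to a response depends on the serving platform or storage provider.

```python
def private_file_headers(filename):
    """Build response headers that suppress Referer stamping and in-browser viewing."""
    # Escape backslashes and double quotes so the filename stays a valid quoted string.
    safe_name = filename.replace("\\", "\\\\").replace('"', '\\"')
    return {
        # Force an open/save dialog instead of rendering the file in the browser.
        "Content-Disposition": f'attachment; filename="{safe_name}"',
        # Ask the browser not to send a Referer for requests made from this response.
        "Referrer-Policy": "no-referrer",
    }

for name, value in private_file_headers("report.pdf").items():
    print(f"{name}: {value}")
```

Sending both is a belt-and-braces choice: either one alone addresses the PDF scenario, but the attachment header costs usability while the policy header does not.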
Note that if you’re not explicitly taking care of Referer stamping, then file names can still leak outwards.
One of digital security’s special characteristics is that the challenge it poses does not spring from rigidly delineated rules such as in a game, but rather from the ability to locate and intercept uncalled-for “features” that bleed from between the cracks of the patchwork of disparate technologies. It’s this that lends digital security as a profession or pastime such vibrancy. But an inability to systematically assert a system’s security, when the danger is unknown or is hiding in plain sight, is also what instills frailty into defenses.
It is my personal opinion that, however eclectic a vulnerability may be, and regardless of to what extent its remediation incurs a penalty for its usability, as the proprietor of a website, the onus is on you to ensure the discretion of whatever is deemed private by your users, and at the end of the day you will be held responsible for the moral and commercial ramifications of deciding to shrug it off as an inevitable consequence of the wayward workings of technology.
One company that I contacted in regard to this vulnerability, with several million users and whose name I shall not disclose, acknowledged the problem but decided not to fix it, stating that its current convenient usability should not suffer from a fix. Under circumstances like these, it might be sensible to heed the old literary adage “kill your darlings” and apply it to your software writing.