In Part IV of the Website Hacking series, we are going to look at:
- Storing your email address and telephone number in <a href="mailto:*"> and <a href="tel:*"> links, and the inherent drawbacks of these methods
- Shortcomings of disguising an email address in the markup to avoid spam and other malicious requests (disguises such as mail [at] mail [dot] com)
- Pitfalls of CDNs (Content Delivery Networks)
- A widely used code snippet that is subject to XSS attacks
- Relying on the HTML markup for important data for the application (such as product prices)
- A loose security mechanism on an Australian government website
Avoid giving sensitive information in a plainly visible way in the HTML markup
We all know the <a href="mailto:email@example.com">Mail Me!</a> link. However, publishing your email address like this makes it easy for bots to harvest it and store it in a database or file, exposing you to spam and other malicious attacks.
To illustrate, I have created a sample script that gets all results for “mailto:” and <a href="tel: xxx-xxx-xxxx"> from https://meanpath.com/ and either stores them in a file or displays them in the browser. The script is only a sample and assumes that all results fit on a single page. Furthermore, MeanPath shows only 100 rows of the results unless you pay to get all of them.
Regular expressions keep only those results that contain a valid email address directly in the anchor tag. For telephone mining, the script collects only numbers in the xxx-xxx-xxxx format. The code also ensures that no duplicate emails or telephone numbers end up in the list.
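The filtering and de-duplication step can be sketched as follows. This is not the original PHP script (which is only shown in the figures) but a minimal Python illustration run on a hard-coded HTML string instead of MeanPath results; the regular expressions and the names `harvest_contacts` and `unique` are assumptions made for the example.

```python
import re

# Assumed patterns: a simple email shape inside href="mailto:..." and
# phone numbers strictly in the xxx-xxx-xxxx format inside href="tel:...".
MAILTO_RE = re.compile(
    r'href\s*=\s*["\']?mailto:([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})',
    re.IGNORECASE,
)
TEL_RE = re.compile(
    r'href\s*=\s*["\']?tel:\s*(\d{3}-\d{3}-\d{4})(?!\d)',
    re.IGNORECASE,
)


def unique(items):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def harvest_contacts(html):
    """Return (emails, phones) found in mailto:/tel: links of an HTML string."""
    emails = unique(m.lower() for m in MAILTO_RE.findall(html))
    phones = unique(TEL_RE.findall(html))
    return emails, phones


# Hypothetical page content used only for this demonstration.
sample_html = """
<a href="mailto:email@example.com">Mail Me!</a>
<a href=mailto:Email@Example.com>Mail me again</a>
<a href="tel:555-123-4567">Call</a>
<a href="tel: 555-123-4567">Call again</a>
<a href="tel:5551234567">Wrong format, skipped</a>
"""

emails, phones = harvest_contacts(sample_html)
print(emails)
print(phones)
```

Even with patterns this crude, each address and number is collected exactly once, which is all a spam bot needs.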
Figure 1: A view of some of the collected emails and telephone numbers.
A better way to mine such data would be through http://www.nerdydata.com.
Figure 2: First part of the code. It creates a MeanPath class with a function called mine_elements(), which gets all results from MeanPath and stores them in an array. The other function, filter_elements(), keeps only the elements that match the query given when the MeanPath object was instantiated and also makes sure there are no duplicate entries.
Figure 3: Second part of the code. The function display_data() shows the data in an ordered list in the browser. The function save_data_to_file() saves it to a file whose name is passed when calling the function. Lastly, the MeanPath class is instantiated, and the data is saved to a file and displayed in the browser.
It should be evident by now that publishing personal information in the markup calls for some safeguards, although they are not always necessary.
I should also say that “encoding” the email in a format such as “sample [at] sample [dot] com” or “sample [at] sample . com” does not make it any more secure.
Here we have a short snippet of code: an HTML paragraph containing an email in that format, and PHP code that reads the file, extracts the email with a simple regular expression, and saves it into an array. The regular expression looks for any number of characters followed by [at] or @, followed by any number of characters, followed by one of several top-level domains.
Here is the browser result of the search:
matches[0] is the full expression that matched our search, matches[1] is the first parenthesized part of it that matched, matches[2] is the second parenthesized part of the regex, and so on.
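To make the point concrete, here is a minimal Python sketch of the same idea (the original snippet is PHP and is only shown in the figure). The pattern, the short list of top-level domains and the function name `deobfuscate` are assumptions for illustration, not the article's exact regex.

```python
import re

# Hypothetical pattern: local part, "[at]" or "@", domain, "[dot]" or ".",
# then one of a few top-level domains (the TLD list is an assumption).
OBFUSCATED_RE = re.compile(
    r'([\w.+-]+)\s*(?:\[at\]|@)\s*([\w-]+)\s*(?:\[dot\]|\.)\s*(com|net|org)\b',
    re.IGNORECASE,
)


def deobfuscate(text):
    """Rebuild plain email addresses from '[at]' / '[dot]' style disguises."""
    found = []
    for match in OBFUSCATED_RE.finditer(text):
        # group(0) is the full match; group(1), group(2), group(3) are the
        # parenthesized parts, in order.
        local, domain, tld = match.group(1), match.group(2), match.group(3)
        found.append(f"{local}@{domain}.{tld}")
    return found


html = "<p>Contact: sample [at] sample [dot] com or other [at] site . net</p>"
print(deobfuscate(html))
```

A handful of lines is enough to undo the disguise, so it offers little protection against anyone who bothers to look for it.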
Use CDNs but be aware of security implications
CDNs (Content Delivery Networks) are a great way to decrease page load time, both because they often serve the script from numerous countries and load the copy closest to the user, and because users may have already cached the script by visiting another site that uses the same CDN. However, there are security risks: you have no control over what is stored in the loaded file. If the CDN is compromised, the code in the file you are loading may change, and that can lead to more than just stolen cookies. The script loaded from the CDN can also become unavailable, temporarily or permanently, leading to a frustrating user experience. If the file were on your domain and your site went down, users would know there is a problem with the site. If a CDN file such as jQuery becomes unavailable and you rely on it heavily, however, they would not know what is happening: the site would be up, but it would look completely out of whack.
First off, an attacker can change the source code of the delivered script and:
- Replace the code to frustrate users, redirect your site to another one, steal users’ cookies, or load any kind of exploit code.
Figure 5: Here is how the page looks in Internet Explorer 7.0 on Windows XP (emulated with BrowserStack).
As we can see, the expression ‘do’ gets evaluated and the cookie is shown. Much more than this is possible, but we will leave that to your imagination.
Figure 6: The same page in a contemporary browser (latest version of Chrome)
Another thing that can be done with control over the CSS is the following:
Figure 8: Using a CSS CDN – possibilities
Now a compromised CDN can not only hide all of your content from users, but also add custom text to the page, possibly acting as a fake redirect message and scaring or confusing your users enough that they never come back to your site.
Only two CSS selectors are necessary to make such a change to your page, and the code does not assume any level of knowledge of the HTML markup in the page:
Then in the file we executed the following:
And in the PHP script we had this:
This worked as the picture below shows:
Now the attacker has much more power over both the server and the website.
One of the previous articles in the series covers PHP injection and shows some of the things that can be achieved from this point on.
XSS is everywhere
Another thing to be on the lookout for is XSS. It comes in many forms, but here we will give one example that developers often overlook. In PHP, $_SERVER['PHP_SELF'] is vulnerable to cross-site scripting. It is often used as part of a <form> action attribute, and it should always be escaped with something like <form method="GET" action="<?php echo htmlspecialchars($_SERVER["PHP_SELF"]); ?>">.
Figure 9: An example of a page vulnerable to XSS. Do not pay attention to the poor nesting.
What the <form> action does is simply echo the URL the user is currently on. This means an attacker can manipulate the URL to include code, and the browser will execute it. He can also send links with manipulated URLs of the site to third parties and extract something from them. All he has to do is close the action attribute and the <form> tag: he types "> in the address bar and inserts HTML code after it.
Figure 10: XSS example
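The breakout described above can be reproduced without a PHP server. The sketch below is a Python analogue, not the article's code: `request_path` stands in for $_SERVER['PHP_SELF'], Python's `html.escape` plays the role of htmlspecialchars, and the function names and the malicious path are hypothetical.

```python
from html import escape


def render_form_unsafe(request_path):
    # Vulnerable: the path is copied into the attribute as-is.
    return f'<form method="GET" action="{request_path}">'


def render_form_safe(request_path):
    # escape() turns &, <, >, " and ' into HTML entities, so the value
    # cannot close the attribute or the tag.
    return f'<form method="GET" action="{escape(request_path, quote=True)}">'


# Hypothetical manipulated URL path: "> closes the attribute and the tag.
malicious_path = '/index.php/"><script>alert(document.cookie)</script>'

print(render_form_unsafe(malicious_path))
print(render_form_safe(malicious_path))
```

In the first line of output the script tag sits outside the form tag and would run in the browser; in the second it stays inside the attribute value as inert text.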
Setting important data for an application in the markup
To illustrate, I have taken some screenshots from a search on nerdydata.com for the keyword data-price="
Figure 12: data-price="75" and data-tid="51127" (the ID of the product is also set in the HTML markup)
Figure 13: close to 14,000 websites with price set in the HTML markup. At least a couple of them will be vulnerable to user manipulation of the price.
I think it is safe to say that at least a couple of those 14,000 sites do not have a proper server-side mechanism that looks up the product’s price in a database by product ID or name instead of trusting the markup.
I also want to show this way of storing the price:
Figure 14: <input type="hidden" name="price" value="59">, <input type="hidden" name="item_id" value="…">
Figure 15: 7,087 websites with a hidden input containing the price of the product.
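A server-side check of the kind these sites may be missing could look roughly like the following Python sketch. The in-memory dictionary stands in for a database table, and its contents, the `quantity` field and the function name `order_total` are assumptions made for the example.

```python
# Hypothetical catalogue standing in for a database table (prices in cents).
PRICES_BY_ITEM_ID = {"51127": 7500, "20001": 5900}


def order_total(form_data):
    """Compute the amount to charge from the item ID and quantity only."""
    item_id = form_data.get("item_id", "")
    if item_id not in PRICES_BY_ITEM_ID:
        raise ValueError("unknown item")
    quantity = int(form_data.get("quantity", "1"))
    if quantity < 1:
        raise ValueError("quantity must be positive")
    # Any "price" field sent by the browser is deliberately ignored.
    return PRICES_BY_ITEM_ID[item_id] * quantity


# The user edited the hidden price field from 75 down to 1 before submitting.
tampered = {"item_id": "51127", "price": "1", "quantity": "2"}
print(order_total(tampered))
```

Because the amount is derived only from the stored price, editing data-price or a hidden input in the browser changes nothing about what is charged.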
Mo’ Captcha, Please
Figure 16: Site of Australian government…
In the picture we see a page from the site of the Australian Customs and Border Protection Service, which does not use any of the fancy CAPTCHAs (I agree that they may also keep legitimate users from using the site, but they do more good than harm).
The security mechanism on the above-mentioned site seems static: the bold numbers do not change.
Even if they did change, I have created a snippet that fetches the numbers as if they were not static and this were the site’s real verification method: showing different numbers in bold next to other characters and making you type them, with all of it fully visible in the HTML markup.
Figure 17: This snippet fetches the page from their site and places it in a hidden div so I can traverse the data with jQuery. The jQuery code then loops over each <strong> tag and adds its text to a variable, separating the different strong tags with whitespace on both sides. Then a regex is applied that matches only numbers surrounded by whitespace on both sides, and the resulting array is converted to a string. Finally, all whitespace is removed from the mined numbers and the result is shown in the console and in an alert.
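The same extraction can be sketched offline in Python instead of jQuery. This is an illustration under assumptions: the sample markup is invented (the real page is not fetched), and the names `StrongTextCollector` and `extract_bold_digits` are hypothetical.

```python
import re
from html.parser import HTMLParser


class StrongTextCollector(HTMLParser):
    """Collect the text found inside <strong> elements."""

    def __init__(self):
        super().__init__()
        self._depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "strong" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.chunks.append(data)


def extract_bold_digits(html):
    """Join, in page order, the <strong> texts that consist only of digits."""
    collector = StrongTextCollector()
    collector.feed(html)
    collector.close()
    digits = [c.strip() for c in collector.chunks if re.fullmatch(r"\s*\d+\s*", c)]
    return "".join(digits)


# Invented markup imitating bold digits mixed with other characters.
sample = (
    "<p>x<strong>4</strong>q<strong>Note</strong>z"
    "<strong>81</strong>k<strong> 7 </strong></p>"
)
print(extract_bold_digits(sample))
```

Anything a browser can render from plain markup, a script can read just as easily, which is why a check built this way stops very few bots.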