Previously, we saw how hackers spend a lot of time surveying websites they want to attack, building up a detailed picture of their targets using information found in DNS records, as well as on the web and from the site itself.
This information helps hackers learn the hardware and software structure of the site, its capabilities, back-end systems and, ultimately, its vulnerabilities. It can be eye-opening to discover the detail a hacker can see about your website and its systems.
The way the internet works means that nothing can be entirely invisible if it's also to be publicly accessible, and nothing publicly accessible can ever be truly secure without serious investment. But there's still plenty you can do.
Now we're going to examine some of the steps you can take to ensure that any hacker worth their salt will realise early on that your web presence isn't the soft target they assumed it was, and to get them to move on.
Robot removal
Many developers leave unintentional clues to the structure of their websites on the server itself. These clues tell a hacker a lot about the developers' proficiency in web programming, and will pique their curiosity.
Many people dump files to their web server's public directory structure and simply add the offending files and directories to the site's 'robots.txt' file.
This file tells the indexing software associated with search engines which files and directories to ignore, and thereby leave out of their databases. However, by its nature this file must be globally readable, and that includes by hackers.
Not all search engines obey the 'robots.txt' file, either. If they can see a file, they index it, regardless of the owner's wishes.
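For example, a 'robots.txt' file like the hypothetical one below (the paths are invented for illustration) tells a visiting hacker exactly which directories the owner would rather keep quiet about, and confirms that each of them exists:

    User-agent: *
    Disallow: /admin/
    Disallow: /backups/
    Disallow: /old-site/
    Disallow: /test-scripts/

Anyone can read this file simply by requesting www.example.com/robots.txt in a browser.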
GIVEN UP BY GOOGLE: 'Robots.txt' files are remarkably easy to find using a Google query
The way to prevent information about private files from falling into the wrong hands is simple: if there's no good reason for a file or directory to be on the server, it shouldn't be there in the first place.
Remove it from the server and from the 'robots.txt' file. Never have anything on your server that you're not happy to leave open to public scrutiny.
Leave false clues
However, 'robots.txt' will also give hackers pause for thought if you use it to apparently expose a few fake directories and tip them off about security systems that don't exist.
Adding an entry for an intrusion detection system, such as 'snort_data' for example, will tell a false story about your site's security capabilities. Other directory names will send hackers on a wild goose chase looking for software that isn't installed.
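A few decoy entries are all it takes. The names below are only suggestions, and the directories needn't even exist:

    User-agent: *
    Disallow: /snort_data/
    Disallow: /tripwire/
    Disallow: /ids_logs/

A hacker who believes Snort and Tripwire are watching will think twice, or at least waste time probing for software that isn't there.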
If your website requires users to log into accounts, ensure that they confirm their registrations by replying to an email sent to a nominated email account.
The most effective way of preventing a brute force attack against these accounts is to enforce a policy of 'three strikes and you're out' when logging in. If a user enters an incorrect password three times, they must request a new password (or a reminder of their current one), which will be sent to the email account they used to confirm their membership.
If a three strikes policy is too draconian for your tastes, or you feel it may invite denial of service attacks against individual users (anyone can lock a victim out of their account simply by entering three bad passwords on their behalf), then it's a good idea to slow things down instead by not sending the user immediately back to the login page.
After a certain number of failed attempts, you could sample the time and not allow another login attempt until a certain number of minutes have passed. This will make a brute force attack very slow, if not practically impossible to mount.
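Here's a minimal sketch of that timing idea in Python. The names are my own, and the dictionary stands in for the back-end database a real site would use:

    import time

    MAX_TRIES = 3        # failed attempts allowed before throttling begins
    WAIT_SECONDS = 300   # enforced pause once the limit is reached

    failures = {}        # username -> (failure count, time of last failure)

    def login_allowed(username):
        """May this account attempt a login right now?"""
        count, last_failure = failures.get(username, (0, 0.0))
        if count < MAX_TRIES:
            return True
        # Sample the time: refuse until the waiting period has passed
        return time.time() - last_failure > WAIT_SECONDS

    def record_failure(username):
        count, _ = failures.get(username, (0, 0.0))
        failures[username] = (count + 1, time.time())

    def record_success(username):
        failures.pop(username, None)   # a good login clears the slate

Crucially, the counter lives on the server. As we'll see shortly, keeping it anywhere the browser can touch is asking for trouble.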
Interacting with your website like a normal user will provide a hacker with a huge amount of free information about the way the site works. They will spend a long time reading the code loaded into their browser. The browser and the code (including HTML) served as part of each page are what's known as the client side of things.
For example, one common technique used to keep track of user data is to send information about the user's session (their username and so on) to the browser and expect it to be sent back. In other words, the site has the browser keep track of which user is interacting by having it announce their credentials each time it submits any information.
In times past, these credentials might have contained a whole shopping cart, meaning people could simply edit the values of cart items before pressing the checkout button, thereby managing to purchase items at rock bottom prices without the site owner realising anything was wrong.
This led to the upsurge in remote shopping carts, where the only information handled by the browser is an encrypted cookie, which is passed to a remote payment handling system such as Google Checkout or PayPal.
Perhaps worse is the use of obviously named, unencrypted variables in the URL, which are passed to a server-side script to tell it which user is interacting with it. Without appropriate checks, this can lead to serious vulnerabilities.
When I was a network security consultant, one assignment was to assess the internal security of a company's network. I found unencrypted usernames and passwords passing over the network, bound for an internal time management system with a web interface.
After using these to log in, I was dismayed to discover that the user's account number on the system was part of the URL. What happened if I incremented the account number by one? I got full read/write access to someone else's data.
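The cure is for the server to check ownership on every request rather than trusting the number in the URL. A sketch in Python, with an invented data layer:

    def fetch_account(session_user_id, requested_account_id, db):
        """Look up the record, but refuse unless the requester owns it."""
        record = db.get_account(requested_account_id)   # hypothetical helper
        if record is None or record.owner_id != session_user_id:
            # Same response whether the record is missing or merely someone
            # else's, so account IDs can't be probed one by one
            raise PermissionError("access denied")
        return record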
Sometimes, however, variables in URLs can be exploited in benign, useful ways.
For instance, when searching for messages in a forum, you might be presented with a large number of pages and no quick way of going directly to one in the middle of the range. The URL might contain the page number or even the result number that begins the current page. Try modifying this and pressing [Enter] to see if you're taken to the page you want to access.
There are also plenty of other pieces of information that a site might expect to receive from the browser verbatim, which can be manipulated or simply read for the useful information they contain.
Many of these pieces of information are contained within hidden fields. All the hacker needs to do is edit the page's source code locally, re-read it into a browser and click the appropriate link to send it back to the server.
ON SHOW: Hidden variables embedded within a web page. What might these variables do, and what would happen if one was changed?
Consider a field called 'Tries'. As part of a login page, there's a good chance that this contains the number of login attempts the user has made. Resetting it to '1', '0' or something like '-1000' could provide the hacker with a way of bypassing a three strikes rule if the server only checks whether the variable's value exceeds three.
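The flawed logic amounts to something like the Python sketch below. The field arrives as, say, <input type="hidden" name="Tries" value="2">, and the server believes whatever it's told:

    def attempt_permitted(form):
        """Flawed: trusts a counter the browser controls."""
        tries = int(form["Tries"])   # value supplied by the hidden field
        # Editing the field to '0' or '-1000' sails straight past this test;
        # the count belongs server-side, as in the throttling sketch earlier
        return tries <= 3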
Fields that hold usernames and passwords are meat and drink to keylogging and other snooping software.
Input box names
Another vulnerability that stems from having the client side keep track of the user's session is a web page that uses the same names for its input boxes every time it's served.
While this is convenient for the site's users, who can use autocomplete to select from previously entered values, it also means that if they wander away from their computer without locking the screen, anyone can select from these lists.
If the browser also fills in passwords, an interloper can access pretty much any site where the user has an account. Banks have started randomising the names of input boxes to prevent this problem, but most privately owned commercial websites don't.
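One way to randomise them is to derive each field's name from the user's session, so autocomplete never sees the same name twice. A sketch in Python, assuming a server-side session object:

    import secrets

    def password_field_name(session):
        """Give each session its own name for the password box."""
        if "pw_field" not in session:
            session["pw_field"] = "pw_" + secrets.token_hex(8)
        return session["pw_field"]

    # The login form is rendered with this name, and the handler reads
    # the submitted value back using the same session key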
Never ask client-side code to keep track of a user's session using unencrypted data. Instead, use an encrypted session cookie to store a session ID, and keep track of the session in a back-end database.
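The pattern looks like this: the browser holds nothing but an opaque token, while everything meaningful stays on the server (a dictionary stands in for the database here):

    import secrets

    sessions = {}   # session ID -> session data; a back-end DB in real life

    def create_session(username):
        session_id = secrets.token_urlsafe(32)   # unguessable opaque token
        sessions[session_id] = {"user": username, "cart": []}
        # Send session_id to the browser in a cookie marked Secure and
        # HttpOnly, so scripts can't read it and it only travels over HTTPS
        return session_id

    def load_session(session_id):
        # The browser can't tamper with data it never holds
        return sessions.get(session_id)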
LIMITED INPUT: Decide which inputs you will allow in an input field rather than trying to guess everything that a user may enter – deliberately or accidentally
Cross-site scripting vulnerabilities (or XSS for short) are a class of bugs that hint at how much ingenuity there is in the online security community. XSS vulnerabilities can allow malicious hackers to inject code into served web pages that in turn can steal server-side information.
An XSS attack takes the form of a malicious hyperlink pointing at a third-party site. It might be sent in spam or embedded in a site itself.
This is possible because hyperlinks can contain parameters designed to pass information to the back-end server, such as the current session cookie.
It's possible to supply the value for such a variable using the URL itself, and if the site echoes that value back into the page it serves without checking or encoding it, a hacker can substitute a fragment of script, which then runs in the victim's browser as if it were part of the legitimate page.
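A minimal illustration of the reflected variety in Python, using the standard library's html module for the fix (the page and parameter names are invented):

    import html

    def search_page(query):
        # Vulnerable: the parameter is echoed into the page verbatim, so a
        # link such as /search?q=<script>...steal the cookie...</script>
        # serves the attacker's script to whoever clicks it
        return "<p>Results for " + query + "</p>"

    def search_page_fixed(query):
        # Encoding the value turns any script into harmless text
        return "<p>Results for " + html.escape(query) + "</p>"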