There are many articles that cover PHP vulnerabilities, but I've run across a lot of programmers, and a lot of code, that seem oblivious to them. When interviewing programmers, I look for an understanding of these types of vulnerabilities, and of how to keep programs from being vulnerable to them.
Aside from register globals issues, most of these attacks are not specific to PHP.
Register Globals issues
From early on, the developers of PHP had this great idea: accept any parameters passed from the browser, and automatically turn them into variables available in the code. Well, it turned out to not be such a great idea--it meant that improperly initialized variables could be seeded by attackers to potentially do all sorts of damage. Worse, sometime after PHP 5 came out, someone figured out that you could pass a particular variable that would load and execute any PHP file before running the actual code--and this file could be on a completely different server, in a regular PHP installation.
Most other web languages never offered this convenience--you have to retrieve parameters from a browser through a specific module or array. PHP now provides arrays like $_GET, $_POST, $_REQUEST that are simple to use, but make it so you need to specifically request the variable you want from your code.
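As a rough sketch of what asking for a variable by name looks like: the helper get_param and the parameter names "page" and "isAdmin" below are made up for illustration, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: fetch one named string parameter from a request
// array such as $_GET or $_POST, falling back to a default.
function get_param(array $source, string $name, string $default = ''): string
{
    if (isset($source[$name]) && is_string($source[$name])) {
        return $source[$name];
    }
    return $default;
}

// Always initialize before use. With register_globals on, an uninitialized
// $isAdmin could be seeded by a request like ?isAdmin=1.
$isAdmin = false;

// Only the parameter we ask for is read; everything else is ignored.
$page = get_param($_GET, 'page', 'home');
```

The point is simply that nothing reaches your code unless you name it, and anything you don't initialize yourself is a variable somebody else might initialize for you.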
Any code that depends on register_globals being set is completely broken, as far as I'm concerned. If it's on a server with an older version of PHP, it's just waiting to get cracked. Any developer that relies on registered globals is programming for 10 years ago, and needs some serious education.
SQL Injection vulnerabilities
This is the next most serious issue, and it affects pretty much all web languages, not just PHP. The most common way to interact with a database is to use a language called "structured query language" (SQL) to select rows of data from the database, update data, insert new data, or delete things. Once you learn the basic syntax and structure, it's very easy to use. The problem is, you nearly always depend upon the user to identify what data to retrieve, or to provide the data to add or change.
Once again, we can't ever trust data from the user. Most databases accept more than one query at a time, and most information used to select rows in a database is wrapped in single quotes:
SELECT first_name, last_name, salary FROM employees WHERE first_name LIKE 'John';
Beginner programmers drop the variable containing the search from the browser into the query, wrapping it in single quotes: LIKE '$firstname';
Attackers simply put a single quote in the field, and then add another SQL command to do something malicious. Like delete the entire database.
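To make that concrete, here is a small sketch that only builds the query string and prints it, without touching a database. The input value is a hypothetical example of what an attacker might type.

```php
<?php
// Hypothetical malicious value typed into the search field.
$firstname = "x'; DELETE FROM employees; --";

// The naive approach: drop the variable straight into the query string.
$query = "SELECT first_name, last_name, salary FROM employees "
       . "WHERE first_name LIKE '$firstname';";

// The attacker's quote closes the LIKE value early, a second statement
// follows, and the trailing "--" comments out the leftover quote.
echo $query, "\n";
```

The printed string contains two statements, the original SELECT and the attacker's DELETE. Whether the second one actually runs depends on the database and the driver, but you don't want to find out in production.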
Now, when you know there might be a quote in the variable, you can escape it by adding a backslash in front of it. PHP actually does this for you automatically if you have an evil setting called magic_quotes_gpc turned on. That's why you often see a lot of stray backslashes in forums, blog comments, etc., by the way. But there are ways of getting around that, as well.
At a minimum, all variables used in a query should be escaped using a function known to handle all possibilities, usually those provided for the specific database engine. What I look for in code is someone using a database abstraction layer or interface that allows for parameterized queries: instead of putting the variables directly in the query, you create a query with placeholders (usually a question mark, ?) where variables are to be substituted, and then pass an array of variables. The abstraction layer handles all of the escaping for you, and you end up with much cleaner code.
We use PEAR::DB as a database abstraction layer in most of our projects. Others include ADODB, or PEAR::MDB. PHP5 provides a mysqli interface capable of this, as well. If I see a mysql_query command in general application code, it gets marked way down in my book.
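Here is a minimal sketch of a parameterized query. It uses PDO with an in-memory SQLite database purely so the example is self-contained; that choice is my assumption for illustration, not one of the libraries named above, and the table and sample row are made up. The placeholder idea is the same in the other layers.

```php
<?php
// Assumption: PDO with the pdo_sqlite driver, used here only because an
// in-memory database keeps the example self-contained.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE employees (first_name TEXT, last_name TEXT, salary INTEGER)');
$db->exec("INSERT INTO employees VALUES ('John', 'Smith', 50000)");

// A hostile value containing a quote and a second statement.
$firstname = "x'; DELETE FROM employees; --";

// The query text contains only a placeholder; the value travels separately
// and is never interpreted as SQL.
$stmt = $db->prepare(
    'SELECT first_name, last_name, salary FROM employees WHERE first_name LIKE ?'
);
$stmt->execute([$firstname]);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

// Nobody has that odd name, so nothing matches, and the table is untouched.
$remaining = (int) $db->query('SELECT COUNT(*) FROM employees')->fetchColumn();
```

After this runs, $rows is empty and $remaining is still 1: the hostile string was treated as a name to search for, nothing more.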
Mail Header Injection
Many programmers don't realize it's not safe to use the PHP mail() function without special protection. I didn't believe this was a vulnerable function until one of our clients got attacked with it. Basically, the mail() function on a Linux system is a wrapper to the system sendmail command. Sendmail takes a plain text email, looks for To, CC, and BCC addresses, and sends the message on its way. The problem is, attackers can inject fake headers into the message that basically hijack your server to send spam. Any field that ends up in the header of a message--to, from, subject, or any other arbitrary header you collect--could be used for this purpose.
I haven't tested to or subject recently--there may be some built-in protection for these fields now. But to set the from address of a message, you pass it in an array or a string to the "header" parameter of mail(). This is ripe for exploit. All the attacker has to do is insert a newline, and then they can supply their own bcc field with hundreds of email addresses to spam. PHP and the sendmail binary will happily spew your attacker's message to hundreds of users at a time. The next thing you know, your server will get on a blacklist for spamming, and nobody on that server will be able to send mail to domains like AOL or Comcast and other places that actively reject mail from known spammers.
Some kind soul posted a function to filter headers and ignore anything after a newline character to the comments section of the PHP documentation for the mail() function (the PHP documentation, and the comments, are a fantastic resource, and one of our favorite features of PHP). We have a simple safe_mail function that runs all the headers through this function, which also makes for a convenient way to intercept mail on a test environment.
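The sketch below shows roughly what such a wrapper can look like. It is not the function from the PHP documentation comments, nor our actual safe_mail; the helper name clean_header_value, the parameter list, and the $deliver switch are assumptions made for illustration, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: keep only the text before the first CR or LF, so an
// injected "\nBcc: ..." never reaches the message headers.
function clean_header_value(string $value): string
{
    $parts = preg_split('/[\r\n]/', $value, 2);
    return trim($parts[0]);
}

// Hypothetical wrapper around mail(). When $deliver is false (for example
// on a test server) the message is logged instead of sent.
function safe_mail(string $to, string $subject, string $body, string $from, bool $deliver = true): bool
{
    $to      = clean_header_value($to);
    $subject = clean_header_value($subject);
    $headers = 'From: ' . clean_header_value($from);

    if (!$deliver) {
        error_log("Intercepted mail to $to: $subject");
        return true;
    }
    return mail($to, $subject, $body, $headers);
}
```

With this in place, a from address of "me@example.com" followed by a newline and "Bcc: victim@example.com" is cut down to just "me@example.com" before mail() ever sees it.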
This one isn't talked about that much, but protecting the mail function properly is a good sign of an experienced PHP developer.
Cross-site scripting (XSS)
Cross-site scripting is the current favorite exploit of attackers. Unlike the other attacks, it doesn't target your site directly; it exploits your site to attack your visitors. Of course, if those visitors have access to an administrative interface on the site, attackers could then use this to attack your site.
The real problem is that cross-site scripting is a great way to spread spyware, and so many sites are vulnerable to it. MySpace was long a victim of XSS. eBay, too. Basically any site that allows users to add content that is shown to other users is vulnerable to XSS, unless the application developer has taken specific measures to prevent this. In this age of social networking, that is a huge number of sites.
If attackers can load an arbitrary script of their choosing, they can view anything on that page and watch anything the visitor types into that window. That's expected, defined behavior, and that's not going to change. So at a minimum, they can get passwords to your site and from there, they can do anything on your site that an attacked user can do.
But they don't start there. Both Internet Explorer and Firefox have contained vulnerabilities that allow an attacker to escape the sandbox of that browser window to be able to monitor other windows, or even at worst install malicious software on the user's computer. That is how spyware is spread. And once they have their own malicious software installed on your computer, they own it--they can monitor every mouse movement and keystroke, they can use it to send spam or attack other computers or do whatever they want.
Cross-site scripting is diabolical. It doesn't usually harm your site, because attackers don't want you to know you're carrying their malware. Application developers ignore these issues to the peril of the entire Internet...
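One common measure, and only one of several you may need, is to escape user-supplied text whenever it is written into a page. The sketch below uses PHP's htmlspecialchars(); the helper name h and the sample comment are made up for illustration, and this covers text placed in HTML content or quoted attributes, not text dropped into scripts or URLs.

```php
<?php
// Hypothetical helper: escape user-supplied text for output inside HTML.
function h(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Hypothetical comment submitted by an attacker.
$comment = '<script src="http://evil.example/x.js"></script>';

// The browser now displays the tag as text instead of loading the script.
echo '<p>' . h($comment) . "</p>\n";
```

The visitor sees the literal text of the script tag in the comment, and the script itself never runs.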
Session hijacking
Web applications differ from most other applications in that they are considered "stateless". That is, the server does not know the state of anything the user is doing, and starts in exactly the same condition for every request. In most applications, however, you are working through some sort of process and what you do next depends on the action you take when you're in a particular state. What actions you have available to you depends upon the state of the object you're working with.
For example, if you're working with a user object, it might have several states: "unconfirmed", "logged in", "not logged in", "suspended". For users that are suspended, the application would prevent access to private data. For users who are unconfirmed, the application might offer to resend a confirmation link. For users who are logged in, the application would provide access to appropriate parts.
In a web application, it's up to the programmer to define these states and handle them appropriately--PHP has no internal concept of state at all. Every request coming into your application must do all the work of loading the appropriate objects, defining what state they're in, and doing whatever action is necessary.
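As a loose illustration of defining states yourself, here is one way to map each user state to the actions it allows. The action names and the table layout are assumptions made up for this sketch (PHP 7 or later), not a prescribed design.

```php
<?php
// Hypothetical table of which actions each user state allows.
const ALLOWED_ACTIONS = [
    'unconfirmed'   => ['resend_confirmation'],
    'not logged in' => ['log_in'],
    'logged in'     => ['view_private_data', 'log_out'],
    'suspended'     => [],
];

// Every request has to work out the user's state first, then ask whether
// the requested action is allowed in that state.
function can_perform(string $state, string $action): bool
{
    // Unknown states get no permissions at all.
    return in_array($action, ALLOWED_ACTIONS[$state] ?? [], true);
}
```

For example, can_perform('suspended', 'view_private_data') is false, while can_perform('unconfirmed', 'resend_confirmation') is true.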
PHP and other languages do however provide a mechanism for keeping track of users, with something called a session. PHP basically provides an automatic mechanism for storing variables associated with a user session on the server, instead of the browser. Since as we know well by now, you can't trust anything coming from a browser, a session is a much safer place to store critical data to help you determine the state of your application and not have to reconstruct it completely on every page. It's especially used for logins.
The problem is, sessions can be hijacked. PHP and other languages use a cookie to store a simple unique identifier for the session in the browser, which the browser helpfully returns on every request. If the browser has been compromised (by a cross-site scripting attack, or spyware, etc) an attacker can read these cookies and pass somebody else's session identifier into your application, and if you don't protect against this, hijack the original user's session.
That takes some effort, however. Much more of the problem is when a user turns cookies off. Back in the late 1990s/early 2000s, many users got completely paranoid that cookies identified them wherever they went on the Internet, and many applications help users manage their cookies. So this general paranoia about cookies actually makes the situation worse, because if the user turns off cookies, your application either needs to force them to reauthenticate, or allow the browser to pass their session identifier through another means.
PHP has yet another configuration parameter to automatically allow session ids to be passed via a GET request instead of a cookie. The problem is, when this is done, the session identifier becomes part of the URL in the browser address bar. Users then bookmark their session id, post it to their blog or a forum, do whatever with it they want. And if your application is not written to handle this, other completely innocent users may find themselves logged into your application under a hijacked session id!
Applications using sessions must use some other source to verify that the session corresponds to the right user. In some cases, it may be enough to just require cookies and not allow session identifiers to come through any other vector. In others, programmers may need to consider using HTTP authentication or other methods to verify that they have the right user.
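A minimal sketch of both ideas follows, assuming PHP 7.1 or later. The two ini settings are real PHP session options (recent PHP versions already default to these values); the fingerprint functions are hypothetical, and tying a session to the user agent is a weak check that I'm assuming here only as an example of "some other source". It catches accidentally shared session ids, not a determined attacker.

```php
<?php
// Accept session ids from cookies only, never from the URL.
// These must be set before session_start() is called.
ini_set('session.use_only_cookies', '1');
ini_set('session.use_trans_sid', '0');

// Hypothetical fingerprint of the client, based on the user agent alone.
function client_fingerprint(string $userAgent): string
{
    return hash('sha256', $userAgent);
}

// Hypothetical: call right after a successful login.
function remember_client(array &$session, string $userAgent): void
{
    $session['fingerprint'] = client_fingerprint($userAgent);
}

// Hypothetical: call on every request before trusting the session.
function session_matches_client(array $session, string $userAgent): bool
{
    return isset($session['fingerprint'])
        && hash_equals($session['fingerprint'], client_fingerprint($userAgent));
}
```

In a real application you would pass $_SESSION and $_SERVER['HTTP_USER_AGENT'] to these functions, call session_regenerate_id(true) at login, and force a fresh login whenever the check fails.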
Session hijacking is one of the toughest vulnerabilities to manage, if you need to protect any sensitive data. Even if you don't, the application should deal appropriately with accidental session hijacking, because it's very common and easy for users to do.
The list doesn't stop there, but those are the serious mistakes I see, sometimes on a weekly basis. It's hard to write secure code, but starting with security as a mindset goes a long way towards preventing problems down the road.
To summarize, here are some general tips for keeping applications safe from these types of attacks. If I'm interviewing you for a programmer position, I will be asking you about these:
- Never trust input from the browser.
- Turn off register_globals, but always assume it's on and protect your variables anyway.
- Use a database abstraction layer, and parameterized queries.
- Be extra careful with database statements that cannot be parameterized.
- Never trust input from the browser.
- Use wrapper functions to add extra protection to common functions like mail().
- Be extremely careful with sessions that are used to authenticate users.
- Provide an appropriate level of protection for private data.
Any other vulnerability types you care about, when writing or reviewing web application code?