Cross Site Scripting and letting the framework deal with it
Rob Conery asks how the MS MVC platform should handle XSS attacks. In general, I feel that frameworks should do their best to ensure that to be secure by default. This means that I feel that by default, you should encode everything that comes from the user to the app. People seems to think that encoding inbound data will litter your DB with encoded text that isn’t searchable and consumable by other applications.
That may be the case, but consider, what exactly is getting encoded? Assuming that this is not a field that require rich text editing, what are we likely to have there?
Text, normal text, text that can roundtrip through HTML encoding without modifications.
HTML style text in most of those form fields are actually rare. And if you need to have some form of control over it, you can always handle the decoding yourself. Safe by default is a good approach. In fact, I have a project that uses just this approach, and it is working wonderfully well.
Another approach for that would be to make outputting HTML encoded strings very easy. In fact, it should be so easy that it would be the default approach for output strings.
Here, the <%= %> syntax fails. It translate directly to Response.Write(), which means that you have to take an extra step to get secured output. I would suggest changing, for MS MVC, the output of <%= %> to output HTML encoded strings, and provide a secondary way to output raw text to the user.
In MonoRail, Damien Guard has been responsible for pushing us in this direction. He had pointed out several places where MonoRail was not secure by default. As a direct result of Damien's suggestions, Brail has gotten the !{post.Author} syntax, which does HTML encoding. This is now considered the best practice for output data, as well as my own default approach.
Due to backward comparability reasons, I kept the following syntax valid: ${post.Author}, mainly because it is useful for doing things like output HTML directly, such as in: ${Form.HiddenField("user.id")}. For the same reason, we cannot automatically encode everything by default, which is controversial, but very useful.
Regardless, having a very easy way ( !{post.Author} ) to do things in a secure fashion is a plus. I would strongly suggest that the MS MVC team would do the same. Not a "best practice", not "suggested usage", simply force it by default (and allow easy way out when needed).
Comments
I agree with the latter part of your post completely. I couldn't quite understand what you were saying about encoding input data. I've always been of the mindset that you html decode input data and html encode user-generated output; otherwise you'll double encode input.
People use the & entity more than I would've thought.
To address the <%= %> issue you could maybe build a custom ExpressionBuilder that would encode your output. <%$ Encode:myText %> I wonder if that would work.
Let's say you have a text field for company name.
User enters: AT&T
Data is encoded in database as AT&T
User wants to edit so we encode on output and we have
<input type="text" value="AT∓T" /> which looks like: AT&T (double-encoding :-O)
Now database is stored as AT∓T UGH!
This is why I feel that encoding should be done on output only.
P.S. I hope the example comes out right since there's no preview.
Secure by default is an approach I tend to disagree with (though I like 'easily securable').
I see two problems with it.
The first one is that framework should not do too much assumptions about my code. The ASP.NET likes to assume that < and > are dangerous, and it is extremelly annoying for users -- since they are not trusted to write a > b, and extremelly annoying for me, since I have to remember to turn it off. I had similar problem in PHP, where it required me to dig at least two arcane variables to turn off the automatic escaping of singe quotes.
Another assumption is that I only do html or prefer html. I want to see ASP.NET view engine as a content-type-independent engine, and I always saw <%= %> as 'here be it'. Engine should not be unusable if I use text/plain instead of text/html and should not require workarounds to do it.
The second problem is that secure by default gives a developer a belief in automatic security. But it is much easier for me to show someone that they should think about security when I can do it by SQL injection in their login form or by implanting alerts in their comment pages. But when all simple bases are automatically covered, it is too easy to forget about more advanced steps that could cause the same breaches.
Sorry, that should be AT&T
But you get the point.
Matei,
That is possible, it is just six times as awkward as simple <%= %>
Andrey,
1/ I am using ASP.Net in web world for 99.9% of the time. All considerations for that should take this into account. I have no issue in making it harder to use if for other stuff is I get better results for the web.
2/ Saying that secure by default makes is "too easy to forget about more advanced steps that could cause the same breaches" means that you are now putting all the burden on the developer. They will be so busy implementing the simple layer of security that they will never have time to the more advance scenarios.
Kiliman,
Good point, and I agree. But I would rather have "dirty input" problem rather than security issues.
Encoding on input is totally unnecessary and causes confusion and problems trying to understand what to output.
It's not just HTML but " ' <> etc. A lot of people have ' in their name and if people have written HTML 4 style entities such as <input value='<%=Something%>'> then something can easily break out without the encoding.
[)amien
I agree with the proposition. How often do you have the opportunity to pick the safer approach by design....
@matei
It would be simpler to add an extension method:
public static string H(this Control control, string text){
return HttpUtility.HtmlEncode(text);
}
Then your pages/ascx: (provided the namespace was included in your pages or web.config)
<%= H( unsafeHere ) %>
typeo...
XSS is cross-site scripting, not cross-side. :)
But my security is already alerted to look for cross eyed scripts!
Brain,
Thanks, fixed.
This is where I have no arguments but still would never agree. I can not think of a place in ASP.NET basic markup that is specifically made for HTML. I may like to generate CSS, I may even like to generate JS. They are Web as well, just with different mime types.
No, because I am saying that simple security should be also simple to implement and be the default best practice aproach (but not framework-enforced default). Like using SqlCommand arguments in ADO.NET is an easy best practice to prevent SQL Injection (which may occasionally teach developer on what the SQL Injection is).
It is not too hard to write <%= H( unsafeHere ) %> each time you want to escape something, isn't it?
You have worked with framework that likes to think for you and hide simple details -- that's ASP.NET Web Forms.
Andrey,
Implementing simple security may be simple to implement, but it is _t_e_d_i_o_u_s_.
Passing parameters to the DB is a good example of very tedious coding.
Ok, it seems my conceptual problem is with levels of abstraction. I do not have to pass paramaters by hand, I would definitely pass parameters NHibernate or my own NIH framework. But the lowest level (ADO.NET, ASP.NET markup) should be as simple as a blackboard -- I do not want to configure and mess with it, I want to build on it.
I would love it if the default behaviour of <%= ... %> was to HTML-encode the result, since that's (a) safe and (b) what you want 90% of the time. Why should the default be to do a dangerous and unusual thing?
Here's a quick mechanism for changing the <%= ... %> behaviour yourself: http://blog.codeville.net/2007/12/19/aspnet-mvc-prevent-xss-with-automatic-html-encoding/
The nice thing about the above method is that the developer doesn't even need to choose whether or not to encode the output the vast majority of times - the decision is made in terms of the parameter passed. Normal strings are encoded, but the output from HTML helper methods isn't.
Html-encoding on input is a an example of a badly-leaked abstraction that creates a variety of downstream problems. The leak itself is that you've allowed the fact that your resulting presentation is html-based to leak into the data captured in your database. If your presentation technology requires that application data be encoded for display, encode it only at the point of displaying it.
Comment preview