Should you sanitise your HTML before or after you save to your database?

As we start a new project completely from scratch, we need to decide whether to sanitise HTML (that is, convert & to &amp; and < to &lt;, and strip out blacklisted HTML) before or after saving it to the database. In projects I’d worked on in the past, this had been done inconsistently across different areas of the application, i.e. before saving in some areas and after selecting in others. Having thought the problem through, I’ve settled on the option I prefer: sanitise after selecting from the database, rather than before saving to it.
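Here is a minimal sketch of that approach. It is written in Python rather than the PHP/CakePHP the project will actually use, and it assumes a hypothetical SQLite table `posts` with a `body` column. The raw value is stored untouched, and entities are only converted when the row is read back for display. It covers only the entity-conversion half of sanitising; stripping blacklisted tags would be a separate step.

```python
import html
import sqlite3

def save_raw(conn, body):
    # Store exactly what the user typed: no escaping before the INSERT.
    conn.execute("INSERT INTO posts (body) VALUES (?)", (body,))

def load_for_display(conn):
    # Escape on the way out: & -> &amp;, < -> &lt;, > -> &gt;, quotes -> entities.
    rows = conn.execute("SELECT body FROM posts ORDER BY id").fetchall()
    return [html.escape(body, quote=True) for (body,) in rows]

conn = sqlite3.connect(":memory:")
# Hypothetical schema, for illustration only.
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
save_raw(conn, "Ed, Ed & Eddie <b>rock</b>")
print(load_for_display(conn))  # ['Ed, Ed &amp; Eddie &lt;b&gt;rock&lt;/b&gt;']
```

The database keeps the original string, so any other output format can apply its own escaping later.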

My reasons for this decision are as follows:

  1. Database collation. Searching for strings with LIKE '%Ed, Ed & Eddie%' gets complicated if you have converted & to &amp;, because you have to remember to convert entities in your search string as well. It can also distort the results of ORDER BY clauses: a value that begins with < is stored beginning with &lt;, so it sorts as though it began with &.
  2. Data Integrity. If in future I decide to allow all HTML, or just specific tags that were previously blacklisted, the data is still complete; nothing has been stripped out of it.
  3. Application Consistency. I know that whenever I display data selected from the database, it will need to be ‘sanitised’ first, whether in a repopulated ‘edit’ form or on a normal ‘view’ screen.
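To make the first point concrete, here is a small Python/SQLite sketch. It uses two hypothetical tables: one holds the raw title and the other holds a copy escaped before saving. The same LIKE search is then run against both.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: one stores the raw value, one was sanitised before saving.
conn.execute("CREATE TABLE raw_titles (title TEXT)")
conn.execute("CREATE TABLE escaped_titles (title TEXT)")

title = "Ed, Ed & Eddie"
conn.execute("INSERT INTO raw_titles VALUES (?)", (title,))
# Sanitise-before-save: stored as "Ed, Ed &amp; Eddie".
conn.execute("INSERT INTO escaped_titles VALUES (?)", (html.escape(title),))

def count_matches(table, pattern):
    # Table names come from the fixed strings in this script, never from user input.
    sql = f"SELECT COUNT(*) FROM {table} WHERE title LIKE ?"
    return conn.execute(sql, (pattern,)).fetchone()[0]

raw_hits = count_matches("raw_titles", "%Ed, Ed & Eddie%")
escaped_hits = count_matches("escaped_titles", "%Ed, Ed & Eddie%")
# The pre-escaped table only matches once the search string is escaped too.
escaped_hits_fixed = count_matches("escaped_titles", "%" + html.escape("Ed, Ed & Eddie") + "%")

print(raw_hits, escaped_hits, escaped_hits_fixed)  # 1 0 1
```

The raw column answers the natural query directly. The escaped column needs every search term run through the same escaping first, which is exactly the extra bookkeeping described above.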

I also did a little research in the CakePHP manual (we will be using CakePHP for this next project) to see what its creators thought about this, and it seems they reached the same conclusion. The manual says:

For sanitization against XSS its generally better to save raw HTML in database without modification and sanitize at the time of output/display.

They don’t give any specific reasons but I imagine that their thought process was similar to mine.

2 Responses to “Should you sanitise your HTML before or after you save to your database?”

  1. Tom


    25 April 2010 00:12

    In my mind it’s clear: always perform data filtering on output. Why? Because it offers more flexibility. Sure, you may just be displaying data on a web page for now, but if you want to be more creative in future (e.g. export to Excel or PDF), you’ll likely not want special characters escaped to HTML entities and the like.

  2. Ed Yarnold


    26 April 2010 11:14

    Can’t believe I missed that point, Tom! I had that very situation a few weeks ago. A site had been built where all data was htmlentitied. We wanted to export a stock list to PDF but had to unhtmlentity everything first. Not a pretty solution, and it could have been avoided with a bit of foresight!
