In my infancy as a web developer, I was often caught out by a lot of really simple problems. One of these really simple problems was slow SELECT queries on a MySQL database when using “JOIN” or “WHERE”. I was quite often baffled; the site seemed really quick when I first launched it, and I hadn’t made any code changes, so what could have changed that made the “News Listing” page so slow? The answer was staring me in the face, but it still took me a bit of time to find it. The problem? The database table had a higher number of records in it, and when JOINING or searching, the application was forcing MySQL to do what are called “full table scans”. Once I discovered this was the problem, I was amazed at the difference in performance something so simple could result in.
Before I go any further, I want to explain (or rather, quote someone else explaining) what indexing does. A really useful source of information for me when I was investing this issue, was entitled “MySQL Cookbook” – an 8 year old but incredibly useful book that I have successfully counted on each time I have hit a brick wall with MySQL.
Typically, indexing a column that you query frequently helps SELECT statements run faster because the index allows MySQL to avoid full table scans. If you update this row more frequently than you SELECT or JOIN on it, you may be better off not indexing, as every time an UPDATE or INSERT query is ran, mysql must update the indexes for those columns.
Because a join can easily cause MySQL to process large numbers of row combinations, it’s a good idea to make sure that the columns you’re comparing are indexed. Otherwise, performance can drop off quickly as table sizes increase.
Example
You have a “product” and “product_image” table. The primary key in “product” is product_id. You have a column in “product_image” called “product_id”, which “links” the product image to the product that it belongs to. product_id in both tables should be indexed!!
To give an example of the kind of speed increase this results in…
An application with this exact issue had a query which was taking on average, 8 to 10 seconds. After adding the index to the second table, that same query took only 0.02 seconds. This is a quite frankly amazing difference, and all achieved just by indexing a single column.
Conclusion: As a general rule of thumb, any column in a database called (something)_id should be “indexed”. Exceptions will probably exist (in frequent-update and insert situations, for example), but if – as is often the case with web-apps – you SELECT more often than you UPDATE or INSERT, it’s a pretty concrete rule that you can stick to.