We all know that SQL can be a harsh mistress, and relational theorists such as Chris Date have long argued that the language is fundamentally flawed and that vendors have been misapplying relational theory from the start.
Now a Perl developer on perlmonks.org has been reading Date's book and finds that it explains all of his frustrations with databases. One tip for addressing their shortcomings is to add the handy DISTINCT keyword to every single query, because stupid old SQL doesn't automatically apply the degree of uniqueness you might have in mind:
In fact, one of the founders of relational theory, C.J. Date, recommends that every SQL SELECT statement be written as SELECT DISTINCT ... Unfortunately, many folks get upset with this for a variety of reasons. First, DBMS do a horrible job of optimizing DISTINCT queries. There are many cases where the DISTINCT can be safely ignored but in practice, using DISTINCT with large sets of data and complicated queries can have an abysmal impact on performance. I think this can be put into the "premature optimization" category, though. Until we know that DISTINCT is causing an impact, isn't it better to improve the chance of getting correct results and worry about performance afterwards?
I had to read that a couple of times to realise that adding a DISTINCT to every single query in the hope that it might mask some unknown deficiency in your model, your query, or the SQL language itself is not the "premature optimization" being referred to here - he means the idea that doing so might affect performance. After all, it might not, right?