First things first. Google it up. Doing so I found very useful article "How to find missing values in a sequence with SQL". Very inspiring. The idea is to check the list against itself only shifted by one. I tried the following query:
SELECT DISTINCT m1.id + 1I had to use DISTINCT as mytable is just a helper table for many-to-many relationship and same id may appear many times. The query worked straightaway and was quite quick as I had and index on id field. Here are the basic information about the data:
FROM mytable as m1
LEFT JOIN mytable as m2 ON m1.id + 1 = m2.id
WHERE m2.id IS NULL;
- no of records: 3360
- no of unique ids: 228
That is not bad. Although I thought I would try a sub-query as well, even though they are usually discouraged as being too slow. I have re-written the query slightly:
SELECT DISTINCT id +1The query above took 0.014s to execute on average.
FROM mytable
WHERE id + 1 NOT IN (SELECT DISTINCT id FROM mytable);
That means sub-query was faster than the join! I do not think it is a big surprise, though. Given the id column does not held unique values, and there are around 15 occurrences of each id, apparently it takes longer to create a join on all the values than to create a temporary table for the sub-query.
Concerns
- Biggest missing value returned
It is always max(id) + 1, so unless you need it, you might want to deal with it differently. - Lowest missing value returned
The query does not recognize a gap between 0 and the lowest value in the sequence. Therefore the lowest missing value would be always bigger than the minimum.
Another issue is that it does not seem to capture consecutive missing numbers...
ReplyDeleteThank you, just what I needed: find the first missing id
ReplyDelete