Sunday, July 5, 2009

Find Missing Value in a Sequence in SQLite

Sometimes it is important to find gaps in sequenced values in your table, for instance id. How do you do it in SQLite? How do you do it in an efficient manner?

First things first. Google it up. Doing so I found very useful article "
How to find missing values in a sequence with SQL". Very inspiring. The idea is to check the list against itself only shifted by one. I tried the following query:
SELECT DISTINCT m1.id + 1
FROM mytable as m1
LEFT JOIN mytable as m2 ON m1.id + 1 = m2.id
WHERE m2.id IS NULL;
I had to use DISTINCT as mytable is just a helper table for many-to-many relationship and same id may appear many times. The query worked straightaway and was quite quick as I had and index on id field. Here are the basic information about the data:
  • no of records: 3360
  • no of unique ids: 228
The query above took 0.030s to execute on average.

That is not bad. Although I thought I would try a sub-query as well, even though they are usually discouraged as being too slow. I have re-written the query slightly:
SELECT DISTINCT id +1
FROM mytable
WHERE id + 1 NOT IN (SELECT DISTINCT id FROM mytable);
The query above took 0.014s to execute on average.

That means sub-query was faster than the join! I do not think it is a big surprise, though. Given the id column does not held unique values, and there are around 15 occurrences of each id, apparently it takes longer to create a join on all the values than to create a temporary table for the sub-query.

Concerns
  • Biggest missing value returned
    It is always max(id) + 1, so unless you need it, you might want to deal with it differently.
  • Lowest missing value returned
    The query does not recognize a gap between 0 and the lowest value in the sequence. Therefore the lowest missing value would be always bigger than the minimum.

1 comment:

  1. Another issue is that it does not seem to capture consecutive missing numbers...

    ReplyDelete