• WAL timelining Correlated subquery

    Code:
    SELECT * FROM messages
    WHERE sfcmd5 NOT IN 
      (SELECT sfcmd5 FROM messages WHERE sfIsLive = 'True') 
    GROUP BY sfcmd5
    A correlated subquery is a subquery (a query nested inside another query) that uses values from the outer query.

    Within the Forensic Browser for SQLite, records that are marked as "True" in the sfIsLive column are those records that the user would see if they used SQLite to investigate the database, or in other words, if you are looking at a messaging app it is the messages that are not deleted.

    If there sfIsLive is "False" it does not mean that the record is deleted (and when journal or WALs have been processed usually doesn't). False indicates "not live" and this includes copies of a live record in duplicate pages (i.e. journals), copies of a live record in unallocated/unused space, corrupt or partially recovered records and of course it includes deleted records. It is down to us to determine what is deleted.

    That said if we can identify unique records that do not appear in the list of live records based on the MD5 hash of all of the fields (not including the fields that the Browser adds) we can determine for ourselves which appear to be deleted.

    In summary there are four sets of records:
    1. Live records
    2. Copies of Live records
    3. Deleted records
    4. Copies of deleted records

    Copies of live records have the same MD5 hash as the actual live records so we just need a query that returns a list of the MD5s for the live records and then a further query that all the records that don't have one of those MD5s.

    Getting a list of MD5s of live records is simply

    Code:
    SELECT sfcmd5 FROM messages WHERE sfIsLive = 'True'
    This becomes the subquery part of the correlated subquery and the main query now becoomes:

    Code:
    SELECT * FROM messages
    WHERE sfcmd5 NOT IN 
      (SELECT sfcmd5 FROM messages WHERE sfIsLive = 'True')
    This simply says in English, select all of the records that don't have an MD5 matching one of those in the list of live records.

    The final task is to deduplicate the deleted records :

    To do this we just add a GROUP BY clause to just show one instance of each identical row:

    Code:
    SELECT * FROM messages
    WHERE sfcmd5 NOT IN 
      (SELECT sfcmd5 FROM messages WHERE sfIsLive = 'True') 
    GROUP BY sfcmd5