• Chrome history with Recursive Common Table Expressions

    Like many applications Google Chrome uses a SQLite database (or rather a number of SQLite databases) to store information relating to pages visited. One of these databases is the history DB which uses a set of normalised tables which, amongst other things, holds a table showing the date and time of every page visited, where appropriate the previous page (the page on which the user clicked a link/referring page) and an internal ID number of an entry in the URL table that contains the text of the page itself.

    The purpose of this article is to show how we can generate a list of the pages that a user has followed while browsing the internet, or more correctly showing the chain of webpages that led to a specific page. To achieve this we will use a feature of SQL known as Recursive Common Table Expressions (RCTE), donít worry it sounds worse than it is!

    But first letís look at a simple RCTE (from the SQLite web site) and see how it all works:

    Code: [View]
    WITH RECURSIVE cte(x) AS (
        SELECT 1
      UNION ALL
        SELECT cte.x + 1
        FROM cte
        LIMIT 10)
    SELECT *
    FROM cte
    So what does the above query do? Very simply it creates a simple table with one column (x) containing the value 1 and writes out the value, then adds 1 to this column and then writes out the value until it has written all the results from 1 to 10. i.e.:



    The essential feature of any RCTE is that there is a UNION between two tables, the first part of the UNION (SELECT 1) sets up the starting conditions of the query and the second half of the UNION ( SELECT cte.x + 1 FROM cte) refers back to the first half (this is the recursive element).

    If thatís not clear, then hopefully things will become clearer as we work through this explanation.

    But first a bit more background on the tables we are interested in.

    The first table we want to look at from the Chrome history is the visits table, this table contains a number of columns, we are just interested in the first four (the purpose of the remainder is outside of the scope of this article).



    Theses columns are:

    • Id - a unique identification number of this record (this is the tables primary key)
    • url - the id of the entry in the URLs table that contains the text of the URL visted
    • visit_time - the time of the visit encoded as a Chrome date (we will decode this later)
    • from_visit - the id in this table of the previous page.


    The keen eyed amongst you will have noted that number in the last entry in the from_visit column is sometimes the same as the id of a previous row, this is our link of pages visited and we can follow this list back until the from_visit entry is 0 which indicates the start of a viewing chain.

    • The row with id = 27585 has a from_visit = 27584
    • The row with id = 27584 has a from_visit = 27583
    • The row with id = 27583 has a from_visit = 27582
    • The row with id = 27582 has a from_visit = 0


    Clearly from an investigation standpoint it would be useful to have a report of each of the pages in a chain displayed in order, showing the route through the internet history to a given page.

    The second table is the URLs table. The URLs table contains a number of columns that would interest a forensic investigator, for this article we will concern ourselves with just the first two columns.

    • id - again the primary key and a unique identifier for this row in this table
    • url - the URL of the page visited.




    These two tables can be joined via a LEFT JOIN (see our article on JOINs) on the visits.url field to the urls.id table. When we do this and use a Chrome conversion on the visit_time column we get a results table that looks like this:



    The query that we use to create the results table above is:

    Code: [View]
    SELECT visits.id,
          visits.url,
          visits.visit_time,
          visits.from_visit,
          urls.url AS URL1
        FROM visits
          LEFT JOIN urls ON visits.url = urls.id
    For the sake of this article we will follow the chain of web sites back from entry ID 27585 and as mentioned above we will use a Recursive Common Table Expression (RCTE). But before we do that we will digress a little and look a little at what a Common Table Expression (CTE) actually is.

    A CTE is analogous, in simple terms, to a temporary table (or to a view), but unlike a temporary table or view it exists just for the life of the query. As discussed above the query to select the rows we are interested in, is:

    Code: [View]
    SELECT visits.id,
          visits.url,
          visits.visit_time,
          visits.from_visit,
          urls.url AS URL1
        FROM visits
          LEFT JOIN urls ON visits.url = urls.id
    Expressed as a Common Table Expression this would be:

    Code: [View]
    WITH cte AS (SELECT visits.id,
          visits.url,
          visits.visit_time,
          visits.from_visit,
          urls.url AS URL1
        FROM visits
          LEFT JOIN urls ON visits.url = urls.id
    )
    SELECT *
    FROM cte
    The first five lines simply set up the Common Table and the final two lines are the query, basically put, cte becomes an alias for the select expression. Another way of thinking of the above expression is the the Common Table Expression "cte" is an alias for the SELECT query and this alias can then be referred to in place of the query.

    Recursion

    As beautifully shown in the tongue in cheek dictionary definition of Recursion

    Recursion Ė def: see Recursion
    a Recursive Common Table Expression is an expression that references itself. The basic format of a Recursive CTE (RCTE) is

    Code: [View]
    WITH RECURSIVE (alias_name) AS (
    Initial-select
    UNION ALL
    Recursive-select)
    SELECT * from alias_name
    Or from the SQLite web site:



    Where the initial-select initialises the query and the recursive-select part of the UNION provides the recursive element.

    We can now work with our RCTE skeleton above and replace a section at a time to build up our final query.

    Start by giving the RCTE an alias-name, weíll call this (Imaginatively) rcte:

    Code: [View]
    WITH RECURSIVE rcte AS (
    Initial-select
    UNION ALL
    Recursive-select)
    SELECT * from rcte
    Now we need to replace the initial-select portion of the query (the first half of the UNION), as discussed before, the basic query we will be using to create the table is reproduced here:

    Code: [View]
    SELECT visits.id,
          visits.url,
          visits.visit_time,
          visits.from_visit,
          urls.url AS URL1
        FROM visits
          LEFT JOIN urls ON visits.url = urls.id
    We need however to take this one step further and decide which record we want to start with, as mentioned above we want ID = 27585. So the initial-select becomes:

    Code: [View]
    SELECT visits.id,
          visits.url,
          visits.visit_time,
          visits.from_visit,
          urls.url AS URL1
        FROM visits
         LEFT JOIN urls ON visits.url = urls.id
    WHERE visits.id = 27585
    We can copy and paste this directly into our Ďin-progressí query which now becomes as follows:

    Code: [View]
    WITH RECURSIVE rcte AS (
    SELECT visits.id,
          visits.url,
          visits.visit_time,
          visits.from_visit,
          urls.url AS URL1
        FROM visits
         LEFT JOIN urls ON visits.url = urls.id
    WHERE visits.id = 27585
    UNION ALL
    Recursive-select)
    SELECT * from rcte
    Now the second, recursive part of the UNION; again we use the starting query but with a slight modification, the query needs to be recursive, i.e. it needs to refer to itself (I have highlighted the changed element of the query):
    SELECT visits.id,
    visits.url,
    visits.visit_time,
    visits.from_visit,
    urls.url AS URL1
    FROM rcte
    LEFT JOIN urls ON visits.url = urls.id

    This is not quite enough though, we also need to tell the query how to follow our chain of pointers, we can do this using an INNER JOIN (see my previous article on JOINS for more information) and we JOIN the two tables on the from_visit and the id fields

    Code: [View]
    SELECT visits.id,
          visits.url,
          visits.visit_time,
          visits.from_visit,
          urls.url AS URL1
        FROM rcte
         LEFT JOIN urls ON visits.url = urls.id
          INNER JOIN visits ON CTE.from_visit = visits.id
    Now we can copy and paste this subquery to our in-progress query:

    Code: [View]
    WITH RECURSIVE rcte AS (
    SELECT visits.id,
          visits.url,
          visits.visit_time,
          visits.from_visit,
          urls.url AS URL1
        FROM visits
         LEFT JOIN urls ON visits.url = urls.id
    WHERE visits.id = 27585
    UNION ALL
    SELECT visits.id,
          visits.url,
          visits.visit_time,
          visits.from_visit,
          urls.url AS URL1
        FROM rcte
         LEFT JOIN urls ON visits.url = urls.id
          INNER JOIN visits ON rcte.from_visit = visits.id
    )
    SELECT * from rcte
    Although this query works (you can open a SQLite command window and try it Ė but make sure you use a valid visits.id) I want to make two very small changes.

    The first is easy and simply orders the results by ID. The second limits the number of iterations through the recursive loop, while not essential if you donít do this and create a query incorrectly, or the table you are querying loops back on itself then a recursive query can just hang. The final query is below with the two changes highlighted:

    Code: [View]
    WITH RECURSIVE rcte AS (
    SELECT visits.id,
          visits.url,
          visits.visit_time,
          visits.from_visit,
          urls.url AS URL1
        FROM visits
         LEFT JOIN urls ON visits.url = urls.id
    WHERE visits.id = 27585
    UNION ALL
    SELECT visits.id,
          visits.url,
          visits.visit_time,
          visits.from_visit,
          urls.url AS URL1
        FROM rcte
         LEFT JOIN urls ON visits.url = urls.id
          INNER JOIN visits ON rcte.from_visit = visits.id
        LIMIT 100
    )
    SELECT * from rcte
    ORDER BY rcte.id
    By way of explanation, the recursive-select part of the UNION references back to the initial-select part and the query will continue through each row until rcte.from_visit does not equal visits.id. INNER JOIN will only match rows from the visits table that are also in the cte table Ė see my previous article on joins for more information.

    In English, it works like this:

    1. The initialising select runs and selects the one row from the visits table that has ID 27585. This row is inserted into our results table
    2. The recursing select uses this value and selects the rows (in our case a single row) which has a from_visit equal to the ID in step 1
    3. The row from step 2 becomes our new step 1 and the process repeats until step 2 fails to return a row, or we have been though this loop 100 times


    In The Forensic Browser for SQLite this looks as follows:



    We can of course display the Chrome date in a more meaningful format using the built in options in the Forensic Browser for SQLite and the report produced can be exported to PDF, HTML, XLSX or csv.