This article was originally published in blog:
Scenario
You have a document whose provenance you need to establish or, as in my case, you need to find an earlier version of it. There are no obvious backups, and you have checked the existing shadow copies (using vssadmin) and found nothing of interest. However, you have good intel that the file has been modified. Unfortunately the file is binary, so a keyword search for the older version is difficult or impossible.
Volume Shadow Copies (VSCs) are part of the Volume Shadow Copy Service (VSS), which allows Windows (Windows Vista, 7 and Server 2003 and 2008) to take a snapshot of your system at various times, essentially creating a backup of the blocks of the disk (well, certain areas of the disk, including some user files) that have changed since the previous backup. VSCs are normally created once a day and whenever any new software is installed.
Format of the Shadow files
Shadow copies are made of 16K blocks which are a combination of a header block, index blocks and interspersed data blocks.
The index block has a header section which starts with a unique signature, followed at offset 24 by three 64-bit values: the file offset of this index block, the absolute volume offset (AVO) of this index block, and the AVO of the next index block.
AVOs are used simply because they make parsing the file and searching for data much quicker.
A screenshot of the start of an index block follows, showing the signature highlighted in grey, followed by the logical offset of this block (in green), then the AVO of this index block (yellow) and then the AVO of the next index block (blue). The 32 bytes highlighted in teal are the first 32-byte index record.
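As a rough illustration, the header fields described above could be read like this in Python. This is a sketch rather than the program used in the case: the function name is made up for the example, the signature is passed in by the caller rather than hard-coded, and the values are assumed to be little-endian unsigned integers.

```python
import struct

BLOCK_SIZE = 16384       # 16K blocks, as described in the text
HEADER_FIELDS_AT = 24    # the three 64-bit values start at offset 24


def parse_index_header(block, signature):
    """Return (file_offset, this_avo, next_avo), or None if this is not an index block.

    `signature` is whatever byte string marks an index block; it is supplied by
    the caller because the text does not spell it out.
    """
    if len(block) < HEADER_FIELDS_AT + 24 or not block.startswith(signature):
        return None
    # Assumption: little-endian unsigned 64-bit integers.
    return struct.unpack_from("<QQQ", block, HEADER_FIELDS_AT)
```

A return value of None simply means the block did not start with the expected signature, which is a cheap way to reject data blocks.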
Following the index block header, at byte 128, there are 508 32-byte records, each of which points to a data block within the index file. Each record contains two pointers to the data within the VSC (the logical offset of the block within the VSC and the AVO of the same block) and a pointer to the original location, followed by another 8-byte value whose purpose is more complex but irrelevant to my scenario.
The following screenshot shows the AVO of the 16K of data that has been backed up in yellow (i.e. the data in the VSC), the logical offset of the same data within the VSC (aqua) and the AVO of the original data in green. The data in grey was not used in my specific case.
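The record layout can be sketched in Python as follows. The field names are invented for the example, and the order of the first two fields is an assumption; what the text does establish is that the original location is the third 8-byte value and that a fourth 8-byte value follows it. Little-endian byte order is also assumed.

```python
import struct
from collections import namedtuple

RECORDS_AT = 128     # records start straight after the header
RECORD_SIZE = 32     # four 8-byte values per record
RECORD_COUNT = 508   # 128 + 508 * 32 = 16384, i.e. one full block

# Hypothetical field names; the order of vsc_offset and vsc_avo is assumed.
IndexRecord = namedtuple("IndexRecord", "vsc_offset vsc_avo original_avo extra")


def parse_index_records(block):
    """Return the 508 records held in one 16K index block."""
    records = []
    for i in range(RECORD_COUNT):
        fields = struct.unpack_from("<QQQQ", block, RECORDS_AT + i * RECORD_SIZE)
        records.append(IndexRecord(*fields))
    return records
```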
Data blocks are just that, 16KB blocks of data.
Searching for data from a file of interest.
In my case I had an existing file and there were no indications that the file had been moved (defrag etc.), so I knew the physical location of the file on the disk. I had intelligence to say that the file had been modified, but because it was binary data I could not search for keywords from the file.
As I knew the sectors I was interested in, it was easy to determine the start of the block I was after. Remember that data blocks in a VSC start at 16K boundaries, not at a sector or even, usually, a cluster boundary. Just take your cluster number, multiply it by the cluster size, then do an integer division by 16384 and multiply by 16384. This has the effect of rounding down to the nearest multiple of 16384 bytes.
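The rounding is a one-liner in most languages. A minimal Python sketch, with a made-up function name and assuming the cluster number is counted from the start of the volume:

```python
BLOCK_SIZE = 16384  # VSC data blocks start on 16K boundaries


def block_start(cluster_number, cluster_size):
    """Round a cluster's byte offset down to the start of its 16K block."""
    byte_offset = cluster_number * cluster_size
    return (byte_offset // BLOCK_SIZE) * BLOCK_SIZE
```

For example, with 4096-byte clusters, cluster 1001 sits at byte 4,100,096, and `block_start(1001, 4096)` gives 4,096,000, the 16K boundary just below it.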
All I needed to do now was to find the index blocks of deleted shadow files and search through the 32-byte index records, looking at the third entry of each to see whether it contained one of my addresses of interest. Once I had found an entry of interest I could then jump to the absolute offset of the data in the shadow file and see what data had been put there.
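The matching step amounts to comparing the third 8-byte value of every record against the 16K-aligned addresses of interest. A hedged Python sketch, where the function name is invented and the input is assumed to be raw, back-to-back 32-byte records in little-endian order:

```python
import struct

RECORD_SIZE = 32


def find_matches(record_bytes, targets):
    """Yield (record_index, fields) for records whose third field is a target.

    `targets` holds the 16K-aligned original offsets being hunted for.
    """
    wanted = set(targets)
    # Ignore any trailing partial record so iter_unpack gets whole records only.
    usable = len(record_bytes) - len(record_bytes) % RECORD_SIZE
    for index, fields in enumerate(struct.iter_unpack("<QQQQ", record_bytes[:usable])):
        if fields[2] in wanted:
            yield index, fields
```

Each hit carries the other fields of the record, so the pointers to the backed-up copy of the data are available straight away.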
So I wrote a program to parse through my image looking for an index block signature. Once it had found an index block, it followed the chain of AVOs pointing to the next index block, checking for headers and extracting the 508 x 32-byte records (16256 bytes from offset 128) to a CSV. I repeated this for each run of index blocks that I could find. At the end of this process I had a single file containing thousands of back-to-back 32-byte index records. While I could then have written a parser for the CSV, it was simpler (I needed a fast result, not a forensic program) to load it into Excel and search for the blocks I was interested in.
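The original program is not reproduced here, but the general approach might look something like the Python sketch below. It rests on several assumptions: the image is of the volume itself, so an AVO can be used directly as an offset into the image; index blocks sit on 16K boundaries; values are little-endian; and the signature is supplied by the caller. All names are hypothetical.

```python
import csv
import struct

BLOCK_SIZE = 16384
RECORDS_AT = 128
RECORD_SIZE = 32


def dump_index_records(image, image_size, signature, out):
    """Write every record of every index block found to `out` as CSV rows.

    Each row is: offset of the index block, then the four 8-byte record values.
    Returns the number of index blocks found.
    """
    writer = csv.writer(out)
    visited = set()          # index blocks already dumped
    blocks_found = 0
    for start in range(0, image_size, BLOCK_SIZE):
        offset = start
        # Follow the chain of "next index block" AVOs from this starting point.
        while offset < image_size and offset not in visited:
            image.seek(offset)
            block = image.read(BLOCK_SIZE)
            if len(block) < BLOCK_SIZE or not block.startswith(signature):
                break        # not an index block, so the chain ends here
            visited.add(offset)
            blocks_found += 1
            for pos in range(RECORDS_AT, BLOCK_SIZE, RECORD_SIZE):
                writer.writerow((offset,) + struct.unpack_from("<QQQQ", block, pos))
            # Third header value: AVO of the next index block.
            offset = struct.unpack_from("<QQQ", block, 24)[2]
    return blocks_found
```

The `visited` set stops a block being dumped twice when the scan later reaches a block that was already handled as part of an earlier chain, and it also guards against a chain that loops back on itself.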
Once I found the blocks I used my favourite forensic tool to examine the data at that location, send it to my client and then sit back and feel clever.