Initialization vectors: JSON-ception and the need for mobile DFIR scripting courses

Wednesday, August 1, 2018


In my previous post I reviewed the Microsoft Translator (MT) app for Android. Usually I review apps that I have either worked with in the past or apps that are not supported by major forensic tools. In my limited experience it seems that recently many more apps are using XML/JSON data stores as opposed to the typical SQLite database structures we are used to, the ones that take up a big chunk of training time in mobile forensics courses.

In my particular analysis of the MT app I found that the relevant data was stored in an XML file whose 'string' tags contained JSON values that themselves referenced further JSON structures, some ending in a list. Jeez.
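To make the nesting concrete, here is a minimal sketch of that kind of layered data store. The tag names, keys, and values below are hypothetical stand-ins, not the MT app's actual schema; the point is that each layer has to be decoded separately.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample mimicking the layered layout described above:
# an XML 'string' tag whose text is JSON, which in turn holds a
# JSON-encoded string containing a list.
sample = """<map>
  <string name="history">{"items": "[{\\"source\\": \\"hola\\", \\"translation\\": \\"hello\\"}]"}</string>
</map>"""

root = ET.fromstring(sample)
outer = json.loads(root.find("string").text)  # first JSON layer, inside XML
inner = json.loads(outer["items"])            # second JSON layer, inside the first
print(inner[0]["translation"])                # -> hello
```

Each `json.loads` call only peels one layer, which is why eyeballing this in a text editor gets old fast.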

while true: list.append(JSON)
The data in my sample was limited to 4 OCR and 4 spoken phrase translations. It was small enough that it was easy to copy the relevant JSON from the XML file into a separate file that I then converted into HTML. It goes without saying that such output wasn't ready-made for a report, and further copy-pasting would be needed to get it there. It was obvious this would not scale to a large data set in the target XML file.
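The quick-and-dirty step described above can be as simple as the following sketch: pretty-print the extracted JSON and wrap it in an HTML page. The records here are hypothetical, and this is exactly the sort of output that still needs manual cleanup for a report.

```python
import html
import json

# Hypothetical extracted records standing in for the copied JSON.
records = [{"phrase": "buenos dias", "translation": "good morning"}]

# Pretty-print and wrap in a bare-bones HTML page.
pretty = json.dumps(records, indent=2, ensure_ascii=False)
page = "<html><body><pre>{}</pre></body></html>".format(html.escape(pretty))

with open("report.html", "w", encoding="utf-8") as f:
    f.write(page)
```

Readable, but nowhere near report quality, which is what pushed me toward a proper parser.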

As I was thinking about this Jessica Hyde made a nice comment on my reuse of a JSON to HTML script I had put together. This spurred the following exchange:

Key items in large font.
Is the future of mobile forensics all JSON all the time? Might be. I hadn't even considered IoT (and thank goodness for Jessica's upcoming book on IoT forensics.) Are we as examiners getting ready for it? As trainers and subject matter experts, are we thinking of incorporating scripting and parsing how-to blocks into the training materials we develop?

With this in mind I tried coding a simple parser in Python to process this particular piece of incepted XML/JSON. The script parses the relevant JSON values within the XML and places them in a SQLite database. The database has two tables, one for the OCR content and another for the spoken phrases. I decided to use SQLite as the end product of the script because I like doing time formatting via SQL query, as seen here:
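A condensed sketch of that approach follows. The tag names, JSON keys, and sample values are hypothetical placeholders for the app's real ones, but the flow is the same: decode the JSON held in the XML 'string' tags, insert into two tables, and let SQLite's `datetime()` turn millisecond Unix epochs into readable timestamps.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input mimicking the data store; real tag names and
# keys in the MT app will differ.
xml_data = """<map>
  <string name="ocr_history">[{"text": "salida", "translated": "exit", "timestamp": 1533081600000}]</string>
  <string name="phrase_history">[{"spoken": "gracias", "translated": "thank you", "timestamp": 1533085200000}]</string>
</map>"""

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ocr (text TEXT, translated TEXT, timestamp INTEGER)")
db.execute("CREATE TABLE phrases (spoken TEXT, translated TEXT, timestamp INTEGER)")

root = ET.fromstring(xml_data)
for tag in root.iter("string"):
    items = json.loads(tag.text)  # the JSON stored inside each XML tag
    if tag.get("name") == "ocr_history":
        db.executemany("INSERT INTO ocr VALUES (?, ?, ?)",
                       [(i["text"], i["translated"], i["timestamp"]) for i in items])
    else:
        db.executemany("INSERT INTO phrases VALUES (?, ?, ?)",
                       [(i["spoken"], i["translated"], i["timestamp"]) for i in items])

# Time formatting via SQL: millisecond epoch to a readable UTC timestamp.
row = db.execute(
    "SELECT spoken, translated, datetime(timestamp / 1000, 'unixepoch') FROM phrases"
).fetchone()
print(row)  # -> ('gracias', 'thank you', '2018-08-01 01:00:00')
```

The `datetime(timestamp / 1000, 'unixepoch')` call is the part I like: no Python-side date math, and the query doubles as documentation of the conversion.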


It takes data that looks like this:

And this is just a highlighted portion.
and turns it into the following two tables in the database:

Table 1: OCR

Notice translated text at the right.

Table 2: Phrases

Clean layout.

One does not have to be the ultimate programmer to achieve positive results on a case-specific tasking. It is true that my script needs further refinement and that proper error handling needs to be added in some sections, but for the purpose of getting the pertinent data out for my review it works. I know this because, as an examiner, I took the time to understand the data store formats, I analyzed the content for relevance, and I verified that my script output adequately represents the content of the data store in question. At the end of the day, validation is king in all we do.

My takeaway from this is that as DFIR instructors and examiners we need to focus more on foundational skills rather than just third-party tool usage. The former makes the latter work to full capacity.

As time goes by and more apps depend on API-returned JSON data, teaching how to parse it will be as important as instructing folks how to join SQLite tables. With IoT, possibly even more so.
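For comparison, the table-join skill mentioned above looks like this in its simplest form. The schema and rows are invented for illustration, echoing the classic messages-joined-to-contacts pattern taught in mobile forensics courses.

```python
import sqlite3

# Invented two-table example: messages keyed to a contacts table,
# the staple join exercise of mobile forensics training.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE messages (contact_id INTEGER, body TEXT);
INSERT INTO contacts VALUES (1, 'Alice');
INSERT INTO messages VALUES (1, 'Checking in');
""")

# Resolve each message to a contact name with a JOIN.
rows = db.execute("""
    SELECT c.name, m.body
    FROM messages m JOIN contacts c ON m.contact_id = c.id
""").fetchall()
print(rows)  # -> [('Alice', 'Checking in')]
```

If JSON keeps spreading the way it has, walking students through `json.loads` deserves the same slot in the curriculum that this query gets today.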

As always I can be reached on twitter @alexisbrignoni and email 4n6[at]abrignoni[dot]com.

PS:
In regards to Jessica's astute observation about using native tool support to view targeted data sets, here is an example of how to do so in an Android emulator.