Initialization vectors: Initial thoughts on Android 10 parsing

Saturday, February 15, 2020


When Josh Hickman (@josh_hickman1) told me he was working on creating an Android 10 full file system image as part of his testing images series, I was stoked. After I suggested some apps to test, he diligently worked on it and made the image public for all to use. Go get it here. Before I continue I want to thank Josh for putting this work out and to express how useful it is to everybody. Thank you!

After running the image through two commercial digital forensic tools I noted a few things.
  • When parsing the image with commercial DFIR tools you will see 99% of what you expect to see. This is good and speaks to the maturity of Android as an operating system and the responsiveness of vendors in this space. Still, as expected, a new OS version will break artifact parsers for third-party apps and native files. It is our job to figure out where the known but now lost items are, as well as to find new artifacts we weren't aware of. This is how toolmakers can focus effectively on what needs to be done: it is us doing the work and telling them what's important to us. For example, chat messages from Discord and TikTok seem to be missing from the parsed output even though they are there in the image. In the case of TikTok the old database query to extract chats still works. SQL queries can be found here.
  • One example of a native OS file changing format is the UsageStats files. These keep track of application usage, similar to KnowledgeC database entries in iOS. For details see here. Traditionally these UsageStats files were XML formatted. With Android 10 they are now protobuf encoded. All credit goes to Yogesh Khatri since he did all the heavy research work on it. His blog post is required reading. It can be found here. Not only did he identify the change in format, he also updated my old UsageStats XML parser so that it can handle the protobuf-encoded files. The script can be found here. These protobuf-encoded files were not decoded by the digital forensic tools. As said before, this is not a bash on digital forensic tool developers. It is a call to action to the community to test, discover, and help focus our development efforts on the artifacts we need and deem to be relevant.
  • Multiple user accounts on an Android device are rare. I haven't seen it happen on a case yet, but never assume you never will. Artifacts left behind by a second account seem to be missing or come out jumbled together after parsing. For example, if the examiner looks at app data she might find that the parsed report for one database shows data for both accounts, while for another artifact the data available is from the currently active account only. It is important that we identify the presence of multiple user accounts on the device and take steps to validate our output accordingly. A quick check for multiple user accounts can be done by looking at the contents of the /data/user_de/ directory. If you see any folder other than folder 0 then you have multiple user accounts on the device.
Multiple user accounts. Usually account #2 is 10 but who knows why it went to 11.
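The /data/user_de/ check described above can be scripted against an extracted file system image. What follows is a minimal sketch, not part of any existing tool: the function names are made up for illustration, and it assumes the image has been extracted to a local folder so that the user_de directory can be read like any other directory.

```python
from pathlib import Path


def find_android_user_ids(user_de_dir):
    """Return the sorted numeric folder names found under a user_de directory.

    Hypothetical helper: user_de_dir is the local path of the extracted
    /data/user_de/ directory.
    """
    root = Path(user_de_dir)
    ids = [
        int(p.name)
        for p in root.iterdir()
        if p.is_dir() and p.name.isascii() and p.name.isdigit()
    ]
    return sorted(ids)


def has_multiple_users(user_de_dir):
    """True when any numeric folder other than 0 is present."""
    return any(uid != 0 for uid in find_android_user_ids(user_de_dir))
```

Pointed at the extracted user_de folder from this image, find_android_user_ids would be expected to report folders 0 and 11. A hit here is only a flag that the rest of the parsed output needs to be validated per account.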


As an example of how tool design might affect report output, I will show how my own UsageStats parser script commingles the data from the two Android user accounts on the device in one report.

After extracting the UsageStats directory the script is run.



Notice it processed 11099 records from the files.
Next I processed the data from each user directory separately. To do so I ran the script against the usagestats directory twice, once with only directory 0 present and once with only directory 11 present.

Directory 0 and directory 11.
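Separating the user directories by hand works, but it can also be scripted so that the original extraction stays untouched. The sketch below is one possible way to do it, under the assumption that the extracted usagestats directory contains one numeric folder per user (0 and 11 here). The function name and the staging layout are hypothetical and are not part of the parser script.

```python
import shutil
from pathlib import Path


def stage_usagestats_per_user(usagestats_dir, staging_dir):
    """Copy each numeric user folder into its own usagestats tree.

    Produces staging_dir/user_<id>/usagestats/<id>/..., so a parser pointed
    at staging_dir/user_<id>/usagestats sees only that user's files.
    Returns a dict mapping user id to the staged usagestats path.
    """
    source = Path(usagestats_dir)
    staged = {}
    for user_folder in sorted(source.iterdir()):
        name = user_folder.name
        if not (user_folder.is_dir() and name.isascii() and name.isdigit()):
            continue  # skip anything that is not a per-user folder
        target_root = Path(staging_dir) / f"user_{name}" / "usagestats"
        # copytree creates the missing parent folders and fails if the
        # destination already exists, which avoids silently mixing runs.
        shutil.copytree(user_folder, target_root / name)
        staged[int(name)] = target_root
    return staged
```

Each returned path can then be handed to the parser in its own run, which gives one report per user account instead of a single commingled one.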
Data processed from user directory 0:


The number of records processed went down to 8796.
Now user directory 11:


The number of records processed is 2303.
Even without looking at the contents of each report we can easily determine which account was used the most. This insight would have been lost if the data was shown all together in one report.
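The per-user numbers can also be reconciled against the combined run as a quick validation step: 8796 + 2303 = 11099, so every record in the commingled report is accounted for by one of the two users. The snippet below is a small sketch of that check; the function name is hypothetical, and the counts are the ones reported in the runs above.

```python
def summarize_user_counts(per_user_counts, combined_total):
    """Check that per-user record counts add up to the combined total and
    return each user's share of the records."""
    total = sum(per_user_counts.values())
    if total != combined_total:
        raise ValueError(
            f"per-user counts sum to {total}, expected {combined_total}"
        )
    if total == 0:
        return {uid: 0.0 for uid in per_user_counts}
    return {uid: count / total for uid, count in per_user_counts.items()}


# Record counts from the three runs described in the post.
shares = summarize_user_counts({0: 8796, 11: 2303}, combined_total=11099)
for uid, share in shares.items():
    print(f"user {uid}: {share:.1%} of records")
```

User 0 accounts for roughly 79% of the records and user 11 for roughly 21%. A mismatch between the sum and the combined total would be a sign that records were dropped or double counted somewhere along the way.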

As examiners we own the data we are tasked with processing, and it is our responsibility to verify that any inferences drawn from it are accurate and backed up by the source. We are uniquely positioned to identify gaps in knowledge, to work on filling them, and to share that knowledge with others who can automate the process to the benefit of the greater community of practitioners. If you feel bored while working in this field you are definitely not paying attention. Your perspective is needed, your expertise is essential. Make it known.