Initialization vectors: Finding TikTok messages in Android

Friday, November 9, 2018

Finding TikTok messages in Android

Short version

The Android TikTok app keeps message related data in SQLite databases located in the following location:
userdata/data/com.zhiliaoapp.musically/databases/
The database containing user data, both the local user and friends, is named db_im.xx.
The database containing the messages is named in the following regex format: ([0-9]{19})(_im.db)$ where the filename is 19 character numeric sequence ending in the _im.db extension.

Queries that can be used as templates to extract messages from the database can be found at:
github.com/abrignoni/DFIR-SQL-Query-Repo/tree/master/Android/TIKTOK


In order to view the public TikTok profiles of the users found in the db_im.xx table add the user name to the end of the following URL:
https://m.tiktok.com/h5/share/usr/(insert username number from DB).html
For one of the test accounts used in this blog post the URL looks like this:
https://m.tiktok.com/h5/share/usr/6619791930123403269.html 
Multiple XML app files can be located at:
userdata/data/com.zhiliaoapp.musically/shared_prefs
Some of the app related info contained within the XML files includes:

  • Total traffic
  • Collect traffic time
  • Recent search history 
  • Mobile traffic 
  • Language
  • Region
  • First open time
  • App install time
  • Last update time
  • Mac address
  • Last wifi bssid
  • Last time check bssid
The previous are just a few examples of the type of content the XML stores. Additional user info can be found in the aweme_user.xml file.

Videos created with the app can be found in three files. One contains the video, another the audio for it and a third one combines both. The files are located at:
userdata/data/com.zhiliaoapp.musically/files/
The filenames are a dash separated timestamp followed by a numeric sequence that ends with -concat-v for the video and -concat-a for the audio. A sample audio filename would be something like this: 2018-11-03-210557702-mix-concat-a.

For the combined video and audio file the filename will follow the previous format with the addition of the synthetise_ prefix. For example: synthetise_2018-11-03-210556218-concat-v.


Long version

The Android TikTok app is one of the more popular apps in the Google Play store with over 100,000,000 downloads. The app is used to create short videos where the user can easily edit the sounds, visuals, and share them in within the social media environment it provides.

This app is ridiculously popular with teens.
 As a most social media platforms the app provides a way for user to send messages to each other.


Testing platform

Details of hardware and software used can be found here.

Extraction and processing

Magnet Forensics Acquire - Full physical extraction. 
Autopsy 4.8.0
DB Browser for SQLite

Analysis

The TikTok app directory structure looks as follows:

Usual Android app file structure.
As stated at the beginning of the post the main messaging content SQLite database is named by the following pattern  ([0-9]{19})(_im.db)$. The 19 character number at the start of the file name is the same as the logged in user of the app. The messages table does not contain the actual user names, that information resides in a second table called db_im.xx. The table name for the message is appropriately called msg.

The following image shows some of the more relevant fields in the msg table:

JSON, we meet again!!!!
As expected the creation time of the messages is unix epoch and the actual test content is in JSON format. The extract messages query at the top of the blog uses the json_extract function to separate the relevant JSON into its own database response columns.

It is of not that some of the messages have the read_status value in the last column set to zero. This means that the message did not reached the server. In my test those messages were sent before the target account had followed account initiating the message. The local info column contained relevant information that will help the analyst understand the reason for a read_status of zero. Again in this instance the local info message read as "This person hasn't followed you yet and may not be able to receive your messages."

Local_info column value. Long hand for sorry can't do that Dave.
Next is the user data table contents named SIMPLE_USER in the db_im.xx database.

Notice the user id, nickname and unique_id values. The avatar thumb url, in JSON format, is there as well. The SQL query for messages joins both the messages and user data tables to present a unified result for all messages sent and received. As always it calculates the time from unix epoch to local time and extract all the relevant JSON to its own result columns. For the query to work one of the tables needs to be attached so the query can have access to both databases and the necessary tables from each.

The next image show a portion of the message extract messages query results. See how if the user responds with a GIF the URL and display name are extracted from the JSON and given their own columns. These URLs are accessible without authentication.

Love it when a plan comes together.
One useful trick is to take the UID values and insert them into a specific URL in the following manner:
https://m.tiktok.com/h5/share/usr/(insert username number from DB).html
 If the account is not private you will be able to see the shared content. Here is an example:

This is the account.

The app also maintains the shared content as described in the first section of the blog post as well as multiple XML files that can be of use to the analyst. In my sample data set there were 77 XML files. Going through them here is beyond scope but it is highly recommended to take the time and understand their content.

Next post will be about finding TikTok messages in iOS. When those queries are completed they will be added to the DFRI SQL Query Repo as well.

As always I can be reached on twitter @alexisbrignoni and email 4n6[at]abrignoni[dot]com.