Initialization vectors: Finding Slack app messages in Android and using json_extract to do it.

Sunday, September 30, 2018

Finding Slack app messages in Android and using json_extract to do it.

Short version:

The Slack app for Android keeps message related data in the TCJRXQD1B database located in the following directory:
  • /userdata/data/com.Slack/databases/
    • Notice the database name. Due to its curious name I am not sure if it is the same across all Slack installations or unique per device. Will confirm soon with further testing.
    • Edit on 01/27/2019: Database name changes. Each Slack group will have their own unique database name. The name follows the same alphanumeric format and character length.
    • Edit on 01/27/2019: Scripts below and test database can be found at:
      https://github.com/abrignoni/DFIR-SQL-Query-Repo/tree/master/Android/SLACK
The following query on the TCJRXQD1B database was utilized to extract messages and selected associated data:
SELECT   messaging_channel_counts.CHANNEL_TYPE,
  messaging_channels.NAME_OR_USER,
  users.NAME,
  users.PROFILE_REAL_NAME,
  messages.USER_ID,
  messages.CHANNEL_ID,
  messages.TS as TS_Original,
  DateTime(messages.TS, 'unixepoch', 'localtime') as TS_Conv,
  messages.MESSAGE_BLOB,
      json_extract(MESSAGE_BLOB, '$.text') as TEXTinJSON,
      json_extract(MESSAGE_BLOB, '$.files[0].filetype') as FILE_TYPE,
      json_extract(MESSAGE_BLOB, '$.files[0].name') as FILE_NAME,
      json_extract(MESSAGE_BLOB, '$.files[0].created') as FILE_CREATED,
      DateTime(json_extract(MESSAGE_BLOB, '$.files[0].created'), 'unixepoch', 'localtime') as FILE_CREATED_CONV,
      json_extract(MESSAGE_BLOB, '$.files[0].public_url_shared') as PUBLIC_URL_SHARED,
      json_extract(MESSAGE_BLOB, '$.files[0].permalink') as PERMALINK,
      json_extract(MESSAGE_BLOB, '$.files[0].id') as FILE_ID
FROM users, messages, messaging_channel_counts, messaging_channels
  WHERE messages.USER_ID = users.ID and messaging_channel_counts.id = messages.channel_id
        and messaging_channels.msg_channel_id = messages.channel_id
  ORDER BY messages.ts
The following query on the TCJRXQD1B database was used to identify shared files and content that might possibly reside in the database itself. Not all shared content is kept there.
SELECT  files.ID,
  files.CONTENT,
  files.CONTENT_HTML,
  files.FILE_BLOB,
    json_extract(file_blob, '$.created') as FILE_CREATED,
DateTime(json_extract(file_blob, '$.created'), 'unixepoch', 'localtime') as FILE_CREATED_CONV,
json_extract(file_blob, '$.url') as URL,
json_extract(file_blob, '$.name') as NAME,
json_extract(file_blob, '$.filetype') as FILE_TYPE
  FROM FILES

Long Version:

Last month Jessica Hyde, while being interviewed by the Rally Security folks, mentioned the need to share information due to the vast amount of topics available for research in the digital forensics field. She asked, as an example, "where are the write-ups to parse Slack?"



This post will be the start of a series looking into where Slack stores message data and how to extract it on all available platforms.

What is Slack?

Slack is a fully featured messaging and collaboration service designed for corporate workspaces. The latest Android Slack client has over 10 million downloads.

In Dr. Evil voice: 10 million downloads
With such a large and growing user base it is only a matter of time until our case work will include recovering Slack messages. As always I've tried to research this topic using open source tools.

Testing platform

Details of hardware and software used can be found here.

Extraction and processing

Magnet Forensics Acquire - Full physical extraction. 
Autopsy 4.8.0
DB Browser for SQLite

Analysis

The Slack data folders can be found here:
/userdata/data/com.Slack

In the Beatles voice: All the lonely folders. Where do they all come from.
The Slack messages are in the databases folder:
/userdata/data/com.Slack/databases/
In my exemplar phone the main SQLite database, the one that contains the messages, is named TCJRXQD1B. I find this name to be really peculiar and I wonder if it is unique to my Slack installation. This is something I will follow up on soon. The schema of this database is as follows:

So many tables.
For my testing I created a free (as in no cost to use) Slack workspace. On this workspace I exchange a few messages both in the general (a type of public group) channel as well as direct messages (DMs). Within these I took and sent a picture, uploaded a picture, created a post, sent a piece of python code, and sent some emojis.

The main table is appropriately named as messages. The most relevant content in this table is located in the MESSAGE_BLOB field. The content is in JSON format and it has the user generated text message data as well as metadata of shared files.

In the past I have expressed my belief in the increasing importance of being able to know some scripting in order to custom parse JSON and XML formats when these are placed one inside another. See here. For example an XML file whose tags have JSON as content where this JSON has more JSON or XML further in it. Thankfully in this case the MESSAGE_BLOB field  only contains various levels of JSON and nothing more. This fact made it really easy to access the JSON content and have it be shown as another field in a database table.

Json_extract

The moment I saw that the MESSAGE_BLOB field had only JSON in it I remembered a tweet sent to me by Paul Sanderson.

The book introduced me to the json_extract functionality in SQLite. It made extracting the JSON data a snap and presented it as another field in my SELECT query. For an in depth explanation on how it works and to see some usage examples go here.

The following images are the result of the first query shown at the beginning of the blog post. I divided them in two since it was impossible to screenshot all the fields with content at once. The first image are the first fields in the query that relate to the users and messages. These have been sorted by time; timestamps have been converted to local time as well.

message_blob is JSON!!!
All the fields at the right of the MESSAGE_BLOB are fields extracted from it using json_extract. These are TEXTinJSON and all of the following fields in the next image.

Messages and metadata of attached files.
These fields are metadata about the images and files shared within a message. This is how a user would have seen them:

Emoji, text, image, code and post.
A comparison of the last two images tells us that the shared content metadata (file creation time, filename, type, etc...) is contained in the JSON file we just extracted. Where to these shared files reside?

It depends

Shared files can be in a few places:

  • In the following app directory: /userdata/data/com.Slack/cache/
    • Files located in this folder are files, like images, that were taken with the device itself.
  • In the image_manager_disk_cache folder after message sync in the following directory: /userdata/data/com.Slack/cache/image_manager_disk_cache/
    • The cache engine used by Slack for images is called Glide. More here.
    • I know of no way to reverse the hashed filename of the images back to their original components, one of these being the original filename itself. More here. This means there is no way of ascertaining for sure which images in this folder match up with a particular  message and file metadata in the extracted JSON or files table.
    • The only way to ascertain which files go with what is to go to the url in the permalink field as stated in the messages query. To do this authentication to the Slack work space is required for access.
  • In the files table.
    • The files table has a reference for all shared items. It can also contain the actual file data for posts and shared code. The second query at the top of the blog post can be used to locate these types of shared content.
The following image shows the result of executing the second query. It used json_extract as well. 

Files
Some things of note from the previews image:
  • ID = Used to match the file content with the message itself.
  • Content = Contained all of the sample python snippet of code I shared.
  • Content_html = Contained all of the post content I shared.
  • File_blob = Used json_extract to generated the rest of the fields from it.
  • File_Created = Self explanatory.
  • URL = When this field contains data it can reference a file created by the device itself. Notice how the contained data path references the local file system when that happens.
  • Name = Self explanatory.
  • File_typer = Self explanatory.
Conclusion

There are additional Slack artifacts not covered in this write-up. Be aware that my queries extracted from the JSON only the content I found to be relevant. You might find that in your case work some of these other artifacts might be of importance. Change the query as needed and do validate, validate, validate.

Last but not least:

In order to not miss evidence of possible deleted records do use a forensic SQLite tool to explore WAL records.

As always I can be reached on twitter @alexisbrignoni and email 4n6[at]abrignoni[dot]com.