Initialization vectors: September 2018

Sunday, September 30, 2018

Finding Slack app messages in Android and using json_extract to do it.

Short version:

The Slack app for Android keeps message related data in the TCJRXQD1B database located in the following directory:
  • /userdata/data/com.Slack/databases/
    • Notice the database name. Due to its curious name I am not sure if it is the same across all Slack installations or unique per device. Will confirm soon with further testing.
    • Edit on 01/27/2019: Database name changes. Each Slack group will have their own unique database name. The name follows the same alphanumeric format and character length.
    • Edit on 01/27/2019: Scripts below and test database can be found at:
      https://github.com/abrignoni/DFIR-SQL-Query-Repo/tree/master/Android/SLACK
The following query on the TCJRXQD1B database was utilized to extract messages and selected associated data:
SELECT   messaging_channel_counts.CHANNEL_TYPE,
  messaging_channels.NAME_OR_USER,
  users.NAME,
  users.PROFILE_REAL_NAME,
  messages.USER_ID,
  messages.CHANNEL_ID,
  messages.TS as TS_Original,
  DateTime(messages.TS, 'unixepoch', 'localtime') as TS_Conv,
  messages.MESSAGE_BLOB,
      json_extract(MESSAGE_BLOB, '$.text') as TEXTinJSON,
      json_extract(MESSAGE_BLOB, '$.files[0].filetype') as FILE_TYPE,
      json_extract(MESSAGE_BLOB, '$.files[0].name') as FILE_NAME,
      json_extract(MESSAGE_BLOB, '$.files[0].created') as FILE_CREATED,
      DateTime(json_extract(MESSAGE_BLOB, '$.files[0].created'), 'unixepoch', 'localtime') as FILE_CREATED_CONV,
      json_extract(MESSAGE_BLOB, '$.files[0].public_url_shared') as PUBLIC_URL_SHARED,
      json_extract(MESSAGE_BLOB, '$.files[0].permalink') as PERMALINK,
      json_extract(MESSAGE_BLOB, '$.files[0].id') as FILE_ID
FROM users, messages, messaging_channel_counts, messaging_channels
  WHERE messages.USER_ID = users.ID and messaging_channel_counts.id = messages.channel_id
        and messaging_channels.msg_channel_id = messages.channel_id
  ORDER BY messages.ts
The following query on the TCJRXQD1B database was used to identify shared files and content that might possibly reside in the database itself. Not all shared content is kept there.
SELECT  files.ID,
  files.CONTENT,
  files.CONTENT_HTML,
  files.FILE_BLOB,
    json_extract(file_blob, '$.created') as FILE_CREATED,
DateTime(json_extract(file_blob, '$.created'), 'unixepoch', 'localtime') as FILE_CREATED_CONV,
json_extract(file_blob, '$.url') as URL,
json_extract(file_blob, '$.name') as NAME,
json_extract(file_blob, '$.filetype') as FILE_TYPE
  FROM FILES

Long Version:

Last month Jessica Hyde, while being interviewed by the Rally Security folks, mentioned the need to share information due to the vast amount of topics available for research in the digital forensics field. She asked, as an example, "where are the write-ups to parse Slack?"



This post will be the start of a series looking into where Slack stores message data and how to extract it on all available platforms.

What is Slack?

Slack is a fully featured messaging and collaboration service designed for corporate workspaces. The latest Android Slack client has over 10 million downloads.

In Dr. Evil voice: 10 million downloads
With such a large and growing user base it is only a matter of time until our case work will include recovering Slack messages. As always I've tried to research this topic using open source tools.

Testing platform

Details of hardware and software used can be found here.

Extraction and processing

Magnet Forensics Acquire - Full physical extraction. 
Autopsy 4.8.0
DB Browser for SQLite

Analysis

The Slack data folders can be found here:
/userdata/data/com.Slack

In the Beatles voice: All the lonely folders. Where do they all come from.
The Slack messages are in the databases folder:
/userdata/data/com.Slack/databases/
In my exemplar phone the main SQLite database, the one that contains the messages, is named TCJRXQD1B. I find this name to be really peculiar and I wonder if it is unique to my Slack installation. This is something I will follow up on soon. The schema of this database is as follows:

So many tables.
For my testing I created a free (as in no cost to use) Slack workspace. On this workspace I exchange a few messages both in the general (a type of public group) channel as well as direct messages (DMs). Within these I took and sent a picture, uploaded a picture, created a post, sent a piece of python code, and sent some emojis.

The main table is appropriately named as messages. The most relevant content in this table is located in the MESSAGE_BLOB field. The content is in JSON format and it has the user generated text message data as well as metadata of shared files.

In the past I have expressed my belief in the increasing importance of being able to know some scripting in order to custom parse JSON and XML formats when these are placed one inside another. See here. For example an XML file whose tags have JSON as content where this JSON has more JSON or XML further in it. Thankfully in this case the MESSAGE_BLOB field  only contains various levels of JSON and nothing more. This fact made it really easy to access the JSON content and have it be shown as another field in a database table.

Json_extract

The moment I saw that the MESSAGE_BLOB field had only JSON in it I remembered a tweet sent to me by Paul Sanderson.

The book introduced me to the json_extract functionality in SQLite. It made extracting the JSON data a snap and presented it as another field in my SELECT query. For an in depth explanation on how it works and to see some usage examples go here.

The following images are the result of the first query shown at the beginning of the blog post. I divided them in two since it was impossible to screenshot all the fields with content at once. The first image are the first fields in the query that relate to the users and messages. These have been sorted by time; timestamps have been converted to local time as well.

message_blob is JSON!!!
All the fields at the right of the MESSAGE_BLOB are fields extracted from it using json_extract. These are TEXTinJSON and all of the following fields in the next image.

Messages and metadata of attached files.
These fields are metadata about the images and files shared within a message. This is how a user would have seen them:

Emoji, text, image, code and post.
A comparison of the last two images tells us that the shared content metadata (file creation time, filename, type, etc...) is contained in the JSON file we just extracted. Where to these shared files reside?

It depends

Shared files can be in a few places:

  • In the following app directory: /userdata/data/com.Slack/cache/
    • Files located in this folder are files, like images, that were taken with the device itself.
  • In the image_manager_disk_cache folder after message sync in the following directory: /userdata/data/com.Slack/cache/image_manager_disk_cache/
    • The cache engine used by Slack for images is called Glide. More here.
    • I know of no way to reverse the hashed filename of the images back to their original components, one of these being the original filename itself. More here. This means there is no way of ascertaining for sure which images in this folder match up with a particular  message and file metadata in the extracted JSON or files table.
    • The only way to ascertain which files go with what is to go to the url in the permalink field as stated in the messages query. To do this authentication to the Slack work space is required for access.
  • In the files table.
    • The files table has a reference for all shared items. It can also contain the actual file data for posts and shared code. The second query at the top of the blog post can be used to locate these types of shared content.
The following image shows the result of executing the second query. It used json_extract as well. 

Files
Some things of note from the previews image:
  • ID = Used to match the file content with the message itself.
  • Content = Contained all of the sample python snippet of code I shared.
  • Content_html = Contained all of the post content I shared.
  • File_blob = Used json_extract to generated the rest of the fields from it.
  • File_Created = Self explanatory.
  • URL = When this field contains data it can reference a file created by the device itself. Notice how the contained data path references the local file system when that happens.
  • Name = Self explanatory.
  • File_typer = Self explanatory.
Conclusion

There are additional Slack artifacts not covered in this write-up. Be aware that my queries extracted from the JSON only the content I found to be relevant. You might find that in your case work some of these other artifacts might be of importance. Change the query as needed and do validate, validate, validate.

Last but not least:

In order to not miss evidence of possible deleted records do use a forensic SQLite tool to explore WAL records.

As always I can be reached on twitter @alexisbrignoni and email 4n6[at]abrignoni[dot]com.

Sunday, September 23, 2018

It is our responsibility: Supporting DFIR researchers and content creators.

Putting out free Digital Forensics and Incident Response (DFIR) content, be it blogs or tweets, is a time consuming endeavor for sure. Even my blog and its basic DFIR content, which I work on my free time, takes me hours to just finish one post. This fact made think of those who create more frequent and extensive heavyweight content. I have lost count on how many times I have used information from the DFIR community in my case work. How can we thank these fine folks for the work they put in for us and most of the time for free? How can we motivate them to continue doing so?

How to say 'we got your back' to DFIR content creators.

1.     Say thank you.

Was something you learned useful? Tell the person so through a retweet, direct message or email. You might not be allowed to tell them how it helped you specifically in a case but most folks appreciate the knowledge that something they worked on and shared had an impact even if specifics might not be available. That being said if you can provide some details do it. There might be more to what you found useful that the person you are thanking might be able to tell you that you wouldn't have known otherwise if you hadn't reached out.

2.     Share and promote the work.

By sharing and promoting not only gives the content more impact and distribution but it might motivate others to also share their own work. For an example see here.

3.     Support their Patreon if they have one.

Time is not free. Hosting a website is not free. Equipment to record audio and/or video is not free. Editing software and DFIR tools are not free. Thanks to Patreon and other similar sites we can directly encourage and support these folks monetarily. Here are some DFIR content creators I support:

https://www.patreon.com/DFIRtraining

DFIR website by Brett Shavers and it contains a wealth of information. Being a Patreon supporter comes with awesome educational perks. Depending on your level of support you will have access to videos and even DFIR course content.

https://www.patreon.com/thisweekin4n6

If you don't read Phill's weekly blog post and hear his monthly DFIR roundup, who are you? I think Phill has a time slowing machine that enables him to listen, read and then do a summary of all the latest happening in the DFIR space.

https://www.patreon.com/rallysecurity

Rally Security is an Information Security (Infosec) focused Twitch channel. Great group of known experts that discuss the latest in Infosec as well as interviewing interesting people in the field. The Twitch channel keeps a copy of each show if you missed one.

https://www.patreon.com/13cubed

Richard Davis creates great videos that showcase different DFIR techniques. Being able to see how something is done helps immensely. As an example check out his video on SRUM here.

4.    Buy their books but also review them.

Recently I have made the point of buying the books of people that are active in DFIR twitter/blogs. Great decision. Most of the time tweets and blog posts do not do justice to the research or work these individuals put in. When they sit down and write a book about it one can really benefit from all the details that do not fit in social media. Was the book useful? Leave a comment in Amazon or in your own twitter. Think others will benefit from knowing your take on the book? Make a review and post it on your blog. I am currently reading Harlan Carvey's new book titled Investigating Windows Systems and hope to have a review done soon.

Conclusion

A free resource is never actually free. People put their time, knowledge and even their hearts out there because the have a passion for this field. It is our ethical responsibility as beneficiaries of such efforts to encourage and support as many of these folks as we can, and if possible, become one of them too.