A Beginner’s All-Inclusive Guide to ETW

ETW

Working within an IR provider, you often have to make do with whatever logs you can find. 99% of the time, ETW goes unnoticed and forgotten, even though it might be the best artifact on the box.

Recently I got asked to write an all-inclusive guide to ETW. Seeing as it’s a super useful artifact that often goes unused or gets used the wrong way, I figured I’d post the guide here too. So what are we going to talk about?

  • What are they?

  • Where do they come from?

  • Where do they go?

  • How are they useful?

  • Can we create our own?

  • Why are ELAM drivers relevant?

  • How do you handle the flood gate of events?

  • When should ETW be used?

I’m also going to cover three different ways you can access ETW: two built-in methods and one using the Windows API. I’ll also cover how to find historic ETW events, in case you’re responding to a compromise where the actor has already left the network.


What is ETW?

Event Tracing for Windows (ETW) is an efficient kernel-level tracing facility…
— docs.microsoft.com - About Event Tracing

At its core, ETW is a more verbose version of Windows Event Logs (EVTX). A lot of Windows Event Logs actually come from ETW providers. The big difference is that ETW does not log by default; a provider needs to be enabled before it will start reporting events.

ETW is one of the best ways for kernel-mode modules to provide logs that are readable from user mode, but it is not limited to kernel mode. User-mode modules can use it as well.

ETW works in three parts:

  • Controllers start and stop Event Tracing Sessions. These sessions can subscribe to one or more providers, enabling the providers to start logging.

  • Providers are where the events come from. Due to the sheer number of events some of these providers can generate, they’re disabled unless a Tracing Session is actively using them.

  • Consumers take the generated events and handle them. The default method of handling events is outputting them to an .ETL file, but another example of a consumer is using the Windows API.

Note: While .ETL files have a similar structure to .EVTX files, they are not the same. Because of this, some EVTX parsers may not be able to parse an .ETL file.

Logman

Logman is one of the built-in tools for handling ETW and Event Tracing Sessions. You can use Logman to query, create, start and stop tracing sessions, which is great for understanding which sessions are readily available for collection, or for starting your own collection.

Adding the “-ets” argument flag (for example, “logman query -ets”) queries the Event Tracing Sessions directly, allowing you to see system-level tracing sessions. Towards the bottom of the right image, you can see the Sysmon Event Tracing Sessions.

Create, Start and Query Event Tracing Sessions

Using ‘-ets’ to query Event Tracing Sessions directly.

If we query an Event Tracing Session directly, we can see the session-specific details, including the Name, Max Log Size, Log Location, and the providers that it’s subscribed to. This can be extremely useful for Incident Response companies. If you find a session that is recording providers that you’re interested in, it may give you essential logs for an investigation.

Note: Pay attention to the “-ets” argument in the command. Without it, logman will not be able to find the Event Tracing Session.

For each provider that the session is subscribed to, we can get some essential information:

  • Name / Provider GUID: The unique identifier representing the provider.

  • Level: The event level, describing whether it’s filtering for warning, information, critical or all events.

  • Keywords Any: Keywords provide a filter based on the type of event being created by the provider. We’ll go more into this soon.

So, we’ve got the GUID for the provider. How do we make this human readable?

Using the command “logman query providers”, we can get a list of each of the providers available on the system as well as their GUID.

Windows 10 has over 1,000 providers built in. Third-party software will often install its own ETW providers as well, especially if it runs in kernel mode.

Due to the sheer number of providers, I generally find it best to filter them using “findstr” (for example, “logman query providers | findstr SMB”). You can see in the image below that there are multiple results for SMB.

You may notice the similarities between the SMB providers and the structure of SMB event logs. This is because SMB event logs get their events from ETW. You can see reference to this in the Keyword Filters.

By naming a specific provider with Logman (for example, “logman query providers Microsoft-Windows-SMBClient”), we can get a more detailed understanding of what the provider does. This tells us the keywords that we can filter on, the available event levels and which processes are currently using the provider.

Looking at the keywords for SMBClient, we can see which keywords give us:

  • Full packet logging.

  • Security events

  • Authorization Events

  • The bottom 8 keywords reference the event logs that are created from this provider.

We can enable specific keywords by adding their bitmasks together and providing that when subscribing to the provider.

We can also see that PID 0 is the only process currently accessing the provider. Considering PID 0 is reserved for Kernel / System related processes, this would likely be related to the EVTX module.


Keyword Bitmasking Explained

For each keyword there’s an associated bitmask, but what is bitmasking?

Note: This section is probably a bit overkill, but I feel like bitmasking is a useful skill to know if you’re a developer.

Bitmasking uses individual bits to store a list of boolean configuration options. A boolean datatype typically takes up one byte, whereas if we use each individual bit we can fit eight options within that same byte. You can picture this as eight binary digits.

If you’ve ever counted in binary, you’ll know that each binary digit has a unique numerical value. Adding the values of the enabled bits together gives a unique value representing that specific configuration. ETW keywords show these values in hexadecimal, to make them easier to read and calculate. If you’ve ever dealt with network masks, it’s exactly the same in principle.

Example

To the right, we have the keywords for Microsoft-Windows-Kernel-Process. Let’s say that we specifically wanted to enable Process, Image and Job. If we consider that from a bitmask point of view, we’d get a binary value like the one below.

So to simplify it, we can just add the hexadecimal values together to get the specific value we need for enabling the keywords we’re interested in.

Some tools don’t accept the hexadecimal value, so you may need to convert the value into decimal.

  0x010    Process
+ 0x040    Image
+ 0x400    Job
  -----
  0x450    Value we need!
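To make the arithmetic above concrete, here is a minimal C# sketch. The enum and class names are my own (hypothetical); only the three keyword values come from the example. It uses bitwise OR rather than addition, which gives the same result whenever each keyword is a distinct bit, and it prints the mask in both hexadecimal and decimal.

```csharp
using System;

// Keyword values taken from the Microsoft-Windows-Kernel-Process example above.
// The enum itself is illustrative and not part of any library.
[Flags]
enum KernelProcessKeywords : ulong
{
    Process = 0x010,
    Image   = 0x040,
    Job     = 0x400,
}

static class KeywordMaskDemo
{
    // Combine keywords with bitwise OR; for distinct bits this equals adding them.
    public static ulong Combine(params KernelProcessKeywords[] keywords)
    {
        ulong mask = 0;
        foreach (KernelProcessKeywords keyword in keywords)
        {
            mask |= (ulong)keyword;
        }
        return mask;
    }

    static void Main()
    {
        ulong mask = Combine(
            KernelProcessKeywords.Process,
            KernelProcessKeywords.Image,
            KernelProcessKeywords.Job);

        // Prints "Hex: 0x450  Decimal: 1104"
        Console.WriteLine("Hex: 0x{0:X}  Decimal: {0}", mask);

        // A bitwise AND tells us whether a single keyword is enabled in a mask.
        bool imageEnabled = (mask & (ulong)KernelProcessKeywords.Image) != 0;
        Console.WriteLine("Image enabled: {0}", imageEnabled);
    }
}
```

The decimal form (1104) is what you would pass to a tool that doesn’t accept hexadecimal.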

Event Manifests

So where do ETW events come from?

For each ETW provider, there is an accompanying manifest file. These manifests are brilliant for determining whether or not a provider is going to be useful for an investigation. Pretty much all of the default providers have been mined and are available on GitHub. The one below is an example of the Microsoft-Windows-Kernel-Process manifest.

As a DFIR analyst, there are two main fields that we’re interested in within the manifest file: events and templates.

<event> tells us the different event IDs that are present within the provider. This also includes the keyword that they come under, so you can filter for them.

<template> gives you the structure of events, and what parameters the event accepts.

Using these manifests can give you a great understanding of which events you want to watch out for.

If you were writing a tool which installs its own ETW provider, you can use Visual Studio to compile the manifest into a class or header file. Within your program, you can then import the header file and call the “EventWrite” functions for the individual event IDs.
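If you’re working in C#, a different route to the same goal is .NET’s built-in EventSource class, which builds the manifest from the class definition instead of compiling one by hand. To be clear, this swaps out the manifest and header approach described above. The sketch below is only an illustration: the provider name, keyword, event and payload fields are all hypothetical.

```csharp
using System;
using System.Diagnostics.Tracing;

// Hypothetical provider name, chosen purely for illustration.
[EventSource(Name = "Example-DFIR-Provider")]
sealed class ExampleEventSource : EventSource
{
    public static readonly ExampleEventSource Log = new ExampleEventSource();

    // Keyword bit values, equivalent to the bitmasks shown by logman.
    public class Keywords
    {
        public const EventKeywords Audit = (EventKeywords)0x1;
    }

    // Event ID 1. The method arguments become the event's payload fields,
    // much like a <template> in a hand-written manifest.
    [Event(1, Level = EventLevel.Informational, Keywords = Keywords.Audit)]
    public void SuspiciousFile(string path, int processId)
    {
        if (IsEnabled())
        {
            WriteEvent(1, path, processId);
        }
    }
}

static class ExampleProviderDemo
{
    static void Main()
    {
        // Nothing is recorded unless a trace session has enabled this provider.
        ExampleEventSource.Log.SuspiciousFile(@"C:\Temp\example.bin", 1234);

        Console.WriteLine(
            "Provider: {0}  GUID: {1}",
            ExampleEventSource.Log.Name, ExampleEventSource.Log.Guid);
    }
}
```

The GUID it prints is the one a controller would subscribe to, the same way you would for any of the built-in providers.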

If you’re interested in checking out some manifests yourself, repnz’s etw-provider-docs is my favourite reference.


wevtutil.exe

So, how do we find out what manifests are installed on our system?

Microsoft’s wevtutil.exe enables you to install, query, modify and enable Event Logs and their associated manifest files. Similar to logman, you can use wevtutil.exe to query existing providers, except this time you’re querying the providers’ manifests.

Installing a manifest file using the import manifest ‘im’ argument. Make sure you have the right file permissions for the event log service account.

Enumerating existing manifest files using the enumerate logs ‘el’ argument.

To query a provider we can use the get log ‘gl’ argument, naming a specific manifest. This will give us information on whether the log is currently enabled, its parent provider, log location and max size. To enable or disable the log, we can use the set log ‘sl’ argument and provide the enable flag ‘/e’. When enabling a provider this way, you will not be able to use keyword filters.

We can then open the resulting ‘.etl’ file using Event Viewer. Leaving this provider enabled for 30 seconds generated almost 9,000 events. This is why filtering ETW providers is so essential.

Example of a process creation event within Microsoft-Windows-Kernel-Process
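Event Viewer isn’t the only way to read an ‘.etl’ file. As a rough sketch, the ‘Microsoft.Diagnostics.Tracing.TraceEvent’ NuGet package (the same one used in the API section of this post) can replay a file event by event, which is handy when you’re working from historic logs. The class name and the idea of passing the path as the first command-line argument are my own assumptions.

```csharp
using System;
using Microsoft.Diagnostics.Tracing; // NuGet: Microsoft.Diagnostics.Tracing.TraceEvent

static class EtlReaderDemo
{
    static void Main(string[] args)
    {
        // Path to an existing .etl file, supplied as the first argument (assumption).
        string etlPath = args[0];

        using (var source = new ETWTraceEventSource(etlPath))
        {
            // The Dynamic parser decodes manifest-based providers that have
            // no dedicated parser class in the library.
            source.Dynamic.All += delegate (TraceEvent e)
            {
                Console.WriteLine(
                    "{0:o} {1} {2} PID={3}",
                    e.TimeStamp, e.ProviderName, e.EventName, e.ProcessID);
            };

            // For a file source, Process() returns once the end of the file is reached.
            source.Process();
        }
    }
}
```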


Accessing ETW via the API

By far my preferred way of accessing ETW is via the Windows API. Logman is useful, but it’s quite resource intensive. With the API you can also do a lot more filtering, say on Event ID rather than just the keyword. The sky is the limit when you access it via the API yourself.

The two officially supported languages are C and C#, but there are wrappers for other languages such as Python and Go. For this example, I will be using C# because it’s my preferred language and I think it’s the easiest to read. To get started, you’ll need to install the NuGet package, ‘Microsoft.Diagnostics.Tracing.TraceEvent’.

Enabling Providers

Enabling providers works in four parts:

  1. Create the trace session.

  2. Enable the provider, supplying the Information Level and Keyword Bitmask.

  3. Set up a callback function.

  4. Start processing.

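// Requires the Microsoft.Diagnostics.Tracing.TraceEvent package and an elevated process.
// Namespaces: Microsoft.Diagnostics.Tracing, Microsoft.Diagnostics.Tracing.Session

// 1. Create a new session to listen for events.
using (TraceEventSession session = new TraceEventSession("DART Trace Session"))
{
    // 2. Enable the provider by GUID or name.
    session.EnableProvider(
        "Microsoft-Windows-Kernel-Process",
        TraceEventLevel.Verbose,
        0x450 // Keyword bitmask: Process | Image | Job
    );

    // 3. Set up the callback function.
    session.Source.AllEvents += Source_AllEvents;

    // 4. Start processing. This call blocks until the session is stopped.
    session.Source.Process();
}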

For some of the more common providers, this library provides helper functions to enable specific keywords rather than the whole provider. We are also able to create event-specific callbacks rather than a generic catch-all. Below we enable Process and Image Load events.

// 1. Create a new session to listen for events.
using (TraceEventSession session = new TraceEventSession("DART Trace Session"))
{
    // 2. Enable specific kernel keywords.
    session.EnableKernelProvider(
        KernelTraceEventParser.Keywords.Process |
        KernelTraceEventParser.Keywords.ImageLoad
    );

    // 3. Set up the event-specific callback functions.
    session.Source.Kernel.ImageLoad += Kernel_ImageLoad;
    session.Source.Kernel.ProcessStart += Kernel_ProcessStart;
    session.Source.Kernel.ProcessStop += Kernel_ProcessStop;

    // 4. Start processing. This call blocks until the session is stopped.
    session.Source.Process();
}

Handling Event Data

Event-specific callbacks allow us to handle events directly, rather than having to manually work out which Provider and Event ID we’re handling. Using event-specific callbacks also gives us event-specific objects. Using the process start example, we’re handed a ‘ProcessTraceData’ object. In the example below, we’re able to pull out the process name, PID and PPID.

private static void Kernel_ProcessStart(ProcessTraceData obj)
{
    // Placeholders are zero-based: three arguments means {0}, {1} and {2}.
    Console.WriteLine(
        "Process Name: {0} \t PID: {1} \t PPID: {2}",
        obj.ProcessName, obj.ProcessID, obj.ParentID
    );
}

Adding duplicate functions for Process Stop and Image Load, we can get an output like this:

Handling Generic Events

If we’re using a provider which is not lucky enough to have its own objects, we’re going to have to determine the Provider and Event ID ourselves. Rather than an event-specific object, we get the default ‘TraceEvent’ object. We can utilise ‘ProviderName’ and ‘EventName’ to determine how to process the event. During troubleshooting we can also use the ‘Dump()’ function to get the contents of the object, so we can parse it later.

private static void Source_AllEvents(TraceEvent obj)
{
    // EventName is the decoded name of the event; Dump(true) includes the payload.
    Console.WriteLine(
        "Provider Name: {0} \t Event Name: {1} \t\n Raw Event: {2}",
        obj.ProviderName, obj.EventName, obj.Dump(true)
    );
}

With an ImageLoad event, we’ll get the following output:

We can see that the parameters for ImageLoad are held within the Payload of the event. We can use the manifests mentioned earlier to parse the event data into our own Data Type.
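As a rough sketch of that parsing step, the callback below filters on provider name and Event ID, prints every payload field, and then copies two values into a small record type. The record type, the Event ID (5) and the ‘ImageName’ field name are assumptions based on my reading of the Microsoft-Windows-Kernel-Process manifest, so check them against the manifest for your build.

```csharp
using System;
using Microsoft.Diagnostics.Tracing; // NuGet: Microsoft.Diagnostics.Tracing.TraceEvent

// Hypothetical record type for an image load.
sealed class ImageLoadRecord
{
    public int ProcessId;
    public string ImageName;
}

static class GenericEventHandlers
{
    // Assumed Event ID for ImageLoad in Microsoft-Windows-Kernel-Process.
    const int ImageLoadEventId = 5;

    public static void Source_AllEvents(TraceEvent obj)
    {
        // Filter on provider and Event ID, not just on keyword.
        if (obj.ProviderName != "Microsoft-Windows-Kernel-Process" ||
            (int)obj.ID != ImageLoadEventId)
        {
            return;
        }

        // PayloadNames lists the fields described by the manifest's <template>.
        foreach (string name in obj.PayloadNames)
        {
            Console.WriteLine("  {0} = {1}", name, obj.PayloadByName(name));
        }

        var record = new ImageLoadRecord
        {
            ProcessId = obj.ProcessID,
            // Assumed field name; PayloadByName returns null if it is absent.
            ImageName = obj.PayloadByName("ImageName") as string
        };

        Console.WriteLine("PID {0} loaded {1}", record.ProcessId, record.ImageName);
    }
}
```

You would hook this up with ‘session.Source.AllEvents += GenericEventHandlers.Source_AllEvents;’ in place of the generic dump handler.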

Event Sessions

It’s worth mentioning that while our session is active, we will appear within Logman Session queries. This means if an actor is actively paying attention, they will be able to disable our trace session.
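The same visibility works in our favour too. Here’s a small sketch, assuming the TraceEvent NuGet package again, that lists the active sessions from code. You could use it to check whether your own session is still alive, or to spot existing sessions worth collecting from. The class name is hypothetical.

```csharp
using System;
using Microsoft.Diagnostics.Tracing.Session; // NuGet: Microsoft.Diagnostics.Tracing.TraceEvent

static class ActiveSessionLister
{
    static void Main()
    {
        // Should roughly mirror what "logman query -ets" shows.
        foreach (string name in TraceEventSession.GetActiveSessionNames())
        {
            Console.WriteLine(name);
        }
    }
}
```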


Useful Providers for DFIR

So enough about “What is ETW and how it works”, let’s get to the important stuff. Which providers are actually useful for DFIR? I’m going to list off some of the ones that have been particularly useful for me, but this is in no way an all-inclusive list. There are just way too many providers out there.

Microsoft-Windows-Kernel-Process

Kernel Process is probably the most obvious provider. This gives us visibility over a huge variety of process-related events:

  • Process Start

  • Process Stop

  • Image Load

  • Image Unload

  • Process Freeze

Note: Kernel Process gives us annoyingly little about the parent process. Unless you happened to capture an event with the parent process, you’re limited to a PPID.

Microsoft-Windows-Kernel-File

Covered in a previous blog, Kernel File can provide great monitoring of File System operations. Below is a table of events that I’ve found useful.

Note: Create (Event ID: 12) has a deceptive name. It will also capture files opened using the NtCreateFile function.

Microsoft-Windows-Kernel-Network

Ever wanted to packet capture, but didn’t want to install software? This provider gives you kernel-level logging for all network connections. The keywords aren’t the best, but if you filter on Event ID, you’re able to capture raw packets. Example Events:

  • Data Sent

  • Data Received

  • Connection

    • Accept

    • Received

    • Disconnected

Microsoft-Windows-SMBClient/SMBServer

We’ve already mentioned the SMB providers. But just to reiterate, these are fantastic for capturing all SMB and remote Named Pipe activity. SMB logs can be a bit notorious for not capturing all SMB activity, but you won’t have this problem here.

These providers will also include packet captures, great for capturing Named Pipe exploitation.

Microsoft-Windows-DotNETRuntime

Webshells are everywhere at the moment, and they’re getting more sophisticated. If an actor has done any research, they’ve likely realised that they don’t even need to write a webshell to the webroot anymore. Instead an actor can load a .NET module into the GAC to act as a webshell.

The DotNETRuntime provider has a bunch of different events to capture module loads, depending on the API call that they’re using. This also isn’t limited to IIS; it can include any .NET assemblies, such as SharpHound.

.NET Module Loads include:

  • Event ID 141/152: Module Load

  • Event ID 142/153: Module Unload

  • Event ID 154/155: Assembly Load / Unload

  • Event ID 156/157: AppDomain Load / Unload

QuickFire

Because I can’t be bothered going through them, I’m just going to list off a few more that you may want to check out.

  • OpenSSH

  • Microsoft-Windows-VPN-Client

  • Microsoft-Windows-PowerShell

  • Microsoft-Windows-Kernel-Registry

  • Microsoft-Windows-CodeIntegrity

  • Microsoft-Antimalware-Service

  • WinRM

  • Microsoft-Windows-TerminalServices-LocalSessionManager

  • Microsoft-Windows-Security-Mitigations

  • Microsoft-Windows-DNS-Client

  • Microsoft-Antimalware-Protection

ELAM Drivers and Protected Processes

Some of the most important providers for DFIR are restricted. Specifically, these providers are restricted to protected processes with the Anti-Malware Service privileges. This privilege can only be granted via an Early Launch Anti-Malware (ELAM) driver, which is a Microsoft-approved driver that has permission to execute protected processes. These processes must have their signature details compiled into the ELAM driver.

An example of a restricted provider is:

Microsoft-Windows-Threat-Intelligence

Probably the best example of a restricted provider, the Threat Intelligence provider gives useful API monitoring of core functions that get misused by malware and exploitation.

If you’ve had any vulnerability research experience, you’d likely recognise all of these functions.


Post Processing

We’ve mentioned a few times in this blog that ETW is a flood of events. No matter how many analysts you have, you’re not going to be able to effectively monitor these events manually. Aside from your standard SIEM queries, I’ve had success with these two techniques:

Yara

Many ETW providers are able to capture raw packets, such as SMB and Kernel Network. Because of this, YARA rules can be extremely effective at parsing the packets for known bad material. Even generic terms such as NOP sleds can be extremely useful.
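To give a feel for what a “generic” check looks like, here’s a plain C# stand-in for a NOP sled rule. This is not YARA itself, just a byte scan that looks for a long run of 0x90 in a captured payload. The class name and the default run length of 16 are assumptions you’d tune for your own data.

```csharp
using System;

static class NopSledScanner
{
    // Returns the offset of the first run of at least minLength 0x90 bytes,
    // or -1 if no such run exists.
    public static int FindNopSled(byte[] data, int minLength = 16)
    {
        int run = 0;
        for (int i = 0; i < data.Length; i++)
        {
            run = data[i] == 0x90 ? run + 1 : 0;
            if (run >= minLength)
            {
                return i - minLength + 1;
            }
        }
        return -1;
    }

    static void Main()
    {
        // Four bytes of padding followed by sixteen NOPs.
        byte[] payload = new byte[20];
        for (int i = 4; i < payload.Length; i++)
        {
            payload[i] = 0x90;
        }

        // Prints "NOP sled at offset 4"
        Console.WriteLine("NOP sled at offset {0}", FindNopSled(payload));
    }
}
```

In practice you’d run a check like this (or a real YARA rule) against the packet bytes pulled out of the event payload.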

Machine Learning

If you work in a SOC analyst role with a large dataset, ETW can give you a brilliant set of data for training anomaly detection algorithms. These don’t have to be super fancy, dense neural networks. Instead, they can be targeted models based around the MITRE ATT&CK framework.

An example technique where this would be useful is Search Order Hijacking. This technique can be notoriously hard for an analyst to detect manually, but if you build a module load anomaly detection feed, you’ll have sure-fire wins.
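You don’t need a trained model to get started. The sketch below is a deliberately simple baseline rather than machine learning proper: it learns which directories each module name is normally loaded from, then flags a known module name that turns up in a new directory. The class and method names are hypothetical, and the paths are made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Baseline of module file name -> directories it has been loaded from.
sealed class ModuleLoadBaseline
{
    private readonly Dictionary<string, HashSet<string>> _knownDirs =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

    // Split on either separator so Windows paths parse the same on any platform.
    private static void Split(string imagePath, out string dir, out string file)
    {
        int i = imagePath.LastIndexOfAny(new[] { '\\', '/' });
        dir = i >= 0 ? imagePath.Substring(0, i) : string.Empty;
        file = imagePath.Substring(i + 1);
    }

    // Record a load observed during a known-good period.
    public void Learn(string imagePath)
    {
        Split(imagePath, out string dir, out string file);
        if (!_knownDirs.TryGetValue(file, out HashSet<string> dirs))
        {
            dirs = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
            _knownDirs[file] = dirs;
        }
        dirs.Add(dir);
    }

    // True when the module name is known but this directory is not.
    public bool IsAnomalous(string imagePath)
    {
        Split(imagePath, out string dir, out string file);
        return _knownDirs.TryGetValue(file, out HashSet<string> dirs) && !dirs.Contains(dir);
    }
}

static class ModuleBaselineDemo
{
    static void Main()
    {
        var baseline = new ModuleLoadBaseline();
        baseline.Learn(@"C:\Windows\System32\version.dll");

        // Prints "False" then "True"
        Console.WriteLine(baseline.IsAnomalous(@"C:\Windows\System32\version.dll"));
        Console.WriteLine(baseline.IsAnomalous(@"C:\Users\Public\app\version.dll"));
    }
}
```

Feeding it the image paths from ImageLoad events gives you a crude first-pass feed; a real deployment would need a per-host or per-process baseline to keep the noise down.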


Wrap-Up

So, the age old question: Is ETW useful?

Yes and No???

ETW greatly depends on your position:

  • If you’re in an Incident Response firm, you quite often arrive after the actor has left. In that case, creating a new Trace Session will not be useful, because it doesn’t capture historic data. You’re going to have to rely on existing Trace Sessions.

  • If you’re in a SOC environment, ETW could greatly help fill in gaps for your feed. But due to the scale of events, this will likely depend on whether you have the resources to handle it.

Either way, ETW is an extremely useful resource and you should have it in the back of your mind.

Hopefully this blog was informative. If you have any questions, you can find my twitter at the top of the page!
