Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C#

Application Analysis with Event Tracing for Windows (ETW)

4.90/5 (14 votes)
6 Aug 2013Ms-PL11 min read 101.4K   3.5K  
ETW is a super fast way to collect runtime data from your app. We will look how we can insert intelligent data points, collect data, and analyze data logs. This permits supervision, anomaly detection, and system optimization.

Summary table of etw events

Introduction

This is the first article of two about ETW events. The first article is about how to use them, the second looks at how an  EtwDataViewer can display the events in a hierarchal tree and analyze them to reveal context and support searchability.

When we have a problem with an application, we always wish we had more logs, or even logs at all. Writing a lot of log data to files using printfs or some other technology, slows performance and fills the disk. As a consequence, the debug build often contain more log capability than the release build, where it has been removed by hand or by the compiler in order to increase performance. However, when a problem arises in a production environment and you need to troubleshoot, you just wish you still had those logs. Wouldn't it be great to be able to turn on logs when needed on a production system with close to zero impact? One drawback with debug builds is that some developers let an alternative implementation run in the debug build. I can understand the reason for doing that, run-time checks and safer handling of memory, but in a worst case scenario. A problem is not reproducible in a debug build.

Debuggers and profilers are both indispensable tools for troubleshooting, optimizing and improving software. But when handling Big Data, the same tools are of limited use. Let's say that batch processing of a 10000 invoices performs really bad. If I single step through 10 or 20 invoices, I might not detect any obvious problems. If we use a database, the cpu will hardly be saturated which make it harder to detect bottlenecks. The problem is usually on a higher-level. Are we doing the right things? Are we doing redundant queries to the database? Are the queries that we do, really cached internally as efficient as we think? Without high-level data points constructed from within the code itself, it might be very hard to answer those questions.

Event Tracing for Windows (ETW) can be used for inserting permanent, close to zero impact data points. These data points can be activated and deactivated in production environments, and later analyzed on a completly different machine. We will see how we can insert these data points and produce a nice report.

Background

Logging and Diagnostics can be done with various different tools. Process Monitor, from sysinternals collects system wide events. Perfect for finding missing registry keys, locked files, and interfering processes. Process Monitor uses both hooks and etw to collect information.

Then we have xprof.exe and Windows Performance Recorder (wpr.exe and wprgui.exe), from the Windows Performance Toolkit, which is also found in the Windows SDK 8.0. The xprof.exe uses only ETW, but with much more level of detail. You can compare a graph of disk accesses to a graph of thread scheduling, or see how your software affects paging. Really powerful, and best of all it is all free.

I use both tools, but for showing application specific event, I have relied on debugview.exe (also from sysinternals). Unfortunately, logs prints, don't usually reveal context, and normal prints tend to be harder to analyze, since every developer tend to write logs their own way.

A standardised way would be to write logs in the ETW format.

Scenario

Let's go back to the order system example. In an Order system, one often talk about Orders and Order lines. Typically 1 order, can contain one to a thousand different products, each product type is put in a separate Order line. When the order is taken, the products are reserved in the warehouse, and when it is packed and shipped the stock is decreased. Additionally, one invoice may contain many orders, and sometimes, there is need to process a thousands invoices, for example at the end of the month. The amount of data that needs to be processed is enormous, and there are many sensitive steps that needs data locking and refetching of data. The more data we have the less feasible it becomes to use debugging to step through the execution of the program. Profiling also have its weakness. When the amount of orderlines in an order are non uniform, I cannot easily use the statistics the profiler gives. I can of course optimize some low-level calls to ToString or some other method, which initially gives a good performance increase, but in a mature software, much of that optimizations has already been done. One has to optimize on a higher level. Question ourselves, if we are really doing the right thing, rather than doing what we do as fast as we can.

What we need are quality data points.

  • Amount of orderlines
  • How many sql queries
  • What queries that are executed
  • Efficiency of the sql data cache (own implementation)
  • The reason why the sql data cache is flushed
  • Number of locks
  • Number of nested transactions
  • What we are locking on
  • Time measures
  • Memory consumption
  • Amount of time we create certain objects

It is possible to use an Sql Profiler to get what queries that are executed, but it is harder to correlate it to what is actually taking place in the software at the same time. In the best of cases you have done a tool yourselves that is able to merge these logs with your own logs.

Adding the ETW data points

Adding ETW data points has a steep learning curve. There are few complete instructions and tutorials in the subject. I had to serach for resources on the internet, do some trial and error, and figure some things out myself. My aim is to show you what can be done with ETW, to plant a seed of interest. Eventually, I hope I will also write a tutorial on this.

The producer of ETW events is called a Provider. The consumer is called Controller. You have two option of writing a provider. Either implementing a classic provider which also works for older operating systems like Win XP, or a manifest based provider which supports up to 8 simultaneous sessions. I went for the manifest based, which is the recommended one.

The code

You will need Visual Studio 2012 for opening the solution. I use the Express version myself. In addition you need the Windows SDK 8.0, and the Windows Performance Toolkit which is an optional package of the SDK.

I didn't find the sample code for writing and ETW provider easy to understand at all, or at least it wasn't obvious how I could write my own. What I didn't know is that if you use manifests you don't have to write any code at all. The code can be generated from the manifest itself, and the manifest is done rather quickly in a graphical tool called ecmangen.exe

Image showing ecmangen.exe

The tool has a short but very helpful "help" documentation. I followed the steps for the first example, and after that I was able to extend it and adapt the manifest to my own needs.

When you are done with the manifest, you should run it through the message compiler (mc.exe), and then generate .cs files from the manifest also using the same tool. What you end up with is a some static classes that works out of the box. No initialization is needed.

mc FirstETW.man
mc -css MyProvider.Eventing FirstETW.man

The -css switch makes it generate .cs files which you add to your project. Below is skeleton of the generated class. I removed the body of the functions, but what you see is that the class is static and ready for use.

C#
namespace MyProvider.Eventing
{
    public static class FunctionTraceProvider
    {
        public static bool EventWriteFunctionEntry() { }
        public static bool EventWriteFuntionExit(string FunctionName) { }
        public static bool EventWriteCreateDbConnection() { }
        public static bool EventWriteSqlQuery(string query, int rowcount) { }
        public static bool EventWriteNetException(string message) { }
    }
}

Putting it into a context. I can for example log the queries that my software does.

C#
public List<OrderSummary> GetOrderSummary()
{
  const string sql = "select [OrderId], [CustomerId], [OrderlineId], [ProductId], [ProductDescription] from [OrderSummary];";
  var ds = new DataSet();
  var da = new Sql.SQLiteDataAdapter(sql, m_connection);
  da.Fill(ds);
  var list = new List<OrderSummary>();
  // Code for filling the list removed 
  
  // Logging query and the number of rows returned
  MyProvider.Eventing.FunctionTraceProvider.EventWriteSqlQuery(sql, list.Count);
  
  return list;
}

In the example above I log the Sql query together with the number of rows returned. For efficiency reasons, consider using sql string constants mapped to query numbers instead, and just log the query number.

I had to run also the resource compiler, and convert the FirstETW.rc to FirstETW.res.

rc FirstETW.rc

Then I added it as resource file under properties for the project. I haven't figured out how I can use the .rc file directly.

Image showing how to add .res file

Running the demo project

A manifest based provider has to be registered in each computer where you want to collect the data, but before you do this you must update 2 file paths the manifest so that it points to the the correct file.

Image showing ecmangen.exe

After you have saved the manifest you register the provider with the wevtutil.exe tool.

wevtutil.exe im FirstETW.man  //installs the provider
wevtutil.exe um FirstETW.man  //uninstalls the provider

Putting it all together

One of the ways ETW events can be recorded is through xperf.exe. This starts logging of your provider.

xperf.exe -start <SomeName> -on <NameOfYourRegisteredProvider>
xperf.exe -start FirstETW -on Function-Entry-Exit-Provider

You can interact with the WCF app though the tool WcfTestClient.exe, which is part of visual studio.

Register the address http://localhost:8080/ordersystem and invoke methods from the tool.

Image showing wcftestclient.exe

The logs can be saved with the following command

xperf.exe -stop <SomeName> -d <FilePath>
xperf.exe -stop FirstETW -d "c:/temp/myevents.etl"

After that you can view the logs in xperfview.exe

xperfview.exe myevents.etl

The result can be seen below

Image showing xperfview.exe

The above graph just shows the log event during a period of time from our provider. If we select a start and end period and right click on any of the Summary Tables, we will get a detailed list of all the events and the data it carries.

Image showing xperfview.exe

What have we gained from using ETW??

This far. There isn't much difference. All the events are just presented in a long list. Just as we would have with normal traces or printfs. They are thread safe and they are probably a lot faster than prints. Apart from that, the advantage, is minimal.

Leveraging ETW

Let's turn on more ETW providers. The whole kernel is full of them. You can log disk accesses, network, context switches, thread scheduling, amount of paging, literally the tinyiest move the system does. ETW is so efficient that is has almost zero impact on your system.

Use the -on DiagEasy switch to turn on the most commonly used providers.

xperf.exe -on DiagEasy -start FirstEtw -on Function-Entry-Exit-Provider
// Do your stuff with your app
// Then stop the logging
xperf.exe -stop FirstETW -d "c:/temp/myevents1.etl"  // stop user defined provider
xperf.exe -stop -d "c:/temp/myevents2.etl"           // stop kernel providers

xperfview.exe myevents1.etl myevents2.etl            // Merge and show everything

Image showing xperfview.exe

If you zoom in to graphs, you are able to see how a specific event affect cpu, disk, and network activity.

This is amazing!!!

Reporting

We can turn the logged events into a real summary report. Why not do some data analysis in Excel with pivot tables? First we need to export the events to a csv file. Right-click anywhere on the page, and select "Export full table". Then we open the file with Excel and create a Pivot table

Please click on the image to view it in full size

A pivot in Excel

In the pivot I can get a count over all logged event containing specific data. I can see that a special sql query was executed 6 times, we had 1 .Net Exception, and created a database connection 7 times. We can also see what type of queries that are executed the most. A Pivot, can show patterns and give structure to a report. Rotating the Pivot, can give a report with a totally different view.

The next step

The full etw trace with additional providers and working with Pivots may prove useful.

But let's go back to my original order system software. To really be able to pinpoint problem areas. I need to build up an event hierarchy. To do this, one must plan ahead to make this possible. Since this article was written I have written a follow-up article on a EtwDataViewer, which looks how we can create a log viewer that supports an event hierarchy.

In my sample wcf app. I have created a wcf service with a published interface. I log when a published method is entered and exited. Additionally I log some other data points. Having the Enter and Exit, I can assume that the events happening between those two points belong to the same task. This cannot be guaranteed, but maybe you can temporarily turn off multi processing when logging, or tag events also with a task id. This permit to create a context, in its extension, to calculate the impact of isolated task. If you are really ambitious, you can take into consideration the csv logs from the kernel etw data that we obtained from using the -on DiagEasy switch.

Alternative Approaches

The reader Andre Ziegler was kind to inform about the new .Net 4.5 class, System.Diagnostics.Tracing.EventSource[^] , which simplifies ETW writing and doesn't need a manifest. However, since the ETW provider isn't registered system-wide, xperf can no longer be used, instead PerfView[^] is recommended. Here follows a small tutorial Introduction Tutorial: Logging ETW events in C#: System.Diagnostics.Tracing.EventSource[^]. In my opinion, the .Net class seems easier to use, but the manifest approach is more complete. In addition, if you are using a mixed mode application, the manifest approach has an advantage that you can generate code for both C++ and C#.

History

1st of April 2013 - Initial release

3rd of April 2013 - Added Alternative Approach with Net class.

6th of August 2013 - Added link to follow-up article

License

This article, along with any associated source code and files, is licensed under The Microsoft Public License (Ms-PL)