What's good and what's bad about profiling nowadays
There are plenty of developers who already understand the necessity of performance tuning for Java applications. They know about profilers, and most likely use them in their everyday work. These people reasonably ask: why should we use this or that tool, why is one better than another, and finally, which one is the best?
1. Approach to memory profiling
Let's start from the beginning. The first critical issue is memory profiling. Developers of considerably big projects often face memory-related performance problems: the program starts "eating" memory, and/or does not reclaim it when it should. Surely, there are profilers to handle this. But the lion's share of them have one common drawback: it takes a very long time to start a program under the profiler, and then you have to wait even longer while it reaches the memory leak state. What makes it even worse, the leak may happen only under rare, obscure, or even unknown conditions, and often no one has any idea how to reproduce the problem.

It would be great to be able to analyze the heap of any application once a memory problem is found. Until recently this was impossible, because one could not always run an application under a profiler: that would have led to unacceptable performance degradation, at least with the profilers that existed before. Then I thought: isn't there a way to take all the necessary information from a running Java application exactly when it is needed, without making the application crawl during its entire run time? After a series of experiments, my colleague and I managed to find a way out.
The key is not to rely on time-consuming processing of events such as object creation or destruction by the garbage collector, i.e. not to record object allocations. To find and fix a memory leak, it is not that interesting to know who created an object, but rather who holds it, so this approach works perfectly. Allocation recording can instead be provided as an optional ability, to be turned on and off when needed, for those who need (or think they need) it. Frankly speaking, I personally do not find it useful to know where a particular live object was created. The useful aspect of allocation recording is finding the code that produces a lot of "garbage", i.e. temporary objects; but modern garbage collectors work fast, and temporary objects are not that big a problem. Still, this optimization matters, because it always takes less time for a garbage collector not to do a job at all than to do it, even if it is done fast.
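The difference between "who created" and "who holds" can be sketched with a small, hypothetical Java example (the class and field names are illustrative, not taken from any real project):

```java
import java.util.ArrayList;
import java.util.List;

// A typical retention leak: sessions are added to a registry and
// never removed. Knowing WHERE each Object was allocated is of
// little help; what fixes the leak is finding out WHO still holds
// the objects after they are logically dead.
public class RetentionLeak {
    static final List<Object> registry = new ArrayList<>();

    static Object openSession() {
        Object session = new Object();
        registry.add(session); // the holder: a static registry
        return session;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            openSession(); // nothing ever calls registry.remove(...)
        }
        // A heap snapshot would show 1000 objects reachable solely
        // through RetentionLeak.registry - the path to the holder,
        // not the allocation site, is what points at the fix.
        System.out.println(registry.size());
    }
}
```

A heap snapshot taken at any moment reveals the retention path through `registry`, which is exactly the information needed, and it requires no allocation recording while the application runs.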
The deep analysis of the memory profiling problem yielded an effective result: we have developed a tool (namely, YourKit Java Profiler) with which an application can run with NO overhead, yet is ready to tell all about its memory state exactly WHEN it is needed. Currently YourKit is the only profiler that addresses the issue of memory profiling adequately and in a user-friendly way.
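The underlying idea, that heap contents can simply be taken from a live JVM on demand rather than recorded continuously, can be illustrated with standard JVM facilities as well. On HotSpot-based JVMs, the `HotSpotDiagnosticMXBean` can dump the heap of the running application at any moment, with no recording overhead beforehand. A minimal sketch (the file name is arbitrary; this is not how YourKit itself works, just the same principle):

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

// On-demand heap snapshot of the running JVM: nothing is recorded
// while the application runs; the data is extracted from the live
// heap only when asked for. (HotSpot-specific diagnostic API.)
public class HeapDumpOnDemand {

    static String dumpHeap() throws Exception {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        String file = "snapshot-" + System.currentTimeMillis() + ".hprof";
        bean.dumpHeap(file, true); // true = dump only live objects
        return file;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("heap written to " + dumpHeap());
    }
}
```

The resulting `.hprof` file can then be opened and analyzed offline, while the application keeps running at full speed.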
2. Approach to CPU profiling
An approach similar to the one discussed in the previous section can be applied to CPU profiling. The application starts and runs at full speed. When the need arises, the profiler is activated; it starts recording profiling data, which of course may add some overhead. This lasts until the application finishes the particular task you are interested in, at which point you capture a snapshot with all the collected information. The application goes its way, again at full speed. You go yours: open the collected information in the profiler UI and study it.
The purpose of the tool matters. A useful profiler should not be primarily targeted at showing beautiful views that jump synchronously with the application being analyzed, with no regard to how much this "live show" slows the application down. Instead, it should help find the slow parts of a running application and show, or at least hint at, how to make them faster. This idea became the basis of the YourKit profiling ideology.
This may easily be taken for radicalism, and radicalism is not always good. Thus we are currently working on adding support for some useful "telemetry" information shown on the application's timeline. That will be an addition to the main "start - wait - capture snapshot" approach.
In general, measuring is intrusive. That's a law of nature: no tool can measure without affecting the thing it measures, either in the software world or in the physical world. Capturing a memory snapshot may pause the profiled application for a couple of seconds; CPU sampling adds a small, barely perceptible, but still non-zero constant overhead for periodically inquiring into the stacks of running threads. When the CPU is profiled with tracing (which handles method entries and exits), or when the option to record object allocations is used, the intrusion is more significant.
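The sampling idea mentioned above, periodically inquiring into thread stacks, can be sketched in a few lines of plain Java. Everything here is illustrative (class names, intervals, sample counts), not the implementation of any real profiler:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a sampling profiler: periodically look at a
// thread's stack and count which method is on top. Methods that
// appear most often in the samples are the likely hot spots.
public class SamplerSketch {

    // Take 'count' samples of the target thread's stack, 'intervalMs' apart.
    static Map<String, Integer> sample(Thread target, int count, long intervalMs)
            throws InterruptedException {
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < count; i++) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum); // count top-of-stack occurrences
            }
            Thread.sleep(intervalMs); // the target runs undisturbed in between
        }
        return hits;
    }

    public static void main(String[] args) throws InterruptedException {
        // A busy worker thread to sample.
        Thread worker = new Thread(() -> {
            long x = 0;
            while (!Thread.currentThread().isInterrupted()) {
                x += System.nanoTime(); // keep the CPU busy
            }
        });
        worker.setDaemon(true);
        worker.start();

        Map<String, Integer> hits = sample(worker, 20, 10);
        worker.interrupt();
        System.out.println("top-of-stack methods seen: " + hits.keySet());
    }
}
```

Between samples the target thread runs completely undisturbed, which is why the constant overhead of sampling stays so small compared with tracing every method entry and exit.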
One approach to reducing this intrusion is to reduce the scope of the code (methods, classes) being measured. The code you measure still runs slower, but the rest of the code runs at full speed. So you can skip profiling the uninteresting parts of an application and profile only the interesting ones. This approach is widely used in different profilers.
It's all about saving your time. But... the saving is doubtful.
This approach works, but there's a problem with it. Which came first: the chicken or the egg? I am interested in profiling the slow parts of my application, the particular code that causes performance problems. But how can I know which parts of my code are "problematic" and which are not, i.e. which classes and entry points I should profile, other than by learning this from profiling results? Of course, sometimes it is known a priori, but sometimes it is not, because I cannot hold my entire project's details in my head. Even libraries, good candidates for skipping "by default", should sometimes not be skipped. For example, in my own practice there were cases when, without analyzing the behavior of core Java library classes (thanks to Sun, the sources are available), it was very hard to understand the reason for a performance problem.
As a workaround, this approach can be applied iteratively: launch profiling session, analyze results, reduce scope of profiled code, launch another session, etc. But:
- This takes human time and human attention, which are more valuable than computer time. Where's my time saving?
- What if I realize I need to change the profiled code scope, but cannot re-launch the profiling session? E.g. the problem is very hard to reproduce, or no one knows how.
In YourKit Java Profiler, we decided to use the alternative approach: in short, no filtering at runtime. And I bet it does save time:
- Profiling scope is reduced by reducing the time periods of heavily intrusive profiling:
  - In most cases, the heavily intrusive profiling modes, such as allocation recording and CPU tracing, are not used at all: allocation recording is optional for memory profiling, and sampling as a CPU profiling method gives good results in most cases with very small overhead.
  - One can turn on the heavily intrusive profiling modes only when needed, and for limited periods of time.
- The human needs to think less, and that's good. Snapshots contain all the data, and filters are only a UI option, so you can always change your mind without re-launching the measuring session. A human has the right to make a mistake and to change his mind.