Using Cloud Analytics
The Cloud Analytics tool helps you do real-time performance analysis of your Node.js™ application.
Cloud Analytics requires Node 0.4.0 or later.
Also see New metrics on no.de.
Cloud Analytics is a system to gather real-time metrics from across the cloud, including advanced metrics made available by DTrace. The metrics are only enabled when in use, such as during a performance issue investigation, and are not (currently) archived. The metrics are exposed via two means:
- An interactive browser-based performance analysis tool (also referred to as "Cloud Analytics")
- An API for enabling and retrieving performance metrics
The interactive analysis tool is found under the "ANALYTICS" tab of my.joyent.com, and is documented in the sections that follow.
The API allows Cloud Analytics metrics to be fetched and consumed by other tools. It is documented separately in the CA apidocs.
The analytics tool displays two kinds of charts: line graphs and heat maps.
Line graphs display the quantity that you're measuring at a given time. For instance, if you're measuring the number of HTTP operations by IP address, the chart shows a stacked line graph of the number of requests for each IP address per second.
Heat maps let you see three dimensions of data, much like a rainfall map that uses different colors for the amount of rain in a particular region. Cloud Analytics heat maps display colored blocks of different densities. For instance, in a heat map that displays the number of threads broken down by application and CPU run time, the x-axis shows time in one-second intervals, and the y-axis shows run time. The density of each block, or bucket, in the graph represents the number of threads that took up a particular amount of run time. You can use the filters or click on a bucket to see how many times a thread ran on a processor for that length of time during that second. The height of each bucket corresponds to a range of values along the y-axis.
In both kinds of charts, the x-axis represents time in one-second intervals. The y-axis represents one variable that you're measuring, but the scale changes as the range of values widens or narrows.
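To make the idea of a bucket concrete, here is a minimal sketch of how individual measurements could be grouped into heat map buckets. The function name `buildHeatMap`, the sample shape `{ time, value }`, and the `bucketSize` parameter are assumptions made for illustration. This is not how Cloud Analytics itself is implemented.

```javascript
// Hypothetical helper: groups samples into heat map buckets.
// Each sample is { time: <seconds>, value: <for example, latency in ms> }.
// bucketSize is the range of values that one bucket covers on the y-axis.
function buildHeatMap(samples, bucketSize) {
  const columns = {}; // second -> { bucket index -> number of samples }
  samples.forEach(function (sample) {
    const second = Math.floor(sample.time);               // x-axis position
    const bucket = Math.floor(sample.value / bucketSize); // y-axis position
    if (!columns[second]) {
      columns[second] = {};
    }
    columns[second][bucket] = (columns[second][bucket] || 0) + 1;
  });
  return columns;
}

const heatMap = buildHeatMap([
  { time: 0.2, value: 3 },
  { time: 0.7, value: 4 },
  { time: 0.9, value: 27 },
  { time: 1.1, value: 12 }
], 10);

console.log(JSON.stringify(heatMap)); // {"0":{"0":2,"2":1},"1":{"1":1}}
```

The count stored for each bucket is what a heat map would render as color density. A larger `bucketSize` produces taller buckets that each cover a wider range of values.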
|Instrumentation options||These controls let you create an instrumentation panel. You can choose from among several metrics decomposed in different ways. These metrics are described in the next section.|
|Element list||This lists the items that you're measuring. For example, if you're measuring HTTP server operations decomposed by URL, this pane contains a list of URLs. In a heat map, selecting an item from this list highlights all the buckets that contain measurements for that element. You can use the filters at the bottom of the list to isolate or exclude the selected items.|
|Rank / Linear||This control lets you choose how the density of the buckets in a heat map is rendered. Linear coloring means that each bucket is colored according to its value. A bucket whose value is 100 will be 100 times darker than a bucket whose value is 1. Rank-based coloring sorts all the values and distributes the color density among them. Rank-based coloring is particularly useful to discover outliers.|
|Granularity||This slider controls the height of each block and, consequently, the range of values for each bucket.|
|Scale||This slider lets you control the range of values of the y-axis. The range is given in the text below the control. When the sliders are all the way to the top and the bottom, the scale of the y-axis changes depending on the range of values displayed. If you use the sliders to set an upper or lower limit, the scale doesn't change.|
|Zoom||These controls let you change how wide each bucket is drawn so you can see a wider range of time. Each bucket always represents one second.|
|Play/Pause||These controls let you pause and resume the display.|
|Buckets||Clicking on a bucket shows the value distributions of that bucket.|
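The difference between the Rank and Linear settings described in the table above can be sketched in a few lines. The function names `linearDensity` and `rankDensity` are hypothetical, and the exact formulas are an assumption. The sketch only illustrates the idea of "proportional to value" versus "position in sorted order".

```javascript
// Linear coloring: density is proportional to the bucket's value.
function linearDensity(values) {
  const max = Math.max.apply(null, values);
  return values.map(function (v) {
    return max === 0 ? 0 : v / max;
  });
}

// Rank-based coloring: sort the distinct values and spread the density
// evenly across their positions in that ordering.
function rankDensity(values) {
  const distinct = values
    .filter(function (v, i) { return values.indexOf(v) === i; })
    .sort(function (a, b) { return a - b; });
  return values.map(function (v) {
    return (distinct.indexOf(v) + 1) / distinct.length;
  });
}

const bucketValues = [1, 2, 3, 100];
console.log(linearDensity(bucketValues)); // [ 0.01, 0.02, 0.03, 1 ]
console.log(rankDensity(bucketValues));   // [ 0.25, 0.5, 0.75, 1 ]
```

With linear coloring, the three small buckets are almost invisible next to the bucket whose value is 100. With rank-based coloring they remain clearly distinguishable.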
Cloud Analytics can show you different kinds of metrics:
- CPU thread executions
- HTTP server operations
- HTTP client operations
- Garbage collection operations
- Socket operations
- Logical file system operations
In most cases, the number of operations is not as interesting as decomposing the operation by latency. Latency is the amount of time between making a request and its fulfillment.
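As a rough illustration of this definition of latency, the sketch below times a callback-style operation from the moment it is started to the moment it completes. The helper name `timeOperation` and its injectable `now` clock are assumptions made so the example can run deterministically. Cloud Analytics gathers its measurements with DTrace, not with code like this.

```javascript
// Hypothetical helper: runs a callback-style operation and reports how long
// it took between the request being made and its fulfillment.
// `now` defaults to Date.now and can be replaced with a fake clock.
function timeOperation(operation, onDone, now) {
  now = now || Date.now;
  const start = now();
  operation(function () {
    const latencyMs = now() - start;
    onDone(latencyMs);
  });
}

// Fake clock that advances 5 ms every time it is read.
let tick = 0;
const fakeClock = function () {
  tick += 5;
  return tick;
};

let measured;
timeOperation(
  function (callback) { callback(); },  // stand-in for a real request
  function (ms) { measured = ms; },
  fakeClock
);
console.log(measured); // 5
```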
This metric can help you understand how your application uses the CPU. It reports the number of thread executions sorted by different criteria. A thread execution starts when the thread is put on the CPU and ends when it is taken off.
You can use the following decompositions by themselves to get a line graph of the number of thread executions or combine them with runtime to display a heat map.
|process identifier||The number of threads for a given process ID.|
|application name||The number of threads for each application running.|
|zone name||The number of threads running in each zone (SmartMachine). This information is not especially useful if you're running only one SmartMachine.|
|reason leaving cpu||The number of threads sorted by reason leaving the CPU.|
The line graphs that show the number of executions aren't particularly interesting. The heat maps of run time, which show how long the thread was on the CPU, are more revealing because they show when your application has been doing a lot of computation.
|Selecting runtime by itself, without an additional decomposition, results in a heat map of thread executions decomposed by runtime that's not particularly useful.|
If your application isn't running for very long, you can look at why it's coming off CPU. It might be blocking on a kernel lock (potentially indicating kernel lock contention), a userland lock (indicating user-level lock contention), a condition variable (indicating it's probably waiting for some other event, like disk I/O or networking), and so on. Or it may have been runnable but taken off the CPU because something else of higher priority needed to run.
This is a low-level and subtle metric, but it may help you understand whether an application is taking so long to serve requests because it's spending a lot of time on the CPU.
HTTP server performance is the primary performance metric for a web server.
You can use the following decompositions by themselves to get a line graph of the number of events, or you can combine them with latency to get a heat map.
|Method||The number of requests by HTTP method: GET, PUT, POST, DELETE|
|URL||The number of requests by URL.|
|Remote IP address||The number of requests by IP address of the client that issued the request.|
|Remote TCP port||The number of requests by TCP port used by the client that issued the request. This metric is useful for picking out multiple requests from the same client on the same TCP connection using HTTP Keep-Alive.|
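A decomposition is essentially a count of events grouped by one of their properties. The following sketch shows the idea on a handful of made-up request records. The record fields (`method`, `url`, `remoteAddress`, `remotePort`) and the `decompose` helper are hypothetical and are not part of the Cloud Analytics API.

```javascript
// Hypothetical request records. The field names are assumptions.
const requests = [
  { method: 'GET',  url: '/',      remoteAddress: '10.0.0.1', remotePort: 51000 },
  { method: 'GET',  url: '/about', remoteAddress: '10.0.0.1', remotePort: 51000 },
  { method: 'POST', url: '/login', remoteAddress: '10.0.0.2', remotePort: 40312 },
  { method: 'GET',  url: '/',      remoteAddress: '10.0.0.1', remotePort: 51002 }
];

// Counts the records that share the same key, as computed by keyFn.
function decompose(records, keyFn) {
  const counts = {};
  records.forEach(function (record) {
    const key = keyFn(record);
    counts[key] = (counts[key] || 0) + 1;
  });
  return counts;
}

const byMethod = decompose(requests, function (r) { return r.method; });
const byConnection = decompose(requests, function (r) {
  return r.remoteAddress + ':' + r.remotePort;
});

console.log(byMethod);     // { GET: 3, POST: 1 }
console.log(byConnection); // { '10.0.0.1:51000': 2, '10.0.0.2:40312': 1, '10.0.0.1:51002': 1 }
```

In this sample data, the two requests that share both address and port (`10.0.0.1:51000`) are the kind of pattern you would expect from a client reusing one connection with HTTP Keep-Alive.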
This metric reports HTTP requests and responses that originate from your application as opposed to the ones that it serves. You can use this data to help you determine whether your server's latency is due to the latency of some other web service you're using. For example, if your service stores data on Amazon's S3, you can see how long requests to the S3 web service take.
If you find that another web service is the source of your high latency, then you can investigate that other web service (if you're able to) or rethink how your application uses it, perhaps by caching results where possible.
|Method||The number of requests by HTTP method.|
|URL||The number of requests by URL.|
|HTTP server address||The number of requests by server address.|
|HTTP server port||The number of requests by server port.|
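If the HTTP client metric shows that a remote web service dominates your latency, caching its results is one option mentioned above. The following is a minimal sketch of an in-memory cache with a time-to-live. The names `createCache` and `cachedFetch`, and the `slowService` stand-in for a remote call, are hypothetical. A production cache would also need size limits and error handling suited to your application.

```javascript
// Hypothetical in-memory cache whose entries expire after ttlMs milliseconds.
function createCache(ttlMs, now) {
  now = now || Date.now;
  const entries = Object.create(null);
  return {
    get: function (key) {
      const entry = entries[key];
      if (!entry || now() - entry.storedAt >= ttlMs) {
        return undefined; // missing or expired
      }
      return entry.value;
    },
    set: function (key, value) {
      entries[key] = { value: value, storedAt: now() };
    }
  };
}

// Calls fetchFn only when the cache has no fresh value for key.
function cachedFetch(cache, key, fetchFn, callback) {
  const hit = cache.get(key);
  if (hit !== undefined) {
    return callback(null, hit);
  }
  fetchFn(key, function (err, value) {
    if (!err) {
      cache.set(key, value);
    }
    callback(err, value);
  });
}

// Stand-in for a slow remote web service.
let remoteCalls = 0;
const slowService = function (key, callback) {
  remoteCalls += 1;
  callback(null, 'data for ' + key);
};

const cache = createCache(60000);
cachedFetch(cache, 'report', slowService, function () {});
cachedFetch(cache, 'report', slowService, function () {});
console.log(remoteCalls); // 1
```

The second lookup is served from the cache, so the remote service is contacted only once within the 60-second window.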
You can use this metric to observe garbage collection in your application. In this case, "latency" refers to how long the garbage collection operation took. If you see garbage collection activity correlate with HTTP request latency spikes, you may need to rethink how memory is being used in your application to avoid creating so much garbage, or else tune the GC to avoid large spikes.
|GC type||The number of garbage collection operations by type.|
Socket operations are the basis of HTTP activity, WebSockets, and most other types of network activity. This metric lets you observe activity for non-web applications. Because it's not possible to know which messages correspond with which others, it's impossible to show latency directly at this level.
|Type||The number of socket operations by type (read, write).|
|Remote host||The number of socket operations by remote host.|
|Remote port||The number of socket operations by remote port.|
|Size||The number of socket operations by size.|
|Buffered data||The amount of data buffered within Node.js because the application has written more than the kernel can consume without blocking.|
Disk I/O can be a source of significant latency. This metric examines one of the primary causes of disk I/O by applications: accessing the filesystem.
|Process identifier||The number of file system operations by process ID.|
|Application name||The number of file system operations by application name.|
|Zone name||The number of file system operations by zone name.|
|Operation type||The number of file system operations by type (read, write, lookup, access, etc.).|
|File system type||The number of file system operations by file system type (zfs, proc, lofs, etc.).|
If you see significant latency here that corresponds with HTTP request latency, you might consider changing the way your application interacts with the filesystem, perhaps by prefetching or caching data.
The decompositions help you identify precisely who is doing what so you can optimize the right thing. Unlike the Node.js metrics, this metric examines all filesystem activity in your SmartMachine. The decompositions by process ID and application name are useful in these cases.
You can learn more about using heat maps to visualize latency from these resources: