Trace Tracking Plugin

Starting from version 2.2.0, Nacos allows trace tracking plugins to be injected through the SPI mechanism. A plugin can subscribe to trace events and process them as needed, for example by logging or storing them. This document describes how to implement a trace tracking plugin and enable it.

Note: The trace tracking plugin is currently in Beta; its APIs and interface definitions may change significantly in future releases. Make sure your plugin is compatible with the Nacos version it targets.

Unlike conventional distributed tracing, Nacos’s trace tracking focuses on monitoring and recording operations related to Nacos, such as service registration, de-registration, pushes, and status changes, not inter-service communication paths. For monitoring service interactions, use dedicated distributed tracing solutions.

Concepts in Trace Tracking Plugins

Trace Event

Nacos embeds tracing points at critical operational stages, defining a series of trace events (TraceEvent). Linking multiple events targeting the same resource (e.g., services, configurations) forms a trace for that resource.

A TraceEvent includes:

| Field Name | Description |
| --- | --- |
| type | The type of event, defined by each specific event |
| eventTime | Time at which the event occurred |
| namespaceId | Namespace ID of the event's corresponding resource |
| group | Group name of the event's corresponding resource |
| name | Resource name, such as a service name or configuration dataId |
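
As an illustration of how events targeting the same resource link into a trace, the sketch below groups events by namespaceId, group, and name, then orders each group by eventTime. `SketchTraceEvent`, `TraceGrouping`, and the `@@` key separator are hypothetical names invented for this example; they are not Nacos classes, and the sample values are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in mirroring the five fields in the table above.
// It is NOT the real Nacos TraceEvent class.
class SketchTraceEvent {
    final String type;
    final long eventTime;
    final String namespaceId;
    final String group;
    final String name;

    SketchTraceEvent(String type, long eventTime, String namespaceId, String group, String name) {
        this.type = type;
        this.eventTime = eventTime;
        this.namespaceId = namespaceId;
        this.group = group;
        this.name = name;
    }

    // Events sharing this key target the same resource. The "@@" separator
    // is an arbitrary choice for this sketch.
    String resourceKey() {
        return namespaceId + "@@" + group + "@@" + name;
    }
}

class TraceGrouping {
    // Groups events by resource, then orders each group by eventTime.
    static Map<String, List<SketchTraceEvent>> toTraces(List<SketchTraceEvent> events) {
        Map<String, List<SketchTraceEvent>> traces = new LinkedHashMap<>();
        for (SketchTraceEvent event : events) {
            traces.computeIfAbsent(event.resourceKey(), key -> new ArrayList<>()).add(event);
        }
        for (List<SketchTraceEvent> trace : traces.values()) {
            trace.sort(Comparator.comparingLong(item -> item.eventTime));
        }
        return traces;
    }

    public static void main(String[] args) {
        List<SketchTraceEvent> events = Arrays.asList(
                new SketchTraceEvent("DEREGISTER_INSTANCE_TRACE_EVENT", 200L, "public", "DEFAULT_GROUP", "orders"),
                new SketchTraceEvent("REGISTER_INSTANCE_TRACE_EVENT", 100L, "public", "DEFAULT_GROUP", "orders"),
                new SketchTraceEvent("REGISTER_SERVICE_TRACE_EVENT", 150L, "public", "DEFAULT_GROUP", "billing"));
        for (Map.Entry<String, List<SketchTraceEvent>> entry : toTraces(events).entrySet()) {
            System.out.println(entry.getKey());
            for (SketchTraceEvent event : entry.getValue()) {
                System.out.println("  " + event.eventTime + " " + event.type);
            }
        }
    }
}
```

A plugin that stores events could apply the same keying when querying its storage to reconstruct the history of a single service.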

Nacos predefines the following sub-event types:

| Event Name | Description | Details |
| --- | --- | --- |
| RegisterInstanceTraceEvent | Service instance registration event, triggered when registering a service provider | See appendix |
| DeregisterInstanceTraceEvent | Service instance deregistration event, triggered upon service provider deregistration | See appendix |
| RegisterServiceTraceEvent | Service registration event, distinct from instance registration, occurs during service creation | See appendix |
| DeregisterServiceTraceEvent | Service deregistration event, different from instance deregistration, happens when deleting a service | See appendix |
| SubscribeServiceTraceEvent | Service subscription event, triggered when subscribing to a service | See appendix |
| UnsubscribeServiceTraceEvent | Service unsubscription event, triggered upon unsubscribing from a service | See appendix |
| PushServiceTraceEvent | Service push event, occurs during service push | See appendix |
| HealthStateChangeTraceEvent | Service instance health state change event, triggered when instance health changes due to heartbeats or health checks | See appendix |

Plugin Development

To develop a Nacos server-side trace tracking plugin, first add a dependency on the trace tracking plugin API:

<dependency>
    <groupId>com.alibaba.nacos</groupId>
    <artifactId>nacos-trace-plugin</artifactId>
    <version>${project.version}</version>
</dependency>

Replace ${project.version} with the Nacos version for which you’re developing the plugin.

Next, implement the com.alibaba.nacos.plugin.trace.spi.NacosTraceSubscriber interface and register your implementation as an SPI service.

The required methods include:

| Method Name | Input | Return | Description |
| --- | --- | --- | --- |
| getName | void | String | The plugin's name; if names conflict, the plugin loaded later overwrites the earlier one. |
| subscribeTypes | void | List<Class<? extends TraceEvent>> | The event types this plugin subscribes to; return an empty list to subscribe to nothing. |
| onEvent | TraceEvent | void | The event handling logic; only events of the types returned by subscribeTypes are passed in. |
| executor | void | Executor | If not null, onEvent is called on this Executor; otherwise, it is called on the event distribution thread. |

Note: It is advised to use a dedicated Executor in plugin implementations to prevent blocking I/O operations in one plugin from delaying other events’ processing.
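
The following is a minimal sketch of a subscriber that follows the method table above: it subscribes to instance registration and deregistration events, records one line per event, and returns a dedicated single-thread Executor. So that it compiles on its own, the block declares simplified stand-ins for `TraceEvent`, its two sub-events, and `NacosTraceSubscriber`; in a real plugin these come from the Nacos dependencies, and their constructors and getters may differ from what is assumed here. `LoggingTraceSubscriber` and its `handled` list are hypothetical names. Registration as an SPI service is assumed to follow the standard Java convention: a file named `META-INF/services/com.alibaba.nacos.plugin.trace.spi.NacosTraceSubscriber` containing the fully qualified name of the implementing class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

// ---- Stand-ins so this sketch compiles on its own. In a real plugin these
// ---- types come from the Nacos dependencies and may look different.
class TraceEvent {
    private final String type;
    private final String name;

    TraceEvent(String type, String name) {
        this.type = type;
        this.name = name;
    }

    String getType() { return type; }
    String getName() { return name; }
}

class RegisterInstanceTraceEvent extends TraceEvent {
    RegisterInstanceTraceEvent(String name) { super("REGISTER_INSTANCE_TRACE_EVENT", name); }
}

class DeregisterInstanceTraceEvent extends TraceEvent {
    DeregisterInstanceTraceEvent(String name) { super("DEREGISTER_INSTANCE_TRACE_EVENT", name); }
}

interface NacosTraceSubscriber {
    String getName();
    List<Class<? extends TraceEvent>> subscribeTypes();
    void onEvent(TraceEvent event);
    Executor executor();
}
// ---- End of stand-ins.

class LoggingTraceSubscriber implements NacosTraceSubscriber {
    // Dedicated single daemon thread, so slow handling in this plugin does
    // not occupy the shared event distribution thread.
    private final Executor executor = Executors.newSingleThreadExecutor(task -> {
        Thread thread = new Thread(task, "logging-trace-subscriber");
        thread.setDaemon(true);
        return thread;
    });

    // Kept in memory only so the sketch is easy to inspect.
    final List<String> handled = new CopyOnWriteArrayList<>();

    @Override
    public String getName() { return "logging-trace-subscriber"; }

    @Override
    public List<Class<? extends TraceEvent>> subscribeTypes() {
        List<Class<? extends TraceEvent>> types = new ArrayList<>();
        types.add(RegisterInstanceTraceEvent.class);
        types.add(DeregisterInstanceTraceEvent.class);
        return types;
    }

    @Override
    public void onEvent(TraceEvent event) {
        String line = event.getType() + " " + event.getName();
        handled.add(line);
        System.out.println(line);
    }

    @Override
    public Executor executor() { return executor; }
}

class LoggingTraceSubscriberDemo {
    public static void main(String[] args) {
        LoggingTraceSubscriber subscriber = new LoggingTraceSubscriber();
        // The server would normally dispatch events; onEvent is called
        // directly here only to exercise the handler.
        subscriber.onEvent(new RegisterInstanceTraceEvent("orders"));
        subscriber.onEvent(new DeregisterInstanceTraceEvent("orders"));
    }
}
```

A production plugin would replace the in-memory list with a logger or storage client, and would usually size the executor according to how slow that backend can be.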

A demo trace tracking plugin implementation is available in the nacos-group/nacos-plugin repository, subscribing to instance registration and deregistration events and logging them.

Plugin Degradation Strategy

As a monitoring enhancement, trace tracking plugins do not impact Nacos data. Thus, when issues arise, the primary workflow should remain unaffected.

For this reason, a dedicated Executor is recommended. If a plugin performs blocking I/O on the shared event distribution thread, a slow or failing I/O call can stall the onEvent calls for other events and cause a backlog.

In case of a backlog, the trace tracking plugin’s event queue automatically discards subsequent events once its capacity is reached, ensuring overall system stability. Log entries indicating dropped events will appear in nacos.log.
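
The discard-when-full behavior described above can be illustrated with a bounded queue whose publish operation never blocks. This is a sketch of the general technique only, not Nacos's actual queue implementation; `DroppingEventQueue` and its methods are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of a bounded event queue that drops new events
// once capacity is reached instead of blocking the publisher.
class DroppingEventQueue<E> {
    private final BlockingQueue<E> queue;
    private final AtomicLong dropped = new AtomicLong();

    DroppingEventQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking publish: returns false and counts a drop when full.
    boolean publish(E event) {
        boolean accepted = queue.offer(event);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    // Returns the next queued event, or null if the queue is empty.
    E poll() {
        return queue.poll();
    }

    long droppedCount() {
        return dropped.get();
    }
}

class DroppingEventQueueDemo {
    public static void main(String[] args) {
        DroppingEventQueue<String> events = new DroppingEventQueue<>(2);
        for (int i = 1; i <= 3; i++) {
            boolean accepted = events.publish("event-" + i);
            System.out.println("event-" + i + " accepted=" + accepted);
        }
        System.out.println("dropped=" + events.droppedCount());
    }
}
```

The publisher pays only the cost of a failed `offer`, which is why a stalled consumer cannot slow down the main workflow in this design.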

Appendix: Sub-trace Event Details

RegisterInstanceTraceEvent

Since 2.2.0.

type: REGISTER_INSTANCE_TRACE_EVENT

Extra Content:

| Field Name | Description |
| --- | --- |
| clientIp | Source IP of the instance registration request; may be null. |
| rpc | Whether the source request is gRPC: true for gRPC, false for HTTP. |
| instanceIp | IP or host of the registered service instance |
| instancePort | Port of the registered service instance |

DeregisterInstanceTraceEvent

Since 2.2.0.

type: DEREGISTER_INSTANCE_TRACE_EVENT

Extra Content:

| Field Name | Description |
| --- | --- |
| clientIp | Source IP of the instance deregistration request; may be null. |
| reason | Reason for the deregistration; see DeregisterInstanceReason for details. |
| rpc | Whether the source request is gRPC: true for gRPC, false for HTTP. |
| instanceIp | IP or host of the deregistered service instance |
| instancePort | Port of the deregistered service instance |

DeregisterInstanceReason

| Reason | Description |
| --- | --- |
| REQUEST | Deregistration requested by the client, in other words, initiated by the user. |
| NATIVE_DISCONNECTED | Deregistration caused by the client disconnecting from this server node. |
| SYNCED_DISCONNECTED | Deregistration caused by the client disconnecting from another server node and synced from that node. |
| HEARTBEAT_EXPIRE | Deregistration caused by a heartbeat timeout for a 1.X version client. |

RegisterServiceTraceEvent

Since 2.2.0.

type: REGISTER_SERVICE_TRACE_EVENT

Extra Content: None

DeregisterServiceTraceEvent

Since 2.2.0.

type: DEREGISTER_SERVICE_TRACE_EVENT

Extra Content: None

SubscribeServiceTraceEvent

Since 2.2.0.

type: SUBSCRIBE_SERVICE_TRACE_EVENT

Extra Content:

| Field Name | Description |
| --- | --- |
| clientIp | The IP of the subscriber |

UnsubscribeServiceTraceEvent

Since 2.2.0.

type: UNSUBSCRIBE_SERVICE_TRACE_EVENT

Extra Content:

| Field Name | Description |
| --- | --- |
| clientIp | The IP of the subscriber |

PushServiceTraceEvent

Since 2.2.0.

type: PUSH_SERVICE_TRACE_EVENT

Extra Content:

| Field Name | Description |
| --- | --- |
| clientIp | The IP of the subscriber |
| instanceSize | The number of service instances included in this push |
| pushCostTimeForAll | The full cost of this push, measured from the start of pushing to the end of pushing, including the wait time in the combined queue and the execution time. |
| pushCostTimeForNetWork | The network cost of this push, measured from execution to the end of pushing; it covers only the network cost. |
| serviceLevelAgreementTime | The actual cost of this push, measured from the service change to the end of pushing. It is a reference value, not an exact measurement. |
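
Read together, the three cost fields allow a rough breakdown of where push time is spent. The sketch below derives two values from the definitions above: the time spent before execution (full cost minus network cost) and the delay between the service change and the start of pushing (serviceLevelAgreementTime minus full cost). `PushTimingBreakdown` is a hypothetical helper, milliseconds are an assumed unit that the table does not state, and the sample numbers are invented. Because serviceLevelAgreementTime is only a reference value, negative differences are clamped to zero.

```java
// Hypothetical helper; parameter names follow the table above.
// The unit is assumed to be milliseconds, which the table does not state.
class PushTimingBreakdown {
    // Time between the start of pushing and execution (queue wait).
    final long queueWait;
    // Time between the service change and the start of pushing.
    final long prePushDelay;

    PushTimingBreakdown(long pushCostTimeForAll, long pushCostTimeForNetWork,
                        long serviceLevelAgreementTime) {
        // The inputs are approximate, so clamp negative differences to zero.
        this.queueWait = Math.max(0L, pushCostTimeForAll - pushCostTimeForNetWork);
        this.prePushDelay = Math.max(0L, serviceLevelAgreementTime - pushCostTimeForAll);
    }

    public static void main(String[] args) {
        // Invented sample values for illustration only.
        PushTimingBreakdown breakdown = new PushTimingBreakdown(120L, 20L, 150L);
        System.out.println("queueWait=" + breakdown.queueWait);
        System.out.println("prePushDelay=" + breakdown.prePushDelay);
    }
}
```

A large queue wait relative to the network cost suggests pushes are spending most of their time waiting to be merged or executed rather than on the wire.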

HealthStateChangeTraceEvent

Since 2.2.0.

type: HEALTH_STATE_CHANGE_TRACE_EVENT

Extra Content:

| Field Name | Description |
| --- | --- |
| instanceIp | IP or host of the service instance whose health state changed |
| instancePort | Port of the service instance whose health state changed |
| isHealthy | Whether the instance is healthy after the change |
| healthCheckType | The type of health check |
| healthStateChangeReason | The reason for the health state change |