Monitoring Manual

Nacos monitoring focuses on two kinds of information:

  • Nacos Server metrics, such as JVM, HTTP, gRPC, configuration management, and service discovery metrics.
  • Health check endpoints, such as liveness and readiness.

Expose Nacos Server metrics

Nacos exposes Prometheus metrics through Spring Boot Actuator. In the default configuration file, this capability is commented out. Enable it on every Nacos Server node:

management.endpoints.web.exposure.include=prometheus

If you already expose other Actuator endpoints, add prometheus to that list.

After restarting the node, visit:

http://{nacos-server-host}:8848/nacos/actuator/prometheus

The /nacos prefix comes from the default nacos.server.contextPath. If you change the server context path, adjust the URL accordingly.
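
As a quick check after the restart, the URL can be assembled and fetched from a script. The following is a minimal Python sketch, not part of Nacos: the function names and the 5-second timeout are illustrative assumptions, while port 8848 and the /nacos prefix are the defaults shown above.

```python
import urllib.request


def metrics_url(host: str, port: int = 8848, context_path: str = "/nacos") -> str:
    """Build the Actuator Prometheus URL for one Nacos Server node."""
    # Normalize the context path so "", "/", "nacos" and "/nacos/" all work.
    prefix = context_path.strip("/")
    path = f"/{prefix}/actuator/prometheus" if prefix else "/actuator/prometheus"
    return f"http://{host}:{port}{path}"


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Return the raw exposition text; urllib raises URLError or HTTPError on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Only the URL is built here; call fetch_metrics(...) against a reachable node.
print(metrics_url("10.0.0.1"))
```

Calling fetch_metrics(metrics_url("10.0.0.1")) against a running node should return text in the Prometheus exposition format. An HTTP 404 surfaces as urllib.error.HTTPError, which usually points at the exposure setting or the context path.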

Prometheus scrape example

Prometheus can scrape every Nacos Server node directly:

scrape_configs:
  - job_name: nacos
    metrics_path: /nacos/actuator/prometheus
    static_configs:
      - targets:
          - 10.0.0.1:8848
          - 10.0.0.2:8848
          - 10.0.0.3:8848

If Nacos Server uses a different port or context path, change targets and metrics_path.

Grafana can use Prometheus as the datasource. Community dashboard templates are available in nacos-template.

Health checks

Nacos 3.x provides v3 health check endpoints. They are suitable for load balancers, Kubernetes probes, or inspection systems.

| Target | Endpoint |
| --- | --- |
| Nacos Server state | /nacos/v3/admin/core/state |
| Nacos Server liveness | /nacos/v3/admin/core/state/liveness |
| Nacos Server readiness | /nacos/v3/admin/core/state/readiness |
| Independent console liveness | /v3/console/health/liveness |
| Independent console readiness | /v3/console/health/readiness |

If you change nacos.server.contextPath or nacos.console.contextPath, prefix the endpoint URL with the new context path in place of the default one.
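
For an inspection script, the server endpoints can be polled over plain HTTP. The sketch below is a hedged Python example rather than an official client: the helper names, the 3-second timeout, and the use of port 8848 are assumptions, and it treats any HTTP 2xx answer as healthy, which this page does not state explicitly.

```python
import urllib.request

# Paths from the table above; they include the default /nacos context path.
SERVER_LIVENESS = "/nacos/v3/admin/core/state/liveness"
SERVER_READINESS = "/nacos/v3/admin/core/state/readiness"


def endpoint_url(host: str, path: str, port: int = 8848) -> str:
    """Join host, port, and endpoint path into a probe URL."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def probe(url: str, timeout: float = 3.0) -> bool:
    """Return True on an HTTP 2xx answer (assumed here to mean healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers refused connections, timeouts, and non-2xx statuses,
        # which urllib raises as HTTPError (a subclass of OSError).
        return False


# Only the URL is built here; call probe(...) against a reachable node.
print(endpoint_url("10.0.0.1", SERVER_READINESS))
```

A typical use is probe(endpoint_url(host, SERVER_READINESS)) for each node before routing traffic to it.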

Key metrics

Prometheus metric names may carry suffixes that depend on the Micrometer meter type. For example, a timer is usually exported as several series, such as _seconds_count and _seconds_sum. When troubleshooting, search for the base name first, then inspect the labels of each matching series.
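
The "search the base name first, then inspect its labels" approach can be sketched in Python as a prefix search over the exposition text. This is an illustrative helper, not a Nacos or Prometheus API: find_series and the regular expressions are assumptions, and the sample text contains made-up values and only a subset of the labels a real node exports.

```python
import re

LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')
SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')


def find_series(exposition_text, base):
    """Group samples whose series name starts with `base` by full series name."""
    found = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE.match(line)
        if match is None or not match.group(1).startswith(base):
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        found.setdefault(name, []).append((labels, float(value)))
    return found


# Fabricated sample for illustration only.
SAMPLE_TEXT = """
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{method="GET",status="200"} 4.0
http_server_requests_seconds_sum{method="GET",status="200"} 0.2
jvm_threads_daemon 25.0
"""

series = find_series(SAMPLE_TEXT, "http_server_requests_seconds")
for name, samples in series.items():
    print(name, samples)
```

Dividing the _seconds_sum value by the matching _seconds_count value gives the mean latency since the process started; in Prometheus itself the same division is normally applied to rates over a time window.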

Basic resource and request metrics

| Metric | Description |
| --- | --- |
| system_cpu_usage | System CPU usage. |
| system_load_average_1m | System load average over 1 minute. |
| jvm_memory_used_bytes | JVM used memory. |
| jvm_memory_max_bytes | JVM max memory. |
| jvm_gc_pause_seconds | GC count and duration. |
| jvm_threads_daemon | JVM daemon thread count. |
| http_server_requests_seconds | HTTP request count and latency. |
| grpc_server_requests | gRPC request latency with labels such as requestClass, success, errorCode, and module. |
| grpc_server_executor | gRPC server executor status, including active count, pool size, and queued tasks. |

Core metrics

| Metric | Description |
| --- | --- |
| nacos_monitor{module="core",name="longConnection"} | Long connection count by module. |
| nacos_monitor_summary | Summary metrics for Raft read index, leader read, apply log, and apply read. |

Configuration management metrics

| Metric | Description |
| --- | --- |
| nacos_monitor{module="config",name="getConfig"} | Config query statistics. |
| nacos_monitor{module="config",name="publish"} | Config publish statistics. |
| nacos_monitor{module="config",name="longPolling"} | Config long polling count. |
| nacos_monitor{module="config",name="configCount"} | Config count. |
| nacos_monitor{module="config",name="notifyTask"} | Config notify task backlog. |
| nacos_monitor{module="config",name="notifyClientTask"} | Client notify task backlog. |
| nacos_monitor{module="config",name="dumpTask"} | Config dump task backlog. |
| nacos_monitor{module="config",name="fuzzySearch"} | Fuzzy search statistics. |
| nacos_config_subscriber{version="v1"} / nacos_config_subscriber{version="v2"} | Config listener count. |
| nacos_timer{module="config",name="readConfigRt"} | Read config latency. |
| nacos_timer{module="config",name="writeConfigRt"} | Write config latency. |
| nacos_timer{module="config",name="notifyRt"} | Notify latency. |
| nacos_timer{module="config",name="dumpRt"} | Dump latency. |
| nacos_exception{module="config",name="illegalArgument"} | Config illegal argument exception statistics. |
| config_change_count | TopN config change statistics. |

Service discovery metrics

| Metric | Description |
| --- | --- |
| nacos_monitor{module="naming",name="serviceCount"} | Service count. |
| nacos_monitor{module="naming",name="ipCount"} | Instance count. |
| nacos_monitor{module="naming",name="subscriberCount"} | Subscriber count. |
| nacos_monitor{module="naming",name="totalPush"} | Total push count. |
| nacos_monitor{module="naming",name="failedPush"} | Failed push count. |
| nacos_monitor{module="naming",name="emptyPush"} | Empty push count. |
| nacos_monitor{module="naming",name="avgPushCost"} | Average push latency. |
| nacos_monitor{module="naming",name="maxPushCost"} | Maximum push latency. |
| nacos_monitor{module="naming",name="leaderStatus"} | Leader state of the service discovery module. |
| nacos_monitor{module="naming",name="serviceSubscribedEventQueueSize"} | Service subscribed event queue size. |
| nacos_monitor{module="naming",name="serviceChangedEventQueueSize"} | Service changed event queue size. |
| nacos_monitor{module="naming",name="pushPendingTaskCount"} | Pending push task count. |
| nacos_naming_subscriber{version="v1"} / nacos_naming_subscriber{version="v2"} | Service subscriber count. |
| nacos_naming_publisher{version="v1"} / nacos_naming_publisher{version="v2"} | Service provider count. |
| service_change_count | TopN service change statistics. |
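
The nacos_monitor{module="...",name="..."} notation used in these tables is a single series name distinguished by its module and name labels. As a rough illustration, the Python sketch below collects those samples into a lookup keyed by (module, name). The function name is hypothetical and the sample values are made up.

```python
import re

# Matches only nacos_monitor samples, not nacos_monitor_summary.
LINE = re.compile(r'^nacos_monitor\{(.*)\}\s+(\S+)')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def monitor_values(exposition_text):
    """Map (module, name) to the sample value of every nacos_monitor series."""
    values = {}
    for line in exposition_text.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue  # comments, blank lines, and other metrics
        labels = dict(LABEL.findall(match.group(1)))
        key = (labels.get("module"), labels.get("name"))
        values[key] = float(match.group(2))
    return values


# Fabricated sample for illustration only.
SAMPLE_TEXT = """
nacos_monitor{module="naming",name="serviceCount"} 12.0
nacos_monitor{module="naming",name="ipCount"} 40.0
nacos_monitor{module="config",name="configCount"} 7.0
jvm_threads_daemon 25.0
"""

values = monitor_values(SAMPLE_TEXT)
print(values[("naming", "serviceCount")])
```

If a real node attaches further labels to these series, the key would need to include them; the sketch assumes module and name are enough to identify a sample.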

Experimental distributed lock metrics

Distributed Lock is an experimental feature. If you use it, watch these metrics:

| Metric | Description |
| --- | --- |
| nacos_monitor{module="lock",name="grpcLockTotal"} | Total gRPC lock requests. |
| nacos_monitor{module="lock",name="grpcLockSuccess"} | Successful gRPC lock requests. |
| nacos_monitor{module="lock",name="grpcUnLockTotal"} | Total gRPC unlock requests. |
| nacos_monitor{module="lock",name="grpcUnLockSuccess"} | Successful gRPC unlock requests. |
| nacos_monitor{module="lock",name="aliveLockCount"} | Current alive lock count. |
| nacos_timer{module="lock",name="lockHandlerRt"} | Lock request handling latency. |

Alert suggestions

| Scenario | Watch |
| --- | --- |
| Node unavailable | Readiness failure, HTTP/gRPC request error rate, JVM resources, process liveness. |
| gRPC backlog | grpc_server_executor active count, queue, and completed task changes. |
| Config push delay | notifyTask, notifyClientTask, notifyRt, and dumpTask. |
| Config publish exception | publish, writeConfigRt, and nacos_exception{module="config"}. |
| Service push exception | failedPush, pushPendingTaskCount, avgPushCost, and maxPushCost. |
| Abnormal connection change | longConnection, nacos_config_subscriber, and nacos_naming_subscriber. |
| Lock usage exception | Lock success rate, unlock success rate, and aliveLockCount. |
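
Two of these scenarios reduce to simple ratios, which the Python sketch below illustrates. It is only a starting point: the thresholds (5% failed pushes, 99% lock success) are arbitrary assumptions rather than Nacos recommendations, the function names are hypothetical, and the code assumes the counts are cumulative totals keyed by (module, name).

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    if not denominator:
        return None
    return numerator / denominator


def evaluate(values, max_failed_push_ratio=0.05, min_lock_success_ratio=0.99):
    """Return findings for the service push and lock usage scenarios."""
    findings = []

    failed = ratio(values.get(("naming", "failedPush"), 0.0),
                   values.get(("naming", "totalPush"), 0.0))
    if failed is not None and failed > max_failed_push_ratio:
        findings.append(f"service push: {failed:.1%} of pushes failed")

    lock_ok = ratio(values.get(("lock", "grpcLockSuccess"), 0.0),
                    values.get(("lock", "grpcLockTotal"), 0.0))
    if lock_ok is not None and lock_ok < min_lock_success_ratio:
        findings.append(f"lock usage: only {lock_ok:.1%} of lock requests succeeded")

    return findings


# Made-up values: 20 failed pushes out of 200, 95 successful locks out of 100.
example = {
    ("naming", "totalPush"): 200.0,
    ("naming", "failedPush"): 20.0,
    ("lock", "grpcLockTotal"): 100.0,
    ("lock", "grpcLockSuccess"): 95.0,
}
for finding in evaluate(example):
    print(finding)
```

In a Prometheus setup the same comparisons would normally be written as alerting rules over a time window, so that a burst of old failures does not keep an alert firing.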

Troubleshooting

/actuator/prometheus returns 404 or no data

Check whether management.endpoints.web.exposure.include contains prometheus, confirm that the node has been restarted since the change, and verify that the URL uses the configured nacos.server.contextPath.

nacos.prometheus.metrics.enabled=true is configured, but Nacos Server metrics are still unavailable

This is expected. That setting belongs to the Prometheus service discovery integration described in the ecosystem documentation. It does not enable Nacos Server metrics.

Prometheus fails to scrape some nodes

Check node ports, context path, network policy, and Prometheus scrape configuration. Every cluster node must expose metrics and be scraped separately.
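
To narrow down which nodes fail, each target can be requested individually with the same path Prometheus uses. The following Python sketch is one possible way to do that. The names check_targets and http_fetch are hypothetical, and the fetch function is injectable so the logic can be exercised without network access.

```python
import urllib.request


def http_fetch(url, timeout=5.0):
    """Fetch a URL and return the body as text; raises OSError subclasses on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def check_targets(targets, metrics_path="/nacos/actuator/prometheus", fetch=http_fetch):
    """Request the metrics path on every host:port target and report a status per node."""
    results = {}
    for target in targets:
        url = f"http://{target}{metrics_path}"
        try:
            body = fetch(url)
        except OSError as error:
            # URLError, HTTPError, and socket timeouts are all OSError subclasses.
            results[target] = f"failed: {error}"
            continue
        results[target] = "ok" if body.strip() else "empty response"
    return results
```

For example, check_targets(["10.0.0.1:8848", "10.0.0.2:8848", "10.0.0.3:8848"]) returns one status string per node. Run it from the Prometheus host so that network policy is covered by the check as well.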

A module metric is missing

Confirm whether the corresponding module is enabled. For example, if only the configuration module is started, service discovery metrics will be incomplete. Experimental distributed lock metrics are meaningful only when the distributed lock capability is actually used.