Monitoring Manual
Nacos monitoring focuses on two kinds of information:
- Nacos Server metrics, such as JVM, HTTP, gRPC, configuration management, and service discovery metrics.
- Health check endpoints, such as liveness and readiness.
Expose Nacos Server metrics
Nacos exposes Prometheus metrics through Spring Boot Actuator. In the default configuration file, this capability is commented out. Enable it on every Nacos Server node:

    management.endpoints.web.exposure.include=prometheus

If you already expose other Actuator endpoints, add prometheus to that list.
After restarting the node, visit:

    http://{nacos-server-host}:8848/nacos/actuator/prometheus

The /nacos prefix comes from the default nacos.server.contextPath. If you change the server context path, adjust the URL accordingly.
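To make the relationship between host, port, context path, and the metrics URL concrete, here is a minimal Python sketch. It is an illustration only, not part of Nacos: the function names metrics_url and fetch_metrics are hypothetical, and the defaults (port 8848, context path /nacos) are taken from the URL above.

```python
from urllib.parse import urlunsplit
from urllib.request import urlopen


def metrics_url(host, port=8848, context_path="/nacos"):
    """Build the Actuator Prometheus URL for one Nacos Server node."""
    # Normalize the context path so "", "/", "nacos", and "/nacos/" all work.
    trimmed = context_path.strip("/")
    prefix = f"/{trimmed}" if trimmed else ""
    path = f"{prefix}/actuator/prometheus"
    return urlunsplit(("http", f"{host}:{port}", path, "", ""))


def fetch_metrics(url, timeout=5.0):
    """Return the raw metrics text from a node (raises on HTTP or network errors)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, metrics_url("10.0.0.1") yields the default URL for that node, while metrics_url("10.0.0.1", context_path="") covers a deployment whose server context path was set to empty.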
Prometheus scrape example
Prometheus can scrape every Nacos Server node directly:

    scrape_configs:
      - job_name: nacos
        metrics_path: /nacos/actuator/prometheus
        static_configs:
          - targets:
              - 10.0.0.1:8848
              - 10.0.0.2:8848
              - 10.0.0.3:8848

If Nacos Server uses a different port or context path, change targets and metrics_path.
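If the node list or context path changes often, a static scrape job like this one can be generated instead of edited by hand. The following Python sketch assumes a plain static_configs job is all you need. The function name render_scrape_config is hypothetical, and the output is built as text rather than through a YAML library.

```python
def render_scrape_config(targets, job_name="nacos", context_path="/nacos"):
    """Render a static Prometheus scrape job as YAML text for the given nodes."""
    trimmed = context_path.strip("/")
    prefix = f"/{trimmed}" if trimmed else ""
    lines = [
        "scrape_configs:",
        f"  - job_name: {job_name}",
        f"    metrics_path: {prefix}/actuator/prometheus",
        "    static_configs:",
        "      - targets:",
    ]
    # One entry per node: each cluster member is scraped separately.
    lines += [f"          - {target}" for target in targets]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_scrape_config(["10.0.0.1:8848", "10.0.0.2:8848", "10.0.0.3:8848"]))
```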
Grafana can use Prometheus as the datasource. Community dashboard templates are available in nacos-template.
Health checks
Nacos 3.x provides v3 health check endpoints. They are suitable for load balancers, Kubernetes probes, or inspection systems.
| Target | Endpoint |
|---|---|
| Nacos Server state | /nacos/v3/admin/core/state |
| Nacos Server liveness | /nacos/v3/admin/core/state/liveness |
| Nacos Server readiness | /nacos/v3/admin/core/state/readiness |
| Independent console liveness | /v3/console/health/liveness |
| Independent console readiness | /v3/console/health/readiness |
The server endpoints above include the default /nacos context path. If you change nacos.server.contextPath or nacos.console.contextPath, replace or prepend the corresponding context path in the endpoint URL.
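An inspection script can derive the server health URLs from the same context path setting. The Python sketch below is a hedged illustration. The names SERVER_ENDPOINTS, server_health_url, and is_healthy are hypothetical, and treating any HTTP 2xx response as healthy is an assumption that you should verify against your Nacos version.

```python
from urllib.error import URLError
from urllib.request import urlopen

# Paths from the table above, without the context path prefix.
SERVER_ENDPOINTS = {
    "state": "/v3/admin/core/state",
    "liveness": "/v3/admin/core/state/liveness",
    "readiness": "/v3/admin/core/state/readiness",
}


def server_health_url(host, kind, port=8848, context_path="/nacos"):
    """Build a Nacos Server health URL for the given endpoint kind."""
    trimmed = context_path.strip("/")
    prefix = f"/{trimmed}" if trimmed else ""
    return f"http://{host}:{port}{prefix}{SERVER_ENDPOINTS[kind]}"


def is_healthy(url, timeout=3.0):
    """Assumption: any 2xx answer means healthy; errors and timeouts mean unhealthy."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (URLError, OSError):
        return False
```

The independent console endpoints can be handled the same way, with the console port and nacos.console.contextPath substituted for the server values.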
Key metrics
Prometheus metric names may include suffixes that depend on the Micrometer meter type. For example, timers usually export series ending in _seconds_count and _seconds_sum. When troubleshooting, search for the base name first, then inspect its labels.
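The suffix convention can be turned into a quick calculation: dividing a timer's _seconds_sum by its _seconds_count gives the mean duration per label set. The Python sketch below is not part of Nacos. It parses the Prometheus text format with a simplified regular expression (no timestamp or escape handling beyond the basics), and the names parse_samples and mean_seconds are hypothetical.

```python
import re

SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


def mean_seconds(text, base_name):
    """Mean duration per label set: <base>_seconds_sum / <base>_seconds_count."""
    sums, counts = {}, {}
    for name, labels, value in parse_samples(text):
        key = tuple(sorted(labels.items()))
        if name == f"{base_name}_seconds_sum":
            sums[key] = value
        elif name == f"{base_name}_seconds_count":
            counts[key] = value
    # Label sets with a zero or missing count are skipped to avoid division by zero.
    return {key: sums[key] / counts[key] for key in sums if counts.get(key)}
```

Calling mean_seconds(text, "http_server_requests") on scraped output returns a dictionary keyed by the sorted label pairs of each series. The result is a lifetime average since process start. For a recent-window view, compute the same ratio on the difference between two scrapes.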
Basic resource and request metrics
| Metric | Description |
|---|---|
| system_cpu_usage | System CPU usage. |
| system_load_average_1m | System load average over 1 minute. |
| jvm_memory_used_bytes | JVM used memory. |
| jvm_memory_max_bytes | JVM max memory. |
| jvm_gc_pause_seconds | GC count and duration. |
| jvm_threads_daemon | JVM daemon thread count. |
| http_server_requests_seconds | HTTP request count and latency. |
| grpc_server_requests | gRPC request latency with labels such as requestClass, success, errorCode, and module. |
| grpc_server_executor | gRPC server executor status, including active count, pool size, and queued tasks. |
Core metrics
| Metric | Description |
|---|---|
| nacos_monitor{module="core",name="longConnection"} | Long connection count by module. |
| nacos_monitor_summary | Summary metrics for Raft read index, leader read, apply log, and apply read. |
Configuration management metrics
| Metric | Description |
|---|---|
| nacos_monitor{module="config",name="getConfig"} | Config query statistics. |
| nacos_monitor{module="config",name="publish"} | Config publish statistics. |
| nacos_monitor{module="config",name="longPolling"} | Config long polling count. |
| nacos_monitor{module="config",name="configCount"} | Config count. |
| nacos_monitor{module="config",name="notifyTask"} | Config notify task backlog. |
| nacos_monitor{module="config",name="notifyClientTask"} | Client notify task backlog. |
| nacos_monitor{module="config",name="dumpTask"} | Config dump task backlog. |
| nacos_monitor{module="config",name="fuzzySearch"} | Fuzzy search statistics. |
| nacos_config_subscriber{version="v1"} / nacos_config_subscriber{version="v2"} | Config listener count. |
| nacos_timer{module="config",name="readConfigRt"} | Read config latency. |
| nacos_timer{module="config",name="writeConfigRt"} | Write config latency. |
| nacos_timer{module="config",name="notifyRt"} | Notify latency. |
| nacos_timer{module="config",name="dumpRt"} | Dump latency. |
| nacos_exception{module="config",name="illegalArgument"} | Config illegal argument exception statistics. |
| config_change_count | TopN config change statistics. |
Service discovery metrics
| Metric | Description |
|---|---|
| nacos_monitor{module="naming",name="serviceCount"} | Service count. |
| nacos_monitor{module="naming",name="ipCount"} | Instance count. |
| nacos_monitor{module="naming",name="subscriberCount"} | Subscriber count. |
| nacos_monitor{module="naming",name="totalPush"} | Total push count. |
| nacos_monitor{module="naming",name="failedPush"} | Failed push count. |
| nacos_monitor{module="naming",name="emptyPush"} | Empty push count. |
| nacos_monitor{module="naming",name="avgPushCost"} | Average push latency. |
| nacos_monitor{module="naming",name="maxPushCost"} | Maximum push latency. |
| nacos_monitor{module="naming",name="leaderStatus"} | Leader state of the service discovery module. |
| nacos_monitor{module="naming",name="serviceSubscribedEventQueueSize"} | Service subscribed event queue size. |
| nacos_monitor{module="naming",name="serviceChangedEventQueueSize"} | Service changed event queue size. |
| nacos_monitor{module="naming",name="pushPendingTaskCount"} | Pending push task count. |
| nacos_naming_subscriber{version="v1"} / nacos_naming_subscriber{version="v2"} | Service subscriber count. |
| nacos_naming_publisher{version="v1"} / nacos_naming_publisher{version="v2"} | Service provider count. |
| service_change_count | TopN service change statistics. |
Experimental distributed lock metrics
Distributed Lock is an experimental feature. If you use it, watch these metrics:
| Metric | Description |
|---|---|
| nacos_monitor{module="lock",name="grpcLockTotal"} | Total gRPC lock requests. |
| nacos_monitor{module="lock",name="grpcLockSuccess"} | Successful gRPC lock requests. |
| nacos_monitor{module="lock",name="grpcUnLockTotal"} | Total gRPC unlock requests. |
| nacos_monitor{module="lock",name="grpcUnLockSuccess"} | Successful gRPC unlock requests. |
| nacos_monitor{module="lock",name="aliveLockCount"} | Current alive lock count. |
| nacos_timer{module="lock",name="lockHandlerRt"} | Lock request handling latency. |
Alert suggestions
| Scenario | Watch |
|---|---|
| Node unavailable | Readiness failure, HTTP/gRPC request error rate, JVM resources, process liveness. |
| gRPC backlog | grpc_server_executor active count, queue, and completed task changes. |
| Config push delay | notifyTask, notifyClientTask, notifyRt, and dumpTask. |
| Config publish exception | publish, writeConfigRt, and nacos_exception{module="config"}. |
| Service push exception | failedPush, pushPendingTaskCount, avgPushCost, and maxPushCost. |
| Abnormal connection change | longConnection, nacos_config_subscriber, and nacos_naming_subscriber. |
| Lock usage exception | Lock success rate, unlock success rate, and aliveLockCount. |
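Two of the rows above reduce to simple ratios: failedPush against totalPush, and grpcLockSuccess against grpcLockTotal. The Python sketch below shows one way to evaluate them. It is an illustration under stated assumptions: the input is a dictionary keyed by (module, name) holding values for the time window you care about (for example, the difference between two scrapes), the thresholds are placeholders rather than recommendations, and the function names ratio and evaluate are hypothetical.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when there was no traffic."""
    if denominator <= 0:
        return None
    return numerator / denominator


def evaluate(window, max_failed_push_ratio=0.01, min_lock_success_rate=0.99):
    """Return human-readable findings for one window of nacos_monitor values."""
    findings = []

    failed = ratio(
        window.get(("naming", "failedPush"), 0),
        window.get(("naming", "totalPush"), 0),
    )
    if failed is not None and failed > max_failed_push_ratio:
        findings.append(
            f"failed push ratio {failed:.2%} exceeds {max_failed_push_ratio:.2%}"
        )

    locked = ratio(
        window.get(("lock", "grpcLockSuccess"), 0),
        window.get(("lock", "grpcLockTotal"), 0),
    )
    if locked is not None and locked < min_lock_success_rate:
        findings.append(
            f"lock success rate {locked:.2%} is below {min_lock_success_rate:.2%}"
        )

    return findings
```

Returning None for an idle window keeps a node with no pushes or no lock traffic from triggering a false alarm. The same ratios can be expressed as Prometheus alerting rules once suitable thresholds are known.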
Troubleshooting
/actuator/prometheus returns 404 or no data
Check whether management.endpoints.web.exposure.include contains prometheus, confirm that the node has been restarted, and verify nacos.server.contextPath.
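The first of these checks can be automated against the text of a node's configuration file. The Python sketch below is a simplified illustration: it handles only key=value lines, ignores commented lines, and treats the Spring Boot wildcard * as exposing every endpoint. The function name exposes_prometheus is hypothetical.

```python
EXPOSURE_KEY = "management.endpoints.web.exposure.include"


def exposes_prometheus(properties_text):
    """Return True if the effective exposure.include value lists prometheus or *."""
    exposed = False
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Commented-out settings (the shipped default) do not count.
        if not line or line.startswith(("#", "!")):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == EXPOSURE_KEY:
            endpoints = {item.strip() for item in value.split(",")}
            # A later definition of the same key overrides an earlier one.
            exposed = "prometheus" in endpoints or "*" in endpoints
    return exposed
```

Pass it the contents of the configuration file from each node. A False result on any node means that node still needs the property enabled and a restart.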
nacos.prometheus.metrics.enabled=true is configured, but Nacos Server metrics are still unavailable
This is expected. That property is covered in the Prometheus service discovery ecosystem document and applies to that integration only. It does not enable Nacos Server metrics.
Prometheus fails to scrape some nodes
Check node ports, context path, network policy, and Prometheus scrape configuration. Every cluster node must expose metrics and be scraped separately.
A module metric is missing
Confirm that the corresponding module is enabled. For example, if only the configuration module is started, service discovery metrics will be incomplete. Experimental distributed lock metrics are meaningful only when the distributed lock capability is actually used.