
Nacos monitor guide


Nacos 0.8.0 improves the monitoring system: it exposes metrics data so that third-party monitoring systems can track the running status of Nacos. Prometheus, Elasticsearch, and InfluxDB are currently supported. This document describes how to monitor Nacos with Prometheus and Grafana; setting up Elasticsearch or InfluxDB is left to the reader.

Deploy Nacos cluster to expose metrics data

Deploy the Nacos cluster by following the deployment document

Configure the application.properties file to expose metrics data

management.endpoints.web.exposure.include=*

Access http://{ip}:8848/nacos/actuator/prometheus to check that the metrics data is exposed
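The check above can also be scripted. The following is a minimal sketch, assuming the node is reachable over plain HTTP on port 8848 and that a `nacos_monitor` series appears once metrics are exposed. The function names and the sample text are illustrative and not part of Nacos.

```python
import urllib.request


def has_nacos_metrics(exposition_text: str) -> bool:
    """Return True if any sample line belongs to a nacos_monitor series."""
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        if line.startswith("nacos_monitor"):
            return True
    return False


def fetch_metrics(ip: str, port: int = 8848, timeout: float = 5.0) -> str:
    """Download the raw metrics text from one Nacos node (plain HTTP assumed)."""
    url = f"http://{ip}:{port}/nacos/actuator/prometheus"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Hypothetical sample shaped like Prometheus text exposition output.
sample = '# TYPE nacos_monitor gauge\nnacos_monitor{name="configCount"} 12.0\n'
print(has_nacos_metrics(sample))
```

Against a live node, `has_nacos_metrics(fetch_metrics("{ip}"))` would run the same check on real output.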

Deploy prometheus to collect Nacos metrics data

Download the Prometheus version you want to install from the Prometheus download page

linux & mac

Extract the Prometheus archive

tar xvfz prometheus-*.tar.gz
cd prometheus-*

Modify configuration file prometheus.yml to collect Nacos metrics data

metrics_path: '/nacos/actuator/prometheus'
static_configs:
- targets: ['{ip1}:8848','{ip2}:8848','{ip3}:8848']
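The fragment above belongs inside a job under `scrape_configs` in prometheus.yml. As a hedged sketch of how the full entry might look, the helper below renders one from a list of node IPs. The job name `nacos`, the helper name, and the example IPs are assumptions for illustration. The default prometheus.yml already has a `scrape_configs` section, so the generated job should be merged into it rather than pasted as a second section.

```python
def render_scrape_config(ips, port=8848, job_name="nacos"):
    """Build a scrape_configs entry for prometheus.yml as YAML text."""
    targets = ",".join(f"'{ip}:{port}'" for ip in ips)
    lines = [
        "scrape_configs:",
        f"  - job_name: '{job_name}'",  # job name is an assumption
        "    metrics_path: '/nacos/actuator/prometheus'",
        "    static_configs:",
        f"      - targets: [{targets}]",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical cluster addresses.
print(render_scrape_config(["10.0.0.1", "10.0.0.2", "10.0.0.3"]))
```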

Start prometheus service

./prometheus --config.file="prometheus.yml"

windows

Download the Windows build of Prometheus and extract it

Modify configuration file prometheus.yml to collect Nacos metrics data

metrics_path: '/nacos/actuator/prometheus'
static_configs:
- targets: ['{ip1}:8848','{ip2}:8848','{ip3}:8848']

Start prometheus service

prometheus.exe --config.file=prometheus.yml

Open http://{ip}:9090/graph to view the data collected by Prometheus. Searching for nacos_monitor in the search bar should return Nacos data, which confirms that data collection works.
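The same lookup can be done without the web UI through the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The sketch below assumes Prometheus listens on port 9090 as above; the helper names and the sample response body are illustrative.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(prometheus_host: str, expr: str, port: int = 9090) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": expr})
    return f"http://{prometheus_host}:{port}/api/v1/query?{query}"


def series_names(response_body: str) -> list:
    """Collect the `name` label of every series in an instant-query response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        return []
    results = payload["data"]["result"]
    return sorted({item["metric"].get("name", "") for item in results})


def query_nacos_monitor(prometheus_host: str) -> list:
    """Ask a live Prometheus server which nacos_monitor series it holds."""
    url = build_query_url(prometheus_host, "nacos_monitor")
    with urllib.request.urlopen(url, timeout=5) as resp:
        return series_names(resp.read().decode("utf-8"))


# Hypothetical response body in the shape returned by /api/v1/query.
sample_body = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"__name__": "nacos_monitor", "name": "configCount"},
         "value": [0, "12"]},
    ]},
})
print(series_names(sample_body))
```

An empty list from `query_nacos_monitor` would suggest that Prometheus has not scraped any Nacos node yet.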

Deploy grafana to graphically display metrics data

Install Grafana on the same machine as Prometheus. The commands below use Homebrew on macOS and yum on Linux.

mac

brew install grafana
brew services start grafana

linux

sudo yum install https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-5.2.4-1.x86_64.rpm
sudo service grafana-server start

windows

Reference document: http://docs.grafana.org/installation/windows/

Access grafana: http://{ip}:3000

Configure the Prometheus data source

Import the Nacos Grafana monitoring template

Nacos monitoring is divided into three modules:

  • nacos monitor shows the core monitoring items
  • nacos detail shows how each metric changes over time
  • nacos alert shows the alerts raised for Nacos

Configure Grafana alerts

When Nacos runs abnormally, Grafana can alert the person in charge. Grafana supports a variety of alert notification channels; mail, DingTalk, and webhook are the most commonly used.

DingTalk alert

Configure the DingTalk robot

Configure the DingTalk robot URL

Send a test alert
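Before wiring the robot URL into Grafana, it can help to confirm that the URL itself works. The sketch below assumes the robot is a DingTalk custom robot that accepts the plain text message format (`msgtype` set to `text`); the function names and the message content are illustrative. If the robot has a keyword security setting, the content must contain that keyword.

```python
import json
import urllib.request


def build_text_message(content: str) -> bytes:
    """Encode a DingTalk text message as a UTF-8 JSON request body."""
    body = {"msgtype": "text", "text": {"content": content}}
    return json.dumps(body, ensure_ascii=False).encode("utf-8")


def send_test_message(webhook_url: str, content: str) -> str:
    """POST the message to the robot URL and return the raw response text."""
    request = urllib.request.Request(
        webhook_url,
        data=build_text_message(content),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as resp:
        return resp.read().decode("utf-8")


# Only builds the payload; sending requires a real robot URL.
print(build_text_message("Nacos alert test").decode("utf-8"))
```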

mail alert

Modify the defaults.ini configuration file to enable mail alerts

#################################### SMTP / Emailing ##########################
[smtp]
enabled = true
host = smtp.126.com:25
user = xxxxxx
password = xxxxx
;cert_file =
;key_file =
skip_verify = true
from_address = xxxxxx@126.com
[emails]
;welcome_email_on_sign_up = false

Configure the notification mailbox
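A typo in the [smtp] section is easy to miss until an alert fails to arrive. The following is a rough sanity check, assuming only the [smtp] fragment shown above is passed in (a complete Grafana defaults.ini may contain syntax that Python's `configparser` does not accept). The function name and the checks themselves are illustrative.

```python
import configparser


def check_smtp_settings(ini_text: str) -> list:
    """Return a list of problems found in the [smtp] section (empty if none)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("smtp"):
        return ["missing [smtp] section"]
    smtp = parser["smtp"]
    problems = []
    if smtp.get("enabled", "false").strip().lower() != "true":
        problems.append("smtp is not enabled")
    if ":" not in smtp.get("host", ""):
        problems.append("host should be written as host:port")
    if not smtp.get("from_address", "").strip():
        problems.append("from_address is empty")
    return problems


# Fragment in the same shape as the configuration above.
sample = """
[smtp]
enabled = true
host = smtp.126.com:25
;cert_file =
skip_verify = true
from_address = xxxxxx@126.com
"""
print(check_smtp_settings(sample))
```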

meaning of Nacos metrics

jvm metrics

item | meaning
system_cpu_usage | cpu usage
system_load_average_1m | load
jvm_memory_used_bytes | jvm memory used(bytes)
jvm_memory_max_bytes | jvm memory max(bytes)
jvm_gc_pause_seconds_count | gc count
jvm_gc_pause_seconds_sum | gc time
jvm_threads_daemon | jvm threads count
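Several of these items are most useful in combination: gc time divided by gc count gives an average pause, and memory used divided by memory max gives a usage ratio. A small worked sketch follows, with made-up input values. It assumes the sum and count are cumulative since process start, and that the memory values have already been summed over the pools of interest.

```python
def average_gc_pause_seconds(pause_sum: float, pause_count: float) -> float:
    """jvm_gc_pause_seconds_sum / jvm_gc_pause_seconds_count, 0.0 if no GC yet."""
    return pause_sum / pause_count if pause_count else 0.0


def memory_usage_ratio(used_bytes: float, max_bytes: float):
    """jvm_memory_used_bytes / jvm_memory_max_bytes, or None if max is unknown."""
    if max_bytes <= 0:
        return None  # a non-positive max is treated here as "undefined"
    return used_bytes / max_bytes


# Illustrative numbers only, not measurements from a real Nacos node.
print(average_gc_pause_seconds(1.5, 30))                  # 1.5 s over 30 pauses
print(memory_usage_ratio(512 * 1024**2, 2048 * 1024**2))  # 512 MiB of 2 GiB
```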

Nacos metrics

item | meaning
http_server_requests_seconds_count | http requests count
http_server_requests_seconds_sum | http requests time
nacos_timer_seconds_sum | Nacos config notify time
nacos_timer_seconds_count | Nacos config notify count
grpc_server_requests_seconds_max | Nacos grpc request handle record
grpc_server_executor{name='maximumPoolSize'} | Nacos grpc server executor maximum pool size
grpc_server_executor{name='corePoolSize'} | Nacos grpc server executor core pool size
grpc_server_executor{name='taskCount'} | Nacos grpc server executor task count
grpc_server_executor{name='poolSize'} | Nacos grpc server executor current pool size
grpc_server_executor{name='activeCount'} | Nacos grpc server executor current active thread count
grpc_server_executor{name='completedTaskCount'} | Nacos grpc server executor completed task count
grpc_server_executor{name='inQueueTaskCount'} | Nacos grpc server executor current in queue task count
nacos_monitor{name='longPolling'} | Nacos config connection count
nacos_monitor{name='configCount'} | Nacos configuration file count
nacos_monitor{name='dumpTask'} | Nacos config dump task count
nacos_monitor{name='notifyTask'} | Nacos config notify task count
nacos_monitor{name='getConfig'} | Nacos config read configuration count
nacos_monitor{name='publish'} | Nacos config update configuration count
nacos_monitor{name='ipCount'} | Nacos naming ip count
nacos_monitor{name='domCount'} | Nacos naming domain count(1.x version)
nacos_monitor{name='serviceCount'} | Nacos naming domain count(2.x version)
nacos_monitor{name='failedPush'} | Nacos naming push fail count
nacos_monitor{name='avgPushCost'} | Nacos naming push cost time(average)(ms)
nacos_monitor{name='leaderStatus'} | Nacos naming if node is leader
nacos_monitor{name='maxPushCost'} | Nacos naming push cost time(max)(ms)
nacos_monitor{name='mysqlhealthCheck'} | Nacos naming mysql health check count
nacos_monitor{name='httpHealthCheck'} | Nacos naming http health check count
nacos_monitor{name='tcpHealthCheck'} | Nacos naming tcp health check count
nacos_monitor{name='longConnection'} | Nacos core connection count group by module
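To read these items outside Grafana, the raw text from the metrics endpoint can be parsed directly. The sketch below is deliberately simple and not a full Prometheus text-format parser: it pulls out `nacos_monitor` samples and keys them by their `module` and `name` labels. The presence of a `module` label and the sample lines themselves are assumptions for illustration, and label values containing `}` or escaped quotes are not handled.

```python
import re

# metric name, optional {labels}, then the sample value
SAMPLE_LINE = re.compile(
    r'^(?P<metric>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def nacos_monitor_values(text: str) -> dict:
    """Map (module, name) to the sample value for every nacos_monitor line."""
    values = {}
    for line in text.splitlines():
        match = SAMPLE_LINE.match(line.strip())
        if not match or match.group("metric") != "nacos_monitor":
            continue  # comments and other metrics are ignored
        labels = dict(LABEL.findall(match.group("labels") or ""))
        if "name" in labels:
            key = (labels.get("module", ""), labels["name"])
            values[key] = float(match.group("value"))
    return values


# Hypothetical exposition text; real output may carry different labels.
sample = """# TYPE nacos_monitor gauge
nacos_monitor{module="config",name="longPolling",} 3.0
nacos_monitor{module="naming",name="ipCount",} 42.0
jvm_threads_daemon 25.0
"""
print(nacos_monitor_values(sample))
```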

nacos exception

item | meaning
nacos_exception_total{name='db'} | database exception
nacos_exception_total{name='configNotify'} | Nacos config notify exception
nacos_exception_total{name='unhealth'} | Nacos config server health check exception
nacos_exception_total{name='disk'} | Nacos naming write disk exception
nacos_exception_total{name='leaderSendBeatFailed'} | Nacos naming leader send heart beat fail count
nacos_exception_total{name='illegalArgument'} | request argument illegal count
nacos_exception_total{name='nacos'} | Nacos inner exception
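The `_total` suffix suggests these are cumulative counters, so the absolute value matters less than how much it grew between two observations. The sketch below shows that arithmetic, assuming a counter that restarts from zero when the Nacos process restarts. The function names, the threshold, and the numbers are illustrative, and this is a simplification of what a monitoring system does internally.

```python
def counter_increase(previous: float, current: float) -> float:
    """Growth of a cumulative counter between two scrapes.

    A drop in value is treated as a process restart, in which case
    everything counted since the restart is taken as the increase.
    """
    if current >= previous:
        return current - previous
    return current


def should_alert(previous: float, current: float, threshold: float) -> bool:
    """True when the increase between two scrapes exceeds the threshold."""
    return counter_increase(previous, current) > threshold


# Illustrative values for nacos_exception_total{name='db'}.
print(counter_increase(10, 13))   # normal growth
print(counter_increase(10, 2))    # counter restarted between scrapes
print(should_alert(10, 13, threshold=5))
```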

client metrics

item | meaning
nacos_monitor{name='subServiceCount'} | subscribed services count
nacos_monitor{name='pubServiceCount'} | published services count
nacos_monitor{name='configListenSize'} | listened configuration file count
nacos_client_request_seconds_count | request count
nacos_client_request_seconds_sum | request time

Nacos-Sync monitor

With the release of Nacos 0.9, Nacos-Sync 0.3 supports metrics monitoring. The metrics data makes it possible to observe the running status of the Nacos-Sync service, which improves the monitoring capability of Nacos-Sync in production environments. For how to build the overall monitoring system, refer to the Nacos monitoring guide.

grafana monitor Nacos-Sync

As with Nacos monitoring, Nacos-Sync provides a Grafana monitoring template. Import the Nacos-Sync monitoring template.

Nacos-Sync monitoring is also divided into three modules:

  • nacos-sync monitor shows the core monitoring items
  • nacos-sync detail and alert show the monitoring curves and the alarms

Nacos-Sync metrics meaning

Nacos-Sync metrics are divided into a JVM layer and an application layer

jvm metrics

item | meaning
system_cpu_usage | cpu usage
system_load_average_1m | load
jvm_memory_used_bytes | jvm memory used(bytes)
jvm_memory_max_bytes | jvm memory max(bytes)
jvm_gc_pause_seconds_count | gc count
jvm_gc_pause_seconds_sum | gc time
jvm_threads_daemon | jvm threads count

application metrics

item | meaning
nacosSync_task_size | sync task count
nacosSync_cluster_size | cluster count
nacosSync_add_task_rt | add task time
nacosSync_delete_task_rt | delete task time
nacosSync_dispatcher_task | dispatcher task time
nacosSync_sync_task_error | sync task error count