Java Client Failover

We can turn on the local data failover feature to handle the situation when Nacos server side is unstable or has problematic data.

There are two typical scenarios:

When Nacos server is in deployment, we can switch on the failover so the clients use local data only. The data anomaly or oscillation at Nacos server won’t affect the clients. After the deployment and the data verification are done, we can switch off the failover feature.
When there is a sudden data anomaly at Nacos server at runtime, we can turn on the failover feature to prevent Nacos clients using wrong data.

The full detailed solution description can be found in https://github.com/alibaba/nacos/issues/11053

Procedures

As shown above, the query requests to Nacos client would first be checked by FailoverReactor, and only if FailoverReactor has no related data, can the requests move on to query ServiceInfoHolder.

Disk based Failover

FailoverReactor can select different data sources. Disk is the default option.

Disk Failover File Path

The default path of disk failover files are:

{user.home}/nacos/naming/{namespace}/failover

This path can be customised via -D argument:

-DJM.SNAPSHOT.PATH=/mypath

So the path becomes:

/mypath/nacos/naming/{namespace}/failover

Disk Failover Switch

The disk failover switch is stored in a file with name:

00-00---000-VIPSRV_FAILOVER_SWITCH-000---00-00

The content of this file is just a number 0 or 1, where 0 represents failover is off, 1 is on.

Disk Failover Data

The disk failover data is stored in multiple files under the failover path. Each file stores the failover data for a single service.

The file name is in the following format:

{group.name}%40%40{service.name}

The content in the file is the JSON string of one ServiceInfo object, for instance:

{
    "name":"DEFAULT_GROUP@@test.2",
    "groupName":"DEFAULT_GROUP",
    "clusters":"",
    "cacheMillis":10000,
    "hosts":[
        {
            "instanceId":"1.1.2.1#8888#DEFAULT#DEFAULT_GROUP@@test.2",
            "ip":"1.1.2.1",
            "port":8888,
            "weight":1,
            "healthy":true,
            "enabled":true,
            "ephemeral":true,
            "clusterName":"DEFAULT",
            "serviceName":"DEFAULT_GROUP@@test.2",
            "metadata":{
                "k1":"v1"
            },
            "instanceHeartBeatInterval":5000,
            "instanceHeartBeatTimeOut":15000,
            "ipDeleteTimeout":30000
        }
    ],
    "lastRefTime":1689835375819,
    "checksum":"",
    "allIPs":false,
    "reachProtectionThreshold":false,
    "valid":true
}

Extent Failover Data Source

Disk failover is simple and requires no extra remote components. But sometimes we may want to use another kind of data source, such as Redis, Mysql, etc.

Now we support extending the failover data source with SPI mechanism. Here are the steps:

Develop Your Own Failover Data Source

Write a class and implement the interface com.alibaba.nacos.client.naming.backups.FailoverDataSource:

public class MyFailoverDataSource implements FailoverDataSource {

    @Override
    public FailoverSwitch getSwitch() {
        // TODO write your own implementation.
        return null;
    }

    @Override
    public Map<String, FailoverData> getFailoverData() {
        // TODO write your own implementation. For naming module, the map
        // should contain failover data with service name as key and ServiceInfo as value
        return null;
    }
}

Configure Failover Data Source Class

Create a file under the resource root path:

{resource.root}/META-INF/services/com.alibaba.nacos.client.naming.backups.FailoverDataSource

One example of {resource.root} is src/main/resources.

The file content is:

your.package.MyFailoverDataSource