
Sessions made with node-zookeeper-client always have 1 outstanding request #100

Open
lobachpavel opened this issue Dec 14, 2019 · 3 comments


@lobachpavel

TL;DR: xid must not be zero in client requests: it breaks the session when request throttling kicks in on the server.

Background:

We are actively using ZooKeeper (3.4.x, but the issue applies to any version) in our infrastructure for storing various properties that can be changed at runtime.
There are multiple application instances listening for changes to these properties (~60k active watchers in total).
We have Java applications (connecting via Curator) as well as Node clients (connecting via node-zookeeper-client).

We are facing weird behaviour where sessions made from Node clients get expired and closed by the ZK server.
It usually happens when a new property (ZNode) gets created under a watched path, which causes applications to recursively re-read all ZNodes under this path.
Basically, in response to a single watch event, the ZK servers receive 20-40 'getdata' requests from each of the several hundred clients. globalOutstandingLimit is set to the default 1000, and I know it takes effect at these peak moments. The ZK server itself copes pretty well and responds to all these requests within a few hundred milliseconds. However, some Node clients (randomly) stop getting responses, and after 20 seconds (the negotiated session timeout) their sessions expire and the ZK servers disconnect them.

To answer the obvious question: the network is fine, it happens only to Node clients and never to Java clients, and they all run in the same VPC in the cloud.
I managed to take a tcpdump on the ZK servers at one of these moments, and there is always the same pattern on the affected nodes (ZK = server, C = client):

ZK -> C (watch, path: /xxx)
C -> ZK (getchildren, path: /xxx)
ZK -> C (getchildren response (many child nodes))
C -> ZK (getdata: path: /xxx/a)
C -> ZK (getdata: path: /xxx/b)
.... (getdata for each node from getchildren request)

Optionally there may be server responses for some of the getdata requests, but not all of them, e.g.:
ZK -> C (getdata response: path: /xxx/a)
C -> ZK PING
C -> ZK PING
... no responses for 20 sec, only TCP ACKs for each client request ...
ZK -> C TCP RST (connection closed)
In the ZK server logs there is:

Expiring session 0xXXXXXXXXXX, timeout of 20000ms exceeded

So it appears the ZK server starts ignoring requests from the client, including pings, while at the same time it keeps tracking the session and eventually terminates it, as if there had been no activity from the client at all!

Looking into the ZK code, I found only one suspicious place, in zookeeper-server/src/main/java/org/apache/zookeeper/server/NIOServerCnxn.java:

    protected void incrOutstandingRequests(RequestHeader h) {
        if (h.getXid() >= 0) {
            synchronized (this) {
                outstandingRequests++;
            }
            synchronized (this.factory) {
                // check throttling
                if (zkServer.getInProcess() > outstandingLimit) {
                    if (LOG.isDebugEnabled()) {
                        LOG.debug("Throttling recv " + zkServer.getInProcess());
                    }
                    disableRecv();
                    // ... (comments skipped)
                }
            }
        }
    }

This method is called for each enqueued client request from ZooKeeperServer.processPacket(). It increments outstandingRequests (which holds the number of enqueued requests for this particular connection) and implements the actual throttling logic:
If the total number of in-process requests is above outstandingLimit (1000 in our case), it calls disableRecv(), which effectively removes OP_READ from the client socket's interest ops and thereby prevents this connection from being selected for reading new data (incoming data then piles up in the socket's receive buffer, managed by the OS network stack). Nothing is wrong with this method itself.

But what is interesting is that the logic which adds OP_READ back to the selector is not 100% symmetric (see sendResponse() in the same class):

            if (h.getXid() > 0) {
                synchronized (this) {
                    outstandingRequests--;
                }
                // check throttling
                synchronized (this.factory) {
                    if (zkServer.getInProcess() < outstandingLimit
                            || outstandingRequests < 1) {
                        sk.selector().wakeup();
                        enableRecv();
                    }
                }
            }

Things to notice are:

  1. outstandingRequests gets decremented only when h.getXid() > 0 (as opposed to h.getXid() >= 0 when incrementing)
  2. enableRecv(), which actually re-enables OP_READ, gets called only if:
    a) the total number of in-process requests is below outstandingLimit (1000), or
    b) outstandingRequests (enqueued requests for this particular connection) is below 1

Implications of all this are:

  1. For any processed request with xid = 0, outstandingRequests does not get decremented and therefore never reaches zero
  2. (Given that 1 happened): when throttling kicks in (zkServer.getInProcess() >= outstandingLimit) and there are no more queued requests for the given connection, OP_READ never gets re-enabled, because no further events will happen for this connection and there is no other place that calls enableRecv(). A toy model of this scenario is sketched below.
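
To make the effect concrete, here is a toy model of the counter logic described above (plain Node.js, illustrative only; the function and parameter names are mine, not ZooKeeper's):

const outstandingLimit = 1000;
let outstandingRequests = 0;   // per-connection counter
let recvEnabled = true;        // models OP_READ being registered for the socket

// mirrors incrOutstandingRequests(): counts xid = 0 as well
function onRequestEnqueued(xid, globalInProcess) {
    if (xid >= 0) {
        outstandingRequests++;
        if (globalInProcess > outstandingLimit) {
            recvEnabled = false;   // disableRecv()
        }
    }
}

// mirrors sendResponse(): never decrements for xid = 0
function onResponseSent(xid, globalInProcess) {
    if (xid > 0) {
        outstandingRequests--;
        if (globalInProcess < outstandingLimit || outstandingRequests < 1) {
            recvEnabled = true;    // enableRecv()
        }
    }
}

// 1. A request with xid = 0 is processed while the server is idle.
onRequestEnqueued(0, 10);    // counter: 1
onResponseSent(0, 10);       // counter stays at 1 -- never decremented

// 2. A later request is handled while the cluster is above the global limit.
onRequestEnqueued(42, 1500); // counter: 2, recv disabled
onResponseSent(42, 1500);    // counter: 1 (never < 1), global still above the
                             // limit, so recv is never re-enabled for this socket
console.log(outstandingRequests, recvEnabled); // -> 1 false

The connection stays deaf from the server's point of view, which matches the tcpdump above: the client keeps pinging, the OS keeps ACKing, but ZK never reads the packets and eventually expires the session.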

Now, looking into the node-zookeeper-client code, I see that the first request always has xid = 0 (lib/ConnectionManager.js):

...
this.xid = 0;
...
	header.xid = this.xid;
	this.xid += 1;

Indeed, when I send the stat 4-letter command to our ZK server, I see that all our Node clients' connections have queued=1 from the start, and this value never goes down to zero.

Steps to reproduce:

  1. Run a ZooKeeper server of any version locally (the bundled zoo_sample.cfg is fine)
  2. Using zkCli, create a node /test with any data
  3. Clone node-zookeeper-client and run
    node examples/get.js localhost:2181 /test
    and keep it running in the background
  4. Execute echo stat | nc localhost 2181 several times

Expected result:

Clients:
 /127.0.0.1:62813[1](queued=0,recved=xxx,sent=yyy)

Actual result:

Clients:
 /127.0.0.1:62766[1](queued=1,recved=xxx,sent=yyy)

Proposed solution:

I'll create a story for the ZK folks, but it looks like 0 is an invalid value for xid in requests (although this is not stated anywhere).

I quickly looked into other ZK client libraries, and here are the results:
The bundled Java client library: xid starts at 1; no overflow protection.
The bundled C client library: xid starts at time() (Unix epoch) and is incremented for each subsequent request; no overflow protection.
The Python kazoo library: xid starts at 1 and wraps back to 1 when it reaches 2147483647.

The latter looks like the best option here.
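
For illustration, a minimal sketch of such a kazoo-style counter (the names are mine; this is not the actual ConnectionManager.js patch):

const MAX_XID = 2147483647;  // 2^31 - 1, the largest value the int32 xid field can hold

let xid = 1;  // start at 1 instead of 0

// hand out the current xid and wrap back to 1 before the int32 would overflow
function nextXid() {
    const current = xid;
    xid = (xid >= MAX_XID) ? 1 : xid + 1;
    return current;
}

console.log(nextXid()); // 1 -- the first request no longer goes out with xid 0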

@lobachpavel
Author

Follow-up ticket created in ZK JIRA: https://issues.apache.org/jira/browse/ZOOKEEPER-3650

Also, I found that the change for int overflow has already been made for the Java client: ZOOKEEPER-3253.
So the same change is definitely needed here, together with setting the starting value to 1.

Please review PR #101

@alexguan
Owner

Thanks, I will find some time to look at it this week.

@lobachpavel
Author

Hi @alexguan
Did you have a chance to check it?
