
Sessions made with node-zookeeper-client always have 1 outstanding request #100

Open
lobachpavel opened this issue Dec 14, 2019 · 3 comments


@lobachpavel

TL;DR: xid must not be zero in client requests: it breaks the session when request throttling kicks in on the server.

Background:

We are actively using ZooKeeper (3.4.x, but the issue applies to any version) in our infrastructure for storing various properties that can be changed at runtime.
There are multiple application instances listening for changes to these properties (~60k active watchers in total).
We have Java applications (connecting via Curator) as well as Node clients (connecting via node-zookeeper-client).

We are facing weird behaviour where sessions made from Node clients get expired and closed by the ZK server.
It usually happens when a new property (ZNode) gets created under a watched path, which causes applications to recursively re-read all ZNodes under this path.
Basically, in response to a single watch event, the ZK servers receive 20-40 'getdata' requests from each of the several hundred clients. globalOutstandingLimit is set to the default 1000, and I know it takes effect at these peak moments. The ZK server itself copes pretty well and responds to all these requests within a few hundred milliseconds. However, some Node clients (randomly) stop getting responses, and after 20 seconds (the negotiated session timeout) their sessions expire and the ZK servers disconnect them.

To answer the obvious question: the network is fine, it happens only to Node clients and never to Java clients, and they all run in the same VPC in the cloud.
I managed to take a tcpdump on the ZK servers at one of these moments, and there is always the same pattern on the affected nodes (ZK = server, C = client):

ZK -> C (watch, path: /xxx)
C -> ZK (getchildren, path: /xxx)
ZK -> C (getchildren response (many child nodes))
C -> ZK (getdata: path: /xxx/a)
C -> ZK (getdata: path: /xxx/b)
.... (getdata for each node from getchildren request)

Optionally there may be server responses for some of the getdata requests, but not all of them, e.g.:
ZK -> C (getdata response: path: /xxx/a)
C -> ZK PING
C -> ZK PING
... no responses for 20 sec, only TCP ACKs for each client request ...
ZK -> C TCP RST (connection closed)
In the ZK server logs there is:

Expiring session 0xXXXXXXXXXX, timeout of 20000ms exceeded

So it appears the ZK server starts ignoring requests from the client, including pings, while at the same time it keeps tracking the session and eventually terminates it, as if there had been no activity from the client at all!

Looking into the ZK code, I found only one suspicious place, in zookeeper-server/src/main/java/org/apache/zookeeper/server/NIOServerCnxn.java:

    protected void incrOutstandingRequests(RequestHeader h) {
        if (h.getXid() >= 0) {
            synchronized (this) {
                outstandingRequests++;
            }
            synchronized (this.factory) {
                // check throttling
                if (zkServer.getInProcess() > outstandingLimit) {
                    if (LOG.isDebugEnabled()) {
                        LOG.debug("Throttling recv " + zkServer.getInProcess());
                    }
                    disableRecv();
                    // ... (comments skipped)
                }
            }
        }
    }

This method is called for each enqueued client request from ZooKeeperServer.processPacket(). It increments outstandingRequests (which holds the number of enqueued requests for this particular connection) and implements the actual throttling logic:
If the total number of in-process requests is above outstandingLimit (1000 in our case), it calls disableRecv(), which effectively removes OP_READ from the client socket's interest ops and thereby prevents this connection from being selected for reading new data (incoming data then piles up in the socket's receive buffer, managed by the OS network stack). Nothing is wrong with this method itself.

But what is interesting is that the logic which adds OP_READ back to the selector is not 100% symmetric (see sendResponse() in the same class):

            if (h.getXid() > 0) {
                synchronized (this) {
                    outstandingRequests--;
                }
                // check throttling
                synchronized (this.factory) {
                    if (zkServer.getInProcess() < outstandingLimit
                            || outstandingRequests < 1) {
                        sk.selector().wakeup();
                        enableRecv();
                    }
                }
            }

Things to notice are:

  1. outstandingRequests gets decremented only when h.getXid() > 0 (as opposed to h.getXid() >= 0 when incrementing)
  2. enableRecv(), which actually re-enables OP_READ, gets called only if:
    a) the total number of in-process requests is below outstandingLimit (1000), or
    b) outstandingRequests (enqueued requests for this particular connection) is below 1

Implications of all this are:

  1. For any processed request with xid = 0, outstandingRequests does not get decremented and therefore never reaches zero
  2. (Given that 1 happened): when throttling kicks in (zkServer.getInProcess() >= outstandingLimit) and there are no more queued requests for the given connection, OP_READ never gets re-enabled, because no further events will happen for this connection and there is no other place that calls enableRecv(). A toy model of this scenario is sketched below.
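
To make the effect concrete, here is a toy model of the counter logic described above (plain Node.js, illustrative only; the function and parameter names are mine, not ZooKeeper's):

const outstandingLimit = 1000;
let outstandingRequests = 0;   // per-connection counter
let recvEnabled = true;        // models OP_READ being registered for the socket

// mirrors incrOutstandingRequests(): counts xid = 0 as well
function onRequestEnqueued(xid, globalInProcess) {
    if (xid >= 0) {
        outstandingRequests++;
        if (globalInProcess > outstandingLimit) {
            recvEnabled = false;   // disableRecv()
        }
    }
}

// mirrors sendResponse(): never decrements for xid = 0
function onResponseSent(xid, globalInProcess) {
    if (xid > 0) {
        outstandingRequests--;
        if (globalInProcess < outstandingLimit || outstandingRequests < 1) {
            recvEnabled = true;    // enableRecv()
        }
    }
}

// 1. A request with xid = 0 is processed while the server is idle.
onRequestEnqueued(0, 10);    // counter: 1
onResponseSent(0, 10);       // counter stays at 1 -- never decremented

// 2. A later request is handled while the cluster is above the global limit.
onRequestEnqueued(42, 1500); // counter: 2, recv disabled
onResponseSent(42, 1500);    // counter: 1 (never < 1), global still above the
                             // limit, so recv is never re-enabled for this socket
console.log(outstandingRequests, recvEnabled); // -> 1 false

The connection stays deaf from the server's point of view, which matches the tcpdump above: the client keeps pinging, the OS keeps ACKing, but ZK never reads the packets and eventually expires the session.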

Now, looking into the node-zookeeper-client code, I see that the first request always has xid = 0 (lib/ConnectionManager.js):

...
this.xid = 0;
...
	header.xid = this.xid;
	this.xid += 1;

Indeed, when I send the stat 4-letter command to our ZK server, I see that all our Node clients' connections have queued=1 from the start, and this value never goes down to zero.

Steps to reproduce:

  1. Run a ZooKeeper server of any version locally (the bundled zoo_sample.cfg is fine)
  2. Using zkCli, create a node /test with any data
  3. Clone node-zookeeper-client and run
    node examples/get.js localhost:2181 /test
    and keep it running in the background
  4. Execute echo stat | nc localhost 2181 several times

Expected result:

Clients:
 /127.0.0.1:62813[1](queued=0,recved=xxx,sent=yyy)

Actual result:

Clients:
 /127.0.0.1:62766[1](queued=1,recved=xxx,sent=yyy)

Proposed solution:

I'll create a story for the ZK folks, but it looks like 0 is an invalid value for xid in requests (although this is not stated anywhere).

I quickly looked into other ZK client libraries, and here are the results:
The bundled Java client library: xid starts at 1; no overflow protection.
The bundled C client library: xid starts at time() (Unix epoch) and is incremented for each subsequent request; no overflow protection.
The Python kazoo library: xid starts at 1 and wraps back to 1 when it reaches 2147483647.

The latter looks like the best option here.
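
For illustration, a minimal sketch of such a kazoo-style counter (the names are mine; this is not the actual ConnectionManager.js patch):

const MAX_XID = 2147483647;  // 2^31 - 1, the largest value the int32 xid field can hold

let xid = 1;  // start at 1 instead of 0

// hand out the current xid and wrap back to 1 before the int32 would overflow
function nextXid() {
    const current = xid;
    xid = (xid >= MAX_XID) ? 1 : xid + 1;
    return current;
}

console.log(nextXid()); // 1 -- the first request no longer goes out with xid 0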

@lobachpavel
Author

Follow-up ticket created in ZK JIRA: https://issues.apache.org/jira/browse/ZOOKEEPER-3650

Also, I found that the change for int overflow has already been made for the Java client: ZOOKEEPER-3253.
So the same change is definitely needed here, together with setting the starting value to 1.

Please review PR #101

@alexguan
Owner

Thanks, I will find some time to look at it this week.

@lobachpavel
Author

Hi @alexguan
Did you have a chance to check it?
