Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition nil channel panic when calling server.Shutdown() #6358

Open
Nintron27 opened this issue Jan 9, 2025 · 2 comments
Open

Race condition nil channel panic when calling server.Shutdown() #6358

Nintron27 opened this issue Jan 9, 2025 · 2 comments
Labels
defect Suspected defect such as a bug or regression

Comments

@Nintron27
Copy link

Observed behavior

When running NATS embedded in go, if you start the server then create a context from a system signal, such as SIGTERM, and
call server.Shutdown() on that contexts cancellation, it will cause a "close of nil channel" panic in server.(*Server).shutdownEventing.

I believe this is due to these chains of events:

  1. server.(*Server).Shutdown() calls server.(*Server).shutdownEventing()
  2. server.(*Server).shutdownEventing() first calls server.(*Server).eventsRunning()
  3. server.(*Server).eventsRunning() calls server.(*Server).isRunning()
  4. server.(*Server).isRunning() returns the result from the Server.running atomic bool, which will return true, because it is only set to false in later in server.(*Server).Shutdown()
  5. isRunning() returns true
  6. eventsRunning() returns true
  7. shutdownEventing() now thinks the server is running so it continues, and tries to close rc on line 1826 even though it will be nil.

Expected behavior

Multiple simultaneous calls shouldn't cause server.Shutdown() to panic.

Server and client version

server: v2.10.22
client: N/A

Host environment

OS: NixOS 24.11
CPU architecture: x86_64

Steps to reproduce

Steps

  1. Create and start an embedded nats-server in a Go program.
  2. Wait till server is ready for connections.
  3. Create a signal notify context (such as SIGTERM)
  4. Call server.Shutdown() when the signal's done channel is closed.

Example

package main

import (
	"context"
	"log"
	"os"
	"os/signal"
	"syscall"
	"time"

	"github.com/nats-io/nats-server/v2/server"
)

func main() {
	ns, err := server.NewServer(&server.Options{})
	if err != nil {
		panic(err)
	}

	ns.Start()

	if !ns.ReadyForConnections(time.Second * 5) {
		panic("Timeout exceeded")
	}

	sigCtx, cancel := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer cancel()

	<-sigCtx.Done()
	log.Println("Shutting down server...")

	log.Println(ns.Running())
	if ns.Running() {
		ns.Shutdown() // Causes panic, even though ns.Running() returns true
	}
	log.Println("shutdown started")
	ns.WaitForShutdown()

	log.Println("Shutdown completed")
}
@Nintron27 Nintron27 added the defect Suspected defect such as a bug or regression label Jan 9, 2025
@ripienaar
Copy link
Contributor

You s should set NoSigs on the options structure to avoid double signal handling.

@Nintron27
Copy link
Author

You s should set NoSigs on the options structure to avoid double signal handling.

Typically I would do that however the bug was discovered from a wrapper library that provides a Shutdown function with a timeout context. If that function checks if the nats-server has NoSigs set to true, then it can just call Shutdown without any worry, however, if NoSigs is set to false then it doesn't know if nats-server is already handling the shutdown, and should just wait till it's done, or call Shutdown itself since ns.Running() (in the example provided) returns true.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect Suspected defect such as a bug or regression
Projects
None yet
Development

No branches or pull requests

2 participants