Help to understand a leak report with non-obvious cycle #866
-
Hi! I'm experimenting with Hereafter a minimal example to reproduce:
The use case is about later being capable C++ side to create models by name and run them (not part of the snippet bellow): C++ code#include <nanobind/nanobind.h>
#include <nanobind/trampoline.h>
#include <nanobind/stl/string.h>
#include <functional>
#include <iostream>
#include <memory>
#include <map>
namespace nb = nanobind;
using namespace nb::literals;
class Model
{
public:
Model() = default;
virtual double compute() = 0;
virtual ~Model() {};
};
class Registry
{
public:
using model_factory_type = std::function<std::unique_ptr<Model>()>;
Registry() = default;
void add(const std::string& name, const model_factory_type& factory)
{
m_reg[name] = factory;
}
void clear()
{
m_reg.clear();
}
private:
std::map<std::string, model_factory_type> m_reg{};
};
struct PyModel : Model
{
NB_TRAMPOLINE(Model, 1);
double compute() override
{
NB_OVERRIDE_PURE(compute);
}
};
/// Keep the model created C++ side alive
struct PyModelWrapper : Model
{
PyModelWrapper(nb::object obj)
: m_obj(obj)
{
p_model = nb::cast<Model*>(obj);
}
double compute() override
{
return p_model->compute();
}
nb::object m_obj;
Model* p_model;
};
/// Allows to store Python types referred by the factories for `wrapper_tp_traverse`
class PyRegistry : public Registry
{
public:
std::vector<PyObject*> refs;
};
int
wrapper_tp_clear(PyObject* self)
{
PyRegistry* reg = nb::inst_ptr<PyRegistry>(self);
reg->clear();
reg->refs.clear();
return 0;
}
int
wrapper_tp_traverse(PyObject* self, visitproc visit, void* arg)
{
PyRegistry* reg = nb::inst_ptr<PyRegistry>(self);
for (auto ref : reg->refs)
Py_VISIT(ref);
return 0;
}
PyType_Slot slots[] = { { Py_tp_traverse, (void*) wrapper_tp_traverse },
{ Py_tp_clear, (void*) wrapper_tp_clear },
{ 0, nullptr } };
NB_MODULE(testpy, m)
{
nb::class_<Model, PyModel>(m, "Model").def(nb::init<>());
nb::class_<Registry>(m, "RegistryBase");
nb::class_<PyRegistry, Registry>(m, "Registry" /*, nb::type_slots(slots)*/) // works when adding those 2 slots
.def(nb::init<>())
.def("add",
[](PyRegistry& reg, const std::string& name, nb::type_object type)
{
// this lambda will increment the ref counter of the Python model type
auto factory = [type]() -> std::unique_ptr<Model>
{
nb::object new_model = type();
return std::make_unique<PyModelWrapper>(new_model); // PyModelWrapper keeps the model instance alive
};
reg.add(name, factory);
reg.refs.push_back(type.ptr()); // only required for wrapper_tp_traverse
})
.def("clear", &Registry::clear);
} Python codefrom testpy import Registry, Model
class ModelA(Model):
def compute(self):
return 1.02
if __name__ == "__main__":
registry = Registry()
registry.add("ModelA", ModelA)
## Uncommenting one of the following lines resolves the leak detection without implementing the cyclic garbage collection (as expected)
# registry.clear()
# del registry When running the Python code I get: outputs> python test.py
nanobind: leaked 1 instances!
- leaked instance 0x784c41f69de8 of type "testpy.Registry"
nanobind: leaked 3 types!
- leaked type "testpy.Model"
- leaked type "testpy.Registry"
- leaked type "testpy.RegistryBase"
nanobind: leaked 4 functions!
- leaked function "__init__"
- leaked function "add"
- leaked function "clear"
- leaked function "__init__"
nanobind: this is likely caused by a reference counting issue in the binding code. It works if I manually clear the I hope I didn't miss something trivial and/or documented. |
Beta Was this translation helpful? Give feedback.
Replies: 1 comment 2 replies
-
When capturing the auto factory = [h = nb::handle(type)]() -> std::unique_ptr<Model>
{
nb::object new_model = h();
return std::make_unique<PyModelWrapper>(new_model);
}; but I would like to avoid this because it could segfault if the from testpy import Registry, Model
import gc
def register(registry):
class ModelA(Model):
def compute(self):
return 1.02
registry.add("ModelA", ModelA)
if __name__ == "__main__":
registry = Registry()
register(registry)
gc.collect()
# any use here of "ModelA" factory added to `registry` would segfault The previous example extends the lifetime of the type, but it is also the source of the leak report. |
Beta Was this translation helpful? Give feedback.
I am not 100% sure, these things are confusing to track down. In the past, I used objgraph to find the reference cycles.
It is possible that the class object captures some variables implicitly (like a lambda capture object in the case of function). Probably that includes the outer function, which may in turn depend on other things. It's likely that something references the module itself, which contains all of the other stuff.
Generally, when you create bindings for things that can capture arbitrary variables in Python, a
tp_traverse
andtp_clear
are unavoidable.