ti.init() failed on CUDA backend and complained about an undefined symbol __stack_chk_fail #7523

Closed
dream189free opened this issue Mar 9, 2023 · 4 comments

@dream189free
Contributor

dream189free commented Mar 9, 2023

Describe the bug
ti.init() fails on the CUDA backend with the error Undefined external symbol "__stack_chk_fail" when using Taichi built from source. The pre-built Taichi 1.4.1 from pip works fine.

To Reproduce

import taichi as ti
ti.init(
    debug=True,
    arch=ti.cuda,
    offline_cache=False,
    log_level=ti.TRACE)

Log/Screenshots

[Taichi] version 1.5.0, llvm 15.0.4, commit 930c4e21, linux, python 3.10.9
/home/lty/taichi/python/taichi/types/ndarray_type.py:91: DeprecationWarning: The element_dim and element_shape arguments for ndarray will be deprecated in v1.5.0, use matrix dtype instead.
  warnings.warn(
[T 03/09/23 17:22:37.160 205484] [cuda_driver.cpp:load_lib@39] libcuda.so loaded!
[T 03/09/23 17:22:37.160 205484] [cuda_driver.cpp:CUDADriver@59] CUDA driver API (v12.0) loaded.
[Taichi] Starting on arch=cuda
[T 03/09/23 17:22:37.160 205484] [program.cpp:Program@59] Program initializing...
[D 03/09/23 17:22:37.160 205487] [parallel_executor.cpp:worker_loop@71] Starting worker thread.
[D 03/09/23 17:22:37.160 205490] [parallel_executor.cpp:worker_loop@71] Starting worker thread.
[D 03/09/23 17:22:37.160 205488] [parallel_executor.cpp:worker_loop@71] Starting worker thread.
[D 03/09/23 17:22:37.160 205490] [parallel_executor.cpp:worker_loop@86] Worker thread initialized and running.
[D 03/09/23 17:22:37.160 205487] [parallel_executor.cpp:worker_loop@86] Worker thread initialized and running.
[D 03/09/23 17:22:37.160 205489] [parallel_executor.cpp:worker_loop@71] Starting worker thread.
[D 03/09/23 17:22:37.160 205489] [parallel_executor.cpp:worker_loop@86] Worker thread initialized and running.
[D 03/09/23 17:22:37.160 205488] [parallel_executor.cpp:worker_loop@86] Worker thread initialized and running.
[T 03/09/23 17:22:37.167 205484] [cuda_context.cpp:CUDAContext@25] Using CUDA device [id=0]: Tesla P4
[T 03/09/23 17:22:37.168 205484] [cuda_context.cpp:CUDAContext@50] CUDA Device Compute Capability: 6.1
[T 03/09/23 17:22:37.214 205484] [cuda_context.cpp:CUDAContext@56] Total memory 7.93 GB; free memory 7.79 GB
[T 03/09/23 17:22:37.214 205484] [cuda_context.cpp:CUDAContext@69] Emitting CUDA code for sm_61
[T 03/09/23 17:22:37.214 205484] [snode_tree_buffer_manager.cpp:SNodeTreeBufferManager@9] SNode tree buffer manager created.
[T 03/09/23 17:22:37.215 205484] [llvm_runtime_executor.cpp:LlvmRuntimeExecutor@111] CUDA max blocks per SM = 32
[T 03/09/23 17:22:37.215 205484] [llvm_context.cpp:TaichiLLVMContext@73] Creating Taichi llvm context for arch: cuda
[T 03/09/23 17:22:37.215 205484] [llvm_context.cpp:get_this_thread_data@865] Creating thread local data for thread 139676962867008
[T 03/09/23 17:22:37.279 205484] [llvm_context.cpp:TaichiLLVMContext@137] Taichi llvm context created.
[E 03/09/23 17:22:37.546 205484] [llvm_context.cpp:operator()@79] LLVM Fatal Error: Undefined external symbol "__stack_chk_fail"


Traceback (most recent call last):
  File "/home/lty/taichi/./tmp/test.py", line 5, in <module>
    ti.init(
  File "/home/lty/taichi/python/taichi/lang/misc.py", line 466, in init
    impl.get_runtime().create_program()
  File "/home/lty/taichi/python/taichi/lang/impl.py", line 346, in create_program
    self.prog = _ti_core.Program()
RuntimeError: [llvm_context.cpp:operator()@79] LLVM Fatal Error: Undefined external symbol "__stack_chk_fail"
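
With TRACE logging enabled, the one relevant line is easy to lose among the worker-thread and deprecation messages. The following is a minimal sketch for pulling out only the error-level lines. The line pattern is inferred from the log lines quoted in this issue, not from Taichi's logger source, and the names `LOG_LINE` and `error_lines` are made up for this illustration.

```python
import re

# Assumed layout, inferred from the log above:
#   [<level> <date> <time> <thread id>] [<file>:<function>@<line>] <message>
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z]) (?P<date>\S+) (?P<time>\S+) (?P<thread>\d+)\] "
    r"\[(?P<origin>[^\]]+)\] (?P<message>.*)$"
)


def error_lines(log_text, levels=("E",)):
    """Return (origin, message) pairs for log lines at the given levels."""
    found = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in levels:
            found.append((match.group("origin"), match.group("message")))
    return found


# A few lines copied from the log in this report.
sample = "\n".join([
    "[Taichi] version 1.5.0, llvm 15.0.4, commit 930c4e21, linux, python 3.10.9",
    "[T 03/09/23 17:22:37.160 205484] [cuda_driver.cpp:load_lib@39] libcuda.so loaded!",
    "[Taichi] Starting on arch=cuda",
    "[E 03/09/23 17:22:37.546 205484] [llvm_context.cpp:operator()@79] "
    'LLVM Fatal Error: Undefined external symbol "__stack_chk_fail"',
])

for origin, message in error_lines(sample):
    print(f"{origin}: {message}")
```

Lines that do not follow the assumed layout, such as the `[Taichi] ...` banners and the Python warnings, are simply skipped.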

Additional comments

System info: Arch Linux with

  • kernel version: 6.2.2
  • CUDA version: 11.5 (also tried 11.8)
  • LLVM version: 15.0.4, pre-built (also tried 15.0.5, built from source, following the instructions in docs)
  • clang version: 15.0.7
  • GCC version: 12.2.1
  • Python version: 3.10.9
  • NVIDIA GPU: Tesla P4 (also tried GT 730)

The output of command ti diagnose

[Taichi] version 1.5.0, llvm 15.0.5, commit 930c4e21, linux, python 3.10.9
/home/lty/taichi/python/taichi/types/ndarray_type.py:91: DeprecationWarning: The element_dim and element_shape arguments for ndarray will be deprecated in v1.5.0, use matrix dtype instead.
  warnings.warn(

*******************************************
**      Taichi Programming Language      **
*******************************************

Docs:   https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum:  https://forum.taichi.graphics/

Taichi system diagnose:

python: 3.10.9 (main, Dec 19 2022, 17:35:49) [GCC 12.2.0]
system: linux
executable: /usr/bin/python
platform: Linux-6.2.2-arch1-1-x86_64-with-glibc2.37
architecture: 64bit ELF
uname: uname_result(system='Linux', node='dell-optiplex', release='6.2.2-arch1-1', version='#1 SMP PREEMPT_DYNAMIC Fri, 03 Mar 2023 15:58:31 +0000', machine='x86_64')
locale: en_US.UTF-8
PATH: /opt/cuda-10.2/bin:/opt/cuda/bin:/home/lty/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl
PYTHONPATH: ['/home/lty/.local/bin', '/usr/lib/python310.zip', '/usr/lib/python3.10', '/usr/lib/python3.10/lib-dynload', '/home/lty/.local/lib/python3.10/site-packages', '/home/lty/taichi/python', '/usr/lib/python3.10/site-packages']

LSB Version:    n/a
Distributor ID: Arch
Description:    Arch Linux
Release:        rolling
Codename:       n/a



/home/lty/taichi/python/taichi/types/ndarray_type.py:91: DeprecationWarning: The element_dim and element_shape arguments for ndarray will be deprecated in v1.5.0, use matrix dtype instead.
  warnings.warn(
import: <module 'taichi' from '/home/lty/taichi/python/taichi/__init__.py'>

/home/lty/taichi/python/taichi/types/ndarray_type.py:91: DeprecationWarning: The element_dim and element_shape arguments for ndarray will be deprecated in v1.5.0, use matrix dtype instead.
  warnings.warn(
cc: True
/home/lty/taichi/python/taichi/types/ndarray_type.py:91: DeprecationWarning: The element_dim and element_shape arguments for ndarray will be deprecated in v1.5.0, use matrix dtype instead.
  warnings.warn(
cpu: True
/home/lty/taichi/python/taichi/types/ndarray_type.py:91: DeprecationWarning: The element_dim and element_shape arguments for ndarray will be deprecated in v1.5.0, use matrix dtype instead.
  warnings.warn(
metal: False
/home/lty/taichi/python/taichi/types/ndarray_type.py:91: DeprecationWarning: The element_dim and element_shape arguments for ndarray will be deprecated in v1.5.0, use matrix dtype instead.
  warnings.warn(
opengl: True
/home/lty/taichi/python/taichi/types/ndarray_type.py:91: DeprecationWarning: The element_dim and element_shape arguments for ndarray will be deprecated in v1.5.0, use matrix dtype instead.
  warnings.warn(
cuda: True
/home/lty/taichi/python/taichi/types/ndarray_type.py:91: DeprecationWarning: The element_dim and element_shape arguments for ndarray will be deprecated in v1.5.0, use matrix dtype instead.
  warnings.warn(
vulkan: True

OpenGL version 4.6.0 NVIDIA 525.89.02 is supported
GL_ARB_compute_shader:                                         OK
GL_ARB_gpu_shader_int64:                                       OK
GL_NV_shader_atomic_float:                                     OK
GL_NV_shader_atomic_float64:                                   OK
GL_NV_shader_atomic_int64:                                     OK

Thu Mar  9 22:01:23 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:01:00.0 Off |                  Off |
| N/A   30C    P0    22W /  75W |     34MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    166308      G   /usr/lib/Xorg                      34MiB |
+-----------------------------------------------------------------------------+

/home/lty/taichi/python/taichi/types/ndarray_type.py:91: DeprecationWarning: The element_dim and element_shape arguments for ndarray will be deprecated in v1.5.0, use matrix dtype instead.
  warnings.warn(
[Taichi] version 1.5.0, llvm 15.0.5, commit 930c4e21, linux, python 3.10.9

/home/lty/taichi/python/taichi/types/ndarray_type.py:91: DeprecationWarning: The element_dim and element_shape arguments for ndarray will be deprecated in v1.5.0, use matrix dtype instead.
  warnings.warn(
[Taichi] version 1.5.0, llvm 15.0.5, commit 930c4e21, linux, python 3.10.9
[Taichi] Starting on arch=x64

/home/lty/taichi/python/taichi/types/ndarray_type.py:91: DeprecationWarning: The element_dim and element_shape arguments for ndarray will be deprecated in v1.5.0, use matrix dtype instead.
  warnings.warn(
python: /home/lty/taichi/taichi/rhi/window_system.cpp:59: void taichi::lang::window_system::glfw_context_release(): Assertion `false && "GLFW context double release?"' failed.
Taichi OpenGL test failed: Command '['/usr/bin/python', '-c', 'import taichi as ti; ti.init(arch=ti.opengl)']' died with <Signals.SIGABRT: 6>.
/home/lty/taichi/python/taichi/types/ndarray_type.py:91: DeprecationWarning: The element_dim and element_shape arguments for ndarray will be deprecated in v1.5.0, use matrix dtype instead.
  warnings.warn(
[E 03/09/23 22:01:25.912 322752] [llvm_context.cpp:operator()@79] LLVM Fatal Error: Undefined external symbol "__stack_chk_fail"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/lty/taichi/python/taichi/lang/misc.py", line 466, in init
    impl.get_runtime().create_program()
  File "/home/lty/taichi/python/taichi/lang/impl.py", line 346, in create_program
    self.prog = _ti_core.Program()
RuntimeError: [llvm_context.cpp:operator()@79] LLVM Fatal Error: Undefined external symbol "__stack_chk_fail"
Taichi CUDA test failed: Command '['/usr/bin/python', '-c', 'import taichi as ti; ti.init(arch=ti.cuda)']' returned non-zero exit status 1.
/home/lty/taichi/python/taichi/types/ndarray_type.py:91: DeprecationWarning: The element_dim and element_shape arguments for ndarray will be deprecated in v1.5.0, use matrix dtype instead.
  warnings.warn(
[Taichi] version 1.5.0, llvm 15.0.5, commit 930c4e21, linux, python 3.10.9

*******************************************
**      Taichi Programming Language      **
*******************************************

Docs:   https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum:  https://forum.taichi.graphics/

                                   TAICHI EXAMPLES
 ────────────────────────────────────────────────────────────────────────────────────
  0: ad_gravity               25: laplace                 50: physarum
  1: circle_packing_image     26: laplace_equation        51: poisson_disk_sampling
  2: comet                    27: mandelbrot_zoom         52: print_offset
  3: cornell_box              28: marching_squares        53: rasterizer
  4: diff_sph                 29: mass_spring_3d_ggui     54: regression
  5: euler                    30: mass_spring_game        55: sdf_renderer
  6: explicit_activation      31: mass_spring_game_ggui   56: simple_derivative
  7: export_mesh              32: mciso_advanced          57: simple_texture
  8: export_ply               33: mgpcg                   58: simple_uv
  9: export_videos            34: mgpcg_advanced          59: snow_phaseField
  10: fem128                  35: minimal                 60: stable_fluid
  11: fem128_ggui             36: minimization            61: stable_fluid_ggui
  12: fem99                   37: mpm128                  62: stable_fluid_graph
  13: fractal                 38: mpm128_ggui             63: taichi_bitmasked
  14: fractal3d_ggui          39: mpm3d                   64: taichi_dynamic
  15: fullscreen              40: mpm3d_ggui              65: taichi_logo
  16: game_of_life            41: mpm88                   66: taichi_ngp
  17: gui_image_io            42: mpm88_graph             67: taichi_sparse
  18: gui_widgets             43: mpm99                   68: texture_graph
  19: implicit_fem            44: mpm_lagrangian_forces   69: tutorial
  20: implicit_mass_spring    45: nbody                   70: two_stream_instability
  21: initial_value_problem   46: odop_solar              71: vortex_rings
  22: jacobian                47: oit_renderer            72: waterwave
  23: karman_vortex_street    48: patterns
  24: keyboard                49: pbf2d
 ────────────────────────────────────────────────────────────────────────────────────
42
Running example minimal ...
[Taichi] Starting on arch=x64
42.0
>>> Running time: 0.38s

Consider attaching this log when maintainers ask about system information.
>>> Running time: 4.65s
@turbo0628
Member

This is pretty weird behavior. Is it on our CI bot?

Could it be a driver issue?

I have no problems with driver 495.29 on an RTX 3080.

@dream189free
Contributor Author

> This is pretty weird behavior. Is it on our CI bot?
>
> Could it be a driver issue?
>
> I have no problems with driver 495.29 on an RTX 3080.

No, it's my personal PC. I think some CUDA-related pre-compiled library was compiled with -fstack-protector (by the system package manager?), which Taichi's runtime does not understand. I was unable to reproduce this issue on Ubuntu 20.04 with the latest CUDA driver (525?) and an RTX 2060s. It seems Arch Linux is to blame for this :(
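
One way to test this hypothesis is to look for files that mention `__stack_chk_fail` at all. The sketch below is a rough heuristic, not a diagnosis: it scans the raw bytes of bitcode and object-like files under a directory for the symbol name. It assumes that symbol names are stored as plain bytes in these files, which as far as I know holds for ELF symbol tables and for the string table of recent LLVM bitcode. The function name `files_mentioning` and the suffix list are made up for this illustration, and the directory to scan has to be supplied by the user.

```python
import sys
from pathlib import Path

SYMBOL = b"__stack_chk_fail"


def files_mentioning(root, symbol=SYMBOL, suffixes=(".bc", ".so", ".o", ".a")):
    """Return files under `root` whose raw bytes contain `symbol`.

    This is a byte search, so a hit only means the name appears somewhere
    in the file, not that the symbol is actually left undefined.
    Versioned names such as `libfoo.so.1` are not matched by the suffix filter.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            if symbol in path.read_bytes():
                hits.append(path)
    return hits


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_symbol.py <dir> [<dir> ...]
    for root in sys.argv[1:]:
        for hit in files_mentioning(root):
            print(hit)
```

Comparing the hits from a local build tree with those from a pip-installed Taichi package directory might show which artifact introduces the reference. A tool such as `nm` or `llvm-nm` can then confirm whether the symbol is really undefined in that file.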

@FantasyVR moved this from Untriaged to Backlog in Taichi Lang on Mar 10, 2023
@FantasyVR added this to the v1.5.0 milestone on Mar 10, 2023
@jim19930609
Contributor

Per offline discussion with @dream189free, this error only happens with Taichi compiled locally. @dream189free further verified that Taichi-nightly works without this problem on his local workstation.

This issue shouldn't block the v1.5.0 release since it's more related to the local compilation environment.

@jim19930609 removed this from the v1.5.0 milestone on Mar 10, 2023
@github-project-automation (bot) moved this from Backlog to Done in Taichi Lang on Mar 17, 2023
@arunoruto

Sorry for commenting on a closed issue, but I am trying to package taichi in nixpkgs, and I am having the same error when running ti.init(arch=ti.cuda). The current version of the nix file is here: https://github.com/arunoruto/nixpkgs/blob/2f2b369b833fd3f4f0a3da0bc8ee27b9f5f7d927/pkgs/development/python-modules/taichi/default.nix

I am not really doing anything exotic. TI_WITH_CUDA is set to ON, TI_WITH_CUDA_TOOLKIT is OFF (setting it to ON results in the same error), and I provide the cudaPackages.cudatoolkit package as the library, which includes all CUDA dependencies. I also analyzed the CI's build.yaml file, but couldn't figure out what I am missing.
