Undeclared inclusions building from source #1467

Open
mojitonoproblem opened this issue Oct 14, 2021 · 19 comments

Comments

@mojitonoproblem

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Linux Ubuntu 20.04

  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
    None

  • TensorFlow installed from (source or binary):
    None

  • TensorFlow version:

$ git branch
* develop-upstream

  • Python version:
$ python --version
Python 3.9.7

  • Installed using virtualenv? pip? conda?:
    conda
  • Bazel version (if compiling from source):
$ bazel --version
bazel 3.7.2

  • GCC/Compiler version (if compiling from source):
$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

  • ROCm/MIOpen version:
    4.3.0

  • GPU model and memory:
    Marketing Name: Hawaii PRO [Radeon R9 290/390]

Describe the problem

Trying to build, bazel gives the following error message:

ERROR: /home/minion/tensorflow-upstream/tensorflow/stream_executor/rocm/BUILD:393:11: undeclared inclusion(s) in rule '//tensorflow/stream_executor/rocm:rocm_helpers':  
this rule is missing dependency declarations for the following files included by 'tensorflow/stream_executor/rocm/rocm_helpers.cu.cc':                                   
  '/opt/rocm-4.3.0/hip/include/hip/hip_runtime.h'                                                                                                                        
  '/opt/rocm-4.3.0/hip/include/hip/hip_version.h'                                                                                                                        
  '/opt/rocm-4.3.0/hip/include/hip/hip_common.h'                                                                                                                         
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/hip_runtime.h'                                                                                                             
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/hip_common.h'
  '/opt/rocm-4.3.0/hip/include/hip/hip_runtime_api.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/hip_runtime_api.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/host_defines.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/driver_types.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/hip_texture_types.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/channel_descriptor.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/hip_vector_types.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/texture_types.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/hip_surface_types.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/hip_ldg.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/hip_atomic.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/device_functions.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/math_fwd.h'
  '/opt/rocm-4.3.0/hip/include/hip/hip_vector_types.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/device_library_decls.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/llvm_intrinsics.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/surface_functions.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/texture_fetch_functions.h'
  '/opt/rocm-4.3.0/hip/include/hip/hip_texture_types.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/ockl_image.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/texture_indirect_functions.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/math_functions.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/hip_fp16_math_fwd.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/hip_memory.h'
  '/opt/rocm-4.3.0/hip/include/hip/library_types.h'
  '/opt/rocm-4.3.0/hip/include/hip/amd_detail/library_types.h'
clang-13: warning: argument unused during compilation: '-fcuda-flush-denormals-to-zero' [-Wunused-command-line-argument]
Target //tensorflow/tools/pip_package:build_pip_package failed to build

Provide the exact sequence of commands / steps that you executed before running into the problem

$ ./configure
$ bazel build --verbose_failures //tensorflow/tools/pip_package:build_pip_package

Any other info / logs
I tried declaring the dependency by creating a symlink:
sudo ln -s /opt/rocm/include/ tensorflow/stream_executor/rocm/include

and adding the header to the rule:
cc_library(
    name = "rocm_helpers",
    srcs = ["rocm_helpers.cu.cc"],
    hdrs = ["include/hip/hip_runtime.h"],
    deps = [
        "@local_config_rocm//rocm:rocm_headers",
    ],
    copts = rocm_copts(),
    alwayslink = True,
)

but it only leads to another error (duplicate declaration).
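
The paths in the error message (/opt/rocm-4.3.0/hip/include/...) differ from the /opt/rocm/include/ path used in the symlink attempt, which suggests that symlinks may be involved. A minimal diagnostic sketch for checking where a header really lives is shown below. It is not a fix; `resolve_header` is a hypothetical helper name, and the default `/opt/rocm` prefix is an assumption about the local installation.

```shell
# Diagnostic sketch: print the real location of a ROCm header after all
# symlinks are resolved. resolve_header is a hypothetical helper, not part
# of ROCm or TensorFlow.
resolve_header() {
    local prefix="$1" header="$2"
    local candidate="$prefix/include/$header"
    if [ -e "$candidate" ]; then
        # readlink -f follows every symlink in the path (GNU coreutils).
        readlink -f "$candidate"
    else
        echo "not found: $candidate" >&2
        return 1
    fi
}

# Assumed default prefix; set ROCM_PATH if ROCm lives elsewhere.
resolve_header "${ROCM_PATH:-/opt/rocm}" "hip/hip_runtime.h" || true
```

If the printed path differs from the one named in the include directory, the header is being reached through a symlink, which is worth knowing before editing any BUILD rule.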

@reza-amd

reza-amd commented Oct 14, 2021

Could you please run the build_rocm_python3 script to start the build (it is located in the root folder of the repository)?

@xuhuisheng

@mojitonoproblem
Just curious, can Hawaii (R9 290/390) run properly on ROCm 4.3.0? The ROCm team said that only ROCm 1.9.3, which was released in 2018, supports Hawaii.
Have you tested any small samples, for example the HIP square sample?

@reza-amd

reza-amd commented Oct 14, 2021

@mojitonoproblem
I did not notice the GPU model you mentioned in the initial comment. Let me ping a member of the team for further assistance.
cc @sunway513

@mojitonoproblem
Author

@mojitonoproblem Just curious, can Hawaii (R9 290/390) run properly on ROCm 4.3.0? The ROCm team said that only ROCm 1.9.3, which was released in 2018, supports Hawaii. Have you tested any small samples, for example the HIP square sample?

I didn't know that. Please let me know how to run those samples and I will post the results. Thank you

@mojitonoproblem
Author

@mojitonoproblem I did not notice the GPU model you mentioned in the initial comment. Let me ping a member of the team for further assistance. cc @sunway513

Perfect. I hardcoded the paths to the ROCm installation directory and the Python binary, and it is building so far.

@mojitonoproblem
Author

@reza-amd I'm pasting the error resulting from running build_rocm_python3:

ERROR: /home/minion/.cache/bazel/_bazel_minion/95be990c2bc0fe49a10affcebca4a754/external/local_config_rocm/rocm/BUILD:129:11: @local_config_rocm//rocm:rocprim: missing input file 'external/local_config_rocm/rocm/rocm/include/hipcub/hipcub_version.hpp', owner: '@local_config_rocm//rocm:rocm/include/hipcub/hipcub_version.hpp'
Target //tensorflow/tools/pip_package:build_pip_package failed to build
ERROR: /home/minion/.cache/bazel/_bazel_minion/95be990c2bc0fe49a10affcebca4a754/external/local_config_rocm/rocm/BUILD:129:11 2 input file(s) do not exist
INFO: Elapsed time: 5816.222s, Critical Path: 115.56s
INFO: 3324 processes: 109 internal, 3215 local.
FAILED: Build did NOT complete successfully

Let me know if I can do anything to help. Thank you
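
A "missing input file" error names a file that the build expects to find inside the ROCm installation, so it can help to check whether that file exists on disk at all. A minimal sketch, assuming ROCm is installed under `/opt/rocm-4.3.0` as in the logs in this thread; `find_rocm_file` is a hypothetical helper name.

```shell
# Hypothetical helper: list every path below a ROCm prefix that matches
# a file name.
find_rocm_file() {
    local prefix="$1" name="$2"
    [ -d "$prefix" ] || { echo "no such directory: $prefix" >&2; return 1; }
    # -L makes find follow symbolic links while walking the tree.
    find -L "$prefix" -name "$name" 2>/dev/null
}

# Assumed prefix; set ROCM_PATH if ROCm lives elsewhere.
find_rocm_file "${ROCM_PATH:-/opt/rocm-4.3.0}" "hipcub_version.hpp" || true
```

If nothing is printed, the package that ships the header may be missing or incomplete. On a Debian-based system, `dpkg -S hipcub_version.hpp` shows which installed package, if any, owns a file with that name.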

@deven-amd

@mojitonoproblem Do you have the ROCM_PATH and TF_NEED_ROCM environment variables set when you run the configure command?
(See, for example, https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/build_rocm_python3#L31)

If not, please set them and retry.
If you are setting them and still running into the error, please paste your .tf_configure.bazelrc file here.

Your .tf_configure.bazelrc should look something like this:

root@ixt-rack-04:/root/tensorflow# cat .tf_configure.bazelrc 
build --action_env PYTHON_BIN_PATH="/usr/bin/python3"
build --action_env PYTHON_LIB_PATH="/usr/lib/python3/dist-packages"
build --python_path="/usr/bin/python3"
build --config=rocm
build --action_env ROCM_PATH="/opt/rocm-4.3.1"
build --action_env ROCBLAS_TENSILE_LIBPATH="/opt/rocm-4.3.1/lib/library"
build:opt --copt=-Wno-sign-compare
build:opt --host_copt=-Wno-sign-compare
test --flaky_test_attempts=3
test --test_size_filters=small,medium
test --test_env=LD_LIBRARY_PATH
test:v1 --test_tag_filters=-benchmark-test,-no_oss,-no_gpu,-oss_serial
test:v1 --build_tag_filters=-benchmark-test,-no_oss,-no_gpu
test:v2 --test_tag_filters=-benchmark-test,-no_oss,-no_gpu,-oss_serial,-v1only
test:v2 --build_tag_filters=-benchmark-test,-no_oss,-no_gpu,-v1only

Note the --action_env ROCM_PATH=... line.
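
The check described above can be sketched as a small script. It assumes it is run from the root of the TensorFlow checkout after `./configure` has finished; `has_rocm_path` is a hypothetical helper name, and the commented-out values (including `TF_NEED_ROCM=1`) are assumptions to adjust for the local setup.

```shell
# Hypothetical helper: succeed if a bazelrc file records ROCM_PATH
# as an --action_env entry.
has_rocm_path() {
    grep -q -- '--action_env ROCM_PATH=' "$1"
}

# Assumed example values; uncomment and adjust before running configure.
# export TF_NEED_ROCM=1
# export ROCM_PATH=/opt/rocm-4.3.1
# ./configure

rc=".tf_configure.bazelrc"
if [ -f "$rc" ] && has_rocm_path "$rc"; then
    # Show the recorded ROCm-related paths.
    grep -- 'ROCM_PATH' "$rc"
else
    echo "ROCM_PATH not recorded in $rc (or file missing)"
fi
```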

@mojitonoproblem
Author

@reza-amd Thank you for your kind help. I'm pasting the error from the last attempt (with the environment variables and Python path set):

ERROR: /home/minion/tensorflow-upstream/tensorflow/core/data/service/BUILD:544:23: //tensorflow/core/data/service:server_lib_headers_lib: missing input file 'external/local_config_rocm/rocm/rocm/include/hipcub/hipcub_version.hpp', owner: '@local_config_rocm//rocm:rocm/include/hipcub/hipcub_version.hpp'
Target //tensorflow/tools/pip_package:build_pip_package failed to build
ERROR: /home/minion/tensorflow-upstream/tensorflow/core/data/service/BUILD:544:23 2 input file(s) do not exist
INFO: Elapsed time: 5960.307s, Critical Path: 97.00s
INFO: 3406 processes: 56 internal, 3350 local.
FAILED: Build did NOT complete successfully

as well as .tf_configure.bazelrc, as requested:

$ cat tensorflow-upstream/.tf_configure.bazelrc 
build --action_env PYTHON_BIN_PATH="/home/minion/anaconda3/envs/ai/bin/python"
build --action_env PYTHON_LIB_PATH="/home/minion/anaconda3/envs/ai/lib/python3.9/site-packages"
build --python_path="/home/minion/anaconda3/envs/ai/bin/python"
build --config=rocm
build --action_env ROCM_PATH="/opt/rocm-4.3.0"
build --action_env ROCBLAS_TENSILE_LIBPATH="/opt/rocm-4.3.0/lib/library"
build:opt --copt=-Wno-sign-compare
build:opt --host_copt=-Wno-sign-compare
test --flaky_test_attempts=3
test --test_size_filters=small,medium
test --test_env=LD_LIBRARY_PATH
test:v1 --test_tag_filters=-benchmark-test,-no_oss,-no_gpu,-oss_serial
test:v1 --build_tag_filters=-benchmark-test,-no_oss,-no_gpu
test:v2 --test_tag_filters=-benchmark-test,-no_oss,-no_gpu,-oss_serial,-v1only
test:v2 --build_tag_filters=-benchmark-test,-no_oss,-no_gpu,-v1only

Thank you,

@deven-amd

It looks like you are using ROCm 4.3.0. Could you switch to ROCm 4.3.1 and try again? Thanks.

@mojitonoproblem
Author

@deven-amd It seems that there is a missing library, although it is actually installed:

$ /opt/rocm/bin/rocminfo
/opt/rocm/bin/rocminfo: error while loading shared libraries: libhsakmt.so.1: cannot open shared object file: No such file or directory

$ dpkg -L hsakmt-roct
/opt
/opt/rocm-4.3.0
/opt/rocm-4.3.0/lib
/opt/rocm-4.3.0/lib/libhsakmt.so.1.0.40300
/opt/rocm-4.3.0/share
/opt/rocm-4.3.0/share/doc
/opt/rocm-4.3.0/share/doc/hsakmt
/opt/rocm-4.3.0/share/doc/hsakmt/LICENSE.md
/opt/rocm-4.3.0/lib/libhsakmt.so
/opt/rocm-4.3.0/lib/libhsakmt.so.1

I cannot get it to work. Thanks
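
When a binary reports a missing shared library that is present on disk, a common next step is to ask the dynamic loader what it can and cannot resolve. The following is a diagnostic sketch only; `missing_libs` is a hypothetical helper name, and the paths are the ones quoted in the output above.

```shell
# Hypothetical helper: print the shared libraries a binary cannot resolve.
missing_libs() {
    local bin="$1"
    [ -x "$bin" ] || { echo "not executable: $bin" >&2; return 1; }
    # ldd marks unresolved dependencies with "not found".
    ldd "$bin" 2>/dev/null | grep 'not found'
}

missing_libs /opt/rocm/bin/rocminfo || true

# Show what /opt/rocm resolves to, if it exists.
if [ -e /opt/rocm ]; then readlink -f /opt/rocm; fi

# Check whether the loader cache knows about libhsakmt at all.
if command -v ldconfig >/dev/null; then
    ldconfig -p | grep libhsakmt || echo "libhsakmt not in loader cache"
fi
```

If the library directory turns out not to be on the loader's search path, the usual remedy on Linux is to add it through a file under /etc/ld.so.conf.d/ and rerun `ldconfig`. Whether that is the cause here is not established in this thread.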

@mojitonoproblem
Author

Please note that even with the 4.3.1 repository, the installation directory is named 4.3.0.

@jayfurmanek

jayfurmanek commented Oct 20, 2021

For ROCm 4.3.1, the directory should be /opt/rocm-4.3.1

# dpkg -L hsakmt-roct
/opt
/opt/rocm-4.3.1
/opt/rocm-4.3.1/lib
/opt/rocm-4.3.1/lib/libhsakmt.so.1.0.40301
/opt/rocm-4.3.1/share
/opt/rocm-4.3.1/share/doc
/opt/rocm-4.3.1/share/doc/hsakmt
/opt/rocm-4.3.1/share/doc/hsakmt/LICENSE.md
/opt/rocm-4.3.1/lib/libhsakmt.so
/opt/rocm-4.3.1/lib/libhsakmt.so.1

I think your ROCm install may not be quite right. Perhaps try removing it altogether and installing 4.3.1 fresh.

@jayfurmanek

Hi @mojitonoproblem,
Were you able to do a fresh ROCm 4.3.1 install and try again?

@mojitonoproblem
Author

mojitonoproblem commented Oct 27, 2021

Hi @jayfurmanek, not yet. I didn't want to add useless info to this thread. I started with a new install but cannot get it to work. I'm going to try again today.

$ /opt/rocm-4.3.1/bin/rocminfo 
ROCk module is loaded
Unable to open /dev/kfd read-write: Cannot allocate memory
$ dmesg | grep kfd
$ 
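
For a /dev/kfd failure like the one above, a few generic checks can narrow things down: whether the device node exists, who may open it, and whether the current user belongs to the owning group. This is a sketch that assumes GNU coreutils and the standard `amdgpu` kernel module; `kfd_report` is a hypothetical helper name, and nothing in this thread confirms that any of these checks resolves the error.

```shell
# Hypothetical helper: report basic access information for a device node.
kfd_report() {
    local dev="${1:-/dev/kfd}"
    if [ ! -e "$dev" ]; then
        echo "missing: $dev"
        return 1
    fi
    # Print permissions, owner and group of the device node.
    ls -l "$dev"
    # stat -c %G prints the owning group name (GNU coreutils).
    local grp
    grp=$(stat -c %G "$dev")
    if id -nG | tr ' ' '\n' | grep -qxF "$grp"; then
        echo "current user is in group $grp"
    else
        echo "current user is NOT in group $grp"
    fi
}

kfd_report /dev/kfd || true

# List loaded kernel modules related to the GPU driver, if any.
if command -v lsmod >/dev/null; then
    lsmod | grep -E 'amdgpu|amdkfd' || true
fi
```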

Thanks.

@mojitonoproblem
Author

I uninstalled, rebooted and reinstalled everything, but cannot get rid of the previous error message. Any hints?
Thanks in advance

@jayfurmanek

It seems your ROCm install is still not right. How did you remove/reinstall?
Maybe that will shed some light on what is going on here.

Also, note we did just move the top of the develop-upstream branch to be ROCm-4.5 based if you are still interested in building the latest.

@mojitonoproblem
Author

mojitonoproblem commented Nov 4, 2021

@jayfurmanek Thank you. I removed the 4.3.0 version using sudo apt purge rocm* comgr rock-dkms, then changed the apt source to point to 4.3.1, and then ran sudo apt install rocm-dkms (after apt update, of course).

Now I'm pulling develop-upstream and changing apt source to 4.5. After attempting to build I'll post my results.
Thanks

@mojitonoproblem
Author

It appears that it cannot find rock-dkms. I just rebooted after removing 4.3.1 and issued the following command:

$ sudo apt install rocm-dkms
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 rocm-dkms : Depends: rock-dkms but it is not installable
E: Unable to correct problems, you have held broken packages.
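
The message "Depends: rock-dkms but it is not installable" generally means that apt sees no installation candidate for that package in the configured sources. A hedged sketch for inspecting this with standard apt tooling follows; `pkg_candidate` is a hypothetical helper name, and the location of the ROCm sources entry is an assumption.

```shell
# Hypothetical helper: print the candidate version apt would install
# for a package ("(none)" when no source provides it).
pkg_candidate() {
    apt-cache policy "$1" 2>/dev/null | awk '/Candidate:/ {print $2}'
}

if command -v apt-cache >/dev/null; then
    for pkg in rocm-dkms rock-dkms; do
        echo "$pkg candidate: $(pkg_candidate "$pkg")"
    done
fi

# Show which ROCm repository lines are configured (location is an assumption).
grep -rh 'rocm' /etc/apt/sources.list /etc/apt/sources.list.d/ 2>/dev/null || true
```

An empty or "(none)" candidate for rock-dkms would indicate that the repository currently configured does not provide that package.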

@AliJahan

AliJahan commented May 2, 2023

@mojitonoproblem I was able to find a workaround for this issue (I am not sure if it is the right way).
I added "-I/opt/rocm/include/" to copts in the Bazel BUILD file. After adding this, it compiled successfully!
Hope it helps you as well!
