Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VITIS-13806 Fix return codes for xrt-smi #8708

Closed
wants to merge 10 commits into from

Conversation

rchane
Copy link
Collaborator

@rchane rchane commented Jan 22, 2025

Problem solved by the commit

https://jira.xilinx.com/browse/VITIS-13806

Bug / issue (if any) fixed, which PR introduced the bug, how it was discovered

Error codes returned by xrt-smi were being masked by loader and helper scripts such that return code was always zero.

How problem was solved, alternative solutions (if any) and why they were rejected

After discussion, we decided we no longer even needed the helper scripts. Similar to when the xbutil wrapper was removed on Windows for MCDM, the relevant helper scripts for xrt-smi were removed from Windows and Linux. In addition, xrt-smi has been moved up one directory to the bin directory.

Risks (if any) associated the changes in the commit

I will need to contact other teams regarding the removal of the scripts/the movement of xrt-smi for when they package xrt-smi with MCDM driver.

What has been tested and how, request additional testing if necessary

Tested on Linux, as desired, one can see the correct error code again.

$ xrt-smi configure --pmode performance
ERROR: Please specify a device using --device option
 Available devices:

$ echo $?
1

Documentation impact (if any)

N/A

@AShivangi
Copy link
Collaborator

Please test this on MCDM setup. We will have to probably change the inf file as well. This probably needs to be propogated to mcdm_sw_stack as well

@rchane
Copy link
Collaborator Author

rchane commented Jan 23, 2025

I have reached out to Akshata and confirmed the build on Windows MCDM. Akshata will take care of things on mcdm sw stack side!

@stsoe
Copy link
Collaborator

stsoe commented Jan 23, 2025

Hey @rchane @AShivangi. Can we really get rid of the loader scripts? At least on Linux, xrt-smi may be invoked directly as /opt/xilinx/xrt/bin/xrt-smi. This will not work unless LD_LIBRARY_PATH, XILINX_XRT are set. The point of the loader was not only to intelligently decide between xbutil and xbutil2, but more importantly to not require sourcing a setup script. Of course on windows we do not need a loader as xrt-smi.exe will be able to find xrt_coreutil in system32/.

@rchane
Copy link
Collaborator Author

rchane commented Jan 24, 2025

Hi @stsoe , if I understand correctly, we currently ask people to source the setup script on Linux in xdna-driver. Is this fine?

@stsoe
Copy link
Collaborator

stsoe commented Jan 24, 2025

Hi @stsoe , if I understand correctly, we currently ask people to source the setup script on Linux in xdna-driver. Is this fine?

Maybe, but AFAI can see, the change in this PR breaks the use-case I gave. I think it is reasonable to require that any tools we install, must be able to execute without any required setup.

@rchane rchane added the do not merge hold off on merging label Jan 24, 2025
@rchane
Copy link
Collaborator Author

rchane commented Jan 24, 2025

Hi @stsoe , if I understand correctly, we currently ask people to source the setup script on Linux in xdna-driver. Is this fine?

Maybe, but AFAI can see, the change in this PR breaks the use-case I gave. I think it is reasonable to require that any tools we install, must be able to execute without any required setup.

I see, thank you for the insight, Soren! I'll look into possible alternate solutions.

rchane and others added 9 commits January 24, 2025 13:38
* Update GHE with actions v3

* Update clang-tidy-review.yml

* Update clangtidy.yml

* Update coverity2.yml

* Update pipeline.yml

* Update xrt_master_2025.1.yml

* Update xrt_master_2025.1.yml
Amend Xilinx#8706. Use dummy value or XRT_DEV_COMPONENT when building only
base component.

Tempoary fix for cpackLin conditionally adding dependencies for
legacy XRT when XRT_DEV_COMPONENT equals "xrt".  We don't want the
dependencies in case of base package so use dummy value to by-pass.

Also, ensure that pcie/linux shim is built only when XRT_BASE is
active.  Linux shim is shared between npu and alveo, hence should be
in base component only.

Signed-off-by: Soren Soe <[email protected]>
* Add API to query system configuration

Signed-off-by: AShivangi <[email protected]>

* fix arg parsing

Signed-off-by: AShivangi <[email protected]>

* update copyright

* add total_cols to static region

---------

Signed-off-by: AShivangi <[email protected]>
* add hipMemsetD32Async().

Signed-off-by: Chiming Zhang <[email protected]>

* fix typo.

Signed-off-by: Chiming Zhang <[email protected]>

* Replace hard coded name with __func__.

Signed-off-by: Chiming Zhang <[email protected]>

* fix staled pointer error.

Signed-off-by: Chiming Zhang <[email protected]>

* fix double allocation issue.

Signed-off-by: Chiming Zhang <[email protected]>

* fix incorrect share_ptr use.

Signed-off-by: Chiming Zhang <[email protected]>

* add hipMemsetD32Async().

Signed-off-by: Chiming Zhang <[email protected]>

* fix typo.

Signed-off-by: Chiming Zhang <[email protected]>

* Replace hard coded name with __func__.

Signed-off-by: Chiming Zhang <[email protected]>

* fix staled pointer error.

Signed-off-by: Chiming Zhang <[email protected]>

* fix double allocation issue.

Signed-off-by: Chiming Zhang <[email protected]>

* 1) add lock in memory_database::get_hip_mem_from_addr()
2) avoid crash from getting hip mem object from null pointer
3) allow setting 0 for kernel arguments

Signed-off-by: Chiming Zhang <[email protected]>

* fix incorrect share_ptr use.

Signed-off-by: Chiming Zhang <[email protected]>

* Fix rebase error.

Signed-off-by: Chiming Zhang <[email protected]>

* move to_hex() into core/common/utils.h

Signed-off-by: Chiming Zhang <[email protected]>

* remove extra member in class copy_buffer.

Signed-off-by: Chiming Zhang <[email protected]>

* fix a typo.

Signed-off-by: Chiming Zhang <[email protected]>

* fix a typo.

Signed-off-by: Chiming Zhang <[email protected]>

* fix some typo and remove template use from copy_buffer class implementation.

Signed-off-by: Chiming Zhang <[email protected]>

* change type of err_msg argument of helper function throw_if() to const char*.

Signed-off-by: Chiming <[email protected]>

* fix incorrect usage of shared_ptr in copy_buffer constructor.

Signed-off-by: Chiming Zhang <[email protected]>

* change the place where host_vec is moved.

Signed-off-by: Chiming Zhang <[email protected]>

* add back missing std::move in copy_from_host_buffer_commad constructor.

Signed-off-by: Chiming Zhang <[email protected]>

* Add initial implementation of hip stream ordered memory allocator.

Signed-off-by: Chiming Zhang <[email protected]>

* Fix the size alignment in memory pool allocator.

Signed-off-by: Chiming Zhang <[email protected]>

* fix rebase error.

Signed-off-by: Chiming Zhang <[email protected]>

* fix rebase error.

Signed-off-by: Chiming Zhang <[email protected]>

* fix rebase error.

Signed-off-by: Chiming Zhang <[email protected]>

* Add comment for the choice of shared_ptr vs unique_ptr in in enqueing the async memcpy commands.

Signed-off-by: Chiming Zhang <[email protected]>

* Add comment for using shared_ptr for storage of pointer to memory_pool pointers.

Signed-off-by: Chiming Zhang <[email protected]>

* use unique_ptr for device_cache.

Signed-off-by: Chiming <[email protected]>

* Fix issues raised in code review.

Signed-off-by: Chiming <[email protected]>

* fix the error in memory::write().

Signed-off-by: Chiming <[email protected]>

* remove curly braces.

Signed-off-by: Chiming <[email protected]>

* Fix issues in hipMallocAsync() and hipFreeAsync().

Signed-off-by: Chiming <[email protected]>

* fix error found in unit testing.

Signed-off-by: Chiming <[email protected]>

* use sub class of hip::memory for async allocation from hip memory pool.

Signed-off-by: Chiming <[email protected]>

* fix the error in sub mem lookup from memory_database.

Signed-off-by: Chiming <[email protected]>

* remove sub_mem address map.

Signed-off-by: Chiming <[email protected]>

* code clean up.

Signed-off-by: Chiming <[email protected]>

* add code for hipDeviceGetDefaultMemPool(), hipDeviceGetMemPool() and hipDeviceSetMemPool().

Signed-off-by: Chiming <[email protected]>

* fix nullptr error.

Signed-off-by: Chiming <[email protected]>

* Fix compile error on Linux.

Signed-off-by: Chiming Zhang <[email protected]>

* Fix compile warning caused by using "int" type.

Signed-off-by: Chiming Zhang <[email protected]>

* fix compile error in release builds.

Signed-off-by: Chiming Zhang <[email protected]>

* Fix compile error.

Signed-off-by: Chiming Zhang <[email protected]>

* remove wrong header files included.

Signed-off-by: Chiming Zhang <[email protected]>

---------

Signed-off-by: Chiming Zhang <[email protected]>
Signed-off-by: Chiming <[email protected]>
Co-authored-by: Chiming <[email protected]>
Fix include search paths per Xilinx#8648.
Add config support for ROCm 6.2
Remove release of unused config.h

Signed-off-by: Soren Soe <[email protected]>
@rchane
Copy link
Collaborator Author

rchane commented Jan 24, 2025

Apologies for the extra review requests; I messed up my rebase. Please ignore!!!

@rchane rchane closed this Jan 24, 2025
@rchane rchane deleted the VITIS-13806 branch January 24, 2025 21:49
@rchane rchane restored the VITIS-13806 branch January 24, 2025 22:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do not merge hold off on merging
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants