-
Notifications
You must be signed in to change notification settings - Fork 482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VITIS-13806 Fix return codes for xrt-smi #8708
Conversation
Signed-off-by: Ryan Chane <[email protected]>
Please test this on MCDM setup. We will have to probably change the inf file as well. This probably needs to be propogated to mcdm_sw_stack as well |
I have reached out to Akshata and confirmed the build on Windows MCDM. Akshata will take care of things on mcdm sw stack side! |
Hey @rchane @AShivangi. Can we really get rid of the loader scripts? At least on Linux, xrt-smi may be invoked directly as |
Hi @stsoe , if I understand correctly, we currently ask people to source the setup script on Linux in xdna-driver. Is this fine? |
Maybe, but AFAI can see, the change in this PR breaks the use-case I gave. I think it is reasonable to require that any tools we install, must be able to execute without any required setup. |
I see, thank you for the insight, Soren! I'll look into possible alternate solutions. |
Signed-off-by: Ryan Chane <[email protected]>
* Update GHE with actions v3 * Update clang-tidy-review.yml * Update clangtidy.yml * Update coverity2.yml * Update pipeline.yml * Update xrt_master_2025.1.yml * Update xrt_master_2025.1.yml
Amend Xilinx#8706. Use dummy value or XRT_DEV_COMPONENT when building only base component. Tempoary fix for cpackLin conditionally adding dependencies for legacy XRT when XRT_DEV_COMPONENT equals "xrt". We don't want the dependencies in case of base package so use dummy value to by-pass. Also, ensure that pcie/linux shim is built only when XRT_BASE is active. Linux shim is shared between npu and alveo, hence should be in base component only. Signed-off-by: Soren Soe <[email protected]>
* Add API to query system configuration Signed-off-by: AShivangi <[email protected]> * fix arg parsing Signed-off-by: AShivangi <[email protected]> * update copyright * add total_cols to static region --------- Signed-off-by: AShivangi <[email protected]>
* add hipMemsetD32Async(). Signed-off-by: Chiming Zhang <[email protected]> * fix typo. Signed-off-by: Chiming Zhang <[email protected]> * Replace hard coded name with __func__. Signed-off-by: Chiming Zhang <[email protected]> * fix staled pointer error. Signed-off-by: Chiming Zhang <[email protected]> * fix double allocation issue. Signed-off-by: Chiming Zhang <[email protected]> * fix incorrect share_ptr use. Signed-off-by: Chiming Zhang <[email protected]> * add hipMemsetD32Async(). Signed-off-by: Chiming Zhang <[email protected]> * fix typo. Signed-off-by: Chiming Zhang <[email protected]> * Replace hard coded name with __func__. Signed-off-by: Chiming Zhang <[email protected]> * fix staled pointer error. Signed-off-by: Chiming Zhang <[email protected]> * fix double allocation issue. Signed-off-by: Chiming Zhang <[email protected]> * 1) add lock in memory_database::get_hip_mem_from_addr() 2) avoid crash from getting hip mem object from null pointer 3) allow setting 0 for kernel arguments Signed-off-by: Chiming Zhang <[email protected]> * fix incorrect share_ptr use. Signed-off-by: Chiming Zhang <[email protected]> * Fix rebase error. Signed-off-by: Chiming Zhang <[email protected]> * move to_hex() into core/common/utils.h Signed-off-by: Chiming Zhang <[email protected]> * remove extra member in class copy_buffer. Signed-off-by: Chiming Zhang <[email protected]> * fix a typo. Signed-off-by: Chiming Zhang <[email protected]> * fix a typo. Signed-off-by: Chiming Zhang <[email protected]> * fix some typo and remove template use from copy_buffer class implementation. Signed-off-by: Chiming Zhang <[email protected]> * change type of err_msg argument of helper function throw_if() to const char*. Signed-off-by: Chiming <[email protected]> * fix incorrect usage of shared_ptr in copy_buffer constructor. Signed-off-by: Chiming Zhang <[email protected]> * change the place where host_vec is moved. Signed-off-by: Chiming Zhang <[email protected]> * add back missing std::move in copy_from_host_buffer_commad constructor. Signed-off-by: Chiming Zhang <[email protected]> * Add initial implementation of hip stream ordered memory allocator. Signed-off-by: Chiming Zhang <[email protected]> * Fix the size alignment in memory pool allocator. Signed-off-by: Chiming Zhang <[email protected]> * fix rebase error. Signed-off-by: Chiming Zhang <[email protected]> * fix rebase error. Signed-off-by: Chiming Zhang <[email protected]> * fix rebase error. Signed-off-by: Chiming Zhang <[email protected]> * Add comment for the choice of shared_ptr vs unique_ptr in in enqueing the async memcpy commands. Signed-off-by: Chiming Zhang <[email protected]> * Add comment for using shared_ptr for storage of pointer to memory_pool pointers. Signed-off-by: Chiming Zhang <[email protected]> * use unique_ptr for device_cache. Signed-off-by: Chiming <[email protected]> * Fix issues raised in code review. Signed-off-by: Chiming <[email protected]> * fix the error in memory::write(). Signed-off-by: Chiming <[email protected]> * remove curly braces. Signed-off-by: Chiming <[email protected]> * Fix issues in hipMallocAsync() and hipFreeAsync(). Signed-off-by: Chiming <[email protected]> * fix error found in unit testing. Signed-off-by: Chiming <[email protected]> * use sub class of hip::memory for async allocation from hip memory pool. Signed-off-by: Chiming <[email protected]> * fix the error in sub mem lookup from memory_database. Signed-off-by: Chiming <[email protected]> * remove sub_mem address map. Signed-off-by: Chiming <[email protected]> * code clean up. Signed-off-by: Chiming <[email protected]> * add code for hipDeviceGetDefaultMemPool(), hipDeviceGetMemPool() and hipDeviceSetMemPool(). Signed-off-by: Chiming <[email protected]> * fix nullptr error. Signed-off-by: Chiming <[email protected]> * Fix compile error on Linux. Signed-off-by: Chiming Zhang <[email protected]> * Fix compile warning caused by using "int" type. Signed-off-by: Chiming Zhang <[email protected]> * fix compile error in release builds. Signed-off-by: Chiming Zhang <[email protected]> * Fix compile error. Signed-off-by: Chiming Zhang <[email protected]> * remove wrong header files included. Signed-off-by: Chiming Zhang <[email protected]> --------- Signed-off-by: Chiming Zhang <[email protected]> Signed-off-by: Chiming <[email protected]> Co-authored-by: Chiming <[email protected]>
Fix include search paths per Xilinx#8648. Add config support for ROCm 6.2 Remove release of unused config.h Signed-off-by: Soren Soe <[email protected]>
Signed-off-by: Ryan Chane <[email protected]>
Apologies for the extra review requests; I messed up my rebase. Please ignore!!! |
Problem solved by the commit
https://jira.xilinx.com/browse/VITIS-13806
Bug / issue (if any) fixed, which PR introduced the bug, how it was discovered
Error codes returned by xrt-smi were being masked by loader and helper scripts such that return code was always zero.
How problem was solved, alternative solutions (if any) and why they were rejected
After discussion, we decided we no longer even needed the helper scripts. Similar to when the xbutil wrapper was removed on Windows for MCDM, the relevant helper scripts for xrt-smi were removed from Windows and Linux. In addition, xrt-smi has been moved up one directory to the bin directory.
Risks (if any) associated the changes in the commit
I will need to contact other teams regarding the removal of the scripts/the movement of xrt-smi for when they package xrt-smi with MCDM driver.
What has been tested and how, request additional testing if necessary
Tested on Linux, as desired, one can see the correct error code again.
Documentation impact (if any)
N/A