Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shell API unsafe to use outside of command handlers #84274

Open
AaronFontaine-DojoFive opened this issue Jan 21, 2025 · 2 comments
Open

Shell API unsafe to use outside of command handlers #84274

AaronFontaine-DojoFive opened this issue Jan 21, 2025 · 2 comments
Assignees
Labels
area: Shell Shell subsystem bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug

Comments

@AaronFontaine-DojoFive
Copy link

Describe the bug

The shell API is unsafe to call except from a shell's own thread.

The shell documentation says that the shell print functions (shell_fprintf(), shell_info(), shell_warn(), etc.) can be called from a command handler or from threads. In reality, it is only safe to call these functions (and certain other shell API functions) from a shell's own processing loop.

The problem is the use of the write mutex (sh->ctx->wr_mutex) in conjunction with the autoflush behavior assigned to sh->fprintf_ctx->ctrl_blk->autoflush. Autoflush causes any characters written to a shell to try to be flushed immediately, which will cause a blocking wait on SHELL_SIGNAL_TXDONE in shell_pend_on_txdone(), called from z_shell_write() called from z_shell_print_stream().

For certain transports, their physical interface may not be currently available. This is the case for USB CDC ACM UART if no USB host is connected, or if a USB host is connected, but no terminal is open on the serial device created by the CDC ACM. When this is the case, the k_poll() operation will pend indefinitely waiting for a client to whom the buffer can be sent. This is because k_poll() is called with K_FOREVER as a timeout.

When the block on k_poll() happens, it blocks all processing in the shell thread's processing loop and holds indefinitely the write mutex owned by that shell instance. Any attempts to perform operations on that UART from another thread will block (i.e. "lock up") the calling thread.

The unsafe functions are all of the shell print operations, shell_start(), and shell_execute_cmd(). shell_prompt_change() will not hang but will simply fail to change the prompt if the write mutex is unavailable.

There is not an option to change the autoflush behavior in any shell defined through the device tree since Z_SHELL_DEFINE() calls Z_SHELL_FPRINTF_DEFINE() with the autoflush parameter set to true. Changing this would require either patching the Zephyr source or creating one's own SHELL_DEFINE() macros and using them to instantiate their own shell instance outside of the Zephyr source.

An autostarting shell can hang in shell_start() called by shell_thread() before it enters its processing loop. There is no easy way to detect that this has occurred due to state_set() setting sh->ctx->state to SHELL_STATE_ACTIVE before it calls z_shell_print_prompt_and_cmd().

Nor is it safe to call shell_start() for a shell instance that is not auto-started due to the fact that it may block on k_poll waiting for SHELL_SIGNAL_TXDONE. Since a shell instance is abstracted from its transport by the struct shell_transport_api abstract interface, there is not a direct way of knowing the state of the shell's transport. The shell API is supposed to be usable regardless of the underlying transport.

Note that all of these issues also apply to a shell based on the Nordic UART service. Such a shell will also block indefinitely on SHELL_SIGNAL_TXDONE as long as no remote host is connected.

To Reproduce

  • Create a project with a UART-based shell.
  • Set zephyr,shell-uart to &cdc_acm_uart0 in the devicetree's chosen node.
  • Call shell_print() on the shell from main() or from any thread not belonging to the shell. (Use shell_backend_uart_get_ptr() to get a reference to the shell.)
  • Build and load the target board.
  • Ensure that MCU's USB is NOT connected to a host.
  • Power on or reset the board.
  • Observe that main() is stuck permanently in the call to shell_print(). (Use log statements or debugger to confirm this.)

Expected behavior

The shell API is safe to call from any non-ISR execution context.

Impact

High. This impacts the ability to implement an inactivity based timeout for the shell, and also to print warnings prior to the timeout. Using the pattern of a k_timer timeout handler invoking a k_work job, the workqueue invoked may become hung.

Security is important in our product and inactivity timeout is a requirement. I will need to find a suitable workaround that doesn't hang any of the threads.

Logs and console output

None. The device hangs then performs a watchdog reset. No logs are created when this happens other than the watchdog reset reason after reboot.

Environment (please complete the following information):

  • OS: WSL2 Ubuntu 22.04.4 LTS on Windows 10 Pro 22H2
  • Toolchain: Zephyr SDK 0.16.9
  • Version: 3.7.0 (commit 36940db)
@AaronFontaine-DojoFive AaronFontaine-DojoFive added the bug The issue is a bug, or the PR is fixing a bug label Jan 21, 2025
@pdgendt pdgendt added the area: Shell Shell subsystem label Jan 21, 2025
@jakub-uC
Copy link
Contributor

@AaronFontaine-DojoFive nice catch!

jakub-uC added a commit to jakub-uC/zephyr that referenced this issue Jan 21, 2025
Fixes an issue where the shell API could block indefinitely when called
from threads other than the shell's processing thread, especially when
the transport (e.g. USB CDC ACM) was unavailable or inactive.

Changes made:
1. Replaced `k_mutex_lock` calls with an indefinite timeout (`K_FOREVER`)
   by using a fixed timeout (`K_MSEC(SHELL_TX_MTX_TIMEOUT_MS)`) in shell
   API functions to prevent indefinite blocking.
2. Added a new Kconfig option `SHELL_PRINTF_AUTOFLUSH` to allow
   configurable autoflush behavior for shell printing functions.
3. Updated `Z_SHELL_FPRINTF_DEFINE` to use the
   `CONFIG_SHELL_PRINTF_AUTOFLUSH` setting instead of hardcoding
   the autoflush behavior to `true`.

Fixes zephyrproject-rtos#84274

Signed-off-by: Jakub Rzeszutko <[email protected]>
jakub-uC added a commit to jakub-uC/zephyr that referenced this issue Jan 21, 2025
Fixes an issue where the shell API could block indefinitely when called
from threads other than the shell's processing thread, especially when
the transport (e.g. USB CDC ACM) was unavailable or inactive.

Replaced `k_mutex_lock` calls with an indefinite timeout (`K_FOREVER`)
by using a fixed timeout (`K_MSEC(SHELL_TX_MTX_TIMEOUT_MS)`) in shell
API functions to prevent indefinite blocking.

Fixes zephyrproject-rtos#84274

Signed-off-by: Jakub Rzeszutko <[email protected]>
@kartben kartben added the priority: low Low impact/importance bug label Jan 21, 2025
@kartben
Copy link
Collaborator

kartben commented Jan 21, 2025

prioritizing as "low", mostly due to it having a PR now lined up

jakub-uC added a commit to jakub-uC/zephyr that referenced this issue Jan 22, 2025
Fixes an issue where the shell API could block indefinitely when called
from threads other than the shell's processing thread, especially when
the transport (e.g. USB CDC ACM) was unavailable or inactive.

Replaced `k_mutex_lock` calls with an indefinite timeout (`K_FOREVER`)
by using a fixed timeout (`K_MSEC(SHELL_TX_MTX_TIMEOUT_MS)`) in shell
API functions to prevent indefinite blocking.

Link: zephyrproject-rtos#84274

Signed-off-by: Jakub Rzeszutko <[email protected]>
kartben pushed a commit that referenced this issue Jan 23, 2025
Fixes an issue where the shell API could block indefinitely when called
from threads other than the shell's processing thread, especially when
the transport (e.g. USB CDC ACM) was unavailable or inactive.

Replaced `k_mutex_lock` calls with an indefinite timeout (`K_FOREVER`)
by using a fixed timeout (`K_MSEC(SHELL_TX_MTX_TIMEOUT_MS)`) in shell
API functions to prevent indefinite blocking.

Link: #84274

Signed-off-by: Jakub Rzeszutko <[email protected]>
zephyrbot pushed a commit that referenced this issue Jan 23, 2025
Fixes an issue where the shell API could block indefinitely when called
from threads other than the shell's processing thread, especially when
the transport (e.g. USB CDC ACM) was unavailable or inactive.

Replaced `k_mutex_lock` calls with an indefinite timeout (`K_FOREVER`)
by using a fixed timeout (`K_MSEC(SHELL_TX_MTX_TIMEOUT_MS)`) in shell
API functions to prevent indefinite blocking.

Link: #84274

Signed-off-by: Jakub Rzeszutko <[email protected]>
(cherry picked from commit b0a0feb)
jakub-uC added a commit that referenced this issue Jan 27, 2025
Fixes an issue where the shell API could block indefinitely when called
from threads other than the shell's processing thread, especially when
the transport (e.g. USB CDC ACM) was unavailable or inactive.

Replaced `k_mutex_lock` calls with an indefinite timeout (`K_FOREVER`)
by using a fixed timeout (`K_MSEC(SHELL_TX_MTX_TIMEOUT_MS)`) in shell
API functions to prevent indefinite blocking.

Link: #84274

Signed-off-by: Jakub Rzeszutko <[email protected]>
(cherry picked from commit b0a0feb)
dkalowsk pushed a commit that referenced this issue Jan 31, 2025
Fixes an issue where the shell API could block indefinitely when called
from threads other than the shell's processing thread, especially when
the transport (e.g. USB CDC ACM) was unavailable or inactive.

Replaced `k_mutex_lock` calls with an indefinite timeout (`K_FOREVER`)
by using a fixed timeout (`K_MSEC(SHELL_TX_MTX_TIMEOUT_MS)`) in shell
API functions to prevent indefinite blocking.

Link: #84274

Signed-off-by: Jakub Rzeszutko <[email protected]>
(cherry picked from commit b0a0feb)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Shell Shell subsystem bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug
Projects
None yet
Development

No branches or pull requests

4 participants