Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(events): convert syscall arg to name at processing stage #4563

Merged
merged 1 commit into from
Jan 30, 2025

Conversation

oshaked1
Copy link
Contributor

Since signatures now receive unparsed event arguments, conversion of syscall IDs to names became an issue since the signature has no way to know whether the event was generated on an x86_64 or ARM64 system. To solve this, we convert the syscall ID argument to its name at the event processing stage, so it happens regardless of argument parsing. This is applied to the suspicious_syscall_source and stack_pivot events.

Since signatures now receive unparsed event arguments, conversion of syscall IDs to names became an issue since the signature has no way to know whether the event was generated on an x86_64 or ARM64 system.
To solve this, we convert the syscall ID argument to its name at the event processing stage, so it happens regardless of argument parsing.
This is applied to the `suspicious_syscall_source` and `stack_pivot` events.
@geyslan
Copy link
Member

geyslan commented Jan 29, 2025

As discussed with @oshaked1 offline, this is required to consume tracee output through analyze, in a context that analyze (signatures) doesn't know the architecture from which the events were originated. I just wonder if providing that info in the event wouldn't be better - avoiding translations (int -> string -> int). RFC @yanivagman @NDStrahilevitz @rscampos.

@NDStrahilevitz
Copy link
Collaborator

As discussed with @oshaked1 offline, this is required to consume tracee output through analyze, in a context that analyze (signatures) doesn't know the architecture from which the events were originated. I just wonder if providing that info in the event wouldn't be better - avoiding translations (int -> string -> int). RFC @yanivagman @NDStrahilevitz @rscampos.

I think providing the host info, architecture in particular, is the solution and what immediately came to mind when I read the issue now.

@yanivagman
Copy link
Collaborator

yanivagman commented Jan 29, 2025

We actually have a solution exactly to that problem in the grpc server.
We assign a unique event ID to tracee events, so no matter what the CPU architecture is, the event id is the same:
https://github.com/aquasecurity/tracee/blob/main/pkg/server/grpc/tracee.go#L24

Once we will change the pipeline structure to use the protobuf struct, we can assign this event id in the decode stage

@oshaked1
Copy link
Contributor Author

We actually have a solution exactly to that problem in the grpc server. We assign a unique event ID to tracee events, so no matter what the CPU architecture is, the event id is the same: https://github.com/aquasecurity/tracee/blob/main/pkg/server/grpc/tracee.go#L24

Once we will change the pipeline structure to use the protobuf struct, we can assign this event id in the decode stage

@yanivagman how would that help with parsing syscall IDs?

@NDStrahilevitz
Copy link
Collaborator

NDStrahilevitz commented Jan 29, 2025

We actually have a solution exactly to that problem in the grpc server. We assign a unique event ID to tracee events, so no matter what the CPU architecture is, the event id is the same: https://github.com/aquasecurity/tracee/blob/main/pkg/server/grpc/tracee.go#L24

Once we will change the pipeline structure to use the protobuf struct, we can assign this event id in the decode stage

@yanivagman how would that help with parsing syscall IDs?

It would make tracee ids the syscall standard. so you consult the tracee event ids rather than the (unknown) architecture syscall id standard.

I don't think this has to be the solution though short term. there are other reasons to include this kind of host info in the event, considering that tracee usually runs as a daemonset.

@oshaked1
Copy link
Contributor Author

We actually have a solution exactly to that problem in the grpc server. We assign a unique event ID to tracee events, so no matter what the CPU architecture is, the event id is the same: https://github.com/aquasecurity/tracee/blob/main/pkg/server/grpc/tracee.go#L24
Once we will change the pipeline structure to use the protobuf struct, we can assign this event id in the decode stage

@yanivagman how would that help with parsing syscall IDs?

It would make tracee ids the syscall standard. so you consult the tracee event ids rather than the (unknown) architecture syscall id standard.

I don't think this has to be the solution though short term. there are other reasons to include this kind of host info in the event, considering that tracee usually runs as a daemonset.

I'm not sure I understand... Given a syscall ID in an event field, how do I find the corresponding syscall event if it has some sort of unique ID that is unrelated to the syscall ID? Or was the intention to add the unique event ID as an event field (or replace the syscall ID with the corresponding event unique ID) in the decode stage?

@yanivagman
Copy link
Collaborator

We actually have a solution exactly to that problem in the grpc server. We assign a unique event ID to tracee events, so no matter what the CPU architecture is, the event id is the same: https://github.com/aquasecurity/tracee/blob/main/pkg/server/grpc/tracee.go#L24
Once we will change the pipeline structure to use the protobuf struct, we can assign this event id in the decode stage

@yanivagman how would that help with parsing syscall IDs?

It would make tracee ids the syscall standard. so you consult the tracee event ids rather than the (unknown) architecture syscall id standard.

I don't think this has to be the solution though short term. there are other reasons to include this kind of host info in the event, considering that tracee usually runs as a daemonset.

I'm not sure I understand... Given a syscall ID in an event field, how do I find the corresponding syscall event if it has some sort of unique ID that is unrelated to the syscall ID? Or was the intention to add the unique event ID as an event field (or replace the syscall ID with the corresponding event unique ID) in the decode stage?

You can compare the unique event id to the enum given in tracee API (protobuf).
Anyway, this solution I mentioned is for event id, and in the case discussed here we discuss the syscall field. So it can be given as an inspiration for how to avoid the conversion from int to string and back to int. I'm also ok with always converting the syscall field to a string

Copy link
Collaborator

@yanivagman yanivagman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yanivagman yanivagman merged commit 7bd8324 into aquasecurity:main Jan 30, 2025
42 checks passed
@oshaked1 oshaked1 deleted the syscall_arg_type branch January 30, 2025 10:58
geyslan added a commit to geyslan/tracee that referenced this pull request Feb 10, 2025
fix(epbf): fix handling of compat tasks in syscall checkers

fix(ebpf): treat sched_process_exit corner cases

The sched_process_exit event may be triggered by a standard exit, such
as a syscall, or by alternative kernel paths, making it unsafe to assume
that it is always associated with a syscall exit.

do_exit and do_exit_group, while typically invoked by the exit and
exit_group syscalls, can also be reached through internal kernel
mechanisms such as signal handling. A concrete example of this occurs
when a syscall returns, enters signal handling, and subsequently calls
do_exit after get_signal. Both get_signal and do_exit involve
tracepoints.

A real execution flow illustrating this scenario in the kernel is as
follows:

entry_SYSCALL_64
  ├── do_syscall_64
  ├── syscall_exit_to_user_mode
  ├── __syscall_exit_to_user_mode_work
  ├── exit_to_user_mode_prepare
  ├── exit_to_user_mode_loop
  ├── arch_do_signal_or_restart
  ├── get_signal  (has signal_deliver tracepoint)
  ├── do_group_exit
  └── do_exit  (has sched_process_exit tracepoint)

feat(events): convert syscall arg to name at processing stage (aquasecurity#4563)

Since signatures now receive unparsed event arguments, conversion of syscall IDs to names became an issue since the signature has no way to know whether the event was generated on an x86_64 or ARM64 system.
To solve this, we convert the syscall ID argument to its name at the event processing stage, so it happens regardless of argument parsing.
This is applied to the `suspicious_syscall_source` and `stack_pivot` events.

fix: get exit code and signal values

When a process exits normally via exit(n), the exit code is
stored in the upper byte (exit_code << 8). The lower byte is
used for signal information if the process was terminated by
a signal.

Also, align the type of exit_code used at struct task_struct.

fix(ebpf): revise thread stack identification logic
Identifying thread stacks by a VMA is unreliable because VMA splitting
and joining may cause the searched VMA to not align nicely with the
thread stack VMA.
Instead, we can identify a thread stack by an address, making the
identification straight-forward.

fix(pipeline): fix stack-addresses not working (aquasecurity#4579)

fix: type parsing net_tcp_connect

The code attempted to cast "type" as a string, which caused issues
when the actual value was of a different type. So this commit changes
the type to int32, aligning it with parsers.SOCK_STREAM.Value().

test: add test for event net_tcp_connect

perf(proc): use formatters for procfs file paths

Since the type of the converted primitive is already known, formatter
helpers should be used to construct procfs file paths instead of relying
on `fmt.Sprintf`. Using `fmt.Sprintf` is relatively costly due to its
internal formatting logic, which is unnecessary for simple path
construction.

perf(proc): introduce ReadFile for /proc

`os.ReadFile` is not efficient for reading files in `/proc` because it
attempts to determine the file size before reading. This step is
unnecessary for `/proc` files, as they are virtual files with sizes
that are often reported as unknown or `0`.

`proc.ReadFile` is a new function designed specifically for reading
files in `/proc`. It reads directly into a buffer and is more efficient
than `os.ReadFile` because it allows tuning the initial buffer size to
better suit the characteristics of `/proc` files.

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^BenchmarkReadFile$
github.com/aquasecurity/tracee/pkg/utils/proc -benchtime=10000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/utils/proc
cpu: AMD Ryzen 9 7950X 16-Core Processor
BenchmarkReadFile/ProcFSReadFile/Empty_File-32        10000000  3525 ns/op  408 B/op  4 allocs/op
BenchmarkReadFile/OsReadFile/Empty_File-32            10000000  4070 ns/op  872 B/op  5 allocs/op
BenchmarkReadFile/ProcFSReadFile/Small_File-32        10000000  3961 ns/op  408 B/op  4 allocs/op
BenchmarkReadFile/OsReadFile/Small_File-32            10000000  4538 ns/op  872 B/op  5 allocs/op
BenchmarkReadFile/ProcFSReadFile/Exact_Buffer_Size-32 10000000  4229 ns/op  920 B/op  5 allocs/op
BenchmarkReadFile/OsReadFile/Exact_Buffer_Size-32     10000000  4523 ns/op  872 B/op  5 allocs/op
BenchmarkReadFile/ProcFSReadFile_/proc/self/stat-32   10000000  4043 ns/op  408 B/op  4 allocs/op
BenchmarkReadFile/OsReadFile_/proc/self/stat-32       10000000  4585 ns/op  872 B/op  5 allocs/op
PASS
ok  	github.com/aquasecurity/tracee/pkg/utils/proc	334.751s

perf(proc): improve stat file parsing

Remove the use of library functions to parse the stat file and instead
parse it manually (on the fly) to reduce the number of allocations and
improve performance.

chore(proc): align parsing of stat field with the formats size

This also align parsing sizes with the formats to avoid wrong parsing
of the stat file. The internal fields are represented aligned with the
actual kernel fields to avoid any confusion (signed/unsigned).

perf(proctree/proc): align fields to real size

Propagate values based on its real size which in most cases is smaller
than int (64-bit). This change reduces the memory footprint or at least
the stress on the stack/heap.

perf(proc): improve status file parsing

Remove the use of library functions to parse the status file and instead
parse it manually (on the fly) to reduce the number of allocations and
improve performance.

perf(proc): improve ns

Reduce ProcNS memory footprint by using the right member type sizes -
namespace id is an uint32, since it is the inode number in
struct ns_common.

This change also improves the performance of GetAllProcNS(), GetProcNS()
and GetMountNSFirstProcesses().

chore: introduce builders with specific fields

- NewProcStatFields()
- NewThreadStatFields()
- NewProcStatusFields()
- NewThreadStatusFields()

perf(proctree): remove stat call

Calling stat on /proc/<pid> would only increase the window for
process termination between the stat call and the read of the file.

This also replaces fmt.Sprintf with string concatenation and
strconv.FormatInt for better performance.

perf(proctree)!: rearrange struct fields

Mind the padding and the alignment of the fields to avoid wasting
memory.

BREAKING CHANGE: invoked_from_kernel, sched_process_exec event arg, is
now a bool.

perf(proctree): centralize child and thread Maps

Previously, each Process maintained its own maps for children and
threads, leading to significant overhead in the process tree. This
commit moves those maps into the ProcessTree, which now centrally
manages the children and threads for every process.

Additionally, this change allows us to simplify the Process struct by
removing the dedicated mutex that was solely used for protecting the
individual maps.

chore(proctree): use atomic types
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants