I've written a simple example to test the performance of MbedTLS. It's unoptimized and probably incorrect in some aspects, but I hope it shows the issue that I'm facing when using MbedTLS through HTTP.jl.
On a machine with 8 cores and 1.5 GB/s of NIC throughput, this achieves a bit less than 200 MB/s. CPU usage is 100%, and the run takes ~22s. mbedtls_gcm_update accounts for 40% of the profile, which means the CPU time spent in that function is ~70s (8 cores × 22s × 0.4). My assumption is that this function neither performs nor triggers network communication and does pure processing.
So the throughput of mbedtls_gcm_update is effectively ~58 MB/s per core on this machine.
This means that while the machine's NIC can do 1.5 GB/s, the time spent in mbedtls_gcm_update alone caps throughput at around ~464 MB/s for 8 cores in ideal conditions (no other CPU usage in the call stack), and it would take roughly 26 cores to saturate the NIC.
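The arithmetic behind these estimates can be reproduced with a short sketch. The 185 MB/s figure is an assumption standing in for "a bit less than 200 MB/s", and the function and parameter names are made up for illustration; the other inputs are the numbers quoted above.

```python
def estimate(cores=8, wall_s=22.0, gcm_share=0.40,
             observed_mb_s=185.0, nic_mb_s=1500.0):
    """Back-of-the-envelope estimate of per-core GCM throughput.

    observed_mb_s is an assumed value for "a bit less than 200 MB/s".
    """
    total_mb = observed_mb_s * wall_s          # data transferred during the run
    gcm_cpu_s = cores * wall_s * gcm_share     # CPU-seconds inside mbedtls_gcm_update
    per_core_mb_s = total_mb / gcm_cpu_s       # throughput of one core doing only GCM
    ceiling_mb_s = per_core_mb_s * cores       # best case if all cores did only GCM
    cores_for_nic = nic_mb_s / per_core_mb_s   # cores needed to saturate the NIC
    return {
        "gcm_cpu_s": gcm_cpu_s,
        "per_core_mb_s": per_core_mb_s,
        "ceiling_mb_s": ceiling_mb_s,
        "cores_for_nic": cores_for_nic,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:.1f}")
```

With these inputs the per-core figure comes out close to 58 MB/s and the core count needed for the NIC lands just under 26, so the numbers above are sensitive mainly to the exact observed throughput and to how accurately the profile attributes 40% to mbedtls_gcm_update.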
For comparison, a similar test (at a slightly higher level of abstraction) using HTTP PUT requests in Go, on the same machine, achieves ~1.5 GB/s, at which point the NIC's throughput becomes the bottleneck.
Are there any ideas for how mbedtls_gcm_update could be optimized? Is this worth submitting as an issue at https://github.com/Mbed-TLS/mbedtls? I am not sure whether the same thing happens when the library is used directly, without the Julia wrapper.
PProf profile file:
prof_ssl1.pb.gz
Here's a screenshot of the profile file opened using PProf:
