It takes slightly more than 125 milliseconds (2^27 bits at 2^30 bits per second, plus protocol overhead) to transfer 16MiB of data through a TCP connection over a 1Gib/s link.
The Linux default 128KiB socket receive buffer fills in ~0.976 milliseconds when receiving from a 1Gib/s link. A receiving process must therefore drain the 128KiB socket receive buffer at a minimum rate of 1024Hz just to keep up with the link's 1Gib/s rate at all. Such a rate is possible, but not sustainable with default distribution kernels and configurations.
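The arithmetic behind those figures (taking 1Gib/s as 2^30 bits per second) can be checked with a one-liner:

```shell
# 128KiB receive buffer draining against a 1Gib/s (2^30 bits/s) link
awk 'BEGIN {
  link_bps = 2^30            # link rate, bits per second
  buf_bits = 128 * 1024 * 8  # 128KiB socket receive buffer, in bits
  fill_ms  = buf_bits / link_bps * 1000
  printf "buffer fills in %.3f ms -> %.0f reads/s required\n", fill_ms, 1000 / fill_ms
}'
# → buffer fills in 0.977 ms -> 1024 reads/s required
```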
CoDel buffer-bloat prevention minimises network transfer latency at the expense of throughput by keeping network adaptor buffers small; the 128KiB default socket buffer sizes (hardly adequate for 1Gib/s links) aggravate the effect. CoDel is only meaningful when hosts are capable of fully saturating/congesting a network link and minimising latency is worth the price of lower throughput.
Using sshfs in high-speed local networks demands maximum throughput rather than low latency: the extreme opposite of the CoDel latency minimisation achieved by making network adaptor buffers tiny.
Tiny buffers minimise Ethernet frame queue sizes and, hence, queuing delays, which throttles senders in connections with flow-control, such as TCP. TCP peers explicitly advertise their available receive buffer capacity, so a TCP sender stops sending further segments once the receiver advertises a zero-sized receive window, and resumes only when a non-zero window is advertised again.
UDP has no flow-control, so protocols built on UDP datagrams, such as WireGuard, end up hammering the CPU with non-blocking write/send/sendto syscall retries when a tiny network adaptor send buffer gets full and rejects queuing the next Ethernet frame.
ssh compression throughput is limited to ~200MiB/s when compressing on a CPU core running at 3.5GHz (top cloud CPU cores run at 2.8GHz).
With ssh compression disabled, ssh encryption becomes the next data transfer bottleneck. No ssh cipher is capable of encrypting/decrypting at or above a 1Gib/s rate, apart from aes128-gcm@openssh.com running on CPUs with the AES-NI instruction set.
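On x86 Linux hosts, AES-NI support can be checked by looking for the aes CPU flag (a benchmark such as openssl speed -evp aes-128-gcm then gives a rough single-core throughput ceiling):

```shell
# AES-NI shows up as the 'aes' flag in /proc/cpuinfo on x86 Linux
if grep -q -m1 '\baes\b' /proc/cpuinfo; then
  echo "AES-NI available: aes128-gcm@openssh.com will be hardware-accelerated"
else
  echo "no AES-NI: the cipher override buys little here"
fi
```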
With the above facts in mind, maximising throughput for sshfs mounts over 1Gib/s or faster network links requires invoking sshfs with the additional option -o compression=no,Ciphers=aes128-gcm@openssh.com to remove the ssh data transfer bottlenecks, provided the AES-NI instruction set is available on both peers of the ssh connection.
E.g. before:
sshfs -o reconnect,idmap=user,noatime ...
After:
sshfs -o reconnect,idmap=user,noatime -o compression=no,Ciphers=aes128-gcm@openssh.com ...
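Before relying on the cipher override, it is worth confirming that the local OpenSSH client actually offers it (the server side must offer it too):

```shell
# List the ciphers this ssh client supports and confirm the AES-GCM cipher is among them
ssh -Q cipher | grep -x 'aes128-gcm@openssh.com' \
  && echo "cipher available" \
  || echo "cipher NOT available: keep the default cipher list"
```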
Next, maximising the throughput of 1Gib/s or faster network links requires raising the socket buffer size limits and enabling much larger socket buffer sizes by default.
Using a much larger sshfs data transfer size per request than the default 256 FUSE pages (controlled by fs.fuse.max_pages_limit on recent kernels) also significantly improves sshfs performance over high-bandwidth network connections.
These parameters have to be configured with sysctl. E.g.:
# /etc/sysctl.d/98-maximize-net-throughput.conf
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
fs.fuse.max_pages_limit = 32768
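The new values take effect after reloading the sysctl configuration; note that fs.fuse.max_pages_limit only exists on recent kernels, and older ones will report it as an unknown key:

```shell
# Reload all sysctl configuration files (as root), then verify the values took effect
sudo sysctl --system || echo "re-run as root"
sysctl -n net.core.rmem_default net.core.wmem_default net.core.rmem_max net.core.wmem_max
```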
Furthermore, enabling jumbo Ethernet frames reduces the overhead of Ethernet frame headers and trailers in large data transfers. Jumbo ~9KiB Ethernet frames improve data transfer speeds by 5-10%, when/if all hosts involved in the particular route support jumbo Ethernet frame sizes: e.g. both peers in the same network and the switch/router the peers are connected to.
Jumbo Ethernet frames are enabled by explicitly setting the connection/link MTU to the maximum MTU supported by the particular network adaptor (the adaptor's maxmtu in ip -d link show output), instead of the default "auto" (1500-byte) MTU in network connection settings.
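For example (eth0 is a placeholder; substitute the actual interface name):

```shell
# Find the largest MTU the adaptor supports ('maxmtu' in the detailed link output)
ip -d link show eth0 | grep -o 'maxmtu [0-9]*'

# Pin the link MTU to the adaptor maximum, e.g. a 9000-byte jumbo MTU (requires root)
sudo ip link set dev eth0 mtu 9000
```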
Wireless adaptors support jumbo frames too, contrary to claims elsewhere. The laptop this text is typed on, for example, supports 9216-byte jumbo frames for wired Ethernet and 2304-byte frames over WiFi.