vNIC Networking
Existing performance improvement techniques:
Minimize copying during Tx
Make use of TSO (TCP Segmentation Offload)
Moderate the virtual interrupt rate (heuristic)
Generally reduce the number of “VMExits”
NetQueue for scaling with multiple vNICs
Limited use of LRO (for guest OSes that support it; see the sketch after this list)
VMDirectPath technology
Direct VM access to device hardware
FPT (Fixed Passthrough) in ESX 4.0; not compatible with VMotion
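Inside a Linux guest, the state of the Tx/Rx offloads listed above (TSO, LRO) can be checked with ethtool; a minimal sketch follows, assuming the vNIC appears as eth0 and ethtool is installed in the guest. NetQueue and VMDirectPath are hypervisor-side features and are not visible this way.

import subprocess

def offload_features(iface: str = "eth0") -> dict:
    """Return a {feature: state} map parsed from `ethtool -k <iface>`."""
    out = subprocess.run(["ethtool", "-k", iface],
                         capture_output=True, text=True, check=True).stdout
    features = {}
    for line in out.splitlines():
        if ":" in line:
            name, _, state = line.partition(":")
            words = state.split()
            features[name.strip()] = words[0] if words else ""
    return features

if __name__ == "__main__":
    feats = offload_features("eth0")          # "eth0" is an assumed interface name
    for key in ("tcp-segmentation-offload", "large-receive-offload"):
        print(f"{key}: {feats.get(key, 'unknown')}")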
Ways of Measuring Virtual Networking Performance
Metrics
Bandwidth
Packet rate, particularly when packet sizes are small (see the sketch after this list)
Scaling within VM
Increase number of connections
Increase number of vCPUs
Scaling across VMs
Increase number of VMs
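To see why packet rate is tracked alongside bandwidth, the rough Python sketch below converts a throughput figure into the packet rate it implies; the bytes-per-packet value is a simplifying assumption, and a real measurement would read NIC or VM packet counters instead.

def packets_per_second(throughput_gbps: float, bytes_per_packet: int = 1500) -> float:
    """Approximate packet rate implied by a given throughput and packet size."""
    bytes_per_second = throughput_gbps * 1e9 / 8
    return bytes_per_second / bytes_per_packet

if __name__ == "__main__":
    # Near-wirespeed 10GigE with standard-MTU-sized packets lands in the
    # high hundreds of thousands of packets per second.
    print(f"{packets_per_second(9.4):,.0f} pkts/s at ~9.4 Gbps, std MTU")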
Test Platform Systems:
ESX
2-socket, Quad-core Intel Xeon X5560 @ 2.80 GHz (Nehalem) system
Each core has L1 and 256KB L2 caches
Each socket has shared 8MB L3 cache
6 GB RAM (DDR3-1066 MHz)
pNIC: Intel 82598EB (Oplin) 10GigE, 8x PCIe
ESX 4.0
Other machine
2-socket Intel Xeon X5335 @ 2.66 GHz (Clovertown)
RHEL 5.1
Intel Oplin 10GigE NIC (ixgbe driver, version 1.3.16.1-lro): 8 RxQs, 1 TxQ
16GB RAM
Microbenchmark:
Netperf, 5 TCP connections
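A minimal driver for this kind of microbenchmark is sketched below, assuming netperf is installed on both ends; the target host, run length, and socket/message sizes are illustrative placeholders rather than the exact configuration used in the talk.

import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_stream(host: str, secs: int, sock_bytes: int, msg_bytes: int) -> str:
    """Run one netperf TCP_STREAM test with explicit socket and message sizes."""
    cmd = ["netperf", "-H", host, "-t", "TCP_STREAM", "-l", str(secs),
           "--", "-s", str(sock_bytes), "-S", str(sock_bytes), "-m", str(msg_bytes)]
    return subprocess.run(cmd, capture_output=True, text=True).stdout

if __name__ == "__main__":
    host, connections = "10.0.0.2", 5          # placeholder target; 5 parallel connections
    with ThreadPoolExecutor(max_workers=connections) as pool:
        futures = [pool.submit(run_stream, host, 60, 65536, 16384)
                   for _ in range(connections)]
        for f in futures:
            print(f.result())                  # per-connection netperf report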
Single vNIC TCP Performance: Linux VM
Results with RHEL5 VM:
Test configs:
Spectrum of socket and message sizes
Tx and Rx both reach ~9 Gbps (~wire speed) with 64 kB or auto-tuned socket sizes
Rx bandwidth of 9+ Gbps => over 800k Rx pkts/s (std MTU-size pkts)
Very small 8k socket size
Latency bound
Reaches ~2 Gbps throughput
Number of vCPUs makes little difference in this micro-benchmark
Slight drop in Rx throughput going from 2 to 4 vCPUs due to cache effects
vSMP provides additional CPU cycles for applications
Single vNIC TCP Performance: Windows VM
Results with Windows Server 2008 VM (Enterprise Edition, SP1):
Very similar to Linux VM performance; key differences:
Windows Tx does not use auto-tuning
Rx throughput reaches a peak of ~9 Gbps with 2 vCPUs
Rx throughput higher than Linux at smaller socket sizes for vSMP configurations
TCP Throughput Scaling with # Connections
Results with Win2k8, 2-vCPU VM:
Large socket size runs:
Reach 9+ Gbps with very few connections (just over 4)
Small socket size, moderate message size:
Throughput continues to scale as the number of connections increases to 20
Latency bound
Small socket, very small message size:
Throughput flattens out at close to 3 Gbps for Rx and close to 2 Gbps for Tx (see the socket-buffer sketch below)
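"Socket size" in these tests refers to the TCP socket buffer limits (SO_SNDBUF/SO_RCVBUF). The sketch below shows how an 8 kB buffer would be requested from Python; a buffer this small bounds the amount of unacknowledged data in flight, which is why a single connection is latency-bound and many connections are needed to approach line rate.

import socket

def make_small_buffer_socket(sock_bytes: int = 8192) -> socket.socket:
    """Create a TCP socket with explicit (small) send/receive buffer limits."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sock_bytes)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, sock_bytes)
    return s

if __name__ == "__main__":
    s = make_small_buffer_socket(8192)
    # The kernel may round the requested value up (Linux typically doubles it).
    print("effective SO_SNDBUF:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    s.close()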
Multi-VM Scaling: RHEL5 UP VMs
VMs are UP, RHEL5 VMs
For large socket, 9+Gbps (wirespeed) sets the limit
Slight throughput increase going to 2 VMs
No throughput drop as more VMs are added
For small socket size, throughput scales as more VMs are added
For Tx, all the way through 8 VMs
For Rx, scaling flattens out after 4 to 5 VMs
In all cases, aggregate throughput exceeds 5 Gbps
No scalability limit imposed by virtualization, only the physical limit! :)
Multi-VM Scaling: Win2k8 UP VMs
VMs are UP, Win2k8 VMs
For large socket, 9+Gbps (wirespeed) sets the limit
Very similar to Linux VM case; differences:
Large socket size: slightly lower Rx throughput at single VM
Small socket, moderate message size (512): Rx scales extremely well, reaching 9+Gbps
Small socket sizes: Tx throughput somewhat lower than achieved with RHEL5 VM
In all cases, aggregate throughput exceeds 4 Gbps
Key difference at the medium-socket Rx case (8K socket, 512-byte message): Windows achieves higher throughput than Linux because Windows generates more acknowledgments
TSO’s Role in Tx Throughput
TSO plays a significant role in the netperf Tx microbenchmarks
Large TSOs (>25 kB avg size) with Linux and auto-tuning of socket size
Beneficial even for small message and socket sizes on Linux when transmitting fast enough for aggregation
TSO very beneficial in virtual networking
Zero-copy Tx + large TSO packets: amortizes network virtualization overhead across a lot of data (illustrated in the sketch below)
Motivates looking at packet rate as additional performance metric
TSO aggregation improves when the socket size is bigger; Windows shows the reverse effect: lacking auto-tuning, it uses bigger socket sizes than Linux
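A back-of-the-envelope sketch of the amortization argument: if each send carries a fixed per-packet virtualization cost plus a small per-byte cost, handing the vNIC large TSO sends drives the per-byte cost down. The cycle counts below are made-up placeholders to illustrate the shape of the effect, not measured ESX figures.

def cycles_per_byte(bytes_per_send: int, fixed_cycles: int = 20_000,
                    per_byte_cycles: float = 0.5) -> float:
    """Approximate CPU cost per byte for a given send size (placeholder constants)."""
    return (fixed_cycles + per_byte_cycles * bytes_per_send) / bytes_per_send

if __name__ == "__main__":
    for size in (1_500, 8_192, 25_000, 65_536):   # wire-MTU send vs. large TSO sends
        print(f"{size:>6} B/send -> {cycles_per_byte(size):.2f} cycles/byte")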
Network Utilization of Sample Workloads
Very significant workloads generate only a modest amount of network traffic!
Exchange Server: LoadGen, a tool for Exchange benchmarking
TPC-C-like benchmark: similar to TPC-C, with heavy CPU usage and high transaction rates
The point: the earlier tests show peak achievable throughput; real application network throughput, e.g. for Exchange, is far lower!
Network Utilization of Sample Workloads (2)
SPECweb2005
SPECweb2005 consists of 3 modules:
Banking: SSL connections
E-Commerce: a mix of SSL and non-SSL connections
Support: downloading patches, manuals, etc.
Simultaneous user sessions: 2300, 3200, and 2200, respectively
For support workload (highest network bandwidth workload)
Bandwidth usage highly skewed toward Tx bandwidth:
> 40 to 1 Tx to Rx bandwidth ratio
Tx traffic takes modest advantage of TSO (avg ~3x std MTU size)
Rx traffic has small pkts (avg ~500 bytes), mostly requests; see the measurement sketch below
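One way to gather this kind of Tx/Rx profile for a workload on a Linux host or guest is to sample the interface byte and packet counters over a window, as sketched below; the interface name and sample interval are assumptions.

import time

def read_counters(iface: str = "eth0"):
    """Return (rx_bytes, rx_packets, tx_bytes, tx_packets) from /proc/net/dev."""
    with open("/proc/net/dev") as f:
        for line in f:
            name, _, rest = line.partition(":")
            if name.strip() == iface:
                v = [int(x) for x in rest.split()]
                return v[0], v[1], v[8], v[9]
    raise ValueError(f"interface {iface!r} not found")

if __name__ == "__main__":
    rxb0, rxp0, txb0, txp0 = read_counters("eth0")
    time.sleep(10)                                 # sample window
    rxb1, rxp1, txb1, txp1 = read_counters("eth0")
    rxb, rxp, txb, txp = rxb1 - rxb0, rxp1 - rxp0, txb1 - txb0, txp1 - txp0
    print(f"Tx:Rx byte ratio   {txb / max(rxb, 1):.1f} : 1")
    print(f"avg Tx pkt {txb / max(txp, 1):.0f} B, avg Rx pkt {rxb / max(rxp, 1):.0f} B")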
Workload studies references:
Microsoft Exchange Server 2007 Performance on VMware vSphere™ 4, http://www.vmware.com/resources/techresources/10021
SPECweb2005 Performance on ESX Server 3.5, http://www.vmware.com/resources/techresources/1031
Source: Virtual Network Performance
http://www.vmworld.com/docs/DOC-3875
10 Gb virtual networking performance for virtualized Windows and Linux machines