Monday, May 31, 2010

F5 offers Long Distance VMotion

A nice video about F5's Long Distance VMotion solution for a VMware environment.



Introducing: Long Distance VMotion with VMWare
http://devcentral.f5.com/weblogs/nojan/archive/2010/02/02/introducing-long-distance-vmotion-with-vmware.aspx

Gartner says: VMware is the clear leader in virtualization

The latest Gartner report on virtualization says:
“VMware stands alone as a leader in this Magic Quadrant”


“VMware is clearly ahead in”:
Understanding the market
Product strategy
Business model
Technology innovation, Product capabilities
Sales execution

“VMware Strengths”:
Far-reaching virtualization strategy enabling cloud computing, new application architectures and broader management
Technology leadership and innovation
High customer satisfaction
Large installed base (especially Global 2000), and rapid growth of service providers planning to use VMware (vCloud)

How snapshots work and how to troubleshoot problems when using them

An excellent online guide to troubleshooting problems related to the use of snapshots.

Source: http://geosub.es/vmutils/Troubleshooting.Virtual.Machine.snapshot.problems/Troubleshooting.Virtual.Machine.snapshot.problems.html

Cisco UCS - an interactive view of the platform

An interesting link to an interactive Cisco page where you can see what the Cisco UCS platform physically looks like.

Source: http://www.cisco.com/en/US/prod/ps10265/ps10279/ucs_kaon_model_preso.html

Thursday, May 27, 2010

Paravirtualized SCSI adapter

"VMware’s new paravirtualized SCSI adapter (pvSCSI) offered 12% improvement in throughput at 18% less CPU cost compared to LSI virtual adapter"

Source: http://blogs.vmware.com/performance/2009/05/350000-io-operations-per-second-one-vsphere-host-with-30-efds.html

VMFS resignaturing

A VMware document about signature changes when snapshotting or replicating LUN volumes.

Source:
VMware VMFS Volume Management: http://www.vmware.com/files/pdf/vmfs_resig.pdf

Fibre Channel Zoning

Excellent links about Fibre Channel zoning (a small single-initiator zoning sketch follows the links below).

Sources:

Single initiator zoning http://www.yellow-bricks.com/2008/10/28/single-initiator-zoning/

Tech Target Fibre zoning http://searchstorage.techtarget.com/tip/1,289483,sid5_gci881375,00.html

Storage Networking 101: Understanding Fibre Channel Zones http://www.enterprisenetworkingplanet.com/netsp/article.php/3695836
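
To make the single-initiator zoning idea from the links above concrete, here is a minimal Python sketch; it is not taken from any of the sources, and all WWPNs and alias names are made-up examples. It builds one zone per host HBA port, each containing that single initiator plus the shared storage target ports.

# Minimal single-initiator zoning sketch: one zone per host HBA port (initiator),
# each zone containing that single initiator plus the storage target ports.
# All WWPNs and names below are made-up examples.

initiators = {
    "esx01_hba0": "21:00:00:e0:8b:05:05:04",
    "esx01_hba1": "21:01:00:e0:8b:25:05:04",
    "esx02_hba0": "21:00:00:e0:8b:05:06:a1",
}
targets = {
    "array_spa_p0": "50:06:01:60:41:e0:12:34",
    "array_spb_p0": "50:06:01:68:41:e0:12:34",
}

def build_single_initiator_zones(initiators, targets):
    """Return {zone_name: [wwpn, ...]} with exactly one initiator per zone."""
    zones = {}
    for init_name, init_wwpn in initiators.items():
        zones["z_" + init_name] = [init_wwpn] + list(targets.values())
    return zones

if __name__ == "__main__":
    for name, members in build_single_initiator_zones(initiators, targets).items():
        print(name)
        for wwpn in members:
            print("    member:", wwpn)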

How do you rename a virtual machine's folder and files?

To rename the folder and files to match the name you defined in vCenter, perform a Storage VMotion. At the destination, the folder and file names will be changed to the VM name from vCenter.
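
The same Storage VMotion can also be triggered from a script. Below is a hedged sketch using the pyVmomi bindings (which postdate this post); the vCenter address, credentials, VM name and target datastore are placeholders, and it simply relocates the VM to another datastore so the files pick up the vCenter display name.

# Sketch: trigger a Storage VMotion so the VM's folder/files get renamed to the
# vCenter display name at the destination datastore. Placeholders throughout.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()          # lab use only
si = SmartConnect(host="vcenter.example.local", user="administrator",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

def find_by_name(vimtype, name):
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    return next(obj for obj in view.view if obj.name == name)

vm = find_by_name(vim.VirtualMachine, "myvm01")        # placeholder: the renamed VM
target_ds = find_by_name(vim.Datastore, "datastore02") # placeholder: a different datastore

spec = vim.vm.RelocateSpec(datastore=target_ds)
task = vm.RelocateVM_Task(spec)                        # this is the Storage VMotion
print("Relocation task started:", task.info.key)

Disconnect(si)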

Testing storage consolidation in a virtualized environment with the vscsiStats tool

DVDStore version 2.0 is an online e-commerce test application with a backend database component, and a client program to generate workload. We used the largest dataset option for DVDStore (100 GB), which includes 200 million customers, 10 million orders/month and 1 million products. The server ran in a RHEL4-U4 64 bit VM with 4 CPUs, 32 GB of memory and a storage backend of 5 disk RAID 5 configuration.



I also recommend looking at the NetApp Virtualization Data Collection Tool.

Source:
Storage Workload Characterization and Consolidation in Virtualized Environments http://communities.vmware.com/docs/DOC-10104

Storage performance analysis with the VMware vscsiStats utility

esxtop is a great tool for performance analysis of all types. However, with only latency and throughput statistics, esxtop will not provide the full picture of the storage profile. Furthermore, esxtop only provides latency numbers for Fibre Channel and iSCSI storage. Latency analysis of NFS traffic is not possible with esxtop.

Since ESX 3.5, VMware has provided a tool specifically for profiling storage: vscsiStats. vscsiStats collects and reports counters on storage activity. Its data is collected at the virtual SCSI device level in the kernel. This means that results are reported per VMDK (or RDM) irrespective of the underlying storage protocol. The following data are reported in histogram form:

* IO size
* Seek distance
* Outstanding IOs
* Latency (in microseconds)
* More!
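
Since vscsiStats reports histograms rather than single numbers, it helps to reduce a histogram to a few summary figures. The Python sketch below shows one way to do that for a latency histogram; the bucket boundaries and counts are made-up example values, not real vscsiStats output.

# Reduce a vscsiStats-style latency histogram (bucket upper bound in microseconds,
# I/O count) to a weighted mean and an approximate percentile.
# The numbers below are illustrative only.

latency_histogram = [
    (500, 1200),      # <= 500 us : 1200 I/Os
    (1000, 3400),
    (5000, 2100),
    (15000, 400),
    (100000, 25),     # long tail
]

def weighted_mean(histogram):
    total = sum(count for _, count in histogram)
    return sum(bound * count for bound, count in histogram) / total

def percentile(histogram, pct):
    """Return the bucket bound below which pct% of the I/Os fall."""
    total = sum(count for _, count in histogram)
    threshold = total * pct / 100.0
    running = 0
    for bound, count in histogram:
        running += count
        if running >= threshold:
            return bound
    return histogram[-1][0]

if __name__ == "__main__":
    print("approx mean latency (us):", round(weighted_mean(latency_histogram)))
    print("approx 95th percentile (us):", percentile(latency_histogram, 95))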



Source:
Using vscsiStats for Storage Performance Analysis http://communities.vmware.com/docs/DOC-10095

What to watch out for when designing storage

Things that affect scalability

Throughput

* Fibre Channel link speed
* Number of outstanding I/O requests
* Number of disk spindles
* RAID type
* SCSI reservations
* Caching or prefetching algorithms

Latency

* Queue depth or capacity at various levels
* I/O request size
* Disk properties such as rotational, seek, and access delays
* SCSI reservations
* Caching or prefetching algorithms.

Factors affecting scalability of ESX storage

Number of active commands

* SCSI device drivers have a configurable parameter called LUN queue depth, which determines how many commands can be active on a given LUN at any one time.
* QLogic Fibre Channel HBAs support up to 256 outstanding commands, Emulex up to 128
* Default value in ESX is set to 32 for both
* Any excess commands are queued in vmkernel which increases latency
* When VMs share a LUN, the total number of outstanding commands permitted from all VMs to that LUN is governed by Disk.SchedNumReqOutstanding. If this is exceeded, commands will be queued in the VMkernel. The maximum recommended figure is 64. For LUNs with a single VM, this figure is inapplicable and the HBA queue depth is used.
* Disk.SchedNumReqOutstanding should be the same value as the LUN queue depth.
* n = Maximum Outstanding I/O Recommended for array per LUN (this figure should be obtained with help from the storage vendor)
* a = Average active SCSI Commands per VM to shared VMFS
* d = LUN queue depth on each ESX host
* Max number VMs per ESX host on shared VMFS = d/a
* Max number VMs on shared VMFS = n/a
* To establish this, look at QSTATS in esxtop and add active commands to queued commands to get the total number of outstanding commands. (A worked example of the d/a and n/a formulas follows this list.)
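
A worked example of the sizing formulas above; the values for n, a and d are illustrative only, and the real n must come from the storage vendor.

# Worked example of the queue-depth sizing rules above. Values are illustrative.
n = 256   # max outstanding I/Os the array vendor recommends per LUN
a = 4     # average active SCSI commands per VM on the shared VMFS
d = 32    # LUN queue depth on each ESX host (ESX default for QLogic/Emulex)

max_vms_per_host_on_shared_vmfs = d // a   # limited by the host LUN queue depth
max_vms_total_on_shared_vmfs = n // a      # limited by the array per-LUN queue

print("Max VMs per ESX host on the shared VMFS:", max_vms_per_host_on_shared_vmfs)  # 8
print("Max VMs on the shared VMFS (all hosts): ", max_vms_total_on_shared_vmfs)     # 64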

SCSI Reservations

* Reservations are created by creating/deleting virtual disks, extending a VMFS volume, and creating/deleting snapshots. All of these result in metadata updates to the file system using locks.
* Recommendation is to minimise these activities during the working day.
* Perform these tasks on the same ESX host that hosts the I/O-intensive VMs. Because the SCSI reservations are issued by that host, there will be no reservation conflicts there, since the host is already generating the reservations; I/O-intensive VMs on other hosts will still be affected for the duration of the task.
* Limit the use of snapshots. It is not recommended to run many virtual machines from multiple servers that are using virtual disk snapshots on the same VMFS. Snapshot files grow in 16MB chunks, so for vmdks with lots of changes, this file will grow quickly, and for every 16MB chunk that the file grows by, you will get a SCSI reservation.

Source:
Andy Troup, employed by VMware as a Senior Consultant and EMEA Strategy & Operations Practice Lead: http://virtuallyandy.blogspot.com/2009/03/storage-best-practice.html
Scalable Storage Performance http://www.vmware.com/files/pdf/scalable_storage_performance.pdf

Friday, May 21, 2010

Memory management in VMware® ESX™ Server

In order to quickly monitor virtual machine memory usage, the VMware vSphere™ Client exposes two memory statistics in the resource summary: Consumed Host Memory and Active Guest Memory.



Consumed Host Memory usage is defined as the amount of host memory that is allocated to the virtual machine, while Active Guest Memory is defined as the amount of guest memory that is currently being used by the guest operating system and its applications.
These two statistics are quite useful for analyzing the memory status of the virtual machine and providing hints to address potential performance issues.

This article helps answer these questions:
• Why is the Consumed Host Memory so high?
• Why is the Consumed Host Memory usage sometimes much larger than the Active Guest Memory?
• Why is the Active Guest Memory different from what is seen inside the guest operating system?
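
These two counters can also be read programmatically. Below is a minimal pyVmomi sketch (modern bindings, not part of the original paper); the vCenter host, credentials and VM name are placeholders. It prints Consumed Host Memory versus Active Guest Memory for one VM.

# Print Consumed Host Memory vs Active Guest Memory for one VM via vCenter.
# summary.quickStats.hostMemoryUsage and guestMemoryUsage are reported in MB.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab use only
si = SmartConnect(host="vcenter.example.local", user="administrator",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.VirtualMachine], True)
for vm in view.view:
    if vm.name != "myvm01":               # placeholder VM name
        continue
    qs = vm.summary.quickStats
    print("Consumed Host Memory (MB):", qs.hostMemoryUsage)
    print("Active Guest Memory (MB): ", qs.guestMemoryUsage)

Disconnect(si)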

Terminology

The following terminology is used throughout this paper.
• Host physical memory refers to the memory that is visible to the hypervisor as available on the system.
• Guest physical memory refers to the memory that is visible to the guest operating system running in the virtual machine.
• Guest virtual memory refers to a continuous virtual address space presented by the guest operating system to applications. It is the memory that is visible to the applications running inside the virtual machine.
• Guest physical memory is backed by host physical memory, which means the hypervisor provides a mapping from the guest to the host memory.
• The memory transfer between the guest physical memory and the guest swap device is referred to as guest level paging and is driven by the guest operating system. The memory transfer between guest physical memory and the host swap device is referred
to as hypervisor swapping, which is driven by the hypervisor.

Memory Virtualization Basics

Virtual memory is a well-known technique used in most general-purpose operating systems, and almost all modern processors have hardware to support it. Virtual memory creates a uniform virtual address space for applications and allows the operating system and hardware to handle the address translation between the virtual address space and the physical address space. This technique not only
simplifies the programmer’s work, but also adapts the execution environment to support large address spaces, process protection, file mapping, and swapping in modern computer systems.
When running a virtual machine, the hypervisor creates a contiguous addressable memory space for the virtual machine. This memory space has the same properties as the virtual address space presented to the applications by the guest operating system. This allows the hypervisor to run multiple virtual machines simultaneously while protecting the memory of each virtual machine from being accessed by others. Therefore, from the view of the application running inside the virtual machine, the hypervisor adds an extra level of address translation that maps the guest physical address to the host physical address. As a result, there are three virtual
memory layers in ESX: guest virtual memory, guest physical memory, and host physical memory. Their relationships are illustrated in Figure 2 (a).



As shown in Figure 2 (b), in ESX, the address translation between guest physical memory and host physical memory is maintained by the hypervisor using a physical memory mapping data structure, or pmap, for each virtual machine. The hypervisor intercepts all virtual machine instructions that manipulate the hardware translation lookaside buffer (TLB) contents or guest operating system page tables, which contain the virtual to physical address mapping. The actual hardware TLB state is updated based on the separate shadow page tables, which contain the guest virtual to host physical address mapping. The shadow page tables maintain consistency with the guest virtual to guest physical address mapping in the guest page tables and the guest physical to host physical address mapping in the pmap data structure. This approach removes the virtualization overhead for the virtual machine’s normal memory accesses because the hardware TLB will cache the direct guest virtual to host physical memory address translations read from the shadow page tables. Note that the extra level of guest physical to host physical memory indirection is extremely powerful in the virtualization environment. For example, ESX can easily remap a virtual machine’s host physical memory to files or other devices in a manner that is completely transparent to the virtual machine.

Recently, some new generation CPUs, such as third generation AMD Opteron and Intel Xeon 5500 series processors, have provided hardware support for memory virtualization by using two layers of page tables in hardware. One layer stores the guest virtual to guest physical memory address translation, and the other layer stores the guest physical to host physical memory address translation. These two page tables are synchronized using processor hardware. Hardware-assisted memory virtualization eliminates the overhead required in software memory virtualization to keep the shadow page tables in synchronization with the guest page tables.

Although the hypervisor cannot reclaim host memory when the operating system frees guest physical memory, this does not mean
that the host memory, no matter how large it is, will be used up by a virtual machine when the virtual machine repeatedly allocates and frees memory. This is because the hypervisor does not allocate host physical memory on every virtual machine’s memory allocation.
It only allocates host physical memory when the virtual machine touches the physical memory that it has never touched before. If a virtual machine frequently allocates and frees memory, presumably the same guest physical memory is being allocated and freed again and again. Therefore, the hypervisor just allocates host physical memory for the first memory allocation and then the guest reuses the same host physical memory for the rest of allocations. That is, if a virtual machine’s entire guest physical memory (configured memory) has been backed by the host physical memory, the hypervisor does not need to allocate any host physical memory for this virtual machine any more.

Memory Reclamation in ESX

ESX has supported memory overcommitment since the very first version, due to two important benefits it provides:

- With memory overcommitment, ESX ensures that host memory is consumed by active guest memory as much as possible.

- With memory overcommitment, each virtual machine has a smaller footprint in host memory usage, making it possible to fit more virtual machines on the host while still achieving good performance.

Wednesday, May 19, 2010

Types of virtual network adapters

vNIC Types on ESX

Four basic vNICs:

Two emulated types:
Vlance – emulation of a very old physical AMD network device (AMD PCnet32, covered by the pcnet32 Linux driver) – emulation of a real physical device

E1000 – emulation of a real physical Intel card

Reason: guest OS drivers are already on the OS install CD


The others are VMware's own devices, designed for virtualization:
vmxnet2 / enhanced vmxnet2 (ESX 3.5)
vmxnet3 (vSphere)

“Flexible” vNIC: morphable in Windows and Linux VMs
A combination of two devices in one:
virtual HW version 4 (ESX 3.x): vlance + vmxnet2
virtual HW version 7 (ESX 4.0): vlance + enhanced vmxnet2
Operates as vlance initially, but “morphs” into vmxnet2/enhanced vmxnet2 if VMware Tools is installed.

vNIC Features



Vlance – no features, very old!
Vmxnet3 – all features – highest performance and flexibility, RSS multiplexing traffic to multiple vCPUs in Windows 2008
In between sit vmxnet2 and enhanced vmxnet2 – TSO support and Jumbo Frames are the differences between them.

Notes:
vHW version 4: ESX 3.x, ESX 4.0
vHW version 7: ESX 4.0
* ESX 3.5 and later only


vNIC Selection on ESX

vmxnet3 gives the best overall performance today!!! Guest drivers exist for Linux, Windows and Solaris.

Avoid vlance if possible
Install VMware Tools to morph it into vmxnet2/enhanced vmxnet2!!!!

e1000 vNIC
A good compromise between performance and driver support on the guest OS installation CD, and across guest OS types; better usability during the install than vmxnet2/3. Performance is reasonable, but not as good as vmxnet3 or enhanced vmxnet2.

Why isn't vmxnet3 the default? E1000 is the default for most guest OSs because the vmxnet driver is currently not on the OS install CDs. That is why VMware recommends E1000 as the default.

For non-TCP traffic, if a larger Rx ring is needed:
use the E1000 or vmxnet3 vNIC – they have larger default Rx ring sizes, and the size is adjustable from the guest OS in most cases.
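
If you want to add a vmxnet3 vNIC without going through the vSphere Client, a hedged pyVmomi sketch follows (modern bindings, not from the session); the vm and network objects are assumed to have been looked up as in the earlier sketches, and the VM must be at virtual hardware version 7 or later.

# Add a vmxnet3 vNIC to an existing VM. Assumes 'vm' and 'network' objects were
# already looked up (see the earlier pyVmomi sketches); test on a lab VM first.
from pyVmomi import vim

def make_add_vmxnet3_spec(network):
    nic = vim.vm.device.VirtualVmxnet3()
    nic.backing = vim.vm.device.VirtualEthernetCard.NetworkBackingInfo(
        network=network, deviceName=network.name)
    nic.connectable = vim.vm.device.VirtualDevice.ConnectInfo(
        startConnected=True, connected=True)
    dev_spec = vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.add, device=nic)
    return vim.vm.ConfigSpec(deviceChange=[dev_spec])

# Usage (vm and network obtained e.g. via a container view as shown earlier):
# task = vm.ReconfigVM_Task(spec=make_add_vmxnet3_spec(network))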


Conclusion

TCP vNIC traffic does very well.
Very high aggregate throughput and packet rate achievable.
If your application predominantly uses TCP, you should not worry about the impact of vNIC networking unless you need many Gbps of throughput per vNIC.
Few workloads even come close to needing > 2 Gbps or > 200k pkts/s!

At higher data rate, UDP traffic may need larger vNIC Rx ring.
Larger receive socket size may also be needed.
Depending on the packet rate, burst rate and tolerable loss rate, you may need to watch CPU and memory over-commitment levels.
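
On the guest side, a "larger receive socket size" just means asking for a bigger SO_RCVBUF. A small, generic Python illustration (nothing VMware-specific; the OS may cap the value via its own limits, e.g. net.core.rmem_max on Linux):

# Ask for a larger UDP receive buffer; the kernel may clamp the value to its
# configured maximum (on Linux, net.core.rmem_max).
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
requested = 4 * 1024 * 1024                       # 4 MB
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("requested %d bytes, kernel granted %d bytes" % (requested, granted))
sock.close()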

Low jitter and very low latency requirements:
this is where work is ongoing.
Early recommendation: use EPT or NPT support in processors (on AMD this is called RVI).

A lot of the time people don't have a real application with such load demands!!!! Nothing to worry about!!!! :)

MTU size – when changing to Jumbo Frames, make sure that Jumbo Frames are also set on the switches!!! Otherwise it can cause problems, for example you can ping but transfers fail.
Only vmxnet3 and enhanced vmxnet2 support Jumbo Frames. Not E1000 – the physical E1000 supports Jumbo Frames, but this is not implemented in the virtual E1000 yet.
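
One quick way to check the "can ping but transfers fail" symptom is to ping with a near-MTU payload and the don't-fragment bit set end to end. Below is a small Python wrapper around the standard Linux ping flags (-M do, -c, -s); the target host is a placeholder, and 8972 bytes assumes a 9000-byte MTU minus the IP and ICMP headers.

# Verify that jumbo frames really pass end to end: ping with the don't-fragment
# bit and a payload sized for a 9000-byte MTU (9000 - 20 IP - 8 ICMP = 8972).
# Uses the Linux ping utility; 'storage-host.example.local' is a placeholder.
import subprocess

def jumbo_path_ok(host, mtu=9000, count=3):
    payload = mtu - 28
    result = subprocess.run(
        ["ping", "-M", "do", "-c", str(count), "-s", str(payload), host],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

if __name__ == "__main__":
    if jumbo_path_ok("storage-host.example.local"):
        print("jumbo-sized, unfragmented packets pass")
    else:
        print("jumbo frames are NOT passing end to end (check switch MTU)")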

Source: Virtual Network Performance
http://www.vmworld.com/docs/DOC-3875

10Gb virtual networking performance for virtualized Windows and Linux machines

vNIC Networking

Existing performance improvement techniques:

Minimize copying during Tx
Make use of TSO (TCP Segmentation Offload)
Moderate virtual interrupt rate, heuristic
Generally reduce the number of “VMExits”
NetQueue for scaling with multiple vNICs
Limited use of LRO (for Guest OS that supports it)

VMDirectPath technology

Direct VM access to device hardware
FPT – Fixed Passthrough in ESX 4.0, not VMotionable

Ways of Measuring Virtual Networking Performance

Metrics

Bandwidth
Packet rate, particularly when packet sizes are small

Scaling within VM
Increase number of connections
Increase number of vCPUs

Scaling across VMs
Increase number of VMs

Test Platform Systems:

ESX
2-socket, Quad-core Intel Xeon X5560 @ 2.80 GHz (Nehalem) system
Each core has L1 and 256KB L2 caches
Each socket has shared 8MB L3 cache
6 GB RAM (DDR 3 -1066 MHz)
pNIC: Intel 82598EB (Oplin) 10GigE, 8x PCIe
ESX 4.0

Other machine
2-socket Intel Xeon X5335 @ 2.66 GHz (Clovertown)
RHEL 5.1
Intel Oplin 10GigE NIC (ixgbe driver, version 1.3.16.1-lro): 8 RxQs, 1 TxQ
16GB RAM

Microbenchmark:
Netperf, 5 TCP connections
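
For reference, this kind of multi-connection netperf run can be scripted. The sketch below launches several parallel TCP_STREAM instances and sums the reported throughput; the host name is a placeholder, the option choices are mine rather than VMware's, and it assumes netperf is installed with netserver running on the target.

# Launch several parallel netperf TCP_STREAM tests and sum the throughput.
# Host name is a placeholder; requires netperf locally and netserver on the target.
import subprocess

def run_netperf(host, connections=5, seconds=60):
    procs = [subprocess.Popen(
                 ["netperf", "-H", host, "-t", "TCP_STREAM",
                  "-l", str(seconds), "-P", "0", "-v", "0"],
                 stdout=subprocess.PIPE, text=True)
             for _ in range(connections)]
    total_mbps = 0.0
    for p in procs:
        out, _ = p.communicate()
        # with -P 0 -v 0 netperf prints only its figure of merit (10^6 bits/s)
        total_mbps += float(out.split()[0])
    return total_mbps

if __name__ == "__main__":
    print("aggregate throughput: %.0f Mbit/s" % run_netperf("netserver.example.local"))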

Single vNIC TCP Performance: Linux VM



Results with RHEL5 VM:

Test configs:
Spectrum of socket and message sizes

Tx and Rx both reach ~9 Gbps (~wirespeed) with 64 kB or auto-tuned socket sizes
Rx bandwidth of 9+Gbps => over 800k Rx pkts/s (std MTU size pkts)
Very small 8k socket size

Latency bound
reaches ~2Gbps throughput

Number of vCPUs makes little difference in the micro-benchmark

Slight drop in Rx throughput going from 2 to 4 vCPUs due to cache effects
vSMP: additional CPU cycles for applications


Single vNIC TCP Performance: Windows VM


Results with Windows 2008 VM: (Enterprise Ed; SP1)

Very similar to Linux VM performance; key differences:

Windows Tx does not use auto-tuning
Rx throughput reaches peak of ~9Gbps with 2 vCPUs
Rx throughput higher than Linux at smaller socket sizes for vSMPs


TCP Throughput Scaling with # Connections


Results with Win2k8, 2-vCPU VM:

Large socket size runs:
Reach 9+ Gbps with very few connections (just a bit over 4)

Small socket size, moderate message size:
- throughput continues to scale as the number of sockets increases to 20
- Latency bound

Small socket, very small message size:
throughput flattens out, at close to 3Gbps for Rx, and close to 2Gbps for Tx


Multi-VM Scaling: RHEL5 UP VMs


VMs are UP, RHEL5 VMs

For large socket, 9+Gbps (wirespeed) sets the limit

Slight throughput increase going to 2 VMs
No throughput drop as more VMs are added

For small socket size, throughput scales as more VMs are added

For Tx, all the way through 8 VMs
For Rx, scaling flattens out after 4 to 5 VMs


In all cases, aggregate throughput exceeds 5 Gbps

No scalability limit because of virtualization!!!! Only physical limit!!! :)


Multi-VM Scaling: Win2k8 UP VMs


VMs are UP, Win2k8 VMs

For large socket, 9+Gbps (wirespeed) sets the limit
Very similar to Linux VM case; differences:
Large socket size: slightly lower Rx throughput at single VM
Small socket, moderate message size (512): Rx scales extremely well, reaching 9+Gbps
Small socket sizes: Tx throughput somewhat lower than achieved with RHEL5 VM

In all cases, aggregate throughput exceeds 4 Gbps

Key difference at medium socket Rx (8K-512): higher throughput on Windows than on Linux – Windows generates more acknowledgments


TSO’s Role in Tx Throughput


TSO plays significant role in Netperf Tx microbenchmarking
Large TSOs (>25 kB avg size) with Linux and auto-tuning of socket size
Beneficial even to small message and socket sizes for Linux when transmitting fast enough for aggregation

TSO very beneficial in virtual networking
Zero-copy Tx + large TSO packets: amortizes network virtualization overhead across a lot of data

Motivates looking at packet rate as additional performance metric

TSO kicks in when the socket size is bigger. Windows shows the reverse effect due to the lack of auto-tuning – it needs bigger socket sizes than Linux.


Network Utilization of Sample Workloads



Very significant workloads generate only a modest amount of network traffic!

Exchange server – LoadGen, a tool for Exchange benchmarking
TPC-C like benchmark – similar to TPC-C – huge CPU usage, many transactions

The point is the contrast with the high throughput of the previous tests. Real application network throughput for Exchange is far lower!!!


Network Utilization of Sample Workloads (2)

SPECweb2005


SPECweb2005 3 modules:
Banking - SSL type of connection
E-Commerce – SSL and non-SSL types of communication
Support – downloading patches, download manuals, etc.

People downloading: 2300, 3200, 2200

For support workload (highest network bandwidth workload)
Bandwidth usage highly skewed toward Tx bandwidth:
> 40 to 1 Tx to Rx bandwidth ratio
Tx traffic takes modest advantage of TSO (avg ~3x std MTU size)
Rx traffic has small pkts (avg ~500 bytes) – mostly requests

Workload studies references:
Microsoft Exchange Server 2007 Performance on VMware vSphere™ 4, http://www.vmware.com/resources/techresources/10021
SPECweb2005 Performance on ESX Server 3.5, http://www.vmware.com/resources/techresources/1031

Source: Virtual Network Performance
http://www.vmworld.com/docs/DOC-3875

Tuesday, May 18, 2010

How to achieve 10+ Gbps transfer throughput for SSH using virtualization - a case study

The test dealt with ftp, scp & rsync for data replication/distribution.
The question was: is it possible to get 10G out of the existing servers?
An initial 1G file transfer test showed 40 MB/s (320 Mb/s).
A 10G file transfer reached 70 MB/s (560 Mb/s), which is not a 10x improvement.

It is a well-known fact that a 10G network does not guarantee 10G application throughput. Still, the question was whether the transfer could be done better.
The goal was to maximize native and, above all, virtualized throughput over 10GbE.


Test on physical hardware

The first part ran on physical hardware:
Xeon CPU X5560 @ 2.8 GHz (8 cores, 16 threads); SMT, NUMA, VT-x, VT-d, EIST, Turbo Enabled (default in BIOS); 24GB Memory; Intel 10GbE CX4 Server Adapter with VMDq

Applications used:
Netperf (common network micro-benchmark)
rsync (standard Linux file transfer utility)
OpenSSH
HPN-SSH (optimized version of OpenSSH) – developed at the Pittsburgh Supercomputing Center (http://www.psc.edu/)
bbcp (“bit-torrent-like” file transfer utility)
– a Stanford SLAC National Accelerator Laboratory point-to-point network file copy application, similar to torrent technology (http://www.slac.stanford.edu/~abh/bbcp/)

Test OS: RHEL 5.3 64-bit

A ramdisk was used instead of disk drives, to focus on network I/O rather than disk I/O.

What was transferred: a directory structure, part of a Linux repository: ~8 GB total, ~5000 files, variable file sizes, average file size ~1.6 MB
The data-collection tool was the Linux utility “sar”: gathering information about received and transmitted network traffic and CPU utilization.

Netperf test results:


Netperf revealed that the 10GbE card was plugged into the wrong slot. For optimal 10GbE use, a PCIe Gen1 x8 slot is required.
Netperf is an excellent utility for finding out whether your PCIe slot really runs at x8, so it pays to clarify with the vendor how the PCIe slots are wired!
Netperf showed full 10GbE throughput, but that is only a theoretical, synthetic test.
The customer expected throughput up to 600 MB/s, since they had already tested FTP, which achieved that level.
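
The PCIe point is easy to sanity-check with a bit of arithmetic; the sketch below uses the usual ~250 MB/s of usable bandwidth per PCIe Gen1 lane (2.5 GT/s with 8b/10b encoding) to show why x8 is needed for 10GbE while x4 is not enough.

# Rough PCIe Gen1 bandwidth check for a 10GbE NIC.
# ~250 MB/s usable per Gen1 lane (2.5 GT/s with 8b/10b encoding), before protocol overhead.
MB_PER_LANE_GEN1 = 250

def gen1_slot_mbits(lanes):
    return lanes * MB_PER_LANE_GEN1 * 8   # convert MB/s to Mbit/s

for lanes in (4, 8):
    print("PCIe Gen1 x%d: ~%d Mbit/s raw, 10GbE needs ~10000 Mbit/s" %
          (lanes, gen1_slot_mbits(lanes)))
# x4 -> ~8000 Mbit/s (below line rate), x8 -> ~16000 Mbit/s (comfortable headroom)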



Threads
SCP and RSYNC over SSH are applications developed years ago. Today it is possible to use multiple threads, but SCP cannot use more than one.
SCP over SSH uses one active thread, RSYNC has two active threads.
Today, 16 threads are possible on 10GbE.
SCP over the HPN-SSH protocol (Pittsburgh Supercomputing Center) was developed for long-distance, high-performance links by enlarging the buffers; it uses 4 threads for crypto traffic.
Only a few Linux distributions ship HPN-SSH.
BBCP (BitTorrent-like, Stanford University) can split large transfers into parallel streams. The bulk transfer is not encrypted, only the handshake. With the other tools, the only option was to turn crypto off to increase performance.


The test load was generated over 8 streams.

Netperf only checks the link; SCP, RSYNC and BBCP did real transfers. The goal of the tests was to get to the level of the synthetic Netperf result.

The test showed that encryption dramatically increases CPU utilization, up to 90%!!! Without crypto the CPU runs at 50%, with BBCP at 30%. Intel encourages developers of application tools to use more threads. Support for encryption instructions implemented in the new Intel Westmere CPUs will help improve the performance of encrypted network traffic.

BIOS settings worth enabling: NUMA, SMT, Turbo – they do no harm and can help a bit.
MultiQueue (VMDq queue virtualization): 16-32 queues for parallel tasking
Enabled for Rx in RHEL 5.3; RHEL 6.0 will have it.
Tx is currently limited to one queue in RHEL; SLES 11 RC supports multi-queue Tx.

SCP, SSH: 1 active thread
RSYNC – only 2 active threads!
HPN-SSH – 4 crypto layers, MAC layer limitation, 2 crypto threads, so only 3 of the 16 possible threads are used

Multiple parallel streams are necessary to overcome the limits of the applications and tools and to reach maximum performance.

A big limit is the raw cryptographic performance level.
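
The "multiple parallel streams" point can be approximated even with plain scp by splitting the file list across several concurrent processes. A hedged Python sketch follows; this is not the tooling used in the case study, the host and paths are placeholders, and key-based SSH authentication is assumed.

# Poor man's parallel transfer: split a directory's files across N concurrent
# scp processes so a single-threaded tool no longer caps throughput.
# Host and paths are placeholders; assumes key-based SSH auth.
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

def scp_chunk(files, dest):
    return subprocess.run(["scp", "-q", *files, dest]).returncode

def parallel_copy(src_dir, dest="user@target.example.local:/data/", streams=8):
    files = [os.path.join(src_dir, f) for f in os.listdir(src_dir)
             if os.path.isfile(os.path.join(src_dir, f))]
    chunks = [files[i::streams] for i in range(streams) if files[i::streams]]
    with ThreadPoolExecutor(max_workers=streams) as pool:
        results = list(pool.map(lambda c: scp_chunk(c, dest), chunks))
    return all(rc == 0 for rc in results)

if __name__ == "__main__":
    print("all streams OK:", parallel_copy("/srv/repo"))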


Virtualized Case


The change compared to the physical setup was that a single virtual machine (VM) can have at most 8 vCPUs.
With 8 vCPUs, for SCP and RSYNC without encryption, as well as for BBCP, throughput dropped from 9000 Mbit to 5800 Mbit. With encryption – classic SCP or RSYNC over the HPN-SSH protocol, or over SSH – the difference between physical and virtual is smaller. With crypto enabled for SCP over HPN-SSH, the 16-CPU physical machine was about a third better than an 8-vCPU VM. With 8 physical CPUs vs. 8 vCPUs it would have been closer!!!



It is much better to run 8 VMs with 1 vCPU each; the loss compared to the physical setup is then smaller. For SCP over HPN-SSH the loss is down to a quarter, and that is against a 16-CPU physical machine!!! So multiple VMs achieve better transfer rates than one big VM!!!!! Sometimes 2-vCPU machines do better than 1-vCPU ones.


VMDq – queuing based on threads – the traffic load distribution is offloaded from the hypervisor to the physical NIC.

The NetQueue and VMDq techniques really help only when multiple VMs are communicating.

VMDirectPath for multiple VMs dedicated to a single piece of I/O hardware – support will come in new network cards.

Jumbo Frames were not used in the test; the focus was on the default network setup. Jumbo Frames are aimed at storage networks, iSCSI and NFS!!! This was an SCP/RSYNC test, and they wanted to test the default, out-of-the-box configuration. They realize that tuning and tweaking require people. There are many tuning options, but they wanted out-of-the-box technologies.
The assumption is that Jumbo Frames would help increase performance somewhere in the 0-20% range.

Netperf proved to be a good diagnostic tool, but it is not suitable for real workloads. You cannot just run a benchmark; you must test the real application – for example SCP or RSYNC over SSH.
Tools should utilize multiple threads. Commonly used applications today show high idle values, i.e. they do not work efficiently, which stems from their historical design.
For low latency, Twinax cabling (SFP+ Twinax) or optics is recommended.

Summary:
The most important element when testing network technologies is the application itself, or the operating system tool. The performance loss of the virtualized environment versus the physical one is low. Hardware-level technologies help minimize the performance differences between physical and virtual deployments. In conclusion, the customer unequivocally recommends virtualizing.

Source: Achieving 10+ Gbps File Transfer Throughput Using Virtualization - End-User Case Study
http://www.vmworld.com/docs/DOC-3820

To use Hyper-Threading or not?

This question is about to come up again so I can see multiple posts coming in the future on it. Intel’s Nehalem Processor is adding HyperThreading back into the chips so you can expect more posts on this topic in the near future. I have not reviewed HT on Nehalem so I don’t know all of the changes that have been made to HT (if any). This is the position I have responded with in the past:

There are pros and cons to using HT in ESX.

Pros

* Better co-scheduling of SMP VMs
o Hyperthreading provides more CPU contexts and, because of this, SMP VMs can be scheduled to run in scenarios which would not have enough CPU contexts without Hyperthreading.
* Typical applications see performance improvement in the 0-20% range (the same as non-virtualized workloads).

Cons

* Processor resources are shared with Hyperthreading enabled
o Processor resources are shared such as the L2 and L3 caches. This means that the two threads running on the same processor compete for the same resources if they both have high demand for them. This can, in turn, degrade performance.

All things considered, it is difficult to generalize the performance impact of Hyperthreading. It is highly dependent on the workload of the VM.

One additional point is that you can always utilize the CPU min and max values on a per-VM or Resource Pool basis to reserve certain amounts of CPU for your most critical workloads.
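
Setting those CPU min/max values per VM can also be scripted. A hedged pyVmomi sketch follows (modern bindings, not from the quoted post); the values are in MHz, and the vm object is assumed to have been looked up as in the earlier sketches.

# Set a CPU reservation (min) and limit (max) on a VM, in MHz.
# Assumes 'vm' was obtained via pyVmomi as in the earlier sketches.
from pyVmomi import vim

def set_cpu_min_max(vm, reservation_mhz=1000, limit_mhz=4000):
    alloc = vim.ResourceAllocationInfo(reservation=reservation_mhz,
                                       limit=limit_mhz)
    spec = vim.vm.ConfigSpec(cpuAllocation=alloc)
    return vm.ReconfigVM_Task(spec=spec)

# Usage: task = set_cpu_min_max(vm, reservation_mhz=2000, limit_mhz=8000)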

As with the majority of performance items I encounter: test, test, test. Try out the workloads and see what works best on the hardware you have available.

Source: http://vmguy.com/wordpress/index.php/archives/362

10Gb Ethernet in a virtual machine

An article discussing the performance of VMware's virtualized network adapter vmxnet3 says:

Line Rate 10GigE

Howie Xu, Director of R&D for VMkernel IO remarked recently that after talking with a few customers, many are still unaware we can achieve line rate 10GigE performance on ESX 3.5. Read “10Gbps Networking Performance on ESX 3.5u1” posted on VMware’s network technology resources page.

The story only gets better with vSphere 4 and ESX 4 with the new Intel Nehalem processors. Initial tests from engineering show a staggering 30Gbps throughput.

Zdroj: http://www.vadapt.com/2009/05/vmxnet3/

On the other hand, keep in mind that vmxnet3 is not supported with VMware Fault Tolerance:

http://kb.vmware.com/kb/1013757

VMware FT cannot be enabled on a virtual machine using either the VMXNET3 or PVSCSI devices; vCenter Server will simply report an error that the network interface or disk controller isn’t supported for VMware FT.
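
A quick way to check a VM before enabling FT is to scan its device list for the unsupported vmxnet3 and PVSCSI devices. Below is a hedged pyVmomi sketch (modern bindings, not from the quoted KB); the vm object is assumed to have been looked up as in the earlier sketches.

# Report devices that the KB above says block VMware FT: vmxnet3 NICs and
# PVSCSI controllers. Assumes 'vm' was obtained via pyVmomi as shown earlier.
from pyVmomi import vim

def ft_blocking_devices(vm):
    blockers = []
    for dev in vm.config.hardware.device:
        if isinstance(dev, vim.vm.device.VirtualVmxnet3):
            blockers.append("vmxnet3 NIC: " + dev.deviceInfo.label)
        elif isinstance(dev, vim.vm.device.ParaVirtualSCSIController):
            blockers.append("PVSCSI controller: " + dev.deviceInfo.label)
    return blockers

# Usage:
# for item in ft_blocking_devices(vm):
#     print("FT blocker:", item)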

Source: http://blog.scottlowe.org/2009/07/05/another-reason-not-to-use-pvscsi-or-vmxnet3/

A clear introduction to the VMware View VDI solution

A clear article about the VMware View VDI solution on the excellent portal www.brianmadden.com:

Source: http://www.brianmadden.com/blogs/guestbloggers/archive/2009/01/15/an-introduction-to-vmware-view-3-features-and-best-practices-part-1-of-3.aspx

Wednesday, May 5, 2010

Configuring VMDirectPath I/O

A beautiful and useful vSphere feature - VMDirectPath I/O.
It allows a hardware I/O device to be assigned exclusively to a virtual machine.
Note that the prerequisite is hardware support for Intel Virtualization Technology for Directed I/O (VT-d) or AMD I/O Virtualization Technology (IOMMU).
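
To see which host PCI devices are even candidates for VMDirectPath, the host's passthrough info can be queried. Below is a hedged pyVmomi sketch; the host (a HostSystem) is assumed to be looked up like the VM objects in the earlier sketches, and the pciPassthruInfo field names reflect my reading of the vSphere API rather than anything in the linked documents.

# List PCI devices on an ESX host that are capable of / enabled for passthrough.
# Assumes 'host' (a HostSystem) was looked up as in the earlier pyVmomi sketches;
# field names follow the HostPciPassthruInfo object of the vSphere API.

def passthrough_candidates(host):
    rows = []
    for info in host.config.pciPassthruInfo or []:
        if info.passthruCapable:
            rows.append((info.id, info.passthruEnabled, info.passthruActive))
    return rows

# Usage:
# for dev_id, enabled, active in passthrough_candidates(host):
#     print("PCI %s  enabled=%s  active=%s" % (dev_id, enabled, active))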

Source: http://www.youtube.com/watch?v=jmQ5Ej8r-aA

Configuration Examples and Troubleshooting for VMDirectPath
http://www.vmware.com/pdf/vsp_4_vmdirectpath_host.pdf

VMware VMDirectPath I/O
http://communities.vmware.com/docs/DOC-11089.pdf;jsessionid=8666C2B4AEFBCA0CE8ED9BF81C0FB70B