Scale Up or Scale Out?

Well? 

You know the answer to this one.  It depends.  🙂  Of course it depends as every environment is different.  However, today I was looking into the scalability of a XenApp 6.5 farm for a customer to see if we wanted to scale up or out.

Tradition says to scale out, but tradition is for ceremonies, Christmas and getting tourists to part with their money 🙂  Tradition doesn’t have a place in today’s modern virtualised world of IT.  So, in order to work out if we need to scale up or out, I needed data!  As Sherlock Holmes put it: “It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts”.

That data came in two forms.  The first was knowledge of what governs why we would scale up or out, in the form of this article by Citrix Architect Nick Rintalan, which was very useful.  Also, in order to understand the current hardware and see what the CPU and NUMA configuration was, I found this article and this article on the Xen Wiki very useful indeed!

The second part of the data came from EdgeSight.  Real life actual statistical data from the client’s XenApp farm. Treasure!

To give a little bit of background, the customer is running XenApp 6.5 HRP2 on XenServer 6.0.2 to publish a single, business-critical application, which turns out to be a bit more CPU hungry than originally envisaged.  The XenApp servers are provisioned via a Provisioning Services 6.1 vDisk.  All basic stuff.

The underlying hardware uses 4 Intel Xeon E7540 2GHz CPUs with 6 cores each, which with Hyper-Threading gives us a grand total of 48 logical cores (4 x 6 x 2 ;-p).  Plenty to be getting on with!  The current XenApp VM configuration is 6 vCPU and 32GB RAM.

EdgeSight was showing that the servers were getting their CPU maxed out with around 52 users, each running the same published application.  Rough maths shows that this means each instance of the process was using about 11.5% of a single vCPU (6 vCPUs shared between 52 users).  In comparison, we were seeing peak memory usage of 28GB out of 32GB, so about 87.5%, meaning we did have some memory to spare.
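The rough maths above can be written out as a quick sketch. The figures are the ones quoted in this post; the variable names are purely illustrative, and it assumes CPU is shared evenly between sessions, which is a simplification.

```python
# Figures quoted in the post; names are illustrative only.
vcpus_per_vm = 6      # current XenApp VM size
peak_users = 52       # users at the point CPU maxes out
ram_gb = 32           # RAM assigned to the VM
peak_ram_gb = 28      # peak memory usage reported by EdgeSight

# Assumption: CPU is shared evenly between the user sessions.
cpu_share_per_user = vcpus_per_vm / peak_users   # fraction of one vCPU
memory_utilisation = peak_ram_gb / ram_gb        # fraction of assigned RAM

print(f"CPU per user: {cpu_share_per_user:.1%} of one vCPU")
print(f"Peak memory utilisation: {memory_utilisation:.1%}")
```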

So, do we add more vCPU, say to take it to 8, or add more XenApp servers?  The decision is going to be based on whether the XenApp servers will scale linearly or not, i.e. if we add an additional 33.3% of CPU, will we get an additional 33.3% of users?  In this case, would we max out at roughly 70 users?  If we maxed out at fewer than that then we’re not scaling linearly, so it would be better to scale out – savvy?
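To make the linear scaling question concrete, here is a minimal sketch of the target we would be testing against. The function names are made up for illustration, and the observed figure of 60 users is purely hypothetical, not a test result.

```python
def projected_users(current_users, current_vcpus, new_vcpus):
    """Users we would expect if capacity scaled linearly with vCPU count."""
    return current_users * new_vcpus / current_vcpus


def scaling_efficiency(observed_users, current_users, current_vcpus, new_vcpus):
    """Observed capacity as a fraction of the linear target (1.0 = perfectly linear)."""
    return observed_users / projected_users(current_users, current_vcpus, new_vcpus)


# Going from 6 to 8 vCPU with 52 users today.
target = projected_users(52, 6, 8)
print(f"Linear target at 8 vCPU: {target:.1f} users")

# Hypothetical observation, for illustration only.
efficiency = scaling_efficiency(60, 52, 6, 8)
print(f"Efficiency if we only reached 60 users: {efficiency:.0%}")
```

Anything noticeably below 100% efficiency in real testing would point towards scaling out rather than up.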

In order to try and work this out prior to any testing, I wanted to understand the NUMA topology of the Intel E7540s to see what the multiples would be (again, reference here for more details).

On the XenServer host, using the command xl info -n, you will see the following:

[Screenshot: CPU configuration]

This confirms that we’re running 48 CPUs and have 4 nodes.  These are the NUMA nodes.  As we have 4 sockets, that means 1 node per socket.  Scrolling down, we see the following to confirm that and also show we’re getting 12 CPUs per NUMA node.

[Screenshots: NUMA configuration 1, 2 and 3]

So, to summarise, based on Mr Rintalan’s insight, we want the vCPU count to line up with the NUMA configuration, in this instance 12 CPUs per node.  We don’t want to use fewer vCPUs than we currently have (6), which means the next one up is 12.  That is not practical, as a 12 vCPU VM would need 64GB RAM, which isn’t an option at present.  With all this in mind, we ARE running the OPTIMAL configuration already, so the answer is…

Scale out 🙂
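For anyone wanting to repeat the sizing reasoning on their own hardware, here is a rough sketch. It rests on two assumptions of mine: that "lining up with NUMA" means picking a vCPU count that divides evenly into the CPUs of one node, and that RAM is scaled in proportion to vCPU. Both are consistent with the numbers in this post, but check the linked articles for your own case.

```python
# Host figures from the post.
sockets = 4
cores_per_socket = 6
threads_per_core = 2   # assumption: Hyper-Threading on, as the 48 total implies

logical_cpus = sockets * cores_per_socket * threads_per_core
numa_nodes = sockets                       # 1 NUMA node per socket
cpus_per_node = logical_cpus // numa_nodes

current_vcpus = 6
current_ram_gb = 32

# Assumption: a "good" VM size divides evenly into one NUMA node.
even_fits = [n for n in range(1, cpus_per_node + 1) if cpus_per_node % n == 0]
bigger_options = [n for n in even_fits if n > current_vcpus]

print(f"{logical_cpus} logical CPUs, {cpus_per_node} per NUMA node")
print(f"vCPU sizes that fit a node evenly: {even_fits}")

for vcpus in bigger_options:
    # Assumption: RAM is scaled in proportion to vCPU.
    ram_needed_gb = current_ram_gb * vcpus / current_vcpus
    print(f"Next size up: {vcpus} vCPU, needing about {ram_needed_gb:.0f}GB RAM")
```

With the figures here, the only step up from 6 vCPU is 12 vCPU at 64GB RAM, which is why staying at 6 and adding servers won out.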

Thanks for reading and I hope you find my experiences and the links useful.