Catalyst09: Cloud Services
One of the tracks at Catalyst this year focussed on Cloud Computing, and included sessions from Burton Group analysts, customers and cloud service providers. Burton Group made a point of distinguishing the different types of cloud services that are currently available:
- Hardware / Infrastructure as a Service
- Services that allow you to run your OS and software on top of virtualized servers, storage and networking: Amazon (EC2, S3, etc.), Rackspace Cloud, various Virtual Private Server vendors, etc.
- Platform as a Service
- Services that provide a development platform, where you have no view into the OS or infrastructure and are bound to a particular application development environment: Salesforce’s force.com, Google AppEngine, EngineYard, etc.
- Application / Software as a Service
- Services that provide applications directly to users, sometimes with opportunities to integrate with in-house apps and services (e.g. authentication): Google Apps, Zoho, Microsoft Office Live, Salesforce, Basecamp, etc.
Burton Group’s Recommendations
Despite the many issues (lack of clarity, too much hype, trust & security problems, and unclear ROI), Burton Group believe that not only is cloud computing inevitable for most organizations, in one form or another, but that it is transformational, and helps IT focus on what’s important. Organizations should evaluate their ability to consume the cloud: which infrastructure and applications can be moved to the cloud (especially non-core services), and which cannot. Cloud strategies need to be devised now, and IT departments should build an internal target cloud architecture that aligns with external cloud vendors (e.g. Eucalyptus, which can be run internally, maps closely to Amazon EC2). The cost of an internal cloud service can also be compared to that of an external service.
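To make the internal-versus-external cost comparison concrete, here is a minimal sketch of a break-even calculation. It is not something Burton Group presented: the cost model (a fixed monthly overhead plus a per-hour cost internally, a pure per-hour price externally) and all function and parameter names are my own assumptions for illustration.

```python
from typing import Optional


def internal_monthly_cost(fixed_monthly: float, cost_per_hour: float, hours: float) -> float:
    """Assumed internal model: fixed overhead (hardware, power, staff) plus usage."""
    return fixed_monthly + cost_per_hour * hours


def external_monthly_cost(price_per_hour: float, hours: float) -> float:
    """Assumed external model: pure pay-as-you-go pricing."""
    return price_per_hour * hours


def break_even_hours(fixed_monthly: float, internal_per_hour: float,
                     external_per_hour: float) -> Optional[float]:
    """Instance-hours per month above which the internal cloud is cheaper.

    Returns None when the external per-hour price is not higher than the
    internal per-hour cost, in which case the internal cloud never catches up.
    """
    margin = external_per_hour - internal_per_hour
    if margin <= 0:
        return None
    return fixed_monthly / margin


if __name__ == "__main__":
    # Hypothetical figures, not real vendor pricing.
    hours = break_even_hours(fixed_monthly=1000.0, internal_per_hour=0.25,
                             external_per_hour=0.50)
    print(f"Internal cloud is cheaper above {hours:.0f} instance-hours per month")
```

Below the break-even point the external service wins on cost, which is consistent with the advice to move low-volume, non-core services first.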
Cloud offerings should be matched to application needs – e.g. you don’t need to build mail servers on EC2 instances if you can outsource the whole app to Google Mail.
Other points:
- Cloud computing will likely require changes to business processes
- Liability for service delivery and data security will remain with IT
- Organizations must have an exit strategy
Customer Presentations – Eli Lilly
Eli Lilly was looking to reduce costs, enhance collaboration with external researchers, share data securely, reduce development time, and maximize the use of their computing resources. They are using Amazon EC2 for their public cloud, with an internal Eucalyptus and Xen cloud for developing applications and services to deploy in their public cloud, and for emerging internal cloud services. Their private, internal production cloud runs atop VMware. Eli Lilly researchers have been heavy users of compute grids for many years.
How Eli Lilly consumes cloud computing depends on who you ask:
- developers don’t want to involve sysadmins in frequent system setup/teardown
- scientists don’t care about infrastructure, they just want to run intensive jobs
- architects want to easily prototype complex interconnected environments
- collaborators want to share data and apps between third parties with minimal lead time
Eli Lilly has found that using cloud services has forced them toward more automation, but has also made service costs more transparent.
Sharing appropriate data & applications through cloud services also means fewer holes through the firewall for external collaborators.
Security
Eli Lilly have sophisticated firewalls, IDS and physical protection around their enterprise systems, but needed to “raise their game” when running services in the cloud. They treat all cloud systems as if they’re open to the internet (which they are), and are using host-based security tools rather than relying on corporate firewalls. While using web services has helped to reduce complexity, they are still struggling with reuse of services.
Case Study #1 – Clinical Trial Optimization
The current appetite for HPC at Eli Lilly exceeds the company’s desire to expand – 20 million tasks are scheduled per month, and typical CPU usage is 100%.
Before using cloud services, all the compute grids were CPU bound and were negatively impacting critical deadlines. After the workflow was ported to Cycle Computing’s CycleCloud service (building HPC environments on EC2), with scheduling environments spun up on demand, runtimes became more consistent because there was no contention for resources.
Case Study #2 – Quick Start/Stop of Collaboration
Eli Lilly used to pass data via email with collaborators, with files often lost due to email filtering; sensitive data was often transferred via physical media. Now they use a data sharing service on Amazon EC2/S3/EBS, using strong encryption for all data transfer and storage. They’re using RightScale to enable self-service creation of these collaboration sites, and have found that the cost scales linearly with usage. IT no longer has to modify firewall rules, create VPNs, and issue hardware tokens to external collaborators.
Customer Presentations – InterContinental Hotels Group
IHG uses cloud computing to respond to fluctuating demand. IHG is also using a mix of private clouds, public clouds, VMware and big iron (including a mainframe bought in the 1960s that is still running).
The driving forces for cloud adoption at IHG are
- OpEx model – IHG is an asset-light, franchise-model business and prefers to “pay as they go”
- elasticity – the hotel business has volatile usage demands (e.g. large public events, natural disasters)
- Speed of Light – IHG operates globally, and cloud services allow them to use computing resources closer to end users without having to acquire data center space across the world
- Speed to market – faster provisioning is critical for innovation
IHG took 4 years to prepare their organization (people, finances, processes, software) for cloud computing. They restrict their use of cloud computing to infrastructure as a service (OS and up is IHG’s concern), and moved non-production environments – development, QA, integration – to the public cloud first. Like Eli Lilly, IHG is also running an internal private cloud. IHG’s inner cloud exists not for technical reasons, but because of the lack of SLAs from cloud providers, perceived data risks, PCI compliance ambiguity, organizational maturity, and funding model differences. Running an inner cloud also enables IHG to collect more detailed metrics on cloud usage. Data sensitivity drives the choice of where IHG’s data lives – regular internal systems, internal cloud, public cloud.
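As a rough illustration of sensitivity-driven placement, the sketch below maps a data sensitivity level to one of the three locations named above. IHG did not describe their actual classification scheme, so the three levels, the mapping, and the names `Sensitivity` and `place_workload` are all hypothetical.

```python
from enum import Enum


class Sensitivity(Enum):
    """Hypothetical sensitivity levels; IHG's real categories were not given."""
    LOW = "low"            # e.g. public marketing content
    MODERATE = "moderate"  # e.g. internal operational data
    HIGH = "high"          # e.g. data with unclear PCI compliance status


# Assumed policy: the more sensitive the data, the closer to home it stays.
PLACEMENT = {
    Sensitivity.LOW: "public cloud",
    Sensitivity.MODERATE: "internal cloud",
    Sensitivity.HIGH: "regular internal systems",
}


def place_workload(sensitivity: Sensitivity) -> str:
    """Return the hosting location for data of the given sensitivity."""
    return PLACEMENT[sensitivity]


if __name__ == "__main__":
    for level in Sensitivity:
        print(f"{level.value:>8} -> {place_workload(level)}")
```

Keeping the policy in a single table means the placement rules can be revised, for example as provider SLAs or PCI guidance mature, without touching the code that consults them.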
Vendor Presentation – Salesforce.com
While Salesforce.com is best known for providing CRM Software as a Service, it also provides Force.com – a Platform as a Service based on the same infrastructure used for the CRM service, which companies can use to develop their own applications, leveraging the platform’s database, web services API, workflow engine, page layout editor, reporting, analytics and multi-language, multi-device features.
According to Peter Coffee, Director of Platform Research for Salesforce.com, the industry is moving to an ideal “0, 1, ∞” model:
- 0 infrastructure capital commitment, acquisition cost, adoption cost, support cost
- 1 coherent and resilient environment – not a brittle “software stack”
- ∞ scalability in response to changing need, integratability / interoperability with legacy assets and other services, customizability / programmability from data, through logic, up into the UI without compromising robust multi-tenancy
While the ∞ advantages are mostly true (other than where the “other services” are competitors of the platform provider), the other advantages are not all correct at this time (or perhaps ever) – there are always going to be acquisition and adoption costs as IT shifts to a new platform, cloud or otherwise, and needs to re-train existing staff or hire new staff, and migrate data out of the old platform and into the new platform. The single “coherent and resilient environment” requires a commitment to that platform, which may not always be possible or desirable.
Roundtable
Some highlights from the Cloud roundtable:
- If you’re out of power and space, cloud is a compelling alternative to colocation or building a new data center
- Prices will go down, but there will be bumps along the way when cloud providers need to build more capacity
- When EBS space is released, Amazon will eventually re-assign the blocks, but doesn’t currently wipe them
Service Level Agreements
Concerns about cloud provider SLAs (or the lack thereof) were raised in most of the question and answer sessions. Peter Coffee from Salesforce countered one questioner by pointing out that most SLAs just result in you not being charged for downtime… which doesn’t help your business; IHG has stricter SLAs with some hosting providers which reimburse lost revenue… but I still don’t think that helps the business either, since users will go somewhere else to complete their request. Generally, the solution is better architecture – using multiple availability zones for EC2, or even multiple cloud providers.
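To see why spreading a service across zones or providers helps more than an SLA credit does, here is a back-of-the-envelope sketch. It assumes each zone fails independently and that the service stays up as long as at least one zone is up; real zones can share failure modes, so treat the result as an optimistic upper bound. The 99% figure and the function names are my own illustrative assumptions, not numbers quoted at the roundtable.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(single: float, copies: int) -> float:
    """Availability of a service that is up if at least one of `copies`
    independent deployments is up, each with availability `single`."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The service is down only when every copy is down at the same time.
    return 1.0 - (1.0 - single) ** copies


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per (non-leap) year."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    for zones in (1, 2, 3):
        a = combined_availability(0.99, zones)
        print(f"{zones} zone(s): {a:.6f} available, "
              f"about {expected_downtime_hours(a):.2f} hours down per year")
```

Under these assumptions each extra zone cuts expected downtime by a factor of a hundred, which is a far bigger win for users than a refund for the hours that were lost.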