Saturday, December 31, 2011

SaaS (Software as a Service) and Multi-Tenancy

SaaS
SaaS can be defined as:
"software deployed as a hosted service and provided over the internet". 


    SaaS is an increasingly popular term, and one that software vendors can ill afford to ignore. It opens up markets and segments that were previously inaccessible to vendors, but it also throws up challenges like never before.
    SaaS allows an application to scale to a (theoretically) unlimited number of customers. This is typically achieved by on-demand horizontal and vertical scaling behind the scenes, which gives the user a seamless experience.
    Application architecture practice has to adjust to account for SaaS, and applications have to be architected and designed for it. Supporting Multi-Tenancy is the approach some SaaS vendors follow to address the SaaS challenges effectively, and this article attempts to cover the different aspects of Multi-Tenancy. So...


What is Multi-Tenancy
Multitenancy refers to a principle in software architecture where a single instance of the software runs on a server, serving multiple client organizations (tenants).
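
To make the definition concrete, here is a minimal Python sketch of one application object serving several tenants, where every request carries a tenant identifier. It is only an illustration under my own assumptions: the class `MultiTenantApp`, the method `handle_request` and the tenant names are all hypothetical and not taken from any product.

```python
from collections import defaultdict


class MultiTenantApp:
    """One running instance that serves many tenants (hypothetical sketch)."""

    def __init__(self, tenants):
        self._tenants = set(tenants)
        # Each tenant's records live in their own bucket inside the shared instance.
        self._notes = defaultdict(list)

    def handle_request(self, tenant_id, action, payload=None):
        # Every request must identify a known tenant before any work is done.
        if tenant_id not in self._tenants:
            raise PermissionError(f"unknown tenant: {tenant_id}")
        if action == "add":
            self._notes[tenant_id].append(payload)
            return len(self._notes[tenant_id])
        if action == "list":
            # Return a copy so callers cannot mutate the stored data.
            return list(self._notes[tenant_id])
        raise ValueError(f"unsupported action: {action}")


app = MultiTenantApp(["acme", "globex"])
app.handle_request("acme", "add", "first note")
app.handle_request("globex", "add", "other note")
print(app.handle_request("acme", "list"))
```

The point of the sketch is that there is one code path and one process, and the tenant identifier alone decides whose data is touched.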


Why Multi-tenancy?
  • Easier deployment. There is a single system to manage.
  • Cost savings from needing less hardware (or virtualized hardware).
  • Fewer software licenses to buy.

Multi-Tenancy Issues
  • Security testing has to be done extensively.
  • Expensive hardware to buy. Increased cost associated with big rack hardware.
  • Not easy to convert an existing application/software to Multi-tenant. 
  • Requires schema change to existing apps. 
  • Pure Multi-tenancy requires applications and infrastructure to scale-up to address demand.
  • Single point of failure issues. Which is the weakest link?
  • Multiple tenant metadata is difficult to manage.
  • Down time issues. What happens if the Database has to be brought down for maintenance?
  • Access control. This may be managed and configured at different customers, which leaves the provider with less control over the issues surrounding it.
  • Extensions to data model are tricky and sometimes impact all customers.
  • What about architectures that rely on communication between components in the SaaS world and ones behind the firewall?

Multi-tenancy versus Virtualization
Virtualization seems, to a limited extent, to be a good alternative to Multi-tenancy. However, there are drawbacks. Would virtualization be profitable with 1000 customer deployments? How would you deploy and manage such an environment? And what if the customers are "on-demand", meaning they may want to use the system for brief periods and then be gone for days, weeks or months? How do you keep a virtualization solution profitable?


But this is not to say that it can't be done profitably. There are solutions out there that use Virtualization to achieve Multi-Tenancy, especially if the customer deployments number in the single or very low double digits. Virtualization might (better) provide the following benefits:

  • Data Isolation.
  • Security. Both at the PaaS layer and at the Virtualized layer.
  • Performance (one client can't directly impact another's performance).

Designing for Multi-tenancy


Key points of Multi-Tenancy
  • Flexibility
  • Share-ability
  • Maintainability
  • Customizability

Architectural Constraints
  • Maintain a single code base to ease deployments and upgrades.
  • Share the data resources to have a consistent view of the Schema.
  • Components must be customizable at every possible level.
  • The Application Tier must be as stateless as possible to allow Scalability.

Trade-offs
  • Complexity versus time to market. What do the customers want and when?
  • Resource sharing versus Security/Availability. Who is my customer? Are there legal or SLA considerations?
  • Customizability versus Maintainability. There is a myriad of customizable options. Which one was chosen when this issue occurred? How do we fix it without affecting everyone else?

Interfaces
Multi-Tenant applications should expose (and consume) standards-based interfaces such as:
  • REST
  • WS-*

Configurable
The application has to be configurable. In a traditional MVC style of application, the following would have to be extensively configurable:
Model: Allow schema extensions.
Controller: Allow new business logic to be plugged in or existing ones to be enhanced. Allow modification/customization of security policy.
View: Allow changes to look and feel, displayed items, screen order, messages etc. This assumes that metadata stored somewhere describes the configuration options, so that a particular view can be "instantiated" from it.
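
A rough sketch of what "instantiating" a view from metadata could look like: a shared default view definition plus per-tenant overrides that are merged at request time. The field names (`title`, `theme`, `columns`), the tenant names and the function `instantiate_view` are all my own hypothetical choices.

```python
import copy

# Default view definition shared by all tenants (hypothetical fields).
DEFAULT_VIEW = {
    "title": "Orders",
    "theme": "light",
    "columns": ["id", "customer", "total"],
}

# Per-tenant overrides stored as metadata; only the differences are recorded.
TENANT_VIEW_METADATA = {
    "acme": {"theme": "dark", "columns": ["id", "total"]},
    "globex": {"title": "Purchase Orders"},
}


def instantiate_view(tenant_id):
    """Build the effective view for a tenant: defaults plus that tenant's overrides."""
    # Deep copy so that one tenant's view can never mutate the shared defaults.
    view = copy.deepcopy(DEFAULT_VIEW)
    view.update(copy.deepcopy(TENANT_VIEW_METADATA.get(tenant_id, {})))
    return view


print(instantiate_view("acme"))
```

Keeping only the differences per tenant means the code base stays single, and a tenant with no metadata simply gets the default view.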



Security
Security means securing the SaaS model as a whole. It comprises an application-level security architecture and a data-level security architecture. Data-level security is addressed in the next section. Application-level security could involve storing all user accounts with the SaaS provider and/or federating authentication to trusted STSs (Security Token Services) or trusted Identity Providers.
The SaaS application can also provide configurable Identity Management modules that either perform authentication/authorization themselves or federate as noted above. If authentication is federated, we need some kind of mapping service to map roles from the trusted servers to the roles/policies defined in the SaaS application.
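
The role mapping service could be sketched roughly as below. This is an assumption-laden illustration: the issuer host names, the external role names, the application roles and the function `map_roles` are all hypothetical, and a real implementation would first validate the security token itself.

```python
# Hypothetical mapping: (identity provider, external role) -> SaaS application role.
ROLE_MAP = {
    ("idp.acme.example", "Domain Admins"): "tenant_admin",
    ("idp.acme.example", "Staff"): "user",
    ("idp.globex.example", "managers"): "approver",
}

TRUSTED_ISSUERS = {"idp.acme.example", "idp.globex.example"}


def map_roles(issuer, external_roles):
    """Translate roles asserted by a trusted issuer into application roles."""
    if issuer not in TRUSTED_ISSUERS:
        raise PermissionError(f"untrusted issuer: {issuer}")
    # Roles without an explicit mapping are dropped (deny by default).
    mapped = {
        ROLE_MAP[(issuer, role)]
        for role in external_roles
        if (issuer, role) in ROLE_MAP
    }
    return sorted(mapped)


print(map_roles("idp.acme.example", ["Staff", "Domain Admins", "Unknown"]))
```

Keying the map on the issuer as well as the role name matters, because two tenants may use the same role name to mean different things.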


Multi-Tenant Data Architecture
  • Separate Database 
    • Easy to Maintain
    • Customizable (with probable issues later on)
    • Secure
    • Upgradable
    • Higher costs
  • Shared Database with separate Schema
    • Easier to Maintain
    • Customizable (with probable issues later on)
    • More Secure
    • Upgradable but complex
    • Lower costs
  • (Truly) Shared Database. This is the ideal scenario and the one most touted by purists. However, the risk is that data from different tenants could inadvertently be mixed, resulting in legal, compliance or contract issues. There are also issues around database size, and multiple strategies, including partitioning and segmentation, have to be used to manage the database. We won't go into details as there is more than adequate literature on these topics.
    • Complex Upgrade process
    • Impacts Multiple customers
    • Data Security at Data Access Layer.
    • Low cost (not counting the cost of design, development etc.).


Which approach to choose?
Choose the shared database option if the number of tenants is high enough to justify the initial investment. However, if the number of users per tenant, the database size per tenant, or the per-tenant customizations are large, choose the isolated model.
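
The rule of thumb above can be written down as a small decision helper. Treat this purely as a sketch: the function `choose_data_model` and every threshold default in it are arbitrary placeholders I made up for illustration, not recommendations.

```python
def choose_data_model(num_tenants, users_per_tenant, db_size_gb_per_tenant,
                      customizations_per_tenant, *,
                      many_tenants=100, many_users=1000, large_db_gb=50,
                      many_customizations=20):
    """Return "isolated" or "shared" following the rule of thumb in the text.

    All threshold defaults are arbitrary placeholders, not recommendations.
    """
    heavy_tenant = (
        users_per_tenant >= many_users
        or db_size_gb_per_tenant >= large_db_gb
        or customizations_per_tenant >= many_customizations
    )
    if heavy_tenant:
        # Large or heavily customized tenants push towards isolation.
        return "isolated"
    if num_tenants >= many_tenants:
        # Many light tenants justify the up-front investment in sharing.
        return "shared"
    # Few, light tenants: the shared model's initial investment may not pay off.
    return "isolated"


print(choose_data_model(500, 10, 1, 2))
```

In practice the thresholds would come from your own cost model, SLAs and compliance requirements.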



Data-Security
Security strategies for a shared database could include views that filter the data visible to a tenant, access control on the database itself, and encryption at the data layer (with decryption at the DAC using tenant-specific keys).
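
Here is a small sketch of the view-based filtering idea, using Python's built-in `sqlite3` module only because it needs no setup. The table `orders`, its columns and the helper `create_tenant_view` are hypothetical. SQLite has no GRANT statement, so the access-control half (giving each tenant's database login access to its own view only) is not shown and would have to be done in a server database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, item TEXT)"
)
conn.executemany(
    "INSERT INTO orders (tenant_id, item) VALUES (?, ?)",
    [("acme", "anvil"), ("globex", "reactor"), ("acme", "rocket")],
)


def create_tenant_view(conn, tenant_id):
    """Create a view exposing only one tenant's rows and return its name."""
    # View definitions cannot use bound parameters, so validate before interpolating.
    if not tenant_id.isalnum():
        raise ValueError("tenant_id must be alphanumeric")
    view_name = f"orders_{tenant_id}"
    conn.execute(
        f"CREATE VIEW IF NOT EXISTS {view_name} AS "
        f"SELECT id, item FROM orders WHERE tenant_id = '{tenant_id}'"
    )
    return view_name


view = create_tenant_view(conn, "acme")
rows = conn.execute(f"SELECT item FROM {view} ORDER BY id").fetchall()
print(rows)
```

Note that the view does not even expose the `tenant_id` column, so code written against it cannot accidentally ask for another tenant's rows.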

Data-Extensibility
Multiple options are available, including using name-value pairs to store data. Traditional data-extensibility approaches include having predefined fields and allowing extensions through metadata tables.
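
The name-value pair approach might look like the sketch below, again using `sqlite3` for convenience. The tables `customers` and `customer_extensions`, the field `loyalty_tier` and both helper functions are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    name TEXT NOT NULL
);
CREATE TABLE customer_extensions (
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    field_name  TEXT NOT NULL,
    field_value TEXT,
    PRIMARY KEY (customer_id, field_name)
);
""")


def set_extension(conn, customer_id, name, value):
    """Store or overwrite one tenant-defined field as a name-value pair."""
    conn.execute(
        "INSERT OR REPLACE INTO customer_extensions VALUES (?, ?, ?)",
        (customer_id, name, value),
    )


def load_customer(conn, customer_id):
    """Return the fixed columns plus any extension fields under 'extensions'."""
    row = conn.execute(
        "SELECT id, tenant_id, name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    extensions = dict(conn.execute(
        "SELECT field_name, field_value FROM customer_extensions WHERE customer_id = ?",
        (customer_id,),
    ))
    return {"id": row[0], "tenant_id": row[1], "name": row[2], "extensions": extensions}


cur = conn.execute(
    "INSERT INTO customers (tenant_id, name) VALUES (?, ?)", ("acme", "Wile E.")
)
set_extension(conn, cur.lastrowid, "loyalty_tier", "gold")
print(load_customer(conn, cur.lastrowid))
```

The trade-off is that every extension value is stored as text and needs an extra lookup, which is the price paid for not changing the shared schema per tenant.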

Data-Scalability
Data partitioning as noted in the above sections can aid in horizontal scaling of the Database.
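
One simple way to partition by tenant is to hash the tenant ID to pick a database shard. The sketch below assumes hypothetical shard names and a hypothetical function `shard_for_tenant`; it shows hash-modulo routing, which is only one of several partitioning strategies.

```python
import hashlib

# Hypothetical shard names; in practice these would be connection strings.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2"]


def shard_for_tenant(tenant_id, shards=SHARDS):
    """Pick a shard deterministically from a stable hash of the tenant ID."""
    # hashlib is used instead of hash() because hash() of a str varies between runs.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


for tenant in ["acme", "globex", "initech"]:
    print(tenant, "->", shard_for_tenant(tenant))
```

A caveat: with plain modulo hashing, changing the number of shards moves most tenants to a different shard. A tenant-to-shard lookup table or consistent hashing avoids that.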



Refer to Multi-Tenant Data Architecture for further details.


Conclusion
Achieving the optimum "degree" of multi-tenancy is something the organization has to strive for. It could start with Multi-Tenancy at the Infrastructure layer (IaaS) and the Platform layer (PaaS), move on to SaaS clusters that provide some degree of Multi-Tenancy, and finally move on to complete Multi-Tenancy at the Software layer (SaaS).


Resources
  1. Wikipedia: Multitenancy
  2. Many degrees of Multi-tenancy is an excellent blog post that outlines the current debate and approaches for Multi-tenancy.
  3. Multi-Tenant Data Architecture is an excellent resource that talks about data design patterns for Multi-Tenancy.
The opinions and statements in this communication are my own and do not necessarily reflect the opinions or policies of CA.

Saturday, December 10, 2011

Cloud Computing

OK. I made up my mind. Let's cover Cloud Computing before we talk about SaaS and Multi-Tenancy.


Explaining the terms below may be better than trying to come up with a technical definition of Cloud Computing.


IaaS (Infrastructure as a Service)
IaaS provides data center, infrastructure hardware and software services over the web. One example is Amazon EC2. This is probably the most dominant form of Cloud Computing that we see today.


PaaS (Platform as a Service)
PaaS is the next level of abstraction and provides the platform on which to build software services or products. For instance, it may provide a database, a web server etc. An example is Salesforce.com's Force.com.


SaaS (Software as a Service)
In SaaS, applications are provided as hosted services over the web. This is probably the most widely used model of Cloud Computing.


So you could define Cloud Computing as "some service provided over the internet". It could be a computer server, a virtual server, a pre-configured OS, a hosted environment (middleware), web-based apps (Google Apps), web services (Gmail) etc. The three characteristics of a cloud service are:

  • Elasticity
  • Self-Service access
  • Quick response.

Cloud Computing versus Grid Computing
Grid Computing typically relies on a batch scheduling mechanism to fan out a task to multiple nodes and then accumulate the results. So Grid Computing doesn't necessarily deal with getting processing capacity right now; instead it focuses on a predefined mechanism that lets the "batch job" schedule work across the Grid nodes.
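
The fan-out and accumulate pattern can be sketched in a few lines of Python. This is only an analogy under my own assumptions: worker threads from `concurrent.futures` stand in for Grid nodes, and `process_chunk` and `run_batch_job` are hypothetical names. A real grid would use a batch scheduler across machines.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Work done on one node: here, just sum the squares of a slice of numbers."""
    return sum(n * n for n in chunk)


def run_batch_job(numbers, node_count=4):
    """Fan the job out over node_count workers, then accumulate the partial results."""
    # Split the input into node_count interleaved slices, one per worker.
    chunks = [numbers[i::node_count] for i in range(node_count)]
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)


print(run_batch_job(list(range(10))))  # prints 285
```

Notice that the number of workers is fixed up front, which mirrors the predefined nature of grid scheduling as opposed to on-demand capacity.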


One could argue that Cloud Computing is a more evolved version of Grid Computing. Virtualization is important for Cloud Computing because it enables on-demand capacity, and that is a key difference from Grid Computing. However, there are ways of achieving this with Multi-Tenancy (Salesforce.com?) without Virtualization, and we need to talk about those reference models.


If I haven't confused you already, then let me try some more. What about Grid Computing that is available in the Cloud? :-)


Public, Private, Hybrid and Community Clouds
Location is not important here, but ownership is. Who "owns" the facility? If the facility is shared by multiple public clients, then it's a public cloud. If the facility is co-located within the enterprise (it could be managed by a third-party provider), then it's a private cloud. Hybrid clouds, of course, combine elements from both public and private clouds. Community clouds have multiple owners and are shared across those communities.



Cloud Computing benefits
  • Faster processing by making use of better/faster infrastructure available at much cheaper rates.
  • Fewer infrastructure bottlenecks, by delegating scalability and on-demand handling to the provider.
  • Low barrier to entry, allowing SMBs to participate in providing solutions irrespective of the size of their data center.


In spite of the benefits, we do have to consider the network bandwidth required to service clients. Is it sufficient to meet the clients' demands? What is the latency?



So, how do you use the Cloud?
OCCI (Open Cloud Computing Interface) is working towards standardizing the "API" to access the cloud, but unfortunately vendors have not completely implemented it yet. I am not sure whether major vendors like Amazon and Salesforce.com are part of it, so customers hoping for interoperability between cloud providers will be disappointed. However, it is still a useful resource to keep track of.


Architecture specific focus


Application architectures now have to consider a few extra things in addition to traditional issues such as loose coupling and distributed deployment. They now have to focus on delivering the entire application architecture as a set of composable services. If it can be virtualized, composed and assembled programmatically and quickly, then it falls into the perfect category of Cloud applications.


The following are key for successful cloud applications:
Horizontal Scaling is the key to having a successful cloud application. If we can deploy the application components in a distributed fashion and provision additional deployments if the demand increases, we would be able to serve additional requests. Surge computing can be utilized as well to procure the computing needs from public clouds in case we are running in a private one. Horizontal scaling assumes we have Parallelization at some level. Without Parallelization, the nodes would depend on each other or on some other common service which would become the bottleneck.
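
As a back-of-the-envelope sketch of horizontal scaling: work out how many identical instances the current load needs, then spread requests across them. The functions `instances_needed` and `assign_round_robin` and the load figures are hypothetical, and the sketch assumes the instances are stateless so that any of them can take any request.

```python
import math
from itertools import cycle


def instances_needed(requests_per_second, capacity_per_instance, minimum=1):
    """Number of identical instances required to absorb the current load."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    return max(minimum, math.ceil(requests_per_second / capacity_per_instance))


def assign_round_robin(request_ids, instance_count):
    """Spread requests across stateless instances; any instance can take any request."""
    instances = cycle(range(instance_count))
    return {request_id: next(instances) for request_id in request_ids}


count = instances_needed(requests_per_second=950, capacity_per_instance=200)
print(count)  # prints 5
print(assign_round_robin(["r1", "r2", "r3", "r4", "r5", "r6"], count))
```

If the instances shared state or depended on a common service, adding more of them would not help, which is the bottleneck the paragraph above warns about.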


Security and Compliance are important topics that we can't cover in detail in this post, but they should be handled in any cloud architecture. Concerns or doubts regarding these two are perhaps the main reason why the cloud is not adopted in large corporations. They deserve a separate, detailed post.



IBM cloud reference architecture
Not sure how much detail I would be able to go into but the majority of "fluff" around the IBM cloud reference architecture can be ignored. Most enterprises don't embark on providing IaaS, PaaS and SaaS in the same breath. It's mostly a business decision and what makes economic sense. However, here is what you can take away from it:


Common Cloud Management Platform (CCMP) consists of Operational Support Services (OSS) and Business Support Services (BSS)

  • Business Support Services represent the business-related services, involving pricing, metering, billing, account etc.
  • Operational Support Services represent the technical services, involving provisioning, ticket management, virtualization management etc.

QoS (Quality of Service) in CCRA
The non-functional aspects like Security, Resiliency, Performance & Consumability are cross-cutting aspects of QoS spanning the hardware infrastructure and Cloud Services and must be viewed from an end-to-end perspective including the structure of CCRA by itself, the way the hardware infrastructure is set up (e.g., in terms of isolation, disaster recovery, etc.) and how the cloud services are implemented. The major aspects of QoS are:
  • Governance and Policy.
  • Threat and Vulnerability Management
  • Data Protection
  • Availability & Continuity Management
  • Ease of doing business
  • Simplified Operations

Summary

  • It's a starting point. If Cloud Architecture seems daunting then this is a good starting point.
  • It's a good "best practices" document. It captures IBM's collective experience across various cloud solutions. There's got to be something useful here. :-)
  • It defines four architectural principles (referred to as ELEG), of which at least three seem to be of value.


    • Efficiency. Basically means we need to increase utilization of cloud services.
    • Lightweightness. Basically use some form of Virtualization or other technique to avoid "heavy" management of IT. 
    • Economies of Scale. The idea is to have common management services that can be shared across cloud flavors.
    • Genericity. No clue what the message here was! Please read the document and help me! :-) 

The third and fourth seem similar, and I am not sure what the differentiating factors are. Maybe we need some context around the IBM doc to fully appreciate the message, because it will come up in discussions and it will be important to explain its best points and avoid the unnecessary details. I think I will get back to this someday...


There are other Reference models out there. NIST has one and so does DMTF. Again, I will get to them someday... :-)