ZeroStack’s answer to cloud security and high availability

ZeroStack has many enterprise-grade security and high availability (HA) features that have been designed into the product from the ground up. In this blog post, we will cover a few of these features.

First, a quick overview of the ZeroStack components.

ZeroStack Cloud Operating System (Z-COS)

The ZeroStack cloud operating system, called Z-COS, runs on each server located on the customer’s premises behind their firewall. Z-COS converts this cluster of industry-standard x86 servers into a hyper-converged, scale-out cloud. Customers can get the ZeroStack software alone and run it on their own servers (see the Minimum Hardware Specification) or get ZeroStack’s turnkey appliance, Zblock. Z-COS consists of the following components (more details in this Z-COS data sheet):

ZeroStack SaaS monitoring and management portal (Z-Brain)

Z-Brain is the ZeroStack SaaS portal for monitoring, alerting, resource management, and orchestration. It runs in a secure tier-1 datacenter managed and controlled by ZeroStack. Z-Brain consists of the following components (more details in this Z-Brain data sheet):

High Availability

ZeroStack provides high availability at several layers: control plane, storage, VM, networking, and multi-region.

Control Plane HA

ZeroStack’s HA model uses a controller-less architecture. There is no need to configure HA on separate controller hosts. The ZeroStack control plane is distributed and is designed to be self-healing and highly available from the ground up. ZeroStack recommends a minimum of 4 hosts to retain full quorum and allow for a full host failure. When any host in the system goes down, the control plane automatically migrates services to another host without any downtime to the workloads running on the cluster.
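The post does not say which quorum rule the control plane uses. As a rough illustration only, the sketch below assumes a simple majority rule (an assumption, not a documented ZeroStack detail) and shows how many host failures a cluster of a given size could absorb under that rule. The function names are hypothetical.

```python
def majority_quorum(total_hosts: int) -> int:
    """Smallest number of hosts that forms a strict majority (assumed rule)."""
    if total_hosts < 1:
        raise ValueError("total_hosts must be at least 1")
    return total_hosts // 2 + 1


def tolerated_failures(total_hosts: int) -> int:
    """Number of hosts that can fail while a majority is still reachable."""
    return total_hosts - majority_quorum(total_hosts)


# Print cluster size, majority size, and tolerated failures for a few sizes.
for n in (3, 4, 5):
    print(n, majority_quorum(n), tolerated_failures(n))
```

Under this assumed rule, a 4-host cluster needs 3 hosts for a majority, so it keeps quorum through the loss of one full host.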

Storage HA

ZeroStack provides replicated SSD and HDD pools for data protection. These pools are built from local disks across multiple compute hosts, and replication is arranged to protect against both host and disk failures. Objects are stored with a replica count of 3. After the failure of one host, the cluster retains full storage replication capability and can still survive one more host failure before reaching “degraded mode”. The 3x replica count likewise provides full protection against one disk failure and can survive a second disk failure without loss of data.
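To make the 3x replica arithmetic concrete, here is a minimal sketch that counts surviving replicas after host failures. The placement map, object names, and host names are hypothetical, and the sketch assumes each replica of an object lives on a different host. It does not model re-replication after a failure, which the real system may perform.

```python
from typing import Dict, Set

# Hypothetical placement map: object name -> hosts holding one replica each.
Placement = Dict[str, Set[str]]


def surviving_replicas(placement: Placement, failed_hosts: Set[str]) -> Dict[str, int]:
    """Count the replicas of each object that sit on hosts still running."""
    return {obj: len(hosts - failed_hosts) for obj, hosts in placement.items()}


def data_lost(placement: Placement, failed_hosts: Set[str]) -> bool:
    """True if at least one object has no surviving replica."""
    return any(count == 0 for count in surviving_replicas(placement, failed_hosts).values())


placement = {
    "vol-a": {"host1", "host2", "host3"},
    "vol-b": {"host2", "host3", "host4"},
}
print(surviving_replicas(placement, {"host2"}))
print(data_lost(placement, {"host2", "host3"}))
```

With three replicas on distinct hosts, any two simultaneous host failures still leave at least one copy of every object, which is the property the paragraph above relies on.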

Network HA

Each host has 2 x 10GBase-T NICs, and each NIC is connected to a different ToR switch in an active-active configuration using the Link Aggregation Control Protocol (LACP). This configuration protects the cloud from downtime when a single ToR switch fails, reboots, or undergoes a software upgrade. Such an event results in temporary performance degradation, rather than an outage, until the switch is operational again.

VM HA

ZeroStack also provides VM high availability, which protects user VMs and the workloads on them from becoming unavailable when hosts fail. In the event of a host failure, ZeroStack automatically brings up each VM tagged HA on another available host.
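The failover behavior described above can be sketched as a small scheduling loop. This is an illustration under stated assumptions, not ZeroStack’s actual algorithm: the `VM` and `Host` classes, the `ha` flag, the vCPU-only capacity check, and the "most free capacity first" choice are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VM:
    name: str
    vcpus: int
    ha: bool = False  # hypothetical stand-in for the "tagged HA" marker


@dataclass
class Host:
    name: str
    free_vcpus: int
    up: bool = True
    vms: List[VM] = field(default_factory=list)


def fail_over(failed: Host, hosts: List[Host]) -> Dict[str, str]:
    """Restart HA-tagged VMs from a failed host on other running hosts.

    Returns a map of VM name -> new host name. VMs that are not tagged HA,
    or that fit on no remaining host, are left alone and not restarted.
    """
    failed.up = False
    moved: Dict[str, str] = {}
    for vm in list(failed.vms):
        if not vm.ha:
            continue
        candidates = [
            h for h in hosts
            if h.up and h is not failed and h.free_vcpus >= vm.vcpus
        ]
        if not candidates:
            continue
        # Assumed policy: pick the host with the most free capacity.
        target = max(candidates, key=lambda h: h.free_vcpus)
        failed.vms.remove(vm)
        target.vms.append(vm)
        target.free_vcpus -= vm.vcpus
        moved[vm.name] = target.name
    return moved


h1 = Host("host1", free_vcpus=0, vms=[VM("db", 4, ha=True), VM("batch", 2)])
h2 = Host("host2", free_vcpus=8)
h3 = Host("host3", free_vcpus=2)
print(fail_over(h1, [h1, h2, h3]))
```

In this example only the HA-tagged VM is restarted elsewhere; the untagged one stays down with its host.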

Multi-region HA

For more high availability options, ZeroStack’s infrastructure is organized into Availability Zones (AZs) and Regions. Each AZ represents a fault domain, the failure of which doesn’t impact workloads running in another AZ. Similarly, a Region can be a geographically distributed site that provides disaster recovery as well as better performance and data locality. ZeroStack provides automated placement policies to schedule workloads in a given AZ or Region and assigns a given storage pool type based on protection requirements.

Remote replication across sites can be done today using external storage partners. For countries and jurisdictions where all customer data and management/operations must remain in-country, ZeroStack can provide SaaS portal access locally, with all the compute and data assets located within the country.

Z-Brain availability

The Z-Brain itself is engineered using the same HA principles described above. However, even if the Z-Brain is nonfunctional, under maintenance, or unreachable because of a network outage, the on-premises systems and workloads will continue to operate without any interruption. Furthermore:

Both the API and CLI remain accessible and can be run against ZeroStack by pointing to each region’s local HA Proxy IP address.
New workloads can be provisioned on-premises using the API or CLI.

Security

ZeroStack uses a variety of tools and techniques to ensure that the on-premises customer infrastructure and data are secure and safe. Here are some of the key security features.

Physical and network isolation

The servers running the customer workloads and storing the data are physically located on-premises in a customer location behind their own firewall. Neither Z-Brain nor any other client has direct access to the on-premises servers. Z-COS sends monitoring metadata to the Z-Brain only when the host itself connects out over HTTPS to a validated Z-Brain server. Additionally, the base OS running on each server is protected by a user ID and password.

Customer data stays on-premises

All of the customer-created compute instances, data volumes, networks, and object store data are stored on the on-premises hardware. Image and application templates are also stored on-premises. Only cloud, business unit, and project administrators with the appropriate credentials and roles have access to the assets in the system. All passwords and certificates are also stored on-premises, and authentication happens either locally on ZeroStack or via Active Directory or LDAP, if configured.

Security of Communication Between On-premises and Z-Brain

All communication between the Z-Brain and the on-premises servers uses HTTPS to encrypt traffic. ZeroStack establishes only outbound connections through a single port, 443, which is already open for HTTPS Internet traffic. No inbound connections are established and no new ports need to be opened on the firewall, which adds a level of security and reduces exposure to attacks.

Events, stats, and metadata about VMs, storage, and usage are the only data transferred to the cloud, in encrypted form, for the monitoring, analytics, planning, and chargeback features.

The diagram below describes how HTTPS traffic between Z-Brain and Z-COS is encrypted and authenticated using a certificate-based client-server authentication system. Once the Z-Brain verifies the identity of the incoming request, traffic moves on the outgoing encrypted HTTPS port 443.
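For readers who want to see what certificate-based client-server authentication over an outbound HTTPS connection looks like in practice, here is a minimal sketch using Python’s standard library. It is not ZeroStack’s implementation: the function names, the file path parameters, and the placeholder host name are assumptions made for illustration.

```python
import http.client
import ssl
from typing import Optional


def build_client_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Create a TLS client context that verifies the server certificate and,
    when a certificate/key pair is supplied, presents it to the server."""
    # create_default_context turns on certificate verification and hostname checks.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file is not None:
        # The client certificate lets the server authenticate this host.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


def open_outbound(host: str, context: ssl.SSLContext) -> http.client.HTTPSConnection:
    """Prepare an outbound HTTPS connection on port 443.

    Nothing is sent until a request is made on the returned object.
    """
    return http.client.HTTPSConnection(host, 443, context=context, timeout=10)


context = build_client_context()
# "z-brain.example.invalid" is a placeholder, not a real endpoint.
conn = open_outbound("z-brain.example.invalid", context)
```

The on-premises side only ever dials out, so the firewall needs no inbound rule; each side checks the other’s certificate during the TLS handshake.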

Optionally, an external HTTPS proxy service can be used if required by the customer’s security policy. ZeroStack recommends using authenticated proxy connections and ensuring that the proxy does not alter the HTTPS protocol mechanism in any way.
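One common way to satisfy both recommendations is an HTTP CONNECT tunnel: the client authenticates to the proxy, and the proxy then relays the encrypted bytes without terminating TLS. The sketch below shows this with Python’s standard library. The host names, port, and the choice of Basic proxy authentication are assumptions for illustration; the post does not specify which proxy authentication scheme ZeroStack expects.

```python
import base64
import http.client
import ssl
from typing import Dict


def proxy_auth_header(user: str, password: str) -> Dict[str, str]:
    """Build a Basic Proxy-Authorization header for the CONNECT request."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Proxy-Authorization": f"Basic {token}"}


def tunnel_through_proxy(
    proxy_host: str,
    proxy_port: int,
    target_host: str,
    user: str,
    password: str,
) -> http.client.HTTPSConnection:
    """Prepare an HTTPS connection to target_host via a CONNECT tunnel.

    The TLS session is negotiated with target_host, not with the proxy,
    so the proxy only relays encrypted traffic. No network traffic
    happens until a request is sent on the returned connection.
    """
    context = ssl.create_default_context()
    conn = http.client.HTTPSConnection(proxy_host, proxy_port, context=context, timeout=10)
    conn.set_tunnel(target_host, 443, headers=proxy_auth_header(user, password))
    return conn


# Placeholder names; none of these are real ZeroStack endpoints.
conn = tunnel_through_proxy(
    "proxy.example.invalid", 3128, "z-brain.example.invalid", "user", "pass"
)
```

A proxy that instead intercepts and re-signs TLS traffic would break the certificate checks on both ends, which is presumably why the recommendation is to leave the HTTPS mechanism untouched.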

 

At ZeroStack, we are committed to delivering a self-driving private cloud designed from the ground up to protect the confidentiality, integrity, and availability of our customers’ systems and data. We would love to hear your thoughts on these features.

A comprehensive ZeroStack security guide is available for download. Please check it out here

 
