ZeroStack Low Level Architecture and Key Features For Admins

ZeroStack Company Overview
Cloud computing technology is becoming so advanced that managing all the resources as part of the infrastructure is getting harder with a traditional IT model. Existing solutions are harder to manage both in terms of cost and complexity.

Soon, data center operations will use artificial intelligence (AI) so advanced that it can monitor hardware components and software services, anticipate issues, decide the right course of action and deliver changes faster and more accurately than any human, or group of humans, possibly could. On-site management will be performed through intelligent software and cloud-based AI services, effectively eliminating the integration, operations and management overhead needed to run the infrastructure. Companies will consume their on-site infrastructure through a web portal, much as they interact with AWS, Azure and Google today.

Pioneering self-driving cloud
ZeroStack is at the forefront of this trend. ZeroStack uses intelligent software to deliver a self-driving, fully integrated private cloud platform that offers the agility and simplicity of public cloud with the control and performance of a private cloud at a fraction of the cost. We leverage advances in distributed computing, computer science and AI to self-manage your on-premises cloud, allowing you to focus on your core business rather than cloud operations.

The heart of the solution is an on-premises distributed control plane that works in conjunction with a cloud brain to provide a software-managed private cloud. The on-premises control plane takes care of immediate problems and failures. On the other hand, the cloud brain is built using a big data cluster to collect, analyze and guide decisions over the long term. Change events, statistics and health checks are relayed up to the cloud brain to do event processing, issue alerts, and help increase infrastructure and workload automation, which improves mean time to recovery, for example. Your existing administrators can monitor and manage application workloads, performance, and capacity planning for VMs, CPUs, storage, and networking, so there’s no need for them to understand all the complex details of building and monitoring a cloud.

Customers deploy ZeroStack for a variety of reasons: to reduce complexity, increase agility, and in many cases to control OpEx. Whatever their motivation, they all agree that the future of the datacenter has arrived sooner rather than later.

ZeroStack’s Intelligent Cloud Platform

On premises, ZeroStack’s cloud operating system converts bare-metal servers into a reliable, self-healing cloud cluster. This cluster is consumed via a self-service SaaS portal. The SaaS portal also collects telemetry data, and the cloud brain uses artificial intelligence to create models that help customers make decisions about capacity planning, troubleshooting and optimized placement of applications. The integrated App Store enables one-click deployment of many applications that provide the platform for most modern cloud-native applications. This solution is fully integrated with public clouds to offer seamless migration between clouds.

ZeroStack – the Industry’s First All-in-One Scale-Out Private Cloud

ZeroStack uses smart software and machine learning to deliver a self-driving, fully integrated private cloud platform that delivers the agility and simplicity of public cloud at a fraction of the cost. ZeroStack is a ‘next-gen’ cloud computing company, offering a private cloud solution that is easier to configure, consume and manage than any other technology on the market, and which offers the best of both private and public clouds in a single solution that combines on-premises deployment and a SaaS platform. ZeroStack enables customers to consume an automated, self-driving, self-service cloud without significant in-house expertise or time required.

The ZeroStack Cloud Platform represents an industry first with a complete, scale-out private cloud that converges compute, storage, networking and management software with self-healing capabilities. The on-premises hyper-converged system is combined with a sophisticated SaaS-based operational platform that provides monitoring, troubleshooting, capacity planning and chargeback using a powerful analytics engine. The analytics engine enables predictive failure detection with proactive resolution. Another industry first is that ZeroStack can migrate workloads and/or production into and out of a public cloud such as AWS, as required. This unique combination significantly reduces the time and complexity of deploying and running private clouds, with 40 to 60 percent lower cost of ownership than existing solutions and AWS.

Many organizations looking to leverage cloud technologies and benefits are choosing private clouds over public clouds because of security concerns, regulatory factors, performance, visibility, data gravity or other reasons where an internally-hosted cloud simply better suits their needs. However, until now, the lack of an all-in-one private cloud solution has forced enterprises to piece together complex software stacks with hardware and face higher operational costs to manage it all. Now, for the first time, organizations can set up a hyper-converged, scale-out private cloud infrastructure in less than 30 minutes, designed to grow with their needs through self-healing, cloud-based management, monitoring, analytics, capacity planning and chargeback – and available with open APIs to facilitate further levels of automation, operation, integration and benefit.

ZeroStack Architecture

ZeroStack’s solution consists of two key components: (1) a cloud operating system called Z-COS and (2) a monitoring, management and orchestration portal called Z-Brain. The cloud operating system runs on each server installed on a customer’s premises. The Z-Brain runs as a SaaS service on ZeroStack’s private cloud. Next, we explain each of these components in more detail.

Z-COS Design

The ZeroStack operating system is a very stripped-down version of Linux that is optimized to run VMs. It consists of a built-in KVM hypervisor, drivers for supported storage and networking devices, some key OpenStack services and ZeroStack code to create a cloud cluster by pooling resources across all servers. Z-COS can be installed on any industry-standard x86 hardware with minimal requirements in terms of CPU, memory, storage and networking devices. It allows for very flexible configurations for these resources. Z-COS helps create a hyper-converged, scale-out system using a cluster of servers running on-premises. Customers can start with a minimum of three servers to create a highly available private cloud and add capacity on demand. ZeroStack provides its own hardware appliance as well, called Z-Block, with different configurations. CPU and memory resources are local to each server but can be consumed as a larger pool, and our software automatically does the initial placement of workloads across hosts in an intelligent manner. Furthermore, storage and networking resources are combined as clustered resources to provide reliable software-defined storage and software-defined networking with features like micro-segmentation and per-VM firewalling. Before going into clustering, let’s look at the storage and networking setup on each server.

Software-defined Storage

Z-COS takes over all the local disks attached to a server. Typically, it expects both SSDs and HDDs to be present in a server. On the first disk, it creates an LVM partition of about 300 GB. This partition is used to install the host operating system and ZeroStack software, and also to hold logs. The partition is further divided into multiple logical volumes for operating system, data and logs. Z-COS creates two operating system volumes to allow for seamless upgrades from one version to another. Having two volumes allows our software to always install the latest code in a separate volume and reboot into that volume after the upgrade. This allows for a non-disruptive upgrade and also an easy rollback procedure in case of any failure. The remaining disks are also formatted by Z-COS, and a single partition is created on each disk to be used either as a local disk or as part of a shared storage pool.
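
The two-volume upgrade scheme can be illustrated with a small model. This is only a sketch of the general idea: the class, the volume names (os_a, os_b) and the boots_ok flag are hypothetical and do not describe Z-COS internals.

```python
from dataclasses import dataclass, field


@dataclass
class DualVolumeHost:
    """Hypothetical model of a host with two OS volumes (names are illustrative)."""
    versions: dict = field(default_factory=lambda: {"os_a": "1.0", "os_b": None})
    active: str = "os_a"

    def standby(self) -> str:
        return "os_b" if self.active == "os_a" else "os_a"

    def upgrade(self, new_version: str, boots_ok: bool = True) -> str:
        """Install onto the standby volume and switch only if the new boot succeeds."""
        target = self.standby()
        self.versions[target] = new_version  # the running volume is never modified
        if boots_ok:
            self.active = target             # reboot into the upgraded volume
        return self.active                   # on failure the old volume stays active


host = DualVolumeHost()
host.upgrade("2.0")                  # successful upgrade: os_b becomes active
host.upgrade("3.0", boots_ok=False)  # failed upgrade: os_b (2.0) remains active
print(host.active, host.versions[host.active])
```

Because the new code is always written to the inactive volume, a failed upgrade needs no repair step: the host simply keeps booting from the volume it was already using.
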
Figure 1 shows the typical disk layout, once Z-COS is installed on a server with 2 SSDs and 4 HDDs.

Here /dev/sda1 has an LVM with different volumes for OS, software, data and logs. Once Z-COS is installed, during the cloud create process each disk is put into either a local storage pool or a clustered storage pool. A disk in a local pool provides local storage to VMs, and the data on that pool is not replicated. A disk in a clustered storage pool provides a pool with replicated data. Any virtual disk on that pool is protected against server and disk failures. By default, our software creates two more replicas for each block in that pool, so the storage sub-system can tolerate two disk failures simultaneously. These replicas are also placed across servers, so the pool can tolerate server failures as well.
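To make the replication rule concrete, the following sketch places three copies of a block (the original plus two replicas) on disks of three different hosts and checks which failures the block survives. The round-robin placement and all names here are assumptions made for illustration; the actual placement algorithm is not described in this document.

```python
from itertools import cycle, islice

COPIES = 3  # one original plus the two additional replicas described above


def place_block(block_id: int, disks_by_host: dict, copies: int = COPIES) -> list:
    """Pick one disk on each of `copies` distinct hosts (round-robin, an assumption)."""
    hosts = sorted(disks_by_host)
    if len(hosts) < copies:
        raise ValueError("need at least as many hosts as copies")
    start = block_id % len(hosts)
    chosen = list(islice(cycle(hosts), start, start + copies))
    return [(h, disks_by_host[h][block_id % len(disks_by_host[h])]) for h in chosen]


def survives(placement: list, failed_hosts: set) -> bool:
    """A block stays readable while at least one copy sits on a healthy host."""
    return any(host not in failed_hosts for host, _ in placement)


# Four hosts with three shared HDDs each, as in a four-node cluster.
cluster = {f"host{i}": [f"hdd{j}" for j in range(3)] for i in range(1, 5)}
placement = place_block(7, cluster)
print(placement)
print(survives(placement, {placement[0][0], placement[1][0]}))
```

With three copies on three distinct hosts, losing any two of those hosts (or any two disks) still leaves one readable copy.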

Figure 2 below shows a possible configuration of disks in a four-node cluster with 24 drives with four pools: Local SSD, Local HDD, Shared SSD, Shared HDD.

Here, one SSD per host is part of the local pool, which is not replicated. One SSD per host is part of the shared pool, which creates a shared SSD pool with four drives. Similarly, one HDD per host is part of the local HDD pool and three are part of the shared HDD pool. So, there are a total of 12 drives in the shared HDD pool.
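The pool sizes follow directly from the per-host assignment. The short calculation below reproduces the arithmetic for the four-node example; the pool names used as dictionary keys are just labels for this sketch.

```python
HOSTS = 4
# Drives contributed by each host to each pool, taken from the description above.
PER_HOST = {"local_ssd": 1, "shared_ssd": 1, "local_hdd": 1, "shared_hdd": 3}

pool_sizes = {pool: count * HOSTS for pool, count in PER_HOST.items()}
total_drives = sum(pool_sizes.values())

for pool, size in pool_sizes.items():
    print(f"{pool}: {size} drives")
print(f"total: {total_drives} drives")
```

The totals match the configuration described: 4 + 4 + 4 + 12 = 24 drives across the cluster.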

Users can further configure and drop the local pools if they want. Also, if there are no HDDs or SSDs in the servers, the Z-COS software will skip creating those pools. This allows complete flexibility in terms of leveraging local disks to create different storage back-ends for various workloads. Here, local pools are useful either for dev/test workloads or for NoSQL databases that do their own replication at the application level. For example, Hadoop, Cassandra and other such workloads do not need shared and replicated storage underneath them.

Software-defined Networking
Z-COS takes over all the network interfaces on the host. We expect each server to have at least two NICs. For a server with more than one NIC, Z-COS combines all of them into a bonded NIC and creates a vswitch called zs-ex (short for ZeroStack external) on top of that. There is no IP address assigned to the physical NICs directly; the vswitch has a port called zs-ex, which carries the IP address of the host.
In addition to the standard host network, Z-COS uses four different subnets for different traffic types:
1. ZeroStack cluster network: used by ZeroStack services
2. Management network: used by all the management services, including OpenStack
3. VM Tunnel network: used by all the east-west VM traffic
4. Storage network: used by clustered storage for all the storage I/O traffic

Once the user enters these subnets as part of the cloud create process, ZeroStack creates a port for each of these on the vswitch on the host and assigns an IP from the corresponding subnet range. The assignment of IPs to hosts is done by ZeroStack software and it is fixed for the lifetime of the host.
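
As an illustration of fixed per-host addressing, the sketch below gives each host one address from each of the four subnets. The subnet ranges are placeholders, and the rule of using the same offset in every subnet is an assumption; the document does not specify how ZeroStack chooses the addresses.

```python
import ipaddress

# Placeholder subnets entered during cloud creation (not ZeroStack defaults).
SUBNETS = {
    "cluster": "10.10.0.0/24",
    "management": "10.10.1.0/24",
    "vm_tunnel": "10.10.2.0/24",
    "storage": "10.10.3.0/24",
}


def assign_host_ips(host_index: int, subnets: dict = SUBNETS) -> dict:
    """Give host number `host_index` (0-based) the same offset in every subnet."""
    assigned = {}
    for name, cidr in subnets.items():
        net = ipaddress.ip_network(cidr)
        # Reserve the network and broadcast addresses of the subnet.
        if host_index + 1 >= net.num_addresses - 1:
            raise ValueError(f"subnet {cidr} is too small for host {host_index}")
        assigned[name] = str(net.network_address + 1 + host_index)
    return assigned


for i in range(3):
    print(f"host{i}", assign_host_ips(i))
```

The function is deterministic, so a host keeps the same addresses for as long as its index does not change, which mirrors the fixed assignment described above.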

One can see the bridges and ports using:
sudo ovs-vsctl show

To check IP addresses, one can use:

We create an overlay with GRE or VXLANs on top of physical layer 2, layer 3 networks. When a VM is provisioned, it is connected to a private network. These private networks have their own subnets and can even have overlapping subnets. The traffic between VMs is tunneled across hosts using GRE and/or VXLAN tunnels, and the traffic on one private network is completely isolated from other private networks. We also provide per-VM firewall services using security groups and rules at each hypervisor.
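The reason overlapping subnets do not conflict is that every lookup is scoped by the tunnel identifier of the private network. The sketch below models this with a forwarding table keyed by (tunnel id, VM IP). The tunnel ids stand in for a GRE key or VXLAN VNI, and all addresses and host names are made up for illustration.

```python
from typing import Dict, Optional, Tuple

# Forwarding table keyed by (tunnel id, VM IP) -> hypervisor host.
ForwardingTable = Dict[Tuple[int, str], str]

table: ForwardingTable = {
    (1001, "192.168.1.10"): "host1",  # private network A
    (1001, "192.168.1.11"): "host2",
    (2002, "192.168.1.10"): "host3",  # private network B reuses the same subnet and IP
}


def lookup(tunnel_id: int, dst_ip: str) -> Optional[str]:
    """Return the host for dst_ip, considering only VMs on the same private network."""
    return table.get((tunnel_id, dst_ip))


print(lookup(1001, "192.168.1.10"))  # VM on network A
print(lookup(2002, "192.168.1.10"))  # same IP, different network, different host
print(lookup(2002, "192.168.1.11"))  # not reachable from network B
```

A VM on network B cannot reach 192.168.1.11 even though that address exists on network A, because no entry exists for it under network B's tunnel id.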

On each server (or host), as part of the Z-COS installation, a control VM is also installed. It is called zvm, which is short for ZeroStack VM. All of the clustering code and OpenStack services run inside the zvm, except for some node-level services that need to be on each server.

The following command will show the list of VMs running on each server and also the zvm.
sudo virsh list

Each zvm has an IP address in the same subnet as the host and it also has a private network using a local bridge on the host. Having a zvm completely isolates the ZeroStack and OpenStack services from the actual user VMs that run on the host. All the OpenStack services run across zvms on different hosts.

Host Configuration
Once the Z-COS is installed, one can set the IP address on the host using the following command:
zcli network set zhost-ip=<host ip> zvm-ip=<zvm ip> gateway-cidr=<gateway IP in CIDR form> dns=<comma separated list of dns servers> ntp=<ntp server IP>

The ntp parameter is optional.

Here is an example:
zcli network set zhost-ip= zvm-ip= gateway-cidr= dns=

If ZeroStack servers are part of a VLAN, you can also set the VLAN ID using the wire-vlan parameter in the same command. This parameter is otherwise optional. In that case, the command will look like:
zcli network set zhost-ip= zvm-ip= gateway-cidr= dns= wire-vlan=
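
Since a mistyped address is easy to miss, it can help to validate the values before running the command. The helper below is a hypothetical sketch: it only builds the command string from the parameters named above and does not execute anything. It assumes that the host and zvm addresses must fall inside the gateway's subnet and that wire-vlan takes the same key=value form as the other parameters; the addresses in the example are placeholders.

```python
import ipaddress
from typing import List, Optional


def build_network_set(zhost_ip: str, zvm_ip: str, gateway_cidr: str, dns: List[str],
                      ntp: Optional[str] = None, wire_vlan: Optional[int] = None) -> str:
    """Validate the values and return the command line as a string (nothing is run)."""
    gateway = ipaddress.ip_interface(gateway_cidr)  # e.g. 10.0.0.1/24
    for ip in (zhost_ip, zvm_ip):
        if ipaddress.ip_address(ip) not in gateway.network:
            raise ValueError(f"{ip} is outside {gateway.network}")
    for server in dns:
        ipaddress.ip_address(server)  # raises ValueError if malformed
    parts = ["zcli", "network", "set", f"zhost-ip={zhost_ip}", f"zvm-ip={zvm_ip}",
             f"gateway-cidr={gateway_cidr}", "dns=" + ",".join(dns)]
    if ntp is not None:
        parts.append(f"ntp={ntp}")
    if wire_vlan is not None:
        parts.append(f"wire-vlan={wire_vlan}")  # assumed key=value form
    return " ".join(parts)


cmd = build_network_set("10.0.0.11", "10.0.0.12", "10.0.0.1/24", ["10.0.0.2", "10.0.0.3"])
print(cmd)
```

Optional parameters are appended only when they are supplied, which matches the optional status of ntp and wire-vlan.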

Similarly, one can change the hostname using the command:
zcli set-hostname <hostname> --local

Once you set the hostname, make sure to reboot the host.

In most cases, we use outbound HTTPS access to talk to Z-Brain. In cases where that port is blocked, we can also use an HTTP proxy server. To set up the HTTP proxy server, use:
zcli proxy set <proxy-url>

One can then test if it was set correctly using:
zcli proxy get

Each host also runs an agent to collect stats periodically and to monitor all the hardware components within the host. These stats and hardware health are sent to zvm, which then sends the necessary stats and events to Z-Brain.

Clustering and Self-healing
So far, we have looked at a single server’s settings. A cloud becomes more interesting as a scale-out system, where resources from multiple servers are joined into a single large pool that can be consumed in a multi-tenant manner. ZeroStack runs a cluster manager process that configures the servers to act as a single cluster. An instance of the manager runs inside each zvm and works in conjunction with the managers running in the zvms on the other servers. The manager exposes an API to add, remove, configure and unconfigure nodes in a cluster. Furthermore, the manager knows all the services that need to be running in a cluster to provide a cloud API. It creates a mapping of each service to one or more zvms and brings the services up in the desired order. For any cluster-wide operation, one of the cluster manager processes acts as the leader and carries out that operation, which avoids conflicts. The other manager processes watch the leader, and if it fails, one of them automatically becomes the new leader through a leader election process. Once a service starts, the manager on that zvm monitors the health of the service and periodically sends this health information to the leader. If the leader detects a service failure, it instructs the node that was running the service to actively stop it and brings the service up on another node.
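The interplay of leader election and service failover can be sketched with a toy model. Everything here is an assumption made for illustration: the election rule (lowest-named healthy node wins), the least-loaded placement policy and the service names are not taken from ZeroStack's implementation.

```python
class Cluster:
    """Toy cluster: the lowest-named healthy node leads (an assumed election rule)."""

    def __init__(self, nodes):
        self.healthy = set(nodes)
        self.placement = {}  # service name -> node currently running it

    def leader(self):
        return min(self.healthy)

    def start(self, service, exclude=None):
        """Place the service on the healthy node that runs the fewest services."""
        load = {n: 0 for n in self.healthy if n != exclude}
        for node in self.placement.values():
            if node in load:
                load[node] += 1
        node = min(sorted(load), key=load.get)
        self.placement[service] = node
        return node

    def service_failed(self, service):
        """Leader action: stop the service on its old node, restart it elsewhere."""
        old = self.placement.pop(service)
        return self.start(service, exclude=old)

    def node_failed(self, node):
        """Drop the node; a new leader emerges and orphaned services are moved."""
        self.healthy.discard(node)
        for service in [s for s, n in self.placement.items() if n == node]:
            self.start(service)


cluster = Cluster(["zvm1", "zvm2", "zvm3"])
for svc in ("keystone", "nova-api", "glance"):
    cluster.start(svc)
print(cluster.leader(), cluster.placement)
cluster.service_failed("glance")  # glance moves off the node where it failed
cluster.node_failed("zvm1")       # zvm1 was the leader; another node takes over
print(cluster.leader(), cluster.placement)
```

Production systems usually rely on a consensus or coordination service for leader election rather than a fixed ordering; the simple rule here only keeps the example short.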

The final architecture is shown in Figure 3:

The on-premises cloud built using supported servers or Z-Blocks is configured, consumed, and managed using the ZeroStack SaaS platform. The ZeroStack SaaS platform enables self-service, rapid addition of features without forklift upgrades along with multi-site and multi-tenant deployments and zero touch operations. The analytics and monitoring built into Z-Brain takes the guesswork out of design, ongoing management and troubleshooting. The SaaS portal also collects telemetry data and uses artificial intelligence to create models that help customers make decisions about capacity planning, troubleshooting and optimized placement of applications.

The integrated App Store enables 1-click deployment of over a hundred applications that provide the platform for most modern cloud-native applications. Z-Brain is fully integrated with public clouds to offer seamless migration between clouds. The result is a highly agile, cloud-managed datacenter that offers ultimate control and governance and dramatically accelerates time to value while keeping costs to a minimum. The Z-Brain consists of a large cluster of VMs running with local storage. These VMs create a large data storage cluster to collect, store and analyze telemetry information. The services running inside these VMs also provide the REST-based API for the ZeroStack web UI. The management, upgrade and availability of this platform are handled entirely by the ZeroStack team.

The final architecture with all the services, ZeroStack Z-Brain and the web-based UI is shown in Figure 4. Here, the ZeroStack control plane and monitoring agents run on-premises. The control plane runs as part of the cluster manager, and a lightweight agent runs on each host. The control plane brings up all the services, monitors them and migrates them if needed. The images are stored as part of the clustered storage pool. Cinder uses both the local and shared pools to create volumes, and an external NFS store can be used to take backups. Finally, the ZeroStack web-based UI uses the OpenStack service APIs from Nova, Cinder, Glance, Neutron and Keystone to do VM provisioning and resource orchestration. Health information, events and stats data are sent from the ZeroStack control plane to Z-Brain, which provides an API to access that information in the UI. Similarly, the Heat service running on-premises uses various APIs to launch applications from the App Store or otherwise.

Key Features
ZeroStack’s architecture provides some key features that are not possible with other solutions in the market today.

1. Bare-metal to Cloud in 30 minutes
Once Z-COS is installed on a set of servers and they are given a local IP address, a complete private cloud can be built on those servers in less than 30 minutes. All this is done using three simple steps:

a) Create an account on ZeroStack’s web portal
b) Add account id to the servers using zcli
c) Log in to the ZeroStack portal and go through the cloud creation

2. Software-based manageability of the cloud services
ZeroStack Z-COS software takes care of monitoring all the critical cloud services running across zvms and servers. Software and hardware failures of any kind are handled by Z-COS software.

3. Built-in monitoring, alerting and dashboards
With existing cloud solutions, customers have to install separate monitoring and operational tools to monitor all the services, get alerts and do capacity planning. With ZeroStack, all this is provided as part of the SaaS platform, so there is no need to install a local cluster with such tools. Furthermore, the SaaS portal provides analytics, alerts, and dashboards to help with decisions.

4. Patching and upgrades delivered through SaaS
All the patches and upgrades are delivered via the SaaS portal. Customers choose when to schedule the upgrade, but ZeroStack software runs a state machine as part of the SaaS portal to take care of the upgrade process. The SaaS portal software is upgraded automatically every two weeks, and new features are made available to users the next time they log in.
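
An upgrade driven by a state machine can be sketched as follows. The states and transitions below are hypothetical and chosen only to show the pattern: every step has an explicit set of allowed next states, and any failed step leads to a rollback.

```python
from enum import Enum, auto


class State(Enum):
    SCHEDULED = auto()
    INSTALLING = auto()
    REBOOTING = auto()
    VERIFYING = auto()
    DONE = auto()
    ROLLED_BACK = auto()


# Allowed transitions; a state that is not a key here is terminal.
TRANSITIONS = {
    State.SCHEDULED: {State.INSTALLING},
    State.INSTALLING: {State.REBOOTING, State.ROLLED_BACK},
    State.REBOOTING: {State.VERIFYING, State.ROLLED_BACK},
    State.VERIFYING: {State.DONE, State.ROLLED_BACK},
}


def advance(current: State, nxt: State) -> State:
    """Move to `nxt` only if the transition table allows it."""
    if nxt not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
    return nxt


def run(results: dict) -> State:
    """Walk the steps in order; `results` maps a step to False if that step fails."""
    state = State.SCHEDULED
    for step in (State.INSTALLING, State.REBOOTING, State.VERIFYING):
        state = advance(state, step)
        if not results.get(step, True):
            return advance(state, State.ROLLED_BACK)
    return advance(state, State.DONE)


print(run({}).name)                        # every step succeeds
print(run({State.REBOOTING: False}).name)  # reboot fails, upgrade is rolled back
```

Encoding the allowed transitions in a table means an interrupted upgrade can be resumed or rolled back from a known state rather than from an ambiguous intermediate condition.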
