Distributed Orchestration at Scale

Shay Naeh
6 min read · Apr 4, 2022

Introduction

This blog describes the concept of Cloudify Spire, a model for distributed orchestration at scale that supports new use cases like Edge and IoT. Distributed orchestration must support multiple managers across multiple sites and multiple geographical regions.

This blog describes a three-step process, as seen in the figures below:

  • Discovery — in this case we discover Kubernetes clusters on AWS
  • Sub-manager installation on discovered clusters
  • Service/Application deployment on sub-managers

We also introduce a new concept of Bulk Deployment, where you can deploy a service on multiple sub-managers at once.

You can also run Day 2 operations on existing sub-managers by utilizing the same model.

We start by discovering all available Kubernetes clusters in your cloud environment. You can then choose the clusters on which you would like to install Spire sub-managers.

Spire sub-managers are deployed as Helm Charts on Kubernetes clusters.

The screenshot below shows all the blueprints that are required for discovery, sub-manager installation, and workload deployment on sub-managers.

Discovery of target Kubernetes Clusters

The first step is to discover Kubernetes clusters.

We can see in the figure below how we start the discovery process.

On the right side we see the “Discover and deploy” AWS sub-menu.

On the left side you can see the discovered Kubernetes clusters. Each discovered Kubernetes cluster is turned into an EaaS (Environment as a Service), which exposes credential attributes, like tokens and secrets, for connecting to the cluster.

Each environment exposes its own unique attributes.
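The idea of each discovered cluster becoming an environment with its own credential attributes can be sketched roughly as follows. This is a minimal illustration only: the class name, attribute keys, and endpoint values are invented for the example and are not Cloudify's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class DiscoveredCluster:
    """A discovered Kubernetes cluster, modeled as an EaaS environment."""
    name: str
    region: str
    # Credential attributes exposed by the environment (token, endpoint, etc.)
    attributes: dict = field(default_factory=dict)

# Discovery would populate one environment per cluster found in the cloud account.
clusters = [
    DiscoveredCluster("eks-us-east", "us-east-1",
                      {"endpoint": "https://eks-1.example.com", "token": "<redacted>"}),
    DiscoveredCluster("eks-eu-west", "eu-west-1",
                      {"endpoint": "https://eks-2.example.com", "token": "<redacted>"}),
]

# A later step reads each environment's unique attributes to connect to its cluster.
endpoints = [c.attributes["endpoint"] for c in clusters]
```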

After the Kubernetes clusters are discovered, we are ready to install sub-managers on them. We deploy the sub-managers as a batch operation on all the chosen clusters. In other words, with one operation we deploy all sub-managers on their target Kubernetes clusters.

Sub-Managers Deployment

The figure below shows the deployment of two sub-managers.

The deployment includes multiple steps and dependencies as seen on the task execution graph, on the right side of the figure below.

Cloudify sub-managers are installed as Helm charts, using the Cloudify Helm plugin. The process involves several steps, such as refreshing the AWS Kubernetes cluster token, fetching the Helm repo, and deploying the chart on each of the selected clusters. This is done by fetching each Kubernetes cluster's credentials from the EaaS environment attributes.
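The per-cluster flow above can be sketched as a small fan-out loop. The step names are stand-ins for the real actions (an AWS token refresh, a Helm repo fetch, a `helm install` of the manager chart); here we just record the order in which they would run for each selected cluster.

```python
def install_submanager(cluster):
    """Sketch of the per-cluster installation flow. Each step name stands in
    for a real action; the actual commands and chart names are not shown here."""
    return [
        "refresh-eks-token",                    # EKS tokens are short-lived
        "fetch-helm-repo",                      # pull the Cloudify manager chart repo
        f"helm-install@{cluster['endpoint']}",  # install using this cluster's credentials
    ]

# Batch operation: one call fans out over every selected cluster.
selected = [
    {"endpoint": "https://eks-1.example.com", "token": "t1"},
    {"endpoint": "https://eks-2.example.com", "token": "t2"},
]
plans = [install_submanager(c) for c in selected]
```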

The output of this process is the creation and deployment of Cloudify managers on the chosen Kubernetes clusters as shown in the figure below.

You can also view the deployed sub-managers on a geographical map, as seen in the figure below.

Sub-managers Labels

To place a given workload on the right sub-managers, we tag sub-managers with labels. At creation time, you can choose any name as a label and assign it to a sub-manager.

When deploying a new workload, we specify which labels it should match; the workload is then deployed only on the sub-managers that carry those labels.
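The label match-making itself is a simple filter. A minimal sketch, with invented sub-manager names and label sets:

```python
def match_submanagers(submanagers, required_label):
    """Return only the sub-managers tagged with the required label."""
    return [m for m in submanagers if required_label in m["labels"]]

submanagers = [
    {"name": "spire-us", "labels": {"webapp", "us-east"}},
    {"name": "spire-eu", "labels": {"eu-west"}},
    {"name": "spire-ap", "labels": {"webapp", "ap-south"}},
]

# Only the sub-managers carrying the "webapp" label are selected.
targets = match_submanagers(submanagers, "webapp")
```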

Capabilities and Outputs

Deployments expose output capabilities that can be consumed by other deployments. This feature allows automation of large scale environments and workloads.

As you can see in the figure below, the Cloudify sub-manager's URI is exposed as an output capability. In the next step, when we run a batch deployment, we iterate over each sub-manager and install a new workload onto it. We use the sub-manager URI as the endpoint for that sub-manager's APIs, to create new workload blueprints and deployments on it.
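Consuming the URI capability could look roughly like this. The capability key (`manager_uri`), the API path, and the request shape are all illustrative assumptions for the sketch, not Cloudify's exact names; in the real system this would be an authenticated REST call to the sub-manager.

```python
def deploy_request(capabilities, blueprint_id):
    """Build the request a batch deployment would send to one sub-manager,
    using its exposed URI capability as the API endpoint.
    Capability key and API path are illustrative, not exact."""
    base = capabilities["manager_uri"].rstrip("/")
    return {
        "method": "PUT",
        "url": f"{base}/api/v3.1/deployments/{blueprint_id}",
        "body": {"blueprint_id": blueprint_id},
    }

# One sub-manager's exposed capabilities, as read from its deployment outputs.
req = deploy_request({"manager_uri": "https://spire-us.example.com/"}, "app-workload")
```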

Deploy on — Batch Operations

In this step we created a simple web app; its source code can be found in this Git repo. We used the AWS flavor of it. This web app is installed on each sub-manager that has the label WebApp.

For deploying it in one operation, we utilize the “Deploy On” feature as shown in the figure below, on the upper right side. This is a Bulk Operation and it deploys the web app into multiple sub-managers.

When executing the “Deploy On” operation you can choose what workload you want to deploy and where, as can be seen in the second figure below. You choose attributes such as the blueprint, target labels, deployment inputs, and execution parameters.

In this example we chose app-workload, a blueprint that deploys a simple web app: an apache2 web server with a simple “Hello World” home page.

We chose to deploy it on all the managers that have the label “webapp”. A label can denote a geographic location, a Linux or Kubernetes version, or any other meaningful value you assign to it.

After the match-making is done, the deployment starts.
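Putting the pieces together, the whole “Deploy On” bulk operation amounts to match-making by label followed by one deployment request per matched sub-manager, addressed via its URI capability. The field names (`labels`, `capabilities`, `manager_uri`) and the fleet data are again illustrative assumptions:

```python
def bulk_deploy(submanagers, label, blueprint_id):
    """Sketch of the "Deploy On" bulk operation: filter the fleet by label,
    then plan one deployment per matched sub-manager."""
    requests = []
    for m in submanagers:
        if label not in m["labels"]:
            continue  # match-making: skip sub-managers without the label
        requests.append({
            "target": m["capabilities"]["manager_uri"],  # URI capability as endpoint
            "blueprint": blueprint_id,
        })
    return requests

fleet = [
    {"name": "spire-us", "labels": {"webapp"},
     "capabilities": {"manager_uri": "https://spire-us.example.com"}},
    {"name": "spire-eu", "labels": {"db"},
     "capabilities": {"manager_uri": "https://spire-eu.example.com"}},
]

# One operation plans the web app deployment on every "webapp" sub-manager.
planned = bulk_deploy(fleet, "webapp", "app-workload")
```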

Workload deployment on Sub-Managers

Now, after initiating the deployment process, we wait until the web app deployment completes.

In our example it is a simple VM and a Web server deployed on AWS.

The deployment includes the creation of a VPC, a NAT gateway, a subnet, an Elastic IP, an EC2 VM instance, and an Ansible playbook that installs the apache2 web server. The full example can be downloaded from here.

The figure below shows the workload’s execution task graph. We see that everything is green, meaning every task completed successfully.

The figure below shows the workload topology, the resources that are created and their relationships.

The last figure shows the workload web server Hello World page.

Summary

In this blog we demonstrated the Spire concept of distributed orchestration across multiple sites and geographical regions, utilizing a master manager and sub-managers.

The process includes three steps:

  • Discovery of Kubernetes clusters on AWS
  • Sub-manager installation
  • Application workload deployment on sub-managers

Sub-managers were deployed from the master manager as Helm Charts on Kubernetes.

Sub-managers expose capabilities — we used the sub-manager URI capability to deploy workloads on each one.

Sub-managers are also tagged with labels. When deploying workloads, we specify on which sub-managers they should run by utilizing the sub-manager labels. Workloads are deployed only on sub-managers that match a given label (“webapp” in our example).

We also demonstrated the concept of a “Bulk Operation”, deploying a workload in one operation, on multiple sub-managers.
