Placing workloads (Pods, VMs) on distributed Kubernetes and non-Kubernetes clusters using Intel's Enhanced Platform Awareness (EPA)

Shay Naeh
6 min read · Aug 1, 2019

This blog describes independent work at Cloudify to place containerized workloads on multiple distributed Kubernetes clusters, both on-premise and in the cloud. Generically speaking, workload types can vary from bare metal to VMs to containers to FaaS, and hybrid distributed environments contain a mixture of Kubernetes and non-Kubernetes clusters. In this work we handle Kubernetes clusters as well as non-Kubernetes environments that can reside on-prem or on a public cloud (hybrid environments).

Motivation

  • Not all Kubernetes nodes are born the same — Especially in the NFV world, workloads like a vRouter, vFW, or vDPI require fast processing of network traffic, so they need special hardware that accelerates packet handling and offloads work from the CPU. Intel provides acceleration support for intensive network traffic, and other accelerators target CPU-intensive workloads, e.g. encryption offloading. Generically speaking, this Intel support is called EPA (Enhanced Platform Awareness), and it is supported on Kubernetes nodes as well. One can place workloads with special requirements on the Kubernetes nodes equipped with the matching hardware acceleration capabilities.
  • Edge environments — As we already know, the ‘big clouds’ (AWS, GCP, Azure, etc.) are eventually broken into smaller clouds to run workloads closer to the edge. The motivation here is essentially latency and the need to avoid a data ‘tsunami’, i.e., you don’t want to send all data points to a central cloud but rather process them at the edge, e.g. video processing. From a placement perspective, it is essential to orchestrate which workloads run on which edges.

How it works

Cloudify, as the global orchestrator, provisions workloads to run on distributed Kubernetes clusters based on a set of requirements and available resources that match those requirements.

As described in Figures 1 and 2, based on criteria like location, resource availability, and special resource requirements, Cloudify provisions a workload to the right Kubernetes cluster.

Yet this is only part of the work: each Kubernetes cluster is composed of multiple nodes, each with different hardware capabilities. Cloudify works with Intel EPA for Kubernetes, which supports capabilities like DPDK, SRIOV, QAT, NUMA, and CPU pinning. Cloudify knows how to map workloads to the right Kubernetes nodes by utilizing node labels per Kubernetes node and node selectors that match Kubernetes PODs to specific nodes.

Figure 1- Global POD Placement

Figure 2- Intent Based Placement

How to Map a Kubernetes Cluster

Utilizing Intel’s NFD (Node Feature Discovery), each Kubernetes node is labeled with the list of hardware capabilities it supports. Figure 3 below shows, for example, that SRIOV is supported only by node 1.

Figure 3- NFD in Kubernetes, taken from an Intel white paper
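For illustration, here is a minimal sketch of what a Node object can look like after NFD has labeled it. The label keys follow NFD's feature.node.kubernetes.io convention, but the exact keys vary by NFD version, so treat them as assumptions:

```yaml
# Hypothetical excerpt of a Node object after NFD has run.
# Label keys are illustrative; exact keys depend on the NFD version.
apiVersion: v1
kind: Node
metadata:
  name: node-1
  labels:
    feature.node.kubernetes.io/network-sriov.capable: "true"   # SRIOV NIC present
    feature.node.kubernetes.io/cpu-pstate.turbo: "true"        # turbo boost enabled
```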

The way we expose labels in Kubernetes is shown in Figure 4. You can see a list of labels per Kubernetes node. This example shows the capabilities of the master node, “master-1”.

We use node selectors in the YAML file, in the POD deployment definition, to map PODs to nodes that support the required capabilities, as per Figure 5.
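To make this concrete, here is a hedged sketch of such a POD definition. It assumes the SRIOV label key from the NFD example above, and the container image name is hypothetical:

```yaml
# Minimal sketch: pin a Pod to nodes that advertise SRIOV support.
# The nodeSelector key must match a label actually published by NFD.
apiVersion: v1
kind: Pod
metadata:
  name: vrouter
spec:
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  containers:
    - name: vrouter
      image: example/vrouter:latest   # hypothetical VNF image
```

The scheduler will then place this POD only on nodes carrying the matching label.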

Non-Kubernetes and Hybrid Environments

As previously mentioned, we can also provision workloads to non-Kubernetes hybrid environments. As shown in Figure 1, workloads can be provisioned to AWS: a VPC environment is instantiated on AWS and a VM is created in that VPC. This VM could be a VNF with special requirements for fast, intensive network traffic processing. AWS’s ENA (Elastic Network Adapter) supports Intel’s DPDK, so it is required to install the DPDK driver or choose an AWS AMI that ships with it. By matching the workload requirements (in this case the VNF requirements), Cloudify places the VNF on the right node in AWS, fulfilling its intensive networking requirements.
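A hedged TOSCA-style sketch of that AWS leg is shown below. The node type names follow the Cloudify AWS plugin's conventions, but the fragment is an assumption for illustration, not a verified blueprint:

```yaml
# Sketch of a blueprint fragment: a VPC plus an ENA-capable VM for the VNF.
# Type names and properties are assumed, not taken from the original post.
node_templates:
  vnf_vpc:
    type: cloudify.nodes.aws.ec2.Vpc
    properties:
      resource_config:
        CidrBlock: 10.0.0.0/16
  vnf_vm:
    type: cloudify.nodes.aws.ec2.Instances
    properties:
      resource_config:
        InstanceType: c5.large            # c5 instances expose ENA by default
        ImageId: { get_input: dpdk_ami }  # AMI with the DPDK driver preinstalled
    relationships:
      - type: cloudify.relationships.contained_in
        target: vnf_vpc
```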

A mixture of Kubernetes and non-Kubernetes environments could be maintained by the orchestrator. Moreover, these environments could be located on-prem or on any public cloud.

Intent Based Abstraction

By intent we mean that we specify the ‘what’ and not the ‘how’. As per Figure 2, we say we need ‘CPU intensive’ hardware without stating exactly which; in different environments this intent maps to different definitions and different parameters. This makes the placement process simple and transparent: the user specifies which capabilities they need, and Cloudify matches them with the right compute nodes and network definitions.

Utilizing TOSCA, we can write an intent-based blueprint that abstracts a Kubernetes cluster with its own definitions, or a VM-based cluster, with different definitions for on-prem and cloud clusters. One only needs to specify the need for nodes with certain capabilities, and Cloudify will know how to match the right resources and provision the workloads correctly.
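As an illustration of how one intent can resolve differently per environment, consider the following sketch; every key and value here is an assumption made for the example, not a Cloudify schema:

```yaml
# Illustrative only: one abstract intent, two environment-specific resolutions.
intent: cpu_intensive
resolutions:
  kubernetes:
    nodeSelector:
      feature.node.kubernetes.io/cpu-pstate.turbo: "true"  # hypothetical label
  aws:
    InstanceType: c5.xlarge   # compute-optimized instance family
```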

Intent-based definitions decouple the workload requirements from the underlying environment, so nothing needs to change at the higher level of the workload definition. Even if you move the workload to a new underlying environment, Cloudify will look for the right resources and definitions in the new environment and define them based on the workload requirements.

TOSCA also helps in the ‘matching’ process. TOSCA defines ‘Requirements’ and ‘Capabilities’ primitives: in ‘Requirements’ one specifies what a workload needs, e.g. CPU-intensive or network-intensive processing, while ‘Capabilities’ holds the list of capabilities a compute node supports. In Kubernetes, ‘Requirements’ are expressed by node selectors and ‘Capabilities’ by node labels. Hence, TOSCA definitions cover the more generic case and are not restricted to Kubernetes environments, PODs, and nodes.

To summarize, TOSCA requirements and capabilities provide the mechanism to define a generic case for workload requirements and map them to nodes that support the capabilities needed to fulfill those requirements.

In Kubernetes specifically:

TOSCA ‘Capabilities’ → Kubernetes node Label

TOSCA ‘Requirements’ → Kubernetes node Selector
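A hedged TOSCA sketch of this matching follows. The node types, the ‘acceleration’ capability, and the property names are illustrative; a real blueprint would define them in a custom node type:

```yaml
# Illustrative TOSCA fragment: a vFW requires a host whose capability
# matches, just as a nodeSelector matches a node label in Kubernetes.
node_templates:
  qat_host:
    type: tosca.nodes.Compute
    capabilities:
      acceleration:            # hypothetical capability, akin to a node label
        properties:
          qat: true
  vfw:
    type: tosca.nodes.SoftwareComponent
    requirements:
      - host:                  # akin to a nodeSelector
          node: qat_host
```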

Figure 4- Kubernetes node labels

Figure 5- Nginx POD placement with “nodeSelector”

Putting it all together

In the following video I demonstrate provisioning an Nginx POD on a Kubernetes node with a ‘load_balancing’ capability (in our example this capability supports CPU encryption offloading; Kubernetes nodes supporting it are grouped into a special group named QAT, marked with a light blue background) and a NodeJS POD on a generic node. This is done on an on-prem Kubernetes cluster. I also demonstrate provisioning a VM with ENA capabilities in an environment created on AWS.

Figure 6- Intel EPA
