Monday, August 31, 2015

How to Bring SDN/NFV into Reality


Unless you've been living inside a cave, or on top of a mountain without any Internet connection, you must have heard or read the news about Software-Defined Networking (SDN). In fact, SDN news pops up so often these days that some skeptics have started wondering whether it is really real or just another hype in the networking industry.

The challenge is that everybody seems to come up with their own definition of SDN. Each networking vendor positions its solution based on its own interpretation of what SDN means. The IETF working group called Interface to the Routing System (I2RS) is still trying to standardize southbound programming protocols and network-wide, multilayer topologies that include both virtual and real elements, network overlays and underlays. The Open Networking Foundation (ONF), a user-driven organization dedicated to the promotion and adoption of SDN, is to this day mainly focused on standardizing the OpenFlow protocol. And the rise of new SDN startups, which has no doubt created lots of excitement and plenty of innovation within the SDN space, contributes to the confusion at the same time.


The questions from today's business leaders in companies that consume networking technology are: if we want to embrace SDN, are we on the right track? Which way should we go? For some, the questions are even more basic: what do we do to get to SDN? How do we start?

To try to find the answers to those questions, perhaps we should refresh our memory of how it all started. Everybody has heard the story of how some smart people at Stanford University initiated the Clean Slate program, to explore what kind of Internet we would design if we were to start with a clean slate and 20-30 years of hindsight. The Clean Slate program led to the development of OpenFlow.

According to Wikipedia, OpenFlow is a communications protocol that gives access to the forwarding plane of a network switch or router over the network. In the "traditional" network paradigm, the control and data (forwarding) planes reside within the same physical device, such as a router or switch. In the SDN paradigm, not all processing happens inside the same device: the control plane is separated from the physical device, which runs only the forwarding plane.


OpenFlow has four parts:

- Controller - resides on a server and provides the control plane function for the network
- OpenFlow Agent - resides on a network device and fulfills requests from the Controller
- Northbound APIs - enable applications to interface with the Controller
- OpenFlow Protocol - the protocol (carried over TCP, typically with TLS) that the Controller and Agents use to communicate
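
To make the Controller/Agent split concrete, here is a minimal sketch of a controller application written with the open-source Ryu framework, one of several OpenFlow controllers available. The class and handler names follow Ryu's documented API, but treat this as an illustration rather than a production design: it simply installs a table-miss flow entry on every switch that connects, telling the OpenFlow Agent on the device to send unmatched packets up to the Controller.

```python
# Minimal OpenFlow 1.3 controller app using the Ryu framework (pip install ryu).
# Run with: ryu-manager simple_controller.py
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3


class SimpleController(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def switch_features_handler(self, ev):
        # Fires when a switch (the OpenFlow Agent side) connects to the controller.
        datapath = ev.msg.datapath
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser

        # Table-miss entry: match everything, send unmatched packets to the controller.
        match = parser.OFPMatch()
        actions = [parser.OFPActionOutput(ofproto.OFPP_CONTROLLER,
                                          ofproto.OFPCML_NO_BUFFER)]
        inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS, actions)]
        datapath.send_msg(parser.OFPFlowMod(datapath=datapath, priority=0,
                                            match=match, instructions=inst))
```

The forwarding logic lives entirely in this Python application; the switch only keeps whatever flow entries the Controller pushes down over the OpenFlow channel.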

The Open Networking Foundation (ONF) was formed to accelerate the delivery and use of SDN by standardizing OpenFlow. Deutsche Telekom, Facebook, Goldman Sachs, Yahoo, Google, Microsoft, NTT, and Verizon are board members, while almost every well-known vendor in the networking industry, such as Cisco, HP, Juniper, and Intel, sits as an ONF member. That shows how important OpenFlow is.

However, the important point to keep in mind is that OpenFlow does not equal SDN. OpenFlow is just one protocol within the SDN paradigm. Cisco once shared a sample of new protocols for SDN along with their place in multiple layers: from the device API layer, to forwarding, control plane, network services, orchestration, and management.


Now, you have read this far and may have the One Most Important Question in mind: who cares? Why should I care about all this? Who cares about new protocols in the network? How can it help my business? Exactly.

Let's first quickly review who made the first noise about SDN in real networks: James Hamilton from Amazon gave a famous presentation in 2010 titled "Datacenter Networks Are In My Way". As network engineers we have to admit that auto-provisioning in the networking space is way behind the server space. With automation and orchestration, provisioning a new Virtual Machine in a Data Center takes only minutes to build and configure the system, and even to install service packages (e.g. Apache, DNS, etc.) so it is ready to provide services to users. Meanwhile, provisioning new VLANs and subnets, injecting a new prefix into the routing protocol, creating or re-configuring security policy and QoS... well, most networks today still use manual provisioning methods that require days or even weeks to complete those tasks.

Urs Holzle, Senior Vice President of Technology Infrastructure at Google, gave a keynote speech at the second annual Open Networking Summit (April 2012) and mentioned that Google had been using OpenFlow as the protocol for its WAN. That was a pretty big statement, and it went some way toward validating OpenFlow as a viable technology in the SDN space.


You may argue that you are neither Amazon nor Google. You don't run a massive-scale data center that requires multi-tenancy and very fast auto-provisioning of services. And you are still happy with your current WAN, since the traditional networking protocols you use provide predictable behavior and "self healing" (the network finds an alternate path when the primary path is down), and your network delivers sub-second convergence during failures. You don't even want to consider a new, unproven protocol or paradigm to replace the one you've been using for years.

And you are absolutely right. For us mere mortals, little had changed in networking since Cisco built its first router about 30 years ago. The "traditional" network paradigm remained mostly intact until mid-2012. That was when VMware bought a startup called Nicira, which had created a Network Virtualization Platform as an overlay network, for around $1.2 billion in total. Rumor has it that Cisco was trying to acquire Nicira too, but was outbid by VMware at the time.

What does it mean?

When major players in the industry such as VMware and Cisco are willing to spend more than a billion dollars to acquire an SDN startup, that is market validation. It is a strong signal that the SDN business is real. And obviously that acquisition encouraged many investors to start investing in other SDN startups.

Software-Defined Networking (SDN), as per Wikipedia, is an approach to computer networking that allows network administrators to manage network services through abstraction of lower-level functionality. To understand this abstraction concept, let's look at the different types of network programmability.


The traditional network device we know, shown leftmost in the picture, can be managed by applications using well-known mechanisms, from the device CLI to SNMP or NetFlow. Picture (1) shows how several network vendors have created vendor-specific APIs to access their devices, allowing applications some degree of programmability, even though we have to follow what is defined in the API. When SDN with OpenFlow comes into the picture (2a), it separates the control plane from the data/forwarding plane. This "Pure SDN" approach lets the controller provide a Northbound API for the applications and use a Southbound API to communicate with the network devices. The Northbound API can be open/standards-based or vendor specific, and the Southbound API can be OpenFlow, PCEP, I2RS, another type of open API, or a vendor-specific API. The result is an abstraction where the application does not need to deal directly with lower-level functionality inside the network device.
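
As a rough illustration of that northbound abstraction, the sketch below pushes a forwarding "intent" to an SDN controller over a REST/JSON northbound API using Python's requests library. The controller address, resource path, token, and payload fields are hypothetical placeholders invented for this example; real controllers (OpenDaylight, ONOS, vendor controllers) each define their own northbound resources and data models.

```python
# Hypothetical northbound REST call: ask the controller to steer traffic
# without touching any device CLI. Endpoint and payload are placeholders.
import requests

CONTROLLER = "https://sdn-controller.example.com:8443"   # placeholder address
TOKEN = "REPLACE_WITH_REAL_TOKEN"                        # placeholder credential

intent = {
    "name": "web-to-db-path",
    "source": "10.0.1.0/24",
    "destination": "10.0.2.0/24",
    "action": "allow",
    "priority": 100,
}

resp = requests.post(
    CONTROLLER + "/api/v1/intents",          # hypothetical resource path
    json=intent,                             # requests encodes the JSON body
    headers={"Authorization": "Bearer " + TOKEN},
    timeout=10,
)
resp.raise_for_status()
print("Controller accepted intent:", resp.json())
```

The application never sees the devices themselves; whether the controller realizes the intent with OpenFlow, PCEP, or a vendor API is hidden behind the abstraction.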

Some people argue that the networking technologies built over the past three decades can't simply be thrown away and replaced by new SDN protocols. Hence they say it's better to keep the network as it is, but have a controller ready to program the network when required: "preserve what works, program when needed". This is the idea behind the Hybrid SDN approach (2b), where the network continues with its predictable behavior, self healing, and fast convergence, but can be re-programmed based on input from the applications. Just as in pure SDN, the controller sits between the applications and the network devices to provide the abstraction.


What is an overlay network (3)? When we have a physical network built from physical devices and links, called the underlay, and on top of it we create a new virtual network topology that can differ from the physical one, we get an overlay network. It is not a new concept in networking: GRE tunnels and MPLS VPNs are examples of overlay networks we've been using for years. That is the pitch of the vendors who sell overlay networks as an SDN solution: leave the underlay network as it is, and build a completely new virtual network on top of it.
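
To show how little is needed for a basic overlay, here is a sketch that brings up one end of a point-to-point GRE tunnel on a Linux host by driving standard iproute2 commands from Python. The underlay and overlay addresses are placeholders; the mirror-image commands (with local and remote swapped) would run on the other endpoint, and traffic sent into gre1 then rides over the existing underlay unchanged.

```python
# Minimal GRE overlay sketch: configure one tunnel endpoint with iproute2.
# Addresses are placeholders; requires root privileges on a Linux host.
import subprocess

LOCAL_UNDERLAY = "203.0.113.1"    # this host's routable (underlay) address
REMOTE_UNDERLAY = "203.0.113.2"   # the far end's routable address
OVERLAY_ADDR = "10.10.10.1/30"    # new virtual topology on top of the underlay

commands = [
    ["ip", "tunnel", "add", "gre1", "mode", "gre",
     "local", LOCAL_UNDERLAY, "remote", REMOTE_UNDERLAY, "ttl", "255"],
    ["ip", "addr", "add", OVERLAY_ADDR, "dev", "gre1"],
    ["ip", "link", "set", "gre1", "up"],
]

for cmd in commands:
    subprocess.run(cmd, check=True)   # stop at the first failing command

print("Overlay interface gre1 is up; overlay prefixes can be routed via 10.10.10.2")
```

The overlay SDN products do essentially the same thing at scale, usually with VXLAN or STT instead of GRE, and with a controller programming thousands of tunnel endpoints instead of a script.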

Before we start looking for the answer to which type of network programmability is best, let's discuss NFV quickly. Network Functions Virtualization (NFV) is a network architecture concept that proposes using IT virtualization technologies to virtualize entire classes of network node functions into building blocks that may be connected, or chained, to create communication services. So instead of having dedicated hardware for the router, switch, firewall, cache engine, DPI, BRAS, NAT, and so on, we can run those network functions as software on top of common x86 hardware.

That's not a new idea either, is it? Before network devices used dedicated hardware, whether off-the-shelf "merchant silicon" chips or vendor-designed ASICs, we had Linux firewalls and Linux routers. So we already had network functions running as software on x86. What's new? NFV brings the idea of not only virtualizing the network functions but also connecting them, known as "Service Chaining", in order to provide benefits to the business such as reduced service provisioning time and dynamic, elastic scaling of network services.
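
As a toy illustration of the chaining idea on a single x86 host, the sketch below wires two Linux network namespaces, standing in for two virtualized functions (say a "firewall" and a "router"), together with a veth pair using iproute2. A real NFV platform would instantiate full VMs or containers and steer customer traffic through them in order, but the underlying concept, software functions connected in a defined sequence, is the same. All names and addresses here are placeholders.

```python
# Toy service-chain sketch: two network namespaces stand in for two VNFs,
# connected by a veth pair. Requires root; names and addresses are placeholders.
import subprocess

def sh(*cmd):
    """Run one iproute2 command and fail loudly if it errors."""
    subprocess.run(list(cmd), check=True)

# "VNF" instances modeled as namespaces
sh("ip", "netns", "add", "vnf-firewall")
sh("ip", "netns", "add", "vnf-router")

# A virtual wire between them: this is the chain link
sh("ip", "link", "add", "fw-out", "type", "veth", "peer", "name", "rtr-in")
sh("ip", "link", "set", "fw-out", "netns", "vnf-firewall")
sh("ip", "link", "set", "rtr-in", "netns", "vnf-router")

# Address the link and bring it up inside each "VNF"
sh("ip", "netns", "exec", "vnf-firewall",
   "ip", "addr", "add", "192.168.50.1/30", "dev", "fw-out")
sh("ip", "netns", "exec", "vnf-router",
   "ip", "addr", "add", "192.168.50.2/30", "dev", "rtr-in")
sh("ip", "netns", "exec", "vnf-firewall", "ip", "link", "set", "fw-out", "up")
sh("ip", "netns", "exec", "vnf-router", "ip", "link", "set", "rtr-in", "up")

print("Chained vnf-firewall -> vnf-router over a veth link")
```

Extending the chain is just a matter of adding more functions and more links, which is exactly the kind of repetitive wiring an orchestrator automates.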


Obviously, not all network functions make sense to virtualize. For network forwarding functions that are high-throughput, high-bandwidth, stateless, and predictable, dedicated hardware or a custom Network Processing Unit (NPU) is the better choice. But for many network services functions that are low-to-medium throughput, stateful, and unpredictable, virtualization on x86 can shine. In short, when you need more muscle, it's better to use purpose-built hardware; when you need more brains, it's better to virtualize the function.

Now we reach the million-dollar question:
How do we bring SDN, NFV, overlays, programmability, and the rest into reality?

Based on my experience working in this space over the past few years, here is what it takes to make any SDN/NFV solution successful in the real world:

1. It must solve a real business problem

Different customers have different business problems to solve. The separation of control plane and data plane offered by OpenFlow may not be interesting to anyone outside research or academic institutions.


For a massive-scale data center, SP data center, or cloud provider, what matters is the automation of new services to scale the multi-tenant environment; deep programmability or network overlays may be interesting to support application mobility. For Service Providers that have invested heavily in WAN infrastructure, what matters is deep network programmability to monetize the existing infrastructure; WAN optimization is another requirement, as is providing SLAs to their customers. For Enterprise customers, especially the ones that use the network infrastructure only to support their core business, the requirement is auto-provisioning and device configuration management to deliver services.

2. It should be orchestrated to deliver business outcomes

SDN separating the control plane from the data plane, and NFV running network functions as software on open, standards-based hardware, mean nothing if they don't deliver a business outcome. A new component, Services Orchestration, should provide automation, provisioning, and interworking of physical and virtual resources in order to enable new service offerings.


The industry agrees that the combination of these capabilities offers businesses a path to agility and dynamic networks. To benefit optimally from virtualization, the intersection of all three must be addressed, and at a carrier-grade level, which means services can be scaled up as well as down easily, with full reliability and security. The confluence of SDN, NFV, and orchestration takes advantage of cloud economics and provides service automation, so we can nimbly address new market opportunities and dynamically modify existing services to best benefit the business. With this approach, service innovation cycles shrink dramatically, from weeks and months to minutes and days.

3. It has to follow industry standards and an open reference architecture

We don't yet have an industry standard for an SDN reference architecture. However, ETSI, the European Telecommunications Standards Institute, has already published the NFV Architectural Framework, as shown below.


The three main components of the reference architecture:

- VNF is the actual network function application, for example a virtual router, virtual switch, virtual firewall, virtual load balancer, virtual web filter, and so on, along with its Element Management System (EMS)
- NFV Infrastructure (NFVI) consists of the virtualization layer that provides abstraction between the physical resources (compute, storage, and network hardware) and the virtual resources on which we run the VNFs
- NFV MANO includes the Virtualized Infrastructure Manager (VIM) as the resource manager for the NFVI, the VNF Manager that manages the life cycle of the VNFs, and the Orchestrator that orchestrates the overall solution

ETSI even defines formalized NFV use cases with potentially virtualized functions.


4. It should be modular, loosely coupled and extensible

Any SDN/NFV solution will be built from various components. A modular architecture provides benefits because it can be scaled and extended when needed. One important note here: every component should be loosely coupled, not "locked" to another. The solution is expected to run in a multi-vendor environment and will involve a lot of integration to build the whole system.


The picture above shows an example SDN solution architecture. All physical and virtual resources are managed by a Management and Orchestration layer that consists of network control, compute and storage control, and service control, with cross-domain orchestration to manage all of them. A customer-facing service portal is commonly used to provide self-service to users. The Management and Orchestration layer provides a northbound API for the applications and uses various southbound APIs from the control layers down to the resources. Many other systems, such as billing, trouble management, inventory, and service assurance, can be attached to form the end-to-end solution.

5. It has to use open APIs

In order to make sure all components can interconnect without any vendor lock-in, the solution must use open APIs between components. The Northbound API between Management/Orchestration and the applications, for example, can use REST, NETCONF, SOAP, or even XML-RPC, with standard data formats like XML and JSON. The Southbound API from Management/Orchestration toward the compute, network, and storage resources can use NETCONF, OpenFlow, SNMP, Telnet/SSH, or even a device-specific API, as long as it is open.
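
As an example of what an open southbound API looks like in practice, here is a sketch that pushes a configuration change to a device over NETCONF using the open-source ncclient library. The management address, credentials, and the interface description in the payload are placeholders, and the XML must match a YANG model your device actually supports (this example uses the standard ietf-interfaces model); devices without a candidate datastore would take target="running" instead.

```python
# NETCONF southbound sketch using ncclient (pip install ncclient).
# Host, credentials, and the config payload below are placeholders.
from ncclient import manager

DEVICE = {
    "host": "192.0.2.10",          # placeholder management address
    "port": 830,
    "username": "admin",
    "password": "REPLACE_ME",
    "hostkey_verify": False,       # lab only; verify host keys in production
}

CONFIG = """
<config>
  <interfaces xmlns="urn:ietf:params:xml:ns:yang:ietf-interfaces">
    <interface>
      <name>GigabitEthernet0/1</name>
      <description>Provisioned by the orchestrator</description>
    </interface>
  </interfaces>
</config>
"""

with manager.connect(**DEVICE) as m:
    reply = m.edit_config(target="candidate", config=CONFIG)
    m.commit()                     # apply the candidate configuration
    print(reply)
```

The same structured, model-driven interface can be driven by a hand-written script today and by an orchestrator tomorrow, which is the whole point of insisting on open APIs.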

6. It must have continuous development and support, and embrace FOSS

Many components of an SDN solution can embrace Free and Open Source Software (FOSS). The operating system for the compute resources, for example, can be Linux; the Virtualized Infrastructure Manager can be OpenStack; the network controller can be OpenDaylight; and the programming languages used, such as Python or Java, are free to use. The main benefit of embracing FOSS components is that we get continuous development and support from the community, and we don't need to build every component of the solution ourselves, only the few focused components that we think matter the most.
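
To show how an orchestration component can lean on these FOSS building blocks, here is a sketch that asks OpenStack to boot a VM that could host a VNF, using the python-novaclient library. The credentials, auth URL, image, and flavor identifiers are placeholders, and newer OpenStack releases favor the unified openstacksdk or Keystone session objects over this older constructor, so take it as an illustration of the idea rather than the one right way.

```python
# Hedged sketch: boot a VM (e.g. to host a VNF) through the OpenStack Nova API
# using python-novaclient. All credentials and IDs below are placeholders, and
# newer deployments typically authenticate via keystoneauth sessions instead.
from novaclient import client as nova_client

nova = nova_client.Client(
    "2",                                          # compute API version
    "admin",                                      # username (placeholder)
    "REPLACE_ME",                                 # password (placeholder)
    "demo",                                       # project/tenant (placeholder)
    "http://controller.example.com:5000/v2.0",    # Keystone auth URL (placeholder)
)

server = nova.servers.create(
    name="vnf-router-01",
    image="IMAGE_UUID_PLACEHOLDER",               # Glance image ID for the VNF
    flavor="FLAVOR_ID_PLACEHOLDER",               # flavor sized for the VNF
)
print("Requested VM:", server.id)
```

Whether it is a home-grown script or a commercial VNF manager making this call, the resource being consumed is the same open OpenStack API.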


Do you think there is an SDN/NFV solution today that follows the points above?

I can give you one example of an SDN/NFV solution that complies with all six points above. And you may be surprised when I tell you that it comes from a network vendor known for being closed source: Cisco. Yes, Cisco with its Virtualized Managed Services (VMS) actually follows all the points on the list, as explained below:

1. VMS solves SPs' real business problem in managed services

SPs that offer VPN solutions have seen strong adoption by large and medium enterprises; however, the aggregate growth rate is pretty much flat (around 5-7% in many countries). This pushes them to find a blue ocean, a new market with expansion opportunity such as Small and Medium Business (SMB). However, SMBs are very price sensitive and have adopted public cloud services to shift their workloads. The huge number of SMBs is indeed a big opportunity, but the solution must be attractive in terms of pricing and time to market/provision, and it must be integrated or bundled with other service offerings.

2. VMS is orchestrated to cut down provisioning time

VMS uses the idea of a cloud-based managed service by moving the network functions from the CPE to the SP cloud.


The CPE becomes less and less intelligent, to the point where it is only an L2 network insertion device. Virtualized Network Functions in the SP cloud are service-chained to provide new service offerings to the customer. All instantiation and configuration of the VNFs and the CPE is done automatically to provide agile service provisioning to the end customer. The CPE even uses a zero-touch deployment mechanism, reducing the need to send a technician to the location just to install the device. Once the CPE is powered up, it calls home to register itself with the system and starts building an encrypted tunnel to its assigned service-chained VNFs.

3. VMS follows the ETSI reference architecture

I can only provide the following diagram to show the Cisco VMS architecture.


However, I can mention that Cisco VMS follows the ETSI reference architecture:

- Cisco has an extensive list of VNFs, from virtual router to virtual web filter, and nothing stops the solution from using third-party VNFs
- The NFV Infrastructure uses Linux KVM as the virtualization layer, running on top of physical hardware resources for compute, network, and storage. Again, even though the solution is built using Cisco UCS and networking products, it can accommodate third-party hardware as well
- NFV MANO includes OpenStack as the Virtualized Infrastructure Manager, Cisco ESC as the VNF life cycle manager, and Cisco NSO (previously known as Tail-f NCS) as the cross-domain orchestrator

4. VMS is modular, loosely coupled, and extensible

The VMS solution uses a modular architecture. There is no lock-in between components. The solution can be integrated with a customer's existing setup, and it can be extended beyond the main functionality when needed.

5. Yes, all APIs between VMS components are open

The self-service portal can send the service request to the orchestrator using NETCONF. NSO, as the orchestrator, can use NETCONF or a REST API to instruct Cisco ESC to instantiate the VNFs, which in turn requires ESC to tell OpenStack, via REST or the Nova client API, to spin up the VMs. Cisco NSO can provision the configuration of network devices using NETCONF, SSH, REST, or a device-specific API.

6. VMS embraces FOSS

As you can see above, Cisco VMS embraces FOSS like OpenStack, Linux, and KVM, along with open APIs to connect them to Cisco-specific components. By doing that, Cisco can focus on the components that matter the most, like the orchestrator. And innovation in the life cycle manager and end-to-end system integration is Cisco's competitive advantage compared to other vendors.


Before you accuse me of being a Cisco fanboy, let me make a statement first: I'm not trying to sell the Cisco solution here; Cisco has the best sales force in the market to do that. I just want to give an example that even a market leader like Cisco is willing to adapt, to move from closed source to being more open, as shown in its VMS solution. This is a big shift in the networking industry. And I believe we will see more and more solutions that are in line with the six points above.

And if you are in a company that consumes networking technology, the next time a networking vendor comes to offer its SDN solution, you can ask whether the solution really solves your business problem, how it is orchestrated to deliver a business outcome, whether it follows industry standards and an open reference architecture, whether it is modular so it can be extended easily without lock-in to a specific vendor, whether it supports open APIs, and whether they have a plan to embrace FOSS to ensure continuous development and support.

That, I believe, is the only way to bring SDN/NFV into reality.