
Mark Medovich on Software Defined Networks

Mark Medovich, Juniper's Chief Architect for the Public Sector, gave a very interesting talk at the Kansas City Software Defined Networking Luncheon, hosted by FishNet Security.

It directly addressed a gap in my knowledge regarding our new NetworkInnovation project, where we plan to "evaluate the applicability of GENI software defined networking and OpenFlow/Openstack technologies to support the secure transmission and storage of personal health information with the Google Fiber network."

I have enough experience as a customer of VM clusters to have a vague notion of what OpenStack is about, but I'm brand new to OpenFlow and software defined networks. My first thought on exposure to them was: so what happened to The Rise of the Stupid Network?

Medovich explained that OpenFlow development was driven by high performance computing (HPC). Researchers are trying to reduce the latency for moving data between compute nodes.

The term "switch fabric" flew by... one of many buzzwords that I'm slowly picking up. "If we are going All L2..." I recognized L2 as a reference to a layer in the OSI model, but I didn't remember much about it, and I didn't have enough connectivity to look it up. Afterward, I reminded myself that it's where switches live, as opposed to hubs below and routers above.

"Networks within networks" was another phrase that caught my attention. It appealed to me as like scale-free design in Web Architecture. It reminded me of heated discussions in the IETF about the evils of NAT vs. the end-to-end purity of IPv6. The people I trust were on the IPv6 side (and IPv6 is great for lots of other reasons) but as I reflected later, the idea of one big flat IPv6 network seems like a monoculture, not scale-free.

He talked about multi-tenant data centers:

photo of "Multi-tenant flows within an end site" slide by Medovich

He used an example from when he was at Sun, visiting CVS Caremark: they had to provision for the monthly Medicaid Monday burst, which left a lot of excess capacity for most of the month. My understanding is that Amazon's cloud services came about roughly the same way: they have to provision for Christmas, which leaves them with a lot of spare capacity most of the time.

Traditional three-tier networks are OK provided capacity is reasonably predictable, he said, but they don't deal with dynamic demand.

 "photo of "SCALING Multi-tenant SERVICES" slide by Medovich"

You can't make service level agreements (SLAs) for dynamic demand with traditional networks; the best you can do is a service level probability (SLoP).

This brings us to OpenFlow. "OpenFlow is all about the data center and making virtualization better."

He introduced it using a slide from the OpenFlow Presentation:

"Here's the problem: that step 2. Encapsulate and forward to controller. What controller?" The controller isn't specified, he said.

Variability of OpenFlow devices (switches from this vendor or that) introduces too many variables. The only way the Juniper engineering team could see to make the scaling work was an any-to-any switch fabric. They had to collapse the network, from 3 tiers to 2 to 1.

People are building this sort of scalable multi-tenancy network, he said. But it's not OpenFlow: Cisco UCS, Juniper fabric. 10,000 ports. Software programmable.

"Don't get me wrong; I'm not here to knock OpenFlow. Juniper does support OpenFlow." Just don't expect OpenFlow to be the whole solution. I gather Juniper has filled in all the gaps implicit in OpenFlow use cases.

He threw out "... close to the lambda ..." as a goal the audience would be familiar with. This audience member was not.

Lambda switching uses small amounts of fiber-optic cable and differing light wavelengths (called lambdas) to transport many high-speed datastreams to their destinations -- Network World research center

His discussion of trust, interfaces, and economics reminded me of studying Miller's work on object capability security, the principle of least authority, and patterns of cooperation without vulnerability. More on that in another item. Meanwhile...

I felt on more solid ground when he started to discuss software architecture.

Way back, Juniper decided to put an XML RPC server in every switch. This was adopted by the IETF as NETCONF. Juniper has a rich SDK layered on top; that's how they rapidly ported OpenFlow to their devices. In answer to criticism that they don't have OpenFlow implemented in firmware, he compared it with developing a new platform from scratch, an argument with obvious appeal to me. Who's going to implement "legacy" protocols like PPPoE, baked into various purchasing specs?
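
For a sense of what that looks like from the outside (my sketch, not from the talk): fetching a device's running configuration over NETCONF from Python with the third-party ncclient library goes roughly like this. The hostname and credentials are placeholders, and it assumes a device with NETCONF over SSH enabled on the default port 830.

```python
# Minimal sketch of talking NETCONF (the IETF standardization of that
# per-switch XML RPC interface) using the third-party ncclient library.
from ncclient import manager

with manager.connect(
    host="switch.example.net",   # placeholder device
    port=830,
    username="admin",            # placeholder credentials
    password="secret",
    hostkey_verify=False,
) as conn:
    # Ask the device for its running configuration as XML.
    reply = conn.get_config(source="running")
    print(reply.data_xml)
```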

While working toward Junosphere, the software team saw that they couldn't wait until the hardware was finished, so they virtualized the whole thing. The result is now used by major telecom providers (Comcast, Telecom Italia) as a test lab.

At the other extreme, he explained how their architecture scales down to multi-tenancy embedded applications to meet military needs.

He mentioned in passing that their architecture includes a JBoss application server. I have a bit of a bad taste from using JBoss as the platform for HERON and hence i2b2, but I gather the version we use is ages out of date. So this nod encourages me to keep an open mind.

He mentioned a "single pane of glass" user interface with roles and permissions and templates. Again, I wondered to what extent this architecture employs the principle of least authority.

The Q&A that followed the talk quickly went over my head with "top of rack architectures" and such. But I did pick up a few more details about virtualization:

Medovich: Which are you using, Xen or KVM?

Audience member: KVM

Medovich: Good for you.

Medovich brought up SAN storage architectures and noted a trend: the aggregate bandwidth of racks is approaching a TB...

Medovich: Are you using a SAN or local storage?

Audience member: local storage

Medovich: That's the right answer.

He brought up the big Amazon outage and explained the causes in some detail. A big compute job could generate a bunch of data in one zone and then decommission its nodes. When the customer wants to re-instantiate those 200 nodes, that much compute is only available in another zone, so Amazon has to migrate the data. It's like de-fragmenting a disk. Eventually, the aggregate bandwidth brought the whole thing down.

The next generation architecture will have to continuously de-fragment the data center.

p.s. A capsule subset of the slides he used is available:

Software Defined Networks - Juniper Networks, Winter 2012 ESCC/Internet2 Joint Techs, 01/25/2012