Insights from the NetDevOps Fall 2016 Survey

For my bachelor's thesis I've been doing research into DevOps and how it would apply to our network operations team. We were just starting on our journey to automate everything, and some of the questions I had were:

  • What tasks do network operators generally automate first?
  • What tools do most people use to automate their networks?

Luckily for me, right when I needed this information, the Network-to-Code community was running a NetDevOps Survey. The final report isn't finished yet, but the raw data is available, so I could use that for my research. Pending that report, I thought it might be worthwhile to share my conclusions.

What is the NetDevOps Survey?

The NetDevOps Fall 2016 Survey is the first edition of what will probably be a regular survey amongst network engineers about network automation and DevOps. It's spearheaded by Damien Garros and run mainly from the #netdevops-survey channel on the Network-to-Code Slack.

The survey had 307 responses from network engineers around the globe, running all kinds of networks ranging from small to very large. Since it comes from a community of engineers interested in networking as code, there is certainly a self-selection bias towards people already running all kinds of automation tools. That's fine for my purpose since these are the people I want to learn from, but it's probably not representative of what Average Joe Networker is currently running in production.

Survey Responses

So, what to automate first?

My assumption is that the tasks most network operations teams are automating are either easy to automate or have a great impact when you do. Either way, that's probably the low-hanging fruit and a good place to start.

automated tasks

It seems like there's a strong preference for automating configuration-related tasks (configuration management, changes and new deployments). This makes sense, since as network engineers we're very much used to dealing with a device through its configuration file. Reporting is also often automated, probably because it's generally boring work that's always the same. Other tasks are much less commonly automated.

What programming language to learn?

A common question is whether we network engineers should learn to code, and if so, which language to learn.

programming languages

By far the most common language is Python, followed at some distance by shell scripting (Bash, PowerShell, etc.). Other languages are much less commonly used for network automation. This isn't very surprising: influential people in the networking community have been telling us to learn Python for years.

What tools should we look at?

Once we've mastered Python, we could write all the tooling we need ourselves, but it's probably more useful to check which tools other people are using to automate their networks.

automation tools

So nearly everybody is using, or at least looking at, Git as a central repository. That makes sense: with the advent of free services like GitHub and GitLab, it has pretty much become an industry standard. Ansible is also quite common amongst network engineers, leaving competing products like Puppet, Chef and SaltStack in the dust.

Since most organizations seem to start with automating configuration management, it might be worthwhile to zoom in on the tools they use specifically for those tasks.

tooling for config management

Ansible is once more the most popular tool, but the majority of people use custom-built tooling. Apparently learning Python is still the smart move. What's a bit surprising to me is that only 17% of respondents use vendor-specific tools for configuration management; nearly all vendors push their own proprietary management platform when selling kit, but clearly those tools do not fit the automation needs of network operators.

Conclusion

My conclusion is that we ought to look at a toolchain including Git and Ansible, stitched together using Python scripting. Obviously, there is a lot more insight to be found in the dataset, and you should certainly look for the finished report.

If you want to know more about all these tools and how to automate networks, check out Network Programmability and Automation by Matt Oswalt, Jason Edelman, and Scott Lowe for vendor-independent advice, and Automating Junos Administration or Programming and Automating Cisco Networks for excellent vendor-specific information. And if you too would like to start learning Python, I can heartily recommend Kirk Byers' free course Python for Network Engineers. These are the resources that were most useful to me during my studies.

Running ESXi nodes in EVE-NG

If you're like me and are curious enough to check EVE-NG under the hood, you might have noticed that there is a template for VMware ESXi, but it's not listed as supported and is in fact hidden by default. However, if you're willing to tinker a bit, it is possible to run ESXi nodes in EVE-NG. So let's give it a shot!

Getting ESXi images

First stop is acquiring some images for ESXi. Of course it's possible to create your own QEMU image by installing from an official ESXi installation CD, but that's a lot of work. It's much easier to start with the Nested ESXi images by William Lam, who has done most of the work for us. You can grab images for ESXi5.5, ESXi6.0 and ESXi6.5 straight from vmware.com.

All these images are .ova files with three disks:

  • A 2GB system disk (disk1)
  • A 4GB data disk (disk2)
  • An 8GB data disk (disk3)

Nested ESXi OVA contents

To get an image that's compatible with EVE-NG, you need to take the following steps:

1. Upload the disk files to EVE-NG

Open the OVA with your favorite archiving program, extract disks 1 and 2, and copy them to your EVE-NG virtual machine. I'll assume you've copied disk1 and disk2 from the ESXi6.0 OVA to /tmp.
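If you'd rather stay on the command line: an OVA file is a tar archive, so plain tar can list it and pull out just the disks you need. A minimal sketch, assuming Bash and that the disk members end in -disk1.vmdk and -disk2.vmdk like the file names used in the next step; list_ova_disks and extract_esxi_disks are hypothetical helpers, not part of EVE-NG or the Nested ESXi appliance.

```shell
#!/usr/bin/env bash
# An OVA is a tar archive holding the .ovf descriptor, a manifest and the .vmdk disks,
# so tar can list and extract its members directly.

# Print the .vmdk members of an OVA, one per line, in archive order.
list_ova_disks() {
    tar -tf "$1" | grep -i '\.vmdk$'
}

# Extract only the "-disk1.vmdk" and "-disk2.vmdk" members into a directory.
extract_esxi_disks() {
    local ova="$1" dest="$2" m
    local -a members=()
    while IFS= read -r m; do
        members+=("$m")
    done < <(list_ova_disks "$ova" | grep -E -- '-disk[12]\.vmdk$')
    if [ "${#members[@]}" -eq 0 ]; then
        echo "no disk1/disk2 members found in $ova" >&2
        return 1
    fi
    tar -xf "$ova" -C "$dest" "${members[@]}"
}
```

For example, extract_esxi_disks Nested_ESXi6.x_Appliance_Template_v5.ova /tmp, where the OVA file name is an assumption inferred from the disk names below.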

2. Convert the disks

EVE-NG expects its QEMU images in qcow2 format, so you need to convert the vmdk files:

cd /tmp
qemu-img convert -f vmdk -O qcow2 Nested_ESXi6.x_Appliance_Template_v5-disk1.vmdk hda.qcow2
qemu-img convert -f vmdk -O qcow2 Nested_ESXi6.x_Appliance_Template_v5-disk2.vmdk hdb.qcow2
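Both commands follow the same naming rule: diskN from the OVA becomes hdX.qcow2, with disk1 as hda and disk2 as hdb. The sketch below wraps that rule in two hypothetical Bash helpers; with DRY_RUN=1 (an invented switch) it only prints the qemu-img commands, which is a cheap way to check the mapping before converting anything.

```shell
#!/usr/bin/env bash
# Map "...-diskN.vmdk" to EVE-NG's disk naming: disk1 -> hda.qcow2, disk2 -> hdb.qcow2, ...
qcow2_name_for() {
    local vmdk="$1" n letters=abcdefghijklmnopqrstuvwxyz
    # Extract N from the "-diskN.vmdk" suffix; empty if the name doesn't match.
    n=$(basename "$vmdk" | sed -n 's/.*-disk\([0-9]\{1,\}\)\.vmdk$/\1/p')
    [ -n "$n" ] && [ "$n" -ge 1 ] && [ "$n" -le 26 ] || return 1
    echo "hd${letters:n-1:1}.qcow2"
}

# Convert each vmdk to its hdX.qcow2 name in the same directory.
# With DRY_RUN=1 the qemu-img commands are printed instead of executed.
convert_to_qcow2() {
    local vmdk name out
    for vmdk in "$@"; do
        name=$(qcow2_name_for "$vmdk") || { echo "cannot map $vmdk" >&2; return 1; }
        out="$(dirname "$vmdk")/$name"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo qemu-img convert -f vmdk -O qcow2 "$vmdk" "$out"
        else
            qemu-img convert -f vmdk -O qcow2 "$vmdk" "$out" || return 1
        fi
    done
}
```

For example, DRY_RUN=1 convert_to_qcow2 /tmp/*-disk1.vmdk /tmp/*-disk2.vmdk shows what would run; drop DRY_RUN to actually convert.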

3. Expand disk 2

As mentioned, disk 2 is only 4GB. That's fine if you're going to use external storage, but it's probably smart to enlarge it a bit; the qcow2 image is sparse, so the extra space isn't allocated until it's actually used.

qemu-img resize hdb.qcow2 50G

4. Move the files

Move the files to their final destination. The folder should be named /opt/unetlab/addons/qemu/esxi-(version) for EVE-NG to be able to find them:

mkdir -p /opt/unetlab/addons/qemu/esxi-6.0u2
mv /tmp/*.qcow2 /opt/unetlab/addons/qemu/esxi-6.0u2
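Because mv can only move several files into a folder that already exists, a small wrapper can create the folder first. This is a hypothetical Bash helper, assuming the converted disks are named hda.qcow2, hdb.qcow2 and so on; QEMU_ADDONS is an invented override so the function can be tried against a scratch directory, and it defaults to the path above.

```shell
#!/usr/bin/env bash
# Move hdX.qcow2 files from a source folder into <addons>/esxi-<version>,
# the folder naming EVE-NG needs to find the image.
install_esxi_image() {
    local version="$1" src="${2:-/tmp}"
    local base="${QEMU_ADDONS:-/opt/unetlab/addons/qemu}"
    if [ -z "$version" ]; then
        echo "usage: install_esxi_image <version> [srcdir]" >&2
        return 1
    fi
    local dest="$base/esxi-$version"
    if [ ! -f "$src/hda.qcow2" ]; then
        echo "missing $src/hda.qcow2" >&2
        return 1
    fi
    # Create the target folder, then move only the hdX.qcow2 disks into it.
    mkdir -p "$dest" && mv "$src"/hd?.qcow2 "$dest"/ || return 1
    echo "$dest"
}
```

For example, install_esxi_image 6.0u2 /tmp prints the folder the disks ended up in.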

5. Clean and fix permissions

rm /tmp/*.vmdk
/opt/unetlab/wrappers/unl_wrapper -a fixpermissions

Repeat as desired for ESXi5.5 and ESXi6.5. For now I prefer the 6.0u2 image; it seems a bit more stable, and the embedded Web Client works fine.

Enabling ESXi in EVE-NG

So now we have QEMU images for ESXi, but we're still a ways from being able to run them. We need to ensure that we're able to run nested hypervisors, and enable the ESXi template in EVE-NG.

Nested Virtualization

Nested virtualization should be enabled in your EVE-NG image by default, but it's easy to verify. Check whether nested virtualization, ignore_msrs and EPT are enabled on your system:

cat /sys/module/kvm_intel/parameters/nested
cat /sys/module/kvm/parameters/ignore_msrs
cat /sys/module/kvm_intel/parameters/ept

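Each of these commands should return 'Y' (some kernel versions show '1' instead). If nested or ignore_msrs is not enabled, set it to '1' in the modprobe configuration and reboot EVE-NG. If ept returns 'N', the CPU (or the outer hypervisor, if EVE-NG itself runs in a VM) doesn't expose EPT, and no modprobe option will fix that: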

echo 'options kvm_intel nested=1' >>  /etc/modprobe.d/kvm-intel.conf
echo 'options kvm ignore_msrs=1' >>  /etc/modprobe.d/kvm-intel.conf
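To make the check repeatable, the three reads can be wrapped in a function, and the modprobe fix can be made idempotent so that running it twice doesn't add duplicate lines. A sketch in Bash with hypothetical function names; SYSFS_MODULE is an invented override (defaulting to /sys/module) that exists only so the functions can be tested without touching the real kernel parameters. Both 'Y' and '1' count as enabled.

```shell
#!/usr/bin/env bash
# Return 0 if /sys/module/<module>/parameters/<param> reads Y or 1,
# 1 if it reads anything else, 2 if the parameter doesn't exist.
kvm_param_enabled() {
    local module="$1" param="$2" base="${SYSFS_MODULE:-/sys/module}" value
    value=$(cat "$base/$module/parameters/$param" 2>/dev/null) || return 2
    case "$value" in
        Y|y|1) return 0 ;;
        *)     return 1 ;;
    esac
}

# Print OK/FAIL for the three parameters checked above; non-zero exit if any fails.
check_nested_virt() {
    local rc=0 pair
    for pair in kvm_intel:nested kvm:ignore_msrs kvm_intel:ept; do
        if kvm_param_enabled "${pair%%:*}" "${pair#*:}"; then
            echo "OK   ${pair/:/ }"
        else
            echo "FAIL ${pair/:/ }"
            rc=1
        fi
    done
    return "$rc"
}

# Append an options line to a modprobe config only if that exact line isn't there yet.
add_modprobe_option() {
    local line="$1" conf="${2:-/etc/modprobe.d/kvm-intel.conf}"
    grep -qxF -- "$line" "$conf" 2>/dev/null || echo "$line" >> "$conf"
}
```

For example, check_nested_virt || add_modprobe_option 'options kvm_intel nested=1', followed by a reboot.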

ESXi template

The template for ESXi is already included in EVE-NG; you can find it at /opt/unetlab/html/templates/esxi.php. You need to make a tiny change to the template to get it to work: add -cpu host to the QEMU options. If you don't, you'll get a purple screen complaining about an "unsupported CPU".

Original:

$p['qemu_options'] = '-machine pc,accel=kvm -serial none -nographic -nodefconfig -nodefaults -display none -vga std -rtc base=utc';

Working:

$p['qemu_options'] = '-machine pc,accel=kvm -cpu host -serial none -nographic -nodefconfig -nodefaults -display none -vga std -rtc base=utc';
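The change can also be scripted. The sketch below assumes the qemu_options line contains "-machine pc,accel=kvm " exactly as in the original line above; it inserts -cpu host once, keeps a .bak copy through sed -i.bak, and does nothing if the flag is already present. patch_esxi_template is a hypothetical name.

```shell
#!/usr/bin/env bash
# Add "-cpu host" after "-machine pc,accel=kvm" in the ESXi template, exactly once.
patch_esxi_template() {
    local template="${1:-/opt/unetlab/html/templates/esxi.php}"
    [ -f "$template" ] || { echo "not found: $template" >&2; return 1; }
    if grep -q -- '-cpu host' "$template"; then
        echo "already patched"
        return 0
    fi
    # sed -i.bak edits in place and leaves the original as <template>.bak
    sed -i.bak 's/-machine pc,accel=kvm /-machine pc,accel=kvm -cpu host /' "$template"
    # Succeed only if the substitution actually happened.
    grep -q -- '-cpu host' "$template"
}
```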

Activate the template

As a final step, you need to activate the template. There used to be a (commented-out) line for the ESXi template in /opt/unetlab/html/includes/init.php, but it's gone in the latest version of EVE-NG. No worries, you can just add it yourself:

Init.php with esxi_template added

Now select 'VMware ESXi' as node type:

ESXi in dropdown menu

Running ESXi nodes

Finally, we're ready to run our ESXi server. Add it to your topology, and don't reduce the CPU and memory settings: ESXi requires at least 2 CPUs and 4 GB of RAM to boot at all. I usually connect the first NIC (e0) to a bridged network, so I can reach the ESXi server's Web Client from my workstation for management.
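If you script node creation, a pre-flight check against these minimums catches an undersized node before ESXi refuses to boot. A trivial sketch with hypothetical names, assuming RAM is given in MB (as in EVE-NG's node settings) and taking 4 GB as 4096 MB.

```shell
#!/usr/bin/env bash
# Minimums from the text: 2 CPUs and 4 GB of RAM.
ESXI_MIN_CPUS=2
ESXI_MIN_RAM_MB=4096

# check_esxi_node <cpus> <ram_mb>: print each shortfall and return 1 if undersized.
check_esxi_node() {
    local cpus="$1" ram_mb="$2" rc=0
    if [ "$cpus" -lt "$ESXI_MIN_CPUS" ]; then
        echo "need at least $ESXI_MIN_CPUS CPUs, got $cpus"
        rc=1
    fi
    if [ "$ram_mb" -lt "$ESXI_MIN_RAM_MB" ]; then
        echo "need at least $ESXI_MIN_RAM_MB MB RAM, got $ram_mb"
        rc=1
    fi
    return "$rc"
}
```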

Once you've started your ESXi server, you can configure its network through VNC and then access the Web Client from your workstation. The default credentials for the image I'm using are root with no password. Your first step should be to add a datastore; check the VMware documentation for detailed steps. One local disk is available for the datastore: the one backed by hdb.qcow2, which we created from disk 2 of the OVA file.

Now that we have a datastore, we can start adding VMs. With my setup I'm now three hypervisors deep, so I'm a bit short on CPU to run actual workloads; running a vCenter Server Appliance at this point is an exercise in wishful thinking. To validate my setup I'll use the smallest VM I can find that's still a fully functional machine with VMware Tools installed: yVM. Grab its OVA and deploy it using the Web Client.

Unfortunately, there is still another problem to solve. The VM won't start, and ESXi throws an error:

Failed to power on virtual machine yVM. You are running VMware ESX through an incompatible hypervisor. You cannot power on a virtual machine until this hypervisor is disabled.

Luckily, this too can be solved: add vmx.allowNested = TRUE to your VM's configuration (in the Web Client under Edit Settings > VM Options > Advanced > Edit Configuration > Add Parameter).

vmx.allowNested parameter

You'll have to do this for all your VMs, or add it to /etc/vmware/config from the ESXi console:

echo 'vmx.allowNested = "true"' >> /etc/vmware/config
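Appending blindly adds a duplicate line every time the command is repeated. A small guard avoids that; it's written for plain POSIX sh because the ESXi shell is BusyBox-based. The function name is hypothetical, and the optional path argument only exists so it can be tried against a scratch copy first.

```shell
#!/bin/sh
# Append vmx.allowNested to the host-wide VMware config unless a line for it already exists.
allow_nested_globally() {
    conf="${1:-/etc/vmware/config}"
    if grep -q '^vmx\.allowNested' "$conf" 2>/dev/null; then
        echo "vmx.allowNested already set in $conf"
    else
        echo 'vmx.allowNested = "true"' >> "$conf"
    fi
}
```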

Now you can start your VM, and lab away!

yVM connection verified

EVE-NG preview released

On January 1st the UnetLab team released the first preview of EVE-NG. EVE (formally: "Emulated Virtual Environment") is the successor to UnetLab. For those of you who don't know UnetLab: it's a piece of software you can use to emulate network equipment, much like GNS3. You can use it to spin up and connect a couple of switches and routers in order to validate designs, test changes, or prepare for certification exams.

The great thing about UnetLab (and thus also about EVE-NG) is that everything is contained within a single VM, and you use a web-interface to create and manage your labs. Just throw a couple of images at it (EVE-NG supports a whole lot of different vendors), and start labbing. Take a look at this video to see UnetLab in action.

What's new

So what's the big deal about EVE then? There's a list of new features, including the obligatory bug fixes and an upgrade to Ubuntu 16.04 as base platform. To me, the most interesting new features are:

  • New HTML5 UI, including browser-based Telnet, VNC and RDP clients, so you can connect to your devices without opening extra TCP ports or installing software on your client. This will make it much easier to run EVE on a server somewhere and provide remote labs for co-workers or students. Unfortunately, the new UI isn't yet free of bugs in this preview release, and Wireshark integration still requires software on the client anyway. So for now I'm still using the legacy UI for most of my work.
  • Stopped nodes are now a different color than running nodes (grey-ish instead of blue-ish), so it's a lot easier to quickly see which nodes are running and which aren't. Okay, so it's just a small cosmetic change, but I still like it!
  • UKSM is implemented and enabled by default, greatly reducing memory requirements compared to UnetLab.
  • More image types are supported. In addition to this, you can now search and filter in the list when adding a new node, which is a good thing because it's a long list and it could be quite a hassle to find the image you want in UnetLab.

There are plenty of other improvements in EVE compared to UNL, but so far the ones above are my personal favorites.

UKSM is awesome

I mentioned UKSM, and that it reduces EVE's memory demand. So how does it work, and why is it interesting? UKSM (Ultra Kernel Samepage Merging) is an improvement over KSM, the Linux kernel's built-in samepage merging feature. It automatically scans the memory of all processes and deduplicates identical pages, eliminating the waste of having the same information in memory multiple times.

With EVE, the benefit of memory deduplication could be huge. If you're running a lab with ten routers, you're essentially running the same virtual machine ten times, and the memory pages these VMs are using will probably not be very much different from each other. Deduplication will mean you can run even more routers in your lab, without your EVE-machine running out of memory.

This sounds too good to be true, so of course I wanted to test this. Luckily it's pretty easy to turn UKSM on and off, by setting /sys/kernel/mm/uksm/run to 0 (disabled) or 1 (enabled). I took the biggest lab I had, which is my JNCIP-ENT preparation lab based on the topology from the book Junos Enterprise Routing:

Junos Enterprise Routing topology

The topology contains:

  • 11 Juniper vMX routers, with 1 GB RAM each.
  • 2 Cisco routers (IOL), with 256 MB RAM each.
  • 3 Route Injectors, also with 256 MB RAM each.
  • 1 Traffic generator, with 512 MB RAM.
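Summing the list, counting 1 GB as 1024 MB: 11 × 1024 + 2 × 256 + 3 × 256 + 1 × 512 = 13056 MB, or about 12.75 GB. The same arithmetic as a tiny Bash helper (a hypothetical name) that reads "count ram_mb" pairs from standard input:

```shell
#!/usr/bin/env bash
# Sum count * per-node RAM (MB) over "count ram_mb" lines on stdin.
lab_ram_mb() {
    local total=0 count ram
    while read -r count ram; do
        [ -n "$count" ] || continue   # skip blank lines
        total=$(( total + count * ram ))
    done
    echo "$total"
}
```

For this lab, printf '11 1024\n2 256\n3 256\n1 512\n' | lab_ram_mb prints 13056.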

In total this is almost 13 GB of memory, which is really pushing my EVE machine with 12 GB of RAM to the limit. Once the topology has completely started, it's using all available memory:

Routing lab without UKSM

At this point, I enabled UKSM:

echo 1 > /sys/kernel/mm/uksm/run

Immediately the UKSM process (uksmd) started grabbing as much CPU as it could get its hands on, and memory use began dropping. After a while memory use settled at just 41%, and uksmd dropped to under 1% CPU, running in the background.

Routing lab with UKSM
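To repeat this before/after comparison, the toggle and a memory reading can be scripted. A sketch in Bash with hypothetical names: uksm_set writes 0 or 1 to the control file mentioned earlier (this needs root), and mem_used_pct computes used memory as (MemTotal - MemAvailable) / MemTotal from /proc/meminfo. That formula is an assumption about how to measure; the percentages in the screenshots come from a different tool, so expect slightly different numbers. UKSM_RUN is an invented override for testing.

```shell
#!/usr/bin/env bash
# Turn UKSM off (0) or on (1) via its sysfs control file.
uksm_set() {
    local state="$1" ctl="${UKSM_RUN:-/sys/kernel/mm/uksm/run}"
    case "$state" in
        0|1) echo "$state" > "$ctl" ;;
        *)   echo "usage: uksm_set 0|1" >&2; return 1 ;;
    esac
}

# Print used memory as an integer percentage of MemTotal.
mem_used_pct() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^MemTotal:/ {t = $2} /^MemAvailable:/ {a = $2}
         END { if (t > 0) printf "%d\n", (t - a) * 100 / t; else exit 1 }' "$meminfo"
}
```

For instance: note mem_used_pct, run uksm_set 1, give uksmd a few minutes, and read mem_used_pct again.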

In my experience, lab size is mostly limited by the amount of memory I have available. I used to cut each device's RAM down to the minimum it needed to boot at all, often well below its published minimum requirements. With UKSM I can now run much larger and more stable labs on the same hardware I was using for UNL. So yes, it sounds too good to be true, but in this case it really does work.

Try for yourself

So you want to try for yourself? There's a great post on getting started with UnetLab on Lab Time, and the procedure for EVE is pretty much the same.

  • Grab the ISO here.
  • Update to the latest EVE (apt-get update && apt-get upgrade).
  • Import images the same way as with UNL.
  • Start your first lab.
  • Check the official UNL/EVE YouTube channel for tips and how-to videos.

Happy labbing!

Introduction

I've decided to start writing a technical blog again. My previous blog, mainly about Microsoft Lync and telephony, was short-lived and has since disappeared from the internet. However, writing a blog is an excellent way to hone my technical writing skills and perhaps share some valuable insights with whoever might be reading along; now that I have some time again to delve into new technology and write things down, it seemed like a good time to start over.

About me

I am Robin Gilijamse. I'm a network architect living and working in the Netherlands. When I was still young I studied to become a rocket scientist, but when I failed miserably at that I settled for the next best thing: a job in IT. I have worked my way up from the helpdesk to where I am now: designing networks for customers and coordinating research into new technology at OGD ict-diensten (http://www.ogd.nl).

This blog

Interesting traffic is a term that describes data traffic that triggers specific behavior in a network device, such as dialing up an ISDN line or being encrypted and sent through an IPsec tunnel. It also sounds like an apt description of this blog, since it's going to be about data networking (traffic) and will deal with matters that I find interesting. Yes, that last part is a bit subjective, but since I'm the one writing and you're the one reading, you'll have to deal with it.

However, since this is my first entry, I'll leave you with this picture of the other kind of interesting traffic:

Taxis in Bangkok, Thailand, 2011