Enabling DHCP Security on Juniper switches using ELS

I've been helping a few co-workers prep for the Juniper JNCIS-ENT certification. One thing I noticed is that while there is plenty of older material to be found online, it's hard to find information on enabling Port Security features using the modern ELS syntax. The new JNCIS-ENT exam (JN0-347) uses ELS syntax, so I wrote down some information myself.

ELS

In case you missed it, ELS stands for Enhanced Layer 2 Software. Juniper introduced it in Junos 13.2 for EX- and QFX-switches to provide a more uniform syntax across the product line; it's now similar to layer-2 configuration on MX routers.

Check this page for an overview of what has changed. Most of it is fairly straightforward, but some features (like DHCP Snooping) require a little more explanation.

Layer-2 Security features

In case you're wondering: yes, you really need layer-2 security. Unless you take the necessary precautions, it's quite easy to perform a man-in-the-middle or denial of service attack in a switched layer-2 environment.

There is a whole bunch of layer-2 security features supported on Juniper switches for both IPv4 and IPv6, but for this post I'll limit myself to DHCP Snooping and its related features for IPv4.

DHCP Snooping

DHCP Snooping means inspecting DHCP packets traversing the switch and deciding whether to allow or drop each packet. There's a more detailed description than I can provide over at the Packet Pushers blog.

There are two main security benefits to inspecting DHCP traffic:

  1. You can check whether any DHCP offers are coming from a trusted DHCP-server; and
  2. You can create a table with valid MAC/IP bindings based on those DHCP offers.

The first point is pretty straightforward. The DHCP exchange consists of four steps:

DHCP-steps

A client should only send Discover and Request packets; the DHCP server is the only one allowed to send Offers and Acks. Any Offer packets arriving on ports where you would not expect them can be dropped. This means that your network won't go down when someone decides to bring their home router in for some extra wireless coverage and plugs in the LAN port (or does something more nefarious).
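The allow/drop decision can be sketched in a few lines of Python. This is a simplified model of the idea, not the switch's actual implementation; the names are mine:

```python
# Simplified model of the DHCP Snooping allow/drop decision.
# Server-originated message types may only enter on trusted ports;
# client messages (Discover, Request, ...) are always allowed.
SERVER_MESSAGES = {"OFFER", "ACK", "NAK"}

def allow_dhcp_frame(message_type: str, port_is_trusted: bool) -> bool:
    if message_type.upper() in SERVER_MESSAGES:
        return port_is_trusted
    return True

print(allow_dhcp_frame("OFFER", port_is_trusted=False))     # False: rogue server blocked
print(allow_dhcp_frame("DISCOVER", port_is_trusted=False))  # True: normal client traffic
```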

The second point is not particularly useful in itself, but once you have a list of valid MAC/IP bindings, you can use that table to perform some additional security checks: IP Source Guard and Dynamic ARP Inspection.

bindings-table

IP Source Guard

It's pretty easy to change the source IP address for outgoing IP traffic. When you do this, any replies to your packets are sent to the fake IP instead of your real IP. This technique is widely used in DDoS attacks. To combat this, the IETF has published guidelines for ISPs to implement filtering on the edge of their networks to restrict forged traffic: BCP 38. This eases the problem for the internet at large, but it's not applicable to layer-2 switched networks.

However, now that we have snooped the DHCP packets, we have a table connecting IP addresses to the switchport they belong to. IP Source Guard uses the DHCP Snooping table to drop any packets with source IP addresses that are spoofed.
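Conceptually, IP Source Guard then boils down to a table lookup. A sketch of the check, with a made-up table layout (Junos internals obviously differ):

```python
# Hypothetical snooping table: (port, vlan) -> set of DHCP-learned source IPs.
bindings = {("ge-0/0/1.0", 100): {"10.0.0.15"}}

def allow_ip_frame(port: str, vlan: int, source_ip: str) -> bool:
    # Drop any frame whose source IP was not learned via DHCP on this port.
    return source_ip in bindings.get((port, vlan), set())

print(allow_ip_frame("ge-0/0/1.0", 100, "10.0.0.15"))  # True: matches the binding
print(allow_ip_frame("ge-0/0/1.0", 100, "10.0.0.99"))  # False: spoofed source
```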

Dynamic ARP Inspection

Dynamic ARP Inspection (DAI) can protect your network against ARP Spoofing. ARP Spoofing means answering ARP requests for IP addresses that you don't own. This would allow you to perform a Man-in-the-Middle attack by letting other machines on the LAN think you are the default gateway. This is actually surprisingly easy to do.

The DHCP Snooping table contains MAC addresses coupled to the IP addresses that are actually assigned to those hosts. By inspecting ARP packets and checking them against this table, the switch can drop any ARP replies that contain invalid IP/MAC bindings.
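The DAI check is the same idea applied to the sender fields of an ARP packet. Again a sketch with made-up names, not the actual Junos data structures:

```python
# Hypothetical snooping table: MAC -> DHCP-assigned IP.
bindings = {"00:53:00:00:12:34": "192.0.2.1"}

def allow_arp_reply(sender_mac: str, sender_ip: str) -> bool:
    # The sender's claimed IP must match the IP that DHCP assigned to its MAC.
    return bindings.get(sender_mac) == sender_ip

print(allow_arp_reply("00:53:00:00:12:34", "192.0.2.1"))    # True: valid binding
print(allow_arp_reply("00:53:00:00:12:34", "192.0.2.254"))  # False: ARP spoof attempt
```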

Configuration

To demonstrate the configuration of DHCP Security in ELS, we're going to use a very straightforward sample topology:

sample-topology-small-2

Enabling DHCP Snooping

Both IP Source Guard and Dynamic ARP Inspection require a DHCP Snooping table, so you have to configure DHCP Snooping first. Prior to Junos OS 17.1R1, you cannot actually enable DHCP Snooping by itself. This is a change from non-ELS Junos, where that is possible. Instead, DHCP Snooping is enabled automatically when you configure any of the following DHCP Security options:

  • Dynamic ARP inspection (DAI)
  • IP source guard
  • DHCP option 82
  • Static IP

From release 17.1R1 onward, you can enable DHCP Snooping on its own:

vlans {
    vlan-name {
        forwarding-options {
            dhcp-security;
        }
    }
}

Enabling IP Source Guard and Dynamic ARP Inspection

IP Source Guard and Dynamic ARP Inspection are enabled on a per-VLAN basis. This is pretty straightforward:

vlans {
    vlan-name {
        forwarding-options {
            dhcp-security {
                arp-inspection;
                ip-source-guard;
            }
        }
    }
}

Keep in mind that this will also automatically enable DHCP Snooping.

Trusted Ports

At this point, we still need to exempt our DHCP server from DHCP Snooping, otherwise all DHCP Offers will be dropped by the switch. We do this by configuring the port as "trusted". Trunk ports are automatically trusted in Junos, but our server is connected to an access port so it requires some more configuration:

vlans {
    vlan-name {
        forwarding-options {
            dhcp-security {
                group DHCP-server {
                    overrides {
                        trusted;
                    }
                    interface ge-0/0/0.0;
                }
            }
        }
    }
}

Static Bindings

In our sample topology, we have one client with a static IP address. This client will not send DHCP requests, so its IP and MAC addresses will not be entered in the DHCP Snooping table. As a result, IP Source Guard and Dynamic ARP Inspection will drop all traffic from this client.

We can fix this by setting the interface to "trusted" like we did with the DHCP server, but in general we do not trust our clients. The solution is to set a static MAC/IP binding:

vlans {
    vlan-name {
        forwarding-options {
            dhcp-security {
                group static-client {
                    interface ge-0/0/2.0 {
                        static-ip 192.0.2.1 mac 00:53:00:00:12:34;
                    }
                }
            }
        }
    }
}

Combining all these, we have a basic DHCP Snooping configuration for our sample topology:

vlans {
    vlan-name {
        forwarding-options {
            dhcp-security {
                arp-inspection;
                ip-source-guard;
                group DHCP-server {
                    overrides {
                        trusted;
                    }
                    interface ge-0/0/0.0;
                }
                group static-client {
                    interface ge-0/0/2.0 {
                        static-ip 192.0.2.1 mac 00:53:00:00:12:34;
                    }
                }
            }
        }
    }
}

If you want to know more about these and all the other options you can enable, check the Juniper documentation on dhcp-security.

An oddly specific post about group_fwd_mask

When I started writing this blog, I didn't intend to write about low-level technical details, because frankly that is not what I'm concerned with in my day job anymore. However, I like to tinker with network labs in my spare time, and I ran into problems getting ethernet multicast protocols like LLDP and LACP to work using Linux bridges. It took me a while to google all the answers, so I thought it might be worthwhile to jot down my notes for whenever I run into this again.

IEEE 802.1D MAC Bridge Filtered MAC Group Addresses

The protocols I'm talking about are using MAC addresses in the range 01-80-C2-00-00-0x, a range that is defined in IEEE standard 802.1D as "MAC Bridge Filtered MAC Group Addresses". This is a range set aside by the IEEE for standard protocols that use link local multicast to communicate with a neighboring device. Frames using these MAC addresses are supposed to be explicitly link local; quoting the IEEE Standard Group MAC Addresses tutorial:

IEEE 802.1D MAC Bridge Filtered MAC Group Addresses: 01-80-C2-00-00-00 to 01-80-C2-00-00-0F; MAC frames that have a destination MAC address within this range are not relayed by MAC bridges conforming to IEEE 802.1D.

IEEE maintains a list of protocols for which these addresses are reserved[1], but it's not very readable so I've compiled my own list here:

MAC address        Protocol
01-80-C2-00-00-00  Spanning Tree (STP/RSTP/MSTP)
01-80-C2-00-00-01  Ethernet Flow Control (pause frames)
01-80-C2-00-00-02  Link Aggregation Control Protocol (LACP)
01-80-C2-00-00-03  802.1X Port-Based Network Access Control
01-80-C2-00-00-08  Provider Bridge protocols (STP)[2]
01-80-C2-00-00-0D  Provider Bridge protocols (MVRP)
01-80-C2-00-00-0E  802.1AB Link Layer Discovery Protocol (LLDP)[3]

That's nice, so why do I care?

Protocols using these MAC addresses are strictly link local, and any bridge[4] compliant with IEEE 802.1D must filter these frames: either process or drop them. In no circumstance would it be permitted to forward them along to other receivers. This made sense in the early 2000s, when bridges were chunks of iron where you would plug in other chunks of iron. In the age of virtualization that's not quite the case anymore.
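The "must filter" rule covers exactly sixteen addresses. A quick Python check (my own helper, just to make the range concrete) for whether a destination MAC falls inside it:

```python
def is_bridge_filtered(mac: str) -> bool:
    # True for 01-80-C2-00-00-00 through 01-80-C2-00-00-0F (IEEE 802.1D
    # MAC Bridge Filtered MAC Group Addresses); accepts ':' or '-' separators.
    octets = [int(part, 16) for part in mac.replace("-", ":").split(":")]
    return octets[:5] == [0x01, 0x80, 0xC2, 0x00, 0x00] and octets[5] <= 0x0F

print(is_bridge_filtered("01:80:c2:00:00:0e"))  # True: LLDP, filtered
print(is_bridge_filtered("01:00:0c:cc:cc:cc"))  # False: CDP, flooded normally
```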

In my case I'm using a KVM-based host to simulate multiple virtual network devices. I'm interconnecting these devices using the Linux Bridge, and of course I would like the bridge to behave as much as possible like a wire directly connecting my virtual devices. Unfortunately, the Linux Bridge is neatly written to conform to IEEE 802.1D, and it filters all Ethernet frames with destination addresses in the range 01-80-C2-00-00-0x. This means I cannot use my virtual environment to test any of the protocols in the table above. Sad!

Fixing things

Now I do want to test LLDP, LACP, STP and 802.1X whenever I'm building network labs, so I had to figure out a workaround. Obviously I'm not the first one to encounter this, so there are solutions available. Since Linux kernel 2.6 there is a setting that allows you to control which link local frames from the range defined in IEEE 802.1D the bridge should forward, by setting a bitmask in /sys/class/net/bridge-iface/bridge/group_fwd_mask. The default value 0 means the Linux bridge does not forward any link local frames. Setting this value to 16384, for example, would allow the bridge to forward LLDP frames (01-80-C2-00-00-0E):

echo 16384 > /sys/class/net/br0/bridge/group_fwd_mask

A slight catch is that in the default kernel distribution, bitmask values for the first three MAC addresses (-00, -01 and -02) are restricted, meaning we still cannot use this trick to enable STP and LACP protocols in our labs[5]. To remove this restriction, you need to patch and compile your kernel yourself, like the folks at EVE-NG do. I am lazy, so I just grab the compiled kernel from the EVE-NG repository. Now we can set the group_fwd_mask to any value we like.

Bitmasks for group_fwd_mask?

So what value should we use for this bitmask? The bitmask is a 16-bit number, where the first (least significant) bit represents MAC address 01-80-C2-00-00-00 and the 16th (most significant) bit represents 01-80-C2-00-00-0F. The default value (all bits 0) does not forward any link local frames. To enable forwarding of frames for a specific MAC address, we set the corresponding bit to 1. For example, to allow forwarding of LLDP frames (01-80-C2-00-00-0E) we would set the 15th bit to 1 and leave the rest at 0:

MAC 0F 0E 0D 0C 0B 0A 09 08 07 06 05 04 03 02 01 00
BIT 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0

This means we use the binary number 0100 0000 0000 0000 as bitmask, which translates to decimal number 16384, just like we used in the example earlier.

If we would like to add LACP (01-80-C2-00-00-02) and 802.1X (01-80-C2-00-00-03) to the mix, we would also set the 3rd and 4th bit to 1:

MAC 0F 0E 0D 0C 0B 0A 09 08 07 06 05 04 03 02 01 00
BIT 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0

This results in the binary number 0100 0000 0000 1100, which is 16396 in decimal. Using this method, you now have full control over which type of frame to forward or filter.
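The bit arithmetic above is easy to get wrong by hand, so here is a small Python helper (my own, purely for illustration) that builds the mask from the last octet of each address you want forwarded:

```python
def group_fwd_mask(*last_octets: int) -> int:
    # Each reserved address 01-80-C2-00-00-0X maps to bit X of the mask.
    mask = 0
    for octet in last_octets:
        mask |= 1 << octet
    return mask

print(group_fwd_mask(0x0E))              # 16384: LLDP only
print(group_fwd_mask(0x0E, 0x02, 0x03))  # 16396: LLDP + LACP + 802.1X
```

Feeding it all sixteen octets gives 65535, and leaving out the three restricted ones (-00, -01, -02) gives 65528, matching the values used later in this post.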

That's great, but I'm lazy. For my lab I just want to allow as much as possible so I can test whatever protocol I want, so I just set all bits to 1:

  • for my patched kernel I can use all bits: 1111 1111 1111 1111 = 65535
  • for the unpatched kernel I cannot use the first three bits: 1111 1111 1111 1000 = 65528

In other words: to enable out-of-the-box Linux bridges to forward all IEEE 802.1D MAC Bridge Filtered MAC Group Addresses except the restricted three types, execute this command:

echo 65528 > /sys/class/net/br0/bridge/group_fwd_mask

Final note: this might not be the smartest thing to do in a production environment, especially with a patched kernel flooding STP BPDUs. You're probably best off using this for (virtual) network labs only. Enjoy at your own risk!


  1. About half of the addresses are still "Reserved for future standardization", so you can still grab one for that new layer-2 multicast protocol you're working on. You can apply for an assignment directly through the IEEE website - let me know how that works out. ↩︎

  2. Provider Bridge protocols are identical to regular layer-2 protocols, except that they use a different MAC group address. This makes it possible to keep a customer network's layer-2 protocols separate when tunneling them through the provider network (using the likes of QinQ and L2PT). To me as a modern enterprise engineer, having multitenant nested spanning trees sounds like somebody's worst nightmare, but apparently it made sense two decades ago. ↩︎

  3. Ever wondered why you would find CDP information all over the network when using non-Cisco or unmanaged switches, but never had this problem with LLDP? That's because CDP (01-00-0C-CC-CC-CC) uses a generic multicast MAC address, which is flooded by non-Cisco switches. LLDP (01-80-C2-00-00-0E) uses a Bridge Filtered group MAC address that is dropped (filtered) by any switch worth its salt. ↩︎

  4. Ancient word for "switch". ↩︎

  5. Advice for people running Juniper equipment (for example using Wistar): you can change STP to use 01-80-C2-00-00-08 instead of the restricted group address 01-80-C2-00-00-00 by setting the protocol to use the provider-bridge-group. ↩︎

Playing around with Wistar

As of last week I have a new study project: the brand-new Juniper certification track for JNCIS-DevOps; finally a vendor-cert that lets me sink my teeth in anything other than a CLI. This means building a couple of new labs, so it seemed like a good time to dive into Wistar again.

About Wistar

What is Wistar? The Wistar documentation has a nice definition:

"Wistar is a tool to help create and share network topologies of multiple virtual machines. It uses a drag and drop interface to make connecting instances together into large networks super easy. These topologies can then be saved, shared, cloned, or destroyed at will. The idea is to make experimenting with virtual networks and virtual machines as frictionless as possible."

In other words, it's a network lab tool, much like GNS3 and EVE-NG, but geared towards Juniper equipment. This means that it has excellent support for Juniper routers, switches and firewalls, but it's a bit harder to run other types of (virtual) devices.

sample topology

For me, the main advantages of Wistar are:

  • Wistar abstracts away the complexity of running separate VMs for the routing engine (RE) and forwarding engine (PFE) roles that vMX and vQFX devices require.
  • Wistar handles initial configuration such as hostnames, users, etc.
  • Wistar has built-in functionality to execute operational commands and apply configuration snippets across all devices

Junos CLI Automation

Installing Wistar

Sounds good, right? Unfortunately, Wistar does not provide a nice prepackaged OVA to get started with, and the instructions to build a Wistar server yourself are quite extensive. Luckily for us, Ansible is there to help us out: Wistar provides a sample playbook that covers all the required steps. This boils the installation procedure down to the following:

  1. Install a fresh new Ubuntu 16.04 server, for example using this ISO image.

  2. Install ansible using pip:
    sudo pip install ansible

  3. Grab a copy of the ansible playbook and the apache configuration file from the Wistar repository.

  4. Run the ansible playbook
    sudo ansible-playbook install_wistar_ubuntu_16_pb.yml

  5. If necessary, reboot the server.

You're ready to run: the Wistar interface is available on port 8080.

Get the right images

Having Wistar running is nice, but like GNS3 and EVE-NG you'll still need to add actual images for your virtual equipment. Juniper equipment is available for download, as long as you have a Juniper.net SSO account (create one here).

For a complete set of devices, you need images for a switch, a router, a firewall and a server. Wistar is a bit peculiar about which versions perform well (or boot at all), so here is my list of working images.

vQFX

For a switch, use the vQFX, which is available as an evaluation from the Juniper site. Use 15.1X53-D60 for the routing engine (RE), and 20160609-2 as the PFE.

vMX

For a router, use the X86 version of vMX-15.1F4.15. This is the last version that includes the "simulated" PFE (known as "riot" in Wistar). This makes it by far the most lightweight version of vMX around. Just extract jinstall64-vmx-15.1F4.15-domestic.img from the .tgz file and upload as Junos vMX RE, and Wistar will automatically add a Riot PFE based on the same image. When you select the correct images while adding a VM, it will default to use 1 vCPU and 512MB for both the RE and PFE:

This is how it looks when adding a vMX with RIOT PFE

vSRX

As a firewall I'm using vSRX 17.3R1.10, even though it takes a really long time to boot. Many people have had the best results with vSRX 15.1X49-D60, but that is optimized for Linux kernel 4.4, and since I'll be messing with the kernel later on in this post, I'll stick with 17.3.

Ubuntu

And finally I'm using Ubuntu server as lightweight client. Be sure to grab the cloud-img version.

Enhancing Wistar for LACP and LLDP

By default, the bridges Wistar uses to connect devices are unable to pass LACP, STP and LLDP frames. That's not a very big deal, I can handle most of my labs without needing either; but it would be nice to be able to enable these features.

For LLDP this is actually easy. All you need to do is set the correct bit on the forwarding mask for each bridge after it's created:

echo 16384 > /sys/class/net/t1_br1/bridge/group_fwd_mask

It's a bit harder to enable LACP and STP frames, since the Linux kernel restricts the group_fwd_mask values you're allowed to set. These types of frames are definitely not meant to be forwarded by a bridge! You need to patch your kernel before compiling to circumvent this restriction.

That sounds like a lot of work, so I simply grab a pre-patched kernel from EVE-NG. This kernel is tuned for network labbing, and gives me some added enhancements like UKMS for free. We're installing this kernel by appending some tasks to the ansible playbook:

  - name: Get the EVE-NG repo key
    apt_key:
      url: http://www.eve-ng.net/repo/[email protected]
      state: present
 
  - name: Add EVE-NG repository
    apt_repository:
      repo: deb [arch=amd64]  http://www.eve-ng.net/repo xenial main
      state: present

  - name: Install the EVE-NG kernel
    apt:
      name: linux-image-4.9.40-eve-ng-ukms-2+
      state: present
      update_cache: yes

After a reboot the server will run the new kernel, so now we can set a bitmask on our bridges that will pass all types of frames:

echo 65535 > /sys/class/net/t1_br1/bridge/group_fwd_mask

Where to go next

Does this whet your appetite? Here is what you need to do next:

Try for yourself, and have fun labbing!

Insights from the NetDevOps Fall 2016 Survey

For my bachelor's thesis I've been doing research into DevOps and how it would apply to our network operations team. We were just starting on our journey to automate everything, and some of the questions I had were:

  • What tasks do network operators generally automate first?
  • What tools do most people use to automate their networks?

Lucky for me, right at the time I needed this information, the Network-to-Code community was running a NetDevOps Survey. The final report is not finished yet, but the raw data is available, so I could use that in my research. Pending that report, I thought it might be worthwhile to share my conclusions.

What is the NetDevOps Survey?

The NetDevOps Fall 2016 Survey is the first edition of what will probably be a regular survey amongst network engineers about network automation and DevOps. It's spearheaded by Damian Garros and run mainly from #netdevops-survey on the NetworktoCode Slack channel.

The survey had 307 responses from network engineers around the globe, running all kinds of networks ranging from small to very large. Since it comes from a community of engineers interested in networking as code, there is certainly a self-selection bias towards people already running all kinds of automation tools. That's fine for my purpose since these are the people I want to learn from, but it's probably not representative of what Average Joe Networker is currently running in production.

Survey Responses

So, what to automate first?

My assumption is that the tasks most network operations teams automate are either easy to automate or have great impact when you do. Either way, that's probably the low-hanging fruit and a good place to start.

automated tasks

It seems like there's a strong preference for automating configuration-related tasks (configuration management, changes and new deployments). This makes sense, since as network engineers we're very much used to dealing with a device through its configuration file. Reporting is also often automated, probably because it's generally boring work that's always the same. Other tasks are much less commonly automated.

What programming language to learn?

A common question is whether we network engineers should learn to code, and if so, which language to learn.

programming languages

By far the most common language is Python, followed at some distance by shell scripting (Bash, PowerShell, etc.). The other languages are far less commonly used in automating networks. This isn't very surprising; influential people in the networking community have been telling us to learn Python for years.

What tools should we look at?

Once we've mastered Python, we can write all the tooling we need ourselves, but it's probably more useful to check what tools other people are using in automating their network.

automation tools

So everybody is using or at least looking at Git as a central repository. Makes sense, with the advent of free services like GitHub and GitLab it has pretty much become an industry standard. Ansible is also quite common amongst network engineers, leaving competing products like Puppet, Chef and SaltStack in the dust.

Since most organizations seem to start with automating configuration management, it might be worthwhile to zoom in to the tools they use specifically for those tasks.

tooling for config management

Ansible is once more the most popular tool, but the majority of people use custom-built tooling. Apparently learning Python is still the smart move. What's a bit surprising to me is that only 17% of respondents use vendor-specific tools for configuration management; nearly all vendors push their own proprietary management platform when selling kit, but clearly those tools do not fit the automation needs of network operators.

Conclusion

My conclusion is that we ought to look at a toolchain including Git and Ansible, stitched together using Python scripting. Obviously, there is a lot more insight to be found in the dataset, and you should certainly look for the finished report.

If you want to know more about all these tools and how to automate networks, check out Network Programmability and Automation by Matt Oswalt, Jason Edelman, and Scott Lowe for vendor-independent advice, and Automating Junos Administration or Programming and Automating Cisco Networks for excellent vendor-specific information. And if you too would like to start learning Python, I can heartily recommend Kirk Byers' free course Python for Network Engineers. These are the resources that were most useful to me during my studies.

Running ESXi nodes in EVE-NG

If you're like me and are curious enough to check EVE-NG under the hood, you might have noticed that there is a template for VMware ESXi, but it's not listed as supported and is in fact hidden by default. However, if you're willing to tinker a bit, it is possible to run ESXi nodes in EVE-NG. So let's give it a shot!

Getting ESXi images

First stop is acquiring some images for ESXi. Of course it's possible to create your own QEMU image by installing from an official ESXi installation CD, but that's a lot of work. It's much easier to start with the Nested ESXi images by William Lam, who has done most of the work for us. You can grab images for ESXi5.5, ESXi6.0 and ESXi6.5 straight from vmware.com.

All these images are .ova files with three disks:

  • A 2GB system disk (disk1)
  • A 4GB data disk (disk2)
  • An 8GB data disk (disk3)

Nested ESXi OVA contents

To get an image that's compatible with EVE-NG, you need to take the following steps:

1. Upload the disk files to EVE-NG

Open the OVA with your favorite archiving program, extract disk 1 and 2, and copy them to your EVE-NG virtual machine. I'll assume you've managed to copy disk1 and disk2 from the ESXi6.0 OVA to /tmp.

2. Convert the disks

EVE-NG expects disk images in qcow2 format, so you need to convert the vmdk files.

cd /tmp
qemu-img convert -f vmdk -O qcow2 Nested_ESXi6.x_Appliance_Template_v5-disk1.vmdk hda.qcow2
qemu-img convert -f vmdk -O qcow2 Nested_ESXi6.x_Appliance_Template_v5-disk2.vmdk hdb.qcow2

3. Expand disk 2

As mentioned, disk 2 is only 4GB. That's fine if you're going to use some external storage, but it's probably smart to enlarge it a bit; it'll be sparse anyway.

qemu-img resize hdb.qcow2 50G

4. Move the files

Move the files to their final destination. The folder should be named /opt/unetlab/addons/qemu/esxi-(version) for EVE-NG to be able to find them:

mv  /tmp/*.qcow2 /opt/unetlab/addons/qemu/esxi-6.0u2

5. Clean and fix permissions

rm /tmp/*.vmdk
/opt/unetlab/wrappers/unl_wrapper -a fixpermissions

Repeat as desired for ESXi5.5 and ESXi6.5. For now I prefer to use the 6.0u2 image, it seems to be a bit more stable and the embedded Web Client works fine.

Enabling ESXi in EVE-NG

So now we have QEMU images for ESXi, but we're still a ways from being able to run them. We need to ensure that we're able to run nested hypervisors, and enable the ESXi template in EVE-NG.

Nested Virtualization

Nested virtualization should be enabled in your EVE-NG image by default, but it's easy to verify. You can check whether VT-x and EPT are enabled in your system:

cat /sys/module/kvm_intel/parameters/nested
cat /sys/module/kvm/parameters/ignore_msrs
cat /sys/module/kvm_intel/parameters/ept

Each of these commands should return 'Y'. If they don't, set them to '1' manually, and reboot EVE-NG:

echo 'options kvm_intel nested=1' >>  /etc/modprobe.d/kvm-intel.conf
echo 'options kvm ignore_msrs=1' >>  /etc/modprobe.d/kvm-intel.conf

ESXi template

The template for ESXi is already included in EVE-NG; you can find it under /opt/unetlab/html/templates/esxi.php. You need to make a tiny change to the template to get it to work. If you don't, you'll get a purple screen complaining about an "unsupported CPU".

Original:

$p['qemu_options'] = '-machine pc,accel=kvm -serial none -nographic -nodefconfig -nodefaults -display none -vga std -rtc base=utc';

Working:

$p['qemu_options'] = '-machine pc,accel=kvm -cpu host -serial none -nographic -nodefconfig -nodefaults -display none -vga std -rtc base=utc';

Activate the template

As a final step, you need to activate the template. There used to be a (commented-out) line for the ESXi template in /opt/unetlab/html/includes/init.php, but it's gone in the latest version of EVE-NG. No worries, you can just add it yourself:

Init.php with esxi_template added

Now select 'VMware ESXi' as node type:

ESXi in dropdown menu

Running ESXi nodes

Finally, we're ready to run our ESXi server. Add it to your topology, and don't reduce the CPU and memory settings; ESXi requires at least 2 CPUs and 4 GB of RAM to boot at all. I usually connect the first NIC (e0) to a bridged network, so I can reach the ESXi server's Web Client from my own machine for management access.

Once you've started your ESXi server, you can configure the network through VNC, and then access the Web Client from your device. Default credentials for the image I'm using are root without a password. Your first step should be to add a datastore; check the VMware documentation for detailed steps. You'll have one local disk available to create the datastore; it'll be cloned from the hdb.qcow2 we created from disk 2 of the OVA file.

Now that we have a datastore, we can start to add VMs. With my setup I'm now three hypervisors deep, so I'm a bit short on CPU to run actual workloads; running a vCenter Server Appliance at this point is an exercise in wishful thinking. To validate my setup I'll be using the smallest VM I can find that's still a fully functional machine with VMware tools installed: yVM. Grab the OVA here, and deploy using the webinterface.

Unfortunately, there is still another problem to solve. The VM won't start, and ESXi throws an error:

Failed to power on virtual machine yVM. You are running VMware ESX through an incompatible hypervisor. You cannot power on a virtual machine until this hypervisor is disabled.

Luckily, this too can be solved: add vmx.allowNested = TRUE to your VM's configuration (in the Web Client under Edit Settings > VM Options > Advanced > Edit Configuration > Add Parameter).

vmx.allowNested parameter

You'll have to do this for all your VMs, or add it to /etc/vmware/config from the ESXi console:

echo 'vmx.allowNested = "true"' >> /etc/vmware/config

Now you can start your VM, and lab away!

yVM connection verified