Juniper's software play: why you'll benefit even if you never buy it.

Juniper made a bold move this week, announcing software disaggregation on their switching products. They are fully committed to allowing Junos to run on other vendors' switches, and to allowing other vendors' operating systems to run on theirs. This will start with the new QFX5200 and fully transition over the next 12 months.

We've seen this coming for a while now, with merchant silicon, whitebox hardware, and some smaller players already taking steps toward this separation. But Juniper is one of the big incumbents and already has a big stake in the market. Their switching business has been taking market share for the past few quarters and has been one of the early bright spots in what appears to be a surge in enterprise sales for them.

Now, for myself, I am an enterprise customer. I already buy lots of Juniper gear, and I already understand the value of Junos to me. As the decision maker in my company, I don't see myself buying any disaggregated hardware in the next 3-5 years. I'll buy first-party hardware and bundled software because that still makes sense for my organization. And yet I see tremendous value in this move, and I think customers like us will still benefit.

My two takeaways are:

It's the software, stupid! And now I'm certain Juniper understands the value of Junos. Let's be honest, most switches today are almost chip-for-chip the same. For the past 3 or 4 iterations we've all been buying the same Broadcom silicon with different paint jobs on the tin. Sure, there is some customization, but the core hardware features, speeds, and density are all dictated by the chipsets, so there is little value-add between the vendors on the hardware. So what makes Juniper, Arista, Cisco et al. different? It's the software. And Juniper has had one of the most powerful and flexible software stacks in the industry. They have best-in-class automation, open standards and interoperability, and a CLI that users fall in love with. With disaggregation, their developers are free to focus on core features and iterate more quickly. I also believe this will lead to fewer versions and code paths for them to support: the abstraction layer they've built should allow the OS to be more consistent across all platforms and features.


Abstractions are awesome! This disaggregation didn't come out of thin air; it has been a three-year effort that has already paid off for customers and for Juniper. First, a summary of the tech. Juniper has built a new Linux foundation that drives the hardware of their products, and the traditional Junos now runs as a VM on that Linux OS. They've already shipped several successful products on that foundation, and with that Linux layer and abstraction we've seen some great new features. The vSRX and vMX were made possible largely because of that abstraction. The QFX5100 can do in-service software upgrades because multiple Junos VMs can simulate dual routing engines. And customers can deploy VMs directly on the QFX switches that run alongside Junos and have direct access to the hardware and stats; examples would be HFT apps, Wireshark, or ELK stacks running right on the switch. Juniper is also able to support new hardware platforms with changes only to the Linux stack: no code changes to Junos, and no regression testing on their large code base. This means fewer bugs and faster support for the latest chipsets.


My take as a customer

This isn't really a surprise move. Analysts and pundits have been talking about the rise of white box for several years now, and we've seen smaller players take some steps in this direction, though none have gone this far. And though I don't see myself purchasing gear this way, I love a) the virtual editions of their products, b) seeing new hardware come to market quickly, c) the promise of fewer bugs, and d) Juniper's commitment to Junos and the things that really make it special in the industry.

Posted by Kelly McDowell

Recently I was asked to troubleshoot a performance problem. The developers had written some synthetic load tests and were seeing them grind to a halt a minute or two into the run.

The test involved sending a simple message and the server echoing the same in the response. Each request created a new TCP socket. 
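I didn't write the test harness, but from the captures the traffic pattern boils down to roughly this (an illustrative sketch only; "server" stands in for the real destination host, and 9100 is the destination port that shows up later in the traces):

    # Rough approximation of the load test: one brand-new TCP connection per request
    while true; do
        # -w 1: give up after a second so a hung connection doesn't stall the loop
        echo "hello" | nc -w 1 server 9100
    done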

 

THE ENVIRONMENT

The physical setup was pretty basic, looking something like the following:

 

|Server|---|TOR Switch|---|AGG Switch|---|FW|---|AGG Switch|---|TOR Switch|---|Server|

 

The firewall serves as the router for the subnets. All paths are 10 Gbit/s or greater.

 

The servers are VMware hosts running multiple guests. Guests are all running CentOS Linux 6.5.

 

THE TROUBLESHOOTING

 

First things first: check some utilization metrics and the firewall logs. The VMware hosts weren't busy and the guest was only pushing about 40 Mbit/s of traffic through its NICs. Nothing seemed out of place there, certainly nothing that should be pushing against infrastructure limits.
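For what it's worth, the guest-side throughput number came from nothing fancier than the interface counters; a spot check along these lines is enough (assuming sysstat is installed; interface names and intervals are just examples):

    # Per-interface rx/tx rates, one-second samples for ten seconds
    sar -n DEV 1 10

    # Or read the raw byte counters directly and diff them by hand
    cat /proc/net/dev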

 

Next up were the firewall logs. We use Splunk in our environment and feed all of our logs into it. The firewall logs gave me some concern right off the bat: in under a minute the test had opened about 80,000 sessions before the volume trailed off.
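That spike jumped right out of a simple count-over-time search. Something along these lines works if you're feeding the SRX structured syslog into Splunk (the index name here is a placeholder, not our actual Splunk layout):

    index=firewall RT_FLOW_SESSION_CREATE
    | timechart span=10s count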

 

Now my instinct here was to investigate that further. As you know, each outbound TCP connection has to bind to a unique source port, and each server has, at most, 65,535 of them to use. Right away I became concerned about a port exhaustion issue. The question was, which device was running into limits? Were the sessions closing properly? Was the Linux guest recycling the ports correctly? Were the sessions being cleared from the firewall correctly?
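Before going straight to packets, it's worth knowing what the client actually has to play with. On the CentOS guest, a quick look at the ephemeral port range and at how many sockets are currently stuck in TIME_WAIT gives a feel for how close we are to the ceiling (a sanity check, not a diagnosis):

    # Ephemeral port range available for outbound connections (CentOS 6 default is 32768-61000)
    cat /proc/sys/net/ipv4/ip_local_port_range

    # How many local sockets are sitting in TIME_WAIT right now
    netstat -ant | grep -c TIME_WAIT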

 

The next step was to request a capture from both sides. This particular network doesn't have physical taps and we don't have SPAN set up on the dvSwitch yet, so we asked the server admins to run tcpdump on each guest so we could compare the results.
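The capture itself is nothing fancy; something along these lines on each guest does the job (interface name and output file are just examples):

    # Full packets, test traffic only, written to a file we can open in Wireshark later
    tcpdump -i eth0 -s 0 -w /tmp/port9100.pcap port 9100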

 

There is always a balance to strike when filtering your trace files for something relevant: too much filtering and you can miss the important parts, too little and the noise is too high to sift through. In this instance I was worried about port exhaustion, so I knew right away what level of filtering I wanted to apply. I quickly found at the end of the trace that there were sessions trying to open but failing (SYN packets being retransmitted). That flow was using source port 45760, and all flows use destination port 9100. I wanted to see every use of source port 45760, how many times it was used and when, so the Wireshark display filter was a simple "tcp.port==45760".
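For reference, the handful of display filters I leaned on were all variations on the same theme (the notes in parentheses are just annotations, not part of the filter):

    tcp.port == 45760                                      (every use of the suspect port, in either direction)
    tcp.port == 9100                                       (the whole test, since everything runs to destination port 9100)
    tcp.flags.syn == 1 and tcp.analysis.retransmission     (any retransmitted SYNs across the trace)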

 

This is a screenshot from Wireshark after filtering down on that port. Column 2 is the delta time since the previous packet shown.

 

 


As you can see, source port 45760 was used, closed, and then re-used about 19 seconds later. The second connection wasn't successful; it never received a response from the other system.


Luckily we captured at both ends, so we can check whether that second connection ever made it there. The trace file from the destination shows only one connection ever using source port 45760, and the retransmitted SYN packets were never received.


In a few minutes we knew it wasn't an issue with the destination server, and since the client is able to bind to 45760 again, it doesn't appear to be a port exhaustion issue on the client either. That leaves the firewall in the middle.

Now, our firewall is a Juniper SRX 3600, and it is important to know what happens when a session passes through it. All firewalls maintain a session table, and we can't open two simultaneous sessions using the same 5-tuple (source IP, source port, destination IP, destination port, and protocol). While the behavior varies by protocol (TCP vs. UDP, for example), with TCP things are pretty straightforward: the client and server keep track of the session state and it is visible in the communication flow, so the firewall simply has to monitor the flow and watch for the goodbyes to happen (FINs or RSTs in packet lingo).
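You can watch this directly on the SRX; the flow session table is visible from operational mode and can be filtered down to the test traffic (destination port 9100 in our case):

    show security flow session summary
    show security flow session destination-port 9100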

In the screenshot above you can see the goodbye messages start at packet 13. Once the first FIN packet is received, the firewall marks the session as closing. It is too soon to drop the session, because we want to give both sides time to finish gracefully, so the session moves to a TIME_WAIT state. If we see the goodbye messages complete, we clear the session from the table immediately; otherwise we wait some period and then close the session. Juniper's default is to wait 150 seconds after seeing the first FIN packet.

The behavior of the client is a bit screwy. We would expect the client to send one final ACK, but instead it sends a RST (or two). This doesn't satisfy the firewall's requirements to complete the TIME_WAIT, so the firewall keeps the session in state until the 150-second timer expires. We quickly decided that the application behavior could not be modified, so we began to investigate the firewall timers.

Juniper sets that TIME_WAIT timer as a global option:

        set security flow tcp-session time-wait-state session-timeout <2-600 seconds>

Lowering that session timeout to 20 seconds improved performance, 10 seconds was nearly unthrottled, and 5 seconds let the test run at full performance.
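For the record, the fix is just that one knob with a lower value, committed like any other configuration change (20 seconds shown here, the first value we tried):

        set security flow tcp-session time-wait-state session-timeout 20
        commit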


The only lingering question I have is about the RST. I would have expected the firewall to clear the session immediately on the RST, but I haven't found documentation from Juniper on that yet, so it looks like I need to do some lab testing. Does the RST also trigger the same 150-second wait, or was the fact that the session was already in TIME_WAIT a factor?


This was a fun little problem: we got to look at some packets and solve a real issue, a great day as a network engineer.





Posted by Kelly McDowell