If ip-monitoring does not work on an SRX cluster..

During a recent project, I built a Juniper SRX cluster where a Reth connected via a LAG to a switch, which in turn connected it to the Internet. In case of a failure, the Reth should fail over to the second node, where the second half of the 4-cable LAG was configured. That half was connected to a second LAG on the same switch, just like the documentation says it should be.
Sounds easy, right?..

Well.. it did, BUT.. during extensive System Acceptance Tests we found that, on regular occasions, the second node (the one that was NOT primary for the Reth) suffered reachability problems when testing Internet connectivity via ip-monitoring.

The way it is supposed to work is that the secondary node verifies connectivity to the monitored IP address (usually the default gateway/router) on a regular basis, using a secondary IP address.
And that started failing intermittently, for no apparent reason. Not very nice, as it meant that the secondary node declared its side of the Reth unfit for action. In other words: the Reth on the primary node could NOT FAIL OVER.
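
As a rough sketch, ip-monitoring with a secondary address is configured along these lines. The RG number, the reth unit and all addresses below are made-up placeholders, not the values from this project, and the weights and retry values are only examples. The secondary IP address has to sit in the same subnet as the Reth address, because the secondary node uses it as the source of its probes.

```
# Placeholders: RG1, reth0.0, gateway 192.0.2.1, secondary IP 192.0.2.3
# (lines starting with # are annotations, leave them out when pasting)
set chassis cluster redundancy-group 1 ip-monitoring global-weight 255
set chassis cluster redundancy-group 1 ip-monitoring global-threshold 100
set chassis cluster redundancy-group 1 ip-monitoring retry-interval 3
set chassis cluster redundancy-group 1 ip-monitoring retry-count 5
set chassis cluster redundancy-group 1 ip-monitoring family inet 192.0.2.1 weight 100 interface reth0.0 secondary-ip-address 192.0.2.3
```

With these example numbers, losing the single monitored address uses up the whole global threshold, at which point the global weight of 255 is deducted from the redundancy group.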

On this RG, both IP monitoring and interface monitoring were configured, which apparently are both dataplane functions.
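
For illustration, interface monitoring on the same RG is usually set per physical member link, roughly as below. The interface names, the RG number and the weights are assumptions for the sake of the example; how node 1 interfaces are numbered depends on the platform.

```
# Placeholders: RG1 and two hypothetical member links of the Reth LAG
set chassis cluster redundancy-group 1 interface-monitor ge-0/0/2 weight 255
set chassis cluster redundancy-group 1 interface-monitor ge-0/0/3 weight 255
```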

I cannot find anything in the documentation stating that ip-monitoring and interface monitoring on a Reth that consists of a LAG is not supported. But because of a remark on this Juniper website:

“https://www.juniper.net/documentation/en_US/junos/topics/topic-map/security-chassis-cluster-ip-address-monitoring.html”

where they state:

“do not recommend configuring chassis cluster IP monitoring on Redundancy Group 0 (RG0) for SRX Series devices.”

I became suspicious.

So.. I turned OFF interface monitoring and left ip-monitoring in place. And voilà, it works!
The downside is of course that in case of an interface failure, your failover time is now a lot longer, as ip-monitoring has to time out first, whereas interface monitoring fails over virtually instantaneously. But the customer had no problem with that, so this was accepted as a workable solution.
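
A minimal sketch of that workaround, assuming the affected redundancy group is RG 1 (a placeholder, not the number from this project):

```
# Configuration mode; removes every interface-monitor entry under this RG
delete chassis cluster redundancy-group 1 interface-monitor
commit

# Operational mode, to check what each node reports afterwards
show chassis cluster ip-monitoring status redundancy-group 1
show chassis cluster status redundancy-group 1
```

As for the longer failover time: detection takes roughly retry-interval times retry-count, so with example values of 3 seconds and 5 retries a dead gateway would need about 15 seconds to be declared unreachable.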

For all the other Reths that did not deploy ip-monitoring, interface monitoring was left in place and worked admirably.

I hope this helps you. The ip-monitoring failures on the 2nd node were intermittent, but would NOT go away with any restart I could think of. 🙁
And let’s face it: who wants to restart ANYTHING on a running production cluster?!

Responses to If ip-monitoring does not work on an SRX cluster..

  1. nsfgav says:

    Hi, I know this post is from a while back but I came across it while investigating a similar issue. In the document you reference it does state…

    “On SRX Branch Series devices, when the reth interface has more than one physical interface configured, IP monitoring for redundant groups is not supported. The SRX uses the lowest interface in the bundle for tracking on the secondary node. If the peer forwards the reply on any other port except the one it received it on, the SRX drops it”

    This would produce the intermittent reachability problems on the secondary node that you described, assuming you were using branch series devices. In theory, removing interface monitoring would not have fixed this issue, though; I just thought I’d share my thoughts.
