Problem Summary
An enterprise experienced intermittent BGP neighbor flaps between core and WAN routers. The problem began after a WAN upgrade to a new service provider with slightly different interface MTU defaults.
- BGP adjacency between two routers established successfully but would flap every few minutes.
- Logs showed "BGP HOLD TIMER EXPIRED" and "BGP NOTIFICATION - Hold Time expired".
- The physical link remained up/up, and interface errors were not observed.
- Network services relying on BGP were intermittently impacted, affecting cloud peering, MPLS reachability, and backup routing.
Symptoms Observed
- BGP neighbor state toggled between Established and Idle/Active.
- Flap frequency aligned with the BGP Hold Timer (typically 180s).
- show ip bgp summary showed an increased number of flaps:
  Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
  192.0.2.2       4 65002    1045    1040        0    0    0 00:03:00 Idle (flapping)
- No CRC errors, drops, or interface resets.
- Other routing protocols (OSPF, EIGRP) over the same WAN link were stable.
- debug ip bgp or debug bgp events showed:
  %BGP-5-ADJCHANGE: neighbor 192.0.2.2 Down BGP Notification sent - hold time expired
Root Cause Analysis
After deeper analysis using packet captures and interface checks, it was discovered that:
- One router had a default MTU of 1500 bytes.
- The WAN provider or adjacent router had an MTU of 1496 or lower, possibly due to encapsulation overhead (MPLS, GRE, PPPoE).
- BGP packets larger than the path MTU, in practice full-size TCP segments carrying UPDATE messages, were being silently dropped.
- BGP runs over TCP, so a dropped segment is retransmitted at the same size and dropped again. Keepalives queued behind the unacknowledged data never reach the peer, the Hold Timer expires, and the session resets.
Root Cause: MTU mismatch between BGP peers, leaving full-size TCP segments unacknowledged, stalling keepalives, and resulting in Hold Timer expiry.
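The size arithmetic behind this root cause can be sketched in a few lines of Python. This is an illustration under stated assumptions: IPv4 and TCP headers of 20 bytes each with no options, and the 1500/1496 MTU values from the analysis above. The helper names are hypothetical. A lone KEEPALIVE is tiny, so the drops hit full-size segments (typically UPDATEs). Because TCP delivers data in order, keepalives sent after a lost segment wait behind it.

```python
IP_HEADER = 20       # bytes, IPv4 header without options (assumption)
TCP_HEADER = 20      # bytes, TCP header without options (assumption)
BGP_KEEPALIVE = 19   # bytes, a KEEPALIVE is just the BGP header (RFC 4271)


def packet_size(tcp_payload: int) -> int:
    """Total IP packet size for a given TCP payload."""
    return IP_HEADER + TCP_HEADER + tcp_payload


def fits(tcp_payload: int, path_mtu: int) -> bool:
    """True if the packet crosses the path without needing fragmentation."""
    return packet_size(tcp_payload) <= path_mtu


local_mtu = 1500                                # sender's interface MTU
path_mtu = 1496                                 # smallest MTU along the path
local_mss = local_mtu - IP_HEADER - TCP_HEADER  # 1460, derived from the local MTU

print(fits(BGP_KEEPALIVE, path_mtu))  # lone keepalive: 59-byte packet
print(fits(local_mss, path_mtu))      # full-size segment: 1500-byte packet
```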
The Fix
Step-by-Step Resolution:
1. Identify the MTU on both ends
show interfaces <interface>
Check the MTU field on both BGP neighbor-facing interfaces.
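When output from both routers is collected by a script, the comparison can be automated. The sketch below is a minimal Python example, assuming the usual IOS line format "MTU 1500 bytes, ..." in show interfaces output. The sample strings and function names are hypothetical.

```python
import re

# Matches the "MTU <n> bytes" field of `show interfaces` output (assumed format).
MTU_RE = re.compile(r"\bMTU (\d+) bytes")


def interface_mtu(show_output: str) -> int:
    """Extract the MTU value from `show interfaces <interface>` output."""
    match = MTU_RE.search(show_output)
    if match is None:
        raise ValueError("no MTU field found")
    return int(match.group(1))


def mtu_mismatch(local_output: str, remote_output: str) -> bool:
    """True if the two interface outputs report different MTUs."""
    return interface_mtu(local_output) != interface_mtu(remote_output)


# Hypothetical excerpts, for illustration only.
local = "  MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,"
remote = "  MTU 1496 bytes, BW 1000000 Kbit/sec, DLY 10 usec,"
print(mtu_mismatch(local, remote))
```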
2. Enable BGP TCP Path MTU Discovery (PMTUD)
In most Cisco devices, this is enabled by default, but confirm:
router bgp 65001
 neighbor 192.0.2.2 transport path-mtu-discovery
3. Match MTU Manually (Best Practice)
Set both ends to a safe, common MTU value, e.g., 1400 or 1476:
interface GigabitEthernet0/0
 mtu 1400
4. Clamp TCP MSS if needed
For BGP over VPN or tunneling:
interface Tunnel0
 ip tcp adjust-mss 1360
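The value 1360 follows from a 1400-byte MTU minus the IP and TCP headers. A minimal Python sketch of that arithmetic, assuming 20-byte IPv4 and TCP headers without options (the function name mss_for_mtu is hypothetical):

```python
def mss_for_mtu(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    if mtu <= ip_header + tcp_header:
        raise ValueError("MTU too small to carry any TCP payload")
    return mtu - ip_header - tcp_header


print(mss_for_mtu(1400))  # tunnel MTU used in this article
print(mss_for_mtu(1500))  # standard Ethernet MTU
```

If TCP options such as MD5 authentication or timestamps are in use, they consume part of this payload space, so a slightly lower clamp leaves more headroom.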
5. Clear BGP session and monitor
clear ip bgp 192.0.2.2
Monitor with:
show ip bgp summary
debug ip tcp transactions
EVE-NG Lab Topology

- Simulate MTU mismatch on WAN interface
- Capture BGP session resets using debug bgp
- Fix with matching MTU and observe stable session
Verification
Use These Commands:
show ip bgp summary
show interfaces <wan>
show tcp brief | include 179
debug ip bgp
debug ip tcp transactions
- Confirm no more flaps in show ip bgp summary
- TCP sessions on port 179 should remain ESTABLISHED
- Wireshark should show no TCP retransmissions or fragmented segments on port 179
Key Takeaways
- BGP session resets without interface errors often point to TCP-level issues like MTU mismatch.
- A default-size ping works, but BGP fails because full-size TCP segments exceed the path MTU and are dropped.
- Always validate WAN MTU, especially with MPLS, GRE, or VPN overlays.
- BGP relies on Keepalives and Hold Timers; MTU drops stall the TCP stream, keepalives stop arriving, and the Hold Timer expiry triggers a reset.
- Matching MTU or clamping MSS can resolve persistent flapping.
Best Practice / Design Tips
- Always match MTU across BGP peers, especially over third-party networks.
- Use MSS clamping for VPN tunnels and overlays.
- Keep path-mtu-discovery enabled (it is by default).
- Monitor BGP using sh ip bgp summary and track Flap Count.
- Avoid jumbo frames across WAN unless end-to-end supported.
- Document MTU settings for each peer in your NMS.
- Use SNMP BGP traps for alerting on flaps.
- Design networks with consistent MTU domains.
- Do not filter ICMP “fragmentation needed” (type 3, code 4) messages, as PMTUD depends on them.
- For DMVPN or MPLS, lower MTU to ~1400 by default.
FAQs
1. Why does BGP flap even when the interface is up?
Answer: Because BGP is TCP-based and requires successful Keepalive exchanges. If TCP packets are dropped (e.g., due to MTU issues), the Hold Timer expires and causes session reset.
2. What is a BGP Hold Timer?
Answer: It’s the maximum time a BGP speaker waits without receiving a Keepalive or Update message from its peer before declaring the session down. Cisco IOS default: 180 seconds, with keepalives sent every 60 seconds.
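How the timer values are chosen and checked can be sketched in Python. The sketch assumes the RFC 4271 behavior: each side uses the smaller of the two advertised hold times, a hold time of 0 disables keepalives, and keepalives are sent at one third of the hold time. Function names are hypothetical, and vendor implementations may apply extra rules.

```python
from typing import Tuple


def negotiate_timers(local_hold: int, peer_hold: int) -> Tuple[int, int]:
    """Return (hold_time, keepalive_interval) in seconds for a session."""
    for value in (local_hold, peer_hold):
        # RFC 4271: the hold time must be 0 or at least 3 seconds.
        if value != 0 and value < 3:
            raise ValueError("hold time must be 0 or >= 3 seconds")
    hold = min(local_hold, peer_hold)   # the smaller advertised value wins
    keepalive = hold // 3               # one third of the hold time
    return hold, keepalive


def hold_timer_expired(last_message_at: float, now: float, hold: int) -> bool:
    """True if no KEEPALIVE or UPDATE arrived within the hold time."""
    return hold != 0 and (now - last_message_at) >= hold


print(negotiate_timers(180, 180))
print(hold_timer_expired(last_message_at=0.0, now=181.0, hold=180))
```

With the 180-second value, a session whose keepalives stop arriving stays up for three minutes before it resets, which matches the flap interval described in the symptoms.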
3. How is MTU related to BGP?
Answer: If the MTU is mismatched, large TCP segments (typically carrying BGP Updates) can be dropped in transit. Keepalives then stall behind the lost data, silently breaking the BGP session.
4. What is Path MTU Discovery (PMTUD)?
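Answer: It’s a mechanism where the sender discovers the largest packet size the path can carry without fragmentation. The sender sets the DF (Don’t Fragment) bit and lowers its packet size whenever a router returns an ICMP “Fragmentation Needed” message, so it stops working if those messages are filtered.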
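The discovery loop can be illustrated with a small Python simulation. This is a simplified model, not a real network probe: hop_mtus is a hypothetical list of per-hop MTUs, and the icmp_allowed flag stands in for whether “Fragmentation Needed” messages make it back to the sender.

```python
from typing import List, Optional


def discover_path_mtu(
    local_mtu: int, hop_mtus: List[int], icmp_allowed: bool = True
) -> Optional[int]:
    """Return the discovered path MTU, or None if the path black-holes."""
    size = local_mtu
    while True:
        # First hop along the path that cannot forward a packet of this size.
        bottleneck = next((mtu for mtu in hop_mtus if mtu < size), None)
        if bottleneck is None:
            return size  # the DF-marked packet reaches the destination
        if not icmp_allowed:
            return None  # silent drop: the sender never learns a smaller MTU
        # ICMP "Fragmentation Needed" reports the next-hop MTU; retry smaller.
        size = bottleneck


print(discover_path_mtu(1500, [1500, 1496, 1476]))
print(discover_path_mtu(1500, [1500, 1496, 1476], icmp_allowed=False))
```

The second call models the failure mode in this article: the sender keeps using its original packet size and the large segments are dropped without any feedback.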
5. How to check if MTU mismatch is the problem?
Answer: Use ping <IP> size <bytes> df-bit, or Wireshark to look for dropped large packets or ICMP “Fragmentation Needed” responses.
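Rather than guessing sizes by hand, the largest size that passes can be found with a binary search. The Python sketch below assumes a probe function that returns True when a DF-bit ping of a given size succeeds. fake_probe is a hypothetical stand-in for a path with a 1476-byte MTU. A real probe would run the ping command against the peer.

```python
from typing import Callable


def largest_passing_size(
    probe: Callable[[int], bool], low: int = 68, high: int = 1500
) -> int:
    """Binary-search the largest size for which probe(size) succeeds.

    Assumes probe(low) succeeds and that every size above the path MTU fails.
    """
    if not probe(low):
        raise ValueError("even the smallest probe failed; check reachability")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid        # mid fits; the answer is mid or larger
        else:
            high = mid - 1   # mid is too big; the answer is smaller
    return low


def fake_probe(size: int) -> bool:
    """Stand-in for a DF-bit ping over a path whose MTU is 1476 bytes."""
    return size <= 1476


print(largest_passing_size(fake_probe))
```

The search needs about eleven probes to cover the 68 to 1500 range, compared with hundreds for a linear sweep.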
6. Why do other protocols like OSPF/EIGRP work fine?
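Answer: OSPF and EIGRP run directly over IP (protocol numbers 89 and 88) rather than TCP, and their hello packets are small, so the adjacency keepalive traffic is less impacted by MTU issues than BGP, whose keepalives share a TCP stream with large Update segments.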
7. How to fix this without changing MTU?
Answer: You can clamp TCP MSS on interfaces to ensure TCP packets stay below the MTU.
8. Can this happen over MPLS or VPNs?
Answer: Yes, and it’s very common due to added headers (MPLS = 4 bytes per label, IPsec = ~60 bytes), reducing effective MTU.
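The effect of stacked headers on the usable MTU can be sketched in Python. The overhead table is an assumption for illustration: 4 bytes per MPLS label, 24 bytes for GRE over IPv4 (20-byte outer IP header plus 4-byte GRE header), 8 bytes for PPPoE, and the approximate 60-byte IPsec figure quoted above. Real IPsec overhead varies with mode and cipher.

```python
from typing import Iterable

# Per-encapsulation overhead in bytes (illustrative; IPsec is approximate).
OVERHEAD_BYTES = {
    "mpls_label": 4,
    "gre": 24,
    "pppoe": 8,
    "ipsec": 60,
}


def effective_mtu(link_mtu: int, encapsulations: Iterable[str]) -> int:
    """MTU left for the inner IP packet after subtracting each header."""
    return link_mtu - sum(OVERHEAD_BYTES[name] for name in encapsulations)


print(effective_mtu(1500, ["mpls_label"]))                 # one MPLS label
print(effective_mtu(1500, ["mpls_label", "mpls_label"]))   # two-label stack
print(effective_mtu(1500, ["gre"]))                        # plain GRE tunnel
print(effective_mtu(1500, ["gre", "ipsec"]))               # GRE inside IPsec
```

The first and third results correspond to the 1496 and 1476 values mentioned earlier in the article.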
9. Does Cisco ASA affect BGP MTU?
Answer: If BGP traverses ASA VPNs, and MSS clamp is not configured, similar issues can occur.
10. Is this an issue on LAN?
Answer: Rarely. LANs typically have uniform MTU. This is more common on WAN, Internet, or VPN segments.
11. What MTU is recommended for BGP over VPN?
Answer: Safe value: 1400 bytes. For IPsec, 1350–1380 works well.
12. What’s the command to verify BGP session uptime?
Answer: show ip bgp summary. Check the Up/Down column and ensure it’s stable.
13. Can PMTUD be disabled?
Answer: It’s possible, but not recommended. PMTUD improves TCP performance and avoids manual MTU headaches.
14. Does BGP support jumbo frames?
Answer: Technically yes, but only if both ends and all intermediate devices support and allow larger MTUs (>1500). Otherwise, it’s unsafe.
15. What tools help in troubleshooting MTU issues?
Answer:
- ping df-bit size
- debug ip tcp
- Wireshark (look for fragmentation)
- show interface mtu
Final Note
Understanding how to diagnose and fix BGP neighbor flaps caused by Hold Timer expiry from an MTU mismatch is critical for anyone pursuing CCNP Enterprise (ENCOR) certification or working in enterprise network roles. Use this guide in your practice labs, real-world projects, and interviews to show a solid grasp of troubleshooting and CLI-level configuration skills.