[Day #59 PyATS Series] Detect Split-Horizon Issues in Large Networks Using pyATS for Cisco [Python for Network Engineer]
Introduction — key points
Split-horizon and related route-advertisement problems are subtle but catastrophic in large networks: routes sometimes don’t propagate where they should, or — worse — are advertised back like a boomerang creating loops. Detecting these issues at scale requires automation, consistent evidence (raw CLI + parsed JSON), and a methodical validation workflow.
In this Article you will get:
- A practical detection strategy for split-horizon and advertisement anomalies across EIGRP/RIP/BGP scenarios.
- A pyATS job that snapshots interfaces, neighbors, and route tables, builds a topology map, and applies heuristics to flag suspicious routes.
- Step-by-step guidance on how to interpret results, reduce false positives, and integrate with GUI reporting (Elasticsearch/Kibana).
- Hands-on CLI examples (what to expect) and remediation tips.
Topology Overview
Use a compact multi-site lab that exercises interaction between distributed protocols:

- PE-A and PE-B are edge routers: they peer between sites and with route reflectors.
- C1/C2 are customer/LAN devices running distance-vector protocols (EIGRP/RIP) connecting into the backbone.
- The automation host (pyATS) can SSH to all devices and also run data-plane tests (ping/traceroute) where needed.
This topology exercises the three common split-horizon flavors:
- Classic split-horizon (distance-vector like RIP/EIGRP): routes learned on an interface should not be advertised back out that same interface. Misconfig causes re-advertisement and possible loops.
- iBGP route distribution / route reflection anomalies: route reflectors that fail to forward routes to clients, or that re-advertise routes incorrectly, create reachability differences.
- Asymmetric propagation: a route is present on one side of the network but missing on the other (propagation failure).
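To make the first flavor concrete, here is a minimal sketch of the classic split-horizon rule. The table layout and the function name are illustrative assumptions and are not part of the detection script later in this article.

```python
def advertisements(rib, interfaces, split_horizon=True):
    """Return {interface: [prefixes advertised out of it]}.

    rib maps prefix -> interface the route was learned on
    (None for locally originated prefixes).
    """
    out = {}
    for intf in interfaces:
        out[intf] = sorted(
            prefix for prefix, learned_on in rib.items()
            # Split-horizon: never advertise a route back out of the
            # interface it was learned on.
            if not (split_horizon and learned_on == intf)
        )
    return out


# Hypothetical routing table for one router
rib = {"10.1.0.0/24": "Gi0/0", "10.2.0.0/24": "Gi0/1", "10.3.0.0/24": None}
with_sh = advertisements(rib, ["Gi0/0", "Gi0/1"])
without_sh = advertisements(rib, ["Gi0/0", "Gi0/1"], split_horizon=False)
print(with_sh)
print(without_sh)
```

With split-horizon disabled, 10.1.0.0/24 is advertised back out of Gi0/0, which is exactly the re-advertisement pattern the rest of the article tries to detect.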
Topology & Communications — what we collect and why
Management plane: SSH via pyATS (Genie-backed Device objects). For scale, run collection concurrently (we show a single-threaded version; you can upgrade to thread pools).
Key CLI outputs to collect (per device):
- show ip interface brief: build the IP → device mapping (needed to map a next-hop IP to a device).
- show ip route: full RIB to discover prefixes, route codes, and next-hops.
- Protocol neighbor commands:
  - EIGRP: show ip eigrp neighbors
  - RIP: show ip rip database / show ip rip status, or simply filter show ip route for R entries
  - BGP: show ip bgp summary, show ip bgp
- show logging | tail 200: search for route-withdraw / flapping messages.
- Optional: show ip cef (or the forwarding table) for data-plane validation.
Why do we collect both interface lists and routes?
To detect split-horizon anomalies we often need to map next-hop addresses (the address after "via" in a route line) back to the device that owns that address. The interface table provides that mapping.
Validation signals we produce:
- Mutual next-hop anomalies: A learns prefix P via B and B learns P via A — suspicious (possible advertisement back).
- Missing neighbor propagation: Originator has P → neighbor does not see P (neighbor should receive it).
- Asymmetric reachability: the control plane shows a route but the data plane (traceroute/ping) fails, or the reverse. A mismatch adds confidence to the other findings.
- Event correlation: Syslog denies/withdraws at times of change.
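The asymmetric-reachability signal can be sketched as a small helper that reads the summary line of an IOS-style ping and uses it to corroborate a control-plane finding. The function names are hypothetical, and the main script below does not run pings; this assumes you collect ping output separately.

```python
import re

# IOS-style ping summary line, for example:
# "Success rate is 80 percent (4/5), round-trip min/avg/max = 1/2/4 ms"
PING_RE = re.compile(r"Success rate is (\d+) percent \((\d+)/(\d+)\)")


def ping_success(output):
    """Return (received, sent) from ping output, or None if no summary line."""
    m = PING_RE.search(output or "")
    if not m:
        return None
    return int(m.group(2)), int(m.group(3))


def corroborated(control_plane_flagged, ping_output):
    """True when a control-plane finding is backed by data-plane loss."""
    result = ping_success(ping_output)
    # Unparseable output is treated as "unknown", not as a failure.
    data_plane_lossy = result is not None and result[0] < result[1]
    return control_plane_flagged and data_plane_lossy


failed = "Sending 5, 100-byte ICMP Echos to 172.16.10.1\n.....\nSuccess rate is 0 percent (0/5)"
healthy = "Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms"
print(corroborated(True, failed), corroborated(True, healthy))
```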
Workflow Script — full pyATS job (runnable)
Below is a single script detect_splithorizon.py. It is self-contained and includes robust parsing heuristics. Save next to your testbed.yml and run with python detect_splithorizon.py --testbed testbed.yml --run-id run001.
Warning: This script reads devices (read-only). Do not add write/clear commands. Test in lab.
#!/usr/bin/env python3
"""
detect_splithorizon.py
Detect split-horizon and route advertisement anomalies using pyATS.
Produces results/<run_id>/* with raw CLI + parsed JSON + anomaly report.
"""
import argparse
import json
import re
from pathlib import Path
from datetime import datetime

from genie.testbed import load

OUTDIR = Path("results")
OUTDIR.mkdir(exist_ok=True)

# Regex helpers
IP_RE = r'(?:\d{1,3}\.){3}\d{1,3}'
PREFIX_RE = r'\d+\.\d+\.\d+\.\d+/\d+'
# Basic route line regex (many IOS outputs follow this).
# The next-hop group uses a lazy ".*?" so that "via <ip>" is really captured;
# a greedy ".*" in front of an optional group swallows it and leaves next=None.
# An optional sub-code (E2, IA, EX, ...) is allowed between code and prefix.
ROUTE_LINE_RE = re.compile(
    r'^(?P<code>[A-Z]+\*?)(?:\s+[A-Z]+\d?)?\s+(?P<prefix>' + PREFIX_RE + r')'
    r'(?:\s+.*?via\s+(?P<next>' + IP_RE + r'))?',
    re.IGNORECASE)
IFACE_LINE_RE = re.compile(
    r'^(?P<intf>\S+)\s+(?P<ip>' + IP_RE + r')\s+\S+\s+\S+\s+(?P<status>\S+)\s+(?P<protocol>\S+)',
    re.IGNORECASE)
NEIGH_IP_RE = re.compile(IP_RE)


def ts():
    return datetime.utcnow().isoformat() + "Z"


def save_text(run_id, device_name, label, text):
    d = OUTDIR / run_id / device_name
    d.mkdir(parents=True, exist_ok=True)
    p = d / f"{label}.txt"
    with open(p, "w") as f:
        f.write(text or "")
    return str(p)


def save_json(run_id, device_name, label, obj):
    d = OUTDIR / run_id / device_name
    d.mkdir(parents=True, exist_ok=True)
    p = d / f"{label}.json"
    with open(p, "w") as f:
        json.dump(obj, f, indent=2)
    return str(p)


def collect_device_outputs(device, run_id):
    """Collect the minimal set of outputs needed for analysis."""
    name = device.name
    print(f"[{ts()}] Collecting from {name}")
    device.connect(log_stdout=False)
    device.execute('terminal length 0')
    outputs = {}
    cmds = {
        "interfaces": "show ip interface brief",
        "routes": "show ip route",
        "bgp_summary": "show ip bgp summary",
        "eigrp_neighbors": "show ip eigrp neighbors",
        "rip_status": "show ip rip database",
        "logs": "show logging | tail 200"
    }
    for label, cmd in cmds.items():
        try:
            out = device.execute(cmd)
        except Exception as e:
            # Keep going: a missing protocol on one device is not fatal
            out = f"ERROR executing {cmd}: {e}"
        outputs[label] = out
        save_text(run_id, name, label, out)
    device.disconnect()
    save_json(run_id, name, "raw_outputs", outputs)
    return outputs


def parse_interfaces(raw):
    """Return dict ip -> (intf, status)."""
    ip2intf = {}
    if not raw:
        return ip2intf
    for line in raw.splitlines():
        m = IFACE_LINE_RE.match(line.strip())
        if m:
            ip = m.group('ip')
            intf = m.group('intf')
            status = m.group('status')
            ip2intf[ip] = {"intf": intf, "status": status}
    return ip2intf


def parse_routes(raw):
    """
    Parse show ip route crude lines and return list of routes:
    [{prefix, code, next_hop (may be None), raw_line}]
    """
    routes = []
    if not raw:
        return routes
    for line in raw.splitlines():
        m = ROUTE_LINE_RE.match(line.strip())
        if m:
            routes.append({
                "prefix": m.group('prefix'),
                "code": m.group('code'),
                "next_hop": m.group('next'),
                "raw": line.strip()
            })
    return routes


def parse_neighbors(raw):
    """Extract neighbor IPs from any neighbor command output (heuristic)."""
    neighs = set()
    if not raw:
        return neighs
    for m in NEIGH_IP_RE.finditer(raw):
        neighs.add(m.group(0))
    return list(neighs)


def build_ip_owner_map(all_ifaces):
    """
    all_ifaces: dict device -> {ip -> info}
    return ip -> device mapping (first match wins)
    """
    ip2device = {}
    for dev, ifs in all_ifaces.items():
        for ip in ifs.keys():
            # setdefault keeps the first owner seen, as the docstring promises
            # (plain assignment would let the last device win)
            ip2device.setdefault(ip, dev)
    return ip2device


def detect_mutual_next_hop(routes_by_device, ip2device):
    """
    Find cases where device A has prefix P via next-hop B, and B has prefix P via next-hop A.
    Returns list of anomalies (one entry per prefix and device pair).
    """
    anomalies = []
    seen = set()  # (prefix, unordered device pair) already reported
    # build prefix -> device -> nexthop mapping
    prefix_map = {}
    for dev, routes in routes_by_device.items():
        for r in routes:
            prefix_map.setdefault(r['prefix'], {})[dev] = r.get('next_hop')
    for prefix, dev_map in prefix_map.items():
        for a, next_a in dev_map.items():
            if not next_a:
                continue
            b = ip2device.get(next_a)
            if not b or b == a:
                continue
            # does b have the same prefix learned via a next hop owned by a?
            next_b_ip = dev_map.get(b)
            if not next_b_ip:
                continue
            owner_of_next_b = ip2device.get(next_b_ip)
            if owner_of_next_b != a:
                continue
            key = (prefix, frozenset((a, b)))
            if key in seen:
                continue  # the same pair was already found from b's side
            seen.add(key)
            anomalies.append({
                "prefix": prefix, "device_a": a, "device_b": b,
                "a_next_hop": next_a, "b_next_hop": next_b_ip,
                "description": "Mutual next-hop detected (possible advertisement loop)"
            })
    return anomalies


def detect_missing_propagation(prefix_origins, routes_by_device, adjacency_map):
    """
    For each prefix origin device, check immediate neighbors (adjacency_map)
    to see if they have the prefix. If a neighbor lacks it, flag missing propagation.
    adjacency_map: device -> list of neighbor devices (by management IP mapping)
    """
    missing = []
    for prefix, origins in prefix_origins.items():
        for origin in origins:
            # neighbors of origin (list of neighbor device names)
            neighbors = adjacency_map.get(origin, [])
            for nbr in neighbors:
                # does neighbor have prefix in its routes?
                has = any(r['prefix'] == prefix for r in routes_by_device.get(nbr, []))
                if not has:
                    missing.append({
                        "prefix": prefix, "origin": origin, "neighbor": nbr,
                        "description": "Neighbor missing prefix (expected to receive advertisement)"
                    })
    return missing


def find_prefix_origins(all_ifaces, routes_by_device):
    """
    Heuristic: if a device has an interface IP that falls within prefix network,
    treat it as origin. Very coarse: we treat /24 and larger prefixes by simple containment.
    """
    origins = {}

    def ip_to_int(ip):
        parts = [int(p) for p in ip.split('.')]
        return (parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]

    def prefix_contains(prefix, ip):
        p, plen = prefix.split('/')
        plen = int(plen)
        ipn = ip_to_int(ip)
        pn = ip_to_int(p)
        mask = (0xffffffff << (32 - plen)) & 0xffffffff
        return (ipn & mask) == (pn & mask)

    # compare every interface IP against every prefix seen anywhere
    for dev, ifs in all_ifaces.items():
        for ip in ifs.keys():
            for dev2, routes in routes_by_device.items():
                for r in routes:
                    pre = r['prefix']
                    try:
                        if prefix_contains(pre, ip):
                            origins.setdefault(pre, set()).add(dev)
                    except Exception:
                        pass
    # convert sets to lists
    return {p: list(s) for p, s in origins.items()}


def build_adjacency_map(neigh_raw_by_device, ip2device):
    """
    neigh_raw_by_device: device -> raw neighbor output text
    Use neighbor IPs from the raw neighbor output and map them to device names via ip2device
    """
    adj = {}
    for dev, raw in neigh_raw_by_device.items():
        adj.setdefault(dev, set())
        for ip in parse_neighbors(raw):
            owner = ip2device.get(ip)
            if owner and owner != dev:
                adj[dev].add(owner)
                adj.setdefault(owner, set()).add(dev)
    # convert sets to lists
    return {k: list(v) for k, v in adj.items()}


def main(testbed_file, run_id):
    tb = load(testbed_file)
    all_ifaces = {}
    routes_by_device = {}
    neigh_raw_by_device = {}
    # 1) collect outputs
    for name, dev in tb.devices.items():
        outputs = collect_device_outputs(dev, run_id)
        # parse interfaces
        all_ifaces[name] = parse_interfaces(outputs.get('interfaces'))
        # parse routes
        routes_by_device[name] = parse_routes(outputs.get('routes'))
        # neighbor raw (concatenate protocol neighbor outputs)
        neigh_raw = (
            (outputs.get('eigrp_neighbors') or '') + "\n"
            + (outputs.get('bgp_summary') or '') + "\n"
            + (outputs.get('rip_status') or '')
        )
        neigh_raw_by_device[name] = neigh_raw
    # 2) build ip->device map
    ip2device = build_ip_owner_map(all_ifaces)
    save_json(run_id, "global", "ip2device", ip2device)
    # 3) build adjacency map
    adjacency_map = build_adjacency_map(neigh_raw_by_device, ip2device)
    save_json(run_id, "global", "adjacency", adjacency_map)
    # 4) detect mutual next-hop anomalies
    mutuals = detect_mutual_next_hop(routes_by_device, ip2device)
    # 5) detect missing propagation (heuristic)
    origins = find_prefix_origins(all_ifaces, routes_by_device)
    missing = detect_missing_propagation(origins, routes_by_device, adjacency_map)
    # 6) assemble report
    report = {
        "run_id": run_id,
        "collected_at": ts(),
        "devices": list(tb.devices.keys()),
        "ip2device": ip2device,
        "adjacency": adjacency_map,
        "mutual_next_hop_anomalies": mutuals,
        "missing_propagation": missing,
        "summary": {
            "total_devices": len(tb.devices),
            "mutual_issues": len(mutuals),
            "missing_propagations": len(missing)
        }
    }
    save_json(run_id, "global", "splithorizon_report", report)
    print(f"[{ts()}] Report saved under results/{run_id}/global/splithorizon_report.json")
    print(json.dumps(report['summary'], indent=2))
    return report


if __name__ == "__main__":
    ap = argparse.ArgumentParser()
    ap.add_argument("--testbed", required=True)
    ap.add_argument("--run-id", required=True)
    args = ap.parse_args()
    main(args.testbed, args.run_id)
How to run (example):
python detect_splithorizon.py --testbed testbed.yml --run-id run001
Explanation by Line
I’ll walk through the important parts so you and your students understand decisions and limitations.
Top-level structure
collect_device_outputs() runs a set of read-only commands and saves the raw output as an audit trail. The script stores raw outputs so you can always re-parse them or hand them to engineers for investigation.
Parsing interface addresses
parse_interfaces() uses show ip interface brief lines and a regex to map IP → interface for each device. This mapping is crucial to resolve next-hop IP addresses to actual devices, so we know which device “owns” a next hop.
Parsing routes
parse_routes() uses a conservative regex to find route lines with a code (like O, B, D, R) and a next-hop IP using via <ip>. This captures most route entries in IOS/IOS-XE/IOS-XR style RIB dumps. It does not attempt to be a full route parser (Genie parsers exist), but the heuristic is robust enough for our detection logic. You can and should replace parse_routes() with Genie device.parse('show ip route') when the platform parser is available for more accuracy.
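A minimal sketch of that replacement follows. It flattens a Genie-style parsed dictionary into the same list-of-dicts shape that parse_routes() returns. The nested key layout (vrf → address_family → routes → next_hop_list) is an assumption based on the IOS-XE show ip route parser; verify it against the parser schema for your platform and release. The function name is hypothetical.

```python
def routes_from_genie(parsed, vrf="default", af="ipv4"):
    """Flatten a Genie-style 'show ip route' dict into parse_routes() shape."""
    routes = []
    table = (parsed.get("vrf", {}).get(vrf, {})
                   .get("address_family", {}).get(af, {})
                   .get("routes", {}))
    for prefix, entry in table.items():
        hops = entry.get("next_hop", {}).get("next_hop_list", {})
        next_hops = [h["next_hop"] for h in hops.values() if h.get("next_hop")]
        routes.append({
            "prefix": prefix,
            "code": entry.get("source_protocol_codes"),
            # Connected routes carry no next_hop_list, so this stays None
            "next_hop": next_hops[0] if next_hops else None,
            "raw": None,
        })
    return routes


# Hand-written sample in the assumed schema (not captured from a device)
sample = {"vrf": {"default": {"address_family": {"ipv4": {"routes": {
    "172.16.10.0/24": {
        "source_protocol_codes": "O",
        "next_hop": {"next_hop_list": {1: {"index": 1, "next_hop": "10.0.1.2"}}},
    },
    "10.0.1.0/24": {
        "source_protocol_codes": "C",
        "next_hop": {"outgoing_interface": {"GigabitEthernet0/0": {}}},
    },
}}}}}}
flat = routes_from_genie(sample)
print(flat)
```

In the script you would call it as routes_from_genie(device.parse('show ip route')) while the device is still connected, and fall back to parse_routes() if the parser raises.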
Building ip→device map
build_ip_owner_map()simply walks all interface IPs discovered and maps them to device names. For multi-homed IPs or NATs this can be ambiguous; first match wins. In practice, management and interface IPs in lab environments are unique.
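Because a silent tie-break can hide a real ambiguity, it can help to list addresses that appear on more than one device before trusting the map. The helper below is a hypothetical addition, not part of the script; it accepts the same all_ifaces structure that parse_interfaces() produces per device.

```python
from collections import defaultdict


def find_ip_conflicts(all_ifaces):
    """Return {ip: [devices]} for every IP configured on more than one device."""
    owners = defaultdict(set)
    for dev, ifs in all_ifaces.items():
        for ip in ifs:
            owners[ip].add(dev)
    return {ip: sorted(devs) for ip, devs in owners.items() if len(devs) > 1}


# Hypothetical data: C1 and C2 both carry 10.0.9.1 (for example an anycast address)
all_ifaces = {
    "PE_A": {"10.0.1.1": {"intf": "GigabitEthernet0/0", "status": "up"}},
    "C1": {"10.0.9.1": {"intf": "Loopback9", "status": "up"}},
    "C2": {"10.0.9.1": {"intf": "Loopback9", "status": "up"}},
}
conflicts = find_ip_conflicts(all_ifaces)
print(conflicts)
```

Any next hop that resolves to a conflicting address deserves a manual look before the finding is reported.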
Adjacency map
build_adjacency_map() scans neighbor outputs (EIGRP/BGP/RIP) for IPs and resolves them to device names using the ip→device map. This forms a lightweight graph of immediate neighbors, used to decide “which neighbors should have seen an advertisement”.
Mutual next-hop detection (detect_mutual_next_hop)
- This is the core heuristic for finding split-horizon-like anomalies: if Device A has prefix P with next-hop an IP owned by B, and B has P with next-hop an IP owned by A, then each appears to be depending on the other — symptomatic of misadvertisement or loops.
- This is not absolute proof of split-horizon being disabled, but it is a strong signal worth investigating.
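The heuristic can be traced on a tiny data set. This is a standalone sketch for a single prefix with hypothetical names; it collapses the A→B and B→A views into one unordered pair so each loop candidate is listed once.

```python
def mutual_pairs(next_hop_by_device, ip2device):
    """next_hop_by_device: {device: next-hop IP} for ONE prefix.

    Returns sorted list of (device, device) pairs that point at each other.
    """
    pairs = set()
    for a, hop_a in next_hop_by_device.items():
        b = ip2device.get(hop_a)
        if b is None or b == a:
            continue
        hop_b = next_hop_by_device.get(b)
        # b's next hop must resolve back to a for the pair to be mutual
        if hop_b is not None and ip2device.get(hop_b) == a:
            pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


# Hypothetical view of 172.16.10.0/24 on three devices
ip2device = {"10.0.1.2": "PE_B", "10.0.2.1": "PE_A"}
next_hops = {"PE_A": "10.0.1.2", "PE_B": "10.0.2.1", "C1": "10.0.1.2"}
pairs = mutual_pairs(next_hops, ip2device)
print(pairs)
```

C1 also routes via PE_B, but PE_B does not route back via C1, so only the PE_A/PE_B pair is reported.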
Missing propagation detection
- We heuristically find prefix origins by checking if any interface IP sits within a prefix. If device X is an origin for prefix P, we expect its neighbors (from adjacency map) to see P. If a neighbor lacks P, it’s either a valid policy (intended) or a propagation problem. The script flags it so an operator can investigate.
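The containment test behind the origin heuristic can also be written with the standard library ipaddress module instead of manual bit masks. This is a sketch with a hypothetical function name; skipping the default route is an assumption of this sketch (0.0.0.0/0 contains every address, so every device would otherwise look like an origin).

```python
import ipaddress


def owns_prefix(interface_ips, prefix):
    """True if any interface IP falls inside the prefix (default route excluded)."""
    net = ipaddress.ip_network(prefix, strict=False)
    if net.prefixlen == 0:
        return False  # assumption: never treat a device as origin of 0.0.0.0/0
    return any(ipaddress.ip_address(ip) in net for ip in interface_ips)


c2_ips = ["10.10.5.1", "10.0.100.22"]
print(owns_prefix(c2_ips, "10.10.5.0/24"))
print(owns_prefix(["10.0.1.1"], "10.10.5.0/24"))
```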
Limitations & false positives
- Not all missing advertisements are bugs — policy filters, route-maps, VRFs, and distribute-lists intentionally limit propagation. Always cross-check suspected anomalies with intended policies. The script provides raw evidence (saved CLI) to do that.
- Multi-area or multi-VRF networks require extending the script to be VRF-aware (show ip route vrf <vrf>).
- BGP route reflection complexities (communities, localpref, suppress-maps) require protocol-specific logic for accurate detection. We provide a practical starting point.
testbed.yml Example
A minimal testbed for the lab. Put this as testbed.yml and update credentials/IPs for your lab.
testbed:
  name: splithorizon_lab
  credentials:
    default:
      username: netops
      password: NetOps!23
devices:
  PE_A:
    os: iosxe
    type: router
    connections:
      cli:
        protocol: ssh
        ip: 10.0.100.11
  PE_B:
    os: iosxr
    type: router
    connections:
      cli:
        protocol: ssh
        ip: 10.0.100.12
  C1:
    os: iosxe
    type: router
    connections:
      cli:
        protocol: ssh
        ip: 10.0.100.21
  C2:
    os: iosxe
    type: router
    connections:
      cli:
        protocol: ssh
        ip: 10.0.100.22
Notes:
- Use separate management IPs accessible to the automation host.
- Add custom testbed fields describing role/site to enrich the final report if desired.
Post-validation CLI (Real expected output)
Below are textual screenshots (fixed-width) that you can paste into a blog or teaching slides. These are realistic outputs and what the script expects to parse.
A — show ip interface brief example
PE_A# show ip interface brief
Interface              IP-Address      OK? Method Status                Protocol
GigabitEthernet0/0     10.0.1.1        YES manual up                    up
Loopback0              10.10.0.1       YES manual up                    up
GigabitEthernet0/1     192.168.1.1     YES manual up                    up
B — show ip route snippet (mutual next-hop suspicious case)
PE_A# show ip route
O    172.16.10.0/24 [110/2] via 10.0.1.2, 00:02:10, GigabitEthernet0/0
B    203.0.113.0/24 [20/0] via 192.0.2.2, 00:01:23

PE_B# show ip route
O    172.16.10.0/24 [110/2] via 10.0.2.1, 00:02:11, GigabitEthernet0/0
B    203.0.113.0/24 [20/0] via 192.0.2.1, 00:02:00
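Interpretation: PE_A reports 172.16.10.0/24 via next-hop 10.0.1.2 (likely owned by PE_B); PE_B reports the same prefix via 10.0.2.1. The script treats 10.0.2.1 as owned by PE_A only if that address appears in PE_A's show ip interface brief output (it is not among the three interfaces shown in example A, so assume a further interface here). When both next hops resolve this way, the result is a mutual next-hop finding.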
C — show ip eigrp neighbors example
C1# show ip eigrp neighbors
H   Address         Interface       Hold Uptime   SRTT   RTO  Q  Seq
0   10.0.1.1        Gi0/0             11 00:02:30   20   500  0  12
D — Example saved report excerpt (results/run001/global/splithorizon_report.json)
{
  "run_id": "run001",
  "collected_at": "2025-08-28T12:00:00Z",
  "mutual_next_hop_anomalies": [
    {
      "prefix": "172.16.10.0/24",
      "device_a": "PE_A",
      "device_b": "PE_B",
      "a_next_hop": "10.0.1.2",
      "b_next_hop": "10.0.2.1",
      "description": "Mutual next-hop detected (possible advertisement loop)"
    }
  ],
  "missing_propagation": [
    {
      "prefix": "10.10.5.0/24",
      "origin": "C2",
      "neighbor": "PE_A",
      "description": "Neighbor missing prefix (expected to receive advertisement)"
    }
  ]
}
These artifacts are what you present to NOC teams or attach to change tickets.
FAQs
Q1 — What exactly is split-horizon and why does it matter here?
A: Split-horizon (in distance-vector protocols) prevents a router from advertising a route back out the interface it learned that route from. This avoids two-router loops. If split-horizon is disabled or misapplied, routes may be re-advertised back, causing mutual dependencies and potential loops. Our script identifies patterns that look like mutual advertisement (device A routes via B while B routes via A), which is a strong red flag.
Q2 — How does the script avoid false positives when policy filters intentionally block propagation?
A: It doesn't, intentionally. The output flags anomalies; you must correlate them with configuration (ACLs, distribute-lists, route-maps). The script saves raw show ip route and show logging output as evidence. It does not collect show running-config as written, so add that command to the cmds dictionary in collect_device_outputs() if you want configuration evidence next to each finding. To reduce noise, add a configuration rule check step: ignore missing propagation when a policy explicitly prevents it.
Q3 — Can this detect issues in BGP route reflection setups?
A: Yes, indirectly. For iBGP with route reflectors, the symptom is missing prefixes on clients. The script flags “missing propagation” when a prefix originates on a device but a neighbor reachable via the RR doesn’t see it. For precise RR behavior you should enrich the script to parse show ip bgp neighbors and community/localpref data and verify RR client lists.
Q4 — How will this scale to hundreds of devices?
A: The single-threaded script is a starting point. For scale:
- Run collectors concurrently (ThreadPoolExecutor or pyATS test runner with concurrency).
- Use streaming telemetry (gNMI/telemetry) instead of CLI polling where available.
- Centralize processing (collect raw outputs to an object store and run analysis jobs on a server cluster).
- Limit parsing to prefixes of interest (critical services) rather than the full Internet routing table.
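The first option can be sketched with the standard library alone. The wrapper below is a hypothetical helper: it runs any collector function per device in a thread pool and keeps failures separate, so one unreachable device does not abort the run. The fake collector stands in for collect_device_outputs().

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def collect_all(devices, collect_fn, max_workers=8):
    """Run collect_fn(name, device) for every device in parallel threads.

    Returns (results, errors), both keyed by device name.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(collect_fn, name, dev): name
                   for name, dev in devices.items()}
        for fut in as_completed(futures):
            name = futures[fut]
            try:
                results[name] = fut.result()
            except Exception as exc:
                errors[name] = str(exc)
    return results, errors


def fake_collector(name, dev):
    # Stand-in for a real SSH collection; None simulates an unreachable device
    if dev is None:
        raise RuntimeError("unreachable")
    return {"routes": f"output from {name}"}


results, errors = collect_all({"PE_A": object(), "C1": None}, fake_collector)
print(sorted(results), errors)
```

With the script above, the call would look like collect_all(tb.devices, lambda name, dev: collect_device_outputs(dev, run_id)). Keep max_workers modest so you do not exhaust VTY lines or AAA servers.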
Q5 — What about VRFs and multi-tenant networks?
A: Extend the collector to iterate per VRF: run show ip route vrf <vrf> plus a per-VRF interface listing (for example show ip vrf interfaces <vrf> on IOS/IOS-XE; the exact command varies by platform) and maintain VRF context in the ip→device map. Our heuristic assumes a global table; for VRFs you must namespace prefixes and interface lookups by VRF.
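Namespacing by VRF can be as simple as keying the owner map on a (vrf, ip) tuple instead of the bare IP. This is a sketch with hypothetical names and input shape; the per-VRF collection itself is not shown.

```python
def build_vrf_owner_map(ifaces_by_device):
    """ifaces_by_device: {device: {vrf: [ip, ...]}} -> {(vrf, ip): device}."""
    owner = {}
    for dev, by_vrf in ifaces_by_device.items():
        for vrf, ips in by_vrf.items():
            for ip in ips:
                # first owner wins, same convention as the global map
                owner.setdefault((vrf, ip), dev)
    return owner


# Hypothetical overlapping addressing: 10.0.1.1 exists in two tenants
ifaces_by_device = {
    "PE_A": {"CUST_RED": ["10.0.1.1"], "default": ["10.10.0.1"]},
    "PE_B": {"CUST_BLUE": ["10.0.1.1"], "default": ["10.10.0.2"]},
}
vrf_owner = build_vrf_owner_map(ifaces_by_device)
print(vrf_owner[("CUST_RED", "10.0.1.1")], vrf_owner[("CUST_BLUE", "10.0.1.1")])
```

The same next-hop IP now resolves to different devices depending on the VRF of the route being examined, which a flat ip→device map cannot express.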
Q6 — How should operators remediate a mutual next-hop anomaly?
A: Typical steps:
- Confirm the anomaly in the saved raw outputs.
- Inspect where the route originated (check show ip route on the origin) and whether policies intentionally filter it.
- If unintended: check interfaces for no ip split-horizon (RIP) or no ip split-horizon eigrp <as> (EIGRP) and for manual route redistribution rules, then correct the misconfiguration.
- Use clear ip route <prefix> and clear ip bgp cautiously if needed (prefer a soft reset over a hard session reset).
- After remediation, re-run the script to verify the anomaly cleared.
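The configuration check in the third step can be automated against saved configuration text. The sketch below assumes you have collected show running-config separately (the script above does not) and that the text uses the usual IOS layout of interface blocks with indented sub-commands. The function name is hypothetical.

```python
import re


def interfaces_without_split_horizon(running_config):
    """Return {interface: [config lines]} where split-horizon is disabled."""
    findings = {}
    current = None
    for line in running_config.splitlines():
        if line.startswith("interface "):
            current = line[len("interface "):].strip()
        elif not line.startswith(" "):
            current = None  # "!" or a new top-level command ends the block
        elif current and re.match(r"\s+no ip split-horizon\b", line):
            findings.setdefault(current, []).append(line.strip())
    return findings


# Hand-written sample configuration (not captured from a device)
sample_config = """interface Serial0/0
 ip address 10.0.1.1 255.255.255.0
 no ip split-horizon eigrp 100
!
interface GigabitEthernet0/1
 ip address 192.168.1.1 255.255.255.0
!
router eigrp 100
 network 10.0.0.0
"""
found = interfaces_without_split_horizon(sample_config)
print(found)
```

A hit is not automatically a fault: hub-and-spoke designs over multipoint interfaces disable split-horizon on purpose, so treat the result as evidence to review rather than as a verdict.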
Q7 — Can we get a confidence score for each finding?
A: Yes — implement scoring by combining signals:
- Mutual next-hop = high severity.
- Missing propagation + no policy found that would block it = medium.
- Missing propagation but a distribute-list found = low.
Add scoring logic by correlating config lines and syslog context.
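One way to encode those three levels is sketched below. The function name and arguments are hypothetical, and raising the level by one step on a matching syslog event is an assumption of this sketch rather than a rule from the list above.

```python
LEVELS = ["low", "medium", "high"]


def score_finding(kind, policy_found=False, syslog_hit=False):
    """Map a finding to low / medium / high.

    kind: "mutual_next_hop" or "missing_propagation"
    policy_found: a distribute-list or similar filter explains the gap
    syslog_hit: a related withdraw/flap message was logged near the event
    """
    if kind == "mutual_next_hop":
        idx = 2
    elif kind == "missing_propagation":
        idx = 0 if policy_found else 1
    else:
        raise ValueError(f"unknown finding kind: {kind}")
    # Assumption of this sketch: syslog corroboration raises the level one step
    if syslog_hit:
        idx = min(idx + 1, len(LEVELS) - 1)
    return LEVELS[idx]


print(score_finding("mutual_next_hop"))
print(score_finding("missing_propagation", policy_found=True))
```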
Q8 — How do we visualize findings for NOC and change managers?
A: Index the JSON report into Elasticsearch (index splithorizon-*) and build Kibana dashboards:
- Table: recent runs with anomalies count.
- Heatmap: devices with most anomalies.
- Per-prefix detail panels linking to raw CLI snapshots.
Alternatively, generate a simple HTML report from splithorizon_report.json and attach it to change tickets.
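For the Elasticsearch route, the report can be turned into a bulk payload with one document per finding. This sketch only builds the newline-delimited JSON body expected by the _bulk endpoint; the index name is a hypothetical choice that fits the splithorizon-* pattern, and sending the payload (HTTP client, authentication) is left out.

```python
import json


def to_bulk_ndjson(report, index="splithorizon-findings"):
    """Build an Elasticsearch bulk body: action line + document line per finding."""
    lines = []
    for kind in ("mutual_next_hop_anomalies", "missing_propagation"):
        for finding in report.get(kind, []):
            doc = {
                "run_id": report.get("run_id"),
                "collected_at": report.get("collected_at"),
                "kind": kind,
                **finding,
            }
            lines.append(json.dumps({"index": {"_index": index}}))
            lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline
    return ("\n".join(lines) + "\n") if lines else ""


report = {
    "run_id": "run001",
    "collected_at": "2025-08-28T12:00:00Z",
    "mutual_next_hop_anomalies": [
        {"prefix": "172.16.10.0/24", "device_a": "PE_A", "device_b": "PE_B"}
    ],
    "missing_propagation": [],
}
payload = to_bulk_ndjson(report)
print(payload)
```

Flattening to one document per finding keeps Kibana tables and heatmaps simple, because each row already carries its run_id, prefix, and device names.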
Join Our Training
If you want guided, instructor-led, hands-on training to implement, harden, and productionize automation flows like this — including pyATS, Genie parsers, telemetry, CI/CD integration and dashboards — join Trainer Sagar Dhawan’s 3-month instructor-led course: Python, Ansible, API & Cisco DevNet for Network Engineers. The course walks you through building full toolchains, from scripts to enterprise deployment, and will accelerate your path to become a confident Python for Network Engineer.
Learn more and enroll: https://course.networkjourney.com/python-ansible-api-cisco-devnet-for-network-engineers/
Join the program and start automating network reliability with confidence — from split-horizon detection to automated remediation.
Enroll Now & Future‑Proof Your Career
Email: info@networkjourney.com
WhatsApp / Call: +91 97395 21088
