[Day #59 PyATS Series] Detect Split-Horizon Issues in Large Networks Using pyATS for Cisco [Python for Network Engineer]
Introduction — key points
Split-horizon and related route-advertisement problems are subtle but catastrophic in large networks: routes sometimes don’t propagate where they should, or — worse — are advertised back like a boomerang creating loops. Detecting these issues at scale requires automation, consistent evidence (raw CLI + parsed JSON), and a methodical validation workflow.
In this article you will get:
- A practical detection strategy for split-horizon and advertisement anomalies across EIGRP/RIP/BGP scenarios.
- A pyATS job that snapshots interfaces, neighbors, and route tables, builds a topology map, and applies heuristics to flag suspicious routes.
- Step-by-step guidance on how to interpret results, reduce false positives, and integrate with GUI reporting (Elasticsearch/Kibana).
- Hands-on CLI examples (what to expect) and remediation tips.
Topology Overview
Use a compact multi-site lab that exercises interaction between distributed protocols:

- PE-A and PE-B are edge routers: they peer between sites and with route reflectors.
- C1/C2 are customer/LAN devices running distance-vector protocols (EIGRP/RIP) connecting into the backbone.
- The automation host (pyATS) can SSH to all devices and also run data-plane tests (ping/traceroute) where needed.
This topology exercises the three common split-horizon flavors:
- Classic split-horizon (distance-vector like RIP/EIGRP): routes learned on an interface should not be advertised back out that same interface. Misconfig causes re-advertisement and possible loops.
- iBGP route distribution / route reflection anomalies: route reflectors that fail to forward routes to clients, or that re-advertise routes incorrectly, create reachability differences.
- Asymmetric propagation: a route is present on one side of the network but missing on the other (propagation failure).
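To make the classic rule concrete, here is a minimal sketch of what split-horizon does to an outgoing update. The function name `advertise`, the RIB layout (prefix mapped to the interface it was learned on), and the sample prefixes are all illustrative assumptions, not part of any routing implementation.

```python
def advertise(routes, out_interface, split_horizon=True):
    """Return the prefixes a router would advertise out of out_interface.

    routes: dict prefix -> interface the route was learned on
            (None for locally connected prefixes). Hypothetical layout.
    """
    advertised = []
    for prefix, learned_on in sorted(routes.items()):
        # Classic split-horizon: never send a route back out of the
        # interface it was learned on.
        if split_horizon and learned_on == out_interface:
            continue
        advertised.append(prefix)
    return advertised


rib = {
    "10.1.0.0/24": None,         # locally connected
    "172.16.10.0/24": "Gi0/0",   # learned from a neighbor on Gi0/0
    "172.16.20.0/24": "Gi0/1",   # learned from a neighbor on Gi0/1
}

print(advertise(rib, "Gi0/0"))                       # 172.16.10.0/24 is suppressed
print(advertise(rib, "Gi0/0", split_horizon=False))  # 172.16.10.0/24 is sent back
```

With split-horizon disabled, the neighbor on Gi0/0 hears its own route echoed back, which is the re-advertisement pattern the rest of this article tries to detect.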
Topology & Communications — what we collect and why
Management plane: SSH via pyATS (Genie-backed Device objects). For scale, run collection concurrently (we show a single-threaded version; you can upgrade to thread pools).
Key CLI outputs to collect (per device):
- show ip interface brief: build IP → device mapping (needed to map next-hop IP to device).
- show ip route: full RIB to discover prefixes, route codes, and next-hops.
- Protocol neighbor commands:
  - EIGRP: show ip eigrp neighbors
  - RIP: show ip rip database / show ip rip status, or simply filter show ip route for R entries
  - BGP: show ip bgp summary, show ip bgp
- show logging | tail 200: search for route-withdraw / flapping messages.
- Optional: show ip cef or forwarding table for data-plane validation.
Why do we collect both interface lists and routes?
To detect split-horizon anomalies we often need to map next-hop addresses (the address after "via" in a route line) back to the actual device that owns the address. The interface table provides that mapping.
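A minimal sketch of that lookup, assuming the interface tables have already been reduced to a set of IPs per device. The helper name `resolve_next_hop` and the addresses are hypothetical.

```python
def resolve_next_hop(next_hop_ip, interfaces_by_device):
    """Return the device that owns next_hop_ip, or None if no device claims it."""
    for device, ips in interfaces_by_device.items():
        if next_hop_ip in ips:
            return device
    return None


# Illustrative data: interface IPs per device, as parsed from
# "show ip interface brief" on each router.
interfaces_by_device = {
    "PE_A": {"10.0.1.1", "10.10.0.1"},
    "PE_B": {"10.0.1.2", "10.10.0.2"},
}

print(resolve_next_hop("10.0.1.2", interfaces_by_device))
print(resolve_next_hop("198.51.100.9", interfaces_by_device))
```

A `None` result is worth keeping in the report: it usually means the next hop belongs to a device outside the testbed.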
Validation signals we produce:
- Mutual next-hop anomalies: A learns prefix P via B and B learns P via A — suspicious (possible advertisement back).
- Missing neighbor propagation: Originator has P → neighbor does not see P (neighbor should receive it).
- Asymmetric reachability: the control plane says a prefix is reachable but the data plane disagrees (traceroute/ping failure). A match between both adds confidence to a finding.
- Event correlation: Syslog denies/withdraws at times of change.
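For the event-correlation signal, a small filter over the saved log text is often enough to start with. The sketch below looks for two Cisco syslog mnemonics, %DUAL-5-NBRCHANGE and %BGP-5-ADJCHANGE; the function name `routing_events` and the sample log lines are illustrative assumptions, so adjust the pattern to the messages your platforms actually emit.

```python
import re

# Mnemonics for EIGRP and BGP neighbor state changes.
EVENT_RE = re.compile(r"%(DUAL-5-NBRCHANGE|BGP-5-ADJCHANGE)")


def routing_events(log_text):
    """Return (mnemonic, line) tuples for routing adjacency events."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            events.append((match.group(1), line.strip()))
    return events


# Illustrative log excerpt, not captured from a real device.
sample_log = """\
*Aug 28 12:00:01: %DUAL-5-NBRCHANGE: EIGRP-IPv4 100: Neighbor 10.0.1.1 (GigabitEthernet0/0) is down: holding time expired
*Aug 28 12:00:02: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/1, changed state to up
*Aug 28 12:00:05: %BGP-5-ADJCHANGE: neighbor 192.0.2.2 Down BGP Notification sent
"""

for mnemonic, line in routing_events(sample_log):
    print(mnemonic, "|", line)
```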
Workflow Script — full pyATS job (runnable)
Below is a single script, detect_splithorizon.py. It is self-contained and includes parsing heuristics. Save it next to your testbed.yml and run it with python detect_splithorizon.py --testbed testbed.yml --run-id run001.
Warning: This script reads devices (read-only). Do not add write/clear commands. Test in lab.
#!/usr/bin/env python3
"""
detect_splithorizon.py
Detect split-horizon and route advertisement anomalies using pyATS.
Produces results/<run_id>/* with raw CLI + parsed JSON + anomaly report.
"""
import argparse
import json
import re
from datetime import datetime, timezone
from pathlib import Path

from genie.testbed import load

OUTDIR = Path("results")
OUTDIR.mkdir(exist_ok=True)

# Regex helpers
IP_RE = r'(?:\d{1,3}\.){3}\d{1,3}'
PREFIX_RE = r'\d+\.\d+\.\d+\.\d+/\d+'
# Basic route line regex (many IOS outputs follow this).
# The next-hop group is optional; the lazy ".*?" inside it lets the group
# match whenever "via <ip>" is present on the line.
ROUTE_LINE_RE = re.compile(
    r'^(?P<code>[A-Z]+)\s+(?P<prefix>' + PREFIX_RE + r')'
    r'(?:.*?via\s+(?P<next>' + IP_RE + r'))?',
    re.IGNORECASE)
IFACE_LINE_RE = re.compile(
    r'^(?P<intf>\S+)\s+(?P<ip>' + IP_RE + r')\s+\S+\s+\S+\s+'
    r'(?P<status>\S+)\s+(?P<protocol>\S+)',
    re.IGNORECASE)
NEIGH_IP_RE = re.compile(IP_RE)


def ts():
    """UTC timestamp in ISO-8601 form with a trailing Z."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def save_text(run_id, device_name, label, text):
    d = OUTDIR / run_id / device_name
    d.mkdir(parents=True, exist_ok=True)
    p = d / f"{label}.txt"
    with open(p, "w") as f:
        f.write(text or "")
    return str(p)


def save_json(run_id, device_name, label, obj):
    d = OUTDIR / run_id / device_name
    d.mkdir(parents=True, exist_ok=True)
    p = d / f"{label}.json"
    with open(p, "w") as f:
        json.dump(obj, f, indent=2)
    return str(p)


def collect_device_outputs(device, run_id):
    """Collect the minimal set of outputs needed for analysis."""
    name = device.name
    print(f"[{ts()}] Collecting from {name}")
    device.connect(log_stdout=False)
    outputs = {}
    cmds = {
        "interfaces": "show ip interface brief",
        "routes": "show ip route",
        "bgp_summary": "show ip bgp summary",
        "eigrp_neighbors": "show ip eigrp neighbors",
        "rip_status": "show ip rip database",
        "logs": "show logging | tail 200",
    }
    try:
        device.execute('terminal length 0')
        for label, cmd in cmds.items():
            try:
                out = device.execute(cmd)
            except Exception as e:
                # Keep going: a protocol may simply not run on this device.
                out = f"ERROR executing {cmd}: {e}"
            outputs[label] = out
            save_text(run_id, name, label, out)
    finally:
        # Always release the session, even if collection fails midway.
        device.disconnect()
    save_json(run_id, name, "raw_outputs", outputs)
    return outputs


def parse_interfaces(raw):
    """Return dict ip -> {intf, status}."""
    ip2intf = {}
    if not raw:
        return ip2intf
    for line in raw.splitlines():
        m = IFACE_LINE_RE.match(line.strip())
        if m:
            ip2intf[m.group('ip')] = {
                "intf": m.group('intf'),
                "status": m.group('status'),
            }
    return ip2intf


def parse_routes(raw):
    """
    Parse show ip route lines (crudely) and return a list of routes:
    [{prefix, code, next_hop (may be None), raw}]
    """
    routes = []
    if not raw:
        return routes
    for line in raw.splitlines():
        m = ROUTE_LINE_RE.match(line.strip())
        if m:
            routes.append({
                "prefix": m.group('prefix'),
                "code": m.group('code'),
                "next_hop": m.group('next'),
                "raw": line.strip(),
            })
    return routes


def parse_neighbors(raw):
    """Extract neighbor IPs (from any neighbor command output); heuristic."""
    if not raw:
        return []
    return sorted({m.group(0) for m in NEIGH_IP_RE.finditer(raw)})


def build_ip_owner_map(all_ifaces):
    """
    all_ifaces: dict device -> {ip -> info}
    Return ip -> device mapping (first match wins).
    """
    ip2device = {}
    for dev, ifs in all_ifaces.items():
        for ip in ifs:
            # setdefault keeps the first owner seen for a duplicated IP.
            ip2device.setdefault(ip, dev)
    return ip2device


def detect_mutual_next_hop(routes_by_device, ip2device):
    """
    Find cases where device A has prefix P via next-hop B,
    and B has prefix P via next-hop A.
    Returns a list of anomalies (each A/B pair is reported once per prefix).
    """
    anomalies = []
    seen = set()
    # Build prefix -> device -> next-hop mapping.
    prefix_map = {}
    for dev, routes in routes_by_device.items():
        for r in routes:
            prefix_map.setdefault(r['prefix'], {})[dev] = r.get('next_hop')
    for prefix, dev_map in prefix_map.items():
        for a, next_a in dev_map.items():
            if not next_a:
                continue
            b = ip2device.get(next_a)
            if not b or b == a:
                continue
            # Does b hold the same prefix via a next hop owned by a?
            next_b_ip = dev_map.get(b)
            if not next_b_ip:
                continue
            if ip2device.get(next_b_ip) == a:
                key = (prefix, frozenset((a, b)))
                if key in seen:
                    continue
                seen.add(key)
                anomalies.append({
                    "prefix": prefix,
                    "device_a": a,
                    "device_b": b,
                    "a_next_hop": next_a,
                    "b_next_hop": next_b_ip,
                    "description": "Mutual next-hop detected (possible advertisement loop)",
                })
    return anomalies


def detect_missing_propagation(prefix_origins, routes_by_device, adjacency_map):
    """
    For each prefix origin device, check immediate neighbors (adjacency_map)
    to see if they have the prefix. If a neighbor lacks it, flag it.
    adjacency_map: device -> list of neighbor device names
    """
    missing = []
    for prefix, origins in prefix_origins.items():
        for origin in origins:
            for nbr in adjacency_map.get(origin, []):
                has = any(r['prefix'] == prefix
                          for r in routes_by_device.get(nbr, []))
                if not has:
                    missing.append({
                        "prefix": prefix,
                        "origin": origin,
                        "neighbor": nbr,
                        "description": "Neighbor missing prefix (expected to receive advertisement)",
                    })
    return missing


def find_prefix_origins(all_ifaces, routes_by_device):
    """
    Heuristic: if a device has an interface IP that falls within a prefix
    seen in any routing table, treat that device as an origin.
    Very coarse: simple containment, so both ends of a link count as origins.
    """
    def ip_to_int(ip):
        parts = [int(p) for p in ip.split('.')]
        return (parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]

    def prefix_contains(prefix, ip):
        p, plen = prefix.split('/')
        mask = (0xffffffff << (32 - int(plen))) & 0xffffffff
        return (ip_to_int(ip) & mask) == (ip_to_int(p) & mask)

    # Every prefix observed anywhere in the network.
    all_prefixes = {r['prefix']
                    for routes in routes_by_device.values() for r in routes}
    origins = {}
    for dev, ifs in all_ifaces.items():
        for ip in ifs:
            for pre in all_prefixes:
                try:
                    if prefix_contains(pre, ip):
                        origins.setdefault(pre, set()).add(dev)
                except ValueError:
                    # Malformed prefix or IP text; skip it.
                    pass
    return {p: sorted(s) for p, s in origins.items()}


def build_adjacency_map(neigh_raw_by_device, ip2device):
    """
    neigh_raw_by_device: device -> raw neighbor output text
    Map neighbor IPs found in that text to device names via ip2device.
    """
    adj = {}
    for dev, raw in neigh_raw_by_device.items():
        adj.setdefault(dev, set())
        for ip in parse_neighbors(raw):
            owner = ip2device.get(ip)
            if owner and owner != dev:
                adj[dev].add(owner)
                adj.setdefault(owner, set()).add(dev)
    return {k: sorted(v) for k, v in adj.items()}


def main(testbed_file, run_id):
    tb = load(testbed_file)
    all_ifaces = {}
    routes_by_device = {}
    neigh_raw_by_device = {}

    # 1) Collect outputs
    for name, dev in tb.devices.items():
        outputs = collect_device_outputs(dev, run_id)
        all_ifaces[name] = parse_interfaces(outputs.get('interfaces'))
        routes_by_device[name] = parse_routes(outputs.get('routes'))
        # Neighbor raw text (concatenate protocol neighbor outputs)
        neigh_raw_by_device[name] = "\n".join([
            outputs.get('eigrp_neighbors') or '',
            outputs.get('bgp_summary') or '',
            outputs.get('rip_status') or '',
        ])

    # 2) Build ip -> device map
    ip2device = build_ip_owner_map(all_ifaces)
    save_json(run_id, "global", "ip2device", ip2device)

    # 3) Build adjacency map
    adjacency_map = build_adjacency_map(neigh_raw_by_device, ip2device)
    save_json(run_id, "global", "adjacency", adjacency_map)

    # 4) Detect mutual next-hop anomalies
    mutuals = detect_mutual_next_hop(routes_by_device, ip2device)

    # 5) Detect missing propagation (heuristic)
    origins = find_prefix_origins(all_ifaces, routes_by_device)
    missing = detect_missing_propagation(origins, routes_by_device, adjacency_map)

    # 6) Assemble report
    report = {
        "run_id": run_id,
        "collected_at": ts(),
        "devices": list(tb.devices.keys()),
        "ip2device": ip2device,
        "adjacency": adjacency_map,
        "mutual_next_hop_anomalies": mutuals,
        "missing_propagation": missing,
        "summary": {
            "total_devices": len(tb.devices),
            "mutual_issues": len(mutuals),
            "missing_propagations": len(missing),
        },
    }
    save_json(run_id, "global", "splithorizon_report", report)
    print(f"[{ts()}] Report saved under results/{run_id}/global/splithorizon_report.json")
    print(json.dumps(report['summary'], indent=2))
    return report


if __name__ == "__main__":
    ap = argparse.ArgumentParser()
    ap.add_argument("--testbed", required=True)
    ap.add_argument("--run-id", required=True)
    args = ap.parse_args()
    main(args.testbed, args.run_id)
How to run (example):
python detect_splithorizon.py --testbed testbed.yml --run-id run001
Explanation by Line
I’ll walk through the important parts so you and your students understand decisions and limitations.
Top-level structure
collect_device_outputs() runs a set of read-only commands and saves raw output (audit trail). The script stores raw outputs so you can always re-parse or hand them to engineers for investigation.
Parsing interface addresses
parse_interfaces() uses show ip interface brief lines and a regex to map IP → interface for each device. This mapping is crucial to resolve next-hop IP addresses to actual devices, so we know which device “owns” a next hop.
Parsing routes
parse_routes() uses a conservative regex to find route lines with a code (like O, B, D, R) and a next-hop IP using via <ip>. This captures most route entries in IOS/IOS-XE/IOS-XR style RIB dumps, although two-token codes such as O IA or D EX are not matched by this simple pattern. It does not attempt to be a full route parser (Genie parsers exist), but the heuristic is adequate for our detection logic. You can and should replace parse_routes() with Genie device.parse('show ip route') when the platform parser is available, for more accuracy.
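If you switch to the Genie parser, its structured output has to be flattened into the same list-of-routes shape the detection functions expect. The sketch below assumes the dictionary layout commonly produced by the IOS-XE `show ip route` parser (vrf → address_family → routes → next_hop → next_hop_list); treat that layout, the helper name `routes_from_genie`, and the sample data as assumptions and check them against the parser schema for your platform and Genie version.

```python
def routes_from_genie(parsed):
    """Flatten a Genie-style 'show ip route' dict into a list of route dicts.

    Assumed layout (verify against your parser's schema):
    parsed['vrf'][vrf]['address_family'][af]['routes'][prefix]
          ['next_hop']['next_hop_list'][index]['next_hop']
    """
    routes = []
    for vrf, vrf_data in parsed.get("vrf", {}).items():
        for af_data in vrf_data.get("address_family", {}).values():
            for prefix, entry in af_data.get("routes", {}).items():
                hops = entry.get("next_hop", {}).get("next_hop_list", {})
                # Connected routes have no next_hop_list; record None for them.
                next_hops = [h.get("next_hop") for h in hops.values()] or [None]
                for next_hop in next_hops:
                    routes.append({
                        "vrf": vrf,
                        "prefix": prefix,
                        "code": entry.get("source_protocol_codes"),
                        "next_hop": next_hop,
                    })
    return routes


# Hand-written sample in the assumed layout (not real parser output).
sample = {
    "vrf": {"default": {"address_family": {"ipv4": {"routes": {
        "172.16.10.0/24": {
            "source_protocol_codes": "O",
            "next_hop": {"next_hop_list": {1: {"index": 1, "next_hop": "10.0.1.2"}}},
        },
        "10.0.1.0/24": {
            "source_protocol_codes": "C",
            "next_hop": {"outgoing_interface": {"GigabitEthernet0/0": {}}},
        },
    }}}}}
}

for route in routes_from_genie(sample):
    print(route)
```

Unlike the regex version, this keeps every next hop of an ECMP route and carries the VRF name along with each entry.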
Building ip→device map
build_ip_owner_map() simply walks all interface IPs discovered and maps them to device names. For multi-homed IPs or NATs this can be ambiguous; first match wins. In practice, management and interface IPs in lab environments are unique.
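Because a duplicated address silently skews every later lookup, it can help to list ambiguous IPs before trusting the map. This is a small sketch; `find_ambiguous_ips` is a hypothetical helper that takes the same device → {ip → info} structure as build_ip_owner_map(), and the sample addresses are made up.

```python
from collections import defaultdict


def find_ambiguous_ips(all_ifaces):
    """Return {ip: [devices]} for every IP configured on more than one device."""
    owners = defaultdict(set)
    for device, ifaces in all_ifaces.items():
        for ip in ifaces:
            owners[ip].add(device)
    return {ip: sorted(devs) for ip, devs in owners.items() if len(devs) > 1}


# Illustrative input: 192.168.1.1 is reused on two customer routers.
all_ifaces = {
    "PE_A": {"10.0.1.1": {"intf": "Gi0/0"}},
    "C1": {"192.168.1.1": {"intf": "Gi0/1"}},
    "C2": {"192.168.1.1": {"intf": "Gi0/1"}},
}

print(find_ambiguous_ips(all_ifaces))
```

Findings that resolve through one of these addresses deserve a lower confidence than the rest.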
Adjacency map
build_adjacency_map() scans neighbor outputs (EIGRP/BGP/RIP) for IPs and resolves them to device names using the ip→device map. This forms a lightweight graph of immediate neighbors, used to decide “which neighbors should have seen an advertisement”.
Mutual next-hop detection (detect_mutual_next_hop)
- This is the core heuristic for finding split-horizon-like anomalies: if Device A has prefix P with next-hop an IP owned by B, and B has P with next-hop an IP owned by A, then each appears to be depending on the other — symptomatic of misadvertisement or loops.
- This is not absolute proof of split-horizon being disabled, but it is a strong signal worth investigating.
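The heuristic is easier to reason about on a single prefix. The sketch below is a simplified stand-in for the check, not the script's own function: `mutual_next_hops` is a hypothetical name, and the addresses for C1 and the ownership table are invented for the example.

```python
def mutual_next_hops(next_hop_by_device, ip2device):
    """For ONE prefix, return sorted (a, b) pairs that route via each other.

    next_hop_by_device: device -> next-hop IP for the prefix (or None)
    ip2device:          IP -> owning device
    """
    pairs = set()
    for a, next_a in next_hop_by_device.items():
        b = ip2device.get(next_a)
        if b is None or b == a:
            continue
        next_b = next_hop_by_device.get(b)
        # Mutual dependency: b's next hop for the prefix is owned by a.
        if next_b is not None and ip2device.get(next_b) == a:
            pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


ip2device = {"10.0.1.2": "PE_B", "10.0.2.1": "PE_A", "10.0.3.1": "PE_A"}
# Next hops for 172.16.10.0/24 as seen on each device.
next_hops = {"PE_A": "10.0.1.2", "PE_B": "10.0.2.1", "C1": "10.0.3.1"}

print(mutual_next_hops(next_hops, ip2device))
```

C1 routes via PE_A but PE_A does not route via C1, so only the PE_A/PE_B pair is reported.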
Missing propagation detection
- We heuristically find prefix origins by checking if any interface IP sits within a prefix. If device X is an origin for prefix P, we expect its neighbors (from adjacency map) to see P. If a neighbor lacks P, it’s either a valid policy (intended) or a propagation problem. The script flags it so an operator can investigate.
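The containment test behind the origin heuristic can also be written with Python's standard `ipaddress` module instead of manual bit arithmetic. This is a minimal sketch; the helper name `originates` is an assumption, and the sample prefixes are illustrative.

```python
import ipaddress


def originates(prefix, interface_ips):
    """True if any interface IP falls inside the prefix."""
    # strict=False tolerates prefixes written with host bits set.
    network = ipaddress.ip_network(prefix, strict=False)
    return any(ipaddress.ip_address(ip) in network for ip in interface_ips)


print(originates("10.10.5.0/24", ["10.0.1.1", "10.10.5.1"]))
print(originates("172.16.10.0/24", ["10.0.1.1", "10.10.5.1"]))
```

Note that a default route (0.0.0.0/0) contains every address, so it is usually worth excluding very short prefixes before applying this test.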
Limitations & false positives
- Not all missing advertisements are bugs — policy filters, route-maps, VRFs, and distribute-lists intentionally limit propagation. Always cross-check suspected anomalies with intended policies. The script provides raw evidence (saved CLI) to do that.
- Multi-area or multi-VRF networks require extending the script to be VRF-aware (show ip route vrf <vrf>).
- BGP route reflection complexities (communities, localpref, suppress-maps) require protocol-specific logic for accurate detection. We provide a practical starting point.
testbed.yml Example
A minimal testbed for the lab. Save this as testbed.yml and update credentials/IPs for your lab.
testbed:
  name: splithorizon_lab
  credentials:
    default:
      username: netops
      password: NetOps!23
devices:
  PE_A:
    os: iosxe
    type: router
    connections:
      cli:
        protocol: ssh
        ip: 10.0.100.11
  PE_B:
    os: iosxr
    type: router
    connections:
      cli:
        protocol: ssh
        ip: 10.0.100.12
  C1:
    os: iosxe
    type: router
    connections:
      cli:
        protocol: ssh
        ip: 10.0.100.21
  C2:
    os: iosxe
    type: router
    connections:
      cli:
        protocol: ssh
        ip: 10.0.100.22
Notes:
- Use separate management IPs accessible to the automation host.
- Add custom testbed fields describing role/site to enrich the final report if desired.
Post-validation CLI (Real expected output)
Below are textual screenshots (fixed-width) that you can paste into a blog or teaching slides. These are realistic outputs and what the script expects to parse.
A: show ip interface brief example
PE_A# show ip interface brief
Interface              IP-Address      OK? Method Status                Protocol
GigabitEthernet0/0     10.0.1.1        YES manual up                    up
Loopback0              10.10.0.1       YES manual up                    up
GigabitEthernet0/1     192.168.1.1     YES manual up                    up
B: show ip route snippet (mutual next-hop suspicious case)
PE_A# show ip route
O    172.16.10.0/24 [110/2] via 10.0.1.2, 00:02:10, GigabitEthernet0/0
B    203.0.113.0/24 [20/0] via 192.0.2.2, 00:01:23

PE_B# show ip route
O    172.16.10.0/24 [110/2] via 10.0.2.1, 00:02:11, GigabitEthernet0/0
B    203.0.113.0/24 [20/0] via 192.0.2.1, 00:02:00
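Interpretation: PE_A reports 172.16.10.0/24 via next-hop 10.0.1.2 (likely owned by PE_B); PE_B reports the same prefix via 10.0.2.1, which this example assumes is owned by PE_A on an interface not listed in example A. Each router points at the other for the prefix: a mutual next-hop.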
C: show ip eigrp neighbors example
C1# show ip eigrp neighbors
H   Address      Interface   Hold  Uptime    SRTT  RTO  Q  Seq
0   10.0.1.1     Gi0/0       11    00:02:30  20    500  0  12
D: Example saved report excerpt (results/run001/global/splithorizon_report.json)
{
  "run_id": "run001",
  "collected_at": "2025-08-28T12:00:00Z",
  "mutual_next_hop_anomalies": [
    {
      "prefix": "172.16.10.0/24",
      "device_a": "PE_A",
      "device_b": "PE_B",
      "a_next_hop": "10.0.1.2",
      "b_next_hop": "10.0.2.1",
      "description": "Mutual next-hop detected (possible advertisement loop)"
    }
  ],
  "missing_propagation": [
    {
      "prefix": "10.10.5.0/24",
      "origin": "C2",
      "neighbor": "PE_A",
      "description": "Neighbor missing prefix (expected to receive advertisement)"
    }
  ]
}
These artifacts are what you present to NOC teams or attach to change tickets.
FAQs
Q1 — What exactly is split-horizon and why does it matter here?
A: Split-horizon (in distance-vector protocols) prevents a router from advertising a route back out the interface it learned that route from. This avoids two-router loops. If split-horizon is disabled or misapplied, routes may be re-advertised back, causing mutual dependencies and potential loops. Our script identifies patterns that look like mutual advertisement (device A routes via B while B routes via A), which is a strong red flag.
Q2 — How does the script avoid false positives when policy filters intentionally block propagation?
A: It doesn’t, intentionally. The output flags anomalies; you must correlate with configuration (ACLs, distribute-lists, route-maps). The script saves raw show ip route and show logging output as evidence. It does not collect show running-config by default, so add that command to the cmds dictionary in collect_device_outputs() if you want the configuration saved alongside. To reduce noise, add a configuration rule check step: ignore missing propagation when a policy explicitly prevents propagation.
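A first version of that rule check can be a plain keyword scan over the saved configuration. The sketch below is deliberately naive: `policy_lines`, the keyword list, and the sample configuration are illustrative assumptions, and a match only tells you that some filtering policy exists, not that it applies to the flagged prefix.

```python
# Keywords that usually indicate route filtering or manipulation.
POLICY_KEYWORDS = ("distribute-list", "route-map", "prefix-list", "filter-list")


def policy_lines(running_config):
    """Return config lines that mention a route-filtering construct."""
    return [
        line.strip()
        for line in running_config.splitlines()
        if any(keyword in line for keyword in POLICY_KEYWORDS)
    ]


# Illustrative configuration fragment, not taken from a real device.
sample_config = """\
router eigrp 100
 network 10.0.0.0
 distribute-list prefix BLOCK-LAN out GigabitEthernet0/0
!
ip prefix-list BLOCK-LAN seq 5 deny 10.10.5.0/24
ip prefix-list BLOCK-LAN seq 10 permit 0.0.0.0/0 le 32
"""

for line in policy_lines(sample_config):
    print(line)
```

If the list is non-empty for the origin or the neighbor, downgrade the finding and attach the matching lines so a reviewer can decide whether the filter is intentional.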
Q3 — Can this detect issues in BGP route reflection setups?
A: Yes, indirectly. For iBGP with route reflectors, the symptom is missing prefixes on clients. The script flags “missing propagation” when a prefix originates on a device but a neighbor reachable via the RR doesn’t see it. For precise RR behavior you should enrich the script to parse show ip bgp neighbors and community/localpref data and verify RR client lists.
Q4 — How will this scale to hundreds of devices?
A: The single-threaded script is a starting point. For scale:
- Run collectors concurrently (ThreadPoolExecutor or pyATS test runner with concurrency).
- Use streaming telemetry (gNMI/telemetry) instead of CLI polling where available.
- Centralize processing (collect raw outputs to an object store and run analysis jobs on a server cluster).
- Limit parsing to prefixes of interest (critical services) rather than the full Internet routing table.
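The first option can be sketched with the standard library alone. Here `collect_all` and `fake_collect` are hypothetical names; in the real job the callable would wrap collect_device_outputs() for one device, and per-device failures are recorded rather than aborting the whole run.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def collect_all(device_names, collect_fn, max_workers=8):
    """Run collect_fn(name) for every device in parallel.

    Returns (results, errors): results maps device -> output,
    errors maps device -> error message for devices that failed.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(collect_fn, name): name for name in device_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # one bad device must not stop the run
                errors[name] = str(exc)
    return results, errors


def fake_collect(name):
    """Stand-in for a real collector; C2 simulates an unreachable device."""
    if name == "C2":
        raise ConnectionError("unreachable")
    return {"routes": f"routes from {name}"}


results, errors = collect_all(["PE_A", "PE_B", "C1", "C2"], fake_collect)
print(sorted(results), errors)
```

Threads suit this workload because each worker spends most of its time waiting on SSH I/O; keep max_workers modest so you do not overwhelm device VTY lines or AAA servers.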
Q5 — What about VRFs and multi-tenant networks?
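A: Extend the collector to iterate per VRF: show ip route vrf <vrf> plus a per-VRF interface listing, and maintain VRF context in the ip→device map. The interface command varies by platform: show ip interface brief vrf <vrf> is NX-OS syntax, while on IOS/IOS-XE show vrf lists the interfaces that belong to each VRF. Our heuristic assumes a global table; for VRFs you must namespace prefixes and interface lookups by VRF.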
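Namespacing can be as simple as keying the owner map by (vrf, ip) instead of ip alone. The sketch below assumes each device's interfaces have already been reduced to (vrf, ip) pairs; the helper name `build_vrf_ip_owner_map` and the VRF names are hypothetical.

```python
def build_vrf_ip_owner_map(ifaces_by_device):
    """Map (vrf, ip) -> device so overlapping tenant addresses stay distinct.

    ifaces_by_device: device -> list of (vrf, ip) tuples
    """
    owners = {}
    for device, entries in ifaces_by_device.items():
        for vrf, ip in entries:
            # First device seen for a (vrf, ip) pair wins.
            owners.setdefault((vrf, ip), device)
    return owners


# Illustrative input: two tenants reuse 10.0.1.1 in different VRFs.
ifaces_by_device = {
    "PE_A": [("CUST_A", "10.0.1.1"), ("default", "10.10.0.1")],
    "PE_B": [("CUST_B", "10.0.1.1"), ("default", "10.10.0.2")],
}

owners = build_vrf_ip_owner_map(ifaces_by_device)
print(owners[("CUST_A", "10.0.1.1")], owners[("CUST_B", "10.0.1.1")])
```

Route entries then need the same VRF tag so that a next hop is only ever resolved inside its own table.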
Q6 — How should operators remediate a mutual next-hop anomaly?
A: Typical steps:
- Confirm the anomaly in the saved raw outputs.
- Inspect where the route originated (check show ip route on origin) and whether policies intentionally filter.
- If unintended: check interfaces for no ip split-horizon (or no ip split-horizon eigrp <as> for EIGRP) and for manual route redistribution rules, then correct the misconfiguration.
- Use clear ip route <prefix> and clear ip bgp cautiously if needed (prefer controlled restart).
- After remediation, re-run the script to verify the anomaly cleared.
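The final verification step can be automated by comparing the reports from two runs. This sketch assumes both reports use the mutual_next_hop_anomalies structure shown in the report excerpt; `anomaly_keys` and `compare_runs` are hypothetical helper names.

```python
def anomaly_keys(report):
    """Reduce mutual next-hop findings to comparable (prefix, dev, dev) keys."""
    return {
        (a["prefix"], *sorted((a["device_a"], a["device_b"])))
        for a in report.get("mutual_next_hop_anomalies", [])
    }


def compare_runs(before, after):
    """Classify findings as cleared, persisting, or new between two runs."""
    old, new = anomaly_keys(before), anomaly_keys(after)
    return {
        "cleared": sorted(old - new),
        "persisting": sorted(old & new),
        "new": sorted(new - old),
    }


before = {"mutual_next_hop_anomalies": [
    {"prefix": "172.16.10.0/24", "device_a": "PE_A", "device_b": "PE_B"},
]}
after = {"mutual_next_hop_anomalies": []}

print(compare_runs(before, after))
```

Sorting the device pair inside the key makes the comparison independent of which side the script happened to report as device_a.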
Q7 — Can we get a confidence score for each finding?
A: Yes — implement scoring by combining signals:
- Mutual next-hop = high severity.
- Missing propagation with no policy that explains it = medium.
- Missing propagation but a distribute-list found = low.
Add scoring logic by correlating config lines and syslog context.
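The three rules above translate directly into a small function. This is a sketch of one possible scheme: the name `score_finding`, the finding-kind strings, and the way syslog correlation is recorded as a separate flag are all assumptions.

```python
def score_finding(kind, policy_found=False, syslog_correlated=False):
    """Assign a severity to a finding using the rules listed above.

    kind: "mutual_next_hop" or "missing_propagation" (hypothetical labels)
    policy_found: a filtering policy (e.g. distribute-list) was found in config
    syslog_correlated: a routing event was logged near the time of the run
    """
    if kind == "mutual_next_hop":
        severity = "high"
    elif kind == "missing_propagation":
        severity = "low" if policy_found else "medium"
    else:
        raise ValueError(f"unknown finding kind: {kind}")
    # Syslog context is kept as a separate flag rather than changing severity.
    return {"severity": severity, "corroborated_by_syslog": syslog_correlated}


print(score_finding("mutual_next_hop", syslog_correlated=True))
print(score_finding("missing_propagation", policy_found=True))
```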
Q8 — How do we visualize findings for NOC and change managers?
A: Index the JSON report into Elasticsearch (index splithorizon-*) and build Kibana dashboards:
- Table: recent runs with anomalies count.
- Heatmap: devices with most anomalies.
- Per-prefix detail panels linking to raw CLI snapshots.
Alternatively, generate a simple HTML report from splithorizon_report.json and attach it to change tickets.
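Dashboards are easier to build when each finding is its own document. The sketch below flattens a report into newline-delimited JSON in the action-line/document-line layout used by the Elasticsearch bulk API; the helper name `to_bulk_lines`, the index name splithorizon-findings, and the added kind field are assumptions. Sending the payload to your cluster is left out.

```python
import json


def to_bulk_lines(report, index="splithorizon-findings"):
    """Flatten a report into bulk-style NDJSON: one action line per document."""
    lines = []
    for kind in ("mutual_next_hop_anomalies", "missing_propagation"):
        for finding in report.get(kind, []):
            doc = {
                "run_id": report.get("run_id"),
                "collected_at": report.get("collected_at"),
                "kind": kind,
                **finding,
            }
            lines.append(json.dumps({"index": {"_index": index}}))
            lines.append(json.dumps(doc))
    # Bulk payloads must end with a newline; return "" when there is nothing to send.
    return "\n".join(lines) + "\n" if lines else ""


report = {
    "run_id": "run001",
    "collected_at": "2025-08-28T12:00:00Z",
    "mutual_next_hop_anomalies": [
        {"prefix": "172.16.10.0/24", "device_a": "PE_A", "device_b": "PE_B"},
    ],
    "missing_propagation": [
        {"prefix": "10.10.5.0/24", "origin": "C2", "neighbor": "PE_A"},
    ],
}

print(to_bulk_lines(report), end="")
```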