[Day 103] Cisco ISE Mastery Training: High Availability & Failover Testing



Introduction

In enterprise networks, downtime is unacceptable. Whether you are protecting a hospital’s patient data, a bank’s financial records, or a defense network’s classified systems, Cisco ISE (Identity Services Engine) must be available 24×7 with zero disruption. A single node failure in an ISE cluster without proper high availability (HA) can cripple authentication, break policy enforcement, and lock users/devices out of critical services.

That’s why high availability and failover testing are not a “nice-to-have”; they are a mission-critical requirement in every Cisco ISE deployment. A sound HA design provides redundancy, fault tolerance, and continuous authentication services, but design alone is not enough: you must validate failover in controlled labs and during production rollouts by simulating real-world node crashes, database sync issues, and PSN overloads.

This session is where we move from theory to battle-ready engineering. You’ll not only configure HA, but also:

  • Validate session persistence during failover (wired, wireless, VPN).
  • Test redundancy for Policy Service Nodes (PSNs), PANs, and MnT nodes.
  • Simulate failures with CLI shutdowns, VM suspensions, and interface blocking.
  • Use ISE GUI, CLI, and syslogs to confirm that the cluster seamlessly absorbs the failure.

By the end of this workbook, you won’t just “know” HA design; you’ll be able to prove that your Cisco ISE deployment survives failures before your production network ever sees them.


Problem Statement (What breaks without discipline)

  • Outages during PSN patching → Wi-Fi login storms fail.
  • PAN outage → no policy changes, join/leave failures, stale config.
  • MnT outage → compliance gaps (lost logs).
  • WAN flap → remote sites time out against distant PSNs.
  • AD/DC hiccup → EAP-TLS OK, but PEAP/MSCHAPv2 fails; posture/guest flows degrade.

You need a repeatable failover test to prove: no auth loss, no portal loops, logs preserved, clean promotion, fast detection.

Solution Overview (What we will implement & test)

  • PSN Tier: ≥2 PSNs behind LB with health probes + persistence (RADIUS & HTTPS).
  • PAN Tier: 1× Primary PAN + 1× Secondary PAN (manual promotion).
  • MnT Tier: 1× Active + 1× Standby (automatic role assumption).
  • Shared Services: NTP/DNS/PKI/AD reachability verified; inter-node latency within Cisco’s published deployment limits.
  • Runbooks: Planned drain/patch, failover tests, rollback.
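
The PSN-tier requirement above (health probes plus persistence) can be sketched as a toy model in Python. It assumes RADIUS persistence keyed on Calling-Station-Id, a common choice for ISE; the class and PSN names are hypothetical, HTTPS persistence is not modeled, and a real F5 or Citrix ADC implements this with its own persistence profiles and monitors rather than code like this.

```python
import hashlib


class StickyRadiusBalancer:
    """Toy model of an LB pool with Calling-Station-Id persistence (hypothetical)."""

    def __init__(self, psns):
        self.psns = list(psns)          # pool members, e.g. ["PSN-1", "PSN-2"]
        self.healthy = set(self.psns)   # kept current by health probes
        self.sticky = {}                # Calling-Station-Id -> pinned PSN

    def mark_down(self, psn):
        """Health probe failed: stop sending new requests to this PSN."""
        self.healthy.discard(psn)

    def mark_up(self, psn):
        """Health probe recovered: the PSN is eligible again for new pins."""
        if psn in self.psns:
            self.healthy.add(psn)

    def pick(self, calling_station_id):
        """Return the PSN for this endpoint, keeping it sticky while healthy."""
        current = self.sticky.get(calling_station_id)
        if current in self.healthy:
            return current
        candidates = [p for p in self.psns if p in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy PSN in pool")
        # Deterministic hash: the same MAC maps to the same member for a given healthy set.
        digest = hashlib.sha256(calling_station_id.encode()).digest()
        choice = candidates[int.from_bytes(digest[:4], "big") % len(candidates)]
        self.sticky[calling_station_id] = choice
        return choice
```

The design point: an endpoint stays pinned to one PSN while that PSN passes its health probe, so its RADIUS conversation, accounting, and CoA context stay together; only when the pinned PSN is marked down is the endpoint re-pinned to a survivor.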

Sample Lab Topology (VMware / EVE-NG)

Compute/Apps

  • ISE nodes: PAN-PRI, PAN-SEC, MNT-ACT, MNT-STD, PSN-1..4
  • Load Balancer: F5 BIG-IP or Citrix ADC (VIPs: RADIUS-Auth 1812, RADIUS-Acct 1813, HTTPS 443)
  • Services: AD Domain Controller(s), CA/PKI, NTP/DNS, Syslog/SIEM
  • NADs: Catalyst 9300 (wired 802.1X/MAB), WLC 9800 (CWA/BYOD), AnyConnect/ASA or FTD VPN
  • Endpoints: Windows 11 (EAP-TLS), iOS/Android (Guest/BYOD), IoT (MAB)


Step-by-Step Guide (GUI + CLI Validation)

Step 1 – Verify Node Roles & Deployment State

Why: Before testing HA, you must confirm your nodes (PAN, MnT, PSN) are properly registered and synchronized.

GUI Validation

  1. Log in to the ISE Primary Admin Node.
  2. Navigate to:
    Administration → System → Deployment
    [Screenshot: ISE Deployment Nodes Page]
  3. Confirm:
    • Primary PAN and Secondary PAN roles.
    • Primary MnT and Secondary MnT roles.
    • Multiple PSNs registered.

CLI Validation

On each node, run:

show application status ise

Ensure Application Server, Database Listener, and ISE Indexing Engine are running.
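This output lists process states only; it does not show node roles, so note which node is Primary and which is Secondary from the Deployment page in the GUI.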
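
If you paste the output from each node into a script, a small parser can flag any of the processes listed above that are missing or not running. This is a sketch: the row layout it expects (process name, two or more spaces, state, optional PID) is an assumption based on typical output, so check it against your ISE version before relying on it.

```python
import re

# Processes that Step 1 asks you to confirm; extend the set for your personas.
REQUIRED = {"Application Server", "Database Listener", "ISE Indexing Engine"}

# Assumed row layout: process name, 2+ spaces, state, then an optional PID or detail.
ROW = re.compile(
    r"^(?P<name>\S.*?)\s{2,}(?P<state>not running|running|disabled|initializing)\b",
    re.IGNORECASE,
)


def parse_status(output: str) -> dict:
    """Map each process name in 'show application status ise' output to its state."""
    states = {}
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:  # header and separator lines do not match and are skipped
            states[match.group("name").strip()] = match.group("state").lower()
    return states


def problems(output: str, required=REQUIRED) -> list:
    """Return the required processes that are missing or not 'running', sorted."""
    states = parse_status(output)
    return sorted(name for name in required if states.get(name) != "running")
```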


Step 2 – Test Primary PAN Failure (Admin Node)

Why: PAN availability is critical for management + config pushes.

Simulation

On the Primary PAN, shut down the ISE application:

application stop ise

Or simulate network isolation from the VM console (an SSH session over this interface will drop):

configure terminal
interface GigabitEthernet 0
shutdown

GUI Validation

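  1. From another console, log in to the Secondary PAN GUI.
  2. Go to:
    Administration → System → Deployment
    [Screenshot: Deployment showing Secondary PAN as Active]
  3. Confirm role promotion. Unless PAN Auto-Failover is enabled, the Secondary PAN is not promoted automatically: select it, click Promote to Primary, and confirm that it is then listed as the Primary PAN.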

CLI Validation

On the Secondary PAN:

show application status ise

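Confirm that Application Server and the database processes are running after the promotion. This command reports process states, not node roles; verify the new Primary role on the Deployment page.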

Rollback (on the original Primary PAN):

application start ise

or, for the isolation test, run no shutdown under interface GigabitEthernet 0 from the VM console.
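
To attach a number to “fast detection” in this test, you can poll the PAN admin GUI (TCP 443) from a jump host while you stop and restart the node, then compute the outage windows from the samples. A minimal sketch using only the Python standard library; the host name is your own, and the one-second interval is an arbitrary choice.

```python
import socket
import time


def probe(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def outage_windows(samples):
    """Given [(timestamp, reachable), ...] in time order, return [(start, end), ...]
    for each run of failed probes; an outage still open at the end has end=None."""
    windows, start = [], None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                    # first failed probe opens a window
        elif ok and start is not None:
            windows.append((start, ts))   # first good probe closes it
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


def collect(host: str, duration_s: float, interval_s: float = 1.0):
    """Poll host for duration_s seconds and return the (timestamp, reachable) samples."""
    samples, end = [], time.monotonic() + duration_s
    while time.monotonic() < end:
        samples.append((time.monotonic(), probe(host)))
        time.sleep(interval_s)
    return samples
```

For example, outage_windows(collect("pan-sec.lab.local", 600)) polls a hypothetical host for ten minutes. Each failed probe can block for up to timeout seconds, so the effective sampling interval is interval_s plus up to timeout, and the windows are only accurate to roughly that granularity.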


Step 3 – Test MnT Failover (Monitoring Node)

Why: MnT collects logs, RADIUS live sessions, reports.

Simulation

Suspend the Primary MnT VM, or stop the ISE application on it (application stop ise).

GUI Validation

  1. Navigate to:
    Operations → Reports → Authentications
    [Screenshot: Authentication Logs on Secondary MnT]
  2. Ensure reports/logs are still accessible.

CLI Validation

show logging system tail

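Check that logs are being recorded on the Secondary MnT. System logs show node health only, so also confirm that new entries keep appearing in Operations → RADIUS → Live Logs.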
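
For the “logs preserved” check, one approach is to run synthetic authentications at a fixed interval during the MnT test, export the matching records from the report, and look for gaps longer than that interval. The sketch below assumes you have already extracted the record timestamps as ISO-8601 strings; the export format and the threshold are assumptions to adapt.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_gap: timedelta):
    """Return (previous, next) datetime pairs where consecutive log entries are
    further apart than max_gap. Input may be unsorted ISO-8601 strings."""
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > max_gap]
```

If synthetic auths run every 30 seconds, a max_gap of about 60 seconds tolerates one late record while still flagging a real loss during the switchover.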


Step 4 – Test PSN Failover (Policy Service Node)

Why: PSN failure impacts live authentication traffic (critical test).

Simulation

On one PSN:

application stop ise

GUI Validation

  1. On PAN GUI:
    Administration → System → Deployment
    [Screenshot: One PSN Down, Others Active]
  2. On a live switch or WLC, authenticate a wired or wireless client.

CLI Validation

In the ISE GUI:

Operations → RADIUS → Live Logs

Verify that new authentication requests are now handled by the surviving PSNs (the Server column shows which PSN processed each request).

On WLC/Switch CLI:

show aaa servers

Confirm failover from failed PSN to healthy PSN.
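
Without a load balancer, failover depends on the NAD’s ordered server list and its dead-server detection. The toy model below illustrates why the first request after a PSN failure is slow (the NAD waits out its timeouts before marking the server dead) while later requests go straight to the survivor until the deadtime expires. It is a simplification, not IOS’s actual algorithm, and the timer values are illustrative rather than platform defaults.

```python
class NadServerList:
    """Simplified model of a NAD walking an ordered RADIUS server list."""

    def __init__(self, servers, timeout_s=5, retransmit=2, deadtime_s=600):
        self.servers = list(servers)   # priority order, e.g. ["PSN-1", "PSN-2"]
        self.timeout_s = timeout_s     # wait per attempt
        self.retransmit = retransmit   # extra attempts before giving up on a server
        self.deadtime_s = deadtime_s   # how long a dead server is skipped
        self.dead_until = {}           # server -> time it may be tried again

    def authenticate(self, now, reachable):
        """Return (server_used, seconds_spent); `reachable` is the set of live PSNs."""
        spent = 0
        for server in self.servers:
            if self.dead_until.get(server, 0) > now + spent:
                continue               # inside deadtime: skip without waiting
            if server in reachable:
                return server, spent
            # every attempt times out, then the server is marked dead
            spent += self.timeout_s * (1 + self.retransmit)
            self.dead_until[server] = now + spent + self.deadtime_s
        return None, spent
```

With these illustrative values, the first request after PSN-1 fails waits 15 seconds before reaching PSN-2; that delay is what endpoints feel during the failover, and it is the main reason the LB health-probe approach detects failures faster.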


Step 5 – Session Persistence Test

Why: During failover, ongoing sessions must remain intact.

Test Procedure

  1. Connect a test endpoint (laptop/VM) to a wired switch port with 802.1X enabled.
  2. Authenticate successfully (dot1x/MAB).
  3. Simulate PSN failure as above.
  4. Monitor:
    • Endpoint should not get disconnected.
    • Session continues until re-authentication is triggered.

CLI Validation on Switch

show authentication sessions interface Gi1/0/10

Session should still show as Authorized.
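
To make the persistence check repeatable, you can capture show authentication sessions interface Gi1/0/10 details (as listed in the CLI Reference) before and after the PSN failure and compare the two captures. This sketch assumes the details output uses “Key: value” lines that include Status and Common Session ID; confirm the field names on your IOS-XE release.

```python
def parse_details(output: str) -> dict:
    """Turn 'Key:  value' lines into a dict; the first colon splits key from value."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def session_survived(before: str, after: str) -> bool:
    """True if the session is still Authorized and kept the same session ID."""
    b, a = parse_details(before), parse_details(after)
    return (a.get("Status") == "Authorized"
            and b.get("Common Session ID") is not None
            and a.get("Common Session ID") == b.get("Common Session ID"))
```

Requiring the same Common Session ID distinguishes “session survived” from “session was torn down and re-established”, which would also end in Authorized.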


Step 6 – Database Synchronization Test

Why: The Secondary PAN (and every other node) must stay in sync with the Primary PAN’s configuration database.

GUI Validation

  1. On Secondary PAN GUI:
    Administration → System → Deployment → Synchronization Status
    [Screenshot: DB Sync Status Page]
  2. Confirm sync = Up to Date.

CLI Validation

On Secondary PAN:

show application status ise

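Confirm that Database Listener and Database Server are running. Replication state itself is reported on the Deployment page, not in this command’s output.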


Step 7 – Report & Document Results

Why: In real deployments, you must prove HA works.

  1. Record screenshots of:
    • Deployment view during failover.
    • Authentication logs during PSN failover.
    • CLI outputs (show application status ise).
  2. Create a Failover Report:
    • Test performed
    • Expected outcome
    • Actual outcome
    • Pass/Fail
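
The Failover Report fields above map naturally onto one record per test. A minimal sketch that renders them as CSV for the audit pack; the class and function names are illustrative, not part of any ISE tooling.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class FailoverTest:
    test: str        # what was done, e.g. "PSN-1 application stop"
    expected: str    # expected outcome
    actual: str      # observed outcome
    passed: bool

    @property
    def result(self) -> str:
        return "Pass" if self.passed else "Fail"


def to_csv(tests) -> str:
    """Render Test, Expected, Actual, Pass/Fail columns as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Test", "Expected", "Actual", "Pass/Fail"])
    for t in tests:
        writer.writerow([t.test, t.expected, t.actual, t.result])
    return buf.getvalue()
```

Keeping the pass/fail decision as a boolean and deriving the label avoids inconsistent spellings across testers.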

Expert-Level Use Cases

  1. Dual-Region ISE with GSLB
    • Site-local PSNs + Global VIP DNS proximity.
    • PAN/MnT anchored in primary DC; remote MnT for local log cache.
    • Drill: cut Region-A WAN → Region-B clients unaffected.
  2. PSN Blue/Green + Staggered Cert Rotation
    • Blue pool presents *.corpA.com, Green *.corpB.com during migration with dual-SANs.
    • LB weights shift after root/intermediate swap; zero portal warnings.
  3. Auth Partitioning by NAD Class
    • Separate VIPs/pools for WLC, Switch, VPN; tailored persistence timeouts and weights.
    • Stops Wi-Fi surges from starving wired/VPN.
  4. IoT/MAB High-Stickiness Window
    • Increase RADIUS persistence timeout for MAB to hours; reduce CoA storms for flappy devices.
  5. AD Fragility Shield
    • Policy fallback to EAP-TLS only during AD outage; admin banner alerts; automatic restore rule with time window.
  6. MnT Split-Write with External SIEM
    • Keep MnT logs minimal (short retention) and stream to SIEM for long-term; failover test validates no loss during MnT switch.
  7. Change-Window Guardrails
    • Real-time success-rate SLO in Grafana (ISE syslog + LB stats). If success <98.5% for 2 min → auto-rollback (re-enable drained PSNs).
  8. CoA Path Assurance
    • Pre-deploy ACLs from all PSN IPs to all NADs; nightly CoA synthetic probes validate reachability on the CoA port (UDP/1700 on Cisco NADs by default, UDP/3799 per RFC 5176).
  9. WAN-Aware Policy Sync Windows
    • Freeze policy publishes during known WAN maintenance; schedule PSN “config pull verification” after link up.
  10. Portal Isolation Zone
    • Dedicated HTTPS VIP/pool for Guest/BYOD portals with different TLS ciphers and WAF in front; RADIUS VIPs untouched.
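
Use case 7’s guardrail (success rate below 98.5% for 2 minutes triggers rollback) can be sketched as a small state machine fed with per-interval pass/fail counts. Where the counts come from (ISE syslog, LB statistics) and the rollback action itself are left out and would be your own integration; the threshold and hold time simply mirror the numbers in the text.

```python
class SuccessRateGuard:
    """Trips when the success rate stays below `threshold` for `hold_s` seconds."""

    def __init__(self, threshold=0.985, hold_s=120):
        self.threshold = threshold
        self.hold_s = hold_s
        self.breach_start = None   # timestamp when the current breach began

    def observe(self, ts, passed, failed) -> bool:
        """Feed one interval's counts; returns True when rollback should fire."""
        total = passed + failed
        if total == 0:
            return False           # no traffic: neither a breach nor a recovery
        rate = passed / total
        if rate >= self.threshold:
            self.breach_start = None   # recovered: reset the hold timer
            return False
        if self.breach_start is None:
            self.breach_start = ts
        return ts - self.breach_start >= self.hold_s
```

Requiring a sustained breach rather than a single bad interval keeps a brief burst of failures (for example, one misconfigured client) from draining the change window.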

CLI Reference

ISE (any node)

show application status ise
show replication status
show logging application ise-radius.log tail
show logging application ise-psc.log tail
show cpu
show memory
show disks

Catalyst Switch

show authentication sessions interface Gi1/0/10 details
show radius statistics
test aaa group radius <user> <pass> legacy

WLC 9800

show aaa servers
show wireless client mac-address <mac> detail
show wlan id <id>

F5 BIG-IP (if used)

tmsh show ltm pool
tmsh show ltm virtual
tmsh list ltm monitor radius

Citrix ADC (if used)

show lb vserver
show serviceGroup
show ns runningconfig | grep -i persist

FAQs – Cisco ISE HA & Failover Testing

FAQ 1. What happens if the Primary PAN fails during business hours?

  • Answer:
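    • The Secondary PAN does not take over automatically unless PAN Auto-Failover is configured (Administration → System → Deployment → PAN Failover); otherwise, promote it manually (select the node and click Promote to Primary).
    • After promotion, all configuration changes must be made on the new Primary PAN.
    • GUI Validation:
      • Administration → System → Deployment → Check that the promoted node is listed as Primary.
    • CLI Validation: show application status ise → Confirm that Application Server is running (node roles appear only in the GUI).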

FAQ 2. Do authentications stop if both PAN nodes fail?

  • Answer:
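    • No. PANs are management nodes and do not process live RADIUS/TACACS+ requests.
    • Authentications are handled by PSNs using their last synchronized policy, so most end users are not impacted.
    • Impact: you cannot push policy/config changes until a PAN is restored, and flows that write to the configuration database (for example, creating guest accounts or registering BYOD devices) may fail.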

FAQ 3. How does ISE decide which MnT node becomes active?

  • Answer:
    • You designate one MnT node as Primary and one as Secondary; both receive the same logs from every node.
    • If the Primary MnT fails, the Secondary automatically becomes the active node for logging and reporting.
    • GUI Validation:
      • Operations → Reports → Authentications → Ensure logs still appear.
    • CLI Validation: show logging system tail

FAQ 4. What happens to live user sessions if a PSN fails?

  • Answer:
    • Active sessions remain authorized until re-authentication (e.g., reauth timer, port bounce).
    • New authentication requests failover to other PSNs.
    • Switch CLI Validation: show authentication sessions interface Gi1/0/10 → The session should still show as Authorized.

FAQ 5. Can ISE provide Active/Active Admin Nodes?

  • Answer:
    • No. ISE supports one Primary (active) PAN and one Secondary (standby) PAN.
    • Multiple administrators can be logged in at the same time, but configuration changes are always made through the Primary PAN; the two PANs never manage the deployment concurrently.

FAQ 6. How do I test database replication between PANs?

  • Answer:
    • GUI Validation:
      • Administration → System → Deployment → Synchronization Status
      • Check “Up to Date”.
    • CLI Validation on Secondary PAN: show application status ise → Confirm that Database Listener and Database Server are running (replication status is reported in the GUI).

FAQ 7. Does load balancing apply to PSNs in HA?

  • Answer:
    • Yes, although it is optional. PSNs can be front-ended by load balancers (F5, Citrix ADC) or, more simply, by DNS round robin.
    • An LB spreads RADIUS/TACACS+ requests across PSNs and removes failed PSNs via health probes, which makes failover seamless.
    • Without an LB, NADs (switches/WLCs) fail over using their own ordered RADIUS server lists and dead-server detection.

FAQ 8. How do I test TACACS+ failover in ISE?

  • Answer:
    • Configure multiple PSNs (with the Device Admin service enabled) as TACACS+ servers on your device.
    • Shut down one PSN: application stop ise
    • Attempt device login.
    • Device CLI Validation: show tacacs (or show aaa servers, depending on platform) → The device should fail over to a healthy PSN.

FAQ 9. What’s the difference between Node Failure and Network Failure testing?

  • Answer:
    • Node Failure: Stopping ISE services (application stop ise).
    • Network Failure: Disconnecting/shutting VM NIC → Simulates loss of connectivity.
    • Both should be tested to validate true resilience in production.

FAQ 10. How do I document HA testing for compliance audits?

  • Answer:
    • Capture:
      1. Screenshots of Deployment page before/during/after failover.
      2. CLI outputs (show application status ise, show authentication sessions).
      3. Authentication logs during PSN failover.
    • Record each test in a Failover Test Matrix (Test, Expected, Actual, Result).
    • Store reports for audit trails.

YouTube Link

For more in-depth Cisco ISE Mastery Training, subscribe to my YouTube channel Network Journey and join my instructor-led classes for hands-on, real-world ISE experience.


Closing Notes (Key Takeaways)

  • Build redundancy at every persona: PAN, MnT, PSN.
  • Use LB VIPs for PSN tier with health + persistence.
  • Practice repeatable failover runbooks; capture pass/fail evidence.
  • Verify accounting integrity and CoA success, not just “auth OK”.
  • Keep NTP/DNS/PKI/AD healthy; most “ISE HA” issues are dependencies.

Upgrade Your Skills – Start Today


Fast-Track to Cisco ISE Mastery Pro

Duration: 4 months (live)
You’ll master: Enterprise ISE design, PSN/LB at scale, Guest/BYOD, pxGrid & SGT, upgrades/DR, HA runbooks, TAC-grade troubleshooting.
Course outline & enrollment: https://course.networkjourney.com/ccie-security/
Next step: Book a readiness call, download the HA & Failover Test Pack, and reserve your seat.

Enroll Now & Future‑Proof Your Career
Email: info@networkjourney.com
WhatsApp / Call: +91 97395 21088