Day 104 – Cisco ISE Mastery Training: Disaster Recovery Drill.

Introduction

In enterprise security, the question is not whether a disaster will happen, but when. Power outages, hardware failures, corrupted databases, or cyber-attacks can bring down critical infrastructure. When Cisco ISE (Identity Services Engine) is at the center of your network access control strategy, a failure can paralyze authentication, authorization, and accounting across your entire organization. Employees can’t log in, devices can’t join the network, and compliance audits start failing.

This is why Disaster Recovery (DR) drills in Cisco ISE are non-negotiable. Most engineers set up backups, replication, and redundancy, but very few test whether those systems will actually work during a real outage. In this session, we go beyond theory: you’ll simulate a full disaster scenario, fail over to backup nodes, validate recovery using GUI and CLI, and confirm that any data loss stays within your RPO.

By the end of this article, you won’t just assume your ISE environment is redundant; you will have proven it under a simulated outage. That takes you from ISE administrator to someone who can own ISE disaster recovery in large-scale enterprise continuity planning.


Problem Statement (What we must solve)

  • Full site loss (PAN/MnT/PSN offline) → risk of auth blackout, lost logs, broken guest portals, failed CoA.
  • Config vs Operational data confusion → restores succeed but history/reporting is gone.
  • FQDN/IP changes & certificates → portals warn/break, pxGrid re-registration fails.
  • Version/patch mismatch → restore aborts.
  • External dependencies (AD/PKI/NTP/DNS/LB/VPN/WLC) not ready → auth errors although ISE is “up.”

We need a deterministic drill: rebuild ISE in DR, reattach dependencies, and validate all flows within the recovery time objective (RTO) and recovery point objective (RPO).

Solution Overview (How ISE supports DR)

  • Backups:
    • Configuration (ise-config) → nodes, personas, policy, endpoints, guests, certificates metadata, etc.
    • Operational (ise-ops) → MnT logs, reports, RADIUS/TACACS records, posture, profiling data/history.
  • Repositories: SFTP/FTP/NFS repositories; each backup is protected with an encryption key/passphrase.
  • Restore Order: Build PAN (same version/patch) → restore config → join/restore MnT (ops) → add PSNs → attach LB VIPs/DNS → validate NADs.
  • Identity/Trust: NTP/DNS/PKI/AD reachability; import the right cert chain; portal CN/SAN alignment.
  • Cutover: GSLB/DNS or LB VIP swing; confirm CoA path and accounting integrity.
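
The restore order above is strict: each step assumes the previous one has finished. A minimal Python sketch of a runbook helper that enforces the sequence is shown here. The step labels are hypothetical names chosen for illustration, not ISE identifiers.

```python
# Restore order taken from the overview above; the labels are illustrative.
RESTORE_ORDER = [
    "build_pan",        # same version/patch as the backup source
    "restore_config",
    "restore_mnt_ops",
    "add_psns",
    "attach_lb_dns",
    "validate_nads",
]


def next_step(completed):
    """Return the next step to run, or None when the sequence is finished.

    Raises ValueError if `completed` does not follow the documented order.
    """
    completed = list(completed)
    if completed != RESTORE_ORDER[: len(completed)]:
        raise ValueError(f"out-of-order steps: {completed}")
    if len(completed) == len(RESTORE_ORDER):
        return None
    return RESTORE_ORDER[len(completed)]
```

Calling `next_step(["build_pan"])` returns `"restore_config"`, while a list that skips a step raises an error instead of letting the drill continue in the wrong order.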

Sample Lab Topology (VMware/EVE-NG + NADs/WLC/Endpoints)

Sites

  • DC-A (Primary): PAN-A, MNT-A, PSN-A1/A2, LB-A (F5/Citrix), AD-A, PKI-A, NTP/DNS-A
  • DC-B (DR): PAN-B, MNT-B, PSN-B1/B2, LB-B, AD-B (or reachable AD), PKI-B, NTP/DNS-B

NADs / Access

  • Catalyst 9300 (802.1X/MAB), WLC 9800 (CWA/BYOD), ASA/FTD VPN headend

Endpoints

  • Win11 (EAP-TLS), iOS/Android (Guest/BYOD), IoT (MAB), Admin jump host

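Topology Diagram: [Diagram not included: DC-A (Primary) and DC-B (DR) sites as listed above, serving the NADs and endpoints]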


Step-by-Step GUI Configuration Guide (with CLI)

Goal: Execute a full DR drill end-to-end. Follow Phases 0-6. Capture evidence at each step.

Evidence Pack Template (use for every step)

  • Screenshot(s): [insert placeholders below]
  • CLI outputs: command + timestamp
  • KPI: RTO (time to auth restore), RPO (data currency), Success rate, Failures
  • Decision: PASS/FAIL, remediation
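
One way to keep the evidence pack consistent across steps is to record each step in the same structure and export it as JSON. The sketch below is an assumption about how such a record could look; the class and field names are hypothetical and simply mirror the template bullets.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class EvidenceRecord:
    step: str                       # e.g. "0.1 Verify cluster health"
    decision: str                   # "PASS" or "FAIL"
    remediation: str = ""
    screenshots: list = field(default_factory=list)   # file names or links
    cli_outputs: list = field(default_factory=list)   # command + timestamp + output

    def add_cli(self, timestamp, command, output):
        """Attach one CLI capture with the time it was taken."""
        self.cli_outputs.append(
            {"timestamp": timestamp, "command": command, "output": output}
        )


def to_json(record):
    """Serialize a record, rejecting decisions other than PASS/FAIL."""
    if record.decision not in ("PASS", "FAIL"):
        raise ValueError("decision must be PASS or FAIL")
    return json.dumps(asdict(record), indent=2, sort_keys=True)
```

KPI values such as RTO and RPO can be added as extra fields once they are measured later in the drill.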

Phase 0 — Pre-Drill Baseline & Snapshot (in DC-A, before failure)

0.1 Verify cluster health

GUI: Administration → System → Deployment → All nodes Registered/Reachable; Replication In Sync.
[Screenshot: Deployment Status / Replication]

CLI (all nodes):

show application status ise
show cpu
show memory
show disks
show ntp
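
When the output of show application status ise is captured for the evidence pack, it can also be checked automatically. The Python sketch below assumes the output is a table with the process name, two or more spaces, and a state such as running, not running, disabled, or initializing. That layout is an assumption for illustration; adjust the pattern to what your ISE version actually prints.

```python
import re

# Assumed row layout: "<process name>  <state>  [process id]".
STATE_RE = re.compile(
    r"^(?P<name>.+?)\s{2,}(?P<state>not running|running|disabled|initializing)\b",
    re.IGNORECASE,
)


def parse_app_status(text):
    """Map each process name to its lower-cased state."""
    services = {}
    for line in text.splitlines():
        match = STATE_RE.match(line.strip())
        if match:
            services[match.group("name").strip()] = match.group("state").lower()
    return services


def unhealthy(services):
    """Names of processes that are neither running nor deliberately disabled."""
    return sorted(
        name for name, state in services.items()
        if state not in ("running", "disabled")
    )
```

An empty list from `unhealthy(parse_app_status(captured_text))` is a reasonable PASS condition for this step; anything else names the processes to investigate.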

0.2 Verify AD/PKI/DNS/NTP

GUI: AD: Administration → Identity Management → External Identity Sources → Active Directory → Joined / Connectivity OK.
GUI: Certificates: Administration → System → Certificates → System Certificates → EAP/Portal/Admin certs valid; chain installed.
[Screenshot: AD Join OK] [Screenshot: Cert list]

0.3 Create/Validate Repositories

GUI: Administration → System → Maintenance → Repositories → Add SFTP repo (test connectivity).


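CLI (PAN-A):

configure terminal
repository SFTP_DR
 url sftp://10.10.50.10/ISEBackups
 user backupuser password plain <SFTP_PASS>
 exit
exit
crypto host_key add host 10.10.50.10
show repository SFTP_DR

The crypto host_key add step runs from exec mode and lets ISE trust the SFTP server's host key; without it, an SFTP repository cannot be listed.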

0.4 Take fresh backups (time-stamped)

GUI: Administration → System → Backup & Restore → Backup

  • ise-config → repository SFTP_DR → encryption key <ENC_KEY>
  • ise-ops (from MNT-A) → repository SFTP_DR → same <ENC_KEY>
    [Screenshot: Backup Jobs Success]

CLI Alternative (PAN-A/MNT-A). The ISE CLI does not expand shell expressions such as $(date ...); ISE appends a timestamp to the backup name itself:

backup ise_cfg repository SFTP_DR ise-config encryption-key plain <ENC_KEY>
# On MnT
backup ise_ops repository SFTP_DR ise-operational encryption-key plain <ENC_KEY>

0.5 Record baselines (for later comparison)

  • Live Auth Success Rate (e.g., last 60 min)
  • Latest accounting sequence
  • Guest/BYOD active sessions
    [Screenshot: Live Logs / Reports]
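
The baseline only helps if it is recorded as a number that can be compared after failover. A small Python sketch follows, assuming the pass and fail counts are read from Live Logs or a report for the same window length. The 1% tolerance matches the acceptance criterion used later in this article; the function names are hypothetical.

```python
def success_rate(passed, failed):
    """Authentication success rate in percent for one time window."""
    total = passed + failed
    if total == 0:
        raise ValueError("no authentications in the window")
    return 100.0 * passed / total


def within_baseline(baseline_pct, drill_pct, tolerance_pct=1.0):
    """True if the drill success rate is within the tolerance of the baseline."""
    return abs(drill_pct - baseline_pct) <= tolerance_pct
```

For example, 990 passed and 10 failed authentications give a 99% baseline, so a post-failover rate of 98.4% would still be acceptable and 97.5% would not.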

Phase 1 — Simulate Disaster in DC-A (Primary site down)

1.1 Trigger failure (choose one)

  • Power-off PAN-A, MNT-A, PSN-A1/A2, or
  • Blackhole DC-A routing, or
  • Disable vNICs (for isolation).

CLI Proof (remote ping/trace): Targets unreachable.

Expected: Clients start failing if traffic still targets DC-A. RTO clock starts now.
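
Since the RTO clock starts at this point, it helps to record the failure time explicitly and compute RTO and RPO from timestamps rather than estimating them afterwards. The following is a minimal sketch under the assumption that all timestamps come from NTP-synchronized sources; the function names are illustrative.

```python
from datetime import datetime, timedelta


def rto_elapsed(failure_at, auth_restored_at):
    """Time from the simulated failure to the first stable authentication."""
    if auth_restored_at < failure_at:
        raise ValueError("recovery timestamp precedes failure timestamp")
    return auth_restored_at - failure_at


def rpo_gap(last_backup_at, failure_at):
    """Age of the newest usable backup at the moment of failure."""
    if failure_at < last_backup_at:
        raise ValueError("backup timestamp is after the failure")
    return failure_at - last_backup_at


def within_target(measured, target):
    """Compare a measured timedelta against its RTO or RPO target."""
    return measured <= target


if __name__ == "__main__":
    # Hypothetical drill timestamps, for illustration only.
    failure = datetime(2025, 1, 15, 2, 0)
    restored = datetime(2025, 1, 15, 2, 47)
    print(within_target(rto_elapsed(failure, restored), timedelta(minutes=60)))
```

The same `rpo_gap` call can be run twice, once with the configuration backup time and once with the operational backup time, because the two usually have different targets.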


Phase 2 — Bring Up DR PAN (PAN-B) and Restore Configuration

Must match ISE major/minor version and patch before restore.

2.1 Deploy PAN-B VM (same resources/version/patch)

  • Install OVA/ISO in VMware/EVE-NG.
  • Initial CLI setup (hostname, IP, DNS/NTP, domain search).

CLI (PAN-B):

show version
show timezone
show ntp
ping <SFTP_DR>
ping <AD>
ping <DNS>
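
Because a version or patch mismatch aborts the restore, it is worth comparing the values from show version on PAN-B with those recorded for the node that produced the backup before starting. This Python sketch assumes you have already copied the version string and patch number by hand; the type and the example values are hypothetical.

```python
from typing import NamedTuple


class IseRelease(NamedTuple):
    version: str   # version string as read from `show version`
    patch: int     # installed patch number, 0 if none


def mismatches(backup_source, target):
    """List the fields that differ between the backup source and the DR node."""
    return [
        name
        for name in IseRelease._fields
        if getattr(backup_source, name) != getattr(target, name)
    ]


def restore_compatible(backup_source, target):
    """True only when both version and patch are identical."""
    return not mismatches(backup_source, target)
```

If `mismatches(...)` returns `["patch"]`, patch PAN-B to the same level before attempting the restore in step 2.3.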

2.2 Import CA chains & Admin/EAP/Portal certs (if reusing FQDNs)

GUI: Administration → System → Certificates → Trusted Certificates → import roots/intermediates.
GUI: System Certificates → import Admin/EAP/Portal certs (match FQDN strategy).
[Screenshot: Cert Import Complete]

2.3 Restore Configuration Backup to PAN-B

GUI: Administration → System → Backup & Restore → Restore

  • Select ise-config backup from SFTP_DR
  • Provide encryption key <ENC_KEY>
  • Confirm restore → node reboots.

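CLI Alternative (PAN-B). Use the exact backup file name listed by show repository, including the timestamp ISE added:

configure terminal
repository SFTP_DR
 url sftp://10.10.50.10/ISEBackups
 user backupuser password plain <SFTP_PASS>
 exit
exit
crypto host_key add host 10.10.50.10
show repository SFTP_DR
restore <ise_cfg_backup_file> repository SFTP_DR encryption-key plain <ENC_KEY>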

2.4 Post-restore validation (PAN-B)

CLI:

show application status ise
show logging application ise-psc.log tail
show replication status

GUI: Deployment should now show PAN-B as Primary Admin (solo).
[Screenshot: Deployment with PAN-B Primary]

2.5 Re-establish AD join (if needed)

GUI: AD page → Join with service account; test connectivity.
CLI Tail:

show logging application ise-identity.log tail

Phase 3 — Build MnT in DR and Restore Operational Data

3.1 Register MNT-B (Monitoring)

GUI (PAN-B): Administration → System → Deployment → Register Node

  • Add MNT-B with Monitoring persona (Active).
    [Screenshot: Register MnT]

CLI (MNT-B):

show application status ise

3.2 Restore Operational Backup on MNT-B

GUI (PAN-B → MnT UI or MNT-B): Backup & Restore → Restore → select ise-ops backup from SFTP_DR with <ENC_KEY>.
CLI Alternative (MNT-B), with repository SFTP_DR configured on MNT-B the same way as on PAN-B:

restore <ise_ops_backup_file> repository SFTP_DR encryption-key plain <ENC_KEY>

Validate:

  • Operations → Reports → Authentications → historic data present.
  • Operations → RADIUS → Live Logs → new logs appear after PSNs join.
    [Screenshot: Reports populated]

Phase 4 — Stand Up PSNs in DR and Reattach Access Network

4.1 Deploy PSN-B1/B2 VMs (same version/patch)

CLI (each PSN):

show version
show ntp
show running-config | include name-server

4.2 Register PSNs to PAN-B

GUI (PAN-B): Deployment → Register Node → add PSN-B1/B2 with Policy Service persona.
[Screenshot: Register PSN-B1/B2]

Validate (CLI on PSN):

show application status ise

Expected: all processes running; the Deployment page shows Replication In Sync once the initial sync completes.

4.3 Certificates for PSNs (EAP/Portal as needed)

GUI: Import/install EAP/Portal certs on PSNs (CN/SAN must match LB VIP FQDN strategy if SSL offload not used).
[Screenshot: PSN System Certificates]
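
The CN/SAN requirement can be checked before cutover by comparing the DNS names on the certificate with the FQDN that clients will use. The sketch below is a simplified matcher written for this drill, not a full RFC 6125 implementation: it assumes the SAN DNS names have already been read from the certificate and handles only exact names and a single left-most wildcard label.

```python
def hostname_matches(pattern, hostname):
    """Match one SAN DNS entry against a hostname (case-insensitive)."""
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if pattern.startswith("*."):
        suffix = pattern[1:]                      # "*.dr.corp" -> ".dr.corp"
        if not hostname.endswith(suffix):
            return False
        label = hostname[: -len(suffix)]
        # The wildcard stands for exactly one non-empty label.
        return label != "" and "." not in label
    return pattern == hostname


def san_covers(san_dns_names, fqdn):
    """True if any SAN entry covers the FQDN clients will browse to."""
    return any(hostname_matches(name, fqdn) for name in san_dns_names)
```

For the VIP names used in this lab, a certificate whose SAN list contains portal-vip.dr.corp covers the portal, while one issued only for a DC-A name does not and would produce the browser warning the drill is meant to catch.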

4.4 Load Balancer in DR (F5/Citrix)

  • Pools: PSN-B1/B2 members for 1812/1813/443
  • Monitors: RADIUS/HTTPS
  • Persistence: Source-IP (RADIUS) / Cookie (HTTPS)
  • VIPs: rad-vip.dr.corp:1812/1813, portal-vip.dr.corp:443
    [Screenshot: LB Pools/VIPs Up]

LB CLI quick checks

  • F5: tmsh show ltm pool, then tmsh show ltm virtual
  • Citrix: show lb vserver, then show serviceGroup

4.5 DNS/GSLB swing (or device re-point)

  • Preferred: Update DNS records (short TTL) so NADs/WLC/VPN resolve VIP FQDN to DR VIP IPs.
  • Fallback: Change NAD RADIUS server IPs to DR VIPs.

Validate (NAD CLI):

ping rad-vip.dr.corp
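
A ping confirms reachability but not that the name now resolves to the DR side. From the admin jump host, a short Python check can confirm that the VIP FQDN returns only DR addresses. This is a sketch: the IP addresses in the usage note are hypothetical, and the resolver is injectable so the logic can be exercised without live DNS.

```python
import socket


def resolve_ipv4(fqdn):
    """Resolve with the system resolver and return the set of IPv4 addresses."""
    _name, _aliases, addresses = socket.gethostbyname_ex(fqdn)
    return set(addresses)


def points_to_dr(fqdn, dr_vip_ips, resolver=resolve_ipv4):
    """True if the name resolves and every returned address is a DR VIP."""
    resolved = set(resolver(fqdn))
    return bool(resolved) and resolved <= set(dr_vip_ips)
```

For example, `points_to_dr("rad-vip.dr.corp", {"10.20.0.10"})` returns False while cached records still point at DC-A, which is a signal to wait for the TTL to expire or to flush caches before testing authentication.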

Phase 5 — End-to-End Authentication & Services Validation (Critical)

Run all tests; record timestamps; compare to baseline.

5.1 Wired 802.1X (EAP-TLS)

Switch CLI:

test aaa group radius <user> <pass> legacy
show authentication sessions interface Gi1/0/10 details
show radius statistics

ISE GUI: Live Logs → Access-Accept on PSN-B via RADIUS VIP.
[Screenshot: Live Logs Wired]

5.2 Wireless (PEAP/EAP-TLS) via WLC 9800 + CWA redirect

WLC CLI:

show aaa servers
show wireless client mac-address <mac> detail

Portal: Browse https://portal-vip.dr.corp/guestportal → no cert warnings; login succeeds.
[Screenshot: Guest Portal Landing → Success]

5.3 VPN (ASA/FTD) AAA

ASA/FTD CLI:

test aaa-server authentication <server-group> host <VIP-IP> username <u> password <p>

ISE GUI: Live Logs show VPN device as NAD; Accept.
[Screenshot: Live Logs VPN]

5.4 CoA / DACL Enforcement

Trigger a rule causing CoA Reauth.
Switch CLI:

debug radius
show ip access-lists <dacl-name>

Validate: CoA packets source from PSN-B IPs (not LB); DACL applied.
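
The source-address check can be applied to a list of CoA source IPs taken from the debug output or a packet capture. The Python sketch below assumes you have extracted those addresses yourself; the PSN addresses in the usage note are hypothetical lab values.

```python
import ipaddress


def unexpected_coa_sources(source_ips, psn_ips):
    """Return CoA source addresses that are not one of the DR PSNs."""
    allowed = {ipaddress.ip_address(ip) for ip in psn_ips}
    return sorted(
        {ip for ip in source_ips if ipaddress.ip_address(ip) not in allowed}
    )
```

With PSN-B1/B2 at 10.20.1.11 and 10.20.1.12, a capture that also shows 10.20.1.100 would return that address, which typically means the load balancer is source-NATing CoA and the NAD will reject it unless that address is also configured as a CoA client.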

5.5 Accounting Integrity

ISE GUI: Accounting Logs show Start → Interim → Stop continuity; NAS-Session-ID consistent.
SIEM: Receives MnT syslog stream.
[Screenshot: Accounting Sequence]
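
Start → Interim → Stop continuity can be verified on exported accounting records rather than by eye. The following Python sketch assumes each record has been reduced to a session ID and an Acct-Status-Type value (Start, Interim-Update, Stop) in arrival order; the export format itself is not specified here and is left as an assumption.

```python
from collections import defaultdict


def accounting_gaps(records):
    """Return {session_id: problem} for sessions with a broken sequence.

    `records` is an iterable of (session_id, status_type) pairs in arrival
    order, with status_type one of "Start", "Interim-Update", "Stop".
    """
    by_session = defaultdict(list)
    for session_id, status in records:
        by_session[session_id].append(status)

    problems = {}
    for session_id, sequence in by_session.items():
        if sequence[0] != "Start":
            problems[session_id] = "missing Start"
        elif sequence[-1] != "Stop":
            problems[session_id] = "orphaned Start (no Stop)"
        elif any(s != "Interim-Update" for s in sequence[1:-1]):
            problems[session_id] = "unexpected record between Start and Stop"
    return problems
```

A session reported as an orphaned Start may simply still be active, so compare the result against the list of live sessions before treating it as a failure.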

5.6 Profiling & Posture

  • AnyConnect posture flow completes; compliance changes hit correct authz results.
  • ISE GUI: Posture Dashboard updates; Profiler sees DHCP/CDP/HTTP probes.
    [Screenshot: Posture / Profiler]

Phase 6 — Acceptance, Metrics, and Evidence Closure

6.1 Acceptance Criteria (tick all)

  • RTO ≤ target (e.g., ≤ 60 min to stable authentications)
  • RPO ≤ target (e.g., ≤ 15 min config; ≤ 60 min ops data)
  • Wired/Wi-Fi/VPN success rate within baseline ±1%
  • CoA success ≥ 99%
  • Accounting completeness verified; no orphaned starts
  • No portal cert warnings; CN/SAN correct
  • Replication In Sync across PAN↔PSNs and PAN↔MnT
  • AD/NTP/DNS/PKI connectivity green
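
The checklist above can be turned into a single PASS/FAIL decision so that every drill is judged the same way. This is a minimal Python sketch: the default thresholds copy the example targets from the list, and the metric key names are hypothetical.

```python
# Defaults mirror the example targets in the acceptance list above.
DEFAULT_TARGETS = {
    "rto_minutes_max": 60,
    "rpo_config_minutes_max": 15,
    "rpo_ops_minutes_max": 60,
    "baseline_tolerance_pct": 1.0,
    "coa_success_pct_min": 99.0,
}

# Criteria that are observed by the engineer and passed in as booleans.
OBSERVED = (
    "accounting_complete",
    "portal_certs_clean",
    "replication_in_sync",
    "dependencies_green",
)


def evaluate(metrics, targets=DEFAULT_TARGETS):
    """Return ("PASS" or "FAIL", sorted list of failed criteria)."""
    t = targets
    drift = abs(metrics["auth_success_pct"] - metrics["baseline_success_pct"])
    checks = {
        "rto": metrics["rto_minutes"] <= t["rto_minutes_max"],
        "rpo_config": metrics["rpo_config_minutes"] <= t["rpo_config_minutes_max"],
        "rpo_ops": metrics["rpo_ops_minutes"] <= t["rpo_ops_minutes_max"],
        "auth_success": drift <= t["baseline_tolerance_pct"],
        "coa_success": metrics["coa_success_pct"] >= t["coa_success_pct_min"],
    }
    for name in OBSERVED:
        checks[name] = bool(metrics[name])
    failed = sorted(name for name, ok in checks.items() if not ok)
    return ("PASS" if not failed else "FAIL"), failed
```

The list of failed criteria feeds directly into the remediation field of the evidence pack.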

6.2 Evidence Pack (attach)

  • Screenshots: Deployment, Certs, Live Logs, Reports, LB Pools, NAD tests
  • CLI: show application status ise, show replication status, show radius statistics, LB outputs
  • DNS/GSLB change records
  • Timeline with start/stop timestamps, final RTO/RPO

Expert-Level Use Cases (for consultants & SREs)

  1. Blue/Green DR with Dual-Signed Portals
    • Keep two PSN pools with different certs (SAN includes old+new FQDN); cut DNS with 30-sec TTL; zero browser warnings.
  2. Cloud DR (ISE in Azure/AWS) with ExpressRoute/DirectConnect
    • Pre-deployed PAN/MnT cold standby images; scripted PSN bring-up via Terraform/Ansible; attach to cloud LB; on trigger, restore and scale out.
  3. GSLB-Driven Multi-Region Active/Standby
    • PSNs active in both regions; PAN/MnT anchored in Primary; health-based DNS failover with site proximity and NAD location tagging.
  4. Immutable PSNs + Golden AMI
    • Bake ISE patch + base cert trust into a hardened image; DR means “spawn PSNs,” register via API; minutes to capacity.
  5. Ops Data Offload to SIEM with Near-Zero RPO
    • Stream MnT to SIEM; in DR, restore config only and point to SIEM for history; eliminates large ops restores.
  6. Automated NAD Re-Point
    • Ansible/Prime/DNAC job to flip RADIUS servers to DR VIP; gather post-change show radius statistics and auto rollback if failures > threshold.
  7. CoA Path Assurance Matrix
    • Pre-permit CoA (UDP/3799) from all PSN subnets to all NADs; synthetic CoA pings nightly; DR switch revalidates matrix.
  8. PKI Cutover in DR
    • If CA differs in DR, pre-publish intermediates to endpoints and NADs; rotate EAP/Portal certs with dual trust windows.
  9. Split MnT Roles for High-Ingest Events
    • During DR, temporarily scale MnT horizontally and shunt high-volume accounting to a dedicated MnT to avoid back-pressure.
  10. WAN-Partition Resilience Test
    • Simulate PAN unreachable from remote PSNs (but NAD reachable). Show continuous auth with cached policy, and delayed log forwarding on reconvergence.
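
For use case 6, the rollback decision can be reduced to a comparison of RADIUS counters gathered before and after the re-point. The Python sketch below assumes the counters have already been parsed from show radius statistics into cumulative accept and failure totals. The 5% threshold and the dictionary keys are hypothetical values chosen for illustration.

```python
def should_rollback(before, after, max_failure_pct=5.0):
    """Decide whether to revert a NAD to its previous RADIUS servers.

    `before` and `after` are dicts with cumulative "accepts" and "failures"
    counters (rejects plus timeouts) taken around the change.
    """
    accepts = after["accepts"] - before["accepts"]
    failures = after["failures"] - before["failures"]
    if accepts < 0 or failures < 0:
        raise ValueError("counters went backwards; were statistics cleared?")
    total = accepts + failures
    if total == 0:
        # No traffic observed: treated as a failed change (conservative choice).
        return True
    return 100.0 * failures / total > max_failure_pct
```

Treating a silent NAD as a reason to roll back is a deliberate design choice in this sketch; a site with very low traffic may prefer to extend the observation window instead.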

Quick CLI Reference

PAN/MnT/PSN

show application status ise
show version
show replication status
show logging application ise-psc.log tail
show logging application ise-identity.log tail
show ntp
show clock
show running-config | include name-server
configure terminal
 repository SFTP_DR
  url sftp://10.10.50.10/ISEBackups
  user backupuser password plain <SFTP_PASS>
  exit
 exit
backup <name> repository SFTP_DR ise-config encryption-key plain <ENC_KEY>
backup <name> repository SFTP_DR ise-operational encryption-key plain <ENC_KEY>
restore <config_backup_file> repository SFTP_DR encryption-key plain <ENC_KEY>
restore <ops_backup_file> repository SFTP_DR encryption-key plain <ENC_KEY>
application stop ise
application start ise

Catalyst Switch

test aaa group radius <user> <pass> legacy
show authentication sessions interface Gi1/0/10 details
show radius statistics

WLC 9800

show aaa servers
show wireless client mac-address <mac> detail

F5 BIG-IP

tmsh show ltm pool
tmsh show ltm virtual

Citrix ADC

show lb vserver
show serviceGroup

FAQs – Cisco ISE Disaster Recovery Drill

FAQ 1. How often should I perform a Disaster Recovery Drill in ISE?

Best practice is at least once every 6–12 months, or after major changes such as:

  • ISE upgrades/patches
  • Addition of new PSNs/administration nodes
  • Change in backup storage or replication design
  • Enterprise DR policy changes

Some regulated industries (finance, healthcare, government) require quarterly DR testing.

FAQ 2. What components of Cisco ISE are covered in a DR Drill?

A proper drill validates:

  • Administration Node Failover (Primary → Secondary)
  • Monitoring & Logging Node Recovery (MNT sync + reports)
  • Policy Service Node Failover (active authentication/authorization traffic)
  • Database Restoration (backup restore test)
  • Network Device Redundancy (switches, WLCs, routers pointing to backup PSNs)

FAQ 3. How do I verify that replication is healthy before a drill?

Use both GUI and CLI:

  • GUI: Administration > System > Deployment → check replication status (green ✔️).
  • CLI: show logging application ise-psc.log and show application status ise.

Replication must be in the In Sync state before initiating a drill.

FAQ 4. What is the most common mistake engineers make during ISE DR drills?

The #1 mistake: forgetting to update network devices’ RADIUS server lists.

  • Devices (Switches, WLCs, Firewalls) must point to both primary and backup PSNs.
  • Otherwise, when the primary PSN fails, endpoints lose access.

FAQ 5. Can I run DR drills during production hours?

Not recommended.

  • Failover tests cause temporary disruptions (replication pauses, admin logouts, PSN authentication delays).
  • Always schedule drills during maintenance windows with stakeholder approvals.

FAQ 6. How do I simulate a real disaster in ISE?

Examples of realistic simulations:

  • Power Down Test → shut down the primary admin node VM.
  • Network Isolation Test → disconnect NIC of PSN to simulate network outage.
  • Database Corruption Test → attempt a restore from backup.
  • Logging Node Test → stop MNT services and validate report continuity.

FAQ 7. What happens to live authentications during failover?

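  • If PSNs are redundant → established sessions stay up until they need to reauthenticate; new logins hit the backup PSN once the NAD marks the failed PSN dead.
  • If Admin node fails → authentications still work (PSNs serve policy from their local replicated copy), but configuration changes must wait until a PAN is available.
  • If MNT fails → no disruption in access, but logs generated during the outage can be lost unless a secondary MnT is receiving them.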

FAQ 8. How do I validate success after the DR drill?

Validation Checklist:

  1. Users can authenticate via backup PSN.
  2. Backup Admin node is writable (policy changes allowed).
  3. Logs are replicated to backup MNT.
  4. CLI → show application status ise (all services running).
  5. Post-restore backup matches pre-disaster snapshot.

FAQ 9. Where should I store Cisco ISE backups for DR?

  • Use remote secure storage (SFTP, NFS, or cloud storage).
  • Avoid keeping backups in the same data center as the ISE nodes.
  • Encrypt backups with a strong key, and maintain offsite copies for compliance.

FAQ 10. Can I automate Disaster Recovery in Cisco ISE?

Yes, partial automation is possible:

  • Scripts/Playbooks → for backup scheduling and verification.
  • F5 / Citrix ADC Load Balancer → to automate PSN failover for RADIUS/TACACS+.
  • SIEM/SOAR integration → can trigger failover alerts & automation in DR scenarios.

A manual drill is still required to confirm real-world readiness.

YouTube Link

For more in-depth Cisco ISE Mastery Training, subscribe to my YouTube channel Network Journey and join my instructor-led classes for hands-on, real-world ISE experience

Closing Notes (Key takeaways)

  • Backups are not DR; restores + validations are.
  • Restore Config on PAN, Ops on MnT, then rebuild PSNs and swing VIPs/DNS.
  • Obsess over time (NTP) and trust (PKI)—most “ISE DR issues” are dependency failures.
  • Capture an evidence pack and measure RTO/RPO every drill.

Upgrade Your Skills – Start Today

Fast-Track to Cisco ISE Mastery Pro

Duration: 4 months (live)
You’ll master: Enterprise ISE design at scale, HA/DR, advanced policy, Guest/BYOD, pxGrid/SGT, upgrades/migrations, load-balancing/GSLB, TAC-grade troubleshooting, evidence-driven runbooks.
Course outline & enrollment: https://course.networkjourney.com/ccie-security/
Next action: Book a readiness call, get the DR Drill Pack (templates, scripts), and reserve your cohort seat.

Enroll Now & Future‑Proof Your Career
Email: info@networkjourney.com
WhatsApp / Call: +91 97395 21088