[Day 104] Cisco ISE Mastery Training: Disaster Recovery Drill
Introduction
In enterprise security, it’s not a question of if a disaster will happen — but when. Power outages, hardware failures, corrupted databases, or even cyber-attacks can bring down critical infrastructure. When Cisco ISE (Identity Services Engine) is at the center of your network access control strategy, a failure can paralyze authentication, authorization, and accounting across your entire organization. That means employees can’t log in, devices can’t join the network, and compliance audits start failing instantly.
This is why Disaster Recovery (DR) Drills in Cisco ISE are non-negotiable. While most engineers set up backups, replication, and redundancy, very few actually test and validate whether those systems will work during a real outage. In this session, we go far beyond theory — you’ll simulate a full disaster scenario, failover to backup nodes, validate recovery using GUI and CLI, and confirm zero data loss.
By the end of this article, you won’t just know that your ISE environment is redundant; you will have proven it under fire. This transforms you from an ISE administrator into a true ISE Disaster Recovery Master, ready for large-scale enterprise continuity planning.
Problem Statement (What we must solve)
- Full site loss (PAN/MnT/PSN offline) → risk of auth blackout, lost logs, broken guest portals, failed CoA.
- Config vs Operational data confusion → restores succeed but history/reporting is gone.
- FQDN/IP changes & certificates → portals warn/break, pxGrid re-registration fails.
- Version/patch mismatch → restore aborts.
- External dependencies (AD/PKI/NTP/DNS/LB/VPN/WLC) not ready → auth errors although ISE is “up.”
We need a deterministic drill: rebuild ISE in DR, reattach dependencies, validate all flows, within RTO/RPO.
Solution Overview (How ISE supports DR)
- Backups:
- Configuration (ise-config) → nodes, personas, policy, endpoints, guests, certificates metadata, etc.
- Operational (ise-ops) → MnT logs, reports, RADIUS/TACACS records, posture, profiling data/history.
- Repositories: SFTP/FTP/NFS repositories; each backup is protected with an encryption key/passphrase.
- Restore Order: Build PAN (same version/patch) → restore config → join/restore MnT (ops) → add PSNs → attach LB VIPs/DNS → validate NADs.
- Identity/Trust: NTP/DNS/PKI/AD reachability; import the right cert chain; portal CN/SAN alignment.
- Cutover: GSLB/DNS or LB VIP swing; confirm CoA path and accounting integrity.
Sample Lab Topology (VMware/EVE-NG + NADs/WLC/Endpoints)
Sites
- DC-A (Primary): PAN-A, MNT-A, PSN-A1/A2, LB-A (F5/Citrix), AD-A, PKI-A, NTP/DNS-A
- DC-B (DR): PAN-B, MNT-B, PSN-B1/B2, LB-B, AD-B (or reachable AD), PKI-B, NTP/DNS-B
NADs / Access
- Catalyst 9300 (802.1X/MAB), WLC 9800 (CWA/BYOD), ASA/FTD VPN headend
Endpoints
- Win11 (EAP-TLS), iOS/Android (Guest/BYOD), IoT (MAB), Admin jump host
Topology Diagram:

Step-by-Step GUI Configuration Guide (with CLI)
Goal: Execute a full DR drill end-to-end. Follow Phases 0-6. Capture evidence at each step.
Evidence Pack Template (use for every step)
- Screenshot(s): [insert placeholders below]
- CLI outputs: command + timestamp
- KPI: RTO (time to auth restore), RPO (data currency), Success rate, Failures
- Decision: PASS/FAIL, remediation
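The evidence pack and its KPIs can be kept as structured records rather than loose notes. The sketch below is one possible layout, not an ISE export format: the `DrillStep` class and the `rto`/`rpo` helpers are hypothetical names, and the timestamps are illustrative.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class DrillStep:
    """One evidence-pack entry (hypothetical structure)."""
    name: str
    passed: bool
    cli_outputs: list = field(default_factory=list)  # (timestamp, command, output)
    remediation: str = ""


def rto(failure_at: datetime, auth_restored_at: datetime) -> timedelta:
    """RTO: elapsed time from the simulated failure to restored authentication."""
    return auth_restored_at - failure_at


def rpo(last_backup_at: datetime, failure_at: datetime) -> timedelta:
    """RPO: age of the newest restorable backup at the moment of failure."""
    return failure_at - last_backup_at


failure = datetime(2025, 1, 10, 2, 0)
print(rto(failure, datetime(2025, 1, 10, 2, 47)))  # 0:47:00
print(rpo(datetime(2025, 1, 10, 1, 50), failure))  # 0:10:00
```

Recording the failure and recovery timestamps as the drill runs makes the final RTO/RPO figures a subtraction rather than an estimate.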
Phase 0 — Pre-Drill Baseline & Snapshot (in DC-A, before failure)
0.1 Verify cluster health
GUI: Administration → System → Deployment
→ All nodes Registered/Reachable; Replication In Sync.
[Screenshot: Deployment Status / Replication]

CLI (all nodes):
show application status ise
show cpu
show memory
show disks
show ntp
0.2 Verify AD/PKI/DNS/NTP
GUI: AD: Administration → Identity Management → External Identity Sources → Active Directory
→ Joined / Connectivity OK.
GUI: Certificates: Administration → System → Certificates → System Certificates
→ EAP/Portal/Admin certs valid; chain installed.
[Screenshot: AD Join OK] [Screenshot: Cert list]


0.3 Create/Validate Repositories
GUI: Administration → System → Maintenance → Repositories
→ Add SFTP repo (test connectivity).

CLI (PAN-A):
configure terminal
repository SFTP_DR
url sftp://10.10.50.10/ISEBackups
user backupuser password plain <SFTP_PASS>
end
show repository SFTP_DR
0.4 Take fresh backups (time-stamped)
GUI: Administration → System → Backup & Restore → Backup
- ise-config → repository SFTP_DR → encryption key <ENC_KEY>
- ise-ops (from MNT-A) → repository SFTP_DR → same <ENC_KEY>
[Screenshot: Backup Jobs Success]

CLI Alternative (PAN-A/MNT-A):
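On PAN-A:
backup ise_cfg_<YYYY-MM-DD_HHMM> repository SFTP_DR ise-config encryption-key plain <ENC_KEY>
On MNT-A:
backup ise_ops_<YYYY-MM-DD_HHMM> repository SFTP_DR ise-operational encryption-key plain <ENC_KEY>
Note: the ISE CLI is not a shell and does not expand $(date ...); type the timestamp in the backup name yourself. The CLI keyword for operational data is ise-operational.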
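Time-stamped backup names in the ise_cfg_YYYY-MM-DD_HHMM style used in this step can be generated on an automation host and then passed to the backup command. A minimal sketch, assuming a hypothetical helper named `backup_name`:

```python
from datetime import datetime


def backup_name(prefix: str, now: datetime) -> str:
    """Return a name such as ise_cfg_2025-01-10_0145 (date, then HHMM)."""
    return f"{prefix}_{now:%Y-%m-%d_%H%M}"


stamp = datetime(2025, 1, 10, 1, 45)
print(backup_name("ise_cfg", stamp))  # ise_cfg_2025-01-10_0145
print(backup_name("ise_ops", stamp))  # ise_ops_2025-01-10_0145
```

Using the same timestamp for the config and ops names makes it easy to pair the two backups later during the restore.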
0.5 Record baselines (for later comparison)
- Live Auth Success Rate (e.g., last 60 min)
- Latest accounting sequence
- Guest/BYOD active sessions
[Screenshot: Live Logs / Reports]
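The baseline success rate is easier to compare later if it is recorded as a number, not only as a screenshot. A minimal sketch, assuming the passed/failed counts are read manually from Live Logs or a report for the chosen window; the function name and sample counts are illustrative.

```python
def success_rate(passed: int, failed: int) -> float:
    """Percentage of successful authentications in the sample window."""
    total = passed + failed
    if total == 0:
        raise ValueError("no authentications in the window")
    return round(100.0 * passed / total, 2)


baseline = success_rate(passed=9850, failed=150)
print(f"Baseline auth success rate: {baseline}%")  # Baseline auth success rate: 98.5%
```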
Phase 1 — Simulate Disaster in DC-A (Primary site down)
1.1 Trigger failure (choose one)
- Power off PAN-A, MNT-A, PSN-A1/A2, or
- Blackhole DC-A routing, or
- Disable vNICs (for isolation).
CLI Proof (remote ping/trace): Targets unreachable.
Expected: Clients start failing if traffic still targets DC-A. RTO clock starts now.
Phase 2 — Bring Up DR PAN (PAN-B) and Restore Configuration
PAN-B must run the same ISE version and patch level as the deployment that produced the backup; otherwise the restore aborts.
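The version/patch rule can be checked before the restore is attempted. The sketch below compares values an operator has copied from show version on both sides; `IseBuild` and `restore_compatible` are hypothetical names, the version strings are illustrative, and comparing only the highest patch number is an assumption based on ISE patches being cumulative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IseBuild:
    version: str    # full version string as read from `show version`
    patches: tuple  # installed patch numbers, e.g. (1, 2, 3)


def restore_compatible(source: IseBuild, target: IseBuild) -> list:
    """Return a list of mismatches; an empty list means the builds match."""
    problems = []
    if source.version != target.version:
        problems.append(f"version {target.version} != {source.version}")
    if max(source.patches, default=0) != max(target.patches, default=0):
        problems.append("patch level differs")
    return problems


pan_a = IseBuild("3.2.0.542", (1, 2, 3))
pan_b = IseBuild("3.2.0.542", (1, 2))
print(restore_compatible(pan_a, pan_b))  # ['patch level differs']
```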
2.1 Deploy PAN-B VM (same resources/version/patch)
- Install OVA/ISO in VMware/EVE-NG.
- Initial CLI setup (hostname, IP, DNS/NTP, domain search).
CLI (PAN-B):
show version
show timezone
show ntp
ping <SFTP_DR>
ping <AD>
ping <DNS>
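Ping only proves ICMP reachability. From the admin jump host, a TCP-level check of the services the restore depends on can be sketched as follows; `tcp_reachable`, `check_all`, and the dependency table are hypothetical, and only the SFTP address comes from this guide.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_all(dependencies: dict) -> dict:
    """Map each dependency name to its reachability result."""
    return {name: tcp_reachable(host, port)
            for name, (host, port) in dependencies.items()}


# Hypothetical table: extend with the DR site's AD, DNS, and PKI endpoints.
DEPENDENCIES = {
    "SFTP repository": ("10.10.50.10", 22),
}
```

Calling `check_all(DEPENDENCIES)` returns a dictionary such as `{"SFTP repository": True}` that can go straight into the evidence pack.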
2.2 Import CA chains & Admin/EAP/Portal certs (if reusing FQDNs)
GUI: Administration → System → Certificates → Trusted Certificates
→ import roots/intermediates.
GUI: System Certificates
→ import Admin/EAP/Portal certs (match FQDN strategy).
[Screenshot: Cert Import Complete]
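Whether a certificate will raise browser or supplicant warnings after cutover depends on the DR FQDNs appearing in its SAN list. The check below is a simplified sketch of DNS-name matching (exact match, or a wildcard covering exactly one left-most label); `san_matches` is a hypothetical helper and the `dc-a.corp` names are illustrative.

```python
def san_matches(fqdn: str, san_entries: list) -> bool:
    """True if fqdn matches any DNS SAN entry; '*.' covers one label only."""
    fqdn = fqdn.lower().rstrip(".")
    for entry in san_entries:
        entry = entry.lower().rstrip(".")
        if entry == fqdn:
            return True
        if entry.startswith("*."):
            # A wildcard replaces exactly the left-most label.
            _, _, parent = fqdn.partition(".")
            if parent and parent == entry[2:]:
                return True
    return False


sans = ["portal-vip.dr.corp", "*.dc-a.corp"]
print(san_matches("portal-vip.dr.corp", sans))  # True
print(san_matches("psn-a1.dc-a.corp", sans))    # True
print(san_matches("a.b.dc-a.corp", sans))       # False
```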

2.3 Restore Configuration Backup to PAN-B
GUI: Administration → System → Backup & Restore → Restore

- Select the ise-config backup from SFTP_DR
- Provide the encryption key <ENC_KEY>
- Confirm restore → node reboots.
CLI Alternative (PAN-B):
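configure terminal
repository SFTP_DR
url sftp://10.10.50.10/ISEBackups
user backupuser password plain <SFTP_PASS>
end
restore <config-backup-file> repository SFTP_DR encryption-key plain <ENC_KEY>
Here <config-backup-file> is the full file name of the ise_cfg_YYYY-MM-DD_HHMM backup as listed by show repository SFTP_DR.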
2.4 Post-restore validation (PAN-B)
CLI:
show application status ise
show logging application ise-psc.log tail
show replication status
GUI: Deployment
should now show PAN-B as Primary Admin (solo).
[Screenshot: Deployment with PAN-B Primary]
2.5 Re-establish AD join (if needed)
GUI: AD page → Join with service account; test connectivity.
CLI Tail:
show logging application ise-identity.log tail
Phase 3 — Build MnT in DR and Restore Operational Data
3.1 Register MNT-B (Monitoring)
GUI (PAN-B): Administration → System → Deployment → Register Node
- Add MNT-B with Monitoring persona (Active).
[Screenshot: Register MnT]

CLI (MNT-B):
show application status ise
3.2 Restore Operational Backup on MNT-B
GUI (PAN-B → MnT UI or MNT-B): Backup & Restore → Restore
→ select the ise-ops backup from SFTP_DR with <ENC_KEY>.
CLI Alternative (MNT-B):
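show repository SFTP_DR
restore <ops-backup-file> repository SFTP_DR encryption-key plain <ENC_KEY>
Here <ops-backup-file> is the full file name of the ise_ops_YYYY-MM-DD_HHMM backup as listed by show repository SFTP_DR. If the repository is not yet defined on MNT-B, configure it first as in step 2.3.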
Validate:
- Operations → Reports → Authentications → historic data present.
- Operations → RADIUS → Live Logs → new logs appear after PSNs join.
[Screenshot: Reports populated]
Phase 4 — Stand Up PSNs in DR and Reattach Access Network
4.1 Deploy PSN-B1/B2 VMs (same version/patch)
CLI (each PSN):
show version
show ntp
show dns
4.2 Register PSNs to PAN-B
GUI (PAN-B): Deployment → Register Node
→ add PSN-B1/B2 with Policy Service persona.
[Screenshot: Register PSN-B1/B2]

Validate (CLI on PSN):
show application status ise
Expected: all processes running; replication reaches In Sync once the initial sync completes.
4.3 Certificates for PSNs (EAP/Portal as needed)
GUI: Import/install EAP/Portal certs on PSNs (CN/SAN must match LB VIP FQDN strategy if SSL offload not used).
[Screenshot: PSN System Certificates]
4.4 Load Balancer in DR (F5/Citrix)
- Pools: PSN-B1/B2 members for 1812/1813/443
- Monitors: RADIUS/HTTPS
- Persistence: Source-IP (RADIUS) / Cookie (HTTPS)
- VIPs: rad-vip.dr.corp:1812/1813, portal-vip.dr.corp:443
[Screenshot: LB Pools/VIPs Up]
LB CLI quick checks
- F5:
tmsh show ltm pool
tmsh show ltm virtual
- Citrix:
show lb vserver
show serviceGroup
4.5 DNS/GSLB swing (or device re-point)
- Preferred: Update DNS records (short TTL) so NADs/WLC/VPN resolve VIP FQDN to DR VIP IPs.
- Fallback: Change NAD RADIUS server IPs to DR VIPs.
Validate (NAD CLI):
ping rad-vip.dr.corp
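A ping confirms that the name answers, but not that it answers from the DR site. The sketch below checks that every address returned for the VIP FQDN is a DR VIP; the function names and the DR address `10.20.0.10` are hypothetical, and the resolver is injectable so the logic can be exercised without live DNS.

```python
import socket


def resolve_ipv4(fqdn: str) -> set:
    """Resolve fqdn to its set of IPv4 addresses using the system resolver."""
    infos = socket.getaddrinfo(fqdn, None, family=socket.AF_INET)
    return {info[4][0] for info in infos}


def swing_complete(fqdn: str, dr_vips: set, resolver=resolve_ipv4) -> bool:
    """True only if every address returned for fqdn is a DR VIP."""
    answers = resolver(fqdn)
    return bool(answers) and answers <= dr_vips


# Stubbed resolver standing in for DNS; 10.20.0.10 is a made-up DR VIP.
stub = lambda name: {"10.20.0.10"}
print(swing_complete("rad-vip.dr.corp", {"10.20.0.10"}, resolver=stub))  # True
```

Run from several vantage points (and after the old TTL has expired), this catches resolvers that still hand out the primary-site address.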
Phase 5 — End-to-End Authentication & Services Validation (Critical)
Run all tests; record timestamps; compare to baseline.
5.1 Wired 802.1X (EAP-TLS)
Switch CLI:
test aaa group radius <user> <pass> legacy
show authentication sessions interface Gi1/0/10 details
show radius statistics
ISE GUI: Live Logs → Access-Accept on PSN-B via RADIUS VIP.
[Screenshot: Live Logs Wired]
5.2 Wireless (PEAP/EAP-TLS) via WLC 9800 + CWA redirect
WLC CLI:
show aaa servers
show wireless client mac-address <mac> detail
Portal: Browse https://portal-vip.dr.corp/guestportal
→ no cert warnings; login succeeds.
[Screenshot: Guest Portal Landing → Success]
5.3 VPN (ASA/FTD) AAA
ASA/FTD CLI:
test aaa-server authentication <server-group> host <VIP-IP> username <u> password <p>
ISE GUI: Live Logs show VPN device as NAD; Accept.
[Screenshot: Live Logs VPN]
5.4 CoA / DACL Enforcement
Trigger a rule causing CoA Reauth.
Switch CLI:
debug radius
show ip access-lists <dacl-name>
Validate: CoA packets source from PSN-B IPs (not LB); DACL applied.
5.5 Accounting Integrity
ISE GUI: Accounting Logs show Start → Interim → Stop continuity; NAS-Session-ID consistent.
SIEM: Receives MnT syslog stream.
[Screenshot: Accounting Sequence]
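Start → Stop continuity can be checked mechanically once accounting records are exported. A minimal sketch, assuming each record has been reduced to a (session ID, Acct-Status-Type) pair; the export step, the function name, and the sample data are all illustrative.

```python
def orphaned_starts(records: list) -> set:
    """Return session IDs that have a Start but no Stop.

    Each record is a (session_id, status_type) pair, where status_type is
    "Start", "Interim-Update", or "Stop".
    """
    started, stopped = set(), set()
    for session_id, status in records:
        if status == "Start":
            started.add(session_id)
        elif status == "Stop":
            stopped.add(session_id)
    return started - stopped


sample = [
    ("S1", "Start"), ("S1", "Interim-Update"), ("S1", "Stop"),
    ("S2", "Start"), ("S2", "Interim-Update"),
]
print(orphaned_starts(sample))  # {'S2'}
```

Sessions that are still active also have no Stop yet, so the result should be cross-checked against the live session list before it is counted as a failure.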
5.6 Profiling & Posture
- AnyConnect posture flow completes; compliance changes hit correct authz results.
- ISE GUI: Posture Dashboard updates; Profiler sees DHCP/CDP/HTTP probes.
[Screenshot: Posture / Profiler]
Phase 6 — Acceptance, Metrics, and Evidence Closure
6.1 Acceptance Criteria (tick all)
- RTO ≤ target (e.g., ≤ 60 min to stable authentications)
- RPO ≤ target (e.g., ≤ 15 min config; ≤ 60 min ops data)
- Wired/Wi-Fi/VPN success rate within baseline ±1%
- CoA success ≥ 99%
- Accounting completeness verified; no orphaned starts
- No portal cert warnings; CN/SAN correct
- Replication In Sync across PAN↔PSNs and PAN↔MnT
- AD/NTP/DNS/PKI connectivity green
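The measurable criteria above reduce to a few comparisons. The sketch below uses the example targets from this list (60-minute RTO, baseline ±1%, CoA ≥ 99%); the `evaluate` function and the sample measurements are hypothetical.

```python
from datetime import timedelta


def evaluate(rto, rto_target, baseline_pct, dr_pct, coa_pct):
    """Return {criterion: bool} for the measurable acceptance criteria."""
    return {
        "rto": rto <= rto_target,
        "auth_rate": abs(dr_pct - baseline_pct) <= 1.0,  # baseline ±1%
        "coa": coa_pct >= 99.0,
    }


results = evaluate(
    rto=timedelta(minutes=47), rto_target=timedelta(minutes=60),
    baseline_pct=98.5, dr_pct=97.8, coa_pct=99.4,
)
print("PASS" if all(results.values()) else "FAIL", results)
# PASS {'rto': True, 'auth_rate': True, 'coa': True}
```

The remaining criteria (certificate warnings, replication state, dependency health) still need a human tick against the screenshots.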
6.2 Evidence Pack (attach)
- Screenshots: Deployment, Certs, Live Logs, Reports, LB Pools, NAD tests
- CLI: show application status ise, show replication status, show radius statistics, LB outputs
- DNS/GSLB change records
- Timeline with start/stop timestamps, final RTO/RPO
Expert-Level Use Cases (for consultants & SREs)
- Blue/Green DR with Dual-Signed Portals
- Keep two PSN pools with different certs (SAN includes old+new FQDN); cut DNS with 30-sec TTL; zero browser warnings.
- Cloud DR (ISE in Azure/AWS) with ExpressRoute/DirectConnect
- Pre-deployed PAN/MnT cold standby images; scripted PSN bring-up via Terraform/Ansible; attach to cloud LB; on trigger, restore and scale out.
- GSLB-Driven Multi-Region Active/Standby
- PSNs active in both regions; PAN/MnT anchored in Primary; health-based DNS failover with site proximity and NAD location tagging.
- Immutable PSNs + Golden AMI
- Bake ISE patch + base cert trust into a hardened image; DR means “spawn PSNs,” register via API; minutes to capacity.
- Ops Data Offload to SIEM with Near-Zero RPO
- Stream MnT to SIEM; in DR, restore config only and point to SIEM for history; eliminates large ops restores.
- Automated NAD Re-Point
- Ansible/Prime/DNAC job to flip RADIUS servers to DR VIP; gather post-change show radius statistics and auto rollback if failures > threshold.
- CoA Path Assurance Matrix
- Pre-permit CoA (UDP/3799) from all PSN subnets to all NADs; synthetic CoA pings nightly; DR switch revalidates matrix.
- PKI Cutover in DR
- If CA differs in DR, pre-publish intermediates to endpoints and NADs; rotate EAP/Portal certs with dual trust windows.
- Split MnT Roles for High-Ingest Events
- During DR, temporarily scale MnT horizontally and shunt high-volume accounting to a dedicated MnT to avoid back-pressure.
- WAN-Partition Resilience Test
- Simulate PAN unreachable from remote PSNs (but NAD reachable). Show continuous auth with cached policy, and delayed log forwarding on reconvergence.
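For the Automated NAD Re-Point use case, the rollback decision is the part worth making explicit. A minimal sketch, assuming counters sampled before and after the change have already been parsed into dictionaries; the key names, the 5% default threshold, and the function name are hypothetical.

```python
def should_rollback(stats_before: dict, stats_after: dict,
                    max_failure_pct: float = 5.0) -> bool:
    """Decide on rollback from RADIUS counters sampled around the change.

    Both dicts hold cumulative 'requests' and 'timeouts' counters
    (hypothetical keys; map them from the NAD's RADIUS statistics output).
    """
    requests = stats_after["requests"] - stats_before["requests"]
    timeouts = stats_after["timeouts"] - stats_before["timeouts"]
    if requests <= 0:
        return True  # no traffic reached the DR VIP: treat as a failed change
    return 100.0 * timeouts / requests > max_failure_pct


before = {"requests": 1000, "timeouts": 10}
print(should_rollback(before, {"requests": 1200, "timeouts": 12}))  # False
print(should_rollback(before, {"requests": 1200, "timeouts": 60}))  # True
```

Working on counter deltas rather than absolute values keeps failures from before the re-point out of the decision.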
Quick CLI Reference
PAN/MnT/PSN
show application status ise
show version
show replication status
show logging application ise-psc.log tail
show logging application ise-identity.log tail
show ntp
show clock detail
show dns
configure terminal
repository SFTP_DR
url sftp://10.10.50.10/ISEBackups
user backupuser password plain <SFTP_PASS>
end
backup <name> repository SFTP_DR ise-config encryption-key plain <ENC_KEY>
backup <name> repository SFTP_DR ise-operational encryption-key plain <ENC_KEY>
restore <config-backup-file> repository SFTP_DR encryption-key plain <ENC_KEY>
restore <ops-backup-file> repository SFTP_DR encryption-key plain <ENC_KEY>
application stop ise
application start ise
Catalyst Switch
test aaa group radius <user> <pass> legacy
show authentication sessions interface Gi1/0/10 details
show radius statistics
WLC 9800
show aaa servers
show wireless client mac-address <mac> detail
F5 BIG-IP
tmsh show ltm pool
tmsh show ltm virtual
Citrix ADC
show lb vserver
show serviceGroup
FAQs – Cisco ISE Disaster Recovery Drill
FAQ 1. How often should I perform a Disaster Recovery Drill in ISE?
Best practice is at least once every 6–12 months, or after major changes such as:
- ISE upgrades/patches
- Addition of new PSNs/administration nodes
- Change in backup storage or replication design
- Enterprise DR policy changes
Some regulated industries (finance, healthcare, government) require quarterly DR testing.
FAQ 2. What components of Cisco ISE are covered in a DR Drill?
A proper drill validates:
- Administration Node Failover (Primary → Secondary)
- Monitoring & Logging Node Recovery (MNT sync + reports)
- Policy Service Node Failover (active authentication/authorization traffic)
- Database Restoration (backup restore test)
- Network Device Redundancy (switches, WLCs, routers pointing to backup PSNs)
FAQ 3. How do I verify that replication is healthy before a drill?
Use both GUI and CLI:
- GUI → Administration > System > Deployment → check replication status (green ✔️).
- CLI → show logging application ise-psc.log and show application status ise.
Replication must be in In Sync state before initiating a drill.
FAQ 4. What is the most common mistake engineers make during ISE DR drills?
The #1 mistake: forgetting to update network devices’ RADIUS server lists.
- Devices (Switches, WLCs, Firewalls) must point to both primary and backup PSNs.
- Otherwise, when the primary PSN fails, endpoints lose access.
FAQ 5. Can I run DR drills during production hours?
Not recommended.
- Failover tests cause temporary disruptions (replication pauses, admin logouts, PSN authentication delays).
- Always schedule drills during maintenance windows with stakeholder approvals.
FAQ 6. How do I simulate a real disaster in ISE?
Examples of realistic simulations:
- Power Down Test → shut down the primary admin node VM.
- Network Isolation Test → disconnect NIC of PSN to simulate network outage.
- Database Corruption Test → attempt a restore from backup.
- Logging Node Test → stop MNT services and validate report continuity.
FAQ 7. What happens to live authentications during failover?
- If PSNs are redundant → active sessions continue; new logins hit the backup PSN.
- If Admin node fails → authentications still work (PSNs don’t depend on PAN).
- If MNT fails → no disruption in access, but logs are lost until recovery.
FAQ 8. How do I validate success after the DR drill?
Validation Checklist:
- Users can authenticate via backup PSN.
- Backup Admin node is writable (policy changes allowed).
- Logs are replicated to backup MNT.
- CLI → show application status ise (all services running).
- Post-restore backup matches pre-disaster snapshot.
FAQ 9. Where should I store Cisco ISE backups for DR?
- Use remote secure storage (SFTP, NFS, or cloud storage).
- Avoid keeping backups in the same data center as the ISE nodes.
- Encrypt backups with a strong key, and maintain offsite copies for compliance.
FAQ 10. Can I automate Disaster Recovery in Cisco ISE?
Yes — partial automation is possible:
- Scripts/Playbooks → for backup scheduling and verification (the CLI backup command).
- F5 / Citrix ADC Load Balancer → to automate PSN failover for RADIUS/TACACS+.
- SIEM/SOAR integration → can trigger failover alerts & automation in DR scenarios.
But a manual drill is always required to ensure real-world readiness.
YouTube Link
For more in-depth Cisco ISE Mastery Training, subscribe to my YouTube channel Network Journey and join my instructor-led classes for hands-on, real-world ISE experience
Closing Notes (Key takeaways)
- Backups are not DR; restores + validations are.
- Restore Config on PAN, Ops on MnT, then rebuild PSNs and swing VIPs/DNS.
- Obsess over time (NTP) and trust (PKI)—most “ISE DR issues” are dependency failures.
- Capture an evidence pack and measure RTO/RPO every drill.