Planning insights from the Election Security Exchange

Planning Desk, Week E-22: Recovery—Restore, but Verify

Recovery is the final step in the third phase of your incident response process. Containment stopped the spread. Eradication removed the cause. Recovery is the orderly return to normal: bringing systems back online, restoring data, resuming normal operations, and watching closely for any sign that the problem returns.

The temptation at this point is to move fast. Leadership wants the all-clear, staff want their tools back, and the public wants the website up. Speed here is fine; sloppiness is not. A botched recovery undoes the work of the previous two steps.

Restore From a Known Clean State

The single most important question in recovery is: When did the incident actually start? Your detection log and analysis notes from earlier in the incident response process should provide a date and time, or at least a date window. Restore from a backup taken before that point, not after. If you only have a window, treat its earliest edge as the start.

This is where backup hygiene pays off or fails publicly. A backup you have never tested is a hope, not a recovery plan. Confirm that the backup is intact, that its date precedes the incident window, and that the restore actually works in a test environment before you put it into production.
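The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not a tool recommendation: the Backup record, its field names, and the pick_clean_backup function are hypothetical, and it assumes your office already tracks when each backup was taken and whether a test restore succeeded.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class Backup:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    taken_at: datetime
    restore_tested: bool  # True only if a test restore has actually succeeded


def pick_clean_backup(backups: Iterable[Backup],
                      incident_start: datetime) -> Optional[Backup]:
    """Return the newest tested backup taken strictly before the incident.

    If the incident start is only known as a window, pass the earliest
    edge of that window as incident_start.
    """
    candidates = [
        b for b in backups
        if b.taken_at < incident_start and b.restore_tested
    ]
    # Newest qualifying backup loses the least data; None if nothing qualifies.
    return max(candidates, key=lambda b: b.taken_at, default=None)
```

A result of None is itself useful information: it means no backup meets both conditions, and the restore decision needs to go back to whoever owns the incident.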

Bring Services Back in a Deliberate Order

A phased restoration is preferable. Prioritize essential systems, particularly those required in the next 24 to 72 hours, and those that you can monitor closely. Restoring a low-traffic system early can help you detect any recurrence in a manageable setting before reintroducing high-traffic systems, where problems can be more difficult to spot.
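One way to write a restoration order down ahead of time is as a simple sort. The sketch below is an assumption-laden example: the System record, its fields, the 72-hour horizon default, and the tie-breaking order (needed soon, then closely monitored, then lower traffic) are one reading of the guidance above, not a prescribed method.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class System:
    # Hypothetical inventory record; all fields are illustrative assumptions.
    name: str
    needed_within_hours: int   # how soon operations depend on this system
    closely_monitored: bool    # can the team watch it closely after restore?
    relative_traffic: int      # rough rank, higher means busier


def restoration_order(systems: Iterable[System],
                      horizon_hours: int = 72) -> List[System]:
    """Sort systems into a phased restoration order.

    Systems needed within the horizon come first. Within each group,
    closely monitored systems precede others, and lower-traffic systems
    precede higher-traffic ones so recurrence is easier to spot early.
    """
    def sort_key(s: System):
        return (
            s.needed_within_hours > horizon_hours,  # False sorts before True
            not s.closely_monitored,
            s.relative_traffic,
        )
    return sorted(systems, key=sort_key)
```

The value is less in the code than in the agreement it forces: the fields have to be filled in, and someone has to sign off on the order, before an incident rather than during one.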

Watch Closely After the Lights Come Back On

The period immediately after recovery is when your team is tired and most likely to miss something. It is also when sophisticated attackers often try again from a different angle, having learned what worked and what did not. Set a defined heightened-monitoring window before you declare the incident closed. This could be several days for routine incidents or longer for anything involving credential compromise, persistent access, or unknown root cause.

The monitoring should include the obvious (logs, alerts, the systems you just restored) and the less obvious (related accounts, connected systems, the same attack pattern aimed at a different entry point).
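A heightened-monitoring window is easier to defend if it is computed rather than debated. The following sketch assumes two hypothetical tiers and placeholder durations (3 and 14 days); the tier names, trait labels, and day counts are illustrative only and should come from your own Incident Response Plan.

```python
from datetime import datetime, timedelta
from typing import Iterable

# Placeholder durations; set real values in your Incident Response Plan.
MONITORING_DAYS = {"routine": 3, "elevated": 14}

# Traits named above as warranting a longer window (labels are assumptions).
ELEVATED_TRAITS = {
    "credential_compromise",
    "persistent_access",
    "unknown_root_cause",
}


def monitoring_window_end(restored_at: datetime,
                          incident_traits: Iterable[str]) -> datetime:
    """Return when the heightened-monitoring window ends."""
    # Any single elevated trait is enough to select the longer window.
    tier = "elevated" if ELEVATED_TRAITS & set(incident_traits) else "routine"
    return restored_at + timedelta(days=MONITORING_DAYS[tier])
```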

Do Not Declare Victory Too Early

The incident is not over when service is restored. It is over when the heightened-monitoring window has passed without recurrence, the documentation is complete, and the post-incident review (next issue) is on the calendar. Until then, the Incident Response Plan stays active.
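The closure conditions above amount to a checklist in which every item must be true. A minimal sketch, with hypothetical function and parameter names:

```python
from datetime import datetime


def can_close_incident(now: datetime,
                       window_end: datetime,
                       recurrence_seen: bool,
                       documentation_complete: bool,
                       review_scheduled: bool) -> bool:
    """Return True only when every closure condition is met."""
    window_passed = now >= window_end
    return (
        window_passed
        and not recurrence_seen
        and documentation_complete
        and review_scheduled
    )
```

Restored service does not appear among the inputs on purpose: it is a precondition for starting the window, not a reason to close the incident.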

Actions You Can Take

Validate backups on a schedule, not just when you need them. Quarterly is reasonable for most offices. Time how long it takes to restore, document the steps, and fix what doesn’t work.
Define recovery order in advance for your critical systems: which ones are restored first, which can wait, and who decides.
Set heightened-monitoring windows by severity tier. Write them into your Incident Response Plan. Do not negotiate them during recovery.
Assign a recovery monitor. Identify someone whose only job during the post-recovery window is watching for recurrence. This person should be separate from the staff getting the office back to normal.
Schedule the post-incident review before recovery ends. Putting it on the calendar is the only way to make sure it actually happens.

Recovery closes Phase 3, but it does not close the incident. Next week we wrap up this series on the incident response process with Phase 4, Post-Incident Activity: the work that determines whether the next incident catches you in the same place.

The Planning Desk is a running timeline of key election security tasks. You can find prior editions in the newsletter archive.

Situation Room: The Canvas Breach Through an Election Security Lens