Security Incident Response Plan Template

AWS Platform Guide

<DATE>

Response Introduction

This is the Security Incident Response Plan for <APPLICATION/SYSTEM NAME> that documents the procedures for responding to a security incident.

The goal of the Security Incident Response Plan is to provide tools to help technical staff who are responsible for supporting systems to effectively respond to security incidents, and to minimize any negative impact to institutional operations through a set of detection, analysis and recovery activities.

System Overview

TODO: add a starter list of artifacts that thoughtbot typically delivers.

Architecture Model

<Attach a high-level diagram of the application/system data flow and data storage, including all interconnected system names and networks (e.g., public vs. private)>

System Hardware Inventory

<Insert hardware inventory of covered devices, e.g., provide a link to the inventory file>

Audit Logging

<Insert details of the application system audit logging processes. Include details about where the log files are located, as well as a brief description of the events captured in each log file>

TODO: Insert standard thoughtbot approach to logging and alerting.

Glossary

Breach - A breach is defined as a security incident that requires notification to affected individuals, where the protected data of the affected individual was, or is reasonably believed to have been, accessed by an unauthorized person.

Event - An event is any observable occurrence on a system or network, e.g. system login, file creation or application transaction.

Security Incident - An event, or series of events, that violates, or is about to violate, campus policies and standards.

System Contacts

The <APPLICATION/SYSTEM NAME> Incident Response Team includes the following staff:

Incident Management Team	Email	Phone Number

The Incident Management Team are engineers who have technical knowledge of the application/system and full understanding of the Security Incident Response Plan. The first point of contact should be a designated On-Call Engineer.

<If there is an on-call rotation, insert instructions on how to access the rotation>

Client Contact	Email	Phone Number

Client Contacts ** are people who understands the business impact of the system and its unavailability, and can make decisions on behalf of the client team.

<Insert other points of contact and their information, and/or instructions for hierarchy of communication>

Incident Response Procedures

Guideline

Incident Response

<Example of response flowchart below. Insert one that is relevant to your team>

The high level incident response steps can be summarized in the flowchart above. The sections below outline detailed procedures for each step.

Step 1: Detection

The monitoring tools and alert systems are set up to notify the proper contacts automatically should suspicious events occur. All events reported by end users of the application/system should be directed to the Incident Management Team, starting with a designated On-Call Engineer.

The On-Call Engineer will then notify others within the Client and thoughtbot teams as laid out in the System Contact section.

Step 2: Response

The On-Call Engineer (with support from the Incident Management Team, if needed) should start by determining whether suspicious system events constitute a Security Incident.

<Steps can include, but not limited to the following. Edit as needed>

Gather information about incident: sequence of events, including any unexpected observations and/or the actions taken so far
List the devices/systems that are affected, their functions, and the number of users that might be impacted
Review the evidence of compromise (email samples, screenshots, suspicious files, etc.)
Document any observations and decisions made based upon discussion or evidence analysis.

The rationale used to determine whether the suspicious events constitute a Security Incident must be documented.

If the suspicious events are confirmed to be a Security Incident, continue to the next step to Assess Initial Impact.

If the suspicious events are one-time computer glitches or user misjudgement, and does not fit the definition of a Security Incident, document the decision and close the incident response process.

Step 3: Resolution

The Incident Management Team should proceed with the following steps to return affected systems to a normal operational state:

<Insert relevant steps. Some examples below:>

Change passwords that are affected
If available, collect any operating system or application level audit logs to capture events that show what data was accessed, when the suspicious event started/stopped, and what action was performed
Repair affected system

Examples of compromises and the corresponding remediation procedures are provided below. Note that depending on the nature of compromise, a combination of remediation procedures may be needed to fully address the cause:

Scenario	Resolution Procedures

<Fill in common scenarios and procedures. Should be reviewed/updated after each incident post-mortem>

Step 4: Analysis

After the system has been returned to an operational state, the persons listed below should gather for a blameless post-mortem to assess the response. They should thoroughly document actions performed and observations made during the incident response process, and any improvements needed before the next incident.

<insert persons for post-mortem>

Step 5: Readiness

If the post-mortem resulted in any action items to improve the process of responding to the next incident, follow up on these items, such as:

Update the runbook
Implement prevention plans
Improve monitoring/alerting

AWS Platform Guide

The guide for building and maintaining production-grade Kubernetes clusters with built-in support for SRE best practices.

Source available on GitHub.

Work with us to scale your application, improve stability, and increase the rate of defect-free deployments.

Get Started