What is an Incident?
An unplanned interruption to an IT service
A reduction in the quality of an IT service
Any disruption to an IT service
Anything that is broken, down, or not working
Examples:
A user is unable to send or receive email because of an Exchange server issue.
A laptop has crashed or cannot run some basic applications.
A user is unable to connect to the VPN or Wi-Fi.
What is ITIL?
ITIL (Information Technology Infrastructure Library) is a framework that
provides best practices, guidance, and practical examples for delivering and
managing IT services.
What is Incident Management?
Incident management is the process of restoring normal service operation as
soon as possible while keeping the impact on the business low and maintaining
service quality.
An incident can be reported by a user by email, telephone, or chat, or by
logging it directly in the instance. It can also be raised by a technical or
IT support person, or detected by monitoring tools.
Priority is defined according to impact, the incident is assigned to the right
group for quick resolution, and it is escalated to the next level if it is not
resolved.
The incident management process consists of the following steps:
Incident Identification
Incident Logging
Incident Categorization
Incident Prioritization
Incident Assignment
Initial diagnosis
Escalation, as necessary, for further investigation
Incident resolution
Incident Closure
Incident management also ensures communication with the user community
throughout the life of the incident. Any user can record an incident and track
it through the entire incident life cycle until service is restored and the
issue is resolved. Reports are used to monitor, track, and analyze service
levels and improvement.
Tables used in incident reporting
Incident (incident)
Incident metrics (incident_metric)
Incident SLA (incident_sla)
Service orchestration is the process of designing, creating, delivering, and
monitoring service offerings in an automated way. Service definitions created
in the product catalog are taken right through the ordering process for fast
and effective service orchestration.
Top 30+ Incident Management Interview Questions and Answers
What is the difference between an incident and a service request?
A service request is a predefined, orderable item, whereas an incident is
unpredictable and unexpected.
A service request is usually handled against an agreed delivery expectation
(for example, 10 days to deliver a laptop) rather than a strict time frame,
whereas incidents affecting IT services are SLA bound (unless the ticket is
only a question or query).
Example: resetting a password is a service request, whereas a user being unable
to authenticate against Active Directory (AD) is an incident.
Service requests are usually less critical than incidents.
What is the difference between an incident and a problem?
A problem is raised when incidents recur, when an incident cannot be resolved,
or when an incident has only a workaround and no permanent fix.
An incident is a single event that causes a service disruption.
Incident management focuses on resolving the issue and restoring service,
whereas problem management focuses on finding the root cause of the issue and
then removing it.
What are incident templates?
These are prefilled templates or forms for registering an incident. They help
service desks and support teams register incidents faster while on a call with
a user.
What is priority?
Priority is derived from a matrix that combines impact and urgency. It helps a
support team understand which actions must be taken, and within what time, to
resolve the issue. Service level agreements are also designed around priority
and are often used to measure KPIs.
What are impact and urgency, and how are they measured?
Impact is measured by the effect of an incident on the business. It can be
low, medium, or high, and is determined by how many users and which business
functions are affected. For example, if the email server is down for one user,
the impact is low; if it is down for the whole company, the impact is high.
Urgency measures how quickly the incident needs to be resolved for the
business. An incident may have a high impact but a low urgency; it all depends
on the business.
What do you understand by a workaround?
A workaround is a temporary solution that reduces the impact of an incident.
The incident can later be investigated as a problem to find the root cause and
a permanent fix. A workaround is also used when there is a known error for
which a permanent fix has not yet been found.
What is the importance of incident management?
Restore the service and normal operations as soon as possible.
Increase continuous delivery.
Provide a resolution or workaround for an incident.
Meet service level agreements and provide quality service and service
availability.
Increase user satisfaction and trust.
Higher productivity and efficiency.
Improve documentation and analysis and provide reporting.
Share some examples of incident management KPIs.
1. Average response time
2. First-call resolution rate
3. Average resolution time
4. SLA compliance rate
5. Percentage of major incidents
6. End-user satisfaction rates
Define self-recovery.
Self-recovery is an incident resolution type in which the system has
automatically restored service or resolved the issue.
Define known errors.
Known errors are issues that have a known root cause but do not yet have a
permanent fix.
How to handle recurring incidents?
Incidents, by nature, are unpredictable. But that doesn't mean you can't be
prepared for them. Having a plan in place to deal with recurring incidents
can help you resolve them more quickly and efficiently. Here are a few tips on
how to handle recurring incidents:
First, identify the root cause of the incident. Is there a particular trigger
that sets it off? If so, try to avoid or remove that trigger if possible.
Second, document the steps you took to resolve the incident so that you can
refer back to them if it happens again.
Third, keep track of any trends in the incident (e.g., does it happen more
often at certain times of day or week?) and take steps to address them.
What is business impact analysis?
What is incident escalation?
What is a major incident?
What is an alert?
What is the process involved in the incident management lifecycle?
When can an incident be resolved?
What is the difference between incident resolution and incident closure?
How to prevent incidents from happening in the first place?
How does ITIL help in an event or service disruption?
Is it possible to relate an incident with another record, for example, a
problem? If yes. How would you do it?
How do you know when to implement an incident management system?
Why is an effective incident response important?
Do you know any incident management best practices?
ITIL Certifications.
Incident Manager Interview Questions
How much experience do you have in the incident management process?
What was the most complex incident management process you have handled?
Which incident management software systems have you worked with?
How do you handle incident escalations?
That was the last of our 30+ Incident Management Interview Questions, which we
prepared for you today. We hope that this article was useful.
What is the main objective of incident management process? (interviewquestions.guru)
What is an Incident?
An incident is an unplanned event that impacts normal business operations.
Incidents can be caused by various factors, including human error, natural
disasters, or cyber-attacks.
What is Incident Management Process?
The incident management process is a set of steps and procedures that your
organization can use to respond to and recover from incidents.
Who is the incident management coordinator?
The incident management coordinator is responsible for leading the incident
management process. This role may be filled by a senior executive, such as the
CIO or CEO, or a designated member of your IT team.
Which is the first step in the incident management process?
The first step in the incident management process is identifying and assessing
the Incident. This involves determining the cause of the incident, the impact
on business operations, and any potential risks or vulnerabilities.
Definition: A vulnerability is a security flaw, glitch, or weakness found in
software code that could be exploited by an attacker (threat source).
ServiceNow Vulnerability Response is a vulnerability management solution that
enables organizations to identify, prioritize, and remediate vulnerabilities
across their IT infrastructure. The solution integrates with various
vulnerability scanners and other security tools to provide a comprehensive
view of an organization’s vulnerability landscape.
What are the stages of incident management?
The stages of incident management are identification, assessment, response, and
recovery.
What is the main objective of incident management process?
The main objective of the incident management process is to protect your
organization's data and systems and to ensure that business operations resume
as quickly as possible. The process should also help you maintain productivity
and business continuity during and after an incident.
Key Components of an Effective Incident Management Process
There are several key components that an effective incident management process
should include:
A plan for identifying and responding to incidents.
A plan for restoring normal operations.
A communication plan.
A team of incident response experts.
How can an effective incident management process help your organization?
An effective incident management process can help your organization by:
Preventing or minimizing the impact of incidents.
Reducing the time it takes to recover from an incident.
Ensuring that critical business functions are maintained during and after an
incident.
How to improve the incident management process?
If you feel that your organization's incident management process could be
improved, here are a few tips:
Review and update your plan regularly.
Train your team on how to respond to incidents.
Test your plan regularly.
Stay prepared for the unexpected.
The incident management process is an important part of any business
continuity plan. Having a well-defined process in place can help protect your
organization's people, property, and information.
What is KPI in incident management?
A key performance indicator (KPI) is a metric that can measure the success of
an incident management process. KPIs can include measures such as the time it
takes to resolve incidents, the number of incidents per month, or the
percentage of incidents resolved within a specific timeframe. Choosing the
right KPIs can help you to track the effectiveness of your incident management
process and make necessary improvements.
Incident Management KPI Examples
Some common KPIs that can be used to measure the success of an incident
management process include:
The number of incidents per month.
The time it takes to resolve incidents.
The percentage of incidents resolved within a specific timeframe.
The number of critical systems or data breaches.
The impact of incidents on business operations.
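As an illustration of the "percentage of incidents resolved within a specific timeframe" KPI, here is a minimal Python sketch. The function name, the sample durations, and the four-hour target are hypothetical and chosen only for the example; they do not come from any particular tool.

```python
from datetime import timedelta


def percent_resolved_within(durations, target):
    """Return the percentage of resolution durations that are <= target."""
    if not durations:
        return 0.0
    within = sum(1 for d in durations if d <= target)
    return 100.0 * within / len(durations)


# Hypothetical resolution times for four incidents.
durations = [
    timedelta(hours=1),
    timedelta(hours=5),
    timedelta(hours=3),
    timedelta(hours=9),
]

# Two of the four incidents were resolved within the 4-hour target.
rate = percent_resolved_within(durations, timedelta(hours=4))
print(rate)  # 50.0
```

The same counting approach works for an SLA compliance rate if the target is taken from the SLA definition for each priority.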
Why is incident management important?
Incident management is important because it helps you protect your
organization's people, property, and information. The process should also help
you maintain productivity and business continuity during and after an incident.
Having a well-defined incident management process in place can help you
respond quickly and effectively to any incident.
During incident resolution, when is management notification appropriate?
There may be times when management notification is appropriate during the
resolution of an incident. These include situations where the incident has a
significant impact on business operations, or where there is a risk to the
safety of employees. Management should be kept informed of all major incidents
and any updates or developments related to them.
What are some common causes of incidents?
There are many different causes of incidents. Some of the most common causes
include:
Human error
Malicious activity
System failure
Accidental damage
What is MTTR in incident management?
Mean time to repair (MTTR) is a metric that measures the time it takes to
resolve an incident. MTTR is calculated by dividing the total amount of time
taken to resolve all incidents by the number of incidents. This metric can
help you to track the effectiveness of your incident management process and
make necessary improvements.
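The MTTR calculation described above (total resolution time divided by the number of incidents) can be sketched in Python as follows. The function name and the sample timestamps are hypothetical and serve only to show the arithmetic.

```python
from datetime import datetime, timedelta


def mean_time_to_resolve(incidents):
    """Total resolution time divided by the number of incidents.

    `incidents` is a list of (opened, resolved) datetime pairs.
    """
    if not incidents:
        raise ValueError("at least one resolved incident is required")
    total = sum((resolved - opened for opened, resolved in incidents), timedelta())
    return total / len(incidents)


# Hypothetical sample: three incidents resolved in 2, 4 and 3 hours.
sample = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 11, 0)),
    (datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 13, 0)),
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 12, 0)),
]

mttr = mean_time_to_resolve(sample)
print(mttr)  # 3:00:00
```

Here the total is 9 hours over 3 incidents, giving an MTTR of 3 hours.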
Basic understanding of Incident Management
1. What is the goal of Incident Management?
The goal of Incident Management is to restore normal service operation as
quickly as possible, while minimizing impact to business operations and
ensuring quality is maintained.
ServiceNow Incident Management supports the incident management process in the
following ways:
- Incident Identification
- Incident Logging
- Incident Categorization
- Incident Prioritization
- Incident Assignment
- Initial diagnosis
- Escalation, as necessary, for further investigation
- Incident resolution
- Incident closure
Incident Management also ensures communication with the user community
throughout the life of the incident.
Any user can record an incident and track it through the entire incident life
cycle until service is restored and the issue is resolved. Reports are used to
monitor, track, and analyze service levels and improvement.
2. What are the different ways in which you can log an incident?
- An ESS user can call a service desk agent
and the agent can log an incident based on the information provided by the
user.
- An ESS user can send an SMS to the
<<ServiceNow Customer Service>> number and an incident is
automatically created for the user.
Note: The user must install the Notify plugin and set up a Twilio account in
order to use the messaging service.
- An ITIL user can create an incident by
navigating to Incident > Create new in the application.
- An ESS or ITIL user can copy an existing
incident by clicking the Additional actions menu icon and selecting ‘Copy
Incident’. The ‘Copy Incident’ UI action copies the details of an existing
incident record to a new incident record. The user can then make
modifications to the fields as necessary.
Note: An ITIL user can copy or create any incident, whereas an ESS user can
copy only the incidents that the user has created.
- An ITIL user can create templates for
incidents that are logged frequently by navigating to System Definition
> Templates. This simplifies the process of submitting new records by
populating fields automatically. Later, while creating an incident, the
user can navigate to Incident > Create new, click the More options menu
icon and select ‘Toggle Template Bar’. All the existing templates are
displayed at the end of the incident form and the user can select the
required template to create an incident.
Note: Users with the itil role can create personal templates for incidents they log frequently. An administrator or a user with the template_editor_global role can create templates that are available to everyone. An administrator can enable the global option for any personal template that a user has entered, so that all agents have access to it.
- An ITIL user can create an incident template
and then create a pre-defined module with that template. A module can be
created by navigating to System Definition > Modules. On the module
form, in the ‘Arguments’ field, the user needs to provide the url for the
incident template from which a new incident has to be created. So,
whenever an incident is required, the user can just click the module from
the application left navigation pane.
- A user with the catalog_admin or admin role can
create a record producer by navigating to Service Catalog > Catalog
Definitions > Record Producers. Record producers appear in the service
catalog as catalog items. Hence, an ESS user can log an incident directly
from the Service Catalog using a record producer.
- An end user can request to create an incident
using the ‘Connect’ chat icon that appears in the upper-right corner of
the instance. In the chat window, the user can add an ITIL user and also
provide a short description of the issue. Based on the description, the
ITIL user creates an appropriate incident.
3. When should I create an Incident vs. a Request?
Create an incident when there is any unplanned interruption to, or degradation
in the quality of, an existing IT service. Create a request when you want to
formally ask the IT service desk to provide something. A request can be for
new hardware or an application, information, training, etc.
Example: If the existing RAM in your system is malfunctioning, create an
incident; if you want new RAM for your system, raise a request.
Incident Requests are requests that denote the failure or degradation of an IT
service. For example, unable to print, unable to fetch mails and so on.
Service Requests, on the other hand, are requests raised by the user for
support, delivery, information, advice or documentation. Some examples are
installing software on workstations, resetting a lost password, requesting a
hardware device and so on.
4. How can I assign an incident to a group or a user?
There are three ways to assign an incident to a group or a user:
- On the Incident form, there are two fields:
‘Assignment group’ and ‘Assigned to’. You can assign a group or a user by
clicking the lookup icon next to these fields and selecting an appropriate
group or a user.
- You can set assignment rules from System
Policy > Rules > Assignment Lookup Rules. In the Assignment Lookup
Rule form, provide the values for the fields such as ‘Category’,
‘Subcategory’, ‘Assignment Group’ and ‘Assigned To’. Assignment lookup
rule automatically assigns any incident with the pre-defined Category and
Subcategory to the assignment group and/or user that you have provided in
the Assignment Lookup Rule form.
- An ITIL user can self-assign incidents
using the ‘Assign to me’ option in the Actions menu that appears in the list
view.
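To make the assignment lookup rule idea concrete, here is a minimal Python sketch of the matching logic only: a table keyed by Category and Subcategory that yields an assignment group and, optionally, a user. It is not ServiceNow code; the rule entries, group names, and user name are hypothetical.

```python
from typing import NamedTuple, Optional


class Assignment(NamedTuple):
    group: str
    assigned_to: Optional[str]


# Hypothetical rules: (category, subcategory) -> assignment.
RULES = {
    ("network", "vpn"): Assignment("Network Support", None),
    ("software", "email"): Assignment("Messaging Team", "alex.example"),
}


def apply_assignment_rule(category, subcategory):
    """Return the matching assignment, or None when no rule applies."""
    return RULES.get((category.lower(), subcategory.lower()))


print(apply_assignment_rule("Network", "VPN"))
print(apply_assignment_rule("Hardware", "Mouse"))  # None: no rule defined
```

When no rule matches, the incident stays unassigned and someone has to set the ‘Assignment group’ and ‘Assigned to’ fields manually, as in the first option in the list.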
5. How do I change the Priority of an Incident?
On an Incident form, by default, the ‘Priority’ field is read-only and must be
set by selecting the ‘Impact’ and ‘Urgency’ values. For example, if you set
both ‘Impact’ and ‘Urgency’ to High, the ‘Priority’ field will be Critical,
whereas if you set ‘Impact’ to Medium and ‘Urgency’ to Low, the ‘Priority’
field will be Low. The mapping from Impact and Urgency to Priority is held in
data lookup rules in the Priority [dl_u_priority] table and can be modified
there.
An administrator can either alter the priority lookup rules (in the Priority
[dl_u_priority] table) or disable the “Priority is managed by Data Lookup - set
as read-only” UI policy and create their own business logic.
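The impact and urgency lookup can be pictured as a small table. The Python sketch below is only an illustration: the two combinations named in the text (High/High gives Critical, Medium/Low gives Low) come from the answer above, while the remaining cells and the "Moderate" and "Planning" labels are assumptions modeled on a commonly used default matrix and may differ in your instance.

```python
# 1 = High, 2 = Medium, 3 = Low for both impact and urgency.
PRIORITY_LABELS = {1: "Critical", 2: "High", 3: "Moderate", 4: "Low", 5: "Planning"}

# (impact, urgency) -> priority. Only (1, 1) and (2, 3) are stated in the text;
# the other entries are assumed for illustration.
PRIORITY_LOOKUP = {
    (1, 1): 1, (1, 2): 2, (1, 3): 3,
    (2, 1): 2, (2, 2): 3, (2, 3): 4,
    (3, 1): 3, (3, 2): 4, (3, 3): 5,
}


def lookup_priority(impact, urgency):
    """Return (priority number, label) for an impact/urgency pair."""
    priority = PRIORITY_LOOKUP[(impact, urgency)]
    return priority, PRIORITY_LABELS[priority]


print(lookup_priority(1, 1))  # (1, 'Critical')
print(lookup_priority(2, 3))  # (4, 'Low')
```

Changing a cell in the dictionary corresponds to an administrator editing a data lookup rule rather than writing custom business logic.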
6. What is the Configuration Item (CI) field that I see on the Incident form?
In the ‘Configuration item’ (CI) field, you select the component of the
business service that is affected and for which the incident is being logged.
For example: email, BlackBerry, e-commerce.
7. What is the importance of associating the right CI with an Incident?
The Configuration Management Database (CMDB) contains a collection of
configuration items (CIs) as well as descriptive relationships between those
CIs. When populated, the database becomes a means of understanding how
critical assets such as information systems are composed, what their upstream
sources or dependencies are, and what their downstream targets are.
CI relationships form a crucial part of the CMDB because, with relationships,
users accessing the CMDB can understand the inter-dependencies between the CIs
and, in the case of a failure, identify the impact on other CIs.
If you open an Incident form and enter a value in the Configuration item
field, you will notice that a Dependency view icon appears next to the lookup
icon. If you click the Dependency view icon, it shows the upstream and
downstream dependencies of that CI. If, say, the CI is email and it is not
working, you can check the dependency map to find the servers on which this CI
depends and then validate the servers one by one to get to the root cause of
the issue. This is possible only when the relationships between CIs are well
defined in the CMDB.
8. How do I associate multiple Business Services or CIs with the incident?
When you populate the ‘Business Service’ and the ‘Configuration Item’ fields
on the incident form and save the record, the selected values appear on the
“Impacted Services/CIs” and “Affected CIs” related lists respectively.
If you want to add multiple affected CIs or impacted services, use the “Add”
button provided on the related lists.
Note: If you have modified the cmdb_ci (configuration item) on the incident,
the related list of “Impacted Services/CIs” will not reflect the change unless
you do it manually using the “Refresh Impacted Services” UI action from the
context menu.
9. How do you find relevant knowledge articles on the Incident form? Can the
Short description field display knowledge articles?
On the Incident form, in the Short Description field, type the subject on
which you want to find relevant knowledge articles. You can also type the
subject in the Related Search field, in the Related Search Results section.
All the articles relevant to the subject appear in the Related Search Results
section.
A System Admin can configure the search results based on a specific user field
by performing the following actions:
a) Navigate to Contextual Search > Table Configuration.
b) Click on the ‘Incident [incident]’ configuration record.
c) In the ‘Search as’ tab, select the ‘Enable search as’ checkbox.
d) From the ‘Search as field’ choice list, select the field based on
which you want to see the filtered search results.
For example: Select the caller field (consider ITIL User as the caller).
After updating the incident configuration record, if you open any incident and
view the ‘Related Search Results’, you will now find two tabs:
- My Results: Includes search results for
the logged in user.
- ITIL User Results: Includes search
results that are common to both the logged in user and the user you have
selected in the ‘Search as field’ choice list.
Note that the search results display not only the relevant knowledge articles
but also related service catalog items. For each search record, an 'Order' or
an 'Attach' button appears so that you can order the service catalog item or
attach the knowledge article to the current incident record.
10. What is triage in Incident Management?
Triaging an incident involves two major activities: first, classifying the
incident into the right assignment group; second, involving the right set of
people in order to resolve the incident as quickly as possible. Identifying
the most appropriate assignment group or person for the incident is the most
basic purpose of triage in incident management.
11. What is triage in ITIL?
A process for sorting inefficient operations into ITIL processes based on the
client's need for, or likely business benefit from, immediate improvement. ITIL
triage is used in the data center, at disaster recovery sites, and in
boardrooms when limited financial resources must be allocated.
Incident Management Life Cycle
12. How does the incident state change when the caller updates an incident
that is in the On Hold state?
If the incident is in the On Hold state and the On hold reason is Awaiting
Caller, the incident state changes to In Progress when the caller updates the
incident. For all other On hold reasons, the incident remains in the On Hold
state.
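The rule above is a single conditional transition. A minimal Python sketch follows, with the state and reason names taken from the text; the function name and the "Awaiting Vendor" reason in the usage lines are used here only as an example of another on-hold reason.

```python
def state_after_caller_update(state, on_hold_reason=None):
    """Return the incident state after the caller updates the incident."""
    if state == "On Hold" and on_hold_reason == "Awaiting Caller":
        return "In Progress"
    # Any other state or on-hold reason is left unchanged.
    return state


print(state_after_caller_update("On Hold", "Awaiting Caller"))  # In Progress
print(state_after_caller_update("On Hold", "Awaiting Vendor"))  # On Hold
```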
13. Are the Resolution code and Resolution notes fields mandatory while
closing an incident? What if I do not have those fields on the form or use
scripts to avoid this entry?
Before Kingston, the ‘Close codes’ and ‘Close notes’ fields were controlled
using UI policies. UI policies apply only if the fields they target are
present on the form. So, you could avoid entry to the ‘Close codes’ and ‘Close
notes’ fields by not adding the fields to the form.
From the Kingston release onwards, we have moved the UI Policy to a Data
Policy that works on the server side. Hence, you will need to fill in the
‘Resolution codes’ and ‘Resolution notes’ fields to be able to submit the form.
Note: Data policies on the ‘Resolution codes’ and ‘Resolution notes’ fields are
not available OOB for existing or upgrade customers. Existing or upgrade
customers can create a custom Data Policy on the Incident table that makes the
‘Resolution Notes’ and ‘Resolution Codes’ fields mandatory.
14. What is the difference between copying an Incident and creating a child
incident?
‘Copy Incident’ (UI action in the context menu) copies the details of an
existing incident record to a new incident record. There is no association
between the original/source incident and the new incident.
‘Create Child Incident’ (UI action in the context menu) copies the details of
the parent incident and associates the new incident with the parent incident.
The originating incident number is copied to the ‘Parent Incident’ field of
the newly created child incident.
Note: The attributes and related lists that are copied from the originating
incident are the ones specified in the following incident properties:
- List of attributes (comma-separated) that will
be copied from the originating incident
- Related lists (comma-separated) that will be
copied from the originating incident
- List of attributes (comma-separated) from
Affected CIs (task_ci) related list that will be copied from the
originating incident
- List of attributes (comma-separated) from
Impacted Services (task_cmdb_ci_service) related list that will be copied
from the originating incident
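The difference between the two actions can be sketched in a few lines of Python. This is a conceptual illustration, not ServiceNow code: the incident is modeled as a plain dictionary, and the COPY_ATTRIBUTES list stands in for the comma-separated attribute property with hypothetical field names and incident numbers.

```python
# Hypothetical stand-in for the "list of attributes to copy" property.
COPY_ATTRIBUTES = ["category", "short_description", "cmdb_ci"]


def copy_incident(source, new_number):
    """Copy the configured attributes; the new record has no link to the source."""
    new = {name: source[name] for name in COPY_ATTRIBUTES if name in source}
    new["number"] = new_number
    new["parent_incident"] = None
    return new


def create_child_incident(source, new_number):
    """Copy the configured attributes and record the source as the parent."""
    child = copy_incident(source, new_number)
    child["parent_incident"] = source["number"]
    return child


original = {
    "number": "INC0010001",
    "category": "software",
    "short_description": "Email not syncing",
    "caller": "some.user",  # not in COPY_ATTRIBUTES, so it is not copied
}

copied = copy_incident(original, "INC0010002")
child = create_child_incident(original, "INC0010003")
print(copied["parent_incident"])  # None
print(child["parent_incident"])   # INC0010001
```

Both records receive the same copied details; only the child keeps a reference back to the originating incident.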
15. How do the parent incident and child incident states synchronize with each
other?
Starting with the Kingston release, the parent-child incident state
synchronization is as follows:
Note: If an incident is reopened, all the child incidents of that incident are
reopened and the state of the child incidents is changed to In Progress.
16. How can I create a Problem or Change from an incident?
You can attach an incident to a problem or a change. To create a problem from
an incident record, click the Additional actions menu icon and click ‘Create
Problem’. If you want to relate the incident to a change request, click the
Additional actions menu icon and click ‘Create Normal Change’, ‘Create
Emergency Change’ or ‘Create Standard Change’. The parent record (incident
record) will be automatically associated with the new Problem or Change record.
Note: The ‘Create Standard Change’ UI action will be introduced in the London
release.
If you need to associate multiple Problems or Change Requests with a parent
incident record, you can do so by using the “New” and “Edit” buttons on the
“Problems” and “Change Requests” related lists present on the incident form.
17. Why are resolved incidents getting automatically closed?
If the incident property “Number of days (integer) after which Resolved
incidents are automatically closed. Zero (0) disables this feature” is set to
a number of days greater than zero, resolved incidents are automatically
closed once that many days have passed since the incident was resolved or
last updated.
If the incident property “Enable auto closure of incidents based on Resolution
date. Setting this to 'No' will make auto closure to run based on the Updated
date.” is selected, the incident is auto-closed based on the resolution date;
otherwise it is based on the last updated date.
Note: Calculation of days based on Business Days is not supported
out-of-the-box.
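The interaction of the two properties can be sketched as a small decision function. This Python example is an illustration of the logic described above, not the platform's implementation; the function and parameter names are hypothetical, and days are counted as calendar days.

```python
from datetime import datetime, timedelta


def should_auto_close(resolved_at, updated_at, now, close_after_days,
                      use_resolution_date):
    """Decide whether a resolved incident is due for automatic closure."""
    if close_after_days <= 0:
        return False  # zero disables the feature
    reference = resolved_at if use_resolution_date else updated_at
    return now - reference >= timedelta(days=close_after_days)


resolved = datetime(2024, 3, 1, 10, 0)
updated = datetime(2024, 3, 5, 10, 0)   # e.g. a comment added after resolution
now = datetime(2024, 3, 9, 10, 0)

print(should_auto_close(resolved, updated, now, 7, True))   # True: 8 days since resolution
print(should_auto_close(resolved, updated, now, 7, False))  # False: 4 days since last update
print(should_auto_close(resolved, updated, now, 0, True))   # False: feature disabled
```

The second call shows why the choice of reference date matters: any update after resolution restarts the countdown when closure is based on the updated date.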
18. When is the right time to engage a Problem Manager for an Incident?
When incidents occur, the role of incident management is to restore service as
rapidly as possible, without necessarily identifying or resolving the
underlying cause of the incidents. If incidents occur rarely or have little
impact, assigning resources to perform root cause analysis cannot be justified.
However, if an individual incident or a series of repeated incidents causes
significant impact, problem management is tasked with diagnosing the underlying
cause of the incidents and, ultimately, identifying a means to remove that
cause.
Major Incident Management
19. What are major incidents?
Major incidents are those incidents for which the degree of impact on the
business/organization is extreme. Incidents for which the timescale of
disruption, to even a relatively small percentage of users, becomes excessive
should also be regarded as major incidents. It is possible to define some of
these major incidents in advance, but most will be prioritized as they happen
based on impact and urgency. The major incident module was introduced in the
Kingston release.
20. Can a P1 incident be considered a major incident?
Some organizations equate a major incident with a Priority 1 Incident (or a
Severity 1 Incident) but the mapping is not that crisp. Incident priority is
for sorting and prioritizing (and measuring and reporting). A major incident is
about abandoning the normal process and switching to different procedures.
21. What is the procedure for handling a major incident?
A separate procedure, with shorter timescales and greater urgency, must be used
for major incidents. A definition of what constitutes a major incident must be
agreed and ideally mapped on to the overall incident prioritization system.
Where necessary, the major incident procedure should include the dynamic
establishment of a separate Major Incident Management team, under the direct
leadership of the Major Incident Manager. The Major Incident Management team is
formed to concentrate on that particular major incident and to ensure that
adequate resources and focus are provided for finding a fast resolution.
If the cause of the incident needs to be investigated at the same time, then
the Problem Manager will be involved as well, but the Incident Manager must
ensure that service restoration and investigation of the underlying cause are
kept separate. Throughout, the communication manager will ensure that all
activities are recorded and users are kept fully informed of progress.
Communication is a hugely important activity in handling major incidents.
The Problem Manager should in these circumstances be notified (if not already
aware) and should arrange a formal meeting with interested parties (or regular
meetings if necessary). These should be attended by all key in-house support
staff, vendor support staff and IT services management, with the purpose of
reviewing progress and determining the best course of action. The communication
manager should attend these meetings and ensure that a record of
actions/decisions is maintained, ideally as part of the overall incident
record, as major incidents are still logged in the same way as all other
incidents (it is only the priority and management of the incident which is
different).
If no Problem Manager or Problem Process Owner is currently in place, an
Incident Management Executive and Major Incident Management team could take on
the activities described above.
Note: The Incident Management - Major Incident Management plugin
(com.snc.incident.mim) must be activated to work with major incidents.
22. What are major incident candidates?
An incident that we want to promote to a major incident is first proposed as a
major incident candidate. The major incident manager then analyses the
candidate and decides whether it can be considered a major incident. So, a
major incident candidate is the state of an incident that has been proposed as
a possible major incident but has not yet been approved by the major incident
manager.
23. What are the ways to create a major incident candidate?
- From the left navigation pane, click
Incident > Major Incidents > Create Major Incident Candidate.
- On the incident form, right-click on the
header and from the context menu click ‘Propose major incident’.
Note: The major
incident candidate functionality was introduced in the Kingston release.
24. What
are the ways to create a major incident?
- From the left navigation pane, click
Incident > Major Incidents > Create Major Incident.
- On the incident form, right-click on the
header and from the context menu click ‘Promote to major incident’.
- You can create major incident trigger
rules to define conditions under which an incident is automatically
considered as a major incident candidate. To create Major incident trigger
rules, navigate to Major Incidents > Administration > Major Incident
Trigger Rules and click “New”.
25. What
is the difference between creating a major incident from a major incident
candidate and promoting a candidate to a major incident?
- When you create a major incident candidate
directly and then create a new major incident from that candidate, a new
incident is created that becomes the major incident. The candidate is
added as a child of the major incident. The major incident is automatically
assigned to the Major Incident Management group. If an on-call rota is
defined, the system automatically sets the 'Assigned to' field to the user
who is available on that rota.
- When a major incident candidate is promoted to
a major incident, the incident itself becomes the major incident.
No new incident is created. The values in the existing
'Assignment group' and 'Assigned to' fields do not change to the Major
Incident Management group or to any other user.
26. Incident
form is not behaving as expected even when I have activated the major incident
management plugin. Am I missing something?
I have
activated the major incident management plugin, raised a new P1 incident but
still do not see any changes to the incident form or any option to create a
major incident candidate or to promote a candidate to a major incident. I can
see a trigger rule set for Priority = 1 Critical so I was expecting this to
become a candidate for major incident but nothing happened. What am I doing
wrong?
The issue
may be because the trigger rule you are looking at is inactive. ServiceNow
ships 3 trigger rules OOB, all marked as active=false. You can review and
activate the ones you need for your business.
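The effect of an inactive trigger rule can be illustrated with a small sketch. This is not ServiceNow code: the TriggerRule class, the matching_rules function and the field names are hypothetical, and the only behaviour assumed from the text above is that a rule must be active before it can flag an incident as a candidate.

```python
from dataclasses import dataclass


@dataclass
class TriggerRule:
    """Hypothetical stand-in for a major incident trigger rule."""
    name: str
    active: bool
    conditions: dict  # field name -> value the incident must have


def matching_rules(incident, rules):
    """Return the names of active rules whose conditions all match."""
    matched = []
    for rule in rules:
        if not rule.active:
            continue  # inactive rules are skipped, so nothing is proposed
        if all(incident.get(field) == value
               for field, value in rule.conditions.items()):
            matched.append(rule.name)
    return matched


rules = [TriggerRule("Priority 1 incidents", active=False,
                     conditions={"priority": 1})]
incident = {"number": "INC0000001", "priority": 1}

print(matching_rules(incident, rules))  # rule is inactive: no match
rules[0].active = True
print(matching_rules(incident, rules))  # rule is active: P1 incident matches
```

With the rule inactive, the P1 incident matches nothing, which is the symptom described in the question. Once the rule is activated, the same incident matches.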
27. Do
we have a place from where we can manage major incidents?
The major
incident workbench is a single pane view specifically designed for major
incident managers, communication managers and resolver groups to manage major
incidents.
To navigate
to major incident workbench, click 'View Workbench' that appears on the header
of the Incident form.
28. I do not
see the View Workbench button on the header of the Incident form!
You will
see the button when the incident is either proposed or is accepted as a major
incident.
29. What
is a major incident dashboard (PA dashboard) used for?
The major
incident dashboard provides an at-a-glance view of all major incident information.
Analyzing
and Improving IM process
30. What
are the fields related to time calculation that are available on my Incident
form?
There are a
number of time-tracking fields available for users. The fields are as follows:
- Business resolve time: The total amount
of time taken to resolve a task within business hours. For example, if the
user works from 9am to 6pm, Monday to Friday, the business resolve time is
the time taken to complete the task within those hours, excluding holidays,
weekends, and out-of-office hours. Business resolve time is calculated
when the task is resolved.
- Calendar resolve time: The total amount
of time taken to resolve a task, including holidays, weekends, and
out-of-office hours.
- Business Duration: The total amount of
time spent actively on the task within business hours. For example, if you
are working on an incident record, then move to some other record or
browser, and come back again to the initial incident record, the time when
you were active in other records is not counted.
- Calendar Duration: The total amount of
time spent on the task. For example, if you are working on an
incident record, then move to some other record or browser, and come back
again to the initial incident record, the time when you were active in
other records is also counted.
Note:
All the above time-tracking fields are based on the calendar record available
at System Policy > SLA > Calendars (sys_calendar table). These records
define the working days, hours and holidays. You can also create customized
calendar schedules by clicking the “New” button on the Calendar list view.
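The difference between calendar time and business time can be shown with a short calculation. This is a minimal sketch, not the platform's own algorithm. It assumes the 9am to 6pm, Monday to Friday schedule used in the example above, and the function name business_seconds is made up for illustration.

```python
from datetime import date, datetime, time, timedelta

WORK_START = time(9, 0)   # assumed schedule: 9am to 6pm, Monday to Friday
WORK_END = time(18, 0)


def business_seconds(opened: datetime, resolved: datetime,
                     holidays: frozenset[date] = frozenset()) -> int:
    """Seconds between opened and resolved that fall inside working hours."""
    total = timedelta()
    day = opened.date()
    while day <= resolved.date():
        if day.weekday() < 5 and day not in holidays:  # skip weekends, holidays
            start = max(opened, datetime.combine(day, WORK_START))
            end = min(resolved, datetime.combine(day, WORK_END))
            if end > start:
                total += end - start
        day += timedelta(days=1)
    return int(total.total_seconds())


opened = datetime(2024, 3, 1, 16, 0)    # Friday 16:00
resolved = datetime(2024, 3, 4, 11, 0)  # Monday 11:00

calendar_hours = (resolved - opened).total_seconds() / 3600
business_hours = business_seconds(opened, resolved) / 3600
print(calendar_hours, business_hours)   # 67.0 calendar hours, 4.0 business hours
```

The incident is open for 67 calendar hours, but only 4 of them (2 on Friday afternoon and 2 on Monday morning) fall inside the working schedule.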
31. What
are Response Times and Resolution Times?
Response
Time is defined as the amount of time between when the client first creates an
incident report (which includes leaving a phone message, sending an email, or
using an online ticketing system) and when service desk agent actually responds
(automated responses don’t count) and lets the client know they are currently
working on it.
Resolution
Time is defined as the amount of time between when the client first creates an
incident report and when that issue is actually solved.
Note: The above
is just for theoretical explanation and is not tracked anywhere on the
instance.
32. What
is a known error and when is it identified?
A known
error is a fault in a configuration item (CI) identified by the successful
diagnosis of a problem and for which a temporary workaround or a permanent solution
has been identified. Therefore, a known error is an already identified solution
to an existing or a new issue. A known error is identified when the cause of
the problem is known.
33. What
are the Key Performance Indicators (KPIs) of Incident Management?
Measurements
are important across all stages of the ITIL lifecycle. Each process has metrics
that should be monitored and reported to effectively evaluate the overall
performance.
Examples of
Incident Management KPIs that are shipped with the base system are:
- % of high priority incidents resolved
- % of incidents resolved on first assignment
- % of incidents resolved within SLA
- % of reopened incidents
- Average time to resolve a high priority
incident
- Average time to resolve an incident
- Number of incidents created per user
Users
can enable or disable a KPI and customize KPI conditions.
Integration
with Performance Analytics provides daily data collection and drill-down
capabilities on KPI data. KPIs should be related to Critical Success Factors
(CSF) and CSFs should be related to objectives. This relationship helps with
decision support for maintaining current state and improving to desired state.
Although each organization is different, relevant reports for users, staff and
management will help support important decisions that can be used to improve
both the processes and the business as a whole.
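As an illustration of how percentage KPIs like those listed above are derived, the sketch below computes three of them from a handful of sample records. The records and the sla_breached flag are invented for the example, and the calculation is a plain Python approximation rather than the Performance Analytics implementation.

```python
def percentage(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place; 0.0 when there is no data."""
    return round(100.0 * part / whole, 1) if whole else 0.0


# Sample resolved incidents (illustrative data, hypothetical field names).
incidents = [
    {"number": "INC001", "reopen_count": 0, "reassignment_count": 0, "sla_breached": False},
    {"number": "INC002", "reopen_count": 1, "reassignment_count": 2, "sla_breached": True},
    {"number": "INC003", "reopen_count": 0, "reassignment_count": 0, "sla_breached": False},
    {"number": "INC004", "reopen_count": 0, "reassignment_count": 1, "sla_breached": False},
]


def kpis(records):
    total = len(records)
    return {
        "% of reopened incidents":
            percentage(sum(r["reopen_count"] > 0 for r in records), total),
        "% of incidents resolved on first assignment":
            percentage(sum(r["reassignment_count"] == 0 for r in records), total),
        "% of incidents resolved within SLA":
            percentage(sum(not r["sla_breached"] for r in records), total),
    }


for name, value in kpis(incidents).items():
    print(f"{name}: {value}")
```

For the four sample records this gives 25.0% reopened, 50.0% resolved on first assignment and 75.0% resolved within SLA.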
34. What
is the difference between Critical Success Factors (CSFs) and Key Performance
Indicators (KPIs) of Incident Management?
A CSF is a
critical factor or activity required for ensuring the success of a company or
an organization. Alternative terms are key result area (KRA) and key success
factor (KSF). These are often used to denote the mission statements, vision of
an organization, or simply for a business strategy.
Key
performance indicators or KPIs, on the other hand, are measures used to
quantify management objectives. They are accompanied by a target or threshold
and enable measurement of performance. Another key term is the KPI measure
(threshold), which tracks achievement against a defined target that
may be either time based or numeric.
35. What
are Mean Time To Identify (MTTI), Mean Time To Resolve (MTTR) and Mean Time To
Failure (MTTF)?
- Mean Time To Identify (MTTI) is the average length of
time it takes to detect an incident after it starts.
- Mean Time To Resolve (MTTR) is the average
time between the start and resolution of an incident. It includes the time
needed to identify the issue first.
- Mean Time To Failure (MTTF) is the length of
time a system or other product is expected to last in operation.
Note: You can
define these metrics by navigating to Metrics > Definitions.
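As a worked example of the two averages, the sketch below computes MTTI and MTTR from sample timestamps. The records and the key names (started, detected, resolved) are assumptions made for illustration and are not fields of the incident table.

```python
from datetime import datetime
from statistics import mean

# Illustrative incident timestamps (hypothetical data).
incidents = [
    {"started": datetime(2024, 1, 1, 10, 0),
     "detected": datetime(2024, 1, 1, 10, 10),
     "resolved": datetime(2024, 1, 1, 11, 0)},
    {"started": datetime(2024, 1, 2, 9, 0),
     "detected": datetime(2024, 1, 2, 9, 30),
     "resolved": datetime(2024, 1, 2, 12, 0)},
]


def mean_minutes(records, start_key, end_key):
    """Average number of minutes between two timestamps across all records."""
    return mean((r[end_key] - r[start_key]).total_seconds() / 60
                for r in records)


mtti = mean_minutes(incidents, "started", "detected")  # (10 + 30) / 2
mttr = mean_minutes(incidents, "started", "resolved")  # (60 + 180) / 2
print(mtti, mttr)  # 20.0 120.0
```

Because MTTR is measured from the start of the incident, the identification time is part of it: here 20 of the 120 minutes are spent on detection.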
36. What is
Post Incident Review (PIR)?
After the
major incident is resolved, a post-incident review is conducted to analyze the
incident and understand what can be done to prevent a similar incident in the
future. This also provides an opportunity to review the incident response
process and identify areas for improvement.
To streamline the process, a post-incident report is created when an incident
is resolved. It can then be reviewed and updated during the review process
before the report is shared with stakeholders.
37.
What are PIR reports? Where do I find them?
The
post-incident report provides a summary of the incident, findings, resolution
information including any change requests and problem records created. A
timeline of activities is available in the report which can be edited by the
major incident manager to include important activities, e.g., actions taken to
resolve an incident.
Troubleshooting
38.
Incidents are getting created with the same incident number but different sys_ids?
In
ServiceNow, duplicate numbering is rare, but it can happen because numbering
does not enforce uniqueness by default.
If
duplicate numbers are created, set the Number field as unique at the
dictionary level in the instance as follows:
- Clean up all duplicate entries.
- Update the number field on the number maintenance
record for the incident table.
Note: The
number should be greater than or equal to the highest number in the
incident list.
For example, if your highest incident
number is INC176601, then update the number field in the number maintenance
record for incident to 176601.
To learn
more about duplicate incidents, refer to KB0538764.
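Before cleaning up, it helps to know which numbers are duplicated and what the highest number in use is. The sketch below does both on an exported list of records. It is plain Python over sample data, the function names are hypothetical, and it does not call any ServiceNow API.

```python
from collections import Counter

# Sample export of incident records (illustrative sys_id values).
records = [
    {"sys_id": "a1", "number": "INC176600"},
    {"sys_id": "b2", "number": "INC176601"},
    {"sys_id": "c3", "number": "INC176601"},  # same number, different sys_id
]


def duplicate_numbers(rows):
    """Return the incident numbers that appear on more than one record."""
    counts = Counter(row["number"] for row in rows)
    return sorted(number for number, count in counts.items() if count > 1)


def highest_number_in_use(rows, prefix="INC"):
    """Numeric part of the highest incident number; the counter on the
    number maintenance record should not be lower than this."""
    return max(int(row["number"][len(prefix):]) for row in rows)


print(duplicate_numbers(records))      # ['INC176601']
print(highest_number_in_use(records))  # 176601
```

The second value matches the example above: with INC176601 as the highest number, the number maintenance record is set to 176601.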
39. On
the incident, an ITIL user is not able to find the "Closed"
state for the incident.
Only users with
the itil_admin or admin role can see the "Closed" state of an
incident.
What is the lifecycle of MIM (Major Incident Management)?
The Major Incident Management process involves
identifying, prioritizing, and resolving critical incidents to minimize
business impact.
Incidents are classified based on their impact
and urgency
A Major Incident Manager is appointed to lead
the response team
Communication is key to ensure all
stakeholders are informed
Root cause analysis is conducted to prevent
future incidents
Continuous improvement is achieved through
regular reviews and updates
The lifecycle of major incidents can be defined as
follows:
Incident Identification
Incident Logging
Incident Categorization
Incident Prioritization
Investigation & Diagnosis
Escalation & SLA management
Recovery & Resolution
-----
The lifecycle of Critical/Major Incident
Management can be defined in the stages below.
Major incidents are considered to have 4 main
stages, namely:
Identification
Containment
Resolution
Maintenance
These can be compared with the lifecycle
of the Incident Management process:
Identification
Logging
Categorization
Prioritization
Investigation & Diagnosis
Recovery & Resolution
---------
What is the relation between Incident and change?
Incidents can trigger changes to prevent
future incidents.
Incidents can reveal weaknesses in the system
that require changes to prevent future incidents.
Changes can also cause incidents if not
properly managed.
Incident and change management processes
should be closely integrated.
For example, a major incident caused by a
software bug may require a change to the code to prevent it from happening
again.
Similarly, a change to a server configuration
may cause an incident if not properly tested and implemented.
Effective communication between incident and
change management teams is crucial to ensure timely and effective resolution of
incidents.
An incident is an unplanned interruption, while a change is a planned
modification.
Incidents and changes are closely related as
changes can cause incidents and incidents can lead to changes.
Changes can introduce new risks and
vulnerabilities that can lead to incidents.
Changes can also be made in response to
incidents to prevent them from happening again.
Incidents can trigger the need for changes to
be made to prevent similar incidents in the future.
For example, a software update can introduce a
bug that causes an incident, which then requires a change to be made to fix the
bug.
Or, an incident can reveal a flaw in a process
that requires a change to be made to improve the process and prevent future
incidents.
Situational question, for example: briefly describe one scenario where
you drove a major incident under a lot of pressure.
During a major incident, I led a team under
high pressure to restore service within SLA.
Quickly assessed the situation and identified
the root cause
Communicated effectively with stakeholders and
team members
Prioritized tasks and delegated
responsibilities
Monitored progress and made adjustments as
necessary
Ensured all documentation was accurate and
up-to-date
Maintained a calm and professional demeanor
throughout the incident
Have you handled a very difficult incident? If yes, what was your
approach and what did you learn?
Yes, I have handled difficult incidents. My
approach is to gather all available information, prioritize actions, and
communicate effectively with stakeholders.
Gather all available information about the
incident
Prioritize actions based on impact and urgency
Communicate effectively with stakeholders,
including regular updates
Ensure all necessary resources are available
to resolve the incident
Conduct a thorough post-incident review to
identify areas for improvement
Example: Managed a major outage that impacted
multiple systems and required coordination with multiple teams. Prioritized
actions based on impact and urgency, communicated effectively with
stakeholders, and ensured all necessary resources were available to resolve the
incident.
What if you wake up one day on the Moon? Be prepared.
If I wake up on the Moon, I would assess the
situation, gather information, and take necessary actions to ensure my safety
and survival.
Assess the surroundings and determine the
level of danger
Check for available resources and supplies
Establish communication with Earth and inform
them about the situation
Evaluate the possibility of returning to Earth
or seeking assistance
Take necessary precautions to protect myself
from the harsh lunar environment
You are on a call resolving an incident, and there are more
incidents in the queue. What will you do?
I will prioritize the incidents based on their
impact and urgency and delegate them to the appropriate teams.
Assess the impact and urgency of each incident
in the queue
Delegate the incidents to the appropriate
teams based on their expertise
Ensure that the teams have all the necessary
information to resolve the incidents
Monitor the progress of each incident and
provide support as needed
------
The Major Incident Manager needs to act according
to the impact and urgency of the incident,
and then prioritize and categorize the incident.
What is PIR in Incident management?
PIR stands for Post Incident Review. It is a
process in incident management to analyze and learn from past incidents.
PIR is conducted after an incident is resolved
to evaluate the incident response and identify areas for improvement.
It involves gathering data, analyzing the
incident timeline, identifying root causes, and documenting lessons learned.
PIR helps in preventing future incidents,
improving incident response processes, and enhancing overall system
reliability.
Examples of PIR activities include conducting
interviews with involved parties, reviewing incident logs, and documenting
recommendations.
The findings from PIR are used to update
incident management procedures, train staff, and implement preventive measures.
---------------
PIR means Post Incident Review. Some teams use
a different term, post-mortem analysis, for the same activity, so be prepared
to recognise both names in an interview.
Major Incident Management Process
Major Incident Management Process involves
identifying, prioritizing, and resolving critical incidents to minimize
business impact.
Incidents are classified based on their impact
and urgency
A Major Incident Manager is appointed to lead
the response team
Communication is key to ensure all
stakeholders are informed
Root cause analysis is conducted to prevent
future incidents
Continuous improvement is achieved through
regular reviews and updates
---------
Preparation - Identification - Classification -
Prioritisation - Escalation - Knowing the ETR and RFO - Further Escalation -
Restoration - Closure - Postmortem.
How long does it take to provide an RCA?
The time taken to provide a Root Cause
Analysis (RCA) varies depending on the complexity of the incident.
The duration of providing an RCA depends on
the incident's complexity, scope, and available data.
Simple incidents with clear causes may require
less time for RCA.
Complex incidents involving multiple systems
or dependencies may take longer to analyze and determine the root cause.
Availability of relevant data and
documentation can significantly impact the time required for RCA.
Collaboration with various teams and
stakeholders may be necessary to gather information and validate findings.
Efficient RCA processes and tools can help
streamline the analysis and reduce the time taken.
What is the difference between responsible and accountable?
Responsible and accountable are two different
levels of involvement and ownership in a task or project.
Responsible refers to the person who is
assigned to complete a task or achieve a goal.
Accountable refers to the person who is
ultimately answerable for the outcome or results of the task or project.
Responsibility can be shared among multiple
individuals, but accountability rests with a single person.
Being responsible means having the duty or
obligation to perform a task, while being accountable means being answerable
for the success or failure of that task.
For example, in a Major Incident Management
role, the Major Incident Manager may be responsible for coordinating the
response and resolution of incidents, while the Incident Management Team Lead
may be accountable for ensuring that incidents are resolved within the
agreed-upon timeframes.
----------
It comes from the RACI matrix used with the ITIL framework:
Responsible: People or stakeholders who do the
work. They must complete the task or objective or make the decision.
Several people can be jointly Responsible.
Vs.
Accountable: The person or stakeholder who is the “owner”
of the work.
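The rule that responsibility can be shared while accountability rests with a single person can be checked mechanically. The sketch below is one possible representation of a RACI matrix as a dictionary. The tasks and role names are illustrative and do not come from any standard template.

```python
# Each task maps role -> RACI letter (R, A, C or I). Illustrative data only.
raci = {
    "Coordinate major incident response": {
        "Major Incident Manager": "R",
        "Resolver Group": "R",
        "Incident Management Team Lead": "A",
        "Communication Manager": "I",
    },
    "Publish post-incident report": {
        "Major Incident Manager": "A",
        "Problem Manager": "A",  # invalid: two Accountable owners
        "Communication Manager": "R",
    },
}


def tasks_without_single_accountable(matrix):
    """Return the tasks that do not have exactly one Accountable role."""
    return [task for task, roles in matrix.items()
            if list(roles.values()).count("A") != 1]


print(tasks_without_single_accountable(raci))
# ['Publish post-incident report']
```

The first task passes even though two roles are Responsible. The second is flagged because two roles are marked Accountable.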
What is the difference between change and problem?
A change is a planned action to add, modify, or
remove something that could affect a service, while a problem is the underlying
cause, or potential cause, of one or more incidents.
A change is a proactive action taken to improve
a service, while problem management investigates why incidents have occurred
or are likely to occur.
Change is a controlled process that is
planned, tested, and implemented with minimal disruption to the service, while
a problem is diagnosed to find the root cause, a workaround, or a permanent fix.
A change is usually initiated by the service
provider, while a problem is usually raised after recurring or major incidents.
Examples of changes include upgrading software,
adding new features, or changing hardware, while examples of problems include
the unknown cause behind repeated system crashes or recurring network outages.
What are the KPIs of major incidents?
Key Performance Indicators (KPIs) of major
incidents measure the effectiveness and efficiency of incident management.
Response time: Measure the time taken to
respond to a major incident.
Resolution time: Measure the time taken to
resolve a major incident.
Customer satisfaction: Measure the
satisfaction level of customers affected by major incidents.
Incident recurrence rate: Measure the
frequency of major incidents recurring.
Mean time between failures (MTBF): Measure the
average time between major incidents.
Mean time to recover (MTTR): Measure the
average time taken to recover from major incidents.
Number of major incidents: Measure the total
number of major incidents occurring within a specific period.
Escalation rate: Measure the rate at which
major incidents are escalated to higher levels of management.
Availability: Measure the percentage of time
services are available without major incidents.
Impact on business: Measure the financial and
operational impact of major incidents on the business.
The KPIs are as follows:
1) MTTR - Mean Time to Respond
2) MTTA - Mean Time to Acknowledge
3) MTBF - Mean Time Between Failures
4) MTTD - Mean Time to Detect
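To make the reliability KPIs concrete, the sketch below derives availability, MTBF and mean time to recover from a list of outage windows. It assumes one common convention (MTBF as total uptime divided by the number of failures), and the function name and sample outages are invented for the example.

```python
from datetime import datetime, timedelta


def reliability_kpis(period_start, period_end, outages):
    """outages: list of (start, end) datetimes for major incidents in the period."""
    period = period_end - period_start
    downtime = sum((end - start for start, end in outages), timedelta())
    uptime = period - downtime
    failures = len(outages)
    one_hour = timedelta(hours=1)
    return {
        "availability_pct": round(100 * uptime / period, 2),
        # Assumed convention: MTBF = total uptime / number of failures.
        "mtbf_hours": uptime / failures / one_hour if failures else None,
        # Mean time to recover = total downtime / number of failures.
        "mttr_hours": downtime / failures / one_hour if failures else None,
    }


outages = [
    (datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 12, 0)),  # 2 hours
    (datetime(2024, 1, 20, 1, 0), datetime(2024, 1, 20, 5, 0)),  # 4 hours
]
result = reliability_kpis(datetime(2024, 1, 1), datetime(2024, 1, 31), outages)
print(result)
```

Over the 720-hour period the two outages add up to 6 hours of downtime, which gives 99.17% availability, an MTBF of 357 hours and a mean time to recover of 3 hours.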
What are CFS?
CFS stands for Call For Service.
CFS is a term commonly used in emergency
services and law enforcement.
It refers to a request or notification for
assistance or response to a specific incident.
CFS can include various types of incidents
such as crimes, accidents, or medical emergencies.
Dispatchers or operators receive CFS and
assign appropriate resources to handle the situation.
Examples of CFS include a 911 call reporting a
burglary, a traffic accident, or a person in need of medical attention.
Lifecycle of an incident
The incident lifecycle involves
identification, logging, categorization, prioritization, investigation,
resolution, and closure.
Identification: recognizing that an incident
has occurred
Logging: recording the details of the incident
Categorization: assigning a category to the
incident
Prioritization: determining the urgency and
impact of the incident
Investigation: gathering information to
determine the cause of the incident
Resolution: taking action to resolve the
incident
Closure: verifying that the incident has been
resolved and documenting the resolution
What is the difference between impact and urgency?
Impact refers to the extent of the
consequences caused by an incident, while urgency refers to the time
sensitivity of resolving the incident.
Impact is about the severity and extent of the
damage caused by an incident.
Urgency is about the time sensitivity and the
need for immediate action to resolve the incident.
Impact is measured in terms of the disruption
caused to the business or services.
Urgency is measured in terms of the time
constraints and the impact on business operations.
For example, a critical system outage may have
a high impact as it affects multiple users and services, but if it can be
resolved within a few hours, the urgency may be lower.
On the other hand, a minor incident with low
impact may require immediate attention due to its urgency, such as a security
vulnerability that needs to be patched immediately.
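Impact and urgency are commonly combined into a single priority through a lookup matrix. The sketch below assumes a 3x3 matrix in which 1 is the highest level and the result runs from 1 (Critical) to 5 (Planning). The exact values are an assumption for illustration and should be replaced with whatever matrix your organization has agreed.

```python
# (impact, urgency) -> priority; 1 = High, 2 = Medium, 3 = Low for both inputs.
# The matrix values are an illustrative assumption, not a fixed standard.
PRIORITY_MATRIX = {
    (1, 1): 1, (1, 2): 2, (1, 3): 3,
    (2, 1): 2, (2, 2): 3, (2, 3): 4,
    (3, 1): 3, (3, 2): 4, (3, 3): 5,
}
PRIORITY_LABELS = {1: "Critical", 2: "High", 3: "Moderate", 4: "Low", 5: "Planning"}


def priority(impact: int, urgency: int) -> int:
    """Look up the priority for a given impact and urgency."""
    try:
        return PRIORITY_MATRIX[(impact, urgency)]
    except KeyError:
        raise ValueError("impact and urgency must each be 1, 2 or 3") from None


# High impact, medium urgency: an outage that is serious but can wait a few hours.
print(priority(1, 2), PRIORITY_LABELS[priority(1, 2)])  # 2 High
# Low impact, high urgency: a minor issue that must be patched immediately.
print(priority(3, 1), PRIORITY_LABELS[priority(3, 1)])  # 3 Moderate
```

The two calls mirror the examples above: neither high impact alone nor high urgency alone produces the top priority. Only the combination of both does.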
What is defined as severity and priority?
Severity and priority are used to classify
incidents based on their impact and urgency.
Severity refers to the impact an incident has
on the business or system.
Priority refers to the urgency with which an
incident needs to be resolved.
Both severity and priority are usually classified
into levels, such as low, medium, and high.
The severity and priority of an incident can
be determined based on factors such as the number of users affected, the
criticality of the system, and the potential financial impact.
For example, a critical system outage
affecting a large number of users would be classified as high severity and high
priority.
What are Emergency Change and Urgent Change? What is a known
error? What is the incident management lifecycle? What is contained in the Service
Operation module? Suppose the customer's toll-free number goes down; what
will the customer lose?
Answers to questions related to Incident
Management
Emergency Change and Urgent Change are types
of changes that are implemented quickly to resolve an incident or prevent an
incident from occurring
Known Error is a problem that has been
identified and has a documented root cause and a workaround
Incident Management lifecycle consists of
identification, logging, categorization, prioritization, diagnosis, escalation,
resolution, and closure of incidents
Service Operation module contains processes
related to managing services in operation, such as incident management, problem
management, event management, access management, and request fulfilment