Incident Management

 

What is an incident?

 

An unplanned interruption to an IT service

A reduction in the quality of an IT service

Anything that is broken or not working

Any disruption to an IT service

Something that is broken or down

 

Examples:

A user is unable to send or receive email because of an Exchange server issue.

A laptop has crashed or is unable to run basic applications.

A user is unable to connect to the VPN or Wi-Fi.

 

What is ITIL?

 

ITIL (Information Technology Infrastructure Library) is a framework that provides best practices, guidance, and practical examples for delivering and managing IT services.

 

What is incident management?

 

Incident management is the process of restoring normal service operation as quickly as possible while minimizing the impact on the business and maintaining quality.

An incident can be reported by a user via email, by logging it in the instance, or by telephone or chat with a technician or IT support person. It can also be detected through monitoring tools.

 

Priority is defined according to impact, the incident is assigned to the right group for quick resolution, and it is escalated to the next level if it is not resolved.

 

The incident management process includes the following steps:

Incident Identification

Incident Logging

Incident Categorization

Incident Prioritization

Incident Assignment

Initial diagnosis

Escalation as necessary for further investigation

Incident resolution

Incident Closure

Incident management also ensures communication with the user community throughout the life of the incident. Any user can record an incident and track it through the entire incident life cycle until service is restored and the issue is resolved. Reports are used to monitor, track, and analyze service levels and improvement.

 

Tables used in incident reporting

Incident (incident)

Incident metrics (incident_metric)

Incident SLA (incident_sla)

Service orchestration is the process of designing, creating, delivering, and monitoring service offerings in an automated way. Service definitions created in the product catalog are taken right through the ordering process for fast and effective service orchestration.

 

 

Top 30+ Incident Management Interview Questions and Answers

 

 

What is the difference between an incident and a service request?

A service request is a predefined, orderable item, whereas an incident is unpredictable and unexpected.

A service request is usually not bound by a strict time frame, although there are expectations (for example, 10 days to deliver a laptop), whereas incidents affecting IT services are SLA-bound (unless the incident is a question or query).

Resetting a password is a service request, whereas a user being unable to authenticate against AD (Active Directory) is an incident.

Service requests are usually less critical, whereas incidents are usually more critical.

What is the difference between an Incident and a problem?

A problem originates from recurring incidents, from incidents that cannot be resolved, or from any incident that has only a workaround and no permanent fix. An incident is a single event that causes a service disruption.

Incident management focuses on solving the issue, whereas problem management focuses on finding the root cause of the issue and then resolving it.

What are Incident templates?

These are prefilled templates or forms for registering an incident. They help service desks and support teams register incidents faster while on a call with a user.


What is priority?

Priority is determined by combining impact and urgency in a matrix. This helps a support team understand what actions must be taken, and within what time, to solve the issue. Service level agreements are also designed based on priority and are often used to measure KPIs.

What are impact and urgency, and how are they measured?

Impact is measured by the effect of an incident on the business. It can be low, medium, or high, depending on the incident. For example, if the email server is down for one user, the impact is low; if it is down for all users in the company, the impact is high. Impact is determined by how many users are affected and which part of the business is affected.

Urgency measures how quickly the incident needs to be solved and how important it is to the business. An incident may have high impact but low urgency. It all depends on the business.
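To make the impact rule above concrete, here is a minimal Python sketch that derives an impact level from the number of affected users. The 50% threshold, the 1 = High / 2 = Medium / 3 = Low numbering, and the name estimate_impact are illustrative assumptions and do not come from ITIL. The sketch also ignores which part of the business is affected, which real impact assessments take into account.

```python
from enum import IntEnum


class Impact(IntEnum):
    # Assumed convention: a lower number means a more severe impact.
    HIGH = 1
    MEDIUM = 2
    LOW = 3


def estimate_impact(affected_users: int, total_users: int) -> Impact:
    """Map the number of affected users to an impact level.

    The thresholds are hypothetical. They only illustrate the idea that
    one affected user means low impact and the whole company means high impact.
    """
    if total_users <= 0 or not 0 < affected_users <= total_users:
        raise ValueError("need 0 < affected_users <= total_users")
    share = affected_users / total_users
    if share >= 0.5:         # half the company or more: high impact
        return Impact.HIGH
    if affected_users > 1:   # a group of users, but not most of the company
        return Impact.MEDIUM
    return Impact.LOW        # a single user
```

With these assumptions, estimate_impact(1, 5000) gives Impact.LOW (the single-user email example) and estimate_impact(5000, 5000) gives Impact.HIGH (the whole-company example).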

What do you understand by a workaround?

A workaround is a temporary solution that reduces the impact of an incident. The incident can later be investigated as a problem to find the root cause and a permanent fix. A workaround is also used when there is a known error for which a permanent fix has not yet been found.

What is the importance of incident management?

Restore the service and normal operations as soon as possible.

Support continuous service delivery.

Provide a resolution or workaround for an incident.

Meet service level agreements and provide quality service and service availability.

Increase user satisfaction and trust.

Achieve higher productivity and efficiency.

Improve documentation and analysis, and provide reporting.

Share some examples of incident management KPIs.

1. Average response time

2. First call resolution rate

3. Average resolution time

4. SLA compliance rate

5. Percentage of major incidents

6. End-user satisfaction rates
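Here is a minimal Python sketch of how the first five KPIs could be computed from a set of incident records. The IncidentRecord fields are a simplified, hypothetical schema, not the fields of any specific tool. End-user satisfaction is left out because it comes from survey data rather than incident timestamps.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentRecord:
    # Hypothetical, simplified record; real tools store many more fields.
    opened: datetime
    first_response: datetime
    resolved: datetime
    resolved_on_first_contact: bool
    sla_target: timedelta   # resolution target, usually derived from priority
    is_major: bool


def incident_kpis(records: list[IncidentRecord]) -> dict:
    """Compute KPIs 1-5 from the list above for a batch of resolved incidents."""
    n = len(records)
    if n == 0:
        raise ValueError("no incidents to measure")
    response = sum((r.first_response - r.opened for r in records), timedelta())
    resolution = sum((r.resolved - r.opened for r in records), timedelta())
    return {
        "average_response_time": response / n,
        "first_call_resolution_rate_pct":
            100 * sum(r.resolved_on_first_contact for r in records) / n,
        "average_resolution_time": resolution / n,
        "sla_compliance_rate_pct":
            100 * sum((r.resolved - r.opened) <= r.sla_target for r in records) / n,
        "major_incident_pct": 100 * sum(r.is_major for r in records) / n,
    }
```

Rates are returned as percentages, and the two time-based KPIs are returned as timedelta values, so you can format them in hours or minutes as you prefer.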

Define self-recovery.

Self-recovery is an incident resolution type in which the system automatically restores service or resolves the issue.

Define known errors.

Known errors are issues that have a known root cause but do not yet have a permanent fix.

How do you handle recurring incidents?

Incidents are, by nature, unpredictable. But that doesn't mean you can't be prepared for them. Having a plan in place to deal with recurring incidents can help you resolve them more quickly and efficiently. Here are a few tips on how to handle recurring incidents:

First, identify the root cause of the incident. Is there a particular trigger that sets it off? If so, try to avoid or remove that trigger if possible.

Second, document the steps you took to resolve the incident so that you can refer back to them if it happens again.

Third, keep track of any trends in the incident (e.g., does it happen more often at certain times of the day or week?) and take steps to address them.
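The third tip, looking for time-of-day or day-of-week trends, can be sketched in a few lines of Python. The function groups incident open times by weekday and hour and keeps the slots that occur repeatedly. The function name and the default min_count=3 are arbitrary choices for illustration.

```python
from collections import Counter
from datetime import datetime


def recurring_time_slots(opened_times: list[datetime], min_count: int = 3) -> dict:
    """Group incident open times by (weekday, hour) and keep the busy slots.

    weekday() returns 0 for Monday through 6 for Sunday.
    """
    slots = Counter((t.weekday(), t.hour) for t in opened_times)
    return {slot: count for slot, count in slots.items() if count >= min_count}
```

For example, three incidents opened on consecutive Mondays between 9:00 and 10:00, plus one on a Wednesday afternoon, yield {(0, 9): 3}. This points to something that happens every Monday morning and is worth investigating.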

What is business impact analysis?

What is incident escalation?

What is a major incident?

What is an alert?

What is the process involved in the incident management lifecycle?

When can an incident be resolved?

What is the difference between incident resolution and incident closure?

How to prevent incidents from happening in the first place?

How does ITIL help in an event or service disruption?

Is it possible to relate an incident to another record, for example, a problem? If yes, how would you do it?

How do you know when to implement an incident management system?

Why is an effective incident response important?

Do you know any incident management best practices?

ITIL Certifications.

Incident Manager Interview Questions

How much experience do you have in the incident management process?

What was the most complex incident management process you have handled?

Which incident management software systems have you worked on?

How do you handle incident escalations?

That was the last of our 30+ incident management interview questions. We hope this article was useful.

 

What is the main objective of the incident management process? (interviewquestions.guru)

 

 

What is an Incident?

An incident is an unplanned event that impacts normal business operations. Incidents can be caused by various factors, including human error, natural disasters, or cyber-attacks.

What is the incident management process?

The incident management process is a set of steps and procedures that your organization can use to respond to and recover from incidents.

Who is the incident management coordinator?

The incident management coordinator is responsible for leading the incident management process. This role may be filled by a senior executive, such as the CIO or CEO, or by a designated member of your IT team.

What is the first step in the incident management process?

The first step in the incident management process is identifying and assessing the Incident. This involves determining the cause of the incident, the impact on business operations, and any potential risks or vulnerabilities.

Definition of a vulnerability: a security flaw, glitch, or weakness found in software code that could be exploited by an attacker (threat source).

 

ServiceNow Vulnerability Response is a vulnerability management solution that enables organizations to identify, prioritize, and remediate vulnerabilities across their IT infrastructure. The solution integrates with various vulnerability scanners and other security tools to provide a comprehensive view of an organization’s vulnerability landscape.


What are the stages of incident management?

The stages of incident management are identification, assessment, response, and recovery.

What is the main objective of the incident management process?

The main objective of the incident management process is to protect your organization's data and systems and to ensure that business operations resume as quickly as possible. The process should also help you maintain productivity and business continuity during and after an incident.

Key Components of an Effective Incident Management Process

There are several key components that an effective incident management process should include:

A plan for identifying and responding to incidents.

A plan for restoring normal operations

A communication plan

A team of incident response experts

How can an effective incident management process help your organization?

An effective incident management process can help your organization by:

Preventing or minimizing the impact of incidents.

Reducing the time it takes to recover from an incident.

Ensuring that critical business functions are maintained during and after an incident.

How to improve the incident management process?

If you feel that your organization's incident management process could be improved, here are a few tips:

Review and update your plan regularly.

Train your team on how to respond to incidents.

Test your plan regularly

Stay prepared for the unexpected.

The incident management process is an important part of any business continuity plan. Having a well-defined process in place can help protect your organization's people, property, and information.

What is KPI in incident management?

A key performance indicator (KPI) is a metric that can measure the success of an incident management process. KPIs can include measures such as the time it takes to resolve incidents, the number of incidents per month, or the percentage of incidents resolved within a specific timeframe. Choosing the right KPIs can help you track the effectiveness of your incident management process and make necessary improvements.

Incident Management KPI Examples

Some common KPIs that can be used to measure the success of an incident management process include:

The number of incidents per month.

The time it takes to resolve incidents.

The percentage of resolved incidents within a specific timeframe.

The number of critical systems or data breaches.

The impact of incidents on business operations

Why is incident management important?

Incident management is important because it helps you protect your organization's people, property, and information. The process should also help you maintain productivity and business continuity during and after an incident. Having a well-defined incident management process in place can help you respond quickly and effectively to any incident.

During incident resolution, when is management notification appropriate?

There may be times when management notification is appropriate during the resolution of an incident. These include situations where the incident has a significant impact on business operations or where there is a risk to the safety of employees. Management should be kept informed of all major incidents and of any updates or developments related to them.

What are some common causes of incidents?

There are many different causes of incidents. Some of the most common causes include:

Human error

Malicious activity

System failure

Accidental damage

What is MTTR in incident management?

Mean time to repair (MTTR) is a metric that measures the time it takes to resolve an incident. MTTR is calculated by dividing the total time spent resolving all incidents by the number of incidents. This metric can help you track the effectiveness of your incident management process and make necessary improvements.
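The same calculation as a formula, where t_i is the time taken to resolve incident i and N is the number of incidents in the reporting period. The three resolution times in the worked example are made up for illustration:

```latex
\mathrm{MTTR} = \frac{1}{N} \sum_{i=1}^{N} t_i
\qquad\text{e.g.}\qquad
\mathrm{MTTR} = \frac{2\,\mathrm{h} + 4\,\mathrm{h} + 6\,\mathrm{h}}{3} = 4\,\mathrm{h}
```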

 

 

 

 

Basic understanding of Incident Management

1. What is the goal of Incident Management?

The goal of Incident Management is to restore normal service operation as quickly as possible, while minimizing impact to business operations and ensuring quality is maintained.

ServiceNow Incident Management supports the incident management process in the following ways:

  • Incident Identification
  • Incident Logging
  • Incident Categorization
  • Incident Prioritization
  • Incident Assignment
  • Initial diagnosis
  • Escalation, as necessary, for further investigation
  • Incident resolution
  • Incident closure

 Incident Management also ensures communication with the user community throughout the life of the incident.

Any user can record an incident and track it through the entire incident life cycle until service is restored and the issue is resolved. Reports are used to monitor, track, and analyze service levels and improvement.

 

2. What are the different ways in which you can log an incident?

  • An ESS user can call a service desk agent and the agent can log an incident based on the information provided by the user.
  • An ESS user can send an SMS to the <<ServiceNow Customer Service>> number and an incident is automatically created for the user.

          Note: The Notify plugin must be installed and a Twilio account set up in order to use the messaging service.

  • An ITIL user can create an incident by navigating to Incident > Create new in the application.

  • An ESS or ITIL user can copy an existing incident by clicking the Additional actions menu icon and selecting ‘Copy Incident’. The ‘Copy Incident’ UI action copies the details of an existing incident record to a new incident record. The user can then make modifications to the fields as necessary.

          Note: An ITIL user can copy or create any incident whereas an ESS user can copy only the incident that the user has created.

  • An ITIL user can create templates for incidents that are logged frequently by navigating to System Definition > Templates. This simplifies the process of submitting new records by populating fields automatically. Later, while creating an incident, the user can navigate to Incident > Create new, click the More options menu icon and select ‘Toggle Template Bar’. All the existing templates are displayed at the end of the incident form and the user can select the required template to create an incident.

    Note: Users with the itil role can create personal templates for incidents they log frequently. An administrator or a user with the template_editor_global role can create templates that are available to everyone. An administrator can enable the global option for any personal template that a user has entered, so that all agents have access to it.

  • An ITIL user can create an incident template and then create a predefined module with that template. A module can be created by navigating to System Definition > Modules. On the module form, in the ‘Arguments’ field, the user needs to provide the URL of the incident template from which new incidents are to be created. Then, whenever an incident is needed, the user can simply click the module in the application's left navigation pane.

  • A user with the catalog_admin or admin role can create a record producer by navigating to Service Catalog > Catalog Definitions > Record Producers. Record producers appear in the service catalog as catalog items. Hence, an ESS user can log an incident directly from the Service Catalog using a record producer.
  • An end user can request to create an incident using the ‘Connect’ chat icon that appears in the upper-right corner of the instance. In the chat window, the user can add an ITIL user and provide a short description of the issue. Based on the description, the ITIL user creates an appropriate incident.

 

3. When should I create an Incident vs. a Request?

Create an incident when there is an unplanned interruption to, or degradation in the quality of, an existing IT service. Create a request when you want to formally ask the IT service desk to provide something. A request can be for new hardware or an application, information, training, and so on.

Example: If the existing RAM in your system is malfunctioning, then create an incident but if you want a new RAM for your system, raise a request.

Incident Requests are requests that denote the failure or degradation of an IT service. For example, being unable to print, being unable to fetch email, and so on.

Service Requests, on the other hand, are requests raised by the user for support, delivery, information, advice, or documentation. Some examples are installing software on workstations, resetting a lost password, requesting a hardware device, and so on.


4. How can I assign an incident to a group or a user?

There are three ways to assign an incident to a group or a user:

  • On the Incident form, there are two fields: ‘Assignment group’ and ‘Assigned to’. You can assign a group or a user by clicking the lookup icon next to these fields and selecting an appropriate group or a user.

  • You can set assignment rules from System Policy > Rules > Assignment Lookup Rules. In the Assignment Lookup Rule form, provide the values for the fields such as ‘Category’, ‘Subcategory’, ‘Assignment Group’ and ‘Assigned To’. Assignment lookup rule automatically assigns any incident with the pre-defined Category and Subcategory to the assignment group and/or user that you have provided in the Assignment Lookup Rule form.

  • An ITIL user can self-assign incidents using the ‘Assign to me’ option from the Actions menu that appears on the list view.
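The idea behind assignment lookup rules can be sketched in Python as a simple table match on category and subcategory. This is not ServiceNow's implementation. Two details are assumptions made for illustration: letting a rule with no subcategory act as a category-wide fallback, and preferring exact matches over those fallbacks. The group and user names in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AssignmentRule:
    category: str
    subcategory: Optional[str]          # None = any subcategory (assumption)
    assignment_group: str
    assigned_to: Optional[str] = None   # a rule may set only the group


def apply_assignment_rules(category: str, subcategory: str,
                           rules: list[AssignmentRule]) -> Optional[tuple]:
    """Return (assignment_group, assigned_to) from the best matching rule.

    Exact category + subcategory matches win over category-only rules.
    """
    exact = [r for r in rules
             if r.category == category and r.subcategory == subcategory]
    category_only = [r for r in rules
                     if r.category == category and r.subcategory is None]
    matches = exact + category_only
    if not matches:
        return None  # no rule applies; leave the incident for manual triage
    return matches[0].assignment_group, matches[0].assigned_to
```

For example, with one rule for Network / VPN and a category-wide rule for Network, a VPN incident goes to the first rule's group and user. A Wireless incident falls back to the category-wide group, and a Hardware incident returns None.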

 

5. How do I change the Priority of an Incident?

On an Incident form, by default, the ‘Priority’ field is read-only and must be set by selecting the ‘Impact’ and ‘Urgency’ values. For example, if you set the value of ‘Impact’ and ‘Urgency’ to be high, then the value of the ‘Priority’ field will be Critical whereas if you set the value of ‘Impact’ to be medium and ‘Urgency’ to be low, then the value of the ‘Priority’ field will be low. In the Priority [dl_u_priority] table, the values of Impact, Urgency and Priority can be modified in the data lookup rules.

An administrator can either alter the priority look-up rules (in the Priority [dl_u_priority] table) or disable the “Priority is managed by Data Lookup - set as read-only” UI policy and create their own business logic.
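A data lookup rule is essentially a table keyed by the input fields. Here is a minimal Python sketch of a priority lookup keyed by (impact, urgency). The 3 x 3 matrix below is an assumption that agrees with the two examples in the answer above (High/High gives Critical, Medium/Low gives Low). Check the actual rows in your instance's Priority [dl_u_priority] table before relying on it.

```python
# Assumed 3 x 3 matrix; keys are (impact, urgency) with 1 = High, 2 = Medium, 3 = Low.
PRIORITY_LOOKUP = {
    (1, 1): "1 - Critical", (1, 2): "2 - High",     (1, 3): "3 - Moderate",
    (2, 1): "2 - High",     (2, 2): "3 - Moderate", (2, 3): "4 - Low",
    (3, 1): "3 - Moderate", (3, 2): "4 - Low",      (3, 3): "5 - Planning",
}


def lookup_priority(impact: int, urgency: int) -> str:
    """Return the priority for an impact/urgency pair, as a data lookup rule would."""
    try:
        return PRIORITY_LOOKUP[(impact, urgency)]
    except KeyError:
        raise ValueError(f"no priority rule for impact={impact}, urgency={urgency}") from None
```

Keeping the mapping as a table rather than a formula means an administrator can change any single cell without touching code. This mirrors the reason the rules live in a lookup table.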

 

6. What is the Configuration Item (CI) field that I see on the Incident form?

In the ‘Configuration item’ (CI) field, select the component of the business service that is affected and for which the incident is being logged. For example: email, BlackBerry, e-commerce.

 

7. What is the importance of associating the right CI with an incident?

The Configuration Management Database (CMDB) contains a collection of configuration items (CIs) as well as descriptive relationships between those CIs. When populated, the database becomes a means of understanding how critical assets such as information systems are composed, what their upstream sources or dependencies are, and what their downstream targets are.

CI Relationships form a crucial part of CMDB because with relationships, the users accessing CMDB can understand the inter-dependencies between the CIs, and in the case of a failure, the impact caused on another CI can be identified.

If you open an Incident form and enter a value in the Configuration item field, a Dependency view icon appears next to the lookup icon. Clicking the Dependency view icon shows the upstream and downstream dependencies of that CI. If, say, the CI is email and it is not working, you can check the dependency map to find the servers on which this CI depends, and then validate those servers one by one to get to the root cause of the issue. This is possible only when the relationships between CIs are well defined in the CMDB.

 

8. How do I associate multiple Business Service or CIs with the incident?

When you populate the ‘Business Service’ and the ‘Configuration Item’ fields on the incident form and save the record, the selected values appear on the “Impacted Services/CIs” and “Affected CIs” related lists respectively.

If you want to add multiple affected CIs or impacted services, use the “Add” button provided on the related lists.

Note: If you have modified the cmdb_ci (configuration item) on the incident, the related list of “Impacted Services/CIs” will not reflect the change unless you do it manually using the “Refresh Impacted Services” UI action from the context menu.

 

9. How do you find relevant knowledge articles on the Incident form? Can the Short description field display knowledge articles?

On the Incident form, in the Short Description field, type the subject on which you want to find relevant knowledge articles. You can also type the subject in the Related Search field, in the Related Search Results section. All the articles relevant to the subject appear in the Related Search Results section.

A system administrator can configure the search results based on a specific user field by performing the following actions:

a)  Navigate to Contextual Search > Table Configuration.
b)  Click on the ‘Incident [incident]’ configuration record.
c)  In the ‘Search as’ tab, select the ‘Enable search as’ checkbox.
d)  From the ‘Search as field’ choice list, select the field based on which you want to see the filtered search results.
     For example: Select caller field (consider ITIL User as the caller).

After updating the incident configuration record, if you open any incident and view the ‘Related Search Results’, you will now find two tabs:

  •  My Results: Includes search results for the logged in user.
  •  ITIL User Results: Includes search results that are common to both the logged-in user and the user you have selected in the ‘Search as field’ choice list.

Note that the search results not only display the relevant knowledge article but also related service catalog items. For each search record, an 'Order' or an 'Attach' button appears for you to order the service catalog item or attach the knowledge article with the current incident record.

 

10. What is triage in Incident Management?

Triaging an incident involves two major activities. Firstly, classifying the incident into the right assignment group. Secondly, involving the right set of people in order to resolve the incident as quickly as possible. Identifying the correct and most appropriate assignment group or person for the incident is the most basic purpose of triage in incident management.

 

11. What is triage in ITIL?

A process for sorting inefficient operations into ITIL processes based on the client's need for or likely business benefit from immediate improvement. ITIL Triage is used in the data center, at disaster recovery sites, and in boardrooms when limited financial resources must be allocated.

 

Incident Management Life Cycle

12. How does the incident state change when the caller updates an incident that is in the On Hold state?

If the incident is in the On Hold state and the On hold reason is Awaiting Caller, the incident state changes to In Progress when the caller updates the incident. For all other On hold reasons, the incident remains in the On Hold state.
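The rule above is small enough to write down directly. Here is a minimal Python sketch of it. The State enum is a simplified subset of incident states chosen for illustration. Only the "Awaiting Caller" string is taken from the answer above.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    # Simplified subset of incident states
    NEW = "New"
    IN_PROGRESS = "In Progress"
    ON_HOLD = "On Hold"
    RESOLVED = "Resolved"
    CLOSED = "Closed"


def state_after_caller_update(state: State, on_hold_reason: Optional[str] = None) -> State:
    """Apply the On Hold rule described above when the caller updates the incident."""
    if state is State.ON_HOLD and on_hold_reason == "Awaiting Caller":
        return State.IN_PROGRESS   # the caller has responded, so work resumes
    return state                   # any other reason (or state) is left unchanged
```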

 

13. Are Resolution code and resolution notes fields mandatory while closing an incident? What if I do not have those fields in the form or use scripts to avoid this entry?

Before Kingston, the ‘Close codes’ and ‘Close notes’ fields were controlled using UI policies. UI policies are valid only if the fields on which the UI policies are applicable are present on the form. So, you could avoid entry to the ‘Close codes’ and ‘Close notes’ fields by not adding the fields to the form.

From Kingston release onwards, we have moved the UI Policy to a Data Policy that works on the server side. Hence, you will need to fill the ‘Resolution codes’ and ‘Resolution notes’ fields to be able to submit the form.

Note: Data policies on the ‘Resolution codes’ and ‘Resolution notes’ fields are not available OOB for existing or upgrade customers. Existing or upgrade customers can create a custom Data Policy on Incident table that makes the ‘Resolution Notes’ and ‘Resolution Codes’ fields mandatory.

 

14. What is the difference between copying an Incident and creating a child incident?

‘Copy Incident’ (UI action in the contextual menu) copies the details of an existing incident record to a new incident record. There is no association between the original/source incident and the new incident.

‘Create Child Incident’ (UI action in the Context menu) copies the details of the parent incident and associates the new incident to the parent incident. The originating incident number is copied to the ‘Parent Incident’ field of the newly created child incident.

Note: The list of attributes and related lists that will be copied from the originating incident will be the ones that are mentioned in the following incident properties:

  • List of attributes (comma-separated) that will be copied from the originating incident
  • Related lists (comma-separated) that will be copied from the originating incident
  • List of attributes (comma-separated) from Affected CIs (task_ci) related list that will be copied from the originating incident
  • List of attributes (comma-separated) from Impacted Services (task_cmdb_ci_service) related list that will be copied from the originating incident
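The difference between the two UI actions can be sketched in Python, with records represented as plain dictionaries. The attributes argument stands in for the "list of attributes" property above. The field names number and parent_incident are assumptions used for illustration, and copying related lists is not modeled.

```python
import copy


def copy_incident(source: dict, attributes: list[str]) -> dict:
    """'Copy Incident': copy only the configured attributes; no link back to the source."""
    return {name: copy.deepcopy(source[name]) for name in attributes if name in source}


def create_child_incident(parent: dict, attributes: list[str]) -> dict:
    """'Create Child Incident': the same copy, plus a reference to the parent incident."""
    child = copy_incident(parent, attributes)
    child["parent_incident"] = parent["number"]   # assumed field name
    return child
```

In both cases, fields that are not in the configured list (such as the incident number or state) are not carried over. Only the child incident keeps a pointer to where it came from.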

 

15. How do the parent incident and child incident states synchronize with each other?

Starting with the Kingston release, the parent-child incident state synchronization is as follows:

Note: If an incident is reopened, all the child incidents of that incident are reopened and the state of the child incidents is changed to In Progress.
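The reopen rule in the note can be sketched in Python as follows. Two things are assumptions here. First, the reopened parent itself is set to In Progress, because the note only specifies the children's state. Second, the sketch touches direct children only and does not walk further down the hierarchy.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    number: str
    state: str
    children: list["Incident"] = field(default_factory=list)


def reopen(incident: Incident) -> None:
    """Reopen an incident and, per the note above, its child incidents."""
    incident.state = "In Progress"       # assumed state for the reopened parent
    for child in incident.children:      # direct children only in this sketch
        child.state = "In Progress"
```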

 

16. How can I create Problem/Change from an incident?

You can attach an incident to a problem or a change. To create a problem from an incident record, click the Additional actions menu icon and click ‘Create Problem’. If you want to relate an incident to a change request, click the Additional actions menu icon and click ‘Create Normal Change’, ‘Create Emergency Change’ or ‘Create Standard Change’. The parent record (incident record) will be automatically associated with the new Problem or Change record.

Note: The ‘Create Standard Change’ UI action will be introduced in London release.

If you need to associate multiple Problems or Change Requests to a parent incident record, you can accomplish this by using the “New” and “Edit” buttons on the “Problems” and “Change Requests” related list present on the incident form.

 

17. Why are resolved incidents getting automatically closed?

If the incident property “Number of days (integer) after which Resolved incidents are automatically closed. Zero (0) disables this feature” is set to a number of days, resolved incidents are automatically closed once that many days have passed since the incident was resolved or last updated.

If the Incident property “Enable auto closure of incidents based on Resolution date. Setting this to 'No' will make auto closure to run based on the Updated date.” is selected, the incident will be auto-closed based on the resolution date else it will be based on the last updated date.

Note: Calculation of days based on Business Days is not supported out-of-the-box.
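The two properties combine into one decision, which can be sketched in Python. The function and parameter names are hypothetical stand-ins for the two incident properties. Treating "exactly N days" as due for closure (the >= comparison) is an assumption. Days are calendar days, in line with the note that business days are not supported out of the box.

```python
from datetime import datetime, timedelta


def should_auto_close(resolved_at: datetime, updated_at: datetime, now: datetime,
                      autoclose_days: int, use_resolution_date: bool = True) -> bool:
    """Decide whether a Resolved incident is due for automatic closure."""
    if autoclose_days <= 0:
        return False  # zero disables the feature
    reference = resolved_at if use_resolution_date else updated_at
    # Calendar days, not business days.
    return now - reference >= timedelta(days=autoclose_days)
```

For example, an incident resolved on 1 March and last updated on 5 March, with a 7-day setting, is due on 8 March when the resolution date is used. It is not yet due when the updated date is used.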

 

18. When is the right time to engage a Problem Manager for an Incident?

When incidents occur, the role of incident management is to restore service as rapidly as possible, without necessarily identifying or resolving the underlying cause of the incidents. If incidents occur rarely or have little impact, assigning resources to perform root cause analysis cannot be justified. However, if an individual incident or a series of repeated incidents causes significant impact, problem management is tasked with diagnosing the underlying cause of the incidents and, ultimately, to identify a means to remove that cause.
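For the "series of repeated incidents" case, a simple signal for engaging problem management is the number of incidents against the same CI within a recent window. Here is a minimal Python sketch. The 30-day window, the threshold of 5, and the CI names are hypothetical. The sketch does not cover the single high-impact incident case mentioned above.

```python
from collections import Counter
from datetime import datetime, timedelta


def problem_candidates(incidents: list[tuple[str, datetime]], now: datetime,
                       window: timedelta = timedelta(days=30),
                       threshold: int = 5) -> list[str]:
    """Return CIs with at least `threshold` incidents opened within `window` of `now`."""
    recent = Counter(ci for ci, opened in incidents if now - opened <= window)
    return sorted(ci for ci, count in recent.items() if count >= threshold)
```

For example, a mail server with five incidents in the last two weeks would be flagged. A VPN gateway with a single incident would not, and neither would incidents older than the window.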

 

Major Incident Management 

 19. What are major incidents?

Major incidents are those incidents for which the degree of impact on the business or organization is extreme. Incidents for which the timescale of disruption, even to a relatively small percentage of users, becomes excessive should also be regarded as major incidents. Some major incidents can be defined in advance, but most will be prioritized as they happen, based on impact and urgency. The major incident module was introduced in the Kingston release.

 

20. Can a P1 incident be considered a major incident?

Some organizations equate a major incident with a Priority 1 Incident (or a Severity 1 Incident) but the mapping is not that crisp. Incident priority is for sorting and prioritizing (and measuring and reporting). A major incident is about abandoning the normal process and switching to different procedures.

 

21. What is the procedure for handling a major incident?

A separate procedure, with shorter timescales and greater urgency, must be used for major incidents. A definition of what constitutes a major incident must be agreed and ideally mapped on to the overall incident prioritization system.

Where necessary, the major incident procedure should include the dynamic establishment of a separate Major Incident Management team, under the direct leadership of the Major Incident Manager. The Major Incident Management team is formulated to concentrate on that particular major incident and to ensure that adequate resources and focus are provided for finding a fast resolution.

If the cause of the incident needs to be investigated at the same time, then the Problem Manager will be involved as well, but the Incident Manager must ensure that service restoration and underlying cause are kept separate. Throughout, the communication manager will ensure that all activities are recorded and users are kept fully informed of progress. Communication is a hugely important activity in handling major incidents.

In these circumstances, the Problem Manager should be notified (if not already aware) and should arrange a formal meeting with interested parties (or regular meetings if necessary). These meetings should be attended by all key in-house support staff, vendor support staff, and IT services management, with the purpose of reviewing progress and determining the best course of action. The communication manager should attend these meetings and ensure that a record of actions and decisions is maintained, ideally as part of the overall incident record. Major incidents are still logged in the same way as all other incidents; only the priority and management of the incident are different.

If no Problem Manager or Problem Process Owner is currently in place, an Incident Management Executive and Major Incident Management team could take on the activities described above.

Note: The Incident Management - Major Incident Management plugin (com.snc.incident.mim) must be activated to work with major incidents.

 

22. What are major incident candidates?

An incident that we want to promote to a major incident is first proposed as a major incident candidate. The major incident manager then analyzes the candidate and decides whether it should be considered a major incident. So a major incident candidate is an incident that has been proposed as a possible major incident but has not yet been approved by the major incident manager.

 

23. What are the ways to create a major incident candidate?

  •  From the left navigation pane, click Incident > Major Incidents > Create Major Incident Candidate.
  •  On the incident form, right-click on the header and from the context menu click ‘Propose major incident’.

Note: The major incident candidate functionality was introduced in the Kingston release.

 

24. What are the ways to create a major incident?

  •  From the left navigation pane, click Incident > Major Incidents > Create Major Incident.


  •  On the incident form, right-click on the header and from the context menu click ‘Promote to major incident’.


  •  You can create major incident trigger rules to define conditions under which an incident automatically becomes a major incident candidate. To create them, navigate to Major Incidents > Administration > Major Incident Trigger Rules and click “New”.

 

25. What is the difference between creating a major incident from a major incident candidate and promoting a candidate to a major incident?

  • When you create a major incident candidate and then create a new major incident from that candidate, a new incident is created that becomes the major incident. The candidate is added as a child of the major incident. The major incident is automatically assigned to the Major Incident Management group, and if an on-call rota is defined, the system automatically fills the 'Assigned to' field with the user currently available on that rota.
  • When a major incident candidate is promoted as a major incident, the incident itself is considered as a major incident. There is no new incident that is created. The value in the existing 'Assignment group' or the 'Assigned to' field does not change to the major incident management group or any user.

 

26. Incident form is not behaving as expected even when I have activated the major incident management plugin. Am I missing something?

I have activated the major incident management plugin, raised a new P1 incident but still do not see any changes to the incident form or any option to create a major incident candidate or to promote a candidate to a major incident. I can see a trigger rule set for Priority = 1 Critical so I was expecting this to become a candidate for major incident but nothing happened. What am I doing wrong?

The issue may be because the trigger rule you are looking at is inactive. ServiceNow ships 3 trigger rules out of the box (OOB), all marked as active=false. Review them and activate the ones your business needs.
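To see why an inactive rule never fires, here is a minimal Python sketch of how trigger-rule evaluation could work. It is not ServiceNow's implementation: the `TriggerRule` class, its `active` flag and condition function, and the rule name are hypothetical stand-ins for the records under Major Incident Trigger Rules.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical model of a major incident trigger rule (not the ServiceNow schema).
@dataclass
class TriggerRule:
    name: str
    active: bool
    condition: Callable[[dict], bool]


def is_major_incident_candidate(incident: dict, rules: list[TriggerRule]) -> bool:
    """Return True if any *active* rule matches the incident; inactive rules are skipped."""
    return any(rule.active and rule.condition(incident) for rule in rules)


# Like the OOB rules described above, this hypothetical rule starts out inactive.
p1_rule = TriggerRule("Priority is 1 - Critical", active=False,
                      condition=lambda inc: inc.get("priority") == 1)

incident = {"number": "INC0010001", "priority": 1}
print(is_major_incident_candidate(incident, [p1_rule]))  # False: the rule is inactive

p1_rule.active = True
print(is_major_incident_candidate(incident, [p1_rule]))  # True
```

The point of the sketch is the `rule.active and ...` check: a P1 incident matches the condition, but nothing happens until the rule itself is switched on.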

 

27. Do we have a place from where we can manage major incidents?

The major incident workbench is a single pane view specifically designed for major incident managers, communication managers and resolver groups to manage major incidents.

To navigate to major incident workbench, click 'View Workbench' that appears on the header of the Incident form.

 

28. I do not see the View Workbench button on the header of the Incident form!

You will see the button only when the incident has been proposed as, or accepted as, a major incident.

 

29. What is a major incident dashboard (PA dashboard) used for?

The major incident dashboard provides an at-a-glance view of all major incident information.

 

Analyzing and Improving IM process

30. What are the fields related to time calculation that are available on my Incident form?

There are a number of time-tracking fields available for users. The fields are as follows:

  • Business resolve time: The total time taken to resolve a task, counted within business hours only. For example, if the user works from 9am to 6pm, Monday to Friday, business resolve time counts only the time within those hours, excluding holidays, weekends, and out-of-office hours. Business resolve time is calculated when the task is resolved.

  • Calendar resolve time: The total time taken to resolve a task, including holidays, weekends, and out-of-office hours.

  • Business Duration: The total elapsed time from when the task is opened until it is closed, counted within business hours only.

  • Calendar Duration: The total elapsed time from when the task is opened until it is closed, including holidays, weekends, and out-of-office hours.

 Note: All the above time-tracking fields are based on the calendar record available at System Policy > SLA > Calendars (sys_calendar table). These records define the working days, hours and holidays. You can also create customized calendar schedules by clicking the “New” button on the Calendar list view.
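The difference between business time and calendar time can be sketched in Python. This assumes a hypothetical schedule of 9am to 6pm, Monday to Friday, with no holidays. On the instance the schedule comes from the calendar records mentioned in the note above, not from code like this.

```python
from datetime import datetime, timedelta

# Hypothetical business schedule: 09:00-18:00, Monday (0) to Friday (4), no holidays.
START_HOUR, END_HOUR = 9, 18
WORKDAYS = {0, 1, 2, 3, 4}


def calendar_seconds(opened: datetime, resolved: datetime) -> int:
    """Elapsed wall-clock time, including nights, weekends and holidays."""
    return int((resolved - opened).total_seconds())


def business_seconds(opened: datetime, resolved: datetime) -> int:
    """Elapsed time counted only inside the business schedule."""
    total = 0
    day = opened.replace(hour=0, minute=0, second=0, microsecond=0)
    while day <= resolved:
        if day.weekday() in WORKDAYS:
            window_start = day.replace(hour=START_HOUR)
            window_end = day.replace(hour=END_HOUR)
            # Overlap between [opened, resolved] and this day's business window.
            start = max(opened, window_start)
            end = min(resolved, window_end)
            if end > start:
                total += int((end - start).total_seconds())
        day += timedelta(days=1)
    return total


# Opened Friday 1 March 2024 at 17:00, resolved Monday 4 March 2024 at 10:00.
opened = datetime(2024, 3, 1, 17, 0)
resolved = datetime(2024, 3, 4, 10, 0)
print(calendar_seconds(opened, resolved) / 3600)  # 65.0 hours
print(business_seconds(opened, resolved) / 3600)  # 2.0 hours
```

The same incident is 65 hours old on the calendar but only 2 business hours old (one hour on Friday, one on Monday), which is why SLA reports based on the two kinds of field can differ so much.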

 

31. What are Response Times and Resolution Times?

Response Time is the amount of time between when the client first reports an incident (by leaving a phone message, sending an email, or using an online ticketing system) and when a service desk agent actually responds (automated responses don’t count) and lets the client know the incident is being worked on.

Resolution Time is defined as the amount of time between when the client first creates an incident report and when that issue is actually solved.

Note: The above is just for theoretical explanation and is not tracked anywhere on the instance.

 

32. What is a known error and when is it identified?

A known error is a fault in a configuration item (CI) identified by the successful diagnosis of a problem and for which a temporary workaround or a permanent solution has been identified. In other words, a known error is a problem whose root cause is known and for which a workaround or fix is documented, so that it can be applied to existing or new incidents. A known error is identified once the cause of the problem is known.

 

33. What are the Key Performance Indicators (KPIs) of Incident Management?

Measurements are important across all stages of the ITIL lifecycle. Each process has metrics that should be monitored and reported to effectively evaluate the overall performance.

Examples of Incident Management KPIs that are shipped with the base system are: 

  • % of high priority incidents resolved
  • % of incidents resolved on first assignment
  • % of incidents resolved within SLA
  • % of reopened incidents
  • Average time to resolve a high priority incident
  • Average time to resolve an incident
  • Number of incidents created per user

 Users can enable or disable a KPI and customize KPI conditions.

Integration with Performance Analytics provides daily data collection and drill-down capabilities on KPI data. KPIs should be related to Critical Success Factors (CSF) and CSFs should be related to objectives. This relationship helps with decision support for maintaining current state and improving to desired state. Although each organization is different, relevant reports for users, staff and management will help support important decisions that can be used to improve both the processes and the business as a whole.
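To show how a few of these KPIs are derived, the sketch below computes them from a small in-memory list. The field names (`priority`, `reopen_count`, `made_sla`, `reassignment_count`, `resolve_hours`), the sample values, and treating P1/P2 as "high priority" are all assumptions made for the example. On the instance the KPIs are collected by Performance Analytics, not by this code.

```python
from statistics import mean

# Hypothetical sample data; each dict stands in for one resolved incident.
incidents = [
    {"priority": 1, "reopen_count": 0, "made_sla": True,  "reassignment_count": 0, "resolve_hours": 3.0},
    {"priority": 3, "reopen_count": 1, "made_sla": False, "reassignment_count": 2, "resolve_hours": 30.0},
    {"priority": 2, "reopen_count": 0, "made_sla": True,  "reassignment_count": 0, "resolve_hours": 8.0},
    {"priority": 4, "reopen_count": 0, "made_sla": True,  "reassignment_count": 1, "resolve_hours": 15.0},
]


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place (0.0 when there is no data)."""
    return round(100 * part / whole, 1) if whole else 0.0


n = len(incidents)
kpis = {
    "% resolved within SLA": pct(sum(i["made_sla"] for i in incidents), n),
    "% resolved on first assignment": pct(sum(i["reassignment_count"] == 0 for i in incidents), n),
    "% reopened": pct(sum(i["reopen_count"] > 0 for i in incidents), n),
    "avg hours to resolve": mean(i["resolve_hours"] for i in incidents),
    # Assumption: "high priority" means priority 1 or 2.
    "avg hours to resolve (P1/P2)": mean(i["resolve_hours"] for i in incidents if i["priority"] <= 2),
}
for name, value in kpis.items():
    print(f"{name}: {value}")
```

With this data the output is 75.0, 50.0 and 25.0 percent, an average of 14.0 hours, and 5.5 hours for high-priority incidents.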

 

34. What is the difference between Critical Success Factors (CSFs) and Key Performance Indicators (KPIs) in Incident Management?

A CSF is a critical factor or activity required for ensuring the success of a company or an organization. Alternative terms are key result area (KRA) and key success factor (KSF). These are often used to denote the mission statements, vision of an organization, or simply for a business strategy.

Key performance indicators (KPIs), on the other hand, are measures that quantify management objectives. Each KPI is accompanied by a target or threshold so that performance can be measured against it. The threshold is the value against which actual achievement is plotted, and it may be time-based (for example, a resolution time) or numeric (for example, a count or a percentage).

 

35. What are Mean Time To Identify (MTTI), Mean Time To Repair/Resolve (MTTR) and Mean Time To Failure (MTTF)?

  • Mean Time To Identify (MTTI) is the average length of time it takes to detect or identify an incident.

  • Mean Time To Repair/Resolve (MTTR) is the average time between the start and the resolution of an incident. Because it is measured from the start of the incident, it also includes the time spent identifying the issue.

  • Mean Time To Failure (MTTF) is the average length of time a system or component is expected to operate before it fails.

Note: You can define these metrics by navigating to Metrics > Definitions.
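Here is a minimal sketch of how MTTI and MTTR could be averaged from incident timestamps. It assumes each incident records when it started, when it was identified, and when it was resolved. The `started`/`identified`/`resolved` keys and the sample times are hypothetical; on the instance these values would come from a metric definition. MTTF is left out because it measures how long a component lasts, not how incidents are handled.

```python
from datetime import datetime, timedelta

# Hypothetical incident timelines.
incidents = [
    {"started": datetime(2024, 5, 1, 10, 0), "identified": datetime(2024, 5, 1, 10, 20),
     "resolved": datetime(2024, 5, 1, 12, 0)},
    {"started": datetime(2024, 5, 3, 8, 0), "identified": datetime(2024, 5, 3, 8, 40),
     "resolved": datetime(2024, 5, 3, 11, 0)},
]


def mean_minutes(deltas: list[timedelta]) -> float:
    """Average of a list of durations, in minutes."""
    return sum(d.total_seconds() for d in deltas) / len(deltas) / 60


# MTTI: start of the incident -> identification.
mtti = mean_minutes([i["identified"] - i["started"] for i in incidents])
# MTTR: start of the incident -> resolution (so it already includes identification time).
mttr = mean_minutes([i["resolved"] - i["started"] for i in incidents])
print(f"MTTI = {mtti} min, MTTR = {mttr} min")  # MTTI = 30.0 min, MTTR = 150.0 min
```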

 

36. What is Post Incident Review (PIR)?

After the major incident is resolved, a post-incident review is conducted to analyze the incident and understand what can be done to prevent a similar incident in the future. This also provides an opportunity to review the incident response process and identify areas for improvements.


To streamline the process, a post-incident report is created when an incident is resolved which can then be reviewed and updated during the review process before sharing the report with stakeholders.                     

 

 37. What are PIR reports? Where do I find them?

The post-incident report provides a summary of the incident, findings, resolution information including any change requests and problem records created. A timeline of activities is available in the report which can be edited by the major incident manager to include important activities, e.g., actions taken to resolve an incident. 

 

Troubleshooting

38. Why are incidents being created with the same incident number but a different sys_id?

Duplicate numbers are rare in ServiceNow, but they can occur because the Number field does not enforce uniqueness by default.

If duplicate numbers do appear, set the Number field as unique at the dictionary level in your instance as follows:

  • Clean up all duplicate entries.

  • Update the number field on the number maintenance record for the incident table.

           Note: The number must be greater than or equal to the highest number in the incident list.
           For example, if your highest incident number is INC176601, update the number field in the number maintenance record for incident to 176601.

  • Set the Number field as unique in the dictionary entry for the incident table.

To learn more about duplicate incidents, refer to KB0538764.
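Before applying the steps above, you need to know which numbers are duplicated and what the highest number is. Here's a small Python sketch of that check over an exported list of (number, sys_id) pairs. It is not a ServiceNow script: the export, the `INC` prefix and the sample sys_ids are assumptions for illustration.

```python
import re
from collections import defaultdict

# Hypothetical export: (number, sys_id) pairs from the incident table.
records = [
    ("INC176599", "a1"),
    ("INC176600", "b2"),
    ("INC176600", "c3"),  # same number, different sys_id
    ("INC176601", "d4"),
]


def find_duplicates(rows):
    """Map each incident number that appears more than once to its sys_ids."""
    by_number = defaultdict(list)
    for number, sys_id in rows:
        by_number[number].append(sys_id)
    return {num: ids for num, ids in by_number.items() if len(ids) > 1}


def highest_number(rows, prefix="INC"):
    """Numeric part of the highest incident number, e.g. INC176601 -> 176601."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
    matches = (pattern.match(number) for number, _ in rows)
    return max(int(m.group(1)) for m in matches if m)


print(find_duplicates(records))  # {'INC176600': ['b2', 'c3']}
print(highest_number(records))   # 176601
```

The first result is the list to clean up; the second is the lower bound for the number maintenance value described in the note above.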

 

39. Why can't an ITIL user find the "Closed" state on an incident?

Only users with the itil_admin or admin role can see the "Closed" state of an incident.

 

 

 

What is the lifecycle of MIM (Major Incident Management)?

Major Incident Management Process involves identifying, prioritizing, and resolving critical incidents to minimize business impact.

Incidents are classified based on their impact and urgency

A Major Incident Manager is appointed to lead the response team

Communication is key to ensure all stakeholders are informed

Root cause analysis is conducted to prevent future incidents

Continuous improvement is achieved through regular reviews and updates

The lifecycle of a major incident can be defined as follows:

Incident Identification

Incident Logging

Incident Categorization

Incident Prioritization

Investigation & Diagnosis

Escalation & SLA management

Recovery & Resolution

-----

The lifecycle of Critical/Major Incident Management can be defined in the following stages:

Major incidents are considered to have 4 main stages, namely:

Identification

Containment

Resolution

Maintenance

These stages can be compared with the lifecycle of the Incident Management process:

Identification

Logging

Categorization

Prioritization

Investigation & Diagnosis

Recovery & Resolution

---------

What is the relationship between incidents and changes?

Incidents can trigger changes to prevent future incidents.

Incidents can reveal weaknesses in the system that require changes to prevent future incidents.

Changes can also cause incidents if not properly managed.

Incident and change management processes should be closely integrated.

For example, a major incident caused by a software bug may require a change to the code to prevent it from happening again.

Similarly, a change to a server configuration may cause an incident if not properly tested and implemented.

Effective communication between incident and change management teams is crucial to ensure timely and effective resolution of incidents.

 

An incident is an unplanned interruption, whereas a change is a planned modification.

Incidents and changes are closely related as changes can cause incidents and incidents can lead to changes.

Changes can introduce new risks and vulnerabilities that can lead to incidents.

Changes can also be made in response to incidents to prevent them from happening again.

Incidents can trigger the need for changes to be made to prevent similar incidents in the future.

For example, a software update can introduce a bug that causes an incident, which then requires a change to be made to fix the bug.

Or, an incident can reveal a flaw in a process that requires a change to be made to improve the process and prevent future incidents.

 

Situation-based question: Briefly describe a scenario in which you drove a major incident under a lot of pressure.

During a major incident, I led a team under high pressure to restore service within SLA.

Quickly assessed the situation and identified the root cause

Communicated effectively with stakeholders and team members

Prioritized tasks and delegated responsibilities

Monitored progress and made adjustments as necessary

Ensured all documentation was accurate and up-to-date

Maintained a calm and professional demeanor throughout the incident

 

Have you handled a very difficult incident? If yes, what was your approach and what did you learn?

Yes, I have handled difficult incidents. My approach is to gather all available information, prioritize actions, and communicate effectively with stakeholders.

Gather all available information about the incident

Prioritize actions based on impact and urgency

Communicate effectively with stakeholders, including regular updates

Ensure all necessary resources are available to resolve the incident

Conduct a thorough post-incident review to identify areas for improvement

Example: Managed a major outage that impacted multiple systems and required coordination with multiple teams. Prioritized actions based on impact and urgency, communicated effectively with stakeholders, and ensured all necessary resources were available to resolve the incident.

What would you do if you woke up one day on the Moon? (Be prepared.)

If I wake up on the Moon, I would assess the situation, gather information, and take necessary actions to ensure my safety and survival.

Assess the surroundings and determine the level of danger

Check for available resources and supplies

Establish communication with Earth and inform them about the situation

Evaluate the possibility of returning to Earth or seeking assistance

Take necessary precautions to protect myself from the harsh lunar environment

 

You are on a call resolving an incident, and more incidents are waiting in the queue. What will you do?

I will prioritize the incidents based on their impact and urgency and delegate them to the appropriate teams.

Assess the impact and urgency of each incident in the queue

Delegate the incidents to the appropriate teams based on their expertise

Ensure that the teams have all the necessary information to resolve the incidents

Monitor the progress of each incident and provide support as needed

------

The Major Incident Manager needs to act according to the impact and urgency of the incident, then prioritize and categorize it.
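The triage step can be sketched with a priority queue: incidents are ordered first by priority (1 = most critical) and then by arrival order, so equally critical incidents are handled first-in, first-out. The incident numbers, priorities and the `IncidentQueue` class are hypothetical; `heapq` is Python's standard-library heap.

```python
import heapq
from itertools import count


class IncidentQueue:
    """Hypothetical triage queue: lowest priority number first, then oldest first."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker that preserves arrival order

    def add(self, number: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._arrival), number))

    def next_incident(self) -> str:
        _, _, number = heapq.heappop(self._heap)
        return number


queue = IncidentQueue()
queue.add("INC0001", priority=3)
queue.add("INC0002", priority=1)
queue.add("INC0003", priority=2)
queue.add("INC0004", priority=1)

print([queue.next_incident() for _ in range(4)])
# ['INC0002', 'INC0004', 'INC0003', 'INC0001']
```

In practice the "delegate to the appropriate teams" step would pop from the queue and route by assignment group; the sketch only captures the ordering.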

 

What is PIR in Incident management?

PIR stands for Post Incident Review. It is a process in incident management to analyze and learn from past incidents.

PIR is conducted after an incident is resolved to evaluate the incident response and identify areas for improvement.

It involves gathering data, analyzing the incident timeline, identifying root causes, and documenting lessons learned.

PIR helps in preventing future incidents, improving incident response processes, and enhancing overall system reliability.

Examples of PIR activities include conducting interviews with involved parties, reviewing incident logs, and documenting recommendations.

The findings from PIR are used to update incident management procedures, train staff, and implement preventive measures.

---------------

It was Post Incident Review. I didn't recognize the term because I had always used a different term, postmortem analysis, for the same thing. The interviewer rejected me without even explaining what PIR is. Cognizant needs to learn courtesy.

 

Major Incident Management Process


Preparation - Identification - Classification - Prioritisation - Escalation - Knowing the ETR (Estimated Time to Restore) and RFO (Reason for Outage) - Further Escalation - Restoration - Closure - Postmortem.

 

How long does it take to provide an RCA?

The time taken to provide a Root Cause Analysis (RCA) varies depending on the complexity of the incident.

The duration of providing an RCA depends on the incident's complexity, scope, and available data.

Simple incidents with clear causes may require less time for RCA.

Complex incidents involving multiple systems or dependencies may take longer to analyze and determine the root cause.

Availability of relevant data and documentation can significantly impact the time required for RCA.

Collaboration with various teams and stakeholders may be necessary to gather information and validate findings.

Efficient RCA processes and tools can help streamline the analysis and reduce the time taken.

 

What is the difference between responsible and accountable?

Responsible and accountable are two different levels of involvement and ownership in a task or project.

Responsible refers to the person who is assigned to complete a task or achieve a goal.

Accountable refers to the person who is ultimately answerable for the outcome or results of the task or project.

Responsibility can be shared among multiple individuals, but accountability rests with a single person.

Being responsible means having the duty or obligation to perform a task, while being accountable means being answerable for the success or failure of that task.

For example, in a Major Incident Management role, the Major Incident Manager may be responsible for coordinating the response and resolution of incidents, while the Incident Management Team Lead may be accountable for ensuring that incidents are resolved within the agreed-upon timeframes.

----------

These terms come from the RACI matrix (Responsible, Accountable, Consulted, Informed) used in the ITIL framework:

Responsible: People or stakeholders who do the work. They must complete the task or objective or make the decision.

Several people can be jointly Responsible.

Vs

Accountable: Person or stakeholder who is the “owner” of the work.
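The rule stated above, that several people can share responsibility but only one person is accountable, can be expressed as a quick check over a RACI matrix. The activities and role names below are hypothetical examples, not an ITIL-prescribed matrix.

```python
# Hypothetical RACI matrix: activity -> {role: RACI letters held by that role}.
raci = {
    "Coordinate major incident bridge": {"Major Incident Manager": "AR", "Resolver Group": "R",
                                         "Service Owner": "I"},
    "Send stakeholder updates": {"Communication Manager": "R", "Major Incident Manager": "A"},
    "Restore service": {"Resolver Group": "R", "Vendor Support": "R"},  # no Accountable role
}


def validate_raci(matrix: dict) -> list[str]:
    """Return a list of problems: each activity needs >= 1 Responsible and exactly one Accountable."""
    problems = []
    for activity, roles in matrix.items():
        responsible = [role for role, letters in roles.items() if "R" in letters]
        accountable = [role for role, letters in roles.items() if "A" in letters]
        if not responsible:
            problems.append(f"{activity}: no one is Responsible")
        if len(accountable) != 1:
            problems.append(f"{activity}: {len(accountable)} Accountable roles (expected 1)")
    return problems


for problem in validate_raci(raci):
    print(problem)
# Restore service: 0 Accountable roles (expected 1)
```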

 

What is the difference between a change and a problem?

A change is a planned action to modify or improve a service, while a problem is the cause, or potential cause, of one or more incidents that disrupt a service.

A change is a proactive action taken to improve a service, while a problem is usually investigated reactively, in response to incidents that have already occurred.

A change is a controlled process that is planned, tested, and implemented with minimal disruption to the service, while a problem is an unplanned condition that causes disruption to the service.

A change is usually initiated by the service provider, while problems are usually identified from incidents reported by service users.

Examples of changes include upgrading software, adding new features, or replacing hardware, while examples of problems include the underlying causes of recurring system crashes, network outages, or security breaches.

 

What are the KPIs of major incidents?

Key Performance Indicators (KPIs) of major incidents measure the effectiveness and efficiency of incident management.

Response time: Measure the time taken to respond to a major incident.

Resolution time: Measure the time taken to resolve a major incident.

Customer satisfaction: Measure the satisfaction level of customers affected by major incidents.

Incident recurrence rate: Measure the frequency of major incidents recurring.

Mean time between failures (MTBF): Measure the average time between major incidents.

Mean time to recover (MTTR): Measure the average time taken to recover from major incidents.

Number of major incidents: Measure the total number of major incidents occurring within a specific period.

Escalation rate: Measure the rate at which major incidents are escalated to higher levels of management.

Availability: Measure the percentage of time services are available without major incidents.

Impact on business: Measure the financial and operational impact of major incidents on the business.

The KPIs are as follows:

1) MTTR - Mean Time to Respond

2) MTTA - Mean Time to Acknowledge

3) MTBF - Mean Time Between Failures

4) MTTD - Mean Time to Detect
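MTBF and the average restore time can be combined into an availability figure: availability = MTBF / (MTBF + MTTR), where MTTR here means mean time to restore/recover (the sense used for "Mean time to recover" earlier in this answer), not mean time to respond. A minimal sketch with hypothetical figures for a 720-hour month:

```python
def mtbf(operating_hours: float, failures: int) -> float:
    """Mean time between failures: total operating (up) time divided by the number of failures."""
    return operating_hours / failures


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability as a percentage: uptime / (uptime + downtime)."""
    return 100 * mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical month: 716 hours of uptime, 4 major incidents, 1 hour average restore time.
m = mtbf(operating_hours=716, failures=4)   # 179.0 hours
a = availability(m, mttr_hours=1.0)         # 100 * 179 / 180
print(f"MTBF = {m} h, availability = {a:.2f}%")  # MTBF = 179.0 h, availability = 99.44%
```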

 

What is CFS?

CFS stands for Call For Service.

CFS is a term commonly used in emergency services and law enforcement.

It refers to a request or notification for assistance or response to a specific incident.

CFS can include various types of incidents such as crimes, accidents, or medical emergencies.

Dispatchers or operators receive CFS and assign appropriate resources to handle the situation.

Examples of CFS include a 911 call reporting a burglary, a traffic accident, or a person in need of medical attention.

 

Lifecycle of an incident

The incident lifecycle involves identification, logging, categorization, prioritization, investigation, resolution, and closure.

Identification: recognizing that an incident has occurred

Logging: recording the details of the incident

Categorization: assigning a category to the incident

Prioritization: determining the urgency and impact of the incident

Investigation: gathering information to determine the cause of the incident

Resolution: taking action to resolve the incident

Closure: verifying that the incident has been resolved and documenting the resolution

 

What is the difference between impact and urgency?

Impact refers to the extent of the consequences caused by an incident, while urgency refers to the time sensitivity of resolving the incident.

Impact is about the severity and extent of the damage caused by an incident.

Urgency is about the time sensitivity and the need for immediate action to resolve the incident.

Impact is measured in terms of the disruption caused to the business or services.

Urgency is measured by how quickly the incident must be resolved before the delay affects business operations.

For example, a critical system outage may have a high impact as it affects multiple users and services, but if it can be resolved within a few hours, the urgency may be lower.

On the other hand, a minor incident with low impact may require immediate attention due to its urgency, such as a security vulnerability that needs to be patched immediately.
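Impact and urgency are usually combined into a priority through a lookup matrix. The sketch below uses a 3x3 matrix (1 = High, 3 = Low for both inputs; priority 1 - Critical to 5 - Planning) that is believed to match ServiceNow's default incident priority lookup. Treat the values as an assumption and check your instance's priority lookup rules, since many organizations customize them.

```python
# Assumed 3x3 lookup: (impact, urgency) -> priority, where 1 is the highest level.
PRIORITY_MATRIX = {
    (1, 1): 1, (1, 2): 2, (1, 3): 3,
    (2, 1): 2, (2, 2): 3, (2, 3): 4,
    (3, 1): 3, (3, 2): 4, (3, 3): 5,
}
PRIORITY_LABELS = {1: "Critical", 2: "High", 3: "Moderate", 4: "Low", 5: "Planning"}


def priority(impact: int, urgency: int) -> str:
    """Derive the incident priority label from impact and urgency (1 = High, 3 = Low)."""
    p = PRIORITY_MATRIX[(impact, urgency)]
    return f"{p} - {PRIORITY_LABELS[p]}"


print(priority(1, 1))  # 1 - Critical  (widespread outage, needed now)
print(priority(3, 1))  # 3 - Moderate  (few users affected, but time-sensitive)
print(priority(1, 3))  # 3 - Moderate  (many users affected, but can wait)
```

The two "Moderate" results mirror the examples above: high impact with low urgency and low impact with high urgency can land at the same priority.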

 

How are severity and priority defined?

Severity and priority are used to classify incidents based on their impact and urgency.

Severity refers to the impact an incident has on the business or system.

Priority refers to the urgency with which an incident needs to be resolved.

Both severity and priority are usually classified into levels, such as low, medium, and high.

The severity and priority of an incident can be determined based on factors such as the number of users affected, the criticality of the system, and the potential financial impact.

For example, a critical system outage affecting a large number of users would be classified as high severity and high priority.

 

What are an Emergency Change and an Urgent Change? What is a known error? What is the incident management lifecycle? What does the Service Operation module contain? Suppose the customer toll-free number goes down: what will the customer lose?

Answers to questions related to Incident Management

Emergency Change and Urgent Change are types of changes that are implemented quickly to resolve an incident or prevent an incident from occurring

Known Error is a problem that has been identified and has a documented root cause and a workaround

Incident Management lifecycle consists of identification, logging, categorization, prioritization, diagnosis, escalation, resolution, and closure of incidents

Service Operation module contains processes related to managing services in operation, such as incident management, problem management, event management, access management, and request fulfilment

 

 

 

 

 

 

 

Snow PDF

 ServiceNow PDF Try it Yourself » < iframe  src ="https://drive.google.com/drive/folders/1MHcrQb2TJTQtU8zyYZN1koJn5mD3r6YZ?usp=shari...