
SOFTWARE FAILURE MODE AND EFFECTS ANALYSIS (FMEA)


Failure Mode and Effects Analysis (FMEA) is a key safety assessment technique that identifies failure modes at the system, hardware and software levels. An overlooked failure mode can cause a system or function to fail, which directly affects the system's safety performance, reliability and quality.


FMEA is a bottom-up approach with four key phases: identification of faults, assessment of their impact, determination of potential causes and their resolutions, and finally testing and documentation of the analysis.


FMEA addresses the effect of failures at the system, software and hardware levels. The outcome of the analysis helps us identify gaps in the safety requirements specification and provides input for component testing, integration testing and system-level testing. This paper describes the application of FMEA to software modules.



FAILURE MODE AND EFFECTS ANALYSIS (FMEA)

 

Software Failure Mode and Effects Analysis (FMEA) is a bottom-up analysis technique for identifying the consequences of possible software failure modes on the software system. The example below applies Software FMEA to a Brake ECU (Electronic Control Unit).


As depicted in Figure 1, the Brake ECU receives the brake pedal sensor input from the driver as an analog signal and vehicle speed information from another ECU via CAN. In turn, it outputs a brake torque request and the brake module status to other ECUs over CAN.



FMEA starts with identifying the different software failure modes that can influence the subsystem or system. The four phases mentioned above are one possible approach to performing FMEA. In brief, these phases are:

  • Look at the system functionality holistically and identify a comprehensive list of potential failure modes

  • For each failure mode identified in the first phase, assess the implications of the failure for the connected software or hardware and for the overall performance of the system

  • Once we know the overall impact, isolate the potential causes of the failure. Once the causes are identified, the system design needs to be enhanced to adequately prevent future failures

  • Once the design change is made, retest the failure mode to ensure that the system handles the failure appropriately before release. Then complete the necessary documentation


Now that we have a brief understanding of the approach, let’s follow these steps to perform software FMEA on the Brake ECU depicted in Figure 1.


STEP 1:


For the example above, let’s start by listing individual components including interfaces, the function they provide and their failure modes.

The only component of interest here is the Brake ECU, together with its inputs and outputs. Its functions can be defined as:

  1. transmitting a brake torque request, based on the brake pedal sensor and vehicle speed inputs, to other vehicle modules.

  2. sending brake module fault status to other vehicle modules.


The failure modes for interfaces and the component (Brake ECU) can be defined as follows:

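| Component / Interface | Function | Potential Failure Mode |
| --- | --- | --- |
| Brake pedal sensor analog voltage input | | No signal |
| Brake pedal sensor analog voltage input | | Signal voltage out of range |
| Vehicle speed | | Message corruption |
| Vehicle speed | | Message loss |
| Vehicle speed | | Message timeout |
| Brake ECU | Transmits brake torque request | NO brake torque request |
| Brake ECU | Transmits brake torque request | DELAYED brake torque request |
| Brake ECU | Transmits brake torque request | INVALID brake torque request |
| Brake ECU | Sends brake module fault status to other vehicle modules | NO brake module status |
| Brake ECU | Sends brake module fault status to other vehicle modules | DELAYED brake module status |
| Brake ECU | Sends brake module fault status to other vehicle modules | INVALID brake module status |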
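
As a rough sketch, the Step 1 worksheet can also be kept as structured data, which makes it easier to extend with effects and causes in later steps. The class name `FailureMode`, the list `STEP1_ROWS` and the helper `modes_for` are hypothetical names chosen for illustration; the row contents are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    component: str  # component or interface under analysis
    function: str   # function it provides ("" for a pure input interface)
    mode: str       # potential failure mode


PEDAL = "Brake pedal sensor analog voltage input"
SPEED = "Vehicle speed"
ECU = "Brake ECU"
TORQUE = "Transmits brake torque request"
STATUS = "Sends brake module fault status to other vehicle modules"

# Rows copied from the Step 1 table.
STEP1_ROWS = [
    FailureMode(PEDAL, "", "No signal"),
    FailureMode(PEDAL, "", "Signal voltage out of range"),
    FailureMode(SPEED, "", "Message corruption"),
    FailureMode(SPEED, "", "Message loss"),
    FailureMode(SPEED, "", "Message timeout"),
    FailureMode(ECU, TORQUE, "NO brake torque request"),
    FailureMode(ECU, TORQUE, "DELAYED brake torque request"),
    FailureMode(ECU, TORQUE, "INVALID brake torque request"),
    FailureMode(ECU, STATUS, "NO brake module status"),
    FailureMode(ECU, STATUS, "DELAYED brake module status"),
    FailureMode(ECU, STATUS, "INVALID brake module status"),
]


def modes_for(component: str) -> list[str]:
    """Return every failure mode listed for one component or interface."""
    return [row.mode for row in STEP1_ROWS if row.component == component]
```

Calling `modes_for("Vehicle speed")` returns the three CAN-related failure modes, which is a quick way to confirm that no interface was left without an entry.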





STEP 2:


Once we have listed the failure modes, let’s determine the effect(s) of the failure on other system components and on the overall system for each failure mode.


For the example above, we determine the effect(s) of receiving an invalid or delayed vehicle speed, a brake pedal analog voltage that is out of range, or no input at all, by asking questions such as:

  • What if the brake pedal input requested by the driver is not received for a certain period of time?

  • What if we receive a corrupted vehicle speed over CAN? Can we tolerate a single corrupted message, or not?

  • Does the failure impact vehicle behavior resulting in high severity?
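
One common way to answer the "how many bad samples can we tolerate" question is a debounce counter: a fault is raised only after several consecutive bad samples. The sketch below is illustrative only. The class name `FaultDebounce`, the threshold value and the latching behaviour are assumptions, not something the analysis above prescribes.

```python
class FaultDebounce:
    """Latch a fault after `threshold` consecutive bad samples (assumed policy)."""

    def __init__(self, threshold: int) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.bad_count = 0   # consecutive bad samples seen so far
        self.fault = False   # latched until reset() is called

    def update(self, sample_ok: bool) -> bool:
        """Feed one sample (True = good) and return the latched fault state."""
        if sample_ok:
            self.bad_count = 0
        else:
            self.bad_count += 1
            if self.bad_count >= self.threshold:
                self.fault = True
        return self.fault

    def reset(self) -> None:
        """Clear the latched fault, e.g. at the start of a new drive cycle."""
        self.bad_count = 0
        self.fault = False
```

With `FaultDebounce(3)`, a single corrupted message followed by a good one does not raise a fault, while three corrupted messages in a row do. The acceptable threshold is a safety decision that depends on the severity of the effect.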


The table below lists the potential effect(s) of failure which might or might not impact vehicle behavior.

| Component / Interface | Function | Potential Failure Mode | Potential Effect(s) of Failure |
| --- | --- | --- | --- |
| Brake pedal sensor analog voltage input | | No signal | |
| Brake pedal sensor analog voltage input | | Signal voltage out of range | |
| Vehicle speed | | Message corruption | |
| Vehicle speed | | Message loss | |
| Vehicle speed | | Message timeout | |
| Brake ECU | Transmits brake torque request | NO brake torque request | No brake command issued to the vehicle actuator when requested by the driver |
| Brake ECU | Transmits brake torque request | DELAYED brake torque request | Brake command issued too late to the vehicle actuator when requested by the driver |
| Brake ECU | Transmits brake torque request | INVALID brake torque request | Invalid brake command issued to the vehicle actuator when requested by the driver, which might cause overbraking |
| Brake ECU | Sends brake module fault status to other vehicle modules | NO brake module status | No brake module status issued to other vehicle modules to notify them of a Brake ECU failure |
| Brake ECU | Sends brake module fault status to other vehicle modules | DELAYED brake module status | Brake module status issued too late to other vehicle modules to notify them of a Brake ECU failure |
| Brake ECU | Sends brake module fault status to other vehicle modules | INVALID brake module status | Invalid brake module status issued to other vehicle modules to notify them of a Brake ECU failure |





STEP 3:


After defining the failure modes and the potential effect(s) of failure, the next step is to determine the potential cause(s) of failure. For each failure mode, we determine all possible causes, covering both hardware and software. Listing the potential cause(s) helps us decide which preventive design controls to implement in order to mitigate these failures. The mitigation strategy can be defined in hardware only, in software only, or in both.

| Component / Interface | Function | Potential Failure Mode | Potential Effect(s) of Failure | Potential Cause(s) of Failure |
| --- | --- | --- | --- | --- |
| Brake pedal sensor analog voltage input | | No signal | | |
| Brake pedal sensor analog voltage input | | Signal voltage out of range | | |
| Vehicle speed | | Message corruption | | |
| Vehicle speed | | Message loss | | |
| Vehicle speed | | Message timeout | | |
| Brake ECU | Transmits brake torque request | NO brake torque request | No brake command issued to the vehicle actuator when requested by the driver | [brake pedal sensor analog voltage input] No signal; [vehicle speed] Message loss; No power supply |
| Brake ECU | Transmits brake torque request | DELAYED brake torque request | Brake command issued too late to the vehicle actuator when requested by the driver | [vehicle speed] Message timeout; [Brake ECU] Internal fault |
| Brake ECU | Transmits brake torque request | INVALID brake torque request | Invalid brake command issued to the vehicle actuator when requested by the driver, which might cause overbraking | [brake pedal sensor analog voltage input] Signal voltage out of range; [vehicle speed] Message corruption; [Brake ECU] Internal fault |
| Brake ECU | Sends brake module fault status to other vehicle modules | NO brake module status | No brake module status issued to other vehicle modules to notify them of a Brake ECU failure | [Brake ECU] Internal fault; No power supply |
| Brake ECU | Sends brake module fault status to other vehicle modules | DELAYED brake module status | Brake module status issued too late to other vehicle modules to notify them of a Brake ECU failure | [Brake ECU] Internal fault |
| Brake ECU | Sends brake module fault status to other vehicle modules | INVALID brake module status | Invalid brake module status issued to other vehicle modules to notify them of a Brake ECU failure | [Brake ECU] Internal fault |
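
The cause column can also be read in the other direction: given one low-level fault, which Brake ECU failure modes can it lead to? The sketch below encodes the Step 3 cause assignments as a dictionary and answers that question. The names `CAUSES` and `failure_modes_caused_by` are hypothetical, and attributing "No power supply" to the Brake ECU in the first entry is an assumption, since the worksheet does not label its source there.

```python
PEDAL = "Brake pedal sensor analog voltage input"
SPEED = "Vehicle speed"
ECU = "Brake ECU"

# Failure mode -> list of (source, cause), copied from the Step 3 worksheet.
CAUSES = {
    "NO brake torque request": [
        (PEDAL, "No signal"),
        (SPEED, "Message loss"),
        (ECU, "No power supply"),  # source assumed; unlabeled in the worksheet
    ],
    "DELAYED brake torque request": [
        (SPEED, "Message timeout"),
        (ECU, "Internal fault"),
    ],
    "INVALID brake torque request": [
        (PEDAL, "Signal voltage out of range"),
        (SPEED, "Message corruption"),
        (ECU, "Internal fault"),
    ],
    "NO brake module status": [
        (ECU, "Internal fault"),
        (ECU, "No power supply"),
    ],
    "DELAYED brake module status": [(ECU, "Internal fault")],
    "INVALID brake module status": [(ECU, "Internal fault")],
}


def failure_modes_caused_by(source: str, cause: str) -> list[str]:
    """List every Brake ECU failure mode that names (source, cause) as a cause."""
    return [mode for mode, causes in CAUSES.items() if (source, cause) in causes]
```

For example, `failure_modes_caused_by(ECU, "Internal fault")` returns five of the six failure modes, which suggests that internal fault detection deserves particular attention when design controls are chosen.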





STEPS 4 and 5:


After identifying the potential failure modes and their causes, with the severity of each failure captured under the potential effects column, we list the current preventive design controls and recommend action(s) to mitigate these failures where controls are not already in place. For example:


  • To mitigate brake pedal sensor failure, we can add a redundant sensor to fall back on in case the primary sensor fails. We can also add a plausibility check that reads both sensor voltages, compares them against each other, and sets a fault if the difference between the two exceeds a threshold for a defined period of time.

  • To detect CAN message corruption, the receiver can verify a CRC (Cyclic Redundancy Check), parity bit, or similar check value carried in a field of the CAN message, and set a fault flag if the number of invalid CRCs exceeds a threshold.

  • To detect dropped or lost CAN messages, the receiver can verify an MC (Message Counter), sequence number, or similar value carried in a field of the CAN message, and/or check for a timeout.

  • To mitigate risk further, we can add a sensor fault detection strategy in hardware that defines what happens, and what action to take, if the power supply to the sensor is lost, the sensor malfunctions, a register fails, and so on.
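
The software checks listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not a production implementation: the frame layout (`Frame` with a 4-bit counter and a CRC-8 over counter plus payload), the CRC parameters (polynomial 0x1D, initial value 0xFF), and the names `crc8`, `make_frame`, `RxMonitor` and `PlausibilityCheck` are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


def crc8(data: bytes, poly: int = 0x1D, init: int = 0xFF) -> int:
    """Bitwise, MSB-first CRC-8 (parameters are illustrative assumptions)."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


@dataclass
class Frame:
    payload: bytes
    counter: int  # rolling message counter, 0..15
    crc: int      # CRC-8 over counter byte + payload


def make_frame(payload: bytes, counter: int) -> Frame:
    counter &= 0x0F
    return Frame(payload, counter, crc8(bytes([counter]) + payload))


class RxMonitor:
    """Receiver-side checks for corruption, sequence errors and timeout."""

    def __init__(self, timeout_s: float, start_time: float = 0.0) -> None:
        self.timeout_s = timeout_s
        self.last_counter = None
        self.last_rx_time = start_time

    def check(self, frame: Frame, now: float) -> list[str]:
        if crc8(bytes([frame.counter]) + frame.payload) != frame.crc:
            return ["corruption"]  # do not trust the counter of a bad frame
        problems = []
        if self.last_counter is not None:
            if frame.counter != (self.last_counter + 1) & 0x0F:
                problems.append("sequence_error")  # lost or repeated message
        self.last_counter = frame.counter
        self.last_rx_time = now
        return problems

    def timed_out(self, now: float) -> bool:
        return now - self.last_rx_time > self.timeout_s


class PlausibilityCheck:
    """Flag a fault when two redundant sensors disagree for too long."""

    def __init__(self, max_diff_v: float, max_duration_s: float) -> None:
        self.max_diff_v = max_diff_v
        self.max_duration_s = max_duration_s
        self.mismatch_since = None

    def update(self, primary_v: float, secondary_v: float, now: float) -> bool:
        if abs(primary_v - secondary_v) <= self.max_diff_v:
            self.mismatch_since = None
            return False
        if self.mismatch_since is None:
            self.mismatch_since = now
        return now - self.mismatch_since >= self.max_duration_s
```

In this sketch a corrupted frame is reported on its own and does not update the counter or the receive time, so a stream of corrupted frames will eventually also trip the timeout. Whether that is the desired behaviour, and what the thresholds should be, depends on the safety requirements of the actual system.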


CONCLUSION


The bottom-up FMEA approach helps identify failure modes at the functionality level and assess their severity and their impact on the overall system. If a failure mode has no impact, we can be fairly confident that the system design is robust against it. If there is some impact, preventive measures need to be initiated, as highlighted in this paper.

