
This paper describes two methods of reliability testing that find defects more effectively and in less time than traditional environmental testing with temperature and shock/vibration. The major benefits of these two methods are greatly increased product reliability and fewer customer returns, resulting in lower warranty cost. These benefits come from the ability to detect latent defects as well as marginal design limits that do not readily surface during traditional testing. To reap the benefits, either method has to be implemented in both R&D and manufacturing. In R&D, proper testing will identify design defects or limitations; in manufacturing, a screening test will weed out latent assembly and component defects.

A typical Alcon system reliability test program consists of an operational test at the specified high and low temperature limits, a 40 ºC high-humidity test, and temperature cycling between the operational limits. Next, a shock (or bump) and vibration test is performed at non-operating (storage) conditions. Then a statistically relevant sample of systems is run around the clock for an extended time to demonstrate error-free operation and to reveal any potential wear or performance degradation. This entire process usually takes about three months, and often longer when defects are identified and have to be mitigated. In the meantime, reliability calculations are conducted on the PCBAs, which are also subjected to rapid thermal cycling and mechanical stress tests. The assumption is then made that the system is reliable. We know from actual field and customer experience that this is not always the case.

Manufacturing uses only burn-in (better called run-in), for no more than 12 hours, on all products. This merely proves that the system works at that time. If we are lucky, we catch the infant mortality of components. Run-in does not identify latent defects or marginal components because the system is not stressed during it. These latent defects are the ones that get us in trouble.

The limitation of these methods is that we never learn the design margins and often miss marginally performing sub-systems or components. The reason is that we do not stress the system beyond its operating parameters. For example, a device that is specified to operate between 10 ºC and 35 ºC might not be reliable at 37 ºC. This means that we have a design margin of only 2 ºC and that something might be operating at the edge of its range. The same is true for vibration: a system can just pass the test, yet we typically do not see long-term structural wear issues. One can also argue that, with the typical three samples used, passing all tests without failures still leaves a real possibility that other systems would not pass. All of the above also applies to the final manufacturing test, where the systems are not stressed at all.
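The two numerical points in this paragraph can be made concrete with a short Python sketch. The margin calculation uses the 35 ºC and 37 ºC figures from the example above. The sample-size calculation is an illustration that is not part of the original test program: it applies the standard zero-failure (success-run) binomial relation R = (1 - C)^(1/n), and the 90 % confidence level and the function names are assumptions chosen for the example.

```python
def design_margin(spec_limit_c: float, unreliable_at_c: float) -> float:
    """Margin between a specified limit and the temperature where
    operation is no longer reliable (both in degrees Celsius)."""
    return abs(unreliable_at_c - spec_limit_c)


def demonstrated_reliability(samples: int, confidence: float) -> float:
    """Lower bound on reliability demonstrated when `samples` units all
    pass with zero failures, using the zero-failure binomial
    (success-run) relation R = (1 - C) ** (1 / n)."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return (1.0 - confidence) ** (1.0 / samples)


# Figures from the text: specified upper limit 35 C, unreliable at 37 C.
margin = design_margin(35.0, 37.0)

# Assumed confidence level of 90 % (illustrative, not from the text),
# with the typical three samples mentioned above.
r3 = demonstrated_reliability(3, 0.90)

print(f"design margin: {margin:.1f} C")
print(f"reliability demonstrated by 3 passing units at 90% confidence: {r3:.2f}")
```

Under these assumptions, three units passing without failure demonstrate a reliability of only about 46 % at 90 % confidence, which is why a clean result on three samples says little about the rest of the population.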

In addition, the above tests do not demonstrate any long-term wear or degradation issues in relation to actual field performance. What is lacking is the ability to correlate the results of the product verification tests with the integrity of the product over a life of one or two times the warranty period.

Current industry practice is to stress the system beyond the usual operational and/or storage conditions to expose defects early in the design process. The two main methods used are HALT/HASS (Highly Accelerated Life Test / Highly Accelerated Stress Screen) and Accelerated Aging.

With either of these methods, it has to be clearly understood that the initial R&D testing is not a test but an investigation. From this investigation a pass/fail test can be derived that can be used as a screen in product verification and manufacturing. The object of the investigation is to find the design margins and any latent defects or marginal designs. This is an iterative approach that will most likely produce several, and sometimes many, failures that have to be properly addressed. The result will be a more robust and reliable product that can withstand the rigors of its environment during operating as well as non-operating conditions.
When the bugs and weak points of the device have been exposed and corrected, a pass/fail test can be derived by specifying stress limits that are lower than those reached in the investigation but still exceed the normal operating conditions for short periods of time.
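As a rough illustration of how such a pass/fail limit might be placed, the Python sketch below puts the screening limit part of the way between the specified operating limit and the limit found during the investigation. This is not a rule taken from the paper: the function name, the 50 % fraction, and the found limits of 60 ºC and -20 ºC are all hypothetical values chosen for the example. Only the 10 ºC to 35 ºC operating range comes from the earlier example.

```python
def screen_limit(spec_limit: float, found_limit: float, fraction: float = 0.5) -> float:
    """Place a screening limit between the specified operating limit and
    the limit found in the investigation.

    `fraction` is the share of the found margin that the screen uses
    (an assumed tuning value, not a figure from the paper)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    return spec_limit + fraction * (found_limit - spec_limit)


# Operating range from the text: 10 C to 35 C.
# Found limits below are HYPOTHETICAL investigation results.
upper_screen = screen_limit(35.0, 60.0)   # between 35 C and 60 C
lower_screen = screen_limit(10.0, -20.0)  # between 10 C and -20 C

print(f"screen from {lower_screen:.1f} C to {upper_screen:.1f} C")
```

With these assumed inputs the screen runs from -5.0 ºC to 47.5 ºC, which exceeds the normal operating conditions while staying below the stress levels reached in the investigation.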


 

Peter Philips, MSEE, January 2008
© 1997-2010 TECHMAN/KANATA