System health monitors help ensure continuous operations

Friday, November 12th, 2010

Last month an electronic monitoring vendor experienced a prolonged system outage. Such events have affected many of the best technology companies in the world, including BlackBerry, Twitter and Facebook. Even these reliable, reputable companies have had their systems go down unexpectedly at one time or another. At Satellite Tracking of People, we work hard to avoid system outages.

To start, we architected VeriTracks with layers of redundancy. Our system is scaled horizontally in each of our Data Centers, which means any individual component, such as a power unit, a server or a cooling unit, can fail while the system as a whole continues running normally. Our Data Centers are also geographically separated, which allows us to quickly restore operations at the second site should the first one experience a catastrophic failure.
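
As a rough illustration of the layered redundancy described above, here is a minimal Python sketch. It is not VeriTracks code; the `Site` class, its fields and the `choose_site` function are hypothetical names invented for this example. It assumes a site remains usable as long as at least one of its servers is healthy, and that traffic moves to the second site only when the first has no healthy servers left.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical model of one data center (illustration only)."""
    name: str
    healthy_servers: int
    total_servers: int

    def is_available(self) -> bool:
        # Horizontal scaling: losing some servers does not take the site down.
        return self.healthy_servers > 0


def choose_site(primary: Site, secondary: Site) -> str:
    """Return the name of the site that should serve traffic."""
    if primary.is_available():
        return primary.name
    if secondary.is_available():
        # Geographic redundancy: fall back to the second site.
        return secondary.name
    raise RuntimeError("no data center available")


if __name__ == "__main__":
    # One failed server out of four: the primary site keeps serving.
    print(choose_site(Site("primary", 3, 4), Site("secondary", 4, 4)))
    # Catastrophic failure at the primary site: the secondary takes over.
    print(choose_site(Site("primary", 0, 4), Site("secondary", 4, 4)))
```

In this sketch a single component failure changes nothing for the user, while a complete site failure triggers the switch to the second location.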

We also have an extensive monitoring system that proactively watches the system’s operations. More than 1,000 individual monitors check all aspects of those operations. They report everything from available storage space and the number of BluTag devices and BluHome units calling in at any given time to the health of each individual server and data storage disk. Because these monitors notify our staff as soon as a threshold is exceeded, appropriate action can be taken 24/7 to avoid a major system issue.
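
The idea of threshold-based monitors can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Monitor` class, the monitor names and every number below are made up for the example and do not describe our actual monitors or their limits.

```python
from dataclasses import dataclass


@dataclass
class Monitor:
    """Hypothetical monitor: one reading compared against one threshold."""
    name: str
    value: float
    threshold: float
    alert_when_above: bool = True  # False means alert when the value drops too low

    def breached(self) -> bool:
        if self.alert_when_above:
            return self.value > self.threshold
        return self.value < self.threshold


def breached_monitors(monitors):
    """Return the names of all monitors whose threshold has been crossed."""
    return [m.name for m in monitors if m.breached()]


# Example readings (invented values, for illustration only).
readings = [
    Monitor("disk_used_percent", 91.0, 85.0),
    Monitor("devices_calling_in", 4200, 1000, alert_when_above=False),
    Monitor("server_cpu_percent", 40.0, 90.0),
]

for name in breached_monitors(readings):
    # A real system would page on-call staff here instead of printing.
    print(f"ALERT: {name} crossed its threshold")
```

With these sample readings only the disk-usage monitor raises an alert, which is the kind of early warning that lets staff act before a small problem becomes an outage.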

Our systems are designed for high availability and robust monitoring. We routinely test the system’s ability to respond to catastrophic situations, and third parties double-check our work. Recently our entire software system successfully passed an independent security assessment in accordance with National Institute of Standards and Technology (NIST) Special Publication 800-53. This assessment reviews and analyzes the management, operational and technical safeguards or countermeasures prescribed for an information system to protect the confidentiality, integrity and availability of the system and its information.

No vendor or agency wants an experience similar to the recent outage. But we realize it may have raised questions in your mind about what STOP does to prevent such an occurrence. We wanted to share with you our daily routine for preventing such a catastrophic event and how we ensure that you can always access your monitoring data.