Yesterday AtlantaNAP, the network provider for all E-Starr servers, experienced a full network outage. As a result, all sites hosted on E-Starr, along with those of many other hosting providers, were inaccessible for about an hour. Because our main site was also down, we posted updates on social media to let our clients know the issue was being handled. Here is the official statement on this rare outage:
“On Monday, March 31 at 2:17 PM EST, GNAX routers were unable to effectively route traffic to the internet. The issue stemmed from a large peer at the TIE peering fabric flooding the peer routers with unproductive routes, which crippled our route tables on the adjacent routers and then propagated and affected our core routers as BGP neighbors. The immediate fix re-converged routes at 3:12 PM EST.
To prevent this type of incident occurring again in the future, our network team has applied more stringent access lists in those peers. Also, our stricter configuration will terminate a BGP peer if they show a sudden, unexpected increase in routes, further protecting our customers from this type of occurrence in the future.
The immediate fix was determined and implementation was started in less than 30 minutes, as our network team launched into action. However, due to the scale and variety of our network infrastructure, it took a few hours to fully diagnose and confirm the issue from the logs, design a more permanent resolution and carefully test it.
We apologize for the inconvenience and trouble this disruption caused to our customers and sincerely thank you for your patience and understanding as we worked through the issue. We know how critical our services are to our customers. We will do everything we can to learn from this event over the coming days and weeks to further understand the details and refine our resolution and processes. We are committed to providing our customers mission-critical IT infrastructure, therefore we are implementing a status page that will give periodic updates during any future issues. We will only update events as they are confirmed with factual information.”
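For readers curious about the safeguard mentioned in the statement (terminating a BGP peer that shows a sudden, unexpected increase in routes), here is a minimal Python sketch of the idea. It is not the provider's actual configuration, which the statement does not describe; on real routers this kind of protection is typically a per-neighbor maximum-prefix limit. The class name `PeerRouteGuard` and both thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PeerRouteGuard:
    """Hypothetical guard that drops a peer whose route count spikes."""

    max_routes: int          # assumed hard ceiling on routes accepted from the peer
    max_growth_ratio: float  # assumed largest allowed jump between two observations
    last_count: int = 0
    session_up: bool = True

    def observe(self, route_count: int) -> bool:
        """Record the peer's current route count; return True if the session stays up."""
        if not self.session_up:
            return False  # a terminated session stays down until an operator resets it

        over_ceiling = route_count > self.max_routes
        sudden_jump = (
            self.last_count > 0
            and route_count > self.last_count * self.max_growth_ratio
        )
        if over_ceiling or sudden_jump:
            self.session_up = False  # stop accepting routes from this peer
            return False

        self.last_count = route_count
        return True


# Illustrative values only: a peer that normally sends about 4,000 routes
# suddenly announces 60,000.
guard = PeerRouteGuard(max_routes=10_000, max_growth_ratio=2.0)
for count in (4_000, 4_200, 4_100, 60_000):
    still_up = guard.observe(count)
    print(f"{count} routes -> session {'up' if still_up else 'terminated'}")
```

The point of cutting off a single misbehaving peer is containment: its excess routes never reach the adjacent and core routers, so the rest of the network keeps routing traffic normally.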