Major Service Outage — All Services Down

Incident Report for Teradig

Postmortem

Timeline

  • 22:18 Oct 4 - All services offline (ECONNREFUSED on all endpoints)
  • 00:23 Oct 6 - Recovery attempt failed (48,000 ms timeout exceeded)
  • 00:45 Oct 6 - Full service restored (HTTP 200 OK)
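The timeline does not state a time zone, so the durations below are a rough check only. They assume that all three entries share the same zone and that the year is 2025, taken from the post dates. The sketch computes the total outage length and the gap between the failed recovery attempt and full restoration. The `hours_minutes` helper is hypothetical and exists only for this illustration.

```python
from datetime import datetime, timedelta

# Timeline entries from the postmortem.
# Assumption: all three use the same (unstated) time zone, and the year is 2025.
FMT = "%H:%M %b %d %Y"
outage_start = datetime.strptime("22:18 Oct 04 2025", FMT)
failed_attempt = datetime.strptime("00:23 Oct 06 2025", FMT)
restored = datetime.strptime("00:45 Oct 06 2025", FMT)


def hours_minutes(delta: timedelta) -> str:
    """Format a timedelta as 'Hh MMm' (hypothetical helper for this sketch)."""
    total_minutes = int(delta.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes:02d}m"


total_outage = restored - outage_start
after_failed_attempt = restored - failed_attempt

print("Total outage:", hours_minutes(total_outage))  # Total outage: 26h 27m
print("Failed attempt to restoration:", hours_minutes(after_failed_attempt))  # 0h 22m
```

Under these assumptions, the outage lasted about 26 hours 27 minutes, and full restoration came 22 minutes after the failed recovery attempt.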

Root Cause

A software bug in the virtual network management layer caused data inconsistencies and duplicated entries within the cluster networking stack.

Resolution

  1. Removed all network access rules
  2. Restarted the VM infrastructure layer
  3. Reapplied the network configuration via infrastructure as code (IaC)

Prevention

  • Enhanced monitoring of the network stack
  • Documented recovery procedures for virtual network corruption scenarios

Status: Resolved - All services operational since 00:45 UTC Oct 6

Posted Oct 06, 2025 - 12:21 CAT

Resolved

This incident has been resolved.
Posted Oct 06, 2025 - 07:01 CAT

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Oct 05, 2025 - 22:33 CAT

Identified

The issue has been identified and a fix is being implemented.
Posted Oct 05, 2025 - 22:32 CAT

Update

We are continuing to investigate this issue.
Posted Oct 05, 2025 - 19:24 CAT

Update

We are continuing to investigate this issue.
Posted Oct 05, 2025 - 12:44 CAT

Update

We are continuing to investigate this issue.
Posted Oct 05, 2025 - 09:27 CAT

Investigating

Incident Message:

We are currently experiencing a major outage across our entire infrastructure.

Impact:
All core services, including web hosting, email hosting, and domain registration, are currently unavailable. Our engineering team is urgently investigating the issue and working to restore service as quickly as possible.

What we’re doing:
Our highest priority is resolving this outage. We will post updates here every 30 minutes or whenever there is significant progress.

We apologize for the disruption and appreciate your patience.
Posted Oct 04, 2025 - 22:30 CAT