Incident Postmortem: Major Service Outage (MSO)
Incident Start: 2025-08-29 23:04:53 (Europe/Paris)
Partial Restoration (DNS + critical services): 2025-09-02 12:00:00 (Europe/Paris)
Full Restoration: 2025-09-03 19:40:15 (Europe/Paris)
Duration: 4 days, 20 hours, 35 minutes (partial restoration after 3 days, 12 hours, 55 minutes)
Impact: All Teradig LTD systems and services were inaccessible, including client websites, email, and domain services.
Summary
Between August 29 and September 3, 2025, Teradig LTD experienced a major service outage (MSO) caused by a failure in its internal network layer. The failure made all systems inaccessible from outside our network.
On September 2 at noon, we restored DNS resolution for critical clients by updating their nameserver (NS) records. We also shared files and databases directly with clients who needed to stay live. Full restoration of all systems was completed on September 3 at 19:40 (Europe/Paris).
Root Cause
The outage originated in our internal networking layer: a NAT and routing failure prevented external traffic from reaching our systems.
- Nodes were functional but isolated.
- Our DNS/NS servers were down, so hosted domains could not be resolved and were unreachable.
Resolution
- Stabilized nodes early in the process.
- Identified NAT and routing failure as the root cause.
- Because of the volume of data involved, a full restoration was required.
- Provided temporary workarounds: migrated some email systems and shared databases and files directly with clients.
- Restored DNS and live access for priority clients on September 2 at noon.
- Achieved full system restoration on September 3 at 19:40 (Europe/Paris).
Impact
- All hosted services (websites, portals, email, domains) were unavailable for several days.
- Key academic clients were significantly affected, as the outage coincided with the start of the academic year.
- Delays in our early communication added to the pressure on affected users.
Preventive Measures
To avoid recurrence, Teradig LTD has:
- Strengthened internal network resilience and NAT redundancy.
- Enhanced monitoring and real-time alerting.
- Published a dedicated status page (https://teradig.statuspage.io) for transparent communication.
- Improved backup and disaster recovery to shorten restoration times.
- Introduced clearer client communication protocols during outages.
- Placed our infrastructure configuration under version control on GitHub to speed up recovery.
- Established additional recovery points to reduce downtime in future incidents.
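As the Root Cause section notes, the nodes were healthy but isolated, so internal health checks alone would not have flagged the outage. One way to catch this failure mode is to probe the public endpoints from outside the network. The block below is offered only as a minimal sketch under that assumption, not as Teradig's actual monitoring setup. The hostnames, ports, and function names (`probe`, `failed_endpoints`, `alert_lines`) are hypothetical placeholders.

```python
import socket
from dataclasses import dataclass

# Hypothetical endpoints (placeholders, not real Teradig hosts).
# These checks only detect isolation if they run OUTSIDE the internal network.
ENDPOINTS = [
    ("ns1.example.net", 53),   # authoritative DNS over TCP
    ("www.example.net", 443),  # client websites (HTTPS)
    ("mail.example.net", 25),  # inbound email (SMTP)
]


@dataclass
class ProbeResult:
    host: str
    port: int
    reachable: bool
    error: str = ""


def probe(host: str, port: int, timeout: float = 3.0) -> ProbeResult:
    """Open a TCP connection; success means external traffic reaches the port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return ProbeResult(host, port, True)
    except OSError as exc:  # refused, timed out, or name resolution failed
        return ProbeResult(host, port, False, str(exc))


def failed_endpoints(endpoints, timeout: float = 3.0) -> list[ProbeResult]:
    """Probe every endpoint and return only the failures."""
    results = [probe(host, port, timeout) for host, port in endpoints]
    return [r for r in results if not r.reachable]


def alert_lines(failures: list[ProbeResult]) -> list[str]:
    """Format failures as alert messages for a paging or chat integration."""
    return [f"ALERT: {r.host}:{r.port} unreachable ({r.error})" for r in failures]
```

In use, a scheduler would call `failed_endpoints(ENDPOINTS)` every minute or so and page the on-call engineer whenever the result is non-empty. A TCP connect only proves that a port accepts connections; a fuller check would also query DNS records and validate HTTP and SMTP responses.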
Closing Note
We sincerely apologize for the disruption and inconvenience caused. Throughout this incident, our top priority was to preserve client data, and we are pleased to confirm that no data was lost.
We thank our clients for their patience and trust, and we are committed to stronger resilience, clearer communication, and faster recovery in the future.
Christophe RENZAHO
Managing Director
Teradig LTD