Service Status

Welcome to the Evam Status page. Here you’ll find live and historical data on system performance.

UK

Active Incidents

Note

All Systems Operational

Historical incidents

Below you’ll find a record of all incidents from the past 30 days.

Post-Incident Report — UK Service Disruption, 18 June 2026

Service affected: Central Services (UK production)

Date: 18 June 2026

Status: Resolved

Summary

On the morning of 18 June 2026, our UK Central Services experienced two periods of disruption. The first was a period of intermittent instability during which the service generally remained available to users. The second was a short full outage of approximately 20 minutes, during which the service was unavailable. Service was fully restored and confirmed stable by late morning.

We know how important reliable service is in the work our customers do, and we apologise for the disruption.

What happened

During the first period, one part of our infrastructure began consuming more memory than expected. Because the service runs across two instances, traffic was automatically handled by the healthy instance, so most users were not affected during this time.

While our team was resolving the underlying memory issue, a supporting component that routes incoming traffic did not restart as expected after maintenance actions were applied. This issue was not visible in our standard health checks at the time. As a result, when both instances were cycled, incoming traffic could no longer reach the service, which led to the full outage.

Once our engineering team identified that this routing component was not running, it was restarted manually on both instances and the service was restored.

Customer impact

During the initial instability, the service remained available to the large majority of users and most customers would not have noticed an interruption. During the later full outage, Central Services in the UK was unavailable for approximately 20 minutes. No customer data was lost or compromised at any point.

Timeline (UTC)

06:40 — First signs of instability began.

06:55 — Service recovered and was operating normally.

07:30 — Instability resumed and was detected by our team.

07:54 — Engineering response began.

08:00 — Service recovered again.

08:53 — Full outage began; the service became unavailable.

09:13 — Full service restored and confirmed stable.

10:02 — Additional memory added as an immediate safeguard against recurrence.
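The durations quoted in this report follow directly from the timestamps above. As a minimal sketch of that arithmetic (the variable and function names are illustrative only and are not part of any Evam tooling):

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> int:
    """Return the whole minutes between two HH:MM times on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Times are copied from the timeline above (UTC, 18 June 2026).
first_instability = minutes_between("06:40", "06:55")      # 15
second_instability = minutes_between("07:30", "08:00")     # 30
detection_to_response = minutes_between("07:30", "07:54")  # 24
full_outage = minutes_between("08:53", "09:13")            # 20

print(first_instability, second_instability, detection_to_response, full_outage)
```

The 20-minute result for the full outage matches the figure given in the summary and customer impact sections.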

How we fixed it

Our team manually restarted the traffic-routing component on both service instances to restore connectivity, then verified that traffic was flowing correctly. As an immediate safeguard against recurrence, we increased the memory available to the service the same day.

What we are doing to prevent recurrence

We have completed a full internal review of the incident and are taking the following actions:

Resolving the underlying cause. We have identified the source of the elevated memory usage and have work underway to fix it.

Strengthening our recovery procedures. We are improving how we verify that a service is fully restored after maintenance, so that a component failing to restart is detected immediately rather than later.

Improving monitoring and alerting. We are restoring and tuning our alerts so that early warning signs are caught faster and more reliably.

Upgrading our infrastructure. We are continuing a planned migration of our infrastructure to improve overall resilience.

Improving our incident response. We are formalising our on-call and escalation process and expanding incident-response training across our engineering teams so issues are owned and communicated clearly from the outset.
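To illustrate the kind of post-maintenance verification described under "Strengthening our recovery procedures", here is a minimal sketch. It is not a description of the actual tooling: the component names, instance names, and the shape of the status data are all assumptions made for the example. The idea is that a service counts as restored only when every required component, including the one that routes traffic, is running on every instance.

```python
from typing import Dict, List

# Hypothetical component names; the report does not name the real ones.
REQUIRED_COMPONENTS = ("application", "traffic_router")


def find_unhealthy(status: Dict[str, Dict[str, bool]]) -> List[str]:
    """Return 'instance/component' for each required component not running."""
    problems = []
    for instance, components in sorted(status.items()):
        for name in REQUIRED_COMPONENTS:
            # A component missing from the status data counts as not running.
            if not components.get(name, False):
                problems.append(f"{instance}/{name}")
    return problems


def is_fully_restored(status: Dict[str, Dict[str, bool]]) -> bool:
    """True only if at least one instance reported and nothing is unhealthy."""
    return bool(status) and not find_unhealthy(status)


# Example: the application is up on both instances, but the routing
# component did not come back after maintenance (assumed data).
after_maintenance = {
    "instance-a": {"application": True, "traffic_router": False},
    "instance-b": {"application": True, "traffic_router": False},
}
print(find_unhealthy(after_maintenance))
print(is_fully_restored(after_maintenance))
```

A check that looks only at the application process would report both instances as healthy here. Listing the routing component as required is what surfaces the failure straight after maintenance.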

Our commitment

Reliability is fundamental to the service we provide. The steps above are aimed at both preventing this specific issue from happening again and making our service more resilient overall. We will continue to share updates on this page as our follow-up work progresses.

Thank you for your patience and continued trust.

Sweden

Active Incidents

Note

All Systems Operational

Historical incidents

Below you’ll find a record of all incidents from the past 30 days.