Ten keys to forecast reliability
Grid operators have some of the world’s most demanding service level requirements. And yet, as the complexity of energy systems increases, grid operators are becoming more reliant on intelligent IT systems such as forecasters. It follows, therefore, that these IT systems need to be reliable to enable grid reliability.
System reliability is one cornerstone of the system operator forecasting systems we build at N-SIDE (along with accuracy and security). It is also in our DNA as the builders of EUPHEMIA, the algorithm that has reliably cleared the day-ahead market for almost 10 years. The following post highlights ten critical points for ensuring forecast reliability:
- Software engineering
- Organisation
- Comprehensive testing
- Quality assurance and peer review
- A solid CI/CD pipeline
- Have a validation environment
- Monitoring, logging, and alerting
- Fallbacks
- Service desk
- Service Level Agreement
1. Software engineering
When it comes to industrial machine learning, it’s essential to have a team of skilled software engineers design and build the final system, even if a data scientist or machine learning engineer created the algorithm. As a former machine learning engineer, I truly appreciate the collaboration with my software engineering colleagues who work alongside us to turn our cutting-edge algorithms into robust software products.
2. Organisation
Conway’s law states that the architecture of a software system will reflect the communication structure and organisational design of the company that develops it. You can try to build reliable software, but if your organisation is not designed for it, you’ll continuously swim against the tide. As TSOs evolve into IT organisations, keeping Conway’s law in mind will be critical to ensuring grid stability in future.
3. Comprehensive testing
We write comprehensive tests of our code to catch present and future bugs before we see them in production.
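As a purely illustrative sketch (the `naive_forecast` function is a hypothetical stand-in, not our production forecaster), such tests often check basic invariants of a forecast: its horizon, the absence of missing values, and sensible handling of invalid inputs.

```python
import numpy as np
import pytest


def naive_forecast(history: np.ndarray, horizon: int) -> np.ndarray:
    """Toy stand-in for a forecaster: repeat the last observed value."""
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    return np.full(horizon, history[-1], dtype=float)


def test_forecast_has_expected_horizon():
    history = np.array([100.0, 110.0, 105.0])
    assert naive_forecast(history, horizon=24).shape == (24,)


def test_forecast_contains_no_nans():
    history = np.array([100.0, 110.0, 105.0])
    assert not np.isnan(naive_forecast(history, horizon=24)).any()


def test_invalid_horizon_is_rejected():
    history = np.array([100.0])
    with pytest.raises(ValueError):
        naive_forecast(history, horizon=0)
```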
4. Quality assurance and peer review
Fortunately, this is quite a standard part of everyday software best practice. Any code we write gets checked by two other experts before going into the CI/CD pipeline.
5. A solid CI/CD pipeline
CI/CD stands for Continuous Integration and Continuous Delivery (or Deployment). It is a series of automated steps, or gates, that a software system containing new updates must pass through before being approved for final deployment. These gates might include code-quality scanning, running the tests mentioned above, or scanning for vulnerabilities, as sketched below.
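For illustration only, here is a minimal sketch of such gates driven from a Python script; the specific tools shown (pytest, ruff, pip-audit) are assumptions, standing in for whatever quality, test, and vulnerability scanners a team has standardised on.

```python
import subprocess
import sys

# Ordered pipeline gates: each command must succeed before the next runs.
# The tools listed here are placeholders for a team's own choices.
GATES = [
    ("code quality", ["ruff", "check", "."]),
    ("unit tests", ["pytest", "--quiet"]),
    ("dependency vulnerabilities", ["pip-audit"]),
]


def run_gates() -> int:
    for name, command in GATES:
        print(f"Running gate: {name}")
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"Gate failed: {name}; blocking deployment.")
            return result.returncode
    print("All gates passed; build may proceed to deployment.")
    return 0


if __name__ == "__main__":
    sys.exit(run_gates())
```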
6. Have a validation environment
Even with all the steps above, some bugs manifest only when the code is run. A realistic validation environment ensures that new code never runs for the first time in production.
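One common way to keep a validation environment realistic is to deploy the same artifact everywhere and vary only configuration. A minimal sketch (the setting names and endpoints are hypothetical):

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    environment: str
    data_api_url: str
    alerting_enabled: bool


# Hypothetical per-environment profiles; the same code and model artifact
# run in each, only the configuration changes.
PROFILES = {
    "validation": Settings("validation", "https://data.validation.example.com", False),
    "production": Settings("production", "https://data.example.com", True),
}


def load_settings() -> Settings:
    env = os.environ.get("APP_ENV", "validation")
    return PROFILES[env]
```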
7. Monitoring, logging, and alerting
We monitor not only our own systems but also the data availability from our different vendors. Proactive monitoring and alerting notify us when system behaviour differs from what we expect, preferably before it affects real-world operations. By this stage, 99% of bugs have been detected and fixed; the most common problems are now data unavailability from one of our data suppliers and the occasional residual bug.
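A sketch of what such a data-availability check might look like in Python; the freshness threshold and the `send_alert` hook are hypothetical placeholders for real paging or chat integrations.

```python
import logging
from datetime import datetime, timedelta, timezone

logger = logging.getLogger("forecast.monitoring")

MAX_DATA_AGE = timedelta(hours=2)  # hypothetical freshness threshold


def send_alert(message: str) -> None:
    """Placeholder for a paging / e-mail / chat integration."""
    logger.error("ALERT: %s", message)


def check_feed_freshness(feed_name: str, last_delivery: datetime) -> bool:
    """Return True if the feed is fresh enough; otherwise raise an alert."""
    age = datetime.now(timezone.utc) - last_delivery
    if age > MAX_DATA_AGE:
        send_alert(f"Feed '{feed_name}' is stale: last delivery {age} ago.")
        return False
    logger.info("Feed '%s' is fresh (age %s).", feed_name, age)
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    stale = datetime.now(timezone.utc) - timedelta(hours=3)
    check_feed_freshness("weather-vendor-feed", stale)  # triggers an alert
```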
8. Fallbacks
What happens if a critical data source is not available? We can fill in the missing data, but that often leads to highly misleading forecasts (after all, if we knew what the input data would be, we would not need to include it in our models). A standard forecasting algorithm would simply fail without such data, but we can’t afford that for mission-critical systems. Our forecasts therefore often have two or more backup systems to make sure that a forecast is still delivered.
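A minimal sketch of a fallback chain in Python; the two forecasters are illustrative only, and the principle is simply to try each model in order of preference until one succeeds.

```python
from typing import Callable, Sequence

import numpy as np


def primary_forecast(horizon: int) -> np.ndarray:
    """Full model using all external data feeds (may fail if a feed is missing)."""
    raise RuntimeError("weather feed unavailable")  # simulate a missing input


def reduced_forecast(horizon: int) -> np.ndarray:
    """Backup model trained only on data we always have (e.g. historical load)."""
    return np.full(horizon, 100.0)


def forecast_with_fallbacks(
    horizon: int,
    forecasters: Sequence[Callable[[int], np.ndarray]],
) -> np.ndarray:
    """Try each forecaster in order of preference; return the first that succeeds."""
    last_error: Exception | None = None
    for forecaster in forecasters:
        try:
            return forecaster(horizon)
        except Exception as error:  # in practice, catch narrower exception types
            last_error = error
    raise RuntimeError("All forecasters failed") from last_error


prediction = forecast_with_fallbacks(24, [primary_forecast, reduced_forecast])
```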
9. Service desk
Once the code is deployed and the forecast runs, what happens if there is a problem? We have a service desk and a duty roster to ensure it is always staffed by one of our forecasting specialists, ready to respond if there is an incident. If this occurs, we also do a post-incident review internally to ensure the cause is identified and addressed.
10. Service Level Agreement
We have SLAs with our clients in which we agree to a service level matching their needs, with penalties on us if we do not meet this service level.
In the demanding world of grid operations, the reliability of our forecasting systems is paramount. This reliability is not an accident, but a result of carefully considered principles and practices, from thoughtful software engineering to the robustness of our organisational structure. Whether through comprehensive testing, continuous monitoring, or the implementation of fallback systems, we strive to ensure that our forecasts are dependable even in the face of unexpected challenges.
But beyond these technical aspects, we believe that reliability also hinges on our relationships with our clients. Through our service desk and SLAs, we commit ourselves to responsiveness and accountability, ensuring that we are there to support your needs and mitigate any issues that arise. It is this combination of technical excellence and dedication to service that defines the reliability of N-SIDE forecasts. As we continue to evolve and refine our practices, we invite you to join us in exploring the future of reliable forecasting and its crucial role in maintaining grid stability.