Evergrid, a provider of global resource management software for next-generation data centers, today announced a partnership with Platform Computing through the Platform Alliance Network partner program.
The partnership will allow Evergrid to integrate its patent-pending high availability and resource management software, Evergrid Availability Services (AvS-Batch), with Platform's flagship LSF workload manager. The resulting solution will allow both massively parallel distributed applications and single-process applications, such as Electronic Design Automation (EDA) applications, to run at near 100-percent reliability in high performance computing (HPC) clusters.
Evergrid's AvS-Batch prevents application downtime by automating the checkpointing, migration and recovery of applications. As a result, it provides automatic application failover across multiple nodes and tiers.
By recording the state of a running application, AvS-Batch can checkpoint it and recover it from failures at near 100-percent reliability with minimal overhead. This is especially useful in high performance technical computing environments, where distributed applications may run for hours or even days.
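The general idea of checkpoint and recovery can be illustrated with a minimal Python sketch. It is not Evergrid's implementation: AvS-Batch is described as transparent to the application, whereas this example saves state explicitly from inside the job. The names `save_checkpoint`, `load_checkpoint`, `run_job` and the `fail_at` parameter are hypothetical and exist only for illustration.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the old checkpoint.

    The rename is atomic, so a crash during the write cannot corrupt
    the last good checkpoint.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or None if no checkpoint exists."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run_job(path, total_steps, checkpoint_every=100, fail_at=None):
    """Sum the integers 0..total_steps-1, checkpointing periodically.

    fail_at (hypothetical, for illustration) simulates a node failure
    just before the given step is executed.
    """
    state = load_checkpoint(path) or {"step": 0, "acc": 0}
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["acc"]
```

If the job fails at step 250 with a checkpoint interval of 100, a second call to `run_job` with the same path resumes from step 200 and repeats only 50 steps of work instead of all 250.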
Evergrid's AvS-Batch also provides stateful pre-emptive scheduling, which allows users to checkpoint the entire state of lower priority jobs to disk so that higher priority jobs can run immediately. Once the high priority jobs complete, the checkpointed applications can resume execution on available resources, so the compute cycles already spent on a pre-empted job are not lost. With current commercial technologies, pre-emption requires the lower priority job to be stopped and restarted from the beginning, losing all work done up to that point. AvS-Batch instead allows a pre-empted job to resume from its checkpoint, preserving all work done by that application up to the point of pre-emption. Stateful pre-emptive scheduling also lets users of commercial applications make more efficient use of their software licenses.
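The difference between stateful pre-emption and stop-and-restart pre-emption can be sketched as a small single-slot simulation in Python. This is a toy model under stated assumptions, not the AvS-Batch or LSF scheduler: the `Job` class, the `simulate` function and the tick-based clock are all hypothetical, and a job's saved "state" is reduced to a count of completed steps.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int                                   # lower number = higher priority
    name: str = field(compare=False)
    total_steps: int = field(compare=False)
    done_steps: int = field(default=0, compare=False)


def simulate(arrivals, stateful):
    """Run jobs on one compute slot, one step per tick.

    arrivals maps a tick number to the list of jobs arriving at that tick.
    When stateful is True, a pre-empted job keeps its progress; otherwise
    it restarts from step zero. Returns (ticks used, completion order).
    """
    queue, running, tick, finished = [], None, 0, []
    pending = sum(len(jobs) for jobs in arrivals.values())
    while pending or queue or running is not None:
        for job in arrivals.get(tick, []):
            heapq.heappush(queue, job)
            pending -= 1
        # Pre-empt the running job if a higher priority job is waiting.
        if running is not None and queue and queue[0].priority < running.priority:
            if not stateful:
                running.done_steps = 0              # all prior work is discarded
            heapq.heappush(queue, running)
            running = None
        if running is None and queue:
            running = heapq.heappop(queue)
        if running is not None:
            running.done_steps += 1
            if running.done_steps == running.total_steps:
                finished.append(running.name)
                running = None
        tick += 1
    return tick, finished
```

For a 10-step low priority job that is pre-empted after 6 steps by a 3-step high priority job, the stateful run finishes both jobs in 13 ticks, while the restart-from-scratch run takes 19 ticks because the 6 completed steps are repeated.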
"Evergrid's patented transparent application fault tolerance and stateful pre-emptive scheduling technologies solve critical reliability and resource utilization issues in today's high performance computing clusters," said Mitchell Ratner, Evergrid's vice president of Business Development.
"Partnering with Platform Computing, the workload management leader in the high performance computing (HPC) cluster space, makes perfect sense from a business and customer perspective."
"Our customers with high performance computing cluster environments will be delighted to know that they now have access to an integrated solution that provides stateful, pre-emptive scheduling and checkpoint/recovery capabilities for single process and parallel applications," said Simon Lonsdale, director of Strategic Alliances at Platform Computing. "Evergrid's Availability Services software is the perfect complement to Platform's LSF workload manager," added Lonsdale. "Together, Platform and Evergrid will provide our mutual customers in demanding, large-scale, computing-intensive sectors with a seamless solution to their continuous availability and resource management problems."