ARPA-E’s PERFORM Program Data Plan

Easily Accessible and Usable Datasets for Grid Management Innovation

ARPA-E’s Performance-based Energy Resource Feedback, Optimization, and Risk Management (PERFORM) program seeks to develop new grid management systems that represent the relative delivery risk of individual electricity generation resources and balance the collective risk of all assets across the grid. The program has funded twelve project teams across the country to build software platforms that optimize grid management as the penetration of variable renewable resources continues to increase. However, the difficulty of obtaining and sharing sensitive transmission grid data has historically presented a challenge for innovation.

To overcome this challenge, PERFORM also funded leading power systems modelling groups at Lawrence Livermore National Laboratory, National Renewable Energy Laboratory, Princeton University, Texas A&M University, and the University of Wisconsin to generate realistic synthetic datasets for the power grid. These datasets served as a basis for the twelve optimization teams to develop their platforms.

The dataset teams have now come together to make their data publicly available via the links below. Each unique dataset can be used for a variety of applications specified by each respective team. ARPA-E hopes that making this data widely accessible will spur additional innovation in grid management beyond the PERFORM program.

See the Datasets for the ARPA-E PERFORM Program page for more information.

For questions regarding the PERFORM Program, please contact Technology-to-Market Advisor Jonathan Glass (

Lawrence Livermore National Laboratory

The Lawrence Livermore National Laboratory (LLNL) team developed unit commitment and economic dispatch (UCED) data for the synthetic Texas 7K and NYISO systems. The data is most easily used in conjunction with the open-source Prescient cost modeling system. These UCED test cases combine (1) power system network data produced by the TAMU team, (2) forecast and actual solar, wind, and load data produced by the NREL team, and (3) thermal unit performance characteristics produced by the University of Wisconsin - Madison team. The integrated data allows production cost model runs for these synthetic systems over a full year of varying weather and system conditions. The Prescient cost modeling system is built on the high-performance Egret UCED optimization library, which enables rapid solution of large cases (such as the Texas 7K system) with full network fidelity in tractable run times. The key outputs of the resulting simulations are system performance metrics such as locational marginal prices (LMPs), generation costs, renewables curtailment, and reserve margins.
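
For readers new to economic dispatch, the idea can be illustrated with a minimal single-bus sketch. It does not use the Prescient or Egret APIs and ignores unit commitment, network constraints, and reserves; the `Generator` class, the `merit_order_dispatch` function, and the example units are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Generator:
    name: str
    capacity_mw: float
    marginal_cost: float  # $/MWh, assumed constant over the unit's range


def merit_order_dispatch(generators, load_mw):
    """Dispatch the cheapest units first until the load is met.

    Returns (dispatch by unit name, total cost, cost of the last unit used).
    This is a toy single-bus model, not a full UCED formulation.
    """
    remaining = load_mw
    dispatch = {}
    total_cost = 0.0
    marginal_price = 0.0
    for gen in sorted(generators, key=lambda g: g.marginal_cost):
        output = min(gen.capacity_mw, remaining)
        dispatch[gen.name] = output
        if output > 0:
            total_cost += output * gen.marginal_cost
            marginal_price = gen.marginal_cost
        remaining -= output
    if remaining > 1e-9:
        raise ValueError("load exceeds total available capacity")
    return dispatch, total_cost, marginal_price


# Hypothetical three-unit system serving 250 MW of load.
units = [
    Generator("gas", 200, 40.0),
    Generator("wind", 100, 0.0),
    Generator("peaker", 50, 120.0),
]
print(merit_order_dispatch(units, 250))
```

In this toy system the wind unit is fully used, the gas unit covers the rest, and the gas unit's cost sets the price. A production cost model such as the one described above adds commitment decisions, network limits, and forecast uncertainty on top of this basic ordering.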

See for LLNL’s dataset and more information.

For questions regarding this dataset, please contact Jean-Paul Watson (

National Renewable Energy Laboratory

The National Renewable Energy Laboratory (NREL) has produced a set of time-coincident load, wind, and solar generation profiles, including actual and forecasted time series data. Both actuals and forecasts are provided with high temporal and spatial fidelity, and the forecast data includes both deterministic and probabilistic forecasts. These datasets are intended to give researchers access to realistic load, wind, and solar forecast data at high resolution. By providing probabilistic forecasts, the datasets aim to inform the development of next-generation stochastic dispatch and planning models. By including profiles for potential future wind and solar buildouts, the datasets will also enable power system modelers to understand and plan for the implications of high-renewable systems.
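
One common way to score a probabilistic forecast against actuals is the pinball (quantile) loss. The sketch below is a generic illustration of that metric, not part of the NREL dataset or its tooling; the function names and the sample values are hypothetical, and no particular file format is assumed.

```python
def pinball_loss(actual, quantile_forecast, q):
    """Pinball loss of a single quantile forecast at quantile level q (0 < q < 1)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    diff = actual - quantile_forecast
    # Under-forecasts are weighted by q, over-forecasts by (1 - q).
    return q * diff if diff >= 0 else (q - 1.0) * diff


def mean_pinball_loss(actuals, quantile_forecasts, q):
    """Average pinball loss over paired actuals and quantile forecasts."""
    if not actuals or len(actuals) != len(quantile_forecasts):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(pinball_loss(a, f, q) for a, f in zip(actuals, quantile_forecasts))
    return total / len(actuals)


# Hypothetical hourly wind output (MW) and a 90th-percentile forecast.
actual_mw = [100.0, 120.0, 80.0]
p90_forecast_mw = [110.0, 115.0, 95.0]
print(mean_pinball_loss(actual_mw, p90_forecast_mw, 0.9))
```

Lower values indicate a quantile forecast that is better calibrated and sharper at that quantile level; averaging over many quantile levels gives a fuller picture of the forecast distribution.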

See for NREL’s dataset and more information.

For questions regarding this dataset, please contact Brian Sergi (

Princeton University

Princeton University has developed two simulation platforms, PGscen and CLNSim, that support the joint simulation of renewable asset production and loads conditioned on short-term forecasts. PGscen is open source. It can be used to produce joint Monte Carlo scenarios of load demand and of solar and wind generation for a given grid. The PGscen model is trained on historical forecasted and realized demands and outputs. CLNSim is also calibrated to historical forecasts and realizations of wind, solar, and hydro assets, as well as loads. The platform supports live listener-based simulation requests, automatically returning results for ad hoc user requests within a few minutes.
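
The general idea of joint Monte Carlo scenarios conditioned on a forecast can be sketched as follows. This is not PGscen's or CLNSim's model: it simply assumes Gaussian forecast errors with a known covariance, and the function names, forecast values, and covariance matrix are all hypothetical.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_scenarios(forecast, covariance, n_scenarios, seed=0):
    """Draw joint scenarios as forecast plus correlated Gaussian errors.

    Assumption for illustration only: errors are jointly Gaussian with the
    given covariance. Negative values are clipped to zero, which slightly
    biases the mean when clipping is frequent.
    """
    rng = random.Random(seed)
    lower = cholesky(covariance)
    n = len(forecast)
    scenarios = []
    for _ in range(n_scenarios):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        errors = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        scenarios.append([max(0.0, f + e) for f, e in zip(forecast, errors)])
    return scenarios


# Hypothetical one-hour forecast in MW: [load, wind, solar].
forecast_mw = [1000.0, 200.0, 150.0]
error_cov = [
    [2500.0, -300.0, -200.0],
    [-300.0, 900.0, 100.0],
    [-200.0, 100.0, 400.0],
]
for scenario in sample_scenarios(forecast_mw, error_cov, 3, seed=1):
    print([round(x, 1) for x in scenario])
```

Each scenario is one plausible joint outcome for load, wind, and solar; a stochastic dispatch model would evaluate decisions across many such draws.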

See for Princeton’s dataset and more information.

For questions regarding this dataset, please contact René A. Carmona (

Texas A&M University

The Texas A&M University (TAMU) team created a 6,700-bus grid covering the Electric Reliability Council of Texas (ERCOT) footprint and a 23,600-bus grid covering the US Midwest. The transmission in the synthetic grids is entirely fictional, and the grids contain no Critical Energy/Electric Infrastructure Information (CEII). The team used Energy Information Administration 860 data to site the generators and census data to approximate the load. A clustering technique gives the synthetic substations realistic proportions of load and generation. Transformers were added to connect multiple voltage levels at each substation. Synthetic transmission lines are inserted iteratively by a topology algorithm inspired by simulated annealing. Voltage control devices were added through reactive power planning so that the synthetic cases are solvable under alternating current (AC) power flow. This improved dataset represents a more reliable and resilient grid.
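
To give a feel for annealing-style topology search, here is a toy sketch. It is not the TAMU team's algorithm: it assumes a simple objective (total line length plus a penalty for each disconnected island) and a swap move between candidate lines, and every name, parameter, and coordinate in it is hypothetical.

```python
import math
import random
from itertools import combinations


def count_components(n_nodes, lines):
    """Number of connected components, computed with union-find."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in lines:
        parent[find(a)] = find(b)
    return len({find(i) for i in range(n_nodes)})


def cost(points, lines, penalty):
    """Total line length plus a penalty for every extra island."""
    length = sum(math.dist(points[a], points[b]) for a, b in lines)
    return length + penalty * (count_components(len(points), lines) - 1)


def anneal_lines(points, n_lines, steps=3000, t_start=1.0, t_end=1e-3,
                 penalty=10.0, seed=0):
    """Pick n_lines candidate lines by simulated annealing with swap moves.

    Requires n_lines to be smaller than the number of candidate pairs.
    """
    rng = random.Random(seed)
    candidates = list(combinations(range(len(points)), 2))
    chosen = set(rng.sample(candidates, n_lines))
    current = cost(points, chosen, penalty)
    best, best_cost = set(chosen), current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        line_out = rng.choice(sorted(chosen))
        line_in = rng.choice([c for c in candidates if c not in chosen])
        trial = (chosen - {line_out}) | {line_in}
        trial_cost = cost(points, trial, penalty)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if trial_cost <= current or rng.random() < math.exp((current - trial_cost) / t):
            chosen, current = trial, trial_cost
            if current < best_cost:
                best, best_cost = set(chosen), current
    return sorted(best), best_cost


# Hypothetical substation coordinates (arbitrary units).
substations = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5), (2, 0.5)]
print(anneal_lines(substations, n_lines=5))
```

Accepting occasional worse moves early, while the temperature is high, lets the search escape poor local arrangements before it settles. A realistic synthetic grid would replace the length-based objective with electrical and statistical criteria.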

See for TAMU’s dataset and more information.

For questions regarding this dataset, please contact Farnaz Safdarian (

University of Wisconsin

The University of Wisconsin – Madison developed three power flow cases representing the transmission system in New York state. The models were created to enable the development and testing of risk measurement and risk mitigation studies. In keeping with that purpose, each facility in the model is associated with a geographic location. Two of the cases represent a peak and an off-peak hour, respectively, from 2019. The third case represents a peak hour in the year 2030, after significant green power and transmission reinforcements have been added to the system. Example generator economic data, derived from the EPA Continuous Emissions Monitoring database and provided in a format ready for MATPOWER, is included. Existing and anticipated DC transmission projects are represented as matched generator and load pairs.
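
The matched generator and load representation of a DC line can be sketched as below. This is an illustration of the modeling idea only, not the dataset's actual schema: the function name, dictionary keys, bus numbers, and the optional loss fraction are hypothetical.

```python
def dc_line_as_injections(from_bus, to_bus, flow_mw, loss_fraction=0.0):
    """Model a DC line as a load at the sending bus and a generator at the receiving bus.

    The load withdraws flow_mw from the AC network at from_bus, and the
    generator injects the same power, less any assumed losses, at to_bus.
    """
    if flow_mw < 0:
        raise ValueError("flow_mw must be non-negative; swap the buses to reverse flow")
    if not 0.0 <= loss_fraction < 1.0:
        raise ValueError("loss_fraction must be in [0, 1)")
    load = {"bus": from_bus, "pd_mw": flow_mw}
    generator = {"bus": to_bus, "pg_mw": flow_mw * (1.0 - loss_fraction)}
    return load, generator


# Hypothetical 1,000 MW DC link from bus 101 to bus 202 with 2% losses.
load, generator = dc_line_as_injections(101, 202, 1000.0, loss_fraction=0.02)
print(load, generator)
```

Because the pair is tied to a single scheduled flow, changing the DC line's schedule means updating both the load and the generator together so that the two ends stay consistent.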

See for University of Wisconsin - Madison’s dataset and more information.

For questions regarding this dataset, please contact Scott Greene (