Frequently Asked Questions about Public Datasets

Watch a how-to demonstration video: End-Use Load Profiles Dataset Access Demonstration

Contents

General

Data Viewer

Dataset-specific FAQ

End Use Savings Shapes (2022.1)

End-Use Load Profiles for the U.S. Building Stock (2021.1)

I have a question not answered here.

General

What buildings are represented by each dataset?

The residential (ResStock) dataset represents dwelling units in the contiguous United States, covering single-family homes, multi-family buildings (including high-rise multi-family), and mobile homes. It does not include dormitories, prisons, assisted care facilities, or other congregate housing.

The commercial (ComStock) dataset represents 14 of the most common commercial building types – small office, medium office, large office, retail, strip mall, warehouse, primary school, secondary school, full-service restaurant, quick-service restaurant, small hotel, large hotel, hospital, and outpatient – which comprise about 65% of the commercial sector floor area in the United States according to CBECS.

What year does the baseline building stock represent?

The building stock represents, as closely as possible, the U.S. building stock as it was in 2018. The building stock representation is the same regardless of the weather data used.

Should I use aggregate timeseries data or individual building profiles?

It depends on your application. Many applications just need an aggregate load shape. If analyzing scenarios that require realistic spikiness of individual dwelling unit or building loads, such as behind-the-meter solar plus storage, rate design involving real-time pricing or demand charges, or distribution system impacts, then we recommend using individual building profiles. These recommendations will be discussed in more detail in a forthcoming Applications and Opportunities report.

Are descriptions available for the filters and end-use categories?

Descriptions of each of the building characteristics filters and the end-use categories can be found in the data_dictionary.tsv file (example). Descriptions of the values used in those filters can be found in the enumeration_dictionary.tsv (example). Both files can be opened with Excel or a text editor. Each dataset has its own data_dictionary.tsv and enumeration_dictionary.tsv file, linked to from the Datasets page.

Does the dataset include data on XYZ end-use category or device?

The list of included end-use categories can be found in each dataset’s data_dictionary.tsv file (example).

Is there a legend or lookup for the geographic codes (g0100010 for county, g01000100 for PUMA, etc.)?

In each of the published datasets there is a spatial lookup table: spatial_tract_lookup_table.csv (example). County and PUMA codes can be looked up using the "nhgis_county_gisjoin" and "nhgis_puma_gisjoin" columns, respectively. If you want to find the pre-aggregated timeseries file for a county or PUMA, you can use this lookup to find the code for the county of interest. To find the 5-digit PUMA code based on a city or place name, use this file from the U.S. Census Bureau: 2010_PUMA_Names.pdf.
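As an illustrative sketch of this lookup (not an official tool), the snippet below lists the PUMA codes that appear alongside a given county code. The column names "nhgis_county_gisjoin" and "nhgis_puma_gisjoin" come from the answer above; the function name and the sample rows are made up for illustration, and a real run would pass an open handle to a downloaded spatial_tract_lookup_table.csv instead.

```python
import csv
import io


def pumas_in_county(lookup_file, county_gisjoin):
    """Return the sorted, de-duplicated PUMA codes listed for one county code."""
    reader = csv.DictReader(lookup_file)
    pumas = {
        row["nhgis_puma_gisjoin"]
        for row in reader
        if row["nhgis_county_gisjoin"] == county_gisjoin
    }
    return sorted(pumas)


# Tiny in-memory stand-in for the lookup table (values are invented).
sample = io.StringIO(
    "nhgis_county_gisjoin,nhgis_puma_gisjoin\n"
    "g0100010,g01000100\n"
    "g0100010,g01000100\n"
    "g0100030,g01000200\n"
)
print(pumas_in_county(sample, "g0100010"))
```

The lookup table has one row per census tract, so the same county and PUMA pair can repeat many times; collecting the codes in a set removes those repeats.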

Are there load profiles available for the 16 California Climate Zones?

Aggregation by California climate zone is available for residential building profiles but not for commercial building profiles.

To produce this aggregation for the residential load profiles, go to the resstock.nrel.gov website and select the dataset and region of interest (for example, the ResStock End Use Savings Shapes by State 2018 dataset and the state of California). At the bottom, click the “Explore Timeseries” button. On the left side, halfway down, click the “+ Add Filters” button. In the “Filters” column, find and select the “Cec climate zone” filter. The options available are CEC Climate Zones 1-16. Do not use the “None” option, which applies to locations outside the state of California.

How can I see the building characteristics associated with an aggregate load profile?

There are two ways to access the building characteristics data associated with an aggregate load profile:

  1. Building Characteristics data viewer (example) - Like the energy data viewer, you can apply filters to focus on the subset of the building stock that you are interested in. Each dataset has its own building characteristics viewer.
  2. Metadata.tsv files - The raw building characteristics data can be found in the metadata file corresponding to each dataset (ResStock example) (ComStock example). The metadata files are in tab-separated value format and can be opened in Excel or with scripting languages. You can filter these files down to the same subset of dwelling units or buildings that you are viewing for timeseries data. You can use the various characteristic string fields (e.g., “HVAC Heating Efficiency”) to understand the distribution of characteristic values across that subset of dwelling units or buildings. For ResStock, each row has an equal weighting (found in the “Sample Weight” field). For ComStock, each row can be weighted with the "Weight" field (all buildings of a given type have the same weight).
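The metadata approach can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the field names "HVAC Heating Efficiency" and "Sample Weight" are taken from the text above and may be spelled differently in the actual file headers (check data_dictionary.tsv), while the "State" column, the function name, and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def weighted_shares(tsv_file, characteristic, weight_field, keep=lambda row: True):
    """Return the weighted share of each value of one characteristic.

    `keep` is a filter that selects the same subset of rows that is
    being viewed in the timeseries data.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        if keep(row):
            totals[row[characteristic]] += float(row[weight_field])
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {value: weight / grand_total for value, weight in totals.items()}


# Invented stand-in for a metadata.tsv file.
sample = io.StringIO(
    "State\tHVAC Heating Efficiency\tSample Weight\n"
    "CO\tOption A\t250.0\n"
    "CO\tOption B\t250.0\n"
    "CO\tOption A\t250.0\n"
    "TX\tOption A\t250.0\n"
)
shares = weighted_shares(
    sample,
    characteristic="HVAC Heating Efficiency",
    weight_field="Sample Weight",
    keep=lambda row: row["State"] == "CO",
)
print(shares)
```

For ComStock, the same function could be called with the "Weight" field instead, since weights there differ by building type.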

How can I see the building characteristics associated with an individual building (or dwelling unit) load profile?

The filename of the individual building (or dwelling unit) load profile's parquet files contains the building ID. Each of these building IDs corresponds to a row in the dataset's metadata, which is available in either .parquet or .tsv format (tab-separated value format that can be opened in Excel) (ResStock example) (ComStock example).

What are the data units?

All downloaded energy data is in kWh, including all electricity, natural gas, propane, and fuel oil end uses, as documented in the data_dictionary.tsv files (example).

Timeseries energy consumption data viewed on the website are in metric units. The metric prefix (T for tera, G for giga, M for mega, etc.) is shown on the y-axis tick labels, and the rest of the unit information is in the y-axis label.

What is the timezone of timestamps?

The timestamps of all load profiles have been converted to Eastern Standard Time, to prevent issues when aggregating across time zones.

The underlying modeling was conducted using local standard time for each location, with occupant schedules adjusted for daylight saving time as applicable. All EnergyPlus timeseries outputs were converted from local standard time to Eastern Standard Time for publication in the web data viewer, data viewer exports, timeseries aggregates, and individual timeseries parquet files. In converting from local standard time to Eastern Standard Time, the last few hours of each dataset were moved to the beginning of the timeseries where necessary. For example, the first two hours of data from Colorado in Eastern Standard Time (Jan 1, midnight to 2 AM) were originally modeled as the last two hours of the year in Mountain Standard Time (Dec 31, 10 PM to midnight) using the corresponding weather.
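If you need a profile back in local standard time, the wrap-around described above can be undone by rotating the series. The sketch below is an illustration only: it assumes a complete one-year series in Eastern Standard Time, and the function name and the `hours_behind_est` parameter (1 for Central, 2 for Mountain, 3 for Pacific) are hypothetical.

```python
def est_to_local_standard(values, hours_behind_est, steps_per_hour=4):
    """Rotate a full-year EST series back into local standard time order.

    The first `hours_behind_est` hours of the EST series were originally
    the last hours of the local year, so they are moved back to the end.
    """
    if not values:
        return []
    shift = (hours_behind_est * steps_per_hour) % len(values)
    return values[shift:] + values[:shift]


# Toy "year" of 12 hourly values for a location two hours behind EST.
est_series = list(range(12))
local_series = est_to_local_standard(est_series, hours_behind_est=2, steps_per_hour=1)
print(local_series)
```

With 15-minute data, the default `steps_per_hour=4` moves eight time steps for a Mountain Standard Time location.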

Do the aggregates have the sample weighting factors applied?

Yes. The aggregates represent the total relevant building stock (e.g. all small office buildings in the state of Colorado), not just the sum of the model results. In other words, the aggregates represent the total “floor_area_represented” for commercial or “units_represented” for residential.

What software can I use to open the individual building timeseries .parquet files?

Parquet files can be read using programming languages such as Python, using the pyarrow package. For other options, see https://arrow.apache.org/docs/index.html. There are a few third-party graphical tools for viewing parquet files, but we have not tested them and the third-party support is limited.

Is there an API to access data without downloading locally?

There are no plans for an API. However, we are currently developing documentation that will explain how to link one’s own Amazon Web Services account to this data, so the data can be queried by analytic tools like Athena. We will also be providing example SQL queries to help facilitate analyses.

Are there electric vehicle (EV) charging profiles in the dataset?

No, we do not currently model EV charging in the dataset. For modeling aggregate EV load profiles for a city or state, we suggest using EVI-Pro Lite. Measured charging profile data for individual homes can be found in the NEEA HEMS data and Pecan Street Dataport. Email us at load.profiles@nrel.gov if you have suggestions for other EV charging data sources.

How many profiles or models should be used, and how does the number used affect uncertainty of results?

Users should estimate standard error for metrics of interest using the standard deviation divided by the square root of the number of samples (i.e., profiles or models). As discussed in the methodology report (section 5.1.3), for residential units, a good rule of thumb is to use at least 1000 samples to maintain 15% or less sampling discrepancy for common quantities of interest. Queries in sparsely populated areas or with filters applied may have relatively few samples available. In these cases, samples from similar locations can be grouped to increase the sample size.

As an example, if one is interested in the mean change in annual electricity costs in a certain county under a potential new rate structure and 500 samples are available in that county, the costs should be calculated for all 500 samples and the standard deviation of those costs can be used to calculate the standard error of the mean change in annual electricity costs.
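The calculation described above can be sketched as follows. This is an illustration under stated assumptions: it uses the sample standard deviation (n - 1 denominator) and treats all samples as equally weighted, and the function name and the cost-change values are invented.

```python
import math
import statistics


def standard_error_of_mean(samples):
    """Standard deviation divided by the square root of the number of samples."""
    if len(samples) < 2:
        raise ValueError("at least two samples are needed")
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Invented changes in annual electricity cost (in dollars) for five samples.
cost_changes = [1.0, 2.0, 3.0, 4.0, 5.0]
mean_change = statistics.mean(cost_changes)
se = standard_error_of_mean(cost_changes)
print(mean_change, se)
```

Because the standard error shrinks with the square root of the sample count, quadrupling the number of samples roughly halves it.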

Are weather data files available in EPW format?

Weather data used for the modeling have been provided in .csv format for regression modeling, forecasting, or other analyses. The TMY3 weather files in EnergyPlus input format (EPW) can be downloaded from the NREL Data Catalog, with filenames that correspond to county IDs in the ResStock/ComStock metadata.

EPW format weather files for 2018 or other actual meteorological years have not been publicly released. These files can be purchased from private sector vendors. See https://energyplus.net/weather/simulation for a list of providers.

Data Viewer

Why is the time series data sometimes slow to load after I click the update button?

We query data in real time to produce the time series graphs you see on the webpage, and this can involve scanning terabytes (TB) of data. Running a baseline-only query for California, Texas, New York, or Illinois takes around a minute, while a query for a state like Colorado or Massachusetts takes about 10-20 seconds. However, if the graphs have been generated previously, the data are cached and typically load in a few seconds. That is why the load time varies.

Why can’t I click on “Explore Timeseries”?

The “Explore Timeseries” option is available once a specific geography (e.g. state or PUMA region) is selected.

How do I see a profile for just one, or just a few, end uses?

Clicking on the end uses in the legend will toggle their inclusion in the visualization.

How can I access a specific day of timeseries data?

Choose “Export csv” and “15 minute resolution”. The resulting csv file will have 15 minute end use load profiles that are not aggregated over time.

Does the timestamp represent the beginning, middle, or end of each 15-minute interval?

The timestamp indicates the end of each 15-minute interval. So "12:15" represents the energy use between 12:00 and 12:15.
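One consequence of this convention is that the row stamped midnight belongs to the previous day. The sketch below, which assumes the exported rows have already been parsed into (end timestamp, value) pairs, shows one way to pull out a single day; the function names and values are hypothetical.

```python
from datetime import date, datetime, timedelta

INTERVAL = timedelta(minutes=15)


def interval_start(end_timestamp):
    """Convert an end-of-interval timestamp to the start of that interval."""
    return end_timestamp - INTERVAL


def rows_for_day(rows, day):
    """Keep the (end_timestamp, value) rows whose interval starts on `day`."""
    return [(ts, value) for ts, value in rows if interval_start(ts).date() == day]


# Invented rows around the start and end of July 1.
rows = [
    (datetime(2018, 7, 1, 0, 0), 1.0),   # covers Jun 30, 23:45 to 24:00
    (datetime(2018, 7, 1, 0, 15), 2.0),  # first interval of Jul 1
    (datetime(2018, 7, 2, 0, 0), 3.0),   # last interval of Jul 1
    (datetime(2018, 7, 2, 0, 15), 4.0),  # first interval of Jul 2
]
print(rows_for_day(rows, date(2018, 7, 1)))
```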

What is being summed or averaged over?

The 'sum' aggregation is the total energy consumption for all buildings that meet the filter criteria, summed across all the occurrences of the given time step within the selected month(s). For example, in a day timeseries range for a specific state for the month of July, the 7:00-7:15 AM time step shows the sum of all energy consumption statewide between 7:00 and 7:15 AM across every day in July, from buildings that meet the filter criteria. The value in that time step would be approximately 1/96th of the total statewide energy consumption in that month in that sector. The 'sum' view has fewer uses than the 'average' view.

The 'average' aggregation is the total energy consumption for all buildings that meet the filter criteria, averaged across all the occurrences of the given time step within the selected month(s). For example, in a day timeseries range for a specific state for the month of July, the 7:00-7:15 AM time step shows the average statewide energy consumption between 7:00 and 7:15 AM in July, from buildings that meet the filter criteria. The 'average' aggregation provides a view of the average day of total energy consumption in the state. This is the more logical view for most use cases.

Note that while each time step within a day or a year has the same number of occurrences within each dataset, each time step for a week does not - some days of the week occur more times than others in each year or month range (except for February).
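The difference between the two aggregations can be illustrated with a short sketch. It is not the viewer's implementation: it assumes the input is a list of (timestamp, statewide kWh) pairs for the selected month, groups them by time of day, and then either sums or averages across the occurrences. The function name and values are invented.

```python
from collections import defaultdict
from datetime import datetime, time


def day_profile(records, how="average"):
    """Collapse (timestamp, kwh) records to one value per time of day."""
    groups = defaultdict(list)
    for timestamp, kwh in records:
        groups[timestamp.time()].append(kwh)
    if how == "sum":
        return {t: sum(values) for t, values in groups.items()}
    if how == "average":
        return {t: sum(values) / len(values) for t, values in groups.items()}
    raise ValueError("how must be 'sum' or 'average'")


# Two invented July days with two time steps each.
records = [
    (datetime(2018, 7, 1, 7, 0), 10.0),
    (datetime(2018, 7, 1, 7, 15), 12.0),
    (datetime(2018, 7, 2, 7, 0), 30.0),
    (datetime(2018, 7, 2, 7, 15), 8.0),
]
print(day_profile(records, "sum")[time(7, 0)])
print(day_profile(records, "average")[time(7, 0)])
```

For a day or year range, every time step has the same number of occurrences, so the 'sum' profile is simply the 'average' profile multiplied by that count.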

Can I aggregate over multiple locations?

The viewer allows aggregations of up to six locations (states or PUMAs, depending on the dataset). When viewing a single location, choose the “+ More Locations” option, add up to five additional locations, and choose “Update Search”.

Sums of more than six locations can be created manually by downloading sums of up to six locations and summing further on your local computer.

TMY3 weather is not aligned between locations. This does not affect our recommendations for working with annual data. However, if your application requires timeseries data and therefore would benefit from aligned weather, we recommend either using an AMY dataset, or filtering by weather station and summing only within a single weather station’s PUMAs.

How should I interpret graphs that include PV?

Downloading a csv of the relevant data is the best approach. The data visualizations in the web viewer that include PV have some UI limitations. We are also aware that the plot axes cut off negative values.

How are the peak day and min peak day identified?

The peak day is the day with the highest single-hour (peak) energy consumption. The min peak day is the day with the lowest single-hour energy consumption. The peak_day and min_peak_day aggregation types are only available when month constraints are not used.

How do I see which day is the peak day?

This is not currently available in the web interface, but you can use the interface to download the full year of 15-min data and see which day is the peak day.
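Once the full year of 15-minute data is downloaded, the peak day can be found with a short script. The sketch below is an assumption-laden illustration: it treats timestamps as end-of-interval, sums the four intervals in each clock hour, and returns the date of the highest hour. The viewer's exact calculation is not described here, so results may differ; the function name and values are invented.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def peak_day(records, interval=timedelta(minutes=15)):
    """Return the date containing the highest single clock-hour total.

    `records` is an iterable of (end_timestamp, kwh) pairs. An empty
    input raises ValueError.
    """
    hourly = defaultdict(float)
    for end_timestamp, kwh in records:
        start = end_timestamp - interval
        hourly[start.replace(minute=0, second=0, microsecond=0)] += kwh
    peak_hour = max(hourly, key=hourly.get)
    return peak_hour.date()


# Invented data: a modest hour on Jan 1 and a larger hour on Jan 2.
records = [
    (datetime(2018, 1, 1, 10, 15), 1.0),
    (datetime(2018, 1, 1, 10, 30), 1.0),
    (datetime(2018, 1, 1, 10, 45), 1.0),
    (datetime(2018, 1, 1, 11, 0), 1.0),
    (datetime(2018, 1, 2, 18, 15), 2.0),
    (datetime(2018, 1, 2, 18, 30), 2.0),
    (datetime(2018, 1, 2, 18, 45), 2.0),
    (datetime(2018, 1, 2, 19, 0), 2.0),
]
print(peak_day(records))
```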

How can I access energy use intensity (per square foot) data?

Pre-aggregated files: For commercial buildings, the pre-aggregated timeseries files include a floor area column, so it is straightforward to divide energy use by the floor area to get intensity. Floor area is not currently included in the residential aggregates, but the floor area can be calculated from the metadata.tsv file (example), by adding up the values in the "in.sqft" column after filtering down to the building type and geographic region corresponding to the pre-aggregated file.

Data viewer: In the data viewer, the bar graphs can show energy use intensity by selecting "energy_consumption_intensity" from the Output drop down menu. Timeseries data for energy use intensity are not directly available, but you can use the Building Characteristics viewer to download floor area values for a filtered subset of buildings and use that to convert timeseries energy use to timeseries energy use intensity.
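In both cases the conversion itself is a single division, sketched below. The function name and numbers are invented, and the floor area is assumed to be the total for the same subset of buildings that the energy values cover.

```python
def energy_use_intensity(energy_kwh, floor_area_sqft):
    """Convert a timeseries of energy (kWh) to intensity (kWh per square foot)."""
    if floor_area_sqft <= 0:
        raise ValueError("floor area must be positive")
    return [value / floor_area_sqft for value in energy_kwh]


# Invented aggregate timeseries and matching total floor area.
intensity = energy_use_intensity([100.0, 200.0], floor_area_sqft=50.0)
print(intensity)
```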

How can I see the number of buildings, dwelling units, or number of devices associated with an aggregate load profile from the data viewer?

While the pre-aggregated files (example) contain a column with the "floor_area_represented" for commercial or "units_represented" for residential, aggregations generated by the web viewer don’t include the "floor_area_represented" or "units_represented" information currently. Instead, you can find this information in one of two ways:

  1. Building Characteristics data viewer (example) - You can select the same geography as the aggregate from the energy data viewer, and you can apply the same stock filters as well. The bar graph shows the number of dwelling units (for residential) or floor area (for commercial), which can also be exported to a CSV file.
  2. Metadata files - The raw building characteristics data can be found in the metadata file corresponding to each dataset (ResStock example) (ComStock example). The metadata files are in tab-separated value format and can be opened in Excel or with scripting languages. The files can be filtered down to the same subset of dwelling units or buildings in whatever timeseries you are using. You can use the various characteristic string fields (e.g., “HVAC Heating Efficiency”) to understand the distribution of characteristic values across that subset of dwelling units or buildings. For ResStock, each row has an equal weighting (found in the “Sample Weight” field). For ComStock, each row can be weighted with the "Weight" field (all buildings of a given type have the same weight).

What is the number hovering over the timeseries y axis when I hold my cursor over a specific end use?

This is the total energy consumption by that end use within the selected months.

Can I get the raw data shown by the Data Viewer and Building Characteristics?

Yes! Please visit the Datasets page for direct links to each dataset's Data Repository.

Dataset-specific FAQ

End Use Savings Shapes (2022.1)

Key documentation: dataset release slides; technical documentation

How does this dataset relate to the End Use Load Profiles (2021.1) dataset? Which should I use?

The baseline dataset in the 2022.1 data release is similar but not identical to the dataset in the 2021.1 release, due to improvements to the modeling approach made between the two efforts. These improvements are described in both the dataset release slides linked above and the technical documentation linked on the Datasets page. The 2022.1 release includes savings shapes (also known as measure impact profiles) and emissions impacts for ten measure packages. We recommend using the 2022.1 release for any new work. For work already underway using the 2021.1 release that would not benefit from the measure package or emissions results, updating to the 2022.1 release is optional and will have minimal impact on most results.

It is important to note that the dwelling unit models are not the same between the two releases. ID 1 in the 2021.1 release is not related to ID 1 in the 2022.1 release.

What information is not available in the data viewer?

The data viewer includes end-use energy consumption information for all end uses and fuels modeled, at a range of different aggregation levels. Carbon emissions outputs and PV outputs (including the PV end use and the net totals) are not available in the data viewer and should be accessed using other avenues (see Datasets page). Additionally, the data viewer is not set up to show individual model results, only results aggregated across models.

Are the EnergyPlus model input files (.idf) available for the 2022.1 release?

We make the HPXML model input files available in the datasets which can be translated into OpenStudio (.osm) and EnergyPlus (.idf) models via the OpenStudio-HPXML workflow. (Use the git tag euss.2022.1 for the version used in EUSS.) The input HPXML and schedule csv files are available in the OEDI Data Lake in the folder “building_energy_models” within each dataset. TMY3 weather files are available on the NREL Data Catalog. 2012 and 2018 AMY files are available for purchase from commercial vendors.

What do the GHG emissions values represent?

The carbon emissions results represent one year of emissions. This is approximately the average year within the specified lifetime, with the average weighted toward earlier years. More information is available in the technical documentation (see link at the head of this section).

Which homes have water heaters in nominally conditioned space?

The logic used to assign a water heater location for each modeled dwelling unit is as follows. For colder climates (IECC 2004 3A, 4A, 4B, 4C, 5A, 5B, 6A, 6B, 7) the water heater is located in the basement in any home that has a basement (whether conditioned or unconditioned) and in living space otherwise. For warmer climates (IECC 2004 1A, 1B, 1C, 2A, 2B, 2C, 3B, 3C) the water heater is located in the garage in any home that has an attached garage and in living space otherwise.
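The rule above can be written out as a small function. This is a restatement of the text for illustration, not the code used in the modeling workflow; the function name, argument names, and return strings are hypothetical, and climate zones not listed above raise an error rather than guessing.

```python
# IECC 2004 climate zones as grouped in the answer above.
COLDER_ZONES = {"3A", "4A", "4B", "4C", "5A", "5B", "6A", "6B", "7"}
WARMER_ZONES = {"1A", "1B", "1C", "2A", "2B", "2C", "3B", "3C"}


def water_heater_location(iecc_zone, has_basement, has_attached_garage):
    """Return the water heater location implied by the stated rule."""
    if iecc_zone in COLDER_ZONES:
        return "basement" if has_basement else "living space"
    if iecc_zone in WARMER_ZONES:
        return "garage" if has_attached_garage else "living space"
    raise ValueError(f"climate zone {iecc_zone!r} is not covered by the stated rule")


print(water_heater_location("5B", has_basement=True, has_attached_garage=False))
print(water_heater_location("2A", has_basement=True, has_attached_garage=False))
```

Under this rule, a water heater sits in nominally conditioned space when the result is "living space", or "basement" in a home whose basement is conditioned.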

End-Use Load Profiles for the U.S. Building Stock (2021.1)

Key documentation: technical report, End-Use Load Profiles for the U.S. Building Stock: Methodology and Results of Model Calibration, Validation, and Uncertainty Quantification.

What does "water systems" mean?

Water system energy consumption includes all building energy related to residential water heating and commercial service water heating and pumping.

Did the calibration include natural gas or other fuels?

While the validation effort was largely focused on electricity, we did make some comparisons to annual and monthly EIA survey data for natural gas. These comparisons, which we used to inform the model improvements made during calibration, are published in the technical Methodology and Results report linked at the top of this FAQ. We did not do any timeseries comparisons for propane, fuel oil, or other fuels, although these fuels are included in the models.

Have you compared the dataset to XYZ?

All comparisons we completed as part of the calibration and validation effort are published in the technical Methodology and Results report linked at the top of this FAQ. In general, the comparisons are against anonymous hourly utility meter data, EIA monthly/annual data, and various end-use metered datasets.

Are there behind-the-meter photovoltaic (PV) solar profiles in the dataset?

Yes, there are solar PV profiles in the ResStock data but not the ComStock data.

Are the EnergyPlus model input files (.idf) available for the 2021.1 release?

Not directly. We made OpenStudio model input files (.osm) available in the dataset (ResStock example, ComStock example), which generate the EnergyPlus model input files. The residential models require external schedule .csv files (example).
