HERS Variability Study: What Does It Mean for the Building Industry?


Green Builder Coalition Executive Director Mike Collignon assesses the findings of the HERS Variability Study and highlights the work that needs to be done to improve the consistency of HERS scoring.

By now, you might have heard about the Home Energy Rating Variability Study, which was prepared for the Department of Energy (DOE) and conducted by the six regional energy efficiency organizations (REEOs). It was first summarized at the February 2018 RESNET Board meeting, but the complete report was finally released in January 2019.

The following is a summary and commentary on the study, as well as its potential ramifications on the building industry.

Let’s start with the premise of the study. Quoting directly from the study, “The U.S. Department of Energy (DOE) Building Energy Codes Program… commissioned a study in attempt to better understand how home energy ratings might function as a code compliance mechanism, and to address the question of variability that could be expected if enlisting the HERS Index for the purpose of demonstrating code compliance via the ERI path”.

Finally, they stressed “that the study sought data on the consistency of multiple ratings on a single home and not whether the resulting ratings complied with the code (via the ERI targets specified in the IECC).”

Both its name and its initial premise set an expectation that variability would be found. That in and of itself shouldn’t surprise anyone, since the object of the study was analyses of homes that depend on individual judgment. It could be posited that some variability was inevitable. The key was to determine the level of consistency among HERS raters.

Next, the methodology employed was to dispatch between four and six HERS raters to a chosen home. The RESNET-certified raters were not made aware they were evaluating the same home, nor did their onsite visits overlap. Each rater was given documentation in advance, from which they conducted a plan review, followed by a field inspection/onsite verification based on RESNET protocol. The output of each analysis was a preliminary HERS Index and a Building Summary Report.

Two homes per region were selected, though in the end, only 11 homes were rated. This methodology produced 56 home energy ratings in total.

The range of ratings produced for a single home in this study was broader than many expected. The study noted: “A majority of homes (7 of the 11) experienced variability of 10 or more points. Average variability across all homes studied was approximately 13 points.”

The list below breaks down the HERS Index ranges for each location:

Austin: 65 – 79
Chicago: 40 – 51
Dallas-Fort Worth: 64 – 79
Denver: 67 – 99*
Derby, Conn. (w/o PV): 43 – 55
Derby, Conn. (w/ PV): 19 – 30
Grand Rapids, Mich.: 58 – 65
Orlando: 59 – 74
Portland, Ore.: 82 – 88
Salt Lake City: 42 – 51
Seattle: 71 – 79
Tallahassee: 62 – 74

*99 is the clear outlier of the five results for this home. Yet even if that data point is excluded, the next highest index is a 79, which represents a 12-point range.
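For readers who want to sanity-check the headline figures, the spreads in the list above can be re-derived with a few lines of arithmetic. The sketch below is not from the study itself; it assumes the Derby, Conn. home is counted once (using its without-PV ratings) so that 11 homes are represented, and it reproduces the roughly 13-point average variability and the 7-of-11 figure quoted earlier.

# Hypothetical re-check of the published HERS Index ranges (low, high) per home.
# Assumption: Derby, Conn. is counted once, using its without-PV ratings.
ranges = {
    "Austin": (65, 79),
    "Chicago": (40, 51),
    "Dallas-Fort Worth": (64, 79),
    "Denver": (67, 99),
    "Derby, Conn. (w/o PV)": (43, 55),
    "Grand Rapids, Mich.": (58, 65),
    "Orlando": (59, 74),
    "Portland, Ore.": (82, 88),
    "Salt Lake City": (42, 51),
    "Seattle": (71, 79),
    "Tallahassee": (62, 74),
}

# Spread (variability) for each home, the average spread, and the count of
# homes whose ratings varied by 10 or more points.
spreads = {home: high - low for home, (low, high) in ranges.items()}
average = sum(spreads.values()) / len(spreads)
ten_plus = [home for home, spread in spreads.items() if spread >= 10]

print(f"Average variability: {average:.1f} points")                           # ~12.8, i.e. roughly 13
print(f"Homes with 10+ point spreads: {len(ten_plus)} of {len(spreads)}")     # 7 of 11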

Digging deeper into the study’s findings reveals that the inconsistencies were widespread. For example, five raters performed an analysis of the Seattle home. Three of the five raters counted five bedrooms, while the other two raters counted only four bedrooms. Keep in mind, each rater was given house plans, window schedules, insulation values and other default or non-observable information prior to their onsite assessment.

Even more remarkable, the calculated shell area ranged from 6,096 square feet to 7,107 square feet. No two raters calculated the same shell area for this home … and this home’s range of indices was one of the smaller ranges in the entire study!

For the three-bedroom house in Denver, three of the five raters counted five bedrooms, one rater counted four bedrooms, and only one rater counted the correct number. Only three of the five raters conducted a total duct leakage test, and only two of the five raters tested for duct leakage to the outside. Astonishingly, the total wall area reported by the raters ranged from 2,187 square feet to 4,250 square feet. Window area ranged from 242 square feet to 451 square feet.

The number of returns ranged from five to nine. When it came to ceiling fan energy usage, three of the five raters didn’t mark anything down, and one rater entered zero for the refrigerator’s energy usage.

The study noted that a wide range of software was utilized: the average home was rated using three different versions of software, and in one instance a single home was rated with five different versions of REM/Rate software among six raters. While it is uncertain whether software contributed significantly to the range of variability, the inconsistencies noted above are independent of software. Those discrepancies can be attributed either to poor training or to a failure to apply the requisite training.

This study was incredibly important because multiple jurisdictions (states and cities) have incorporated alternative code compliance paths built around energy ratings. As the study stated, “consistency and replicability of the rating process is crucial to the ERI path.”

It went on to say “while the HERS Index was not originally specified within the ERI path of the 2015 IECC, the connection was made more explicit when ANSI/RESNET/ICC Standard 301 was incorporated by reference in the 2018 IECC.”

This study does state that “it is based on a relatively small sample of homes, and should not be considered statistically representative.” Yet the very next sentence notes that “it… raises many important questions for further inquiry.”

The authors of the study call out five questions for further investigation, but the most significant questions can only be answered in retrospect. The first is: What reaction does this study elicit?

To its credit, RESNET reacted fairly quickly. On April 19, 2018 (approximately two months after the study’s results were conveyed), RESNET adopted the HERS Software Consistency Collaborative Modeling Process. One facet of this new effort was to recruit a “technical consultant with extensive knowledge of building energy software modeling” to serve as the RESNET Energy Modeling Director.

Unfortunately, that recruitment process took almost six months to produce a new staff member. Over the course of 2018, after the variability study’s findings were presented, RESNET made the following revisions to the quality assurance aspects of the National Home Energy Rating Standards:

  • Added a compliance path to achieve Quality Assurance Designee (QAD) status, whereby the minimum number of reviews increased from 25 to 30, though the type of reviews changed to allow file reviews (approved August 9, 2018)
  • Modified software accreditation to generate more consistency, and allowed for appeals to the RESNET Standing Software Consistency Committee (approved October 12, 2018)
  • Revised its original policy on the financial separation of Quality Assurance Designees (approved November 15, 2018)
  • Provided default ventilation fans for improved consistency while citing ASHRAE 62.2-2013 (approved November 29, 2018)

Another enormous question is: Will there be any damage done to the energy ratings industry, or the concept of energy ratings in general? There are many people and entities, not the least of whom are the 1,900 HERS raters across the country, who hope not. However, there are some changes taking place.

Some jurisdictions are moving away from citing HERS in their respective energy codes, and are instead adopting either the ERI path in the IECC or ANSI/RESNET/ICC 301. While the difference is subtle, it is significant that RESNET is not required in order to obtain a code-compliant ERI. (RESNET is attempting to change that, but that’s a story for another day.)

A jurisdiction’s decision to transition from HERS to ERI could be driven by the desire to cite an ANSI standard, and might not have anything to do with the results of the variability study. Or, the results of the study might simply reinforce such a decision. Without asking each jurisdiction, it’s hard to say what its motivation is.

The unfortunate part of this whole saga is that this could have been avoided. At the 2013 RESNET Conference in Orlando, keynote speaker Green Builder Media President and Founder Ron Jones shared this sage advice to a crowded room:

“If you don’t keep your integrity as an organization and as an individual practitioner, you don’t amount to anything. You have to keep that integrity or we’re all sunk.” He went on to say: “We are only going to be as effective as the person in this room, the person in this industry, who cares the very least.”

By its own admission in the previously cited April 2018 press release, RESNET had been striving “for the past four years… to enhance the consistency of the calculation of HERS Index Scores.” That means it took the organization a year to react to its own keynote speaker’s warning.

Setting that delay aside, it appears the release of the variability study is what really sparked significant action. That is supported by a HERS provider (who wished to remain anonymous) who felt the result of the variability study “wasn’t a surprise.” If this issue was known to exist, why wait to fix it?

The months and years ahead will tell us whether the issues highlighted in this study are a dent that can be buffed out or a devastating gash in the hull. The good news is that the instruments of repair already exist. In addition to the steps RESNET has already taken, other potential areas of improvement include: increased quality assurance and quality control; consistent software standards; more consistent training; reduced tolerance for errors; increased enforcement including, if needed, suspensions or revocations of an individual’s accreditation.

That leaves us with the final question: Will the industry fully commit to making the necessary improvements? Anything short of full commitment might simply amount to rearranging deck chairs.


6 Comments on "HERS Variability Study: What Does It Mean for the Building Industry?"

  1. Kevin Hanlon | March 6, 2019 at 10:03 pm

    Disappointing but constructive article about HERS Index variability, but the study/experiment should not end here. Further investigation is needed to determine whether the raters who showed wide differences from the actual “as built” home 1) rated diligently and attentively but still came up with dramatically inaccurate results, or 2) completed their ratings haphazardly, with inattention and a lack of respect for accuracy. The raters who were diligent and attentive but still way off in their area tallies and bedroom counts will have a reason, or more likely a misunderstanding, of how the standards (301, 380 and MINHERS), the IECC, its references and the details of minimum rated features are to be gathered and modeled. They’ll have an argument or justification for their inputs, even if they are misinformed about the inspection and modeling language. That sort of inaccuracy can be fixed with more instruction, a more dynamic method of teaching the standard(s) and some time. The raters who don’t have a good argument for backing up their model inputs probably took the inattentive, haphazard route; they should be shown their poor results and directed to some other means of making a living. But hopefully this experiment doesn’t end here: ask those diligent raters why they used the inputs they did, and congratulate the raters who presented accurate HERS models of the homes they visited.

  2. I was just alerted to a somewhat similar study done 5 years ago. This particular study looked at 4 existing homes, rated by 6 different raters, in California in 2014.

    Here is a commentary on that study, done by Allison Bailes. His article provides a direct link to the study:
    https://www.energyvanguard.com/blog/76539/All-Over-the-Map-The-Imprecision-of-24-California-HERS-Ratings

  3. As President of Leading Raters of America®, I can tell you that our member organizations are closely following the findings and conclusions of studies such as this. There is much room for improvement in how raters are trained in both inspections & testing as well as in the use of accredited software. We have sent RESNET several specific suggestions for how to enhance the consistency and accuracy of ratings and will continue to press for action in this very important matter.

  4. Publish every HERS Rating to Google Maps.

    Create incentive for accuracy, or don’t expect accuracy.

    The opportunity for reward and punishment, pride and shame.

  5. Chris McTaggart, Co-Owner of Building Efficiency Resources (BER) and current RESNET Board Member, was unable to post the following comment. He e-mailed it to me, and I am pasting it here verbatim:
    ————————-
    I participated in this study as one of the “controls” for study homes rated through one of the REOs. While I respect the professionals at the DOE and the REOs, I think it’s important to understand that none of these people are trained HERS raters or have specific training/intimate knowledge of the HERS rating process. Due to this, there were some flaws in the process of this study:
    – It is not accurate that everyone had clear plans and specs ahead of visiting the homes.
    – The study asked raters to rate homes after substantial construction, which is different than the typical process where raters are involved in the process from the beginning, get plans and specs well ahead of time, perform mid-construction inspections, etc.
    – One of the study homes wasn’t even finished and did not have lighting, appliances or mechanicals installed. It was highly irregular for raters to provide ratings on such a home.
    – Because there were multiple raters rating the same home, and none of them were supposed to know about it, raters were asked to provide only Projected Ratings, so that multiple ratings from different raters for the same address wouldn’t get flagged by the RESNET Registry or by Providers who may have served multiple raters. Therefore, the ratings may not all have undergone RESNET QA review by QA Providers prior to issuing reports. Once again, this does not follow RESNET’s standard QA process or systems, and QA review could potentially have resulted in greater accuracy.

    Those are only the issues I encountered working with one of the REOs, and I respect them for bringing me in as a RESNET Trainer and QAD to help inform the process. I can’t speak to whether other REOs did the same or relied exclusively on their staff.

    Without a doubt, this study is valuable nevertheless. There were clear findings of inconsistency in rater process and ways the industry can get better. RESNET has doubled down on hiring extra staff and building software systems to better analyze ratings from a macro standpoint to close the gap. Objectively, if you look at the studies from the DOE on variability in prescriptive code compliance, I think we’ll all agree there is room for improvement across the entire country and building industry.

    The one point of this article that is unsubstantiated is that municipalities are “moving away from HERS” towards ERI. The implication is that this is happening due to some issue with RESNET, whereas in reality it’s simply because ERI is the non-brand-specific version of a HERS Index that is referenced in R406 of the 2015 and 2018 IECC. I recommend this site retract that statement unless it has clear evidence this is accurate.

  6. Love to see this study repeated.

    Tell the guys up front that they are part of the study.

    I wonder how much better the results would be from that one change.
