Regression software is increasingly used in developing opinions of real estate property value. As a result, the valuer needs to understand how the available tools differ, and which elements of the regression process must be understood and disclosed in a report.
While guidance is available on the general use of AVMs (automated valuation models) and statistical tools, there is currently limited detailed guidance that speaks exclusively to the specifics of regression modeling. There is little instruction on evaluating the credibility and reliability issues that surround the variety of features in regression tools, the process of developing a regression model, and the communication of the conclusions to the user. Below are important factors for the valuation professional to consider when evaluating the choices of software, and when generating, applying, and communicating regression results.
General knowledge of the regression software’s “inner workings”.
The valuer should understand and disclose the overall processes used by the tool and the type of regression employed. In addition, the valuer should understand any built-in equations, parameters, algorithms, and “givens”, which are automated and out of the control of the human valuer. How do those automated operations affect the process and results? For example, if the software automatically subtracts an estimated land value, automatically removes certain types of transactions, or automatically applies increases or decreases to prices based on market changes since the transaction date, how exactly is the software making those determinations and what is the impact on the final analysis?
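To make the question about automated market-change adjustments concrete, the following is a minimal sketch of one way a tool might adjust a sale price for time. The function name, the simple (non-compounding) formula, and the 0.5% monthly rate are all assumptions for illustration; an actual product may use a different method, which is exactly what the valuer should find out.

```python
def time_adjusted_price(price, months_since_sale, monthly_rate):
    """Hypothetical simple (non-compounding) market-conditions adjustment."""
    return price * (1 + monthly_rate * months_since_sale)

# Hypothetical sale: $300,000, closed 8 months before the effective date,
# with an assumed market change of 0.5% per month.
recorded_price = 300_000
adjusted_price = time_adjusted_price(recorded_price, months_since_sale=8, monthly_rate=0.005)

# The regression would then be run on the adjusted figure, not the recorded one.
print(f"recorded: {recorded_price:,.0f}  adjusted: {adjusted_price:,.0f}")
print(f"change introduced by the automated step: {adjusted_price - recorded_price:,.0f}")
```

If the rate or the formula is hidden from the valuer, the size of this change is hidden too, even though it alters every price the model sees.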
Plain-language explanations.
It is likely that the user of a valuation report (and quite possibly the valuer as well) is not a statistician or mathematician. Terms like R-squared, p-values, coefficients, ranges, intercepts, and other words related to regression should be explained so the user can understand the content of the report. This is not to say that specialized terms should never be used, but rather that they should at least be defined in a clear manner.
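As an illustration of how a few of these terms could be tied to plain language, the sketch below fits a one-variable regression (price against living area) by ordinary least squares and prints each figure with a short explanation. The five sales are invented for the example, and p-values are not computed here because they require additional statistical machinery.

```python
# Hypothetical sales: (living area in square feet, sale price in dollars).
sales = [(1200, 250000), (1500, 290000), (1800, 345000),
         (2100, 380000), (2400, 440000)]

areas = [area for area, _ in sales]
prices = [price for _, price in sales]
n = len(sales)
mean_area = sum(areas) / n
mean_price = sum(prices) / n

# Coefficient (slope): estimated change in price for each additional square foot.
sxy = sum((a - mean_area) * (p - mean_price) for a, p in sales)
sxx = sum((a - mean_area) ** 2 for a in areas)
coefficient = sxy / sxx

# Intercept: where the fitted line crosses zero square feet. It anchors the
# line and usually has no market meaning on its own.
intercept = mean_price - coefficient * mean_area

# R-squared: share of the variation in sale prices that the model accounts for.
predicted = [intercept + coefficient * a for a in areas]
ss_residual = sum((p - f) ** 2 for p, f in zip(prices, predicted))
ss_total = sum((p - mean_price) ** 2 for p in prices)
r_squared = 1 - ss_residual / ss_total

print(f"Coefficient: about ${coefficient:,.0f} per additional square foot")
print(f"Intercept: ${intercept:,.0f}")
print(f"R-squared: {r_squared:.3f} of the price variation is explained by living area")
```

A report could pair each printed figure with a one-sentence definition of this kind, so the reader does not need to look the terms up.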
Data: source and delineation.
What data is brought into the regression software, and what is the source of that data? Consider and disclose the geographic area, transaction dates, physical and transactional characteristics, the size of the pool of sales, and other delineators. The valuer should understand and disclose what data is initially imported into the software, as well as how (or if) additional filters are applied to that data. For example, if all residential sales from City A are imported into the software and are then filtered to analyze only manufactured houses within a certain neighborhood, those constraints on the data should be understood and disclosed, with the implication that the resulting data set is relevant to the subject. Additionally, if data is supplied by a third party, are the make-up and origin(s) of that data understood, and is the data thorough and adequate for a reliable analysis?
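One way to keep the delineation transparent is to record how many sales remain after each filter, so the path from the imported data to the analyzed pool can be disclosed. The sketch below follows the City A example; the records, field names, and neighborhood name are hypothetical.

```python
# Hypothetical imported records (field names are assumptions for illustration).
sales = [
    {"id": 1, "city": "City A", "neighborhood": "North", "type": "manufactured"},
    {"id": 2, "city": "City A", "neighborhood": "North", "type": "detached"},
    {"id": 3, "city": "City A", "neighborhood": "South", "type": "manufactured"},
    {"id": 4, "city": "City B", "neighborhood": "North", "type": "manufactured"},
    {"id": 5, "city": "City A", "neighborhood": "North", "type": "manufactured"},
    {"id": 6, "city": "City A", "neighborhood": "South", "type": "detached"},
]

# Each filter is a plain-language label plus the rule that applies it.
filters = [
    ("city is City A", lambda s: s["city"] == "City A"),
    ("property type is manufactured", lambda s: s["type"] == "manufactured"),
    ("neighborhood is North", lambda s: s["neighborhood"] == "North"),
]

pool = sales
audit_trail = [("imported from data source", len(pool))]
for label, keep in filters:
    pool = [s for s in pool if keep(s)]
    audit_trail.append((label, len(pool)))

# The trail can be pasted into the report as the data delineation disclosure.
for label, count in audit_trail:
    print(f"{label}: {count} sales remain")
```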
Excluding data.
Beyond the general filtering of data, the valuer should understand and disclose the rationale for the exclusion of other sales. Reasoning could include the sales being outliers, having unreliable or incorrect information, or other reasons. For example, were some properties remarkably large, small, old, new, high-quality, or low-quality, or did they have data that was legitimately suspect or otherwise flawed and could not be corrected? Did manually excluding the data result in a regression model that was improved and more relevant to the subject than it would otherwise have been?
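A simple way to support this disclosure is to attach a written reason to every excluded sale. The sketch below is only an illustration: the records, the `verified` flag, and the 3,000 square foot cutoff are hypothetical, and the appropriate rules depend on the subject and its market segment.

```python
# Hypothetical sales records; "verified" marks whether price and terms were confirmed.
sales = [
    {"id": 101, "area_sqft": 1500, "price": 290000, "verified": True},
    {"id": 102, "area_sqft": 1650, "price": 310000, "verified": True},
    {"id": 103, "area_sqft": 6200, "price": 950000, "verified": True},
    {"id": 104, "area_sqft": 1700, "price": 15000, "verified": False},
    {"id": 105, "area_sqft": 1800, "price": 335000, "verified": True},
]

MAX_AREA_SQFT = 3000  # hypothetical cutoff chosen by the valuer, not a standard

def exclusion_reason(sale):
    """Return a plain-language reason to exclude the sale, or None to keep it."""
    if not sale["verified"]:
        return "price or terms could not be verified"
    if sale["area_sqft"] > MAX_AREA_SQFT:
        return f"living area over {MAX_AREA_SQFT} sq ft, atypical for the subject's segment"
    return None

kept, exclusion_log = [], []
for sale in sales:
    reason = exclusion_reason(sale)
    if reason is None:
        kept.append(sale)
    else:
        exclusion_log.append((sale["id"], reason))

print(f"{len(kept)} sales kept, {len(exclusion_log)} excluded")
for sale_id, reason in exclusion_log:
    print(f"  sale {sale_id}: {reason}")
```

Running the regression once with and once without the excluded sales would then show whether the exclusions changed the model, which speaks to the final question above.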
Pool size.
The valuer should understand and disclose the size of the pool of sales being analyzed. For example, is the pool so small that the conclusions are not reliable, or so large that the conclusions are not relevant to the subject, to the point that the reliability and/or credibility of the model is affected? Does the pool size influence how many independent variables should be (or were) used?
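The link between pool size and the number of independent variables can be illustrated with adjusted R-squared, a standard statistic (not named in the text above) that discounts R-squared when many variables are fitted to few sales. The sketch below is a minimal illustration; the R-squared of 0.80, the eight variables, and the pool sizes are hypothetical.

```python
def adjusted_r_squared(r_squared, n_sales, n_variables):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    degrees_of_freedom = n_sales - n_variables - 1
    if degrees_of_freedom <= 0:
        raise ValueError("pool is too small for this many independent variables")
    return 1 - (1 - r_squared) * (n_sales - 1) / degrees_of_freedom

# The same reported R-squared (0.80) with eight variables, at three pool sizes.
for n_sales in (12, 30, 200):
    adjusted = adjusted_r_squared(0.80, n_sales, n_variables=8)
    print(f"{n_sales:>3} sales, 8 variables: adjusted R-squared = {adjusted:.3f}")
```

With only 12 sales, most of the apparent fit disappears after the adjustment, which suggests that eight variables are more than such a small pool can support.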
Variables used.
Independent variables in a real estate regression model are the physical and/or transactional characteristics of the properties that are determined to affect (or not affect, in some cases) the sales price (the dependent variable). As such, the independent variables are of utmost importance to the analysis. The valuer should understand and disclose the method and rationale behind the choice of the specific variables used. If the selection of the variables was partially or fully in the control of the software, or was otherwise limited by the software, the valuer should have a general understanding of the software’s parameters that are used for such determinations, with disclosure in the report. Has the software limited the selection of variables, or forced the inclusion or exclusion of certain variables, to the point that the outcome of the regression model is affected?
Absent variables.
The valuer should consider whether any independent variables are likely absent from the model, for example because of limitations in the data source or built-in parameters of the software. Absent variables might be evidenced by a low R-squared figure or other known factors, and might include unaccounted-for characteristics such as condition, quality, or location factors. The valuer should consider and disclose how these circumstances may affect the reliability of the model, and how an understanding of the absent variables influences the valuer’s use and application of the model’s output.
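One informal check for an absent variable is to group the model's errors (actual price minus predicted price) by a characteristic the model did not include. The sketch below assumes the valuer has a condition rating for each sale and the model's predicted prices; all figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales: actual price, the model's predicted price, and a
# condition rating that was NOT one of the model's independent variables.
sales = [
    {"actual": 312000, "predicted": 300000, "condition": "good"},
    {"actual": 259000, "predicted": 250000, "condition": "good"},
    {"actual": 415000, "predicted": 400000, "condition": "good"},
    {"actual": 290000, "predicted": 300000, "condition": "fair"},
    {"actual": 236000, "predicted": 250000, "condition": "fair"},
    {"actual": 388000, "predicted": 400000, "condition": "fair"},
]

residuals_by_condition = defaultdict(list)
for sale in sales:
    residual = sale["actual"] - sale["predicted"]
    residuals_by_condition[sale["condition"]].append(residual)

mean_residual = {
    condition: sum(values) / len(values)
    for condition, values in residuals_by_condition.items()
}

for condition, value in mean_residual.items():
    print(f"{condition}: model misses by {value:+,.0f} on average")
```

If the model consistently under-predicts one group and over-predicts another, as in this invented data, the grouping characteristic is a candidate absent variable that the valuer may need to account for outside the model.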
Testing the model.
The valuer should test the conclusions drawn from the regression model. Tests might include scatter plots that illustrate how closely the regression output fits the pool of sales analyzed. Tests also might include applying the adjustment-rate conclusions to actual market sales (such as sales specifically comparable to the subject), to compare the model’s predictions with the actual sale prices. Comparing regression output to other market data, such as comparing regression figures to paired sales figures, can be a useful way to test the conclusions and reconcile them into reliable and credible results.
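A comparison of predictions with actual sale prices can be summarized in a few lines. The sketch below assumes the valuer already has the model's predicted price for each test sale; the three sales and their figures are hypothetical, and the mean absolute percentage error is one common summary among several.

```python
# Hypothetical test sales: (label, actual sale price, model's predicted price).
test_sales = [
    ("Sale 1", 300000, 309000),
    ("Sale 2", 250000, 245000),
    ("Sale 3", 400000, 388000),
]

# Percentage error of each prediction relative to the actual price.
percent_errors = [
    (predicted - actual) / actual for _, actual, predicted in test_sales
]

# Mean absolute percentage error: the typical size of the miss, ignoring direction.
mean_abs_percent_error = sum(abs(e) for e in percent_errors) / len(percent_errors)

for (label, actual, predicted), error in zip(test_sales, percent_errors):
    print(f"{label}: actual {actual:,}  predicted {predicted:,}  error {error:+.1%}")
print(f"Mean absolute percentage error: {mean_abs_percent_error:.1%}")
```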
Application, and other approaches to value.
The valuer should understand and disclose the extent to which, and the purpose for which, the regression model is being used. For example, is it being used simply to indicate which characteristics are the important driving features in a particular market segment? Or does the model’s use also include estimating the likely value(s) for a type of property or a specific property? If the model is being used to estimate a value, the valuer should understand and disclose whether it was used in isolation for its own end (such as to calculate a prediction of a subject property’s value) and/or in tandem with other approaches to value (such as to estimate adjustment rates in a market approach).
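When regression is used in tandem with a market approach, the coefficients typically serve as adjustment rates applied to a comparable sale. The sketch below shows that arithmetic under stated assumptions: the rates, the subject, and the comparable are hypothetical, and each adjustment is the rate multiplied by the difference between subject and comparable.

```python
# Hypothetical adjustment rates taken from regression coefficients
# (dollars per unit of each characteristic).
adjustment_rates = {"area_sqft": 150.0, "garage_stalls": 8000.0}

subject = {"area_sqft": 1800, "garage_stalls": 2}
comparable = {"price": 320000, "area_sqft": 1700, "garage_stalls": 3}

# Adjust the comparable toward the subject: positive when the subject has
# more of the characteristic, negative when it has less.
adjustments = {
    name: rate * (subject[name] - comparable[name])
    for name, rate in adjustment_rates.items()
}
adjusted_price = comparable["price"] + sum(adjustments.values())

for name, amount in adjustments.items():
    print(f"{name}: {amount:+,.0f}")
print(f"Adjusted comparable price: {adjusted_price:,.0f}")
```

Showing the rate, the difference, and the resulting dollar amount for each line lets the report reader see exactly how the regression output entered the market approach.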
Note: This document is a summary of considerations related to using regression in real estate valuation; it is not comprehensive, nor is it a set of standards established, required, or endorsed by any organization or agency.