Process, application, and communication of regression modeling in valuing real estate

Regression software has become an increasingly common part of developing real estate value opinions. As a result, it is paramount to understand the differences among the available tools, and the elements of the regression process that the valuer needs to understand and disclose in a report.

While guidance is available on the general use of AVMs and statistical tools, there is currently limited detailed guidance that speaks specifically to regression modeling. There is little instruction on evaluating the credibility and reliability issues that surround the variety of features in regression tools, the process of developing a regression model, and the communication of the conclusions to the user. Below are important factors for the valuation professional to consider when evaluating software choices, and when generating, applying, and communicating regression results.

General knowledge of the regression software’s “inner workings”.

The valuer should understand and disclose the overall processes used by the tool and the type of regression employed. In addition, the valuer should understand any built-in equations, parameters, algorithms, and “givens”, which are automated and out of the control of the human valuer. How do those automated operations affect the process and results? For example, if the software automatically subtracts an estimated land value, automatically removes certain types of transactions, or automatically applies increases or decreases to prices based on market changes since the transaction date, how exactly is the software making those determinations and what is the impact on the final analysis?

Plain-language explanations.

It is likely that the user of a valuation report (and quite likely the valuer himself or herself) is not a statistician or mathematician. Terms like R-squared, p-values, coefficients, ranges, intercepts, and other words related to regression should be explained so the user can understand the content of the report. This is not to say that specialized terms should never be used, but rather that they should at least be defined in a clear manner.
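As one illustration of how these terms might be explained, the short Python sketch below fits a one-variable regression to five invented sales and prints each figure next to a plain-language sentence. The sales, the function name fit_line, and the wording of the explanations are all assumptions made for illustration. A real model would normally use more sales and more variables, and p-values are left out because they require additional statistical machinery.

```python
# Hypothetical sales (invented for illustration): (living area in sq.ft., sale price in $).
sales = [(1200, 190000), (1500, 226000), (1800, 262000), (2100, 301000), (2400, 331000)]


def fit_line(points):
    """Fit price = intercept + slope * area by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variation in price that the fitted line accounts for.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return slope, intercept, 1 - ss_res / ss_tot


slope, intercept, r_squared = fit_line(sales)
print(f"Coefficient: each additional sq.ft. is associated with about ${slope:,.0f} in price.")
print(f"Intercept: the line's value when area is zero (${intercept:,.0f}); "
      "it is a fitting constant, not necessarily a land value.")
print(f"R-squared: the model accounts for about {r_squared:.1%} of the price variation "
      "in this pool of sales.")
```

Sentences like the printed ones could sit beside the raw figures in a report, so the user sees both the number and what it means.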

Data: source and delineation.

What data is brought into the regression software, and what is the source of that data? Consider and disclose the geographic area, transaction dates, physical and transactional characteristics, the size of the pool of sales, and other delineators. The valuer should understand and disclose what data is initially imported to the software, as well as how (or if) additional filters are applied to that data. For example, if all residential sales from City A are imported to the software, and the data is then filtered to analyze only manufactured houses within a certain neighborhood, those constraints on the data should be understood and disclosed, with the implication that the resulting data set is relevant to the subject. Additionally, if data is supplied by a third party, are the make-up and origin(s) of that data understood, and is the data thorough and adequate for a reliable analysis?

Excluding data.

Beyond the general filtering of data, the valuer should understand and disclose the rationale for the exclusion of other sales. Reasons could include the sales being outliers, having unreliable or incorrect information, or other factors. For example, were some properties remarkably large, small, old, new, high-quality, or low-quality, or did they have data that was legitimately suspect or otherwise flawed and could not be corrected? Did manually excluding the data result in a regression model that was improved and more relevant to the subject than it would otherwise have been?
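One way to keep the rationale for exclusions on record is to have each screened-out sale carry its reason with it. The sketch below is a minimal example in Python, assuming a made-up pool of sales and using the common 1.5 x interquartile-range rule purely as an illustration. The rule, the multiplier k, and the function name screen_by_iqr are assumptions, not a required method, and a flagged sale is a candidate for the valuer's review rather than an automatic exclusion.

```python
from statistics import quantiles

# Hypothetical pool (invented for illustration): (sale id, living area in sq.ft.).
pool = [("S1", 1450), ("S2", 1520), ("S3", 1600), ("S4", 1380),
        ("S5", 1710), ("S6", 1490), ("S7", 1655), ("S8", 4900)]


def screen_by_iqr(sales, k=1.5):
    """Split sales into kept and flagged lists, recording a reason for each flag."""
    q1, _, q3 = quantiles([area for _, area in sales], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    kept, flagged = [], []
    for sale_id, area in sales:
        if low <= area <= high:
            kept.append((sale_id, area))
        else:
            reason = f"living area {area} sq.ft. is outside {low:.0f} to {high:.0f} sq.ft."
            flagged.append((sale_id, area, reason))
    return kept, flagged


kept, flagged = screen_by_iqr(pool)
for sale_id, _, reason in flagged:
    print(f"{sale_id} flagged for review: {reason}")
```

The flagged list, with its reasons, is the kind of record that can be summarized in the report when explaining why sales were left out.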

Pool size.

The valuer should understand and disclose the size of the pool of sales being analyzed. For example, is the pool so small that the conclusions are not reliable, or so large that the conclusions are not relevant to the subject, to the point that the reliability and/or credibility of the model is affected? Does the pool size influence how many independent variables should be (or were) used?

Variables used.

Independent variables in a real estate regression model are the physical and/or transactional characteristics of the properties that are determined to affect (or not affect, in some cases) the sales price (the dependent variable). As such, the independent variables are of utmost importance to the analysis. The valuer should understand and disclose the method and rationale behind the choice of the specific variables used. If the selection of the variables was partially or fully in the control of the software, or was otherwise limited by the software, the valuer should have a general understanding of the software’s parameters that are used for such determinations, with disclosure in the report. Has the software so limited the selection of variables or forced the exclusion or inclusion of certain variables, to the point that the outcome of the regression model is affected?

Absent variables.

The valuer should consider whether any independent variables are likely absent from the model, for instance because of limitations in the data source or built-in parameters of the software. Absent variables might be evidenced by a low R-squared figure or other known factors, and might include characteristics that are not accounted for, such as condition, quality, or location factors. The valuer should consider and disclose how these circumstances may shape the reliability of the model, since an understanding of what is absent from the model can influence how the valuer uses and applies its output.

Testing the model.

The valuer should test the conclusions drawn from the regression model. Tests might include visual scatter graphs, to illustrate the degree of accuracy of the regression output for the pool of sales analyzed. Tests also might include applying the adjustment-rate conclusions to actual market sales (such as sales specifically comparable to the subject), to illustrate the comparison of the model’s predictions to the sales’ actual sales prices. Comparing regression output to other market data, such as comparing regression figures to paired sales figures, can be a useful process to test and reconcile reliable and credible conclusions.
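A test of this kind can be as simple as running the model's rates against a few closed sales and looking at how far off each prediction lands. The Python sketch below assumes a hypothetical set of regression outputs (an intercept and two adjustment rates) and three invented comparable sales; none of the figures come from a real market.

```python
# Hypothetical regression output (invented): intercept plus per-unit adjustment rates.
model = {"intercept": 50000.0, "gla_sqft": 110.0, "garage_bays": 9000.0}

# Hypothetical comparable sales with their actual sale prices.
comps = [
    {"id": "C1", "gla_sqft": 1600, "garage_bays": 2, "price": 248000},
    {"id": "C2", "gla_sqft": 1850, "garage_bays": 3, "price": 275000},
    {"id": "C3", "gla_sqft": 1700, "garage_bays": 2, "price": 251000},
]


def predict(sale):
    """Apply the model's rates to one sale's characteristics."""
    rates = {name: rate for name, rate in model.items() if name != "intercept"}
    return model["intercept"] + sum(rate * sale[name] for name, rate in rates.items())


def percent_errors(sales):
    """Percent difference between predicted and actual price for each sale."""
    return {s["id"]: (predict(s) - s["price"]) / s["price"] * 100 for s in sales}


errors = percent_errors(comps)
for sale_id, err in errors.items():
    print(f"{sale_id}: prediction is {err:+.1f}% versus the actual sale price")
mean_abs_error = sum(abs(e) for e in errors.values()) / len(errors)
print(f"Average miss across these sales: {mean_abs_error:.1f}%")
```

Whether a given miss is acceptable is a judgment for the valuer; the sketch only makes the comparison visible so it can be reconciled against paired sales or other market data.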

Application, and other approaches to value.

The valuer should understand and disclose the extent to which, and the purpose for which, the regression model is being used. For example, is it being used simply to indicate which characteristics are the important driving features in a particular market segment? Or does the model’s use also include estimating the likely value(s) for a type of property or for a specific property? If it is being used to estimate a value, the valuer should understand and disclose whether the model was used in isolation for its own end (such as to calculate a prediction of a subject property’s value) and/or in tandem with other approaches to value (such as to estimate adjustment rates in a market approach).

Note: This document is a summary of considerations related to using regression in real estate valuation; it is not comprehensive, nor is it a set of standards established, required, or endorsed by any organization or agency.

Paired Sales Are a Joke …

“Paired sales are a joke”... so some folks say.


Okay, I agree that paired sales are not the be-all and end-all of adjustments, and that there are other methods for supporting adjustments, but the paired sales method really can be valuable.


Yes, paired sales are much more accurate in a textbook or in a state board’s newsletter than in an actual real-world environment. It’s wonderful if you can say “these two properties are exactly alike, except this one has a three-car garage and that one has a two-car garage, so that means a third bay is worth $10,000”. The drawback, of course, is that it’s often difficult to find two properties similar enough (let alone exact!) to even begin to isolate one single characteristic for establishing adjustments. So, should we just give up on paired sales?


I think there are a few misunderstandings surrounding paired sales. Paired sales are not designed to arrive at an exact rate of adjustment to be used for each and every property you come across in each and every assignment. Simply because you calculate a $43.7982 per sq.ft. adjustment rate based on two sales doesn’t mean you’re going to use that exact rate for every assignment. Perhaps that support is only valid in a certain subdivision, or for new construction, or in a certain market area, or for properties of a certain age. If you determine a third garage bay is worth $15,000 for a certain segment of properties, will it also be worth $15,000 in so-called higher-end houses? Maybe, maybe not. Perhaps the $15,000 represents a percentage, not a flat dollar amount. Just like anything else, paired sales require analysis, not just calculations.



And we won’t ever find the perfect textbook example of identical-except-for-one-characteristic paired sales. But is this a reason to abandon the process? Paired sales will undoubtedly require adjustments in order to isolate a single characteristic. For example, maybe one is on a larger site, but you can reasonably adjust for site size differences, and then isolate the sq.ft. rate. Or perhaps one property has slightly different GLA, so you qualitatively know that the difference in sale prices isn’t entirely attributable to the two-versus-three-bay garages, but you can still draw a conclusion about whether, and how much, a third garage bay adds value, while taking into account that the GLA varies slightly.
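The arithmetic behind that kind of adjusted pairing is short enough to sketch. In the Python example below, all prices and the site-size adjustment are invented, and the function name paired_sale_indication is just an illustrative label: Sale A has a three-car garage, Sale B has a two-car garage but sits on a larger site, so B is first adjusted for the site difference before the remaining gap is attributed to the third bay.

```python
def paired_sale_indication(price_a, price_b, adjustments_to_b):
    """Price gap remaining after Sale B is adjusted for its other known differences.

    adjustments_to_b maps each difference to a dollar adjustment applied to B
    (negative when B is superior to A in that respect).
    """
    adjusted_b = price_b + sum(adjustments_to_b.values())
    return price_a - adjusted_b


# Hypothetical pair: A sold for $315,000 (three bays); B sold for $305,000
# (two bays, larger site assumed to be worth $4,000 more than A's site).
third_bay = paired_sale_indication(315000, 305000, {"site size": -4000})
print(f"Indicated contribution of the third garage bay: ${third_bay:,}")
```

The result is only as good as the site adjustment that went into it, which is why the indication is treated as support for a range rather than as an exact rate.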


Additionally, one paired sale is not the end of the process. Over time, we should be collecting, analyzing and storing the studies so that we can develop ranges of market-supported adjustments. Maybe our data indicates a garage bay is worth between 5% and 15% of the value for a particular property type in a particular area. So, where in that range does the appropriate adjustment fall for a specific assignment? …You are the human market expert qualified to develop the reasoning to support the specific adjustment you use in a report, which will most likely be based on the range your research indicates.
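As a rough sketch of what a stored collection of studies might yield, the Python example below turns several paired-sale indications into percentages of sale price and reports the range. The four studies are invented, and expressing each indication as a share of the higher-priced sale is an assumption made for illustration.

```python
from statistics import median

# Hypothetical stored studies (invented): (indicated garage-bay contribution in $,
# sale price in $ of the property that has the extra bay).
studies = [(14000, 315000), (22000, 280000), (30000, 250000), (18000, 300000)]

# Express each indication as a percentage of sale price.
percents = [indicated / price * 100 for indicated, price in studies]

low, high, middle = min(percents), max(percents), median(percents)
print(f"Indicated range: {low:.1f}% to {high:.1f}% of value (median {middle:.1f}%)")
```

Where the adjustment for a specific assignment falls within that range remains the appraiser's call.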


Paired sales will not show an exact number or percentage. So, what do they do? What use are they? First, they establish that there is (or is not) a value contribution associated with a specific characteristic. Second, they provide a basis for the adjustments that we end up making. A basis does not mean “a number to blindly follow without thinking about it”! A basis means our adjustments are logically grounded in real analysis, and not simply pulled out of thin air.



Of course, there are other methods of supporting adjustments: regression, market interviews and similar, which can be especially helpful in rural and non-homogeneous markets. Yes, only in textbooks are paired sales perfect. But to simply write them off because they are imperfect in the real world ignores the real evidence and support they can provide.


– Let me know what your experience is!

This article was first published from here.

Market Trend Analysis: Playing a Good Game

Both of my children play soccer, just as my father and I did. And as with any game, you have good games and bad games. But what is a “good game” really? What does it mean to “play well”, and do we mean as a team or as an individual player? If I say I “had a good season”, some folks might assume my team won most of its games, others might think we at least finished off the season strong, and others may even believe that I mean I improved my performance compared to the previous season. A seemingly-simple phrase or word can be interpreted numerous ways and really isn’t terribly meaningful on its own.




So, let’s step into appraising… “Declining”, “Increasing”, “Stable” and “Unstable” are wonderful places to start in the reporting of a market analysis, but they are certainly not good places to stop: more information is needed to produce an understandable report. What will our Users believe we mean by “Increasing”, “Stable” or other trend-related words?


Many times (notably in mortgage-related work) we are not using these terms to forecast or to predict future market conditions. To keep our Users from believing we’re making predictions, we need to explain that we are using these terms to describe the market leading up to the effective date, and that those trend conclusions help to direct our time/date-of-sale adjustments (and possibly other areas of our analysis). Of course, in some assignments such as relocation-related work, we may in fact be developing forecasts.



For our Users’ sake, we need to define what exactly we mean by these types of terms. Just as we include definitions of Market Value and of UAD codes and abbreviations, why not do the same with market conditions? Are we explaining what terms like “Declining”, “Increasing”, “Stable” and “Unstable” mean within the context of our report, what indicators and measurements we’ve analyzed to come to those conclusions, and how those conclusions impact other portions of our report? Without such explanation, we risk our Users thinking “Increasing” means the market is going to continue increasing, that median sale prices have increased consistently over the past 12 months, or other interpretations that we possibly aren’t intending to make.


From reviewing appraisals, we can see these types of explanations are not always included in appraisal reports. So set yourself apart and include your own reliable explanations of these terms, citing available resources related to market conditions and trend analyses. For a start, take a look at HUD Mortgagee Letter 2009-09, which briefly defines a declining market. The Appraisal Foundation’s 2012 Valuation Advisory 3 provides a good list of indicators which can be analyzed for market trend analysis.


Then, take the next step: take the statistics and analyses for your particular assignment and relate them back to your definitions and specifically to the development of your appraisal.
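To show what tying a statistic back to a stated definition could look like, here is a minimal Python sketch. It assumes one possible definition, invented for this example and not taken from HUD or The Appraisal Foundation: compare the average monthly median sale price for the most recent six months with the prior six, and call the market “Stable” if the change is within plus or minus 3%. The 3% band, the function name describe_trend, and the twelve monthly figures are all hypothetical.

```python
def describe_trend(monthly_medians, stable_band_pct=3.0):
    """Label a run of monthly median prices under the assumed definition above."""
    half = len(monthly_medians) // 2
    prior = sum(monthly_medians[:half]) / half
    recent = sum(monthly_medians[half:]) / (len(monthly_medians) - half)
    change_pct = (recent - prior) / prior * 100
    if change_pct > stable_band_pct:
        label = "Increasing"
    elif change_pct < -stable_band_pct:
        label = "Declining"
    else:
        label = "Stable"
    return label, change_pct


# Hypothetical median sale prices ($ thousands) for the 12 months before the effective date.
medians = [250, 252, 249, 255, 253, 256, 258, 262, 260, 265, 268, 270]
label, change = describe_trend(medians)
print(f"{label}: recent six-month average is {change:+.1f}% versus the prior six months")
```

Stating the definition and the measurement together, as the printed line does, describes the market leading up to the effective date without implying a forecast.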


This article was first published from here.