Getting the Data

The data comes from the County Auditor's and Recorder's offices, plus some FOIA data. Any spreadsheet of county data will work as long as it has the parcel number, mortgage amount, tax district, effective rate, owner name, sale date, and the Auditor's market value.

Note: This covers only commercial properties, those in the Commercial/Industrial land use codes. The mortgage data is incomplete. (See our methodology for more.)

#Importing data
library(dplyr)
Commercial_Properties <- rio::import("Cuyahoga Properties last verified 8-07-19.csv") #Or wherever your file is
#Exempt properties taken out, per the auditor's advice
Commercial_Properties <- filter(Commercial_Properties, NET_TAX_OWED_1ST > 0)

Calculating Baseline

What is the amount owed to the county?

Note: In theory we could just add up the taxes_paid column, but a certain percentage of people is always behind on their taxes, and we're not interested in that. So here we calculate the total amount of taxes that would be collected in theory.

Second note: Each year, new properties are built. This doesn't decrease the tax rate; it just adds to the amount of property tax collected. So this is essentially a snapshot.

Third caveat: We collected this data over several months, so it's always going to be slightly out of date, as people are constantly buying, building, and selling properties. We're after the best estimate.

Commercial_Properties <- mutate(Commercial_Properties, Amount_Owed_to_County = market_value*EFFECTIVE_RATE*.35*.001) #Ohio taxes 35 percent of market value, and the effective rate is in mills, hence the .001
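As a sanity check of that formula, here is the arithmetic on a single hypothetical parcel (the numbers here are made up): Ohio taxes 35 percent of a property's market value, and the effective rate is expressed in mills, which is why everything is multiplied by .001.

```r
# Hypothetical parcel: $100,000 market value in a district with a 65.5-mill effective rate
market_value <- 100000
effective_rate <- 65.5
# 35 percent of the value is taxable; a mill is one-tenth of a percent
amount_owed <- market_value * effective_rate * .35 * .001
amount_owed
## [1] 2292.5
```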

How much does each district raise?

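The per-district totals below come from a grouped sum. Here is a minimal sketch of that step, using a toy stand-in for Commercial_Properties; the real version groups the full table the same way, producing the Amount_Due_Per_Tax_District object summed later.

```r
library(dplyr)
# Toy stand-in for Commercial_Properties: two parcels in one district, one in another
parcels <- data.frame(
  TAXSET_DESCR = c("Bay Village", "Bay Village", "Beachwood"),
  Amount_Owed_to_County = c(1000, 2000, 3000)
)
# Sum the theoretical tax owed within each taxing district
Amount_Due_Per_Tax_District <- parcels %>%
  group_by(TAXSET_DESCR) %>%
  summarize(Total_Per_District = sum(Amount_Owed_to_County, na.rm = TRUE))
Amount_Due_Per_Tax_District
```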
TAXSET_DESCR Total_Per_District
Bay Village 1.280771e+06
Beachwood 2.824884e+07
Beachwood/Warrensville 8.546389e+06
Bedford 8.965521e+06
Bedford Hts. 9.293173e+06
Bedford Hts./Orange 2.457844e+06
Berea 5.299954e+06
Berea/Olmsted Falls 2.231797e+06
Bratenahl 2.151762e+05
Brecksville 8.813260e+06
Broadview Hts. 7.086926e+06
Broadview Hts./North Royalt 1.706686e+06
Brook Park 1.121132e+07
Brook Park/Cleveland 2.066731e+06
Brooklyn 1.208523e+07
Brooklyn Hts. 2.930425e+06
Chagrin Falls Village 2.626383e+06
Cleveland 2.512801e+08
Cleveland Hts. 1.674208e+07
Cleveland Hts./East Clevela 9.469686e+04
Cleveland/Berea 5.071719e+06
Cleveland/Shaker Hts. 2.289971e+06
Cuyahoga Hts. 3.474949e+06
East Cleveland 4.290780e+06
Euclid 2.030490e+07
Fairview Park 4.467671e+06
Fairview Park/Berea 3.820022e+04
Fairview Park/Rocky River 2.825286e+06
Garfield Hts. 1.107728e+07
Garfield Hts./Cleveland 1.970821e+05
Gates Mills 2.033804e+05
Glenwillow 4.499879e+06
Highland Hills Village 2.239329e+06
Highland Hts. 6.537852e+06
Hunting Valley 5.094720e+03
Independence 1.515705e+07
Lakewood 2.146778e+07
Linndale 1.253786e+05
Lyndhurst 1.062515e+07
Maple Hts. 7.855830e+06
Mayfield Hts. 2.277334e+07
Mayfield Village 8.836303e+06
Middleburg Hts. 1.806218e+07
Moreland Hills/Orange 5.204357e+05
Newburgh Hts. 5.491098e+05
North Olmsted 2.678409e+07
North Randall 4.540983e+06
North Royalton 1.076495e+07
Oakwood 4.320497e+06
Olmsted Falls 1.812459e+06
Olmsted Falls/Berea 1.720302e+03
Olmsted Township 4.857666e+06
Orange 3.187011e+06
Orange/Warrensville 8.748185e+05
Parma 2.718912e+07
Parma Hts. 6.679992e+06
Pepper Pike 3.132739e+06
Pepper Pike/Beachwood 8.520099e+03
Richmond Hts. 4.915983e+06
Richmond Hts./South Euclid 1.167779e+06
Rocky River 1.132576e+07
Seven Hills 2.419507e+06
Shaker Hts. 1.078308e+07
Solon 2.584266e+07
Solon/Orange 8.388176e+05
South Euclid 4.756233e+06
South Euclid/Cleveland Hts. 2.741612e+06
Strongsville 3.084553e+07
University Hts. 5.038997e+06
Valley View 6.250233e+06
Walton Hills 2.786994e+06
Warrensville/Orange 7.024654e+05
Westlake 3.274093e+07
Woodmere 3.802357e+06

How much is that altogether?

  sum(na.omit(Amount_Due_Per_Tax_District$Total_Per_District))
## [1] 797792740

That same number, but written as currency with commas so you can read it:

scales::dollar(sum(na.omit(Amount_Due_Per_Tax_District$Total_Per_District)))
## [1] "$797,792,740"

Putting single purchases together

First, our data is in the wrong format. When the date, owner, and amount are the same, the sale amount is actually for ALL the properties together. But there are different parcel numbers, because large properties are usually composed of several adjacent parcels.

So, for example, parcels 123, 456, and 789 are listed as sold to Company LLC on June 22, 2019 for 6 million dollars. But each is not worth 6 million. They are worth 1 million, 2 million, and 3 million dollars respectively. Together the properties are valued at 6 million dollars, BUT the mortgage amount field on each reads 6 million dollars. And we don't want to accidentally log it as 18 million dollars.
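To make the triple-counting concrete, here is that toy sale as a data frame. Summing the mortgage column naively triple-counts the purchase, while counting each owner/date/amount combination once gives the true figure:

```r
library(dplyr)
# The example above: three parcels, one $6M mortgage stamped on every row
sale <- data.frame(
  parcel = c("123", "456", "789"),
  GranteeName = "Company LLC",
  RecordedDate = "2019-06-22",
  mortgage_amount = 6e6,
  market_value = c(1e6, 2e6, 3e6)
)
# Naive sum: 18,000,000 (wrong)
sum(sale$mortgage_amount)
# Counting each owner/date/amount combination once: 6,000,000 (right)
sale %>%
  distinct(GranteeName, RecordedDate, mortgage_amount) %>%
  pull(mortgage_amount) %>%
  sum()
```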

So here we will group by owner, date of mortgage, and the mortgage amount. But then we run into another problem: Some of these have multiple tax districts:

library(DT)
Looking_For_Sort_of_duplicates <- Commercial_Properties %>% 
  group_by(GranteeName, RecordedDate, mortgage_amount) %>% 
  count(TAXSET_DESCR)
datatable(Looking_For_Sort_of_duplicates)
## Warning in instance$preRenderHook(instance): It seems your data is too
## big for client-side DataTables. You may consider server-side processing:
## https://rstudio.github.io/DT/server.html

If you sort by n (the number of parcels counted in each purchase), you can see that purchases range from 1 to 190,000 properties, though thankfully most are just 1. This really complicates our analysis, though, because we need the tax district to get the effective rate, which is what we will multiply our new value by.

So first, we need to create a proportion for our tax district. A purchase that is in just one tax district will have a multiplier of 1. The rest will be a proportion of the current value.

For example, let’s say we purchased an LLC business for one million dollars (the mortgage) on two parcels of land in two tax districts, and it is on the books as being worth 500,000 dollars total (the Auditor’s market value). Parcel A is listed as being worth 100,000 dollars in tax district 1, and parcel B is listed as being worth 400,000 dollars in tax district 2. Then parcel A gets a multiplier of .2 (20 percent of the value) and parcel B a multiplier of .8 (80 percent of the value).

Commercial_Properties <- Commercial_Properties %>%
  group_by(GranteeName, RecordedDate, mortgage_amount) %>% 
  mutate(Auditors_Total_Value_of_Parcels = sum(market_value)) %>%  #This creates a total for each transaction
  ungroup() %>% 
  mutate(Share_of_Market_Value = market_value/Auditors_Total_Value_of_Parcels) #This creates the share of the value
#Some transactions have parcels valued at zero, and 0/0 gives NaN, which R treats as NA and which screws up our calculations further on. So for the 173 NAs here we will just set them to zero
Commercial_Properties$Share_of_Market_Value[is.na(Commercial_Properties$Share_of_Market_Value)] <- 0

Okay now we are looking for outliers.

In the Athens apartment case that is our example (https://realestate.cleveland.com/realestate-news/2018/08/big_big_money_legal_loopholes.html), the mortgage was for so much more than the assessed value because the property hadn’t been sold in roughly 20 years. Every year the state takes the property’s past value and estimates what someone would pay for it, given the bedrooms, the square footage, the neighborhood, and whatnot. But really, the value of something is how much someone else is willing to pay for it. When the tax value is stale because the sale was an LLC transfer, that’s what we are looking for.

Assessors generally try to assess a property at about 80 to 90 percent of its “true value,” to allow for property prices to ebb and flow a bit. So in our case, we are going to flag any mortgage that is more than 20 percent above the property’s listed value.

Looking for Outliers

#Creating a column that shows what twenty percent more is
Commercial_Properties <- Commercial_Properties %>% 
  group_by(GranteeName, RecordedDate, mortgage_amount) %>%
  mutate(Twenty_Percent_More = Auditors_Total_Value_of_Parcels + .2*Auditors_Total_Value_of_Parcels) %>% 
#Creating a column that flags mortgages more than twenty percent above the property value. If a bank is willing to bet a significant amount more than the property value, then that is tax dollars lost. 
  mutate(Is_This_A_Potential_Outlier = mortgage_amount > Twenty_Percent_More) %>% 
  ungroup()

Verification

Now this is the part where we want to certify that all our numbers are correct.

For example, in Warren County parcel numbers 15045020040, 16365020010, 16365020011, 12205020010, 12355020010, 11065020010, and 12145020020 have a mortgage of 2.55 million but their auditor’s “market value” is only ~125K. Why would a bank give a railroad 2.55 million dollars for property only worth $125K? They wouldn’t.

If you look at the recorder’s info, this is a mortgage not just for Warren County but for the whole railroad line from Cincinnati up to Columbus. And fixtures, too; probably trains in this case.

Later note: This wasn’t in our calculations anyway, but it turns out utilities are taxed by the state, not the county.

So here we export the data, create a “verified” column, and re-import it.

#Exporting the data so a human can verify it
rio::export(Commercial_Properties, "Cuyahoga Commercial Properties.csv")
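The True_Value step further down compares against a Verified column (with "RED" marking outliers a human has rejected), which only exists once the reviewed file is re-imported. If you skip verification, one option is to add a placeholder column before exporting; the column name matches the later code, but the "UNREVIEWED" default is our assumption for illustration, not part of the original workflow (the toy data frame here stands in for the real table):

```r
# Stand-in for the real Commercial_Properties table (toy rows for illustration)
Commercial_Properties <- data.frame(mortgage_amount = c(6e6, 0))
# Placeholder so the later Verified != "RED" check has something to compare against;
# a human reviewer overwrites these values in the exported spreadsheet
if (!"Verified" %in% names(Commercial_Properties)) {
  Commercial_Properties$Verified <- "UNREVIEWED"
}
```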

Data Verification Part II

This gets exported, then a human looks at the biggest outliers to see if they are true, or if they are like the example above. Then the file is re-imported. You can skip the data verification part if this doesn’t apply.

#Uncomment this and add the name of the file with the amounts a human has verified.
#Commercial_Properties <- rio::import("Cuyahoga_Commerical_Properties_With_2016_to_2019_mortgages.csv") #Or whatever your new, verified filename is.

Calculating the new values

Now we want to take the mortgage amount as the market value, assuming we have verified it.

So this says: if this is a potential outlier, and we have verified that the outlier is real, then take the mortgage amount as the value. If not, take the market value.

Commercial_Properties$mortgage_amount <- as.numeric(Commercial_Properties$mortgage_amount)
## Warning: NAs introduced by coercion
#First we need to set the NAs in mortgage amount to zero or we get an error further down when we are calculating things from it
Commercial_Properties$mortgage_amount[is.na(Commercial_Properties$mortgage_amount)] <- 0

#In R, TRUE == "TRUE" evaluates to TRUE, so the == "TRUE" comparison below works whether this flag column holds logicals or the strings "TRUE"/"FALSE", and this type.convert step isn't needed.
#Commercial_Properties$Is_This_A_Potential_Outlier <- type.convert(Commercial_Properties$Is_This_A_Potential_Outlier, na.strings = "NA", logical)

Commercial_Properties <- Commercial_Properties %>% 
  mutate(Mortgage_Share=mortgage_amount*Share_of_Market_Value) %>% 
  mutate(True_Value=ifelse(Is_This_A_Potential_Outlier=="TRUE" & Verified!="RED", Mortgage_Share, market_value))


#This looks duplicative, and it is, but without re-running it after the re-import we introduce NAs by coercion
Commercial_Properties <- Commercial_Properties %>% 
  group_by(GranteeName, RecordedDate, mortgage_amount) %>%
  mutate(Twenty_Percent_More = Auditors_Total_Value_of_Parcels + .2*Auditors_Total_Value_of_Parcels) %>% 
  mutate(Is_This_A_Potential_Outlier = mortgage_amount > Twenty_Percent_More) %>% 
  ungroup()
Commercial_Properties <- Commercial_Properties %>% 
  mutate(Mortgage_Share=mortgage_amount*Share_of_Market_Value) %>% 
  mutate(True_Value=ifelse(Is_This_A_Potential_Outlier=="TRUE" & Verified!="RED", Mortgage_Share, market_value))

Recalculating potential taxes

How much COULD the county raise if we knew the value of LLC businesses?

What was the value of taxable assets before and after finding more value?

#First, any True_Values that are NA need to be set to zero before summing
scales::dollar(sum(Commercial_Properties$market_value))
## [1] "$24,995,537,400"
Commercial_Properties$True_Value[is.na(Commercial_Properties$True_Value)] <- 0
scales::dollar(sum(Commercial_Properties$True_Value))
## [1] "$33,998,119,121"

Calculating each district

What would the new tax rate be compared to the old tax rate?

First, we can’t average the rates, as shown by this very high-tech post-it (image: “Can’t average the rates”). Sorry the post-it is badly labeled, which I realized later. But essentially, on the top is the original calculation. There are two parcels worth 100 and 200 dollars. Both are in a district where the tax rate is .5. If we later determine that those parcels are actually worth 200 and 300, then the district is still going to collect 50 and 100 from them respectively. But the tax rate for each goes down.
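The post-it arithmetic, worked through with the toy numbers from that example:

```r
# Two parcels in one district, on the books at 100 and 200, taxed at a .5 rate
old_value <- c(100, 200)
collected <- old_value * .5            # 50 and 100: what the district actually collects
# The parcels turn out to be worth 200 and 300; collections stay fixed
new_value <- c(200, 300)
new_rate <- collected / new_value      # .25 and roughly .333: each parcel's rate drops
# Averaging the per-parcel rates gives the wrong district-wide rate...
mean(new_rate)                         # about .292
# ...the correct rate is total collected over total value
sum(collected) / sum(new_value)        # .3
```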

We calculated the proportion that each parcel of land represents, because sometimes businesses bought parcels that were in multiple tax districts. But to figure out the new tax rate, we will have to determine the amount per district, and then determine the new tax rate per district.

What’s the median property value in each tax district? What do owners pay now? How much would they pay if we knew the true value of LLC transfers?

Tax_Rate_Comparison <- Commercial_Properties %>% 
  group_by(TAXSET_DESCR) %>% 
  summarize(
    Total_Per_District = sum(Amount_Owed_to_County),
    New_Taxable_Value = sum(True_Value*.35),
    Old_Rate = mean(EFFECTIVE_RATE)*.001,
    New_Rate = Total_Per_District/New_Taxable_Value,
    Median_Property_Value = median(market_value),
    Current_Median_Tax_Bill = Median_Property_Value*Old_Rate,
    Possible_Median_Tax_Bill = Median_Property_Value*New_Rate,
    Amount_Overpaid_By_Median_Owner = Current_Median_Tax_Bill-Possible_Median_Tax_Bill)
datatable(Tax_Rate_Comparison)

This looks at the summary of what the median owner overpaid. It ranges from nothing to more than $30,000. And again, that’s just the median. We could find any property and figure out what it would pay.

summary(Tax_Rate_Comparison$Amount_Overpaid_By_Median_Owner)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     0.0    87.7  2927.8  2410.7 31736.0