Getting the required libraries

library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(htmltools)
library(glue)
## Warning: package 'glue' was built under R version 3.6.2
## 
## Attaching package: 'glue'
## The following object is masked from 'package:dplyr':
## 
##     collapse
library(robotstxt)
library(rvest)
## Loading required package: xml2
library(stringr)
library(purrr)
## Warning: package 'purrr' was built under R version 3.6.2
## 
## Attaching package: 'purrr'
## The following object is masked from 'package:rvest':
## 
##     pluck
library(httr)
## Warning: package 'httr' was built under R version 3.6.2
library(readr)
## 
## Attaching package: 'readr'
## The following object is masked from 'package:rvest':
## 
##     guess_encoding
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(DT)
library(muckrakr)
library(readxl)
library(XLConnect)
## XLConnect 1.0.1 by Mirai Solutions GmbH [aut],
##   Martin Studer [cre],
##   The Apache Software Foundation [ctb, cph] (Apache POI),
##   Graph Builder [ctb, cph] (Curvesapi Java library)
## http://www.mirai-solutions.com
## https://github.com/miraisolutions/xlconnect
library(rmiscutils)
## 
## Attaching package: 'rmiscutils'
## The following object is masked from 'package:dplyr':
## 
##     add_row

First, this is the final report we got from an Ohio Public Records Request from the Ohio Secretary of State’s Office. This is the last summary of absentee ballots sent from counties to the Ohio Secretary of state.

setwd("~/Code/Ballots2020/Responsive Records/Weekly Absentee Reports")

Absentee_Survey <- rio::import("3-17-20 -Absentee Survey 9.xlsx")

Some of these counties appear twice:

Absentee_Survey %>% 
  group_by(CountyName) %>% 
  summarize(total=n()) %>% 
  arrange(desc(total))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 88 x 2
##    CountyName total
##    <chr>      <int>
##  1 Adams          2
##  2 Highland       2
##  3 Jefferson      2
##  4 Montgomery     2
##  5 Perry          2
##  6 Putnam         2
##  7 Allen          1
##  8 Ashland        1
##  9 Ashtabula      1
## 10 Athens         1
## # … with 78 more rows

Per our conversation with the secretary of state’s office, that’s because their procedure is to directly edit the last sheet, add a new row, and delete the old one. So if any appear twice, it must be because the old one was not deleted. So we are going to take the newer versions of the duplicated counties, based on the timestamp.

Second update: Montgomery’s row with “submitted too early” actually has a timestamp that is later and has fewer votes than the other one so that must be a typo as well.

Absentee_Survey <- Absentee_Survey %>% 
  slice(-62, -36, -61,-77,-63,-72)


Absentee_Survey$Non_UOCAVA_Ballots_Requested_and_Cast_In_Person <- number_with_commas(Absentee_Survey$Non_UOCAVA_Ballots_Requested_and_Cast_In_Person)

Absentee_Survey$NonUOCAVA_Ballots_Returned_Submitted_for_Counting <- number_with_commas(Absentee_Survey$NonUOCAVA_Ballots_Returned_Submitted_for_Counting)

Now we want to total the number of ballot applications that were counted as “invalid” in total and by party to we can compare it to the total number of applications submitted.

UOCAVA is short for “The Uniformed and Overseas Citizens Absentee Voting Act.”

Here is the total invalid UOCAVA and nonUOCAVA applications:

Invalid_Total <- sum(Absentee_Survey$UOCAVA_Invalid_Applications) + sum(Absentee_Survey$NonUOCAVA_Invalid_Applications)

sum(Absentee_Survey$UOCAVA_Invalid_Applications) + sum(Absentee_Survey$NonUOCAVA_Invalid_Applications)
## [1] 72077

Here is the total valid UOCAVA and nonUOCAVA applications:

Valid_Total <- sum(Absentee_Survey$UOCAVA_Valid_Applications) + sum(Absentee_Survey$NonUOCAVA_Valid_Applications)

sum(Absentee_Survey$UOCAVA_Valid_Applications) + sum(Absentee_Survey$NonUOCAVA_Valid_Applications)
## [1] 1710004

What percentage was invalid?

Invalid_Total/(Invalid_Total+Valid_Total)
## [1] 0.04044541

How many total?

Valid_Total+Invalid_Total
## [1] 1782081

How many total ballots were transmitted?

sum(Absentee_Survey$UOCAVA_Ballots_Transmitted)+sum(Absentee_Survey$NonUOCAVA_Ballots_Transmitted)
## [1] 1705724

How many were returned and submitted for counting?

sum(Absentee_Survey$UOCAVA_Ballots_Returned_Submitted_for_Counting)+
  sum(Absentee_Survey$NonUOCAVA_Ballots_Returned_Submitted_for_Counting)+
  sum(Absentee_Survey$UOCAVA_Ballots_Requested_and_Cast_In_Person)+
  sum(Absentee_Survey$Non_UOCAVA_Ballots_Requested_and_Cast_In_Person)
## [1] 1459560

Okay next we are looking at Absentee Ballot Reports. We got these from a Public Records Request from the Secretary of State’s office. Each county emailed the reports to the SOS but not every day. These are mostly requested by parties.

We requested:

First, a caveat:

There are two labeled “78_04232020_163107.” One is in the April 24 folder, the other is in the April 23 folder. Both are the exact same. It’s not clear if we are missing a file or not.

56_0141620_162040 is in the April 17th folder and in the April 16th folder. Again they are the exact same. Again it’s not clear if we are missing a file or they only sent a report on the 16th and that got included in the 17th for some reason.

File 57_040320_090906 had too few columns. I added the country column.

Per the Secretary of State, each individual county will have to be consulted for data discrepancies.

I’m going to combine these all and look for complete duplicates versus just duplicate names and IDs. So we can sort through those just entered twice and those who had their ballot applications rejected and who applied again. But if we are missing files, and I think based on this naming pattern that we are missing a couple beyond the ones who just didn’t send reports everyday and there’s nothing I can do about that from a data standpoint. (We can request it again.)

First we are going to switch to the directory where we put all the absentee ballot reports. Then we combine them.

Note: A bunch of these contain some really old xls files that would not work with read_excel. I tried several other packages, such as the (XLConnect)[https://stackoverflow.com/questions/60609875/r-read-excel-libxls-error-unable-to-parse-file] one which worked for a while, but loading htem all overloaded the java memory. Increasing it did nothing. Long story short, we had to open a bunch of them and resave them as xlsx files.

setwd("~/Code/Ballots2020/Absentee Ballot Reports/Absentee Ballot Reports 4-3-2020 to 4-25-2020")



list_of_absentee_files_csv <- list.files(path = ".", recursive = TRUE,
                            pattern = "\\.csv$" , 
                            full.names = TRUE)

absentee_csv <- list_of_absentee_files_csv %>%
  purrr::set_names(nm = (basename(.) %>% tools::file_path_sans_ext())) %>%
  purrr::map_df(read_csv, 
                .id = "FileName", skip =1,   
                                                col_names = c("county_id", 
                                                              "state_voter_id",
                                                  "county_voter_id",
                                                  "first_name",
                                                  "middle_name",
                                                  "last_name",
                                                  "suffix",
                                                  "in_care_of_line",
                                                  "av_mail_address",
                                                  "av_mail_address_2",
                                                  "av_mail_city",
                                                  "av_mail_state",
                                                  "av_mail_zip",
                                                  "av_mail_country",
                                                  "av_mail_postal",
                                                  "av_ballot_type",
                                                  "av_app_recd_date",
                                                  "run_date"), 
                col_types=cols('av_mail_country'=col_character(),
                               'av_mail_address_2'=col_character(),
                               `av_mail_postal`=col_character(),
                               `county_id`=col_character(),
                               `county_voter_id`=col_character(),
                               `av_mail_zip`=col_character()))
## Warning: 1 parsing failure.
##  row             col           expected            actual                       file
## 1792 in_care_of_line 1/0/T/F/TRUE/FALSE HOUSTON HOUSE 507 './09_04062020_181849.csv'
## Warning: 2 parsing failures.
##  row             col           expected            actual                       file
## 1062 in_care_of_line 1/0/T/F/TRUE/FALSE SCOTT HALL        './09_04172020_171500.csv'
## 4271 in_care_of_line 1/0/T/F/TRUE/FALSE HOUSTON HOUSE 507 './09_04172020_171500.csv'
## Warning: 4 parsing failures.
##   row             col           expected actual                     file
## 34233 av_mail_address delimiter or quote      E './31_040320_185920.csv'
## 34233 av_mail_address delimiter or quote        './31_040320_185920.csv'
## 36383 av_mail_address delimiter or quote      K './31_040320_185920.csv'
## 36383 av_mail_address delimiter or quote        './31_040320_185920.csv'
## Warning: 4 parsing failures.
##  row col   expected     actual                            file
## 1013  -- 18 columns 19 columns './62_040320_160610_Ottawa.csv'
## 1546  -- 18 columns 19 columns './62_040320_160610_Ottawa.csv'
## 2438  -- 18 columns 19 columns './62_040320_160610_Ottawa.csv'
## 2780  -- 18 columns 22 columns './62_040320_160610_Ottawa.csv'
## Warning: 1 parsing failure.
## row col   expected    actual                     file
##   1  -- 18 columns 1 columns './72_040720_165248.csv'
## Warning: 1811 parsing failures.
## row col   expected     actual                     file
##   1  -- 18 columns 19 columns './88_040320_171842.csv'
##   2  -- 18 columns 19 columns './88_040320_171842.csv'
##   3  -- 18 columns 19 columns './88_040320_171842.csv'
##   4  -- 18 columns 19 columns './88_040320_171842.csv'
##   5  -- 18 columns 19 columns './88_040320_171842.csv'
## ... ... .......... .......... ........................
## See problems(...) for more details.
list_of_absentee_files_xlsx <- list.files(path = ".", recursive = TRUE,
                            pattern = "\\.xlsx$", 
                            full.names = TRUE)

absentee_excel <- list_of_absentee_files_xlsx %>%
  purrr::set_names(nm = (basename(.) %>% tools::file_path_sans_ext())) %>%
  purrr::map_df(read_excel, 
                .id = "FileName", skip =1,col_names = c("county_id", 
                                                              "state_voter_id",
                                                  "county_voter_id",
                                                  "first_name",
                                                  "middle_name",
                                                  "last_name",
                                                  "suffix",
                                                  "in_care_of_line",
                                                  "av_mail_address",
                                                  "av_mail_address_2",
                                                  "av_mail_city",
                                                  "av_mail_state",
                                                  "av_mail_zip",
                                                  "av_mail_country",
                                                  "av_mail_postal",
                                                  "av_ballot_type",
                                                  "av_app_recd_date",
                                                  "run_date"), 
                col_types=c("text"))

#It was just easier to import these as text and then change the format. 
absentee_excel$av_app_recd_date <- as.numeric(absentee_excel$av_app_recd_date)
absentee_excel$av_app_recd_date <- janitor::excel_numeric_to_date(absentee_excel$av_app_recd_date, date_system = "modern")
absentee_csv$av_app_recd_date <- as.Date(absentee_csv$av_app_recd_date, format = "%m/%d/%Y")
absentee_SOS <- rbind(absentee_excel, absentee_csv)

absentee_SOS$run_date <- as.Date(absentee_SOS$run_date, format = "%m/%d/%Y")

How many votes are in each file?

number_of_records_in_each_file <- absentee_SOS %>% 
  group_by(FileName) %>% 
  summarize(Total=n())
## `summarise()` ungrouping output (override with `.groups` argument)
datatable(number_of_records_in_each_file)