Title: | Gather Data from GDOT's GeoPI Database |
---|---|
Description: | rGeoPI allows the user to easily interface with GDOT's GeoPI repository. This includes project information, phase years and costs, project geometry, and document information, returned in data frames. |
Authors: | Freyja Brandel-Tanis [aut, cre], Modern Mobility Partners, LLC [cph] |
Maintainer: | Freyja Brandel-Tanis <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2025-01-17 05:08:49 UTC |
Source: | https://github.com/ModernMobilityPartners/rgeopi |
Null-coalescing operator. See purrr for details.
lhs %otherwise% rhs
lhs |
left hand side |
rhs |
right hand side |
## Not run:
a %otherwise% b
## End(Not run)
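As a rough illustration of what a null-coalescing operator does, the sketch below defines a stand-in named `%or_else%` (a hypothetical name, not part of rGeoPI) that returns the left hand side unless it is NULL. It assumes `%otherwise%` follows the same convention; the package's actual implementation may differ.

```r
# Hypothetical stand-in for a null-coalescing operator (not the rGeoPI source).
`%or_else%` <- function(lhs, rhs) {
  # Fall back to rhs only when lhs is NULL
  if (is.null(lhs)) rhs else lhs
}

a <- NULL
b <- "fallback"

first <- a %or_else% b         # lhs is NULL, so rhs is used
second <- "value" %or_else% b  # lhs is not NULL, so it is kept
```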
GDOT PIs are seven-character IDs in one of three formats: 1) seven digits, with leading zeros as needed, 2) six digits followed by a "-", or 3) six digits with a leading M or S. This function checks the format so that a GeoPI call can be skipped when the project ID is not possible for the system.
check_pi(gdot_pi)
gdot_pi |
GDOT PI. A numeric ID shorter than seven digits will be padded. |
TRUE/FALSE
check_pi("0012345")
check_pi("546540-")
check_pi("M023424")
check_pi("FAKE523")
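The three formats described above can be expressed as a single regular expression. The following is a minimal sketch, not the package's own code: `is_plausible_pi` is a hypothetical helper, and padding short numeric IDs with leading zeros is an assumption drawn from the argument description.

```r
# Hypothetical format check mirroring the three PI formats described above.
is_plausible_pi <- function(x) {
  x <- as.character(x)
  # Assumption: all-digit IDs shorter than seven characters get leading zeros
  short_numeric <- grepl("^[0-9]{1,6}$", x)
  x[short_numeric] <- paste0(
    strrep("0", 7 - nchar(x[short_numeric])),
    x[short_numeric]
  )
  # Seven digits, six digits plus "-", or a leading M/S plus six digits
  grepl("^([0-9]{7}|[0-9]{6}-|[MS][0-9]{6})$", x)
}

checks <- is_plausible_pi(c("0012345", "546540-", "M023424", "FAKE523", "820"))
```

Because the helper is vectorised, it can screen a whole column of IDs before any request is made.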
Function for checking robots.txt file
check_rtxt(url, delay, user_agent, force, verbose)
url |
web address for download |
delay |
default delay |
user_agent |
user agent string |
force |
force re-downloading of robots.txt |
verbose |
logical |
'geopi_session' masks 'polite::bow', creating a polite session with GeoPI. This can be plugged into any of the 'get_geopi' family of functions.
geopi_session(...)
... |
arguments passed on to 'polite::bow' |
object of class 'polite', 'session'
session <- geopi_session()
session
Get the available GeoPI data for a project. Adds the field 'Gather.Date' with the current date to document the date of extraction.
The function 'get_geopi_ef' is a more efficient version of 'get_geopi' that avoids pinging the GeoPI website more often than necessary. It is a more recent development and may eventually replace 'get_geopi'.
get_geopi(
  gdot_pi,
  session = NULL,
  features = c("overview", "phases", "documents"),
  doc_mode = c("cr_only", "cr_check", "doc_summary"),
  geometry = FALSE,
  pi_check = TRUE,
  gather_date = NULL
)
get_geopi_ef(
  gdot_pi,
  session = NULL,
  features = c("overview", "phases", "documents"),
  doc_mode = c("cr_only", "cr_check", "doc_summary"),
  geometry = FALSE,
  pi_check = TRUE,
  gather_date = NULL
)
gdot_pi |
GDOT Project ID. Can take a list or vector of IDs. |
session |
If NULL (the default), creates a new session. Can provide a session made with 'polite::bow'. In line with polite/ethical scraping, you can use an existing session or the function will create one for you. |
features |
Project features desired to retrieve. Can choose between "overview" (project name, description, etc.), "phases" (phases, their years, and money allocated), and "documents" (information about what documents GeoPI has for the project). |
doc_mode |
If "documents" is chosen, the doc_mode conveys what information to retrieve. Options are "cr_only" (description of all files under "approved concept reports"), "cr_check" (simple TRUE/FALSE for if the project has approved concept reports), and "doc_summary" (the name, file path, and type of all project documents). |
geometry |
If FALSE (the default), do not return spatial data. If TRUE, uses the 'get_geopi_sf' function to add an sf tibble named "geometry" to the output list. |
pi_check |
Check if the PI is a valid format. Defaults to TRUE. |
gather_date |
Date on which the information is gathered from GeoPI. Defaults to today. |
rgeopi can only access the first 50 phase records. Having more than 50 phases for a single project is rare.
a list of tibbles
## Not run:
get_geopi(gdot_pi = "0000820", doc_mode = "cr_check")
## End(Not run)
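A session can be created once and passed to 'get_geopi' together with a subset of features. The sketch below uses only the arguments documented above. It assumes the installed package is named `rgeopi` (taken from the source URL) and only contacts GeoPI in an interactive session where that package is available.

```r
# Project ID taken from the documented example; features limited to two tables
pis <- "0000820"
wanted <- c("overview", "phases")

# Assumption: the package namespace is "rgeopi"; skipped when not installed
if (interactive() && requireNamespace("rgeopi", quietly = TRUE)) {
  session <- rgeopi::geopi_session()  # one polite session for the request
  result <- rgeopi::get_geopi(
    gdot_pi = pis,
    session = session,
    features = wanted
  )
  str(result, max.level = 1)          # inspect the returned list of tibbles
}
```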
Check for Concept Reports
get_geopi_docs(
  gdot_pi,
  session = NULL,
  mode = c("cr_only", "cr_check", "doc_summary"),
  gather_date = NULL,
  pi_check = TRUE
)
get_geopi_docs_ef(
  gdot_pi,
  page_scrape,
  dcmode = c("cr_only", "cr_check", "doc_summary"),
  gather_date = NULL
)
gdot_pi |
GDOT Project ID. Can take a list or vector of IDs. |
session |
If NULL (the default), creates a new session. Can provide a session made with 'polite::bow'. |
mode |
If "cr_only" (the default), returns the file paths and file names of all approved concept reports. If "cr_check", returns only TRUE/FALSE for each project, indicating whether it has a concept report. If "doc_summary", returns the file name, type, and path for all project documents. |
gather_date |
Date on which the information is gathered from GeoPI. Defaults to today. |
pi_check |
Check if the PI is a valid format. Defaults to TRUE. |
page_scrape |
Scraped GeoPI Page supplied by 'get_geopi_ef'. Only present in 'get_geopi_docs_ef'. |
dcmode |
If "cr_only" (the default), returns the file paths and file names of all approved concept reports. If "cr_check", returns only TRUE/FALSE for each project, indicating whether it has a concept report. If "doc_summary", returns the file name, type, and path for all project documents. |
a tibble
## Not run:
get_geopi_docs(gdot_pi = "0000820", mode = "cr_check")
## End(Not run)
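A vector default such as `mode = c("cr_only", "cr_check", "doc_summary")` is commonly resolved with base R's `match.arg()`, which selects the first element when the argument is omitted. The sketch below shows that convention with a hypothetical `pick_mode` helper; whether the package resolves `mode` this way is an assumption.

```r
# Hypothetical helper showing how a vector default resolves via match.arg()
pick_mode <- function(mode = c("cr_only", "cr_check", "doc_summary")) {
  match.arg(mode)
}

default_mode <- pick_mode()                # no argument: first choice is used
explicit_mode <- pick_mode("doc_summary")  # explicit choice is kept
invalid <- try(pick_mode("everything"), silent = TRUE)  # unknown value errors
```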
Get Project Overview
get_geopi_overview(
  gdot_pi,
  session = NULL,
  gather_date = NULL,
  pi_check = TRUE
)
get_geopi_overview_ef(gdot_pi, page_scrape, gather_date = NULL)
gdot_pi |
GDOT Project ID. Can take a list or vector of IDs. |
session |
If NULL (the default), creates a new session. Can provide a session made with 'polite::bow'. |
gather_date |
Date on which the information is gathered from GeoPI. Defaults to today. |
pi_check |
Check if the PI is a valid format. Defaults to TRUE. |
page_scrape |
Scraped GeoPI Page supplied by 'get_geopi_ef'. Only present in 'get_geopi_overview_ef'. |
a tibble
## Not run:
get_geopi_overview(gdot_pi = "0000820")
## End(Not run)
Get the Phase ID, Programmed Year, Date of Last Estimate, and Cost Estimate for a Project ID.
get_geopi_phase(gdot_pi, session = NULL, gather_date = NULL, pi_check = TRUE)
get_geopi_phase_ef(gdot_pi, page_scrape, gather_date = NULL)
gdot_pi |
GDOT Project ID. Can take a list or vector of IDs. |
session |
If NULL (the default), creates a new session. Can provide a session made with 'polite::bow'. |
gather_date |
Date on which the information is gathered from GeoPI. Defaults to today. |
pi_check |
Check if the PI is a valid format. Defaults to TRUE. |
page_scrape |
Scraped GeoPI Page supplied by 'get_geopi_ef'. Only present in 'get_geopi_phase_ef'. |
rgeopi can only access the first 50 phase records. Having more than 50 phases for a single project is rare. If the project has more than 50 records, the column "Surplus.Records" is added with the number of total records and pages.
a tibble
## Not run:
get_geopi_phase(gdot_pi = "0000820")
## End(Not run)
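Since "Surplus.Records" is only added when a project exceeds 50 phase records, its presence can serve as a flag for truncated results. The helper below is a hypothetical sketch, and the two data frames are made-up stand-ins (with a placeholder column) rather than real GeoPI output.

```r
# Hypothetical helper: TRUE when the phase table carries "Surplus.Records"
has_surplus_records <- function(phases) {
  "Surplus.Records" %in% names(phases)
}

# Made-up stand-ins for a phase tibble; not real GeoPI data
mock_complete <- data.frame(Placeholder = 1:3)
mock_truncated <- data.frame(Placeholder = 1:3, Surplus.Records = "placeholder")

complete_flag <- has_surplus_records(mock_complete)    # column absent
truncated_flag <- has_surplus_records(mock_truncated)  # column present
```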
Get the spatial geometry of a GDOT project. 'get_geopi_sf' uses the httr2 package.
get_geopi_sf(gdot_pi, pi_check = TRUE)
gdot_pi |
GDOT Project ID. Can be a single PI or a vector or list of PIs. If more than one PI is provided, each project will get its own row. |
pi_check |
Check if the PI is a valid format. Defaults to TRUE. |
An sf tibble
## Not run:
get_geopi_sf(gdot_pi = "0000820")
## End(Not run)
Guess filename for download from url
guess_basename(x)
x |
url to guess filename from |
guessed file name
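One simple way to guess a file name from a URL is to drop any query string or fragment and take the last path component. The sketch below uses a hypothetical `guess_from_url` helper and a placeholder URL; it is not the implementation of 'guess_basename', which may handle more cases.

```r
# Hypothetical sketch: derive a file name from the last path component of a URL
guess_from_url <- function(url) {
  path_only <- sub("[?#].*$", "", url)  # strip query string and fragment
  basename(path_only)                   # keep the final path component
}

# Placeholder URL for illustration only
example_name <- guess_from_url("https://example.com/files/report.pdf?download=1")
```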
Polite download
polite_download_file(
  url,
  destfile = guess_basename(url),
  ...,
  quiet = !verbose,
  mode = "wb",
  path = "downloads/",
  user_agent = paste0("polite ", getOption("HTTPUserAgent")),
  delay = 5,
  force = FALSE,
  overwrite = FALSE,
  verbose = FALSE
)
url |
web address for the file to be downloaded |
destfile |
name of destination file |
... |
additional arguments passed to 'download.file' |
quiet |
default value is inverse of 'verbose' |
mode |
download mode. Default value is "wb" |
path |
path to save. Default path 'downloads/' |
user_agent |
default value 'paste0("polite ", getOption("HTTPUserAgent"))' |
delay |
delay between requests. Default value is 5 |
force |
force re-download of robots.txt |
overwrite |
overwrite downloaded file. Default value FALSE |
verbose |
default value is FALSE |
An (invisible) integer code, 0 for success and non-zero for failure.
Function to get robots.txt in structured form. Memoised.
polite_fetch_rtxt(..., user_agent, delay, verbose)
... |
arguments passed to 'robotstxt::robotstxt()' |
user_agent |
user agent string |
delay |
default delay |
verbose |
logical |
Function that actually fetches the response from the web.
polite_read_html(
  url,
  ...,
  delay = 5,
  user_agent = paste0("polite ", getOption("HTTPUserAgent"), "bot"),
  force = FALSE,
  verbose = FALSE
)
url |
web address for scraping |
... |
arguments passed to 'httr::GET()' |
delay |
scraping delay. Default 5 sec |
user_agent |
user agent string. Default value 'paste0("polite ", getOption("HTTPUserAgent"), "bot")' |
force |
force re-download of robots.txt |
verbose |
default FALSE |
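The default user agent strings in the usage above are built with `paste0()`, so "bot" is appended directly to R's own agent string with no separator. The short sketch below evaluates both documented defaults in the current session; the variable names are illustrative only.

```r
# R's own HTTP user agent string for this session (may be NULL if unset)
base_agent <- getOption("HTTPUserAgent")

# Default shown in the usage of polite_read_html
html_agent <- paste0("polite ", base_agent, "bot")

# Default shown in the usage of polite_download_file
download_agent <- paste0("polite ", base_agent)

print(html_agent)
```

Passing a custom `user_agent` string replaces these defaults entirely.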