This retrieves rows in chunks of page_size. It is most suitable for results of smaller queries (say, less than 100 MB). For larger queries, it is better to export the results to a CSV file stored on Google Cloud Storage and use the bq command line tool to download it locally.
Usage
bq_table_download(
  x,
  n_max = Inf,
  page_size = NULL,
  start_index = 0L,
  max_connections = 6L,
  quiet = NA,
  bigint = c("integer", "integer64", "numeric", "character"),
  max_results = deprecated()
)
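For example, a minimal sketch of a call (the public table used here is illustrative; authentication and a billing project for your own account are assumed to be set up):

# Download the first 1000 rows of a public table (illustrative names).
tb <- bq_table("bigquery-public-data", "usa_names", "usa_1910_current")
df <- bq_table_download(tb, n_max = 1000)
df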
Arguments
- x
A bq_table.
- n_max
Maximum number of results to retrieve. Use Inf to retrieve all rows.
- page_size
The number of rows requested per chunk. It is recommended to leave this unspecified until you have evidence that the page_size selected automatically by bq_table_download() is problematic. When page_size = NULL, bigrquery determines a conservative, natural chunk size empirically. If you specify page_size, it is important that each chunk fits on one page, i.e. that the requested row limit is low enough to prevent the API from paginating based on response size.
- start_index
Starting row index (zero-based).
- max_connections
Maximum number of simultaneous connections to BigQuery servers.
- quiet
If FALSE, displays a progress bar; if TRUE, is silent; if NA, picks based on whether or not you're in an interactive context.
- bigint
The R type that BigQuery's 64-bit integer types should be mapped to. The default is "integer", which returns R's integer type but results in NA for values above/below +/- 2147483647. "integer64" returns a bit64::integer64, which allows the full range of 64-bit integers (see the sketch after this list).
- max_results
[Deprecated] Please use n_max instead.
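A brief sketch of the bigint argument in practice (the table tb and its id column are hypothetical):

# Map BigQuery INT64 columns to bit64::integer64 so values beyond
# +/- 2147483647 survive the download intact (tb is hypothetical).
df <- bq_table_download(tb, bigint = "integer64")
class(df$id)
#> [1] "integer64"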
Value
Because data retrieval may generate list-columns and the data.frame print method can have problems with list-columns, this method returns a tibble. If you need a data.frame, coerce the results with as.data.frame().
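For instance:

# Coerce the returned tibble to a plain data.frame if needed.
df <- as.data.frame(bq_table_download(tb, n_max = 10))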
Complex data
bigrquery will retrieve nested and repeated columns into list-columns as follows:
Repeated values (arrays) will become a list-column of vectors.
Records will become list-columns of named lists.
Repeated records will become list-columns of data frames.
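A hedged sketch of what this looks like in practice (the table and its tags and address columns are hypothetical):

# Suppose tb has a repeated STRING column `tags` and a RECORD column `address`.
df <- bq_table_download(tb)
df$tags[[1]]     # character vector: repeated value -> list-column of vectors
df$address[[1]]  # named list: record -> list-column of named lists
tidyr::unnest_longer(df, tags)  # expand the repeated column into one row per tag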
Larger datasets
In my timings, this code takes around 1 minute per 100 MB of data. If you need to download considerably more than this, I recommend:
1. Export a .csv file to Cloud Storage using bq_table_save().
2. Use the gsutil command line utility to download it.
3. Read the csv file into R with readr::read_csv() or data.table::fread().
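A sketch of that workflow, assuming gsutil is installed and you can write to a Cloud Storage bucket (the bucket and paths are illustrative):

# 1. Export to Cloud Storage; the * lets BigQuery shard large exports.
bq_table_save(tb, "gs://my-bucket/result-*.csv")

# 2. Download locally (run in a shell, not in R):
#    gsutil cp 'gs://my-bucket/result-*.csv' ./data/

# 3. Read the downloaded file(s) back into R (recent readr accepts
#    a vector of paths).
files <- list.files("data", pattern = "\\.csv$", full.names = TRUE)
df <- readr::read_csv(files)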
Unfortunately you cannot export nested or repeated columns to CSV, and the formats that BigQuery does support for nested/repeated values (Avro and newline-delimited JSON) are not well supported in R.