This collect method is specialised for BigQuery tables: it generates the SQL from your dplyr commands, calls bq_project_query() or bq_dataset_query() to run the query, then calls bq_table_download() to download the results. The arguments are therefore a combination of the arguments to dplyr::collect(), bq_project_query()/bq_dataset_query(), and bq_table_download().
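For example, a minimal sketch of the workflow. The project, dataset, table, and column names below are placeholders, not real resources; substitute your own.

library(dplyr)
library(bigrquery)

# Connect to BigQuery via DBI; "my-project" and "my_dataset" are
# hypothetical identifiers.
con <- DBI::dbConnect(
  bigquery(),
  project = "my-project",
  dataset = "my_dataset",
  billing = "my-project"
)

# Build a lazy query with dplyr; collect() then generates the SQL,
# runs it on BigQuery, and downloads the result as a tibble.
result <- tbl(con, "my_table") %>%
  filter(year >= 2020) %>%
  count(year) %>%
  collect()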
Usage
collect.tbl_BigQueryConnection(
x,
...,
n = Inf,
api = c("json", "arrow"),
page_size = NULL,
max_connections = 6L
)
Arguments
- x
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
- ...
Other arguments passed on to bq_project_query()/bq_dataset_query()
- n
Maximum number of results to retrieve. The default, Inf, will retrieve all rows.
- api
Which API to use? The "json" API works wherever bigrquery does, but is slow and can require fiddling with the page_size parameter. The "arrow" API is faster and more reliable, but only works if you have also installed the bigrquerystorage package.
Because the "arrow" API is so much faster, it will be used automatically if the bigrquerystorage package is installed.
- page_size
(JSON only) The number of rows requested per chunk. It is recommended to leave this unspecified until you have evidence that the page_size selected automatically by bq_table_download() is problematic.
When page_size = NULL bigrquery determines a conservative, natural chunk size empirically. If you specify the page_size, it is important that each chunk fits on one page, i.e. that the requested row limit is low enough to prevent the API from paginating based on response size.
- max_connections
(JSON only) Maximum number of simultaneous connections to BigQuery servers.
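As an illustration of how these arguments combine, here is a sketch that forces the JSON API and limits the download. It assumes the connection and table from the example above; score is a hypothetical column.

# Retrieve at most 1000 rows via the JSON API, with smaller chunks and
# fewer simultaneous connections than the defaults.
sample_rows <- tbl(con, "my_table") %>%
  filter(score > 0.5) %>%
  collect(
    n = 1000,
    api = "json",
    page_size = 500,
    max_connections = 2L
  )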