
BigQuery

Cloudbox emulates the BigQuery REST API (v2) using DuckDB as the backend. Standard SQL queries, streaming inserts, and dataset/table management all work through the google-cloud-bigquery Python SDK without modification.

Connection

Port: 9050 (override with CLOUDBOX_BIGQUERY_PORT)

from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery

client = bigquery.Client(
    project="local-project",
    credentials=AnonymousCredentials(),  # no real GCP auth required
    client_options={"api_endpoint": "http://localhost:9050"},  # point the SDK at the emulator
)

Datasets

Create dataset

POST /bigquery/v2/projects/{project}/datasets
{
  "datasetReference": { "datasetId": "my_dataset" },
  "location": "US",
  "labels": { "env": "local" }
}

Creates the dataset and the corresponding DuckDB schema. Returns the dataset resource. 409 if it already exists.
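Through the Python SDK the same call looks like this (a minimal sketch, reusing the client from the Connection section):

# Create the dataset; raises google.api_core.exceptions.Conflict on 409.
dataset = bigquery.Dataset("local-project.my_dataset")
dataset.location = "US"
dataset.labels = {"env": "local"}
dataset = client.create_dataset(dataset)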

Get dataset

GET /bigquery/v2/projects/{project}/datasets/{dataset_id}

Returns the dataset resource. 404 if not found.

List datasets

GET /bigquery/v2/projects/{project}/datasets

Returns { "kind": "bigquery#datasetList", "datasets": [...] }.

Delete dataset

DELETE /bigquery/v2/projects/{project}/datasets/{dataset_id}

Drops the DuckDB schema and all its tables. 204 on success.
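Listing and deleting map to the SDK as follows (a sketch, reusing the client above):

# Enumerate datasets, then drop one together with its tables.
for ds in client.list_datasets():
    print(ds.dataset_id)

client.delete_dataset("my_dataset", delete_contents=True)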


Tables

Create table

POST /bigquery/v2/projects/{project}/datasets/{dataset_id}/tables
{
  "tableReference": { "tableId": "my_table" },
  "schema": {
    "fields": [
      { "name": "id",    "type": "INTEGER", "mode": "REQUIRED" },
      { "name": "name",  "type": "STRING",  "mode": "NULLABLE" },
      { "name": "score", "type": "FLOAT",   "mode": "NULLABLE" }
    ]
  }
}

Creates the DuckDB table. Supported field types: STRING, INTEGER, INT64, FLOAT, FLOAT64, BOOLEAN, BOOL, BYTES, DATE, TIME, DATETIME, TIMESTAMP, NUMERIC, BIGNUMERIC, JSON, RECORD (nested). Returns the table resource.
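The equivalent SDK call (a sketch, reusing the client from the Connection section):

# Define the schema and create the table.
schema = [
    bigquery.SchemaField("id", "INTEGER", mode="REQUIRED"),
    bigquery.SchemaField("name", "STRING", mode="NULLABLE"),
    bigquery.SchemaField("score", "FLOAT", mode="NULLABLE"),
]
table = client.create_table(
    bigquery.Table("local-project.my_dataset.my_table", schema=schema)
)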

Get table

GET /bigquery/v2/projects/{project}/datasets/{dataset_id}/tables/{table_id}

Returns the table resource including schema. 404 if not found.

List tables

GET /bigquery/v2/projects/{project}/datasets/{dataset_id}/tables

Returns { "kind": "bigquery#tableList", "tables": [...] }.

Update / patch table

PATCH /bigquery/v2/projects/{project}/datasets/{dataset_id}/tables/{table_id}
PUT   /bigquery/v2/projects/{project}/datasets/{dataset_id}/tables/{table_id}

Updates table metadata (description, labels). Schema updates (adding/removing columns) are also accepted and applied to the underlying DuckDB table.
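For example, adding a column through the SDK, which issues a PATCH for the listed fields (sketch):

# Fetch, modify, and patch the table metadata and schema.
table = client.get_table("local-project.my_dataset.my_table")
table.description = "local test table"
table.schema = list(table.schema) + [bigquery.SchemaField("extra", "STRING")]
table = client.update_table(table, ["description", "schema"])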

Delete table

DELETE /bigquery/v2/projects/{project}/datasets/{dataset_id}/tables/{table_id}

Drops the DuckDB table. 204 on success.


Running queries

Asynchronous job

POST /bigquery/v2/projects/{project}/jobs
{
  "configuration": {
    "query": {
      "query": "SELECT id, name FROM `local-project.my_dataset.my_table` WHERE score > 0.5",
      "useLegacySql": false,
      "queryParameters": [],
      "parameterMode": "NONE"
    }
  },
  "jobReference": { "jobId": "job-abc123" }
}

All jobs complete synchronously — there is no async execution. The response immediately contains the results. jobId is optional (auto-generated if omitted).
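With the SDK, client.query() posts the job and result() returns without polling, since the work is already done (sketch):

job = client.query(
    "SELECT id, name FROM `local-project.my_dataset.my_table` WHERE score > 0.5"
)
for row in job.result():  # job is already DONE; no waiting
    print(row["id"], row["name"])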

Get job results

GET /bigquery/v2/projects/{project}/queries/{job_id}?maxResults=1000&pageToken=

Returns the query results for a previously submitted job:

{
  "kind": "bigquery#queryResponse",
  "jobComplete": true,
  "schema": { "fields": [...] },
  "rows": [{ "f": [{ "v": "1" }, { "v": "Alice" }] }],
  "totalRows": "1"
}

Synchronous query

POST /bigquery/v2/projects/{project}/queries
{
  "query": "SELECT COUNT(*) FROM `local-project.my_dataset.my_table`",
  "useLegacySql": false
}

Executes the query and returns results in a single response — no separate job polling needed.
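Newer SDK versions expose this path through query_and_wait (a sketch; assumes google-cloud-bigquery >= 3.11):

rows = client.query_and_wait(
    "SELECT COUNT(*) AS n FROM `local-project.my_dataset.my_table`"
)
print(next(iter(rows))["n"])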

Get job status

GET /bigquery/v2/projects/{project}/jobs/{job_id}

Returns the job resource. Since all jobs are synchronous, status.state is always DONE.

Cancel job

POST /bigquery/v2/projects/{project}/jobs/{job_id}/cancel

No-op — returns the job resource unchanged (jobs are already complete).


Streaming inserts

POST /bigquery/v2/projects/{project}/datasets/{dataset_id}/tables/{table_id}/insertAll
{
  "rows": [
    { "insertId": "row-1", "json": { "id": 1, "name": "Alice", "score": 0.9 } },
    { "insertId": "row-2", "json": { "id": 2, "name": "Bob",   "score": 0.7 } }
  ]
}

Inserts rows directly into the DuckDB table. insertId is optional. Returns:

{ "kind": "bigquery#tableDataInsertAllResponse", "insertErrors": [] }

Reading table rows

GET /bigquery/v2/projects/{project}/datasets/{dataset_id}/tables/{table_id}/data
    ?maxResults=1000&pageToken=

Lists rows from a table without running a query job. Returns:

{
  "kind": "bigquery#tableDataList",
  "totalRows": "100",
  "schema": { "fields": [...] },
  "rows": [{ "f": [...] }],
  "pageToken": "1000"
}
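The SDK equivalent (a sketch; the row iterator follows pageToken automatically):

for row in client.list_rows("local-project.my_dataset.my_table", max_results=1000):
    print(dict(row))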

SQL dialect and identifier syntax

The emulator uses DuckDB's SQL dialect. BigQuery identifier syntax is rewritten automatically at query time:

BigQuery syntax           DuckDB equivalent
`project.dataset.table`   "project"."dataset"."table"
`dataset.table`           "dataset"."table"
`table`                   "table"

Legacy SQL (useLegacySql: true) is accepted but not translated: the query text is passed to DuckDB's standard SQL dialect as-is.

Parameterized queries

Named and positional parameters are supported:

{
  "query": "SELECT * FROM `my_dataset.my_table` WHERE name = @username",
  "parameterMode": "NAMED",
  "queryParameters": [
    {
      "name": "username",
      "parameterType": { "type": "STRING" },
      "parameterValue": { "value": "Alice" }
    }
  ]
}

Supported parameter types: STRING, INT64, FLOAT64, BOOL, DATE, DATETIME, TIMESTAMP.
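In the SDK, parameters are passed through QueryJobConfig (sketch):

job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("username", "STRING", "Alice"),
    ]
)
rows = client.query(
    "SELECT * FROM `my_dataset.my_table` WHERE name = @username",
    job_config=job_config,
).result()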


DML statements

INSERT, UPDATE, DELETE, and MERGE-style operations are supported via the standard query endpoints. After execution, numDmlAffectedRows is reported in the job status:

{
  "statistics": {
    "query": { "numDmlAffectedRows": "5" }
  }
}
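The SDK surfaces the same statistic as num_dml_affected_rows (sketch):

job = client.query("DELETE FROM `my_dataset.my_table` WHERE score < 0.1")
job.result()
print(job.num_dml_affected_rows)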

Multi-statement scripts

Multiple SQL statements separated by ; are supported. Statements are executed sequentially in a single DuckDB connection. The job result contains:

  • Rows and schema from the last SELECT statement (if any).
  • The sum of numDmlAffectedRows across all DML statements when no SELECT is present.

For example:
CREATE TABLE my_dataset.events (id INTEGER, name STRING);
INSERT INTO my_dataset.events VALUES (1, 'click'), (2, 'view');
SELECT id, name FROM my_dataset.events ORDER BY id
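
Run through the SDK, the result iterator yields the rows of the final SELECT (sketch):

script = """
CREATE TABLE my_dataset.events (id INTEGER, name STRING);
INSERT INTO my_dataset.events VALUES (1, 'click'), (2, 'view');
SELECT id, name FROM my_dataset.events ORDER BY id
"""
for row in client.query(script).result():
    print(row["id"], row["name"])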

Limitations: DECLARE/SET variable statements and BEGIN…END blocks are not supported.


Known limitations

Feature                                  Notes
Partitioned tables                       Partition metadata is accepted, but query pruning is not implemented
Clustered tables                         Clustering config is accepted but not enforced
External tables                          Not supported
BigQuery ML (CREATE MODEL, ML.PREDICT)   Not implemented
jobs.insert with load/extract config     Only the query configuration is supported
Materialized views                       Not implemented
Dataset / table IAM                      Not enforced

Examples

uv run python examples/bigquery/query.py
uv run python examples/bigquery/streaming_insert.py