Query the Discovery API
The Discovery API supports ad-hoc queries and integrations. If you are new to the API, refer to About the Discovery API for an introduction.
Use the Discovery API to evaluate data pipeline health and project state across runs or at a moment in time. dbt Labs provides a GraphQL explorer for this API, enabling you to run queries and browse the schema.
Since GraphQL describes the data in the API, the schema displayed in the GraphQL explorer accurately represents the graph and fields available to query.
Prerequisites
- dbt Cloud multi-tenant or single-tenant account
- You must be on a Team or Enterprise plan
- Your projects must be on dbt version 1.0 or later. Refer to Upgrade dbt version in Cloud to upgrade.
Authorization
Currently, authorization of requests takes place using a service token. dbt Cloud admin users can generate a Metadata Only service token that is authorized to execute a specific query against the Discovery API.
Once you've created a token, you can use it in the Authorization header of requests to the dbt Cloud Discovery API. Be sure to include the Token prefix in the Authorization header, or the request will fail with a 401 Unauthorized error. Note that Bearer can be used instead of Token in the Authorization header. Both syntaxes are equivalent.
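As a minimal sketch of the header format described above, the helper below builds the request headers for either prefix. The function name `build_headers` and the placeholder token are hypothetical, introduced only for illustration.

```python
def build_headers(token: str, scheme: str = "Bearer") -> dict:
    """Return HTTP headers for a Discovery API request.

    `scheme` is the prefix placed before the service token. The text above
    states that "Token" and "Bearer" are equivalent, so both are allowed.
    """
    if scheme not in ("Bearer", "Token"):
        raise ValueError("scheme must be 'Bearer' or 'Token'")
    if not token:
        raise ValueError("a service token is required")
    return {
        "Authorization": f"{scheme} {token}",
        "Content-Type": "application/json",
    }


# Hypothetical placeholder; substitute your own Metadata Only service token.
headers = build_headers("YOUR_TOKEN")
```

Omitting the prefix is the failure mode the text warns about, so the helper refuses to build a header without one.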
Access the Discovery API
- Create a service account token to authorize requests. dbt Cloud admin users can generate a Metadata Only service token, which is authorized to execute a specific query against the Discovery API.
- Find the API URL to use from the Discovery API endpoints table.
- For specific query points, refer to the schema documentation.
Run queries using HTTP requests
You can run queries by sending a POST request to the Discovery API, making sure to replace:
- YOUR_API_URL with the appropriate Discovery API endpoint for your region and plan.
- YOUR_TOKEN in the Authorization header with your actual API token. Be sure to include the Token (or Bearer) prefix.
- QUERY_BODY with a GraphQL query, for example { "query": "<query text>", "variables": "<variables in json>" }
- VARIABLES with a dictionary of your GraphQL query variables, such as a job ID or a filter.
- ENDPOINT with the endpoint you're querying, such as environment.
curl 'YOUR_API_URL' \
  -H 'authorization: Bearer YOUR_TOKEN' \
  -H 'content-type: application/json' \
  -X POST \
  --data QUERY_BODY
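Python example:
import requests

# In this example, QUERY_BODY is the GraphQL query text and VARIABLES is a dict of query variables.
response = requests.post(
    'YOUR_API_URL',
    headers={"authorization": "Bearer " + YOUR_TOKEN, "content-type": "application/json"},
    json={"query": QUERY_BODY, "variables": VARIABLES},
)
# ENDPOINT is the top-level field you queried, such as "environment".
metadata = response.json()['data'][ENDPOINT]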
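Building on the request above, the following sketch separates building the request from reading the response, and surfaces GraphQL errors instead of failing with a `KeyError`. It uses only the Python standard library. The function names are hypothetical, and the check relies on the standard GraphQL response shape, in which failures are reported in a top-level `errors` list.

```python
import json
import urllib.request


def build_request(api_url, token, query, variables):
    """Build (but do not send) a POST request carrying a GraphQL query."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_data(payload, endpoint):
    """Return payload['data'][endpoint], raising if GraphQL reported errors."""
    if payload.get("errors"):
        messages = "; ".join(
            error.get("message", "unknown error") for error in payload["errors"]
        )
        raise RuntimeError(f"GraphQL errors: {messages}")
    return payload["data"][endpoint]


def run_query(api_url, token, query, variables, endpoint):
    """Send the query and return the data under the requested endpoint."""
    request = build_request(api_url, token, query, variables)
    with urllib.request.urlopen(request) as response:
        return extract_data(json.load(response), endpoint)
```

A call such as `run_query('YOUR_API_URL', 'YOUR_TOKEN', query_text, {"environmentId": 123}, "environment")` would then return the `environment` object, where `123` stands in for your own environment ID.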
Every query will require an environment ID or job ID. You can get the ID from a dbt Cloud URL or using the Admin API.
There are several illustrative example queries on this page. For more examples, refer to Use cases and examples for the Discovery API.
Discovery API endpoints
The following are the endpoints for accessing the Discovery API. Use the one that's appropriate for your region and plan.
Deployment type | Discovery API URL |
---|---|
North America multi-tenant | https://metadata.cloud.getdbt.com/graphql |
EMEA multi-tenant | https://metadata.emea.dbt.com/graphql |
APAC multi-tenant | https://metadata.au.dbt.com/graphql |
Multi-cell | https://YOUR_ACCOUNT_PREFIX.metadata.REGION.dbt.com/graphql Replace YOUR_ACCOUNT_PREFIX with your specific account identifier and REGION with your location, for example us1. |
Single-tenant | https://metadata.YOUR_ACCESS_URL/graphql Replace YOUR_ACCESS_URL with the appropriate Access URL for your region and plan. |
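The table above can be captured in a small lookup, sketched below. The dictionary keys and function names are hypothetical. The multi-cell helper assumes REGION is a short cell name such as `us1`, because the URL pattern already ends in `.dbt.com`.

```python
# Fixed multi-tenant endpoints, copied from the table above.
MULTI_TENANT_URLS = {
    "north_america": "https://metadata.cloud.getdbt.com/graphql",
    "emea": "https://metadata.emea.dbt.com/graphql",
    "apac": "https://metadata.au.dbt.com/graphql",
}


def multi_cell_url(account_prefix: str, region: str) -> str:
    """Fill in the multi-cell pattern; `region` is assumed to be e.g. 'us1'."""
    return f"https://{account_prefix}.metadata.{region}.dbt.com/graphql"


def single_tenant_url(access_url: str) -> str:
    """Fill in the single-tenant pattern with the account's Access URL."""
    return f"https://metadata.{access_url}/graphql"
```

For example, `multi_cell_url("abc123", "us1")` yields `https://abc123.metadata.us1.dbt.com/graphql`, where `abc123` is a made-up account prefix.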
Reasonable use
Discovery (GraphQL) API usage is subject to request rate and response size limits to maintain the performance and stability of the metadata platform and prevent abuse.
Job-level endpoints are subject to query complexity limits. Nested nodes (like parents), code (like rawCode), and catalog columns are considered the most complex. Break overly complex queries into separate queries that include only the necessary fields. For most use cases, dbt Labs recommends using the environment endpoint instead to get the latest descriptive and result metadata for a dbt Cloud project.
Retention limits
You can use the Discovery API to query data from the previous three months. For example, if today were April 1st, you could query data back to January 1st.
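To make the retention window concrete, the sketch below computes the earliest date a query could reach, reproducing the April 1st to January 1st example. It assumes the window is counted in calendar months and that days missing from the target month are clamped to its last day. How the API enforces the boundary is not stated here, so treat the result as an estimate.

```python
import calendar
from datetime import date


def earliest_queryable_date(today: date, months: int = 3) -> date:
    """Step back `months` calendar months from `today`."""
    # Count months from year 0 so that crossing a year boundary is handled.
    month_index = today.year * 12 + (today.month - 1) - months
    year, month_zero_based = divmod(month_index, 12)
    month = month_zero_based + 1
    # Clamp the day, for example May 31 maps to the last day of February.
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(today.day, last_day))


print(earliest_queryable_date(date(2024, 4, 1)))  # prints 2024-01-01
```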
Run queries with the GraphQL explorer
You can run ad-hoc queries directly in the GraphQL API explorer and use the document explorer on the left-hand side to see all possible nodes and fields.
Refer to the Apollo explorer documentation for setup and authorization info.
- Access the GraphQL API explorer and select fields you want to query.
- Select Variables at the bottom of the explorer and replace any null fields with your unique values.
- Authenticate using Bearer auth with YOUR_TOKEN. Select Headers at the bottom of the explorer and select +New header.
- Select Authorization in the header key dropdown list and enter your Bearer auth token in the value field. Remember to include the Bearer (or Token) prefix. Your header should be in this format: {"Authorization": "Bearer <YOUR_TOKEN>"}.
- Run your query by clicking the blue query button in the top right of the Operation editor (to the right of the query). You should see a successful query response on the right side of the explorer.
Fragments
Use the ... on notation to query across lineage and retrieve results from specific node types.
query ($environmentId: BigInt!, $first: Int!) {
  environment(id: $environmentId) {
    applied {
      models(first: $first, filter: { uniqueIds: "MODEL.PROJECT.MODEL_NAME" }) {
        edges {
          node {
            name
            ancestors(types: [Model, Source, Seed, Snapshot]) {
              ... on ModelAppliedStateNestedNode {
                name
                resourceType
                materializedType
                executionInfo {
                  executeCompletedAt
                }
              }
              ... on SourceAppliedStateNestedNode {
                sourceName
                name
                resourceType
                freshness {
                  maxLoadedAt
                }
              }
              ... on SnapshotAppliedStateNestedNode {
                name
                resourceType
                executionInfo {
                  executeCompletedAt
                }
              }
              ... on SeedAppliedStateNestedNode {
                name
                resourceType
                executionInfo {
                  executeCompletedAt
                }
              }
            }
          }
        }
      }
    }
  }
}
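Because each fragment selects different fields, the ancestors in the response do not all share the same shape. The sketch below shows one way to read them in Python. The function name and the sample data are hypothetical. It assumes that only source ancestors carry a `freshness` key, since that field is requested only in the source fragment above.

```python
def ancestor_timestamps(ancestors):
    """Map each ancestor name to its most relevant timestamp.

    Sources report freshness.maxLoadedAt. Models, snapshots, and seeds
    report executionInfo.executeCompletedAt.
    """
    summary = {}
    for node in ancestors:
        if "freshness" in node:  # present only for the source fragment
            timestamp = (node.get("freshness") or {}).get("maxLoadedAt")
        else:
            timestamp = (node.get("executionInfo") or {}).get("executeCompletedAt")
        summary[node["name"]] = timestamp
    return summary


# Hypothetical sample shaped like the `ancestors` list in a response.
sample_ancestors = [
    {"name": "stg_orders", "executionInfo": {"executeCompletedAt": "2024-01-02T00:00:00Z"}},
    {"name": "raw_orders", "sourceName": "shop", "freshness": {"maxLoadedAt": "2024-01-01T12:00:00Z"}},
    {"name": "country_codes", "executionInfo": None},
]
print(ancestor_timestamps(sample_ancestors))
```

Nodes without execution information map to `None` rather than raising, which keeps the summary usable for resources that have not run yet.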
Pagination
Querying large datasets can impact performance on multiple functions in the API pipeline. Pagination eases the burden by returning smaller data sets one page at a time. This is useful for returning a particular portion of the dataset or the entire dataset piece-by-piece to enhance performance. dbt Cloud utilizes cursor-based pagination, which makes it easy to return pages of constantly changing data.
Use the PageInfo object to return information about the page. The available fields are:
- startCursor (string type): Corresponds to the first node in the edge.
- endCursor (string type): Corresponds to the last node in the edge.
- hasNextPage (boolean type): Whether or not there are more nodes after the returned results.
There are connection variables available when making the query:
- first (integer type): Returns the first n nodes for each page, up to 500.
- after (string type): Sets the cursor to retrieve nodes after. It's best practice to set the after variable with the object ID defined in the endCursor of the previous page.
For example, a query can return the first 500 models after the object ID specified in the variables. The PageInfo object returns the object ID where the cursor starts, where it ends, and whether there is a next page.
Below is a code example of the PageInfo object:
pageInfo {
  startCursor
  endCursor
  hasNextPage
}
totalCount # Total number of records across all pages
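The cursor loop described above can be sketched in Python as follows. `fetch_page` is a hypothetical callable that you would supply: it takes `first` and `after`, runs your query, and returns the connection object containing `edges` and `pageInfo`. No real API call is made in this example.

```python
from typing import Callable, Iterator, Optional


def iter_nodes(
    fetch_page: Callable[[int, Optional[str]], dict],
    page_size: int = 500,
) -> Iterator[dict]:
    """Yield every node, following endCursor until hasNextPage is false."""
    if not 1 <= page_size <= 500:
        raise ValueError("page_size must be between 1 and 500")
    after = None  # no cursor on the first request
    while True:
        connection = fetch_page(page_size, after)
        for edge in connection["edges"]:
            yield edge["node"]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            break
        # Resume from the last node of the page just read.
        after = page_info["endCursor"]


# Hypothetical two-page dataset standing in for API responses.
fake_pages = {
    None: {
        "edges": [{"node": {"name": "a"}}, {"node": {"name": "b"}}],
        "pageInfo": {"endCursor": "cursor-2", "hasNextPage": True},
    },
    "cursor-2": {
        "edges": [{"node": {"name": "c"}}],
        "pageInfo": {"endCursor": "cursor-3", "hasNextPage": False},
    },
}
names = [node["name"] for node in iter_nodes(lambda first, after: fake_pages[after])]
print(names)  # prints ['a', 'b', 'c']
```

Writing the loop as a generator lets callers stop early without fetching the remaining pages.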
Filters
Filtering helps to narrow down the results of an API query. If you want to query and return only models and tests that are failing or find models that are taking too long to run, you can fetch execution details such as executionTime, runElapsedTime, or status. This helps data teams monitor the performance of their models, identify bottlenecks, and optimize the overall data pipeline.
For example, you can filter for models that have succeeded on their lastRunStatus. The example below filters for models that have an error on their last run and tests that have failed:
query ModelsAndTests($environmentId: BigInt!, $first: Int!) {
  environment(id: $environmentId) {
    applied {
      models(first: $first, filter: { lastRunStatus: error }) {
        edges {
          node {
            name
            executionInfo {
              lastRunId
            }
          }
        }
      }
      tests(first: $first, filter: { status: "fail" }) {
        edges {
          node {
            name
            executionInfo {
              lastRunId
            }
          }
        }
      }
    }
  }
}
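Once the filtered query returns, the failing models and tests can be flattened into a single list for reporting. The sketch below assumes the response has the same shape as the query above. The function name and the sample data are hypothetical.

```python
def failing_resources(environment):
    """Return (kind, name, lastRunId) tuples from the `environment` object."""
    applied = environment["applied"]
    rows = []
    for kind in ("models", "tests"):
        for edge in applied[kind]["edges"]:
            node = edge["node"]
            run_id = (node.get("executionInfo") or {}).get("lastRunId")
            rows.append((kind, node["name"], run_id))
    return rows


# Hypothetical sample shaped like response["data"]["environment"].
sample_environment = {
    "applied": {
        "models": {"edges": [{"node": {"name": "orders", "executionInfo": {"lastRunId": 101}}}]},
        "tests": {"edges": [{"node": {"name": "not_null_orders_id", "executionInfo": {"lastRunId": 101}}}]},
    }
}
for kind, name, run_id in failing_resources(sample_environment):
    print(kind, name, run_id)
```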