Skip to main content

Querying and Designing the API

Designing the API >

When we run our seedshop.sqrl script, DataSQRL compiles a data pipeline that exposes an API to access the resulting data. We queried the API via GraphiQL in the browser by opening http://localhost:8888//graphiql/. Let's look at those queries in more detail.

info

We will be accessing the generated GraphQL API. If you are new to the GraphQL API standard, take a quick look at the documentation for querying GraphQL. We are working on REST support.

Querying the API

In the first chapter we retrieved the purchase history and spending analysis of the user with id=10 by running the following query.

{
Users (id: 10) {
purchases {
id
totals {
price
saving
}
}
spending {
week
spend
saved
}
}}

At the root of this query, we are querying the Users table for rows that match the given id. We then navigate along the purchases and spending relationships to query for the related rows from the Orders and nested Users.spending tables, respectively.

Defining relationships explicitly in SQRL allows us to query related data easily in the API. We can also use filters in relationships to select the related rows we want to retrieve.

{
Products (category: "acorns") {
name
weight_in_gram
category
volume_10day(country: "US") {
quantity
spend
weight
}
}}

This query retrieves all products from the "acorns" category. Acorns are a favorite among squirrels. We then navigate to the nested volume_10day relationship to the nested table that aggregates orders of that product over the last 10 days and groups them by country. We filter those rows to only retrieve the aggregates for the US.

Relationships allow us to construct complex queries which return all the data we need in a single request. We don't have to stitch our desired result set together by querying multiple tables. That saves you a ton of time and is also a lot faster.

Customizing the API Specification

By default, DataSQRL generates an API specification that exposes query endpoints for all tables defined in the SQRL script and makes all fields in those tables accessible, including relationships to navigate to related tables. In addition, DataSQRL generates field filters for all queries and relationships that give the user of the API the option to filter out rows.

The easiest way to customize the GraphQL API is to start with the default GraphQL schema generated by DataSQRL with the command:

docker run --rm -v $PWD:/build datasqrl/cmd compile seedshop.sqrl -a graphql

First, let's take a look at the API specification that DataSQRL generated in the file schema.graphqls.

type NumOrders {
count: Int!
}

type Products {
id: Int!
name: String!
sizing: String!
weight_in_gram: Int!
type: String!
category: String!
usda_id: Int!
updated: String!
ordered_items(productid: Int, quantity: Int, unit_price: Float, discount: Float, total: Float): [items!]
volume_10day(country: String, quantity: Int, spend: Float, weight: Int): [volume_10day!]
}

type Query {
NumOrders(count: Int): [NumOrders!]
Orders(id: Int, customerid: Int, time: String): [orders!]
Products(id: Int, name: String, sizing: String, weight_in_gram: Int, type: String, category: String, usda_id: Int, updated: String): [Products!]
Users(id: Int, first_name: String, last_name: String, email: String, ip_address: String, country: String, changed_on: Int, timestamp: String): [Users!]
}

type Users {
id: Int!
first_name: String!
last_name: String!
email: String!
ip_address: String
country: String
changed_on: Int!
timestamp: String!
purchases(id: Int, customerid: Int, time: String): [orders!]
spending(week: String, spend: Float, saved: Float): [spending!]
past_purchases(productid: Int, num_orders: Int, total_quantity: Int): [past_purchases!]
}

.... [truncated]

The GraphQL schema has one type for each table we defined in the SQRL script. The types have a field for each column in the associated table, including relationship columns. Tables map to types and columns map to fields based on name.

The Query type contains one query endpoint for each (non-nested) table. The queries and relationships have one argument for each field in the queried or related table. When those arguments are provided by a user querying the API, they translate to a filter on the returned rows from the underlying table. In our query examples above, we filtered Users by id, Products by category, and the volume_10day relationship traversal by country.

If we specify multiple arguments, only those rows are returned that match all filter conditions.

However, most of the filters we don't need in the API, so we are going to remove them and trim down the API. We are also going to limit the query endpoints and remove some fields we don't want to expose in the API.

Rename the schema.graphqls file to seedshop.graphqls and change it to the following.

type Products {
id: Int!
name: String!
sizing: String!
category: String!
volume_10day(country: String): [volume_10day!]
}

type Query {
Products(id: Int, name: String, category: String): [Products!]
Users(id: Int!): Users
}

type Users {
id: Int!
first_name: String!
last_name: String!
country: String
purchases: [orders!]
spending: [spending!]
past_purchases(productid: Int): [past_purchases!]
}

type items {
quantity: Int!
unit_price: Float!
discount: Float!
total: Float!
product: Products!
}

type orders {
id: Int!
time: String!
items: [items!]
totals: totals
}

type past_purchases {
productid: Int!
num_orders: Int!
total_quantity: Int!
}

type spending {
week: String!
spend: Float!
saved: Float!
}

type totals {
price: Float!
saving: Float!
}

type volume_10day {
country: String
quantity: Int!
spend: Float!
weight: Int!
}

Note, that we removed the entire NumOrders table because we don't need it for now. We changed the Users table query endpoint to Users(id: Int!): Users to make a user id required and return only a single user (which may be null if it doesn't exist) instead of a list of users [Users!].

To instruct the DataSQRL compiler to use our custom API specification, we add it as a second argument to the command.

docker run --rm -v $PWD:/build datasqrl/cmd compile seedshop.sqrl seedshop.graphqls --mnt $PWD

To launch the updated pipeline, execute:

(cd build/deploy; docker compose up)

If refresh GraphiQL in the browser, you will see your custom API.

Another neat benefit of customizing and trimming down the API specification is that it allows DataSQRL to generate more efficient data pipelines. DataSQRL automatically removes computations that aren't visible in the API and selects optimal index structures for the database based on the filters that are available in the API.

Pagination

Our current API always returns all filtered results for queries or when navigating relationships. In some cases, those result sets can be very large, and we don't want to transfer huge result sets through the API. Instead, we want to allow consumers of our API to page through the results.

We are going to add limit+offset based pagination to our API. It only requires adding the limit: Int, offset: Int arguments to queries and relationship fields.

type Products {
id: Int!
name: String!
sizing: String!
category: String!
volume_10day(country: String, limit: Int = 20, offset: Int = 0): [volume_10day!]
}

type Query {
Products(id: Int, name: String, category: String, limit: Int!, offset: Int = 0): [Products!]
Users(id: Int!): Users
}

type Users {
id: Int!
first_name: String!
last_name: String!
country: String
purchases(limit: Int = 20, offset: Int = 0): [orders!]
spending: [spending!]
past_purchases(productid: Int): [past_purchases!]
}
.... [truncated]

The limit argument limits the size of the returned result set to the given number. The offset argument moves to the given position in the result set before starting to retrieve records. Both arguments can be made required and given a default value as shown in the example above.

Update the seedshop.graphqls schema with limit+offset pagination as shown above, save the file, and re-run the script. You can now execute the following query.

{
Products (category: "acorns", limit: 2, offset: 2) {
name
category
volume_10day(limit: 2) {
quantity
spend
weight
}
}}

This query limits the number of returned products to 5 starting after position 2 in the complete result set. When you navigate through a relationship, you can filter records and use limit and offset in the same way you would when querying a table at the top level to specify which related records you want to be returned, in what order, and how many of them.

Note, that these arguments are applied locally for each record that is returned. In the query above, volume_10day(limit: 2) means that we are asking for up to 2 results for each product and not 2 total for the entire request.
As we navigate through relationships, we need to keep in mind that result set cardinalities multiply and choose small enough page sizes to avoid huge responses from the server.

Mutations and Inserting Data

Next, we are going to collect product visits through the API in order to improve our recommendation engine with recent user behavior. We are going to capture when a user visits a product page and aggregate those product visits to determine which products a user is interested in.

First, we are going to add a mutation to our GraphQL API schema to capture the product visit event. Add the following to the end of the seedshop.graphqls file:

type Mutation {
ProductVisit(event: VisitEvent!): CreatedProductVisit
}

input VisitEvent {
userid: Int!
productid: Int!
}

type CreatedProductVisit {
_source_time: String!
productid: Int!
userid: Int!
}

We created a mutation (i.e. an API endpoint that accepts data) called ProductVisit that accepts an input of type VisitEvent and returns the type CreatedProductVisit.

When you create mutations, the input type can have arbitrary fields to represent the data you want to capture. The fields of the mutation return type must be a subset of those fields plus the special field _source_time which returns the time when the event was created on the server.

The use the product visits in our SQRL script, we import it like the other data before. Add the following line to the imports in the seedshop.sqrl.

IMPORT seedshop.ProductVisit;

To import the data from a mutation, you use the name of the GraphQL schema file as the package name (i.e. seedshop) and the mutation name as the table name (i.e. ProductVisit).

Now, we can aggregate the product visits over the last 90 days for each user to determine what products they might like:

Users.product_visits := SELECT productid, count(1) as visits
FROM @ JOIN ProductVisit v ON @.id = v.userid
WHERE v._source_time > now() - INTERVAL 90 DAY
GROUP BY productid ORDER BY visits DESC;

Add the table definition after Users.past_purchases and save the SQRL script.

To retrieve product_visits through the API, we add the corresponding type and relationship to the GraphQL schema:


type User {
[existing fields...]
product_visits: [product_visits!]
}

type product_visits {
productid: Int!
visits: Int!
}

And there we have our custom, polished data API in GraphQL that allows us to query and add data. Check out the final GraphQL schema and corresponding SQRL script.

Next Steps

Wonderful, you have completed the 3 essential steps of building a data pipeline with DataSQRL:

  • Writing SQRL scripts
  • Connecting data sources
  • Designing and querying the data API

Now you can go off, build amazing data products, and tell us about it.

If you are eager to continue learning, we have two more optional chapters in this tutorial:

  • Advanced Topics covers additional features of DataSQRL.
  • Deployment shows you how to deploy the data pipeline compiled by DataSQRL.

If you want to learn more about querying the data API from your application or favorite programming language, the reference documentation has an overview. It also covers API design in more detail.