Implementing SQRL Scripts
The processing and logic of our data product is defined in the SQRL script. That's where the action is. We are going to dive deeper into the SQL language variant of DataSQRL by reviewing and extending the SQRL script from the introduction.
When you make it through this chapter, you will know the key concepts of SQRL, be ready to write your own SQRL scripts, and lay down some serious data APIs.
What's in a SQRL Script?
An SQRL script defines tables and relationships between them. Together, they form the data model which is exposed as an API.
Tables and relationships are defined in SQL - with some additions and a little bit of syntactic sugar thrown in there to make your life easier.
Tables
A table is made up of columns (or fields) and rows. A row represents a data record. Columns have a data type and represent an attribute of a record in the table. We often refer to columns as fields in the context of a data record. Records and rows, columns and fields, Darth Vader and Anakin Skywalker are all name pairs that describe the same thing.
Tables are the central concept of SQRL. Tables structure and contain all the data you are building with in an SQRL script. Tables can be exposed as endpoints in the API, so you can query the data in them.
You define tables in SQRL through import or query statements.
Import Table
An important statement adds a table from an external data source to your SQRL script.
In the first chapter we imported the Orders
table.
IMPORT datasqrl.seedshop.Orders;
Orders
is contained in the datasqrl.seedshop
package. When importing tables, we specify the full
path to the table, including the package, so DataSQRL can locate it.
The datasqrl.seedshop
package we are using for this example is downloaded from the DataSQRL repository as a dependency. In the next chapter we will define a custom data source package and table.
Let's add more data to our script by importing the products data from our seedshop:
IMPORT datasqrl.seedshop.Products;
Import statements are at the top of your SQRL script, so you can see all the data you are importing at one glance.
Query Table
Once you have imported tables, you build with the data they contain. You do this by defining new tables that query the data in existing ones.
In the first chapter we defined the Users
table by querying
for all the unique customer ids in the Orders
table:
Users := SELECT DISTINCT customerid AS id FROM Orders;
You can use all the SQL you already know to define new tables in SQRL.
For example, suppose PM is telling us that they want to add a McDonald's inspired
"over X million orders served" banner to the nut shop homepage. \
We define a new NumOrders
table that contains the count:
NumOrders := SELECT COUNT(*) AS count FROM Orders;
The NumOrders
table gets exposed in our data API as an additional endpoint
through which the frontend team can query for the current order count. Run the modified script and execute the following query:
{
NumOrders
{
count
}
}
DataSQRL is still young, and we are actively working on SQL feature parity in SQRL. Take a look at the roadmap for current limitations and what we are working on.
Incremental Table Definition
You can also define tables incrementally by adding new columns to an existing table.
For example, we defined the total
column on the nested Orders.items
table:
Orders.items.total := quantity * unit_price - coalesce(discount,0.0)
The part on the left-hand side of the
assignment operator :=
is the fully specified Orders.items.total
column
we are defining. The part on the right-hand side is the expression that computes the total price of each order item.
This is a localized expression that is evaluated within the context
of the table on the left-hand side - in this case Orders.items
. That's why
we can refer to the quantity
, unit_price
, and discount
columns of the Orders.items
table without any
qualifiers.
Relationships
Relationships relate records within and between tables. Defining relationships has two benefits:
- Relationships can be used in queries and expressions instead of explicit JOINs which simplifies queries and makes them more readable.
- Relationships are exposed in the API allowing consumers to retrieve related records within a single request.
We defined the relationship column purchases
on the Users
table to relate to a user's Orders
:
Users.purchases := JOIN Orders ON Orders.customerid = @.id;
A relationship column is defined as a JOIN between two or more tables using the standard
SQL JOIN ... ON ...
syntax. The starting point of the JOIN is always the table
on the left-hand side, and we can use the special table handle @
to refer to
it in the JOIN expression. In this case, Customers
is our start table and
Orders
is our end table of the relationship definition.
In addition to the JOIN [table] ON [condition]
expression, a relationship
definition can end with an optional ORDER BY
and LIMIT
clause. The order
is used as the default ordering when the relationship is accessed through the API.
The limit specifies the maximum number of related rows that are returned
when traversing the relationship (i.e. the maximum multiplicity).
To order user purchases by time, we change the relationship definition to:
Users.purchases := JOIN Orders ON Orders.customerid = @.id ORDER BY Orders.time;
The relationship is defined as a field on the start table, and we can query it in the API:
{
Users (id: 10) {
purchases {
id
totals {
price
saving
}
}
}}
Relationship columns make the relationships in the data explicit. Adding structure to your data by explicitly defining relationships also clarifies the data itself and how you plan to use it.
Relationships can also be used in joins and expressions, which makes them easier to read and write as we'll see in the following section.
Nested Tables
SQRL supports nested and hierarchical data like JSON through nested tables. A nested table has a parent table and all rows in a nested table are associated with a single parent row in the parent table.
In the nut shop tutorial the Orders
table has a nested Orders.items
table
that contains the item records for each order. When referring to a nested table,
we have to use the fully qualified name of the table which includes the parent.
SELECT * FROM Orders.items
is valid but SELECT * FROM items
is not because
there is no table with that name in the global namespace of the SQRL script.
SQRL automatically adds a parent
relationship on the child table which relates
rows to their parent rows in the parent table. Likewise, SQRL also adds a relationship field with the name of the nested table to the parent table which relates all child records to the parent record. The items
relationship on the Orders
table can be used to query
the item records for a particular order.
Orders.totals := SELECT sum(total) as price,
sum(coalesce(discount,0)) as saving FROM @.items;
This statement defines a new nested table totals
underneath Orders
by aggregating the total price and discount over all items in each order.
The SELECT
query on the right is a localized query that is evaluated within
the context of the Orders
table because it selects from the special table handle @
. Think of a localized query as being executed for each row of table on the left-hand side. We can use the special table handle @
to refer to each row from the parent table.
In this instance, @.items
refers to the items
relationship column of the Orders
table.
Localized queries are a feature of SQRL that make it easy to express nested, grouped, or partitioned operations.
Nested tables are useful when we want to analyze our data in partitions, like order totals and spending by user as defined by the User.spending
table.
Whenever you want to build data analysis by dimension or compute a result set for each record, nested tables are your friend.
For example, let's build the simplest and most effective product recommendation for our seed shop: recommending products users have already purchased, ordered by frequency. In other words, we want to look at all the products purchased for each customer and sort them by frequency of purchase:
Users.past_purchases := SELECT i.productid, count(1) as num_orders,
sum(i.quantity) as total_quantity
FROM @.purchases.items i
GROUP BY i.productid
ORDER BY num_orders DESC, total_quantity DESC;
The table past_purchases
is defined as a nested table underneath Users
.
The SQL query on the right-hand side is a localized query which means it is evaluated in the context of the Users
table. Nested tables are always defined in the context of the parent table, and we can think of the query definition as being applied to each row of the
parent table.
We use the special table handle @
to refer to each row in the parent
Uses
table. The FROM
clause @.purchases.items
chains together
the purchases
relationship on Users
with the items
relationship
on Orders
to retrieve all item records for all order records associated
with a single customer record. Chaining together relationships allows us to
avoid the complexity of multiple JOIN expressions in this query.
Stream vs State Tables
We distinguish between stream and state tables in SQRL.
A stream table consists of immutable (i.e. unchanging) rows of data that
are incrementally added to the table over time and never deleted.
The Orders
table is a stream table because an order does not change once
it has been processed.
In a state table the column values of rows change over time and rows
are added to and deleted from the table. The Users
table is a state
table because it represents our current list of users based on the set of unique customerid
.
Why is this distinction important? Because stream tables have special features in SQRL and are treated differently from state tables. Stream tables give SQRL the ability to react to data and synchronize with arbitrary data sources.
All tables imported from external data sources are stream tables. In case of our imported Products
table, we get a change stream of product updates.
To turn Products
into a state table we overwrite it with the following de-duplication query.
Products := DISTINCT Products ON id ORDER BY updated DESC;
This special SQRL query selects the most recent version (as identified by the updated
timestamp) for each product (as identified by the key id
column).
When we are dealing with static data, there is no real difference between stream and state tables. However, when dealing with streaming data and connecting to data in databases it is important to understand the difference.
A useful way to think about it: a stream table contains events happen and don't change
after the fact whereas state tables represent entities that evolve over time. When you import
a table from a source that you want to treat as an entity, make sure to use a
DISTINCT
query to define the table as an entity table.
Now, that we have our Products
state table defined, let's relate it to the Orders.items
table.
Orders.items.product := JOIN Products ON Products.id = @.productid;
Products.ordered_items := JOIN Orders.items i ON i.productid = @.id;
These two relationship column establish a bidirectional relationship.
Time
When building real-time data products, time is often an important aspect. The two most common time-based transformations on data are grouping data points by time windows and aggregating over recent time intervals. SQRL provides convenience features to address both of those.
Time Windows for Grouping
We computed customer spending and savings profiles by week.
Users.spending := SELECT endOfWeek(p.time) AS week,
sum(t.price) AS spend, sum(t.saving) AS saved
FROM @.purchases p JOIN p.totals t
GROUP BY week ORDER BY week DESC;
This defines a nested table which aggregates over the orders for each user. SQRL has a built-in library of time functions that compute time windows of various durations.
We can then group on those windows in order to compute aggregates across non-overlapping time intervals. In this statement, we bucket order records into week-long intervals based on the order date to sum up the total and discounts for each week.
Time Slices for Recency
Another common time analysis is by recent time slice, i.e. you want to analyze all records that are younger than some amount of time.
For our seed shop, we want to analyze the order volume for each product over the last 10 days.
Products.volume_10day := SELECT sum(i.quantity) as quantity,
sum(i.total) as spend, sum(i.quantity * @.weight_in_gram) as weight
FROM @ JOIN @.ordered_items i JOIN i.parent o
WHERE o.time > now() - INTERVAL 10 DAY;
This statement defines the nested volume_10day
table similar to previously defined aggregations. The difference here is that we filter out orders that are older than 10 days by using the special now()
time function to refer to the current point in time.
Note, that this query will return an empty result set for now. We are going to fix that in the next chapter.
And that, my friend, is a pretty good start for our seed shop data API.
Next Steps
We've built a complete e-commerce data product with customer analysis, recommendation engine, and business intelligence. Good work 💪! Take a look at the final SQRL script that includes the changes and additions we discussed in this chapter. You've learned enough about SQRL to start building data pipelines on your own.
In the next chapter, we are going to define our own data source and looks at imports in more detail.
We covered many aspects of SQRL in this document. If you want to explore SQRL in more detail, take a look at the SQRL reference documentation, which provides detailed explanations of key SQRL concepts like tables, relationships, stream tables, and time.