Deploying the Data Pipeline

We have built a customer 360 application and recommendation engine; now it's time to take it to production.

Run the Compiler

Execute the following command to compile our seedshop script and API specification:

docker run --rm -v $PWD:/build datasqrl/cmd compile seedshop.sqrl seedshop.graphqls --mnt $PWD

The compiler takes a SQRL script, API specification, and optional package configuration as arguments and produces an executable for each component of our data pipeline:

  • topics and schemas for Kafka
  • a Flink jar with all dependencies
  • a physical data model and index definitions for the database
  • an API model that maps API endpoints to database queries and Kafka topics

Deploy Executables

You can find all the executables in the build/deploy folder. The folder also contains a docker-compose template, docker-compose.yml, for starting all the components of the data pipeline and running the executables.
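
To get a sense of what the compiler emitted, here is a hypothetical listing of the folder; the file names below are illustrative only, and the actual contents depend on your script and configuration:

> ls build/deploy
docker-compose.yml
flink-job.jar
database-schema.sql
kafka-topics.json
server-model.json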

> cd build/deploy
> docker compose up

The docker-compose template starts a Kafka cluster, a Flink cluster, and a Postgres database. It initializes the database with the compiled schema and index structures, creates the topics in the Kafka cluster, submits the Flink jar to the Flink cluster, and finally launches a server instance with the API model.

To verify that everything is working correctly, you can execute GraphQL queries against the API through GraphiQL running at http://localhost:8888/graphiql/.
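
You can also query the API from the command line. Here is a minimal sketch, assuming the GraphQL endpoint is served at /graphql on the same port and using illustrative field names in the spirit of the seedshop schema built earlier in this tutorial; adjust the query to match your actual API specification:

> curl -X POST http://localhost:8888/graphql \
    -H 'Content-Type: application/json' \
    -d '{"query": "{ Customers(limit: 3) { id email } }"}'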

Customize Deployment

You can use the provided docker-compose template for your deployments and customize it to suit your needs.

You can deploy the deployment artifacts in any way you'd like. Because DataSQRL compiles to existing data technologies, the only limit is what those underlying technologies support.

For example, you can run the API server in Kubernetes, use a managed database service, or submit the Flink jar to an existing Flink cluster.
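
As a sketch of the last option, submitting the compiled jar to a running Flink cluster is a standard flink run invocation; the JobManager address and jar name below are placeholders:

> flink run -m flink-jobmanager:8081 build/deploy/flink-job.jar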

You tell DataSQRL where and how you want to deploy the compiled data pipeline by configuring the individual components in the package configuration file, package.json, which should be in the same directory as your SQRL script.

DataSQRL calls the data technologies that execute the components of a data pipeline "engines", and the package configuration specifies which engines DataSQRL compiles to. For example, DataSQRL supports Apache Flink as a stream engine, Apache Kafka as a log engine, Postgres as a database engine, and Vert.x as a server engine.
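
For illustration, an engines section in package.json could look something like the sketch below; treat the key names as assumptions rather than the definitive schema, and consult the engine configuration reference linked below for the real format:

{
  "engines": {
    "streams": { "engine": "flink" },
    "log": { "engine": "kafka" },
    "database": { "engine": "postgres" },
    "server": { "engine": "vertx" }
  }
}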

Check out all the engines that DataSQRL supports and how to configure them in the package configuration.

That concludes our introductory tutorial! Great job and enjoy building with data(sqrl)!

Next Steps

  • For more information, refer to the reference documentation for building and deploying with DataSQRL, as well as the DataSQRL command documentation for all command-line options.
  • Want to know how DataSQRL compiles efficient data pipelines? The DataSQRL optimizer uses a cost model to divide data processing among the components and generate the most efficient executables. You can provide hints when the optimizer makes the wrong choice.