This version is still in development and is not considered stable yet. For the latest stable version, please use StreamX Guides 1.0.1!

Build data aggregation with StreamX

Data aggregation involves collecting data from various sources and combining them to create new entities, offering a unified view or summary. When composite data parts are spread across multiple systems, managing the data flows can become complex. StreamX addresses this issue by offering a central assembly point and delivering precomputed data to the web server.

In this tutorial, you will run a simple implementation of data aggregation with StreamX.

This tutorial covers the following topics:

  • aggregating data from multiple sources

  • page generation by using the StreamX Rendering Engine, including:

    • managing page templates

    • managing template data

Prerequisites

To complete this guide, you will need:

Verify that no other StreamX instance or any other application that uses port 8081 is running.
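One way to check the port before starting is sketched below. It assumes bash is the shell, because the /dev/tcp pseudo-device is a bash feature, and that checking the loopback address is sufficient. The helper name port_is_free is hypothetical and not part of StreamX.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when nothing accepts TCP connections
# on the given local port.
port_is_free() {
  local port="$1"
  # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connection.
  # The subshell keeps the file descriptor and any failure contained.
  if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
    return 1  # connection succeeded, so something is listening
  fi
  return 0    # connection failed, so the port looks free
}

if port_is_free 8081; then
  echo "Port 8081 looks free"
else
  echo "Port 8081 is already in use"
fi
```

If the port is reported as in use, stop the other application before running the mesh.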

Step 1: Get the sources

Clone the Git repository containing source files for the example:

git clone https://github.com/streamx-dev/streamx-docs-resources.git

Step 2: Run the StreamX Mesh

Our example StreamX Mesh is set up to combine data from several independent sources (product information from a PIM, prices from an internal system, reviews from an FMS) into a new merged entity. The computed data feeds the StreamX Rendering Engine, which generates the target pages.

  1. Open the terminal and go to build-data-aggregation-tutorial inside the cloned project directory.

  2. Run the StreamX Mesh by using the following command:

    streamx run

  3. Wait for the following output:

    -------------------------------------------------------------------
    STREAMX IS READY!
    -------------------------------------------------------------------
    ...
    -------------------------------------------------------------------
    Network ID:
    ...
    Mesh configuration file: ./mesh.yaml
    -------------------------------------------------------------------

Step 3: Publish template and data

  1. Publish the site/template.html template to the renderers channel with the following command:

    streamx publish -s 'template.bytes=file://site/template.html' renderers template.html

    • -s indicates that an external plain text file is the source for the published content

    • renderers is the channel you are publishing the template to

    • template.html is the publish key

  2. The StreamX Rendering Engine requires additional context such as:

    • data that triggers page generation,

    • the type of generated output,

    • names of generated results.

    To provide the generation details, publish the data with the following command:

    streamx publish rendering-contexts pages-rendering-context rendering-contexts/pages-rendering-context.json

  3. Publish the data/product.json data to the data channel with the following command:

    streamx publish -s 'content.bytes=file://data/product.json' data product:1

    The number 1 after the colon is the ID, which is used to consolidate entities from several channels.

  4. Open your web browser and go to http://localhost:8081/generated/1.html. Verify that the page is accessible, but has no price and no reviews.
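The browser check can also be done from the terminal. The sketch below assumes curl is installed and that generated pages follow the http://localhost:8081/generated/<id>.html pattern, which is inferred from the single URL used in this tutorial. The helper names page_url and page_status are hypothetical.

```shell
# Hypothetical helper: builds the URL of the generated page for an ID.
# The pattern is an assumption generalized from the tutorial's one URL.
page_url() {
  printf 'http://localhost:8081/generated/%s.html\n' "$1"
}

# Hypothetical helper: prints the HTTP status code returned for the
# generated page of the given ID, discarding the page body.
page_status() {
  curl -s -o /dev/null -w '%{http_code}' "$(page_url "$1")"
}
```

Calling page_status 1 after publishing product:1 prints the status code of the page. To inspect the page body itself, run curl -s "$(page_url 1)".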

Step 4: Update optional data

  1. Publish the data/price.json data to the data channel with the following command:

    streamx publish -s 'content.bytes=file://data/price.json' data price:1

  2. Once again open http://localhost:8081/generated/1.html. Verify that the page contains the price.

  3. Now unpublish the data with the price:1 key with the following command:

    streamx unpublish data price:1

  4. Visit http://localhost:8081/generated/1.html again. Confirm that the page generated from product:1 data is published, but its price is not available.
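The publish and unpublish round trip in this step can be scripted as a rough sketch. It reuses the two streamx commands above and compares page checksums instead of inspecting the HTML, because the markup of the generated page is not shown here. The helper names and the two-second pause are assumptions. The pause only gives the mesh some time to regenerate the page and may need adjusting.

```shell
# Hypothetical helper: prints a checksum of the generated page for ID 1.
page_checksum() {
  curl -s http://localhost:8081/generated/1.html | cksum
}

# Hypothetical helper: publishes and then unpublishes the price,
# reporting whether the generated page changed after each operation.
price_roundtrip() {
  local before with_price after
  before="$(page_checksum)"

  streamx publish -s 'content.bytes=file://data/price.json' data price:1
  sleep 2  # assumed delay so the page can be regenerated
  with_price="$(page_checksum)"

  streamx unpublish data price:1
  sleep 2  # assumed delay, as above
  after="$(page_checksum)"

  if [ "$before" != "$with_price" ]; then
    echo "page changed after publishing price:1"
  fi
  if [ "$with_price" != "$after" ]; then
    echo "page changed after unpublishing price:1"
  fi
  return 0
}
```

Run price_roundtrip from the build-data-aggregation-tutorial directory so that the relative data/price.json path resolves.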

Step 5: Update multivalued data

  1. Publish the data/review_1.json and data/review_2.json data to the data channel with the following commands:

    streamx publish -s 'content.bytes=file://data/review_1.json' data review:1:firstReviewHash
    streamx publish -s 'content.bytes=file://data/review_2.json' data review:1:secondReviewHash

    Refresh http://localhost:8081/generated/1.html and verify that the page now contains two reviews.

  2. Unpublish the review with the review:1:firstReviewHash key with the following command:

    streamx unpublish data review:1:firstReviewHash

  3. Visit http://localhost:8081/generated/1.html again. Confirm that the review generated from review:1:firstReviewHash has disappeared, but the second review is still visible.
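The keys published to the data channel in this tutorial share a colon-separated shape: an entity type, the ID, and, for reviews, a trailing identifier that distinguishes one review from another. The sketch below splits such a key into its parts. The describe_key helper and the field labels type, id, and suffix are hypothetical, inferred only from the keys used above and not from a documented StreamX key format.

```shell
# Hypothetical helper: splits a key such as review:1:firstReviewHash
# into its colon-separated parts and prints them on one line.
describe_key() {
  local key="$1" type rest id suffix
  type="${key%%:*}"   # text before the first colon
  rest="${key#*:}"    # text after the first colon
  id="${rest%%:*}"    # text up to the next colon, if any
  suffix=""
  case "$rest" in
    *:*) suffix="${rest#*:}" ;;  # present only for multivalued keys
  esac

  printf 'type=%s id=%s' "$type" "$id"
  if [ -n "$suffix" ]; then
    printf ' suffix=%s' "$suffix"
  fi
  printf '\n'
}

describe_key product:1                  # prints: type=product id=1
describe_key review:1:firstReviewHash   # prints: type=review id=1 suffix=firstReviewHash
```

Both review keys share the ID 1 and differ only in the suffix, which matches how each review could be published and unpublished separately in this step.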

Summary

Congratulations! You have learned how to create pages from multiple external sources by using the StreamX Rendering Engine.