Build data aggregation with StreamX

Data aggregation involves collecting data from various sources and combining them to create new entities, offering a unified view or summary. When composite data parts are spread across multiple systems, managing the data flows can become complex. StreamX addresses this issue by offering a central assembly point and delivering precomputed data to the web server.

In this tutorial, you will run a simple implementation of data aggregation with StreamX.

This tutorial covers the following topics:

  • aggregating data from multiple sources

  • page generation using the StreamX Rendering Engine, including:

    • managing page templates

    • managing template data

Prerequisites

To complete this guide, you will need:

Make sure that no other StreamX instance or any other application is using port 8081.
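One way to check the port before starting is sketched below. It assumes Bash, because the /dev/tcp pseudo-device is a Bash feature (not part of StreamX), and the port_in_use helper is a name made up for this example. It only probes 127.0.0.1, so a service bound to another interface would not be detected.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the StreamX CLI.
# Succeeds (exit 0) if something accepts TCP connections on the given local port.
port_in_use() {
  # Bash-only: opening /dev/tcp/HOST/PORT attempts a TCP connection.
  # The subshell keeps file descriptor 3 from leaking into the caller.
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

if port_in_use 8081; then
  echo "Port 8081 is in use: stop the other application first."
else
  echo "Port 8081 is free."
fi
```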

Step 1: Get the sources

Clone the Git repository containing source files for the example:

git clone https://github.com/streamx-dev/streamx-docs-resources.git

Step 2: Run the StreamX Mesh

Our example StreamX Mesh is set up to combine data from several independent sources (product information from a PIM, prices from an internal system, and reviews from an FMS) into a new merged entity. The computed data feeds the StreamX Rendering Engine, which generates the target pages.

  1. Open the terminal and navigate to build-data-aggregation-tutorial inside the cloned project directory.

  2. Run the StreamX Mesh using the following command:

    streamx run

  3. Wait for the following output:

    -------------------------------------------------------------------
    STREAMX IS READY!
    -------------------------------------------------------------------
    ...
    -------------------------------------------------------------------
    Network ID:
    ...
    Mesh configuration file: ./streamx-mesh.yml
    -------------------------------------------------------------------

Step 3: Publish template and data

  1. Publish the site/template.html template to the renderers channel with the following command:

    streamx publish -s 'template.bytes=file://site/template.html' renderers template.html

    • -s indicates that an external plain text file is used as the source of the published content

    • renderers is the channel you are publishing the template to

    • template.html is the publish key

  2. The StreamX Rendering Engine requires additional context such as:

    • data that triggers page generation,

    • the type of generated output,

    • names of generated results.

    To provide the generation details, publish the data with the following command:

    streamx publish rendering-contexts pages-rendering-context rendering-contexts/pages-rendering-context.json

  3. Publish the data/product.json data to the data channel using the following command:

    streamx publish -s 'content.bytes=file://data/product.json' data product:1

    Note that the number 1 after the colon is the ID, which is used to consolidate related entities (here: the product, its price, and its reviews) into one merged entity.

  4. Open your web browser and navigate to http://localhost:8081/generated/1.html. Verify that the page is accessible, but has no price and no reviews.
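The keys used in this tutorial follow a type:id pattern (product:1 here, price:1 in Step 4), and the review keys in Step 5 append a third segment. The sketch below only illustrates that naming convention with plain shell string handling. How StreamX itself interprets keys is not shown in this tutorial, and the split_key helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: splits a tutorial-style key into its parts.
# It illustrates the naming convention only; it is not StreamX code.
split_key() {
  key=$1
  type=${key%%:*}   # text before the first colon, e.g. "product"
  rest=${key#*:}    # everything after the first colon
  id=${rest%%:*}    # text up to the next colon (or all of it), e.g. "1"
  echo "type=$type id=$id"
}

split_key product:1                 # prints: type=product id=1
split_key price:1                   # prints: type=price id=1
split_key review:1:firstReviewHash  # prints: type=review id=1
```

All three keys resolve to the same id, which is what ties them to the same generated page.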

Step 4: Update optional data

  1. Publish the data/price.json data to the data channel using the following command:

    streamx publish -s 'content.bytes=file://data/price.json' data price:1

  2. Once again open http://localhost:8081/generated/1.html. Verify that the page contains the price.

  3. Now unpublish the data with the price:1 key using the following command:

    streamx unpublish data price:1

  4. Visit http://localhost:8081/generated/1.html again. Confirm that the page generated from product:1 data is published, but its price is not available.
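The browser checks in this step can also be scripted in part. The sketch below assumes curl is installed; the page_status helper is hypothetical, and the URL is the one used throughout this tutorial. It reports only the HTTP status code, so whether the price is actually rendered still has to be confirmed by looking at the page itself.

```shell
#!/bin/sh
# Hypothetical helper: prints the HTTP status code returned for a URL.
# curl prints 000 when it got no HTTP response at all (e.g. nothing is listening).
page_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# curl exits non-zero when it cannot connect; "|| true" keeps the script going
# so the status code is still reported.
status=$(page_status http://localhost:8081/generated/1.html) || true
echo "HTTP status: $status"
```

A status of 200 indicates that the page is being served; after unpublishing price:1 the page should still be served, only without the price.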

Step 5: Update multivalued data

  1. Publish the data/review_1.json and data/review_2.json data to the data channel using the following commands:

    streamx publish -s 'content.bytes=file://data/review_1.json' data review:1:firstReviewHash
    streamx publish -s 'content.bytes=file://data/review_2.json' data review:1:secondReviewHash

    Refresh http://localhost:8081/generated/1.html and verify that the page now contains two reviews.

  2. Unpublish a review with the review:1:firstReviewHash key using the following command:

    streamx unpublish data review:1:firstReviewHash

  3. Visit http://localhost:8081/generated/1.html again. Confirm that the review generated from review:1:firstReviewHash has disappeared, but the second review is still visible.
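To restore the fully populated page after the unpublish steps, the publish commands from Steps 3 to 5 can be replayed in one go. The following is a minimal sketch that reuses exactly those commands and must be run from the build-data-aggregation-tutorial directory. The run, publish_data, and publish_all helpers and the DRY_RUN variable are conventions of this sketch, not StreamX CLI features.

```shell
#!/bin/sh
# Replays the publish commands from Steps 3-5.
# With DRY_RUN=1 (the default) the commands are only printed, without quoting;
# set DRY_RUN=0 to actually execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

publish_data() {
  # $1 = file name under data/, $2 = publish key
  run streamx publish -s "content.bytes=file://data/$1" data "$2"
}

publish_all() {
  run streamx publish -s 'template.bytes=file://site/template.html' renderers template.html
  run streamx publish rendering-contexts pages-rendering-context rendering-contexts/pages-rendering-context.json
  publish_data product.json  product:1
  publish_data price.json    price:1
  publish_data review_1.json review:1:firstReviewHash
  publish_data review_2.json review:1:secondReviewHash
}

publish_all
```

Running it once in dry-run mode first makes it easy to compare the printed commands with the ones shown in the steps above before executing them with DRY_RUN=0.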

Summary

Congratulations! You have learned how to create pages from multiple external sources using the StreamX Rendering Engine.