This version is still in development and is not considered stable yet. For the latest stable version, please use StreamX Guides 1.0.0!

Set up sitemap generation with StreamX

In this tutorial, you will set up a StreamX Mesh that is capable of automatically creating and managing a sitemap.

Website sitemap files are a structured way of describing relevant pages, assets and their relationships to search engine crawlers. For non-trivial setups, creating sitemap files when multiple source systems are involved quickly becomes a challenge. StreamX’s inherent support for aggregating data from multiple, heterogeneous source systems can dramatically simplify sitemap generation, even for complex setups.

Prerequisites

To complete this guide, you will need:

Make sure that no other StreamX instance, and no other application using port 8080 or 8081, is running.
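You can check both ports from the terminal before starting. The following is a minimal sketch that assumes curl is installed. `port_in_use` is a hypothetical helper, not part of the StreamX CLI. It treats curl's exit code 7 ("failed to connect") as "nothing is listening" and every other outcome as "port taken". It probes only the IPv4 loopback address.

```shell
# Hypothetical helper (not part of StreamX): succeed if something accepts
# TCP connections on the given port of the IPv4 loopback address.
# curl exits with code 7 when it cannot connect; any other result (an HTTP
# response, an empty reply, a timeout) means some process holds the port.
port_in_use() {
  curl -s -o /dev/null --noproxy '*' --max-time 2 "http://127.0.0.1:$1/"
  [ "$?" -ne 7 ]
}

# Check the two ports this tutorial needs.
for port in 8080 8081; do
  if port_in_use "$port"; then
    echo "Port $port is already in use; stop that application first."
  else
    echo "Port $port is free."
  fi
done
```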

Step 1: Get the source files

Clone the Git repository containing source files for the example:

git clone https://github.com/streamx-dev/streamx-docs-resources.git

Step 2: Run the StreamX Mesh

The StreamX Mesh for this tutorial consists of three services that generate and serve sitemap files as HTML pages are published and unpublished.

  1. Open a terminal and go to the set-up-sitemap-generation-tutorial directory inside the cloned repository.

  2. Run the StreamX Mesh by using the following command:

    streamx run
  3. Wait for the following output:

    -------------------------------------------------------------------
    STREAMX IS READY!
    -------------------------------------------------------------------
    ...
    -------------------------------------------------------------------
    Network ID:
    ...
    Mesh configuration file: ./mesh.yaml
    -------------------------------------------------------------------

Step 3: Publish content

  1. Publish the index.html page for a hypothetical site by using the following command:

    streamx publish -s 'content.bytes=file://site/index.html' pages index.html
  2. Open your web browser and go to http://localhost:8081. Verify that the page index.html is accessible.

  3. Then go to http://localhost:8081/sitemap.xml in your web browser. Verify that the sitemap contains an entry for the index.html page.

    Note: the sitemap may take a few seconds to update.

  4. Publish another example page article.html by running the following command:

    streamx publish -s 'content.bytes=file://site/article.html' pages article.html
  5. Publish two sample pages for a second hypothetical sub-site, a blog, by running the following commands:

    streamx publish -s 'content.bytes=file://blog/blog.html' pages blog.html
    streamx publish -s 'content.bytes=file://blog/entry.html' pages blog/entry.html
  6. Visit http://localhost:8081/sitemap.xml again and verify that the sitemap now contains all four entries.
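You can also check the sitemap from the terminal by extracting its page URLs. This minimal sketch assumes that the generated sitemap follows the sitemaps.org format, where each page URL sits in its own `<loc>` element. `list_sitemap_urls` is a hypothetical helper name, not a StreamX command.

```shell
# Hypothetical helper: print every <loc> URL found in sitemap XML on stdin,
# one per line. This is a plain-text match, not a full XML parser. It
# assumes the sitemaps.org layout, with each URL inside its own <loc> element.
list_sitemap_urls() {
  grep -o '<loc>[^<]*</loc>' | sed -e 's#<loc>##' -e 's#</loc>##'
}

# Usage against the running mesh (requires curl):
#   curl -s http://localhost:8081/sitemap.xml | list_sitemap_urls
```

After step 6, this command should print four URLs if the sitemap lists only the published pages.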

Step 4: Unpublish content

  1. Unpublish the page blog/entry.html by using the following command:

    streamx unpublish pages blog/entry.html
  2. Visit http://localhost:8081/sitemap.xml again. Verify that the entry for http://localhost:8081/blog/entry.html has disappeared from the sitemap.
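Sitemap regeneration can lag a few seconds behind the unpublish command. For that reason, a script that checks the result should poll the sitemap rather than check it only once. This minimal sketch assumes curl, the sitemap URL used in this tutorial, and sitemaps.org-style `<loc>` elements. `wait_until_unlisted` and the `SITEMAP_URL` override are hypothetical names introduced here.

```shell
# Hypothetical helper: poll the sitemap once per second until the given URL
# is no longer listed, or give up after a timeout (default: 30 seconds).
# SITEMAP_URL (hypothetical) overrides the sitemap location if set.
wait_until_unlisted() {
  url=$1
  limit=${2:-30}
  sitemap=${SITEMAP_URL:-http://localhost:8081/sitemap.xml}
  elapsed=0
  while [ "$elapsed" -lt "$limit" ]; do
    # -f makes curl fail on HTTP errors, so an unavailable sitemap is not
    # mistaken for a sitemap that no longer contains the entry.
    if xml=$(curl -fsS "$sitemap"); then
      # -F matches the <loc> element literally, not as a regular expression.
      if ! printf '%s\n' "$xml" | grep -qF "<loc>$url</loc>"; then
        echo "Removed from sitemap: $url"
        return 0
      fi
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "Still listed after ${limit}s: $url" >&2
  return 1
}

# Usage after step 1:
#   wait_until_unlisted http://localhost:8081/blog/entry.html
```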

Summary

Congratulations! You have confirmed that StreamX automatically regenerates the sitemap whenever you publish or unpublish pages. This keeps your website's sitemap up to date, which simplifies search engine optimization (SEO) and helps search engines discover your pages.