Set up sitemap generation with StreamX

In this tutorial, you will set up a StreamX Mesh that automatically creates and maintains a sitemap.

Sitemap files describe a website’s relevant pages and assets, and the relationships between them, in a structured form that search engine crawlers can read. Creating these files quickly becomes a challenge when multiple source systems are involved. Because StreamX aggregates data from multiple, heterogeneous source systems, it can considerably simplify sitemap generation, even for complex setups.

Prerequisites

To complete this guide, you will need:

Make sure that no other StreamX instance or any other application is using ports 8080 and 8081.
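
One way to check the ports before starting is sketched below. It is a minimal example rather than part of the StreamX tooling: the `port_in_use` helper is a hypothetical name, it relies on bash's `/dev/tcp` pseudo-device (so it needs bash, not plain sh), and it only probes 127.0.0.1.

```shell
#!/usr/bin/env bash
# Sketch: report whether the ports used in this tutorial are already taken.
# Assumption: bash with /dev/tcp support; only 127.0.0.1 is probed.

# Returns 0 if something accepts TCP connections on 127.0.0.1:<port>, 1 otherwise.
port_in_use() {
  local port="$1"
  # The subshell opens (and, on exit, closes) a TCP connection to the port.
  if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
    return 0
  else
    return 1
  fi
}

for port in 8080 8081; do
  if port_in_use "$port"; then
    echo "Port ${port} is in use; stop the application using it first."
  else
    echo "Port ${port} is free."
  fi
done
```

A listener bound only to another network interface would not be detected by this check.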

Step 1: Get the source files

Clone the Git repository containing source files for the example:

git clone https://github.com/streamx-dev/streamx-docs-resources.git

Step 2: Run the StreamX Mesh

The StreamX Mesh for this tutorial consists of three services that generate and serve sitemap files as HTML pages are published and unpublished.

  1. Open a terminal and navigate to the set-up-sitemap-generation-tutorial directory inside the cloned repository.

  2. Run the StreamX Mesh using the following command:

    streamx run

  3. Wait for the following output:

    -------------------------------------------------------------------
    STREAMX IS READY!
    -------------------------------------------------------------------
    ...
    -------------------------------------------------------------------
    Network ID:
    ...
    Mesh configuration file: ./streamx-mesh.yml
    -------------------------------------------------------------------

Step 3: Publish content

  1. Publish the index.html page for a hypothetical site by using the following command:

    streamx publish -s 'content.bytes=file://site/index.html' pages index.html

  2. Open your web browser and go to http://localhost:8081. Verify that the page index.html is accessible.

  3. Then go to http://localhost:8081/sitemap.xml in your web browser. Verify that the sitemap contains an entry for the index.html page.

    The sitemap might take a few seconds to update.

  4. Publish another example page article.html by running the following command:

    streamx publish -s 'content.bytes=file://site/article.html' pages article.html

  5. Publish sample pages for another hypothetical sub-site for blogs by running the following two commands:

    streamx publish -s 'content.bytes=file://blog/blog.html' pages blog.html
    streamx publish -s 'content.bytes=file://blog/entry.html' pages blog/entry.html

  6. Visit http://localhost:8081/sitemap.xml again and verify that the sitemap now contains all four entries.
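
If you prefer to publish all four pages in one go, the individual commands from this step can be wrapped in a small script. The following is a sketch only: `publish_page`, `publish_all`, and the `DRY_RUN` switch are hypothetical names belonging to this script, not StreamX options. With `DRY_RUN=1` (the default here) the script just prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: publish several pages using the same `streamx publish` syntax
# as the individual commands in this step.
# DRY_RUN is a switch of this script only (not a StreamX option):
# 1 prints the commands, 0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# publish_page <local file> <key>
publish_page() {
  local file="$1" key="$2"
  if [ "$DRY_RUN" = "1" ]; then
    echo "streamx publish -s 'content.bytes=file://${file}' pages ${key}"
  else
    # </dev/null keeps the command from consuming the page list read by the loop.
    streamx publish -s "content.bytes=file://${file}" pages "$key" </dev/null
  fi
}

# Publishes the four pages from this tutorial, one "<local file> <key>" pair per line.
publish_all() {
  while read -r file key; do
    publish_page "$file" "$key"
  done <<'EOF'
site/index.html index.html
site/article.html article.html
blog/blog.html blog.html
blog/entry.html blog/entry.html
EOF
}

publish_all
```

Run the script from the tutorial directory with `DRY_RUN=0` to execute the commands instead of printing them.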

Step 4: Unpublish content

  1. Unpublish the page blog/entry.html by using the following command:

    streamx unpublish pages blog/entry.html

  2. Visit http://localhost:8081/sitemap.xml again. Verify that the entry for http://localhost:8081/blog/entry.html has disappeared from the sitemap.
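
The same check can be done from the command line. The sketch below assumes that the generated sitemap follows the sitemaps.org protocol, with each entry written as `<loc>URL</loc>`. The helper names `sitemap_locs` and `sitemap_has` are hypothetical, and the sample XML is illustrative data, not actual StreamX output.

```shell
#!/usr/bin/env bash
# Sketch: list sitemap entries and check whether a given URL is among them.
# Assumption: entries appear as <loc>URL</loc> (sitemaps.org protocol).

# Reads sitemap XML on stdin and prints one URL per line.
sitemap_locs() {
  tr -d '\n' \
    | grep -o '<loc>[^<]*</loc>' \
    | sed -e 's/<loc>//' -e 's/<\/loc>//' \
          -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Succeeds if the URL given as $1 is listed in the sitemap XML read from stdin.
sitemap_has() {
  sitemap_locs | grep -qxF -- "$1"
}

# Illustrative sample of what the sitemap might look like after unpublishing.
sample='<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://localhost:8081/index.html</loc></url>
  <url><loc>http://localhost:8081/article.html</loc></url>
  <url><loc>http://localhost:8081/blog.html</loc></url>
</urlset>'

printf '%s\n' "$sample" | sitemap_locs

if printf '%s\n' "$sample" | sitemap_has 'http://localhost:8081/blog/entry.html'; then
  echo "blog/entry.html is still listed"
else
  echo "blog/entry.html is no longer listed"
fi
```

Against the running mesh, the sample can be replaced with the live file, for example `curl -s http://localhost:8081/sitemap.xml | sitemap_locs`.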

Summary

Congratulations! You have just confirmed that sitemaps are automatically regenerated whenever you publish or unpublish pages in StreamX. This automation keeps your website’s sitemap up to date, which simplifies SEO and improves search engine discoverability.