Set up sitemap generation with StreamX
In this tutorial, you will set up a StreamX Mesh that is capable of automatically creating and managing a sitemap.
Website sitemap files are a structured way of describing relevant pages, assets and their relationships to search engine crawlers. For non-trivial setups, creating sitemap files when multiple source systems are involved quickly becomes a challenge. StreamX’s inherent support for aggregating data from multiple, heterogeneous source systems can dramatically simplify sitemap generation, even for complex setups.
Prerequisites
To complete this guide, you will need:
- Roughly 5 minutes
- A web browser of your choice
Verify that no other StreamX instance, and no other application that uses port 8080 or 8081, is running.
Step 1: Get the source files
Clone the Git repository containing source files for the example:
git clone -b 1.0.1 https://github.com/streamx-dev/streamx-docs-resources.git
Step 2: Run the StreamX Mesh
The StreamX Mesh for this tutorial consists of 3 services that take care of generating and serving sitemap files while HTML pages come and go.
- Open the terminal and go to set-up-sitemap-generation-tutorial inside the cloned project directory.
- Run the StreamX Mesh by using the following command:
streamx run
- Wait for the following output:
-------------------------------------------------------------------
STREAMX IS READY!
-------------------------------------------------------------------
...
-------------------------------------------------------------------
Network ID: ...
Mesh configuration file: ./mesh.yaml
-------------------------------------------------------------------
Step 3: Publish content
- Publish the index.html page for a hypothetical site by using the following command:
streamx publish -s 'content.bytes=file://site/index.html' pages index.html
- Open your web browser and go to http://localhost:8081. Verify that the page index.html is accessible.
- Then go to http://localhost:8081/sitemap.xml in your web browser. Verify that the sitemap contains an entry for the index.html page.
It might take a few seconds for the sitemap to be regenerated.
- Publish another example page, article.html, by running the following command:
streamx publish -s 'content.bytes=file://site/article.html' pages article.html
- Publish sample pages for another hypothetical sub-site for blogs by executing the following 2 commands:
streamx publish -s 'content.bytes=file://blog/blog.html' pages blog.html
streamx publish -s 'content.bytes=file://blog/entry.html' pages blog/entry.html
- Visit http://localhost:8081/sitemap.xml again and verify that the sitemap now contains all 4 entries.
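The browser check above can also be scripted. The following is a minimal sketch and not part of the tutorial repository. It assumes the generated sitemap follows the standard sitemaps.org layout with one <loc> element per page, and that each page is listed as http://localhost:8081/ followed by its publication key. The helper names sitemap_locs and sitemap_has_all are hypothetical.

```shell
# Print every <loc> value found in the XML on stdin, one URL per line.
# Assumption: URLs sit directly inside <loc>...</loc> without extra whitespace.
sitemap_locs() {
  grep -o '<loc>[^<]*</loc>' | sed -e 's|<loc>||' -e 's|</loc>||'
}

# Succeed only if every URL given as an argument is listed in the
# sitemap XML read from stdin; report the first missing URL on stderr.
sitemap_has_all() {
  local locs url
  locs="$(sitemap_locs)"
  for url in "$@"; do
    # -x: match the whole line, -F: treat the URL as a literal string
    if ! printf '%s\n' "$locs" | grep -qxF -- "$url"; then
      echo "missing from sitemap: $url" >&2
      return 1
    fi
  done
}

# Possible usage once the mesh is running (URL forms are assumptions):
#   curl -fsS http://localhost:8081/sitemap.xml | sitemap_has_all \
#     http://localhost:8081/index.html \
#     http://localhost:8081/article.html \
#     http://localhost:8081/blog.html \
#     http://localhost:8081/blog/entry.html
```

Matching whole lines literally avoids false positives from URLs that merely share a prefix, such as blog.html and blog/entry.html.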
Step 4: Unpublish content
- Unpublish the page blog/entry.html by using the following command:
streamx unpublish pages blog/entry.html
- Visit http://localhost:8081/sitemap.xml again. Verify that the entry for http://localhost:8081/blog/entry.html has disappeared from the sitemap.
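Because sitemap regeneration can lag by a few seconds, a scripted version of this check may need to poll rather than test once. The sketch below shows one way to do that. The helpers retry_until and entry_gone are hypothetical names, and entry_gone assumes that each page appears in the sitemap as a <loc> element holding its full URL.

```shell
# retry_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds, at most MAX_TRIES times,
# sleeping DELAY_SECONDS between attempts.
retry_until() {
  local tries="$1" delay="$2" i=1
  shift 2
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -ge "$tries" ]; then
      return 1
    fi
    sleep "$delay"
    i=$((i + 1))
  done
}

# entry_gone SITEMAP_URL PAGE_URL
# Succeed when the sitemap was fetched and no longer lists PAGE_URL.
# A failed download counts as "not gone yet", so it gets retried.
entry_gone() {
  local xml
  xml="$(curl -fsS "$1")" || return 1
  ! printf '%s\n' "$xml" | grep -qF -- "<loc>$2</loc>"
}

# Possible usage: wait up to about 10 seconds for the entry to disappear.
#   retry_until 10 1 entry_gone \
#     http://localhost:8081/sitemap.xml \
#     http://localhost:8081/blog/entry.html
```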
Summary
Congratulations! You have just confirmed that the sitemap is automatically regenerated whenever you publish or unpublish pages in StreamX. This automated process keeps your website's sitemap up to date, which simplifies SEO and improves search engine discoverability.