Set up sitemap generation with StreamX and AEM

AEM provides built-in support for generating sitemaps, which makes the task straightforward when the platform has full control over the website’s structure. However, in environments that combine multiple sources, markets, and projects, things get complicated fast. The StreamX event-streaming service mesh is designed to reduce integration complexity. Its sitemap generation feature focuses on streamlining integration and improving search engine indexing accuracy.

In this tutorial, we will set up StreamX sitemap generation with content originating from AEM.

Prerequisites

To complete this guide, you will need:

  • Roughly 15 minutes

  • StreamX CLI installed

  • Git installed

  • jq installed

  • A running AEM 6.5 author instance with at least Service Pack 6.5.17 installed, along with the out-of-the-box We.Retail sample application and content.

If your author instance wasn’t started with the nosamplecontent run mode, you can safely assume that We.Retail is installed. You can validate the installation by visiting the We.Retail landing page in the English master. If it returns a 404, you don’t have a proper We.Retail installation. In that case, use another AEM author instance (for example, a fresh one).

Ensure that no other StreamX instance or any other application is occupying port 8081.
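
The tool and port prerequisites can be checked from a terminal before starting. The following is a minimal sketch, not part of the official tutorial: the helper names check_tools and port_in_use are hypothetical, and the port check assumes curl is installed and treats curl exit code 7 (failed to connect) as a sign that nothing is listening.

```shell
# Hypothetical helper: report which of the given commands are missing from PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: succeed if something accepts connections on the given local port.
# Assumes curl is installed; curl exits with code 7 when it cannot connect.
port_in_use() {
  curl -s -o /dev/null --max-time 2 "http://localhost:$1/"
  [ "$?" -ne 7 ]
}

check_tools streamx git jq || echo "Install the missing tools before continuing."
if port_in_use 8081; then
  echo "Port 8081 appears to be in use; stop the application occupying it."
fi
```

The port check only reports that something answered on the port. It does not identify the process that owns it.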

Step 1: Get the source files

Clone the Git repository containing source files for the example:

git clone -b 1.0.1 https://github.com/streamx-dev/streamx-docs-resources.git

Step 2: Install StreamX OSGi bundles and configuration

To integrate AEM with StreamX, you must install the StreamX OSGi bundle together with all the necessary OSGi dependencies and configuration. Once installed and configured, the bundle feeds the StreamX Mesh with AEM-sourced data. Follow the steps below to install the package:

  1. Visit AEM author - CRX Package Manager

  2. Upload and install aem-with-streamx-tutorials/streamx-aem.all-1.0.2.zip from the cloned project repository
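
The package installation can also be scripted. The sketch below posts the package to the CRX Package Manager HTTP service endpoint (/crx/packmgr/service.jsp). The AEM address http://localhost:4502 and the admin:admin credentials are assumptions about a default local author instance, and install_package is a hypothetical helper name, so adjust them to your setup.

```shell
# Assumed defaults for a local AEM author instance; change as needed.
AEM_URL="${AEM_URL:-http://localhost:4502}"
AEM_CREDENTIALS="${AEM_CREDENTIALS:-admin:admin}"

# Hypothetical helper: upload a content package and install it in one request.
install_package() {
  curl -u "$AEM_CREDENTIALS" \
    -F "file=@$1" \
    -F force=true \
    -F install=true \
    "$AEM_URL/crx/packmgr/service.jsp"
}

# Run from the root of the cloned repository:
# install_package aem-with-streamx-tutorials/streamx-aem.all-1.0.2.zip
```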

Step 3: Run the StreamX Mesh

  1. Open the terminal and go to generate-sitemap-aem-tutorial inside the cloned project directory.

  2. Run the StreamX Mesh by using the following command:

    streamx run

  3. Wait for the following output:

    -------------------------------------------------------------------
    STREAMX IS READY!
    -------------------------------------------------------------------
    ...
    -------------------------------------------------------------------
    Network ID:
    ...
    Mesh configuration file: ./mesh.yaml
    -------------------------------------------------------------------

Step 4: Publish content from AEM

  1. Visit http://localhost:8081/sitemap.xml and confirm that the resource is not available yet. That’s because no content has been ingested so far.

  2. Visit AEM author - Sites admin page - We.Retail United States.

  3. Select the /content/we-retail/us page and click Manage Publication from the top menu.

  4. On the next screen (Options), leave the defaults shown below and proceed with Next:

    1. Action : Publish

    2. Scheduling : Now

  5. On the next screen (Scope), click on the thumbnail of the /content/we-retail/us item which will reveal the Include Children option.

  6. Click the Include Children item and uncheck each checkbox, then confirm your changes by clicking on Add.

  7. Finally, click on Publish.

  8. Wait for AEM author to complete the publication.
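
The wait in step 8 can be automated by polling the sitemap URL until it responds with HTTP 200. This is a sketch under assumptions: wait_for_sitemap is a hypothetical helper, curl is assumed to be installed, and the default retry count and delay are arbitrary.

```shell
# Hypothetical helper: poll a URL until it returns HTTP 200.
# Usage: wait_for_sitemap URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_sitemap() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -w '%{http_code}' prints only the status code; the body is discarded.
    status=$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)
    if [ "$status" = "200" ]; then
      echo "Sitemap is available at $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Sitemap not available after $attempts attempts (last status: $status)" >&2
  return 1
}

# Example:
# wait_for_sitemap http://localhost:8081/sitemap.xml
```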

After the publication is done, visit http://localhost:8081/sitemap.xml again. The sitemap should now list all the pages we’ve just published:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>http://localhost:8081/published/we-retail/us/en.html</loc>
    </url>
    <url>
        <loc>http://localhost:8081/published/we-retail/us/en/about-us.html</loc>
    </url>
    ...
    <url>
        <loc>http://localhost:8081/published/we-retail/us/en/women.html</loc>
    </url>
    <url>
        <loc>http://localhost:8081/published/we-retail/us/es.html</loc>
    </url>
</urlset>
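
To inspect the result from the terminal rather than the browser, the <loc> entries can be extracted with standard text tools. This is a minimal sketch, assuming curl and a grep that supports -o are available. The name list_sitemap_urls is a hypothetical helper, and the pattern matching is a shortcut rather than a real XML parser.

```shell
# Hypothetical helper: read sitemap XML on stdin, print one URL per line.
list_sitemap_urls() {
  grep -o '<loc>[^<]*</loc>' | sed -e 's/<loc>//' -e 's/<\/loc>//'
}

# Example: list the published URLs, then count them.
# curl -s http://localhost:8081/sitemap.xml | list_sitemap_urls
# curl -s http://localhost:8081/sitemap.xml | list_sitemap_urls | wc -l
```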

Summary

Congratulations! You have set up StreamX sitemap generation with AEM as the data source.