Set up search with StreamX

Site search is a common requirement. Maintaining it becomes harder when, in addition to authorable content, the search must cover data from other systems such as a PIM or an e-commerce platform.

With StreamX, you can automate the integration and indexing of site content regardless of source, reducing the need for manual maintenance of search capabilities. As a result, users experience more reliable and timely search results, enabling them to find the latest information quickly and accurately.

In this tutorial, we will configure StreamX Mesh to automatically manage search for a website.

Prerequisites

To complete this guide, you will need, at minimum, the tools that the commands below rely on: Git and the StreamX CLI.

Verify that no other StreamX instance or any other application is using ports 8080, 8081, and 8082.
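
The port check can be scripted. The following is a minimal sketch, assuming the ports are bound on the local machine (127.0.0.1); the function names are illustrative and not part of StreamX.

```python
import socket

TUTORIAL_PORTS = (8080, 8081, 8082)  # ports named in the prerequisites


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports=TUTORIAL_PORTS) -> list[int]:
    """List the ports from `ports` that are already taken."""
    return [p for p in ports if port_in_use(p)]


if __name__ == "__main__":
    taken = busy_ports()
    print("Ports in use:", taken if taken else "none")
```

If the script reports any port as in use, stop the application that holds it before starting the Mesh.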

Step 1: Get the source files

Clone the Git repository containing source files for the example:

git clone https://github.com/streamx-dev/streamx-docs-resources.git

Step 2: Run the StreamX Mesh

The Mesh you are about to run contains a service that extracts indexable data from the ingested content. This data is passed to a delivery service that feeds the search service. In this tutorial, the delivery service communicates with OpenSearch, which is started inside the same StreamX Mesh.
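
To make the data flow concrete, here is a purely conceptual, in-memory model of it. It is not how StreamX or OpenSearch is implemented: every name below is hypothetical, the "extraction" is a naive tag stripper, and the "index" is a plain dictionary.

```python
import re

# Hypothetical in-memory stand-in for the search index; not a StreamX API.
search_index: dict[str, str] = {}


def extract_indexable_text(html: str) -> str:
    """Extraction step: strip tags and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", html)
    return " ".join(text.split())


def on_publish(key: str, html: str) -> None:
    """Delivery step: feed the extracted text to the search index."""
    search_index[key] = extract_indexable_text(html)


def on_unpublish(key: str) -> None:
    """Remove the entry so that it no longer appears in results."""
    search_index.pop(key, None)


def search(query: str) -> list[str]:
    """Return keys whose indexed text contains the query, ignoring case."""
    q = query.lower()
    return sorted(k for k, text in search_index.items() if q in text.lower())
```

The point of the model is the coupling: publishing a page is what adds it to the index, and unpublishing is what removes it, with no separate indexing step to maintain.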

  1. Open a terminal and navigate to the set-up-search-tutorial directory inside the cloned project.

  2. Run the StreamX Mesh using the following command:

    streamx run

  3. Wait for the following output:

    -------------------------------------------------------------------
    STREAMX IS READY!
    -------------------------------------------------------------------
    ...
    -------------------------------------------------------------------
    Network ID:
    ...
    Mesh configuration file: ./mesh.yml
    -------------------------------------------------------------------

Step 3: Feed StreamX with content for indexing

  1. Publish the index.html page by using the following command:

    streamx publish -s 'content.bytes=file://site/index.html' pages index.html

  2. Open your web browser and go to http://localhost:8081 to verify that the index.html page is accessible.

  3. Then go to http://localhost:8082/search/byQuery?query=greetings in your web browser. Verify that the search results include an entry for the index.html page.

  4. Publish the blog/entry.html page by using the following command:

    streamx publish -s 'content.bytes=file://site/blog/entry.html' pages blog/entry.html

  5. Visit http://localhost:8082/search/byPath?path=blog/entry.html in your web browser. Ensure that the search results contain an entry for the blog/entry.html page.

  6. Similarly, visit http://localhost:8082/search/byQuery?query=blog in your web browser and verify that the phrase search results include an entry for the blog/entry.html page.
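
The browser checks in this step can also be run from a script. The sketch below builds the same three search URLs and fetches them. It makes one assumption not stated in this tutorial: that a result for a page contains the page key (for example, blog/entry.html) somewhere in the response body. The helper names are illustrative.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_BASE = "http://localhost:8082/search"  # base of the tutorial's search URLs


def search_url(endpoint: str, **params: str) -> str:
    """Build a search URL such as .../search/byQuery?query=greetings."""
    # safe="/" keeps slashes in paths readable, matching the tutorial's URLs
    return f"{SEARCH_BASE}/{endpoint}?{urlencode(params, safe='/')}"


def fetch(url: str, timeout: float = 5.0) -> str:
    """Return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    checks = [
        (search_url("byQuery", query="greetings"), "index.html"),
        (search_url("byPath", path="blog/entry.html"), "blog/entry.html"),
        (search_url("byQuery", query="blog"), "blog/entry.html"),
    ]
    for url, page in checks:
        try:
            body = fetch(url)
        except OSError as error:  # covers refused connections and HTTP errors
            print(f"{url}: request failed ({error})")
            continue
        # Assumption: a matching result mentions the page key in the body.
        status = "found" if page in body else "NOT found"
        print(f"{url}: {page} {status}")
```

If the substring check proves too loose for the actual response format, print `body` and inspect it directly.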

Step 4: Unpublish the content and observe the search update

  1. Unpublish the blog/entry.html page by using the following command:

    streamx unpublish pages blog/entry.html

  2. Visit http://localhost:8082/search/byPath?path=blog/entry.html in your web browser. Observe that the entry for blog/entry.html is no longer returned.

  3. Similarly, visit http://localhost:8082/search/byQuery?query=blog in your web browser. Notice that the entry for blog/entry.html is gone from these results as well.
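
To confirm the removal from a script, you can poll the byPath URL until the entry is gone. This is a sketch under two assumptions that the tutorial does not state: the index update may take a moment to become visible, and a result for the page contains its key in the response body. `wait_until` and `entry_gone` are illustrative names.

```python
import time
from urllib.request import urlopen

BY_PATH_URL = "http://localhost:8082/search/byPath?path=blog/entry.html"


def wait_until(condition, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Call condition() repeatedly until it returns True or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def entry_gone(url: str = BY_PATH_URL, needle: str = "blog/entry.html") -> bool:
    """Assumption: a result for the page contains its key in the body."""
    with urlopen(url, timeout=5) as response:
        return needle not in response.read().decode("utf-8")


if __name__ == "__main__":
    try:
        gone = wait_until(entry_gone)
    except OSError as error:  # search service not reachable
        print(f"request failed ({error})")
    else:
        print("entry removed" if gone else "entry still present after timeout")
```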

Summary

Congratulations! You have learned how to set up site search with StreamX to streamline dynamic content management.