Offload Real-Time Analytics from MongoDB Using Elasticsearch

Introduction

Organizations today need tools that can store, search, and analyze data quickly and in real time. Elasticsearch combines these capabilities, letting users retrieve records in many forms and analyze massive amounts of data in a very short time. In this article, we introduce Elasticsearch and its benefits, explain what Cross Cluster Replication is and how to set it up, and briefly cover Replication Storage.

What is Elasticsearch?

Elasticsearch is an open-source search and analytics engine that lets you store, search, and analyze huge volumes of data in near real time. It is a highly scalable, enterprise-level solution built on Apache Lucene and developed in Java. Its basic concepts include near real-time (NRT) search, clusters, nodes, indices, documents, shards, and replicas. Instead of scanning the text directly, it searches an index, which is how it achieves fast search responses. It can serve as a search and analytics engine for many types of data: numerical, textual, geospatial, structured, and unstructured.
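To make the idea of "searching an index instead of the text" concrete, here is a minimal sketch of an inverted index in plain Python. It is only a toy illustration of the concept, not how Lucene or Elasticsearch implement it; the function names and sample documents are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the postings of the first term, then intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "fast search over large logs",
    2: "search and analytics engine",
    3: "analytics over large volumes",
}
index = build_inverted_index(docs)
print(sorted(search(index, "search")))           # [1, 2]
print(sorted(search(index, "large analytics")))  # [3]
```

A query only touches the entries for its own terms, so the cost depends on how many documents contain those terms rather than on the total amount of text stored.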

Benefits of Elasticsearch

Let us look at some of the benefits of Elasticsearch:

  • Enhanced Performance – Elasticsearch can run searches faster than typical SQL databases because it uses distributed inverted indices.
  • Distributed Architecture – Elasticsearch has a distributed architecture that helps it handle large volumes of data.
  • Scalability – Because of its distributed architecture, Elasticsearch can be scaled out to thousands of servers and store huge volumes of data.
  • Compatibility – Elasticsearch is developed in Java, so it runs on any platform that has a supported Java runtime.
  • Schema Free – Elasticsearch does not require you to define a schema up front; it infers field types and applies default mappings unless you specify the data types yourself.
  • Data Record – Elasticsearch records every change in transaction logs on multiple nodes in the cluster, which reduces the chance of data loss.

What is Cross Cluster Replication in Elasticsearch? 

The Cross Cluster Replication (CCR) feature in Elasticsearch replicates data across clusters, for example between data centers. It can be used to support disaster recovery and maintain high availability. Some of its use cases are:

  • Data Locality – With Cross Cluster Replication, data is replicated closer to the user or application server. This locality reduces latency and speeds up processing.
  • High Availability – With Cross Cluster Replication, copies of your data exist in more than one cluster, so at least one copy remains available even when nodes, or an entire cluster, are down.
  • Centralized Reporting – With Cross Cluster Replication, you can replicate data from many smaller clusters to a centralized reporting cluster. This is useful when querying across a large network would be inefficient.

How to Set Up Cross Cluster Elasticsearch Replication

The steps for setting up Cross Cluster Replication are as follows:

Step-1: Connect to Remote Cluster

First, to replicate an index from a remote cluster (say, cluster A) to a local cluster (say, cluster B), configure cluster A as a remote cluster on cluster B.

To configure a remote cluster from Stack Management in Kibana:

  1. Select Remote Clusters from the side navigation.
  2. Specify the Elasticsearch endpoint URL, or the IP address or host name, of the remote cluster (cluster A), followed by its transport port (9300 by default).
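The same configuration can also be applied through the cluster settings API instead of Kibana. The sketch below only builds the JSON body for `PUT /_cluster/settings` and prints it; it does not contact a cluster. The alias `cluster_a` and the address `cluster-a.example.com:9300` are placeholders chosen for this example, and the helper name is hypothetical.

```python
import json


def remote_cluster_settings(alias, seeds):
    """Build the body for PUT /_cluster/settings that registers a remote cluster.

    alias: the name the local cluster will use for the remote (placeholder here).
    seeds: "host:transport_port" addresses of nodes in the remote cluster.
    """
    if not seeds:
        raise ValueError("at least one seed address is required")
    return {"persistent": {"cluster": {"remote": {alias: {"seeds": list(seeds)}}}}}


# Run against cluster B (the local cluster); cluster A is the remote.
body = remote_cluster_settings("cluster_a", ["cluster-a.example.com:9300"])
print("PUT /_cluster/settings")
print(json.dumps(body, indent=2))
```

The printed body can be sent with any HTTP client or pasted into the Kibana Dev Tools console.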

Step-2: Enable Soft Deletes on Leader Indices

To replicate an index, soft deletes must be enabled on it when it is created. If an existing index does not have soft deletes enabled, reindex it and use the new index as the leader index. Soft deletes are enabled by default on indices created in Elasticsearch 7.0 and later.
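For an older index that was created without soft deletes, the reindexing approach might look roughly like the following. This sketch only builds the two requests (create the new index, then copy the documents) as plain data; it does not send them. The index names `orders` and `orders-v2` are hypothetical, and on versions where soft deletes are already the default the explicit setting can likely be omitted.

```python
def create_leader_index_request(index_name):
    """Request that creates a new index with soft deletes explicitly enabled."""
    body = {"settings": {"index": {"soft_deletes": {"enabled": True}}}}
    return ("PUT", f"/{index_name}", body)


def reindex_request(source_index, dest_index):
    """Request that copies documents from the old index into the new one."""
    body = {"source": {"index": source_index}, "dest": {"index": dest_index}}
    return ("POST", "/_reindex", body)


# Hypothetical names: "orders" is the old index, "orders-v2" the new leader.
steps = [
    create_leader_index_request("orders-v2"),
    reindex_request("orders", "orders-v2"),
]
for method, path, body in steps:
    print(method, path, body)
```

Once the reindex completes, `orders-v2` would be the index to follow from the other cluster.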

Step-3: Create a Follower Index to Replicate the Leader Index

The follower index follows the leader index and replicates its changes. To create a follower index:

  1. Select Cross-Cluster Replication from the side navigation, and choose the Follower Indices tab.
  2. Select the remote cluster that contains the leader index you want to replicate.
  3. Enter the name of the leader index and a name for the follower index.
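The Kibana steps above correspond to a single call to the CCR follow API on the local cluster. The following sketch builds that request without sending it. The names `orders`, `follower-orders`, and `cluster_a` are placeholders for this example, and `cluster_a` is assumed to be the alias given to the remote cluster in Step-1.

```python
def follow_request(follower_index, remote_cluster, leader_index):
    """Build the request that creates a follower index for a leader index.

    follower_index: name of the new index on the local cluster.
    remote_cluster: alias of the remote cluster registered earlier.
    leader_index:   name of the index to replicate from the remote cluster.
    """
    body = {"remote_cluster": remote_cluster, "leader_index": leader_index}
    return ("PUT", f"/{follower_index}/_ccr/follow", body)


method, path, body = follow_request("follower-orders", "cluster_a", "orders")
print(method, path)
print(body)
```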

Step-4: Create an Auto-follow Pattern to Replicate Time-series Indices

An auto-follow pattern automatically creates follower indices for new time-series indices as they appear on the remote cluster. It needs the remote cluster you want to replicate from and one or more index patterns that match the time-series indices.

To create an auto-follow pattern:

  1. Select Cross-Cluster Replication from the side navigation, and choose the Auto-follow patterns tab.
  2. Enter a name for the auto-follow pattern.
  3. Select the remote cluster that contains the indices.
  4. Enter one or more index patterns that identify the indices you want to replicate from the remote cluster.
  5. Use follower- as the prefix for follower indices so that replicated indices are easy to identify.

Once the setup is done, Elasticsearch automatically replicates the new indices matching the pattern to local follower indices.
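As a rough equivalent of the Kibana form, the sketch below builds the body for the auto-follow API and previews which leader index names a pattern would pick up. Nothing is sent to a cluster. The pattern name `logs-pattern`, the alias `cluster_a`, and the index names are assumptions for illustration, and the local preview uses Python's `fnmatch`, which only approximates how Elasticsearch matches wildcard patterns.

```python
from fnmatch import fnmatchcase


def auto_follow_request(pattern_name, remote_cluster, leader_patterns, prefix="follower-"):
    """Build the request that registers an auto-follow pattern."""
    body = {
        "remote_cluster": remote_cluster,
        "leader_index_patterns": list(leader_patterns),
        # {{leader_index}} is replaced by Elasticsearch with the leader's name.
        "follow_index_pattern": prefix + "{{leader_index}}",
    }
    return ("PUT", f"/_ccr/auto_follow/{pattern_name}", body)


def preview_followers(leader_indices, leader_patterns, prefix="follower-"):
    """Local preview only: matching leader names mapped to follower names."""
    return {
        name: prefix + name
        for name in leader_indices
        if any(fnmatchcase(name, pattern) for pattern in leader_patterns)
    }


method, path, body = auto_follow_request("logs-pattern", "cluster_a", ["logs-*"])
print(method, path)
print(body)
print(preview_followers(["logs-2024.01", "metrics-2024.01"], ["logs-*"]))
```

In this example only `logs-2024.01` matches the pattern, so it would be replicated as `follower-logs-2024.01`, while `metrics-2024.01` is left alone.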

What is Replication Storage?

Storage-Based Replication, or Replication Storage, copies data over a network to multiple storage locations. When the source storage location fails unexpectedly, users can still access the data in real time from one of the other locations. It improves the availability, accessibility, and retrieval speed of data, and it allows data to be replicated across products from multiple vendors.

Conclusion

In this article, we discussed Elasticsearch, an open-source tool that helps organizations store, search, and analyze data in real time, along with the benefits it offers to businesses. We also looked at what Cross Cluster Replication is and how to set it up, and introduced the concept of Replication Storage.

 
