Elasticsearch - a quick overview

Sajal Saha

Elasticsearch is an open-source search and analytics engine that lets you gather, process, store, analyze, and visualize large volumes of data in near real time.

Benefits of using Elasticsearch:

  1. Fast search: queries run against indexes built for text lookup, avoiding the join-heavy queries that can slow down a relational database.
  2. Scalability: the distributed architecture lets a cluster grow quickly by adding nodes, and replication keeps copies of the data on more than one node.
  3. Analytics: aggregations make ES a great fit for log analysis, a use that is arguably more popular than search itself.
  4. Client libraries for multiple programming languages.

How ES works:

The Elastic Stack has three main components. Logstash gathers and processes raw data. Elasticsearch then stores and indexes the filtered data, so we can run complex queries based on the indexes and use aggregations to summarize the data. Finally, Kibana lets users visualize their data and build dashboards.
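To make the "query plus aggregation" step more concrete, here is a minimal sketch of a search request body written as a Python dictionary. The `match` query, the `terms` aggregation, and `size` are standard parts of the Elasticsearch query DSL, but the field names `message` and `level.keyword` and the helper `build_log_search` are hypothetical, chosen only for illustration.

```python
import json


def build_log_search(term, level_field="level.keyword"):
    """Build a search body that matches `term` and counts hits per log level.

    The field names are assumptions for this sketch, not part of any real index.
    """
    return {
        "size": 0,  # return only the aggregation, not the matching documents
        "query": {"match": {"message": term}},  # full-text match on one field
        "aggs": {"per_level": {"terms": {"field": level_field}}},  # bucket counts
    }


body = build_log_search("timeout")
print(json.dumps(body, indent=2))
```

A body like this would typically be sent to the `_search` endpoint of an index, and a tool such as Kibana could chart the resulting buckets.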

Key terms of ES:

ES stores data as JSON documents.

  1. Node: A single server that stores data and is part of a cluster.

  2. Cluster: A collection of one or more nodes. Nodes discover one another and join the same cluster by sharing a cluster name.

  3. Index: A collection of documents with a similar structure. Documents are written to and read from an index.

  4. Document: The basic unit of data in ES, expressed as a JSON object.

  5. Shard: A subset of an index's documents. Splitting an index into shards lets it spread across multiple nodes when the data set is too large to fit on a single node.

  6. Replica: A copy of a shard, kept on a different node so the data stays available if a node fails.
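The relationship between documents, shards, and replicas can be sketched in a few lines. This is a toy model, not how Elasticsearch is implemented: `number_of_shards` and `number_of_replicas` are real index settings, but `pick_shard`, `total_shard_copies`, the document IDs, and the use of `crc32` as the hash are all assumptions made for the example.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (for example a document ID) to a primary shard number.

    crc32 is a stand-in chosen for this sketch; it is NOT the hash
    function Elasticsearch uses internally.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def total_shard_copies(num_primary_shards: int, num_replicas: int) -> int:
    """Each primary shard exists once, plus `num_replicas` copies of it."""
    return num_primary_shards * (1 + num_replicas)


# Index settings as they could appear in an index-creation request body.
index_settings = {"settings": {"number_of_shards": 3, "number_of_replicas": 1}}

primaries = index_settings["settings"]["number_of_shards"]
replicas = index_settings["settings"]["number_of_replicas"]

for doc_id in ["order-1", "order-2", "order-3", "order-4"]:  # hypothetical IDs
    print(doc_id, "-> shard", pick_shard(doc_id, primaries))

print("shard copies in the cluster:", total_shard_copies(primaries, replicas))
```

The point of hashing is that the same document always lands on the same shard, while different documents spread out across the shards, and therefore across the nodes that hold them.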

Memory Requirements of ES:

  • ES is written in Java, so it needs a Java Virtual Machine (JVM) to run.

  • ES runs in JVM heap memory, so the heap should be neither too large nor too small.

  • A larger heap gives searching and indexing more room, but an oversized heap takes memory away from other running applications and from the operating system's filesystem cache. An undersized heap slows ES down and increases disk I/O overhead.

  • A common best practice is to allocate no more than 50% of the available RAM to the heap.

  • Other factors to consider: JVM garbage collection settings, available RAM, and the size of the data set.

  • For best results, monitor heap usage and garbage collection metrics.
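The 50% rule of thumb is easy to turn into a small calculation. The sketch below is only a starting point under stated assumptions: the helper names are hypothetical, and the 26 GB upper cap is not part of the list above. It reflects the commonly cited advice to keep the heap below the JVM's compressed object pointer threshold, which varies by system. `-Xms` and `-Xmx` are the standard JVM flags for minimum and maximum heap size, set to the same value here.

```python
def suggest_heap_mb(total_ram_mb: int, cap_mb: int = 26 * 1024) -> int:
    """Return a heap size in MB: half of RAM, but never above `cap_mb`.

    The default cap is an assumption for this sketch, not a fixed rule.
    """
    return min(total_ram_mb // 2, cap_mb)


def jvm_heap_flags(heap_mb: int) -> list:
    """Format minimum and maximum heap flags with the same value."""
    return [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]


for ram_gb in (8, 16, 128):
    heap_mb = suggest_heap_mb(ram_gb * 1024)
    print(f"{ram_gb} GB RAM ->", " ".join(jvm_heap_flags(heap_mb)))
```

Treat the result as a first guess and adjust it based on the heap and garbage collection metrics you observe.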

Caching:

ES also uses memory for caching to improve performance. Some of the caches it uses are:

  • Field data cache

  • Node query cache

  • Shard request cache
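The general idea behind result caching can be shown with a small least-recently-used (LRU) cache. This is a generic sketch and not Elasticsearch's implementation: the `LRUCache` class, the query strings, and the fake "results" are all hypothetical, and the real caches are configured through ES settings rather than written by hand.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache: keeps at most `capacity` results in memory."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, compute):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = compute()  # the expensive work, e.g. running the query
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used
        return value


cache = LRUCache(capacity=2)
queries = ["status:500", "status:500", "status:404", "status:200", "status:500"]
for q in queries:
    cache.get_or_compute(q, lambda q=q: f"results for {q}")

print("hits:", cache.hits, "misses:", cache.misses)
```

Repeated requests are answered from memory, while rarely used entries are dropped once the cache is full, which is why cache size has to be weighed against the heap budget.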

Use cases of ES:

  • Storing and analyzing logs, metrics, and security events

  • Managing and integrating spatial information, using ES as a geographic information system (GIS)

  • Automating business workflows, with ES as the storage engine

  • Full-text search for e-commerce and enterprise search, as well as detecting fraud and security issues

  • Scraping and combining public data

In conclusion, Elasticsearch is a powerful and versatile tool for modern data management and search applications. With its efficient indexing, fast search capabilities, and strong scalability, it helps developers and organizations deliver great user experiences and gain deeper insights from their data.
