Elasticsearch - a quick overview
Elasticsearch is an open-source, distributed search and analytics engine. Together with the rest of the Elastic Stack, it lets you gather, process, store, analyze and visualize large volumes of data in near real time.
Benefits of using Elasticsearch:
- Fast search: data is stored in inverted indices (through the Lucene library that ES is built on) and documents are usually denormalized, so queries avoid the joins that slow down relational databases.
- Easy scaling: the distributed architecture lets a cluster grow quickly by adding nodes, with replication for fault tolerance.
- Analytics: aggregations make ES popular for analysis as well as search, especially for log analysis.
- Official client libraries for many programming languages, on top of a REST API.
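Most of the search speed comes from the inverted index: instead of scanning every record, ES looks up which documents contain each term. Below is a toy Python sketch of that idea, assuming a simple lowercase-and-split tokenizer and in-memory Python sets. It only illustrates the concept; Lucene's real analyzers and on-disk posting lists are far more sophisticated.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Toy analyzer: lowercase and split on whitespace (real ES analyzers do much more).
    return text.lower().split()


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Map each term to the set of document IDs that contain it.
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search_all_terms(index: dict[str, set[int]], query: str) -> set[int]:
    # AND query: intersect the document sets of every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Disk full on node one",
    2: "Node two restarted",
    3: "disk error on node two",
}
index = build_inverted_index(docs)
print(sorted(search_all_terms(index, "node disk")))  # [1, 3]
```

A lookup touches only the entries for the query terms, which is why search time does not grow with a full scan of the data.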
How ES works:
The classic Elastic Stack (ELK) has three components. Logstash gathers and processes raw data. Elasticsearch then stores and indexes the filtered data, so you can run complex queries against the indexes and use aggregations to summarize the data. Finally, Kibana lets users visualize their data and build dashboards.
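To make "complex queries plus aggregations" concrete, here is a minimal sketch of an Elasticsearch Query DSL request body built in Python. The field names (message, @timestamp, level) are hypothetical and assume log documents where level is mapped as a keyword field; the body would be sent to the _search endpoint of an index, for example with one of the official clients.

```python
import json


def build_search_body(keyword: str, since: str) -> dict:
    """Query DSL body for a log search (hypothetical field names)."""
    return {
        "query": {
            "bool": {
                # Full-text match: contributes to the relevance score.
                "must": [{"match": {"message": keyword}}],
                # Filter context: yes/no matching, no scoring.
                "filter": [{"range": {"@timestamp": {"gte": since}}}],
            }
        },
        # Terms aggregation: count matching documents per log level.
        "aggs": {"by_level": {"terms": {"field": "level"}}},
        "size": 10,
    }


body = build_search_body("timeout", "now-1d")
print(json.dumps(body, indent=2))
```

One request returns both the top 10 matching documents and a per-level summary, which is the search-plus-analytics combination Kibana dashboards are built on.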
Key terms in ES:
Data is stored in ES as JSON documents.
Node: A single Elasticsearch server that stores data.
Cluster: A collection of one or more nodes. Nodes discover each other and join the cluster that has the same cluster name.
Index: A collection of documents with a similar structure; documents are stored in and read from an index.
Document: The basic unit of data in ES, a JSON object.
Shard: A subset of an index's documents. Splitting an index into shards lets it spread across multiple nodes when the data set is too large for a single node.
Replica: A copy of a primary shard, kept on a different node so the data survives a node failure. Replicas can also serve search requests.
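The sketch below ties these terms together: index settings that ask for 3 primary shards with 1 replica each, a sample JSON document, and a simplified version of how a document is routed to a shard (roughly hash(_routing) % number_of_primary_shards, where _routing defaults to the document ID). The field names are hypothetical, and zlib.crc32 stands in for the Murmur3 hash ES actually uses, so the shard numbers will not match a real cluster.

```python
import json
import zlib

# Index settings: 3 primary shards, each with 1 replica.
index_settings = {"settings": {"number_of_shards": 3, "number_of_replicas": 1}}

# A document is just a JSON object (hypothetical fields).
document = {"user": "alice", "message": "login failed", "level": "warn"}


def pick_shard(routing: str, num_primary_shards: int) -> int:
    # Simplified routing: hash(routing value) modulo the number of primaries.
    # crc32 is a stand-in for the Murmur3 hash used by ES.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


shards = index_settings["settings"]["number_of_shards"]
replicas = index_settings["settings"]["number_of_replicas"]
# Every primary has `replicas` copies: 3 * (1 + 1) = 6 shard copies in total.
total_copies = shards * (1 + replicas)

for doc_id in ["doc-1", "doc-2", "doc-3"]:
    print(doc_id, "-> shard", pick_shard(doc_id, shards))
print(json.dumps(document), "| total shard copies:", total_copies)
```

Because the shard is derived from the number of primaries, that number cannot simply be changed on an existing index (it needs a reindex or the split/shrink APIs), while the replica count can be adjusted at any time.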
Memory Requirements of ES:
Since ES is written in Java, it runs on the JVM (recent versions ship with a bundled JDK).
ES needs JVM heap memory to run, so the heap should be neither too large nor too small.
A larger heap can speed up searching and indexing, but it takes memory away from other applications and from the operating system's filesystem cache, which Lucene relies on. Too small a heap slows ES down and increases disk I/O overhead.
Best practice: allocate no more than 50% of RAM to the heap, and keep it below the roughly 32 GB threshold up to which the JVM can use compressed object pointers.
Other factors: JVM garbage collection settings, available RAM, and dataset size.
For best results, monitor heap usage and garbage collection metrics.
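The sizing rule above can be written as a small calculation. This is a rule-of-thumb sketch only: it assumes a conservative cap of 26 GB to stay under the compressed-oops threshold (the exact threshold varies by system), and the function names are made up for illustration. It prints the standard JVM flags -Xms (initial heap) and -Xmx (maximum heap) set to the same value, as ES expects in production.

```python
def recommended_heap_mb(total_ram_mb: int, cap_mb: int = 26 * 1024) -> int:
    # At most 50% of RAM, and never above the assumed compressed-oops cap.
    return min(total_ram_mb // 2, cap_mb)


def jvm_options(total_ram_mb: int) -> list[str]:
    # Initial and maximum heap set to the same value, so the heap
    # is not resized while ES is running.
    heap = recommended_heap_mb(total_ram_mb)
    return [f"-Xms{heap}m", f"-Xmx{heap}m"]


for ram_gb in (8, 16, 64):
    print(ram_gb, "GB RAM ->", jvm_options(ram_gb * 1024))
```

Past the cap, extra RAM is better left to the filesystem cache than added to the heap. Recent ES versions size the heap automatically by default; lines like these would only be needed to override that, typically in a file under config/jvm.options.d/.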
Caching:
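ES also uses memory for caching to improve performance. Some of the caches it uses are:
Field data cache: holds field values loaded into memory for sorting and aggregating on text fields
Node query cache: caches the results of queries used in filter context
Shard request cache: caches the local results of search requests on each shard, by default for requests with size 0 such as aggregations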
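Here is a minimal sketch of a search request shaped to benefit from these caches: an aggregation-only search (size 0, cached by the shard request cache by default, with request_cache=true making that explicit) whose query runs in filter context, which makes it eligible for the node query cache. The index name and the fields level and host.name are hypothetical and assume keyword mappings.

```python
from urllib.parse import urlencode


def cacheable_agg_request(index: str) -> tuple[str, dict]:
    """Path and body for an aggregation-only search (hypothetical fields)."""
    # request_cache=true explicitly enables the shard request cache for this request.
    path = f"/{index}/_search?" + urlencode({"request_cache": "true"})
    body = {
        # size 0: return only aggregations, no hits.
        "size": 0,
        "query": {
            "bool": {
                # Filter-context clause: eligible for the node query cache.
                "filter": [{"term": {"level": "error"}}]
            }
        },
        "aggs": {"errors_per_host": {"terms": {"field": "host.name"}}},
    }
    return path, body


path, body = cacheable_agg_request("logs")
print(path)  # /logs/_search?request_cache=true
```

Repeating this request on an unchanged index can be answered from the cache; cached shard results are invalidated automatically when the shard refreshes with new data.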
Use cases of ES:
Storing and analyzing logs, metrics and security events
Managing and integrating spatial information, using ES as a geographic information system (GIS)
Automating business workflows, using ES as a storage engine
Full-text search for e-commerce and enterprise search, as well as fraud detection and security
Scraping and combining public data
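As an example of the GIS and e-commerce use cases together, the sketch below builds a mapping with a geo_point field and a query that combines full-text matching with a geo_distance filter ("shops matching a keyword within 5 km"). The index layout and the field names name and location are hypothetical; geo queries only work on fields mapped as geo_point.

```python
import json

# Mapping: "location" must be a geo_point for geo queries (hypothetical fields).
shop_mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "location": {"type": "geo_point"},
        }
    }
}


def shops_near(keyword: str, lat: float, lon: float, radius: str = "5km") -> dict:
    return {
        "query": {
            "bool": {
                # Full-text part: relevance-scored match on the shop name.
                "must": [{"match": {"name": keyword}}],
                # Geo part: keep only shops within `radius` of the point.
                "filter": {
                    "geo_distance": {
                        "distance": radius,
                        "location": {"lat": lat, "lon": lon},
                    }
                },
            }
        }
    }


print(json.dumps(shops_near("coffee", 52.52, 13.405), indent=2))
```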
In conclusion, Elasticsearch is a powerful and versatile tool for modern data management and search applications. With its efficient indexing, fast search capabilities, and strong scalability, it helps developers and organizations deliver great user experiences and gain deeper insights from their data.
Written by Sajal Saha