Elasticsearch database introduction and terminology

Bodheesh

  • Elasticsearch is a distributed database where data is stored as JSON documents

  • Elasticsearch is horizontally scalable, i.e., the database can run on multiple servers (nodes)

  • Elasticsearch supports many data types, such as text, numbers, geo-spatial data, IP addresses, etc.

  • Elasticsearch stores data in a data structure called an inverted index, which maps each term to the documents that contain it, so a search looks up the term directly instead of scanning every document. This keeps queries fast even over very large amounts of data

inverted_index_example1.png

inverted_index_example2.png
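
To make the inverted index idea concrete, here is a minimal Python sketch. It is only an illustration under simple assumptions (whitespace tokenization, lowercase terms, made-up sample documents); the index that Lucene builds inside Elasticsearch is far more elaborate.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents that contain every term of the query (AND search)."""
    term_hits = [set(index.get(term, [])) for term in query.lower().split()]
    if not term_hits:
        return []
    return sorted(set.intersection(*term_hits))


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Elasticsearch stores JSON documents",
    2: "Lucene builds an inverted index",
    3: "Elasticsearch uses Lucene",
}
index = build_inverted_index(docs)
print(index["lucene"])                        # [2, 3]
print(search(index, "elasticsearch lucene"))  # [3]
```

A query only touches the entries for its own terms, which is why lookup cost depends on the number of matching terms rather than on the total number of documents.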

Analogy to Relational database

  • An index is a collection of documents. It is like an RDBMS table.

RDBMS               Elasticsearch
Table               Index
Row                 JSON document
Columns in a row    Attributes of a JSON document

elasticsearch_rdms_mapping.png
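
As a small illustration of this mapping, the Python sketch below turns one table row into the JSON document that would represent it. The table name, columns and values are hypothetical.

```python
import json

# Hypothetical RDBMS table "employees": column names plus one row.
columns = ["id", "name", "department"]
row = (42, "Asha", "engineering")

# The same record expressed as a JSON document for an "employees" index:
# each column becomes an attribute of the document.
document = dict(zip(columns, row))
print(json.dumps(document))
# {"id": 42, "name": "Asha", "department": "engineering"}
```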

Nodes, Indexes and Shards

  • A node is a running instance of Elasticsearch, typically one per computer (server)

  • An Index is a logical group of one or more physical shards. Each shard is a Lucene index (a self-contained index)

elasticsearch_index.png

  • There are two types of shards: primaries and replicas. A replica shard is a copy of a primary shard; replicas provide redundancy and also serve read queries.

  • The shards, data and queries are distributed among nodes to facilitate availability and scalability in a multi-node (multiple servers) cluster. The shards and data are automatically re-balanced when a node is added or removed

elasticsearch_index_shards_nodes.png
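
How a document ends up on a particular shard can be pictured as hashing a routing value (by default the document id) and taking it modulo the number of primary shards. The Python sketch below is a simplified model of that idea, not Elasticsearch's actual routing code: `zlib.crc32` is a stand-in hash, and the shard count and document ids are hypothetical.

```python
import zlib

NUM_PRIMARY_SHARDS = 3  # hypothetical index setting


def shard_for(doc_id, num_primary_shards=NUM_PRIMARY_SHARDS):
    """Pick a primary shard from a hash of the document id.

    zlib.crc32 is a stand-in; Elasticsearch uses its own hash function.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


# Group some hypothetical document ids by the shard they map to.
placement = {}
for doc_id in ["user-1", "user-2", "user-3", "user-4", "user-5", "user-6"]:
    placement.setdefault(shard_for(doc_id), []).append(doc_id)

for shard, ids in sorted(placement.items()):
    print(f"shard {shard}: {ids}")
```

Because the result depends on the number of primary shards, the same id always maps to the same shard as long as that number does not change.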

Index Template

  • An index template is a set of settings applied to an index when it is created. It is like a blueprint for creating an index

  • An index template contains settings such as the number of shards and replicas, the data mapping, the template priority, etc.

  • The mapping in an index template defines the schema of the documents stored in the index. The mapping can be dynamic, so that the schema is derived from the data as it is ingested. Because fields are mapped and indexed at write time, this is also called schema on write. If the mapping's dynamic parameter is set to strict, the index rejects incoming documents that contain fields not defined in the mapping, much as an RDBMS enforces a table schema.

  • The following is an example console command to create an index template. It applies to new indices whose names match te* or bar*, and the component templates listed in composed_of must already exist

PUT _index_template/template_1
{
  "index_patterns": ["te*", "bar*"],
  "template": {
    "settings": {
      "number_of_shards": 1
    },
    "mappings": {
      "_source": {
        "enabled": true
      },
      "properties": {
        "host_name": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z yyyy"
        }
      }
    },
    "aliases": {
      "mydata": { }
    }
  },
  "priority": 500,
  "composed_of": ["component_template1", "runtime_component_template"], 
  "version": 3,
  "_meta": {
    "description": "my custom"
  }
}
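
The difference between a dynamic and a strict mapping can be sketched in a few lines of Python. This is a toy model, not Elasticsearch code: `index_document`, `StrictMappingError` and the type-guessing rule are hypothetical names and simplifications chosen for illustration.

```python
class StrictMappingError(Exception):
    """Raised when a document has a field that a strict mapping does not define."""


def index_document(mapping, doc, dynamic="true"):
    """Accept or reject doc against mapping, a dict of field name -> type name.

    dynamic="true": unknown fields are added to the mapping with a guessed type.
    dynamic="strict": unknown fields cause the document to be rejected.
    """
    for field, value in doc.items():
        if field in mapping:
            continue
        if dynamic == "strict":
            raise StrictMappingError(f"field [{field}] is not in the mapping")
        # Very rough type guess; real dynamic mapping rules are richer.
        mapping[field] = "long" if isinstance(value, int) else "text"
    return mapping


mapping = {"host_name": "keyword", "created_at": "date"}
doc = {"host_name": "web-01", "status_code": 200}

print(index_document(dict(mapping), doc, dynamic="true"))
# {'host_name': 'keyword', 'created_at': 'date', 'status_code': 'long'}

try:
    index_document(dict(mapping), doc, dynamic="strict")
except StrictMappingError as err:
    print("rejected:", err)  # rejected: field [status_code] is not in the mapping
```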

Index alias in Elasticsearch

elasticsearch_index_alias.png

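  • An index alias is a secondary name that points to one or more indices, so a group of indices can be addressed as one. Documents can be inserted through the alias; when the alias points to more than one index, only the index marked as the write index accepts the inserted documents

  • An alias can be specified to include all the indices following an index pattern (like mylogs-*). The following command creates an alias named “logs” that groups all indices starting with “logs-”. The pattern is resolved when the command runs, so matching indices created later are not added to the alias automatically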

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "logs-*",
        "alias": "logs"
      }
    }
  ]
}

  • Combining an index alias with index lifecycle management (ILM) automates rollover: when the write index reaches a threshold age or size, a new index is created and becomes the write index. The data is thereby split across multiple indices, which helps efficiency and tiered storage. Splitting data across multiple indices also lets queries run in parallel on the resources of a multi-node cluster
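
The interplay of an alias, its write index and rollover can be modelled with a short Python sketch. It is a toy simulation under stated assumptions: the `Alias` class is hypothetical, the threshold is a simple document count, and index names follow the common `-000001` numbering convention.

```python
class Alias:
    """Toy alias: a name pointing at several indices, one of them the write index."""

    def __init__(self, name, indices, write_index):
        self.name = name
        self.indices = {index: [] for index in indices}  # index name -> documents
        self.write_index = write_index

    def index(self, doc):
        # Writes through the alias always land in the current write index.
        self.indices[self.write_index].append(doc)

    def search(self):
        # Reads through the alias fan out to every index behind it.
        return [doc for docs in self.indices.values() for doc in docs]

    def rollover(self, max_docs):
        """Create the next index and make it the write index once max_docs is reached."""
        if len(self.indices[self.write_index]) < max_docs:
            return False
        prefix, number = self.write_index.rsplit("-", 1)
        new_index = f"{prefix}-{int(number) + 1:06d}"
        self.indices[new_index] = []
        self.write_index = new_index
        return True


logs = Alias("logs", ["logs-000001"], write_index="logs-000001")
for n in range(5):
    logs.index({"message": f"event {n}"})
    logs.rollover(max_docs=2)

print(sorted(logs.indices))  # ['logs-000001', 'logs-000002', 'logs-000003']
print(logs.write_index)      # logs-000003
print(len(logs.search()))    # 5
```

Clients keep using the single name "logs" throughout, while the data behind it is spread over several smaller indices.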

Data streams in Elasticsearch

  • A data stream is an abstraction on top of indices, designed for append-only time-series documents. Clients send their indexing and search requests to the data stream rather than to individual indices. The data stream stores its data in hidden, automatically created backing indices.

  • A new backing index is created when the configured index lifecycle policy thresholds (such as age or size) are reached. Data can be queried from all backing indices, but new documents are written only to the latest one.

elasticsearch_datastreams.png
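
A data stream's behaviour can be sketched as follows. This Python model is a simplification for illustration: the `DataStream` class is hypothetical, rollover is triggered by hand rather than by a lifecycle policy, and every backing index gets the same made-up creation date so that the generated names are easy to read.

```python
class DataStream:
    """Toy data stream: append-only writes into generation-numbered backing indices."""

    def __init__(self, name, date="2099.01.01"):
        self.name = name
        self.date = date  # hypothetical fixed creation date used in index names
        self.backing_indices = []  # list of (index name, documents), oldest first
        self.rollover()

    def rollover(self):
        # Each rollover adds a new backing index with the next generation number.
        generation = len(self.backing_indices) + 1
        index_name = f".ds-{self.name}-{self.date}-{generation:06d}"
        self.backing_indices.append((index_name, []))

    def append(self, doc):
        # Time-series documents must carry a timestamp.
        if "@timestamp" not in doc:
            raise ValueError("data stream documents need an @timestamp field")
        # New documents only ever go to the latest backing index.
        self.backing_indices[-1][1].append(doc)

    def search(self):
        # Searches cover all backing indices, old and new.
        return [doc for _, docs in self.backing_indices for doc in docs]


stream = DataStream("app-logs")
stream.append({"@timestamp": "2099-01-01T00:00:00Z", "message": "start"})
stream.rollover()
stream.append({"@timestamp": "2099-01-02T00:00:00Z", "message": "next"})

print([name for name, _ in stream.backing_indices])
# ['.ds-app-logs-2099.01.01-000001', '.ds-app-logs-2099.01.01-000002']
print(len(stream.search()))  # 2
```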
