Create a knowledge base using Amazon OpenSearch
So, let’s get straight to the point. We are going to build a knowledge base using Amazon OpenSearch, Lambda, and API Gateway on AWS. 🎇
What is a knowledge base?
According to Atlassian, it is a repository or collection of documents representing all the relevant information about a certain topic. A knowledge base can be composed of documents, tutorials, FAQs, and any other content about the topic. Companies use the knowledge base to train new joiners after onboarding, store RCA reports for future issue resolution, store architecture diagrams of various systems, and disseminate legal or HR information (🙁).
From a developer’s perspective, a knowledge base can serve several purposes:
- Get the new joiners up to speed with various system setup steps (Say it three times! 😎).
- Keep track of all RCA and issue resolution documents for future reference.
- Manage CR documents all in one place.
- and many more…
What you need to complete this tutorial:
- An AWS account.
- AWS CLI installed and at least one profile configured with access-key-id and secret-access-key.
- Familiarity with SAM (Serverless Application Model).
Two ways you can follow this tutorial:
1. Clone this repository on your local machine and then deploy the stack using:
$> sam build
$> sam deploy --guided
2. Set up a local SAM project from scratch and then update the code as we go through it.
To better understand how the knowledge base is implemented, we will take route 2 and explain everything from scratch.
So, let’s begin! 🔥
Project setup:
1. First, open a terminal and create a simple SAM project using this command.
$> sam init
# sam init asks a series of questions.
# The answers used in this tutorial are listed below.
# The value at the end of each line is the choice.
Which template source would you like to use? 1
Choose an AWS Quick Start application template 1
Use the most popular runtime and package type? (Python and zip) y
Would you like to enable X-Ray tracing on the function(s) in your application? n
Would you like to enable monitoring using CloudWatch Application Insights? n
Would you like to set Structured Logging in JSON format on your Lambda functions? n
Project name [sam-app] opensearch-kb-app
2. Change the current directory to the directory of the project that we have just created.
$> cd opensearch-kb-app
3. Now that the project is set up, we can modify it to create our knowledge base. For that, open the template.yaml file in your favorite code editor and delete the content. Delete the /hello_world directory as well. 💯
In the following steps, we will add blocks of code to the template.yaml file and explain what they do. Simply add them in sequence. I will provide a reference file at the end so that you can check it against your final version.
4. First, we add a description of what this template is going to deploy on AWS.
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  Opensearch Knowledge Base App
  Sample SAM Template for Opensearch Knowledge Base App
5. Next, in the Globals section, we specify python3.10 as the runtime for all the Lambda functions and set their timeout to 20 seconds. We also create an environment variable called STAGE_NAME so that it can be accessed from every Lambda function. The !Ref function returns the value of the StageName parameter, which we define in the next step.
Globals:
  Function:
    Timeout: 20
    Runtime: python3.10
    Environment:
      Variables:
        STAGE_NAME: !Ref StageName
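To make the environment variable concrete, here is a minimal sketch of how a Lambda function might read STAGE_NAME at runtime. The helper name get_stage_name and the "dev" fallback are my own assumptions for illustration, not code from the tutorial's repository.

```python
import os


def get_stage_name(default: str = "dev") -> str:
    """Return the stage injected through the Globals section."""
    # STAGE_NAME is set by the template; the fallback only helps local runs.
    return os.environ.get("STAGE_NAME", default)


def lambda_handler(event, context):
    # Hypothetical handler that simply echoes the stage it is running in.
    return {"statusCode": 200, "body": get_stage_name()}
```

Because the variable lives in Globals, every function in the template sees the same value without repeating the declaration.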
6. Now, we define the parameters that will be used throughout the template. AllPrefix and StageName are used in resource names to identify the resources as part of this stack and to avoid name collisions.
Parameters:
  AllPrefix:
    Type: String
    Default: 'knowledge-base'
  StageName:
    Type: String
    Default: 'dev'
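If you are wondering what names these parameters produce, the sketch below mimics the substitution locally with Python's string.Template. It is only an approximation of what !Sub does at deploy time; the patterns are the ones used for the secret and the domain later in this template, and resolve_name is a hypothetical helper.

```python
from string import Template

# Default values copied from the Parameters section.
DEFAULTS = {"AllPrefix": "knowledge-base", "StageName": "dev"}


def resolve_name(pattern: str, **overrides: str) -> str:
    """Fill ${AllPrefix} and ${StageName} the way !Sub would for these patterns."""
    params = {**DEFAULTS, **overrides}
    return Template(pattern).substitute(params)


print(resolve_name("${AllPrefix}-os-${StageName}"))
# prints "knowledge-base-os-dev"
print(resolve_name("${AllPrefix}-os-secret-${StageName}", StageName="prod"))
# prints "knowledge-base-os-secret-prod"
```

Deploying the same template with a different StageName therefore gives you a separate set of resource names.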
The following code snippets will be added under the Resources section.
7. First, we create a Role that will be used by our Lambda functions. The AssumeRolePolicyDocument allows the Lambda service to assume this role, and the Policies section lists the actions the functions may then perform. The ‘lambda:InvokeFunction’ action lets code running under this role invoke Lambda functions. It is not what allows API Gateway to call them: SAM adds that permission to each function automatically for every Api event. The role also allows the Lambda functions to interact with the Secrets Manager and OpenSearch services, for example through the boto3 library. We will see examples of it later.
Resources:
  # Main Role
  CustomMainRole:
    Type: AWS::IAM::Role
    Properties:
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - 'lambda.amazonaws.com'
            Action:
              - 'sts:AssumeRole'
      Policies:
        - PolicyName: 'CustomLambdaPolicy'
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - 'lambda:InvokeFunction'
                Resource: '*'
              - Effect: Allow
                Action:
                  - 'secretsmanager:*'
                Resource: '*'
              - Effect: Allow
                Action:
                  - 'es:*'
                Resource: '*'
8. Now, we need some form of authentication to access the OpenSearch domain that we will create shortly. For this tutorial, we will use “master user and password” authentication. Storing the username and password in the template file is not secure, so we keep them in AWS Secrets Manager and point the OpenSearch domain at that secret. The same credentials are used to log in to the OpenSearch dashboard. As you can probably decipher from the definition below, the “username” is set to “admin” and the “password” is auto-generated: 25 characters long, excluding the characters ", @, / and \.
  OpenSearchSecret:
    Type: AWS::SecretsManager::Secret
    Properties:
      Name: !Sub '${AllPrefix}-os-secret-${StageName}'
      Description: 'Password will be generated dynamically'
      GenerateSecretString:
        SecretStringTemplate: !Sub '{"username": "admin"}'
        GenerateStringKey: 'password'
        PasswordLength: 25
        ExcludeCharacters: '"@/\'
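The secret ends up as a small JSON string with "username" and "password" keys, so a Lambda function can read it back with a few lines. This is a minimal sketch, assuming the caller passes in a boto3 Secrets Manager client; load_credentials is a hypothetical helper, not the code from the next post.

```python
import json


def load_credentials(secrets_client, secret_id: str) -> tuple[str, str]:
    """Fetch the secret and return (username, password).

    secrets_client is assumed to behave like boto3.client("secretsmanager").
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    # SecretString holds the JSON built from SecretStringTemplate plus the
    # generated "password" key.
    secret = json.loads(response["SecretString"])
    return secret["username"], secret["password"]
```

Inside a function you would call it roughly as load_credentials(boto3.client("secretsmanager"), os.environ["OPEN_SEARCH_SECRET"]), where OPEN_SEARCH_SECRET is the environment variable we define on each function later in the template.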
9. The big guy now! Our OpenSearch service. Don’t let the alien-looking configuration below scare you 😱. The gist is this: we deploy an OpenSearch cluster with a single instance of type t3.small.search. The access policy is open to any principal, so the username and password enforced through AdvancedSecurityOptions are what guard the domain. The MasterUserOptions property sets the master username and password, and both values are resolved from the Secrets Manager resource created above.
  OpenSearchServiceDomain:
    Type: AWS::OpenSearchService::Domain
    Properties:
      DomainName: !Sub '${AllPrefix}-os-${StageName}'
      EngineVersion: 'OpenSearch_2.11'
      AccessPolicies:
        Version: '2012-10-17'
        Statement:
          - Effect: 'Allow'
            Principal:
              AWS: '*'
            Action: 'es:*'
            Resource: '*'
      ClusterConfig:
        InstanceCount: 1
        ZoneAwarenessEnabled: false
        InstanceType: 't3.small.search'
      NodeToNodeEncryptionOptions:
        Enabled: true
      EncryptionAtRestOptions:
        Enabled: true
      EBSOptions:
        EBSEnabled: true
        Iops: '0'
        VolumeSize: 15
        VolumeType: 'gp2'
      DomainEndpointOptions:
        EnforceHTTPS: true
      AdvancedSecurityOptions:
        Enabled: true
        InternalUserDatabaseEnabled: true
        MasterUserOptions:
          MasterUserName: !Join [ '', [ '{{resolve:secretsmanager:', !Ref OpenSearchSecret, ':SecretString:username}}' ] ]
          MasterUserPassword: !Join [ '', [ '{{resolve:secretsmanager:', !Ref OpenSearchSecret, ':SecretString:password}}' ] ]
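The two !Join expressions look scarier than they are. Each one glues three pieces into a single dynamic reference string, which CloudFormation then swaps for the real secret value at deploy time. The sketch below only reproduces the string assembly; dynamic_reference is a hypothetical helper and the ARN is a made-up placeholder.

```python
def dynamic_reference(secret_id: str, json_key: str) -> str:
    """Build the string that !Join assembles for MasterUserName/MasterUserPassword."""
    # Same three pieces as in the template, plus the JSON key to extract.
    parts = ["{{resolve:secretsmanager:", secret_id, ":SecretString:", json_key, "}}"]
    return "".join(parts)


# Placeholder ARN for illustration only; !Ref OpenSearchSecret supplies the real one.
example_arn = "arn:aws:secretsmanager:us-east-1:123456789012:secret:example"
print(dynamic_reference(example_arn, "username"))
```

The benefit of this indirection is that the password never appears in the template or in the command you type to deploy it.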
10. We need to have some APIs to handle users’ requests for various CRUD operations involving OpenSearch, e.g. to index documents, update the indexed documents, and search for documents given a query. We will use API Gateway to manage these APIs.
  DefaultApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: !Ref StageName
      GatewayResponses:
        DEFAULT_4XX:
          ResponseParameters:
            Headers:
              Access-Control-Allow-Origin: "'*'"
              Access-Control-Allow-Headers: "'*'"
        DEFAULT_5XX:
          ResponseParameters:
            Headers:
              Access-Control-Allow-Origin: "'*'"
              Access-Control-Allow-Headers: "'*'"
      Cors:
        AllowMethods: "'*'"
        AllowHeaders: "'*'"
        AllowOrigin: "'*'"
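One thing to keep in mind: the Cors block takes care of the preflight OPTIONS requests and GatewayResponses covers errors generated by API Gateway itself, but with the Lambda proxy integration that SAM sets up, the functions still have to return the CORS headers on their own responses. A minimal sketch of such a response helper, assuming the same permissive values as the template (build_response is a hypothetical name):

```python
import json

# Mirrors the permissive values used in the template above.
CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Headers": "*",
}


def build_response(status_code: int, payload: dict) -> dict:
    """Return a proxy-integration response that carries the CORS headers."""
    return {
        "statusCode": status_code,
        "headers": dict(CORS_HEADERS),  # copy so callers cannot mutate the constant
        "body": json.dumps(payload),
    }
```

Routing every handler's return value through one helper like this keeps the headers consistent across all four APIs.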
Now that we have configured all the necessary resources, we will create four APIs to handle users’ requests, namely create-document, update-document, get-document, and finally search-documents.
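All these Lambdas follow a common pattern. CodeUri points to the lambdas/ directory and Handler names the file and function to run. We assign the role CustomMainRole that was created earlier. We define two environment variables: one holds the ARN of the Secrets Manager secret (which is what !Ref returns for a secret) and the other holds the OpenSearch domain endpoint. Note the Path and Method properties for each API. We will need them later.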
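To show how these two environment variables might come together inside a function, here is a hedged sketch that builds the base URL of the domain and an HTTP Basic Authorization header. It assumes the DomainEndpoint attribute is a bare hostname without a scheme and that the domain accepts basic auth for the master user; both helper names are hypothetical.

```python
import base64
import os


def opensearch_base_url() -> str:
    """Prefix the hostname from the environment with https://."""
    # The template enforces HTTPS, so plain http would be rejected.
    return "https://" + os.environ["OPEN_SEARCH_DOMAIN_ENDPOINT"]


def basic_auth_header(username: str, password: str) -> dict:
    """Build an Authorization header for HTTP basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

The username and password would come from the secret referenced by OPEN_SEARCH_SECRET, never from hard-coded values.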
11. Create Document API
  CreateDocumentFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: lambdas/
      Handler: create_doc.lambda_handler
      Role: !GetAtt CustomMainRole.Arn
      Environment:
        Variables:
          OPEN_SEARCH_SECRET: !Ref OpenSearchSecret
          OPEN_SEARCH_DOMAIN_ENDPOINT: !GetAtt OpenSearchServiceDomain.DomainEndpoint
      Events:
        PingRootEvent:
          Type: Api
          Properties:
            Path: /{index_name}/kb-docs
            Method: post
            RestApiId: !Ref DefaultApi
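The {index_name} placeholder in the Path reaches the function through the event's pathParameters, and the posted document arrives as a JSON string in body. A minimal sketch of pulling both out, assuming the standard API Gateway proxy event shape (parse_create_request is a hypothetical helper; the real handler comes in the next post):

```python
import json


def parse_create_request(event: dict) -> tuple[str, dict]:
    """Extract the target index and the document from a proxy event."""
    index_name = event["pathParameters"]["index_name"]
    # body is a JSON string, or None when the request has no payload.
    document = json.loads(event.get("body") or "{}")
    return index_name, document
```

For example, a POST to /rca/kb-docs with the body {"title": "Outage"} would yield ("rca", {"title": "Outage"}).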
12. Update Document API
  UpdateDocumentFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: lambdas/
      Handler: update_doc.lambda_handler
      Role: !GetAtt CustomMainRole.Arn
      Environment:
        Variables:
          OPEN_SEARCH_SECRET: !Ref OpenSearchSecret
          OPEN_SEARCH_DOMAIN_ENDPOINT: !GetAtt OpenSearchServiceDomain.DomainEndpoint
      Events:
        PingRootEvent:
          Type: Api
          Properties:
            Path: /{index_name}/kb-docs/{doc_id}
            Method: put
            RestApiId: !Ref DefaultApi
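As a rough idea of what the update function could do with {index_name} and {doc_id}, the sketch below turns the incoming event into a request for OpenSearch's _update endpoint, which expects the changed fields under a "doc" key. This is an assumption about one possible implementation, not the author's code, and build_update_request is a hypothetical helper.

```python
import json
from urllib.parse import quote


def build_update_request(event: dict) -> tuple[str, str, dict]:
    """Map a PUT on /{index_name}/kb-docs/{doc_id} to an OpenSearch partial update."""
    params = event["pathParameters"]
    # URL-encode both values so unusual characters cannot alter the path.
    index_name = quote(params["index_name"], safe="")
    doc_id = quote(params["doc_id"], safe="")
    path = f"/{index_name}/_update/{doc_id}"
    # Partial updates wrap the changed fields in a "doc" object.
    body = {"doc": json.loads(event.get("body") or "{}")}
    return "POST", path, body
```

Note that the public API uses PUT while the request sent to OpenSearch in this sketch is a POST; the two do not have to match.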
13. Get Document API
  GetDocumentFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: lambdas/
      Handler: get_doc.lambda_handler
      Role: !GetAtt CustomMainRole.Arn
      Environment:
        Variables:
          OPEN_SEARCH_SECRET: !Ref OpenSearchSecret
          OPEN_SEARCH_DOMAIN_ENDPOINT: !GetAtt OpenSearchServiceDomain.DomainEndpoint
      Events:
        PingRootEvent:
          Type: Api
          Properties:
            Path: /{index_name}/kb-docs/{doc_id}
            Method: get
            RestApiId: !Ref DefaultApi
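When OpenSearch is asked for a document by ID, its reply includes a found flag and, if the document exists, the stored fields under _source. Here is a minimal sketch of translating that reply into an API response; the output shape ("id" and "document" keys) is my own assumption, and to_api_response is a hypothetical helper.

```python
import json


def to_api_response(os_response: dict) -> dict:
    """Convert an OpenSearch get-document reply into a proxy response."""
    if not os_response.get("found"):
        # Missing documents become a plain 404 for the caller.
        return {"statusCode": 404, "body": json.dumps({"message": "Document not found"})}
    payload = {"id": os_response["_id"], "document": os_response["_source"]}
    return {"statusCode": 200, "body": json.dumps(payload)}
```

Returning only the ID and the source keeps OpenSearch's internal metadata out of the public API.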
14. Search Documents API
  SearchDocumentsFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: lambdas/
      Handler: search_docs.lambda_handler
      Role: !GetAtt CustomMainRole.Arn
      Environment:
        Variables:
          OPEN_SEARCH_SECRET: !Ref OpenSearchSecret
          OPEN_SEARCH_DOMAIN_ENDPOINT: !GetAtt OpenSearchServiceDomain.DomainEndpoint
      Events:
        PingRootEvent:
          Type: Api
          Properties:
            Path: /{index_name}/kb-docs/search
            Method: post
            RestApiId: !Ref DefaultApi
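Search is a POST because the query travels in the request body. As an illustration of what the function might send to OpenSearch's _search endpoint, here is a sketch that builds a multi_match query. The field names "title" and "content" are assumptions; your documents may use different ones, and build_search_body is a hypothetical helper.

```python
def build_search_body(query_text: str, fields=("title", "content"), size: int = 10) -> dict:
    """Build a multi_match query body for the _search endpoint."""
    return {
        "size": size,  # maximum number of hits to return
        "query": {
            "multi_match": {
                "query": query_text,
                "fields": list(fields),  # hypothetical field names
            }
        },
    }
```

A call such as build_search_body("database outage", size=5) produces a body that looks for the phrase's terms in both fields and caps the result at five hits.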
15. Finally! We are at the end of this template file 🥳. We output the base URL of the API Gateway so that we can call the APIs created above by simply appending the {Path} part for each API to the base URL.
Outputs:
  ApiGatewayLambdaInvokeUrl:
    Value: !Sub 'https://${DefaultApi}.execute-api.${AWS::Region}.amazonaws.com/${StageName}'
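Appending a Path to the base URL is simple string work. The sketch below fills in the placeholders of a Path from the template and joins it to the base URL; build_url is a hypothetical helper and the API ID in the example is a made-up placeholder.

```python
def build_url(base_url: str, path_template: str, **path_params: str) -> str:
    """Join the stack output with one of the Path values from the template."""
    # Path values use {name} placeholders, which str.format understands.
    path = path_template.format(**path_params)
    return base_url.rstrip("/") + path


# "abc123" stands in for the real API ID printed after deployment.
base = "https://abc123.execute-api.us-east-1.amazonaws.com/dev"
print(build_url(base, "/{index_name}/kb-docs/{doc_id}", index_name="rca", doc_id="42"))
# prints "https://abc123.execute-api.us-east-1.amazonaws.com/dev/rca/kb-docs/42"
```

The same helper works for the create and search paths by passing only index_name.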
This concludes the configuration of our knowledge base template file. Now, did you mess anything up 🤔? No worries. Compare your version with this file and be merry.
In the next post, we will create the handling logic of our CRUD APIs and deploy the stack. See you then 🙏.
If you found this post useful, please give it a 👏🏽 and follow me on Medium. Let’s get connected on LinkedIn.
Read articles from Asadullah Al Galib directly inside your inbox. Subscribe to the newsletter, and don't miss out.