Learn Terminal Commands with Mainframe

Vinayak Nigam
11 min read

okaaaay let's go...

Now that I have successfully implemented the prototype of my idea, I think it's time for a blog about how and why I did it.

Originally this blog was meant for the Hashnode AI for Tomorrow Hackathon, but I only came to know about it in the last week before the deadline, and because of a ws and https security issue (described later) the terminal was not working on the deployed website, so I was not able to submit it on time. But since I had worked so hard on this project and given it so many hours, I sat down yesterday to fix that issue and actually release it for 2-3 months.

Caution: If you are reading this blog for the AI aspect, I have not described the process very much since you can very easily find it here. It will focus more on the web-based isolated terminal.

Visit learn terminal website

Click on the image above to visit the live application.

But why tho?

I actually came to know about the AI for Tomorrow hackathon quite late, only on this (27/07/24) Monday, and I had just a week to hack something together and bring it to fruition. I kept thinking about what I could make with AI that had actual real-life use cases and was also useful to society as a whole. Thankfully, I was able to find inspiration.

In this day of GUIs (Graphical User Interfaces), the need to learn commands is shrinking, but I firmly believe that we should still know how terminal commands work cause even though we can generate them using AI, trying them directly on our system is kinda iffy.

Now I had an idea, "To build an AI-powered chatbot for learning terminal commands", but following the trend of hitting myself in the balls (you will get it if you read my previous blogs), I thought that just giving a simple chatbot would not cut it since the user can already do that with a half-half screen setup. I wanted something more.

I wanted users to be able to try the commands themselves side by side as they learn, but I also knew that typing commands can be scary: the fear of messing up your machine or accidentally deleting something, fueled by nightmare-inducing Twitter stories.

I wanted to give users a safe and mistake-tolerant environment where they can try commands at ease without any worries, hence came the idea of isolated web-based terminals. But first, let's start with the AI.

What's another chatbot?

So we start with a simple chatbot... (don't worry, this section will be small cause the interesting thing is saved for later)

For the chatbot, we have all used ChatGPT so we all know what the interface should look like, right?
Well...yeah, you're not wrong in that sense, but LLMs, as the name suggests, are Large, so if we want them to do a specific task we have to give them more context.
That context goes into the first prompt we give, known by its more technical name, the "System Prompt".

The OpenAI Chat Completion API (used for creating a chatbot) is stateless, i.e. it doesn't know what happened before or after a request:

Ask and you shall receive.

So we have to build an array of messages, starting with the System Prompt, and pass it to the API on every request so that it knows the context and the previous messages.
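To make the "array of messages" idea concrete, here is a minimal sketch. The role/content shape follows the Chat Completions message format; the `SYSTEM_PROMPT` text and the `buildMessages` helper are made up for illustration and are not the ones Mainframe actually uses.

```typescript
// Message shape used by the Chat Completions API.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Hypothetical system prompt, only for illustration.
const SYSTEM_PROMPT =
  "You are a tutor that teaches Linux terminal commands step by step.";

// Because the API is stateless, every request carries the system prompt,
// the whole previous conversation, and the new user message.
function buildMessages(history: ChatMessage[], userInput: string): ChatMessage[] {
  // Drop any stray system messages so the prompt appears exactly once, first.
  const conversation = history.filter((m) => m.role !== "system");
  return [
    { role: "system", content: SYSTEM_PROMPT },
    ...conversation,
    { role: "user", content: userInput },
  ];
}
```

The returned array is what would be sent as the `messages` field of each request; the reply is then appended to the history before the next turn.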

Here the Vercel @ai-sdk helped me very much cause it gives a nice layer of abstraction over raw API calls and also has some functions built in, like regenerating a chat or stopping generation.

But even before all that, we need an OpenAI API Key to do any of this. So we have to take it as user input, but how do we check whether the user provided a valid key or not?
For this, I have a neat little trick which I also used in the classify emails app: call the API to list the models with that API key. It's essentially a free way to check the validity of the key.

However, I would say that since so many applications are built on the OpenAI API, OpenAI should create an SDK feature which lets users connect their account and choose which key they wanna use for an application, so that key management stays with OpenAI and is not a headache for the developer.
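Here is a rough sketch of that key check. It assumes the public `GET https://api.openai.com/v1/models` endpoint with a Bearer token, and treats a 200 response as "valid". The function names and the injectable `fetchImpl` parameter are my own for illustration (the injection just makes it easy to try without a real key); the actual code in the app may look different.

```typescript
// What we send to probe a key: the models list endpoint plus a Bearer header.
interface ProbeRequest {
  url: string;
  headers: Record<string, string>;
}

// The small slice of fetch this sketch needs, so a fake can be swapped in.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ status: number }>;

function buildProbeRequest(apiKey: string): ProbeRequest {
  return {
    url: "https://api.openai.com/v1/models",
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}

// Resolves to true only when the models endpoint answers 200 for this key.
// Any other status (for example 401 for a bad key) counts as "not valid".
async function isValidOpenAIKey(
  apiKey: string,
  fetchImpl: FetchLike = (url, init) => fetch(url, init),
): Promise<boolean> {
  const { url, headers } = buildProbeRequest(apiKey);
  const res = await fetchImpl(url, { headers });
  return res.status === 200;
}
```

Note that this simple version also reports "not valid" for rate limits or server errors, so a real UI would probably want to tell those cases apart.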

Now that we have the API Key, we have to create the UI for the chatbot. I personally used Shadcn for styling the components and I have to say, I became a fan. It gives full control over the components you are using, and they are also ready to use if you don't want to give it much thought.
The Vercel @ai-sdk also gives a useChat() function which returns everything you need to create a basic ChatGPT-like chatbot: the messages, a regenerate function, input state control, a function to append the user query to the messages, etc. You can use all of these to create a general chatbot.
However, to use useChat you have to create a route handler named chat for POST requests. There you can create an OpenAI instance using createOpenAI() and then use the streamText function to make the actual call to the OpenAI API, and you can use onFinish, a callback which runs when the streaming finishes. The really cool part about ai-sdk is that it supports text streaming quite easily. For a naive and basic implementation you can also check out my repo.

I personally feel like for a basic chat for this prototype, you can see the existing repos like Vercel ai chatbot to get a feel for how to do it.

Let me skip to the good part

Before reading this part, I would suggest you see this and this video to build some context and I am assuming that you know the basics of AWS EC2.

Everyone under the sun has made a chatbot and a GPT wrapper, so I wanted to do something new and fix the crux of the problem: why are people not just straight-up using ChatGPT for learning commands? The problem I identified is that an AI can hallucinate, and trying out AI-generated bash commands on our own machine is not a very bright idea if I say so myself.
People need a safe environment in which they can try out commands without the fear of messing up their system. I concluded that I have to provide users with a temporary command line in which they can try things out, and where they can create a new machine with the click of a button if they mess up the current one.

Now that I had the idea, I had to think about how to achieve it. Initially I had just one thought: I would have to use Docker or Kubernetes, since we are talking about a temporary, easy-to-throw-away machine without much overhead.
There were many puzzle pieces which I had to figure out for this, so let's take it from both ends, the frontend and the backend.
I researched a bit on how web-based terminals are made and how the UI and process work behind the scenes, and I came to know about Xterm.js, which is used to build browser-based terminals. Well, that solves everything, right?

No, not quite. You see, Xterm.js is the frontend component that powers many terminals including VS Code, Hyper and Theia, but it's just a hollow shell. Sure, it handles the appearance (which is a BIG and very complex thing), but for it to function we have to attach it to a bash process.
Fortunately, Xterm.js provides various add-ons to extend the base functionality of the emulator, one of which is AttachAddon, which takes a WebSocket and attaches it to the terminal, handling all the streaming in and out by itself. (I will preface all this by saying that the whole terminal thing is a huge security risk and should be implemented with security in mind.)

So for my backend, I should give it a user ID and it should give me a WebSocket link which can be used directly to connect to the Docker container.

Preparing for Backend

I realized that the first thing I needed to do was create a Dockerfile to specify the type of Docker Image I wanted. A Dockerfile builds up in layers, so for my purpose I used the official Ubuntu latest Docker Image as a base. I then added basic packages and utilities, exposed port 3000 for later mapping, and set bash as the command that runs when the container starts.

After that, I built a Docker Image using that Dockerfile and uploaded it to Docker Hub.

Then I had to rent a VPS (Virtual Private Server) for the backend, where all the Docker stuff would happen. I was leaning towards DigitalOcean since I get $200 through GitHub, but I didn't have a credit card so I couldn't complete the sign-up. In the end, I came back to my old friend AWS and rented an EC2 Instance.

I created a new security group for this instance which allowed inbound connections on ports 22(SSH), 80(HTTP), 443(HTTPS) and any custom port which you wanna expose.

What is happening BTS?

My basic idea at first was to create an Express server with 2 endpoints, one for starting a terminal and one for removing it, along with WebSocket functionality. For initial security, and because I didn't wanna open tons of ports in the EC2 security group, I set up Nginx as a reverse proxy so that only 2 ports (80/443) are used for communication.

This is something I learned later at 2 am, 2 hrs before my hackathon deadline: you cannot use a ws:// link from a page served over https://. Everything was working fine on localhost, but my website is deployed with HTTPS, and the browser blocks insecure WebSocket connections from a secure page. You have to use wss:// with https://, so I also had to set up an SSL certificate in my Nginx configuration so that wss works. This part was a little difficult when I was first building it because I couldn't figure out how to do it properly, but later I found out that it's quite easy.

PS: I did not enjoy knowing this information 4 hrs before deadline at 12 AM
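The ws/wss rule is simple enough to put in a tiny helper. This is just a sketch: `toWebSocketUrl` and the example host and path are hypothetical names, not taken from the Mainframe code.

```typescript
// Pick the WebSocket scheme that matches the page's scheme:
// pages served over https: must use wss:, plain http: pages can use ws:.
function toWebSocketUrl(pageProtocol: string, host: string, path: string): string {
  const scheme = pageProtocol === "https:" ? "wss:" : "ws:";
  return `${scheme}//${host}${path}`;
}

// In the browser this would typically be called as
//   toWebSocketUrl(window.location.protocol, "terminal.example.com", "/terminal/user-1")
console.log(toWebSocketUrl("https:", "terminal.example.com", "/terminal/user-1"));
// prints "wss://terminal.example.com/terminal/user-1"
```

Deriving the scheme from the page keeps localhost (http + ws) and the deployed site (https + wss) working from the same code.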

Let's take it step by step (in very brief; I would recommend you to read the blogs below for details):

## These are just the short instructions.
sudo apt update
sudo apt install nginx
# In case you have ufw enabled
sudo ufw allow 'Nginx Full' 

# Set up a server block as described in the blog below, saved as
# /etc/nginx/sites-available/example.com
server {
        listen 80;
        listen [::]:80;

        root /var/www/example.com/html;
        index index.html index.htm index.nginx-debian.html;

        server_name example.com;

        location / {
            proxy_pass http://localhost:3001;
            proxy_redirect off;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
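            # Needed for the WebSocket handshake: Nginx proxies with HTTP/1.0 by default
            proxy_http_version 1.1;
            # Forward the HTTP Upgrade request so WebSocket connections can be established
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";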
        }
}

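sudo ln -s /etc/nginx/sites-available/example.com /etc/nginx/sites-enabled/
# Check the configuration for syntax errors, then reload Nginx
sudo nginx -t && sudo systemctl reload nginx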

# Get a domain, or if you already have a domain which you use
# (maybe your portfolio one) you can just make a subdomain using an A record
# and point it to the Public IPv4 Address of your EC2 Instance
# ⚠️ The Public IP of your EC2 Instance changes after you stop and start it again
# After updating the A record, check whatsmydns.net to see if the DNS
# propagation has happened for that record on all servers

sudo snap install core; sudo snap refresh core
sudo snap install --classic certbot 
sudo certbot --nginx -d example.com

If you wanna read about all these commands in detail, go to these excellent blogs for more context:

How to install Nginx on Ubuntu-20-04

How to secure Nginx with Let's Encrypt on Ubuntu-22-04

Putting it all together

Apart from the Express server, you have Nginx acting as a reverse proxy, since there will be many containers and it also has to handle WebSocket connections. Essentially you run your server on localhost and every request to / is proxied to it.
You also have to create a WebSocket server with the noServer: true option, since you attach it to the HTTP server on which Express is running. Then you need to handle HTTP upgrade requests on that server, because every WebSocket connection starts as an HTTP request and switches protocol when the upgrade happens. Make sure you only upgrade requests that come in via the URL which /start-container returned. Finally, when a new WebSocket connection is established (the connection event handler is triggered), you find the container for that user and attach the container stream to the WebSocket, so input and output are streamed between the container and the WebSocket.
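The "only upgrade the right requests" check can be sketched as a small pure function. Everything here is an assumption for illustration: the `/terminal/<userId>` URL shape, the `resolveUpgrade` name and the `lookup` callback are hypothetical and not taken from the actual server. The wiring shown in the trailing comment follows the documented noServer pattern of the ws package.

```typescript
// Decide whether an incoming HTTP request may be upgraded to a WebSocket.
// Returns the container id to attach to, or null if the request must be rejected.
function resolveUpgrade(
  requestUrl: string | undefined,
  upgradeHeader: string | undefined,
  lookup: (userId: string) => string | undefined,
): string | null {
  // Only genuine WebSocket upgrade requests are considered.
  if (!requestUrl || (upgradeHeader ?? "").toLowerCase() !== "websocket") {
    return null;
  }
  try {
    // request.url is only a path, so a dummy base is needed for parsing.
    const { pathname } = new URL(requestUrl, "http://localhost");
    // Hypothetical URL shape handed out by /start-container.
    const userId = /^\/terminal\/([^/]+)$/.exec(pathname)?.[1];
    if (!userId) return null;
    // Reject users that have no running container.
    return lookup(decodeURIComponent(userId)) ?? null;
  } catch {
    return null; // malformed URL or bad percent-encoding
  }
}

// Wiring sketch with the ws package (kept as a comment, not executed here):
//   const wss = new WebSocketServer({ noServer: true });
//   server.on("upgrade", (req, socket, head) => {
//     const id = resolveUpgrade(req.url, req.headers.upgrade, findContainer);
//     if (!id) { socket.destroy(); return; }
//     wss.handleUpgrade(req, socket, head, (ws) => wss.emit("connection", ws, req));
//   });
```

Keeping the decision in a pure function makes the security-sensitive part easy to test without starting a server; `findContainer` in the comment stands for whatever lookup the server uses.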

Express handles the /start-container and /stop-container routes to handle the container creation and removal process.
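The bookkeeping behind those two routes could look roughly like this. It is only a sketch under assumptions: `ContainerRuntime` is a hypothetical interface standing in for whatever actually talks to Docker, and the `/terminal/<userId>` path and the function names are invented for the example.

```typescript
// Hypothetical abstraction over Docker, so the bookkeeping can be shown
// (and tested) without a real Docker daemon.
interface ContainerRuntime {
  create(image: string): Promise<string>; // resolves to the new container id
  remove(containerId: string): Promise<void>;
}

function createTerminalRegistry(runtime: ContainerRuntime, image: string) {
  // One container per user id, kept in memory only.
  const containerByUser = new Map<string, string>();

  return {
    // What a /start-container handler might call: reuse or create the
    // user's container and hand back the WebSocket path for it.
    async start(userId: string): Promise<string> {
      if (!containerByUser.has(userId)) {
        containerByUser.set(userId, await runtime.create(image));
      }
      return `/terminal/${encodeURIComponent(userId)}`;
    },

    // What a /stop-container handler might call: remove the container.
    // Resolves to false when the user had no container.
    async stop(userId: string): Promise<boolean> {
      const containerId = containerByUser.get(userId);
      if (containerId === undefined) return false;
      await runtime.remove(containerId);
      containerByUser.delete(userId);
      return true;
    },

    // Used when a WebSocket connects, to find the stream to attach.
    containerFor(userId: string): string | undefined {
      return containerByUser.get(userId);
    },
  };
}
```

This version does not guard against two simultaneous start calls for the same user, and it forgets everything on restart, so it is hackathon-grade at best.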

This is the general architecture of the system which was built for the server

Now, your script may hit an error and crash, and you cannot SSH into your EC2 instance each time to restart it. For that, you can use pm2, which starts your script and restarts it if it crashes.

PM2 is a daemon process manager that will help you manage and keep your application online 24/7

And after 6+1 sleepless days of constant toil, Mainframe was born. There are still many security implications and features which I have not taken into account, since this was built for a hackathon, but I am thinking of rebuilding it to be production-grade and maybe writing about it.

That's about it, final thoughts: it was pretty rough but hella enjoyable!

Follow me on Twitter for regular updates and thank you for reading.
