Gemini API: Image Prompts Simplified

Tanmai Kamat

In the last blog, we discussed how to create a simple prompt API on ExpressJS with the help of Gemini. In this blog, we will extend it with APIs that send an image along with a prompt and return Gemini's response. There are multiple ways to send the image, so let's learn how to use Gemini to analyze an image and get a response.

Prerequisites

  1. Follow This Article: Create A Server Using Gemini

1. From Local File

The most basic API uses an image stored in a local file on the server.

1a. Utility Functions

Gemini needs image data in a specific format before it can process it. Before creating the API, let's create two utility functions:

  1. loadImageFromStorage : Loads the image from a file.

  2. bufferToGeminiData : Converts the image into the format used by Gemini.

Create a new file utils/image_utils.js and add the following functions:

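const fs = require('fs/promises');


// Read the file at `path` and resolve with its contents as a Buffer.
const loadImageFromStorage = async (path) => {
    const data = await fs.readFile(path);
    return data;
}


// Wrap a Buffer as an inline-data part: base64-encoded bytes plus the MIME type.
const bufferToGeminiData = (buffer, mime_type = 'image/jpeg') => {

    const data = {
        inlineData: {
            data: buffer.toString("base64"),
            mime_type: mime_type,
        }
    }

    return data;
}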

module.exports = {
    loadImageFromStorage,
    bufferToGeminiData
}
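
To make the target format concrete, here is a minimal sketch of what the converter produces for a small buffer. It re-declares bufferToGeminiData so the snippet runs on its own, and the sample bytes are a hypothetical stand-in (the first three bytes of a JPEG file), not a real image.

```javascript
// Same shape as the helper above, re-declared so this snippet runs standalone.
const bufferToGeminiData = (buffer, mime_type = 'image/jpeg') => ({
    inlineData: {
        data: buffer.toString('base64'),
        mime_type: mime_type,
    },
});

// Hypothetical sample: the first three bytes of a JPEG file (not a full image).
const sample = Buffer.from([0xff, 0xd8, 0xff]);
const part = bufferToGeminiData(sample);

// Decoding the base64 string gives back the original bytes.
const roundTrip = Buffer.from(part.inlineData.data, 'base64');
console.log(part.inlineData.mime_type, roundTrip.equals(sample));
```

The base64 string is plain text, which is why the image can travel inside a JSON request body.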

1b. Test Image in Assets

First, we need an image. You can use any existing image or download one using the Unsplash API: https://source.unsplash.com/random, and save it to assets/test.jpg.

This is the Image I got: Fish Image

1c. Testing the Functions (Optional)

Let's test whether we can load the image and convert it into the required format. Create a tests/image.test.js file.

const image_utils = require('../utils/image_utils');

describe('Image Utils', () => {
    it('Testing ', async () => {
        const data = await image_utils.loadImageFromStorage('assets/test.jpg');
        const buffer = Buffer.from(data);
        const mime_type = 'image/jpeg';
        const geminiData=image_utils.bufferToGeminiData(buffer,mime_type);
        expect(geminiData).toBeDefined();
        expect(geminiData).toEqual({ inlineData: { data: buffer.toString("base64"), mime_type: mime_type } });
    });


});

To run the test, we need the jest module. Since we call the jest command directly, install it globally:

npm install -g jest

Now you can test using the jest command:

PS ~> jest
 PASS  tests/image.test.js
  Image Utils
    √ Testing  (21 ms)                                                                                                                       

Test Suites: 1 passed, 1 total                                                                                                               
Tests:       1 passed, 1 total                                                                                                               
Snapshots:   0 total
Time:        0.519 s, estimated 1 s
Ran all test suites.

Thus, we have successfully tested the functions. Let's move on to creating the API.

1d. Creating the API

First, create a POST route and fill in what we already know. This time we will use the gemini-pro-vision model instead of gemini-pro, since we are dealing with images.

//imports
const image_utils=require('../utils/image_utils');


//Send a prompt together with an image stored on the Node server
router.post('/prompt-with-server-image',async(req,res)=>{
    const model = genAI.getGenerativeModel({ model: "gemini-pro-vision" });
    const prompt = req.body.prompt;
    //readFile already returns a Buffer, so no extra conversion is needed
    const imagebuffer=await image_utils.loadImageFromStorage("assets/test.jpg");
    const geminiImageData = image_utils.bufferToGeminiData(imagebuffer);

    //The prompt text and the image are sent as parts of one request
    const result = await model.generateContent([prompt, geminiImageData]);
    const response = await result.response;
    const text = response.text();
    console.log(text);
    res.status(200).json({answer:text});
})
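
The handler above does not catch failures such as a missing file or a rejected Gemini request. One possible guard, sketched here with a hypothetical safeHandler wrapper (it is not part of Express or the Gemini SDK, and the error message is an assumption), is to wrap the async handler so any thrown error becomes a JSON 500 response.

```javascript
// Hypothetical wrapper: turns a thrown error or rejected promise
// inside an async route handler into a JSON 500 response.
const safeHandler = (handler) => async (req, res) => {
    try {
        await handler(req, res);
    } catch (err) {
        console.error(err);
        res.status(500).json({ error: 'Failed to process the prompt' });
    }
};

// Usage (assumes the router from this article):
// router.post('/prompt-with-server-image', safeHandler(async (req, res) => {
//     ...same body as above...
// }));
```

Keeping the try/catch in one wrapper means each route body can stay as short as the one shown above.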

You are all set to test this API!

1e. Testing the API

Congratulations! You are now able to use images in the prompt! But this was tedious, right? Downloading the image, putting it in a folder, getting the path. What if we could use the image URL directly along with the prompt? That's what we will do next!

2. Using URL

The most convenient way to send an image for analysis is to send its URL. But the base Gemini Vision model is not connected to the Internet 😞😞. So what? Our server is! Let's see how we can use this to put the image into the prompt.

2a. Fetching Image from Internet

As before, Gemini needs the image as inline data, so the server has to download it first. Before creating the API, we need one more utility function:

  1. fetchImageFromUrl : Fetches the image from the Internet.

Add this function to utils/image_utils.js:

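// Download the resource at `url` and resolve with its bytes as a Buffer.
// Uses the global fetch available in Node 18 and later.
const fetchImageFromUrl = async (url) => {
    const response = await fetch(url);
    const data = Buffer.from(await response.arrayBuffer());
    return data;
}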

//make sure you update the module.exports as well
module.exports = {
    loadImageFromStorage,
    bufferToGeminiData,
    fetchImageFromUrl
}
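
The function above buffers whatever the URL returns, including error pages and very large files. A more defensive variant might look like the sketch below. The name fetchImageChecked, the MAX_BYTES limit, and the injectable fetchImpl parameter are all assumptions made for this example, not part of the original utilities.

```javascript
// Sketch: like fetchImageFromUrl, but rejects non-2xx responses and oversized bodies.
// MAX_BYTES is an arbitrary limit chosen for this example.
const MAX_BYTES = 4 * 1024 * 1024;

// fetchImpl defaults to the global fetch (Node 18+); passing it in makes testing easier.
const fetchImageChecked = async (url, fetchImpl = fetch) => {
    const response = await fetchImpl(url);
    if (!response.ok) {
        throw new Error(`Request failed with status ${response.status}`);
    }
    const buffer = Buffer.from(await response.arrayBuffer());
    if (buffer.length > MAX_BYTES) {
        throw new Error('Image is too large');
    }
    return buffer;
};
```

A route using this variant would need to catch the thrown errors and answer with a 4xx status.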

We will also need to determine the MIME type of the image. For that, install the package below. Version 14 is used here because it can be loaded with require.

npm install file-type@14

2b. Testing the Function (Optional)

Let's add more test cases to our image.test.js. The final test file will look like this:

const image_utils = require('../utils/image_utils');
const fileType = require('file-type');

describe('Image Utils', () => {

    it('Testing ', async () => {
        const data = await image_utils.loadImageFromStorage('assets/test.jpg');
        const buffer = Buffer.from(data);
        const mime_type = 'image/jpeg';
        const geminiData=image_utils.bufferToGeminiData(buffer,mime_type);
        expect(geminiData).toBeDefined();
        expect(geminiData).toEqual({ inlineData: { data: buffer.toString("base64"), mime_type: mime_type } });
    });

    it('File From URL Test ', async () => {
        const data = await image_utils.fetchImageFromUrl('https://www.google.com/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png')

        const type=await fileType.fromBuffer(data);
        expect(type).toBeDefined();
        expect(type.mime).toContain('image');// image/png or image/jpeg

    });
    it('File From URL Test Wrong Image Url ', async () => {
        const data = await image_utils.fetchImageFromUrl('https://www.google.com')

        const type=await fileType.fromBuffer(data);
        expect(type).toBeUndefined();


    });



});

You can try out various URLs. Run the tests again with:

jest

2c. Let's Create the API!

Let's move to our final step: integrating the functions to create a new API. Create a POST route in router/gemini.js:

//Old Code
router.post('/prompt-with-image-url',async(req,res)=>{
    const model = genAI.getGenerativeModel({ model: "gemini-pro-vision" });
    const prompt = req.body.prompt;
    const url=req.body.url;
    //Here we Get the image From URL and convert to Format required by Gemini
    const image_buffer=await image_utils.fetchImageFromUrl(url);
    //Type Verification code will come here
    const geminiImageData = image_utils.bufferToGeminiData(image_buffer);

    const result = await model.generateContent([prompt, geminiImageData]);
    const response = await result.response;
    const text = response.text();
    console.log(text);
    res.status(200).json({answer:text});
}) 

//module.exports....

We are almost done!!

Now, we also need to make sure that the buffer we received is actually an image and not anything else. We will use file-type for this. Update the code as follows:

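//At the top of the file: const fileType = require('file-type');

//Type Verification code will come here
    const type=await fileType.fromBuffer(image_buffer)
    //fromBuffer returns undefined when the format is not recognized
    if(type===undefined){
        res.status(400).json({error:"Invalid Image URL"});
        return;
    }
    const geminiImageData = image_utils.bufferToGeminiData(image_buffer,type.mime);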
//..
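
Note that file-type also recognizes non-image formats such as PDF or ZIP, so an undefined check alone would let those through. A stricter check could compare the detected MIME type against an allow-list, as in this sketch. The helper name isAllowedImage and the list itself are assumptions; check the Gemini documentation for the formats your model accepts.

```javascript
// Assumed allow-list; verify it against the Gemini documentation for your model.
const ALLOWED_MIME_TYPES = ['image/png', 'image/jpeg', 'image/webp', 'image/heic', 'image/heif'];

// `type` is the result of fileType.fromBuffer(): undefined, or an object like { ext, mime }.
const isAllowedImage = (type) =>
    type !== undefined && ALLOWED_MIME_TYPES.includes(type.mime);

console.log(isAllowedImage({ ext: 'png', mime: 'image/png' }));       // true
console.log(isAllowedImage({ ext: 'pdf', mime: 'application/pdf' })); // false
console.log(isAllowedImage(undefined));                               // false
```

In the route, the condition `type===undefined` could then be replaced with `!isAllowedImage(type)`.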

We have successfully created the API. Let's test it!

2d. Testing the API

Let's start with a Google link (https://www.google.com), which is a web page and not an image. As we can see, we got an error!

Now, let's try the image URL that I used earlier.

We can also test the API using the curl command:


curl -X POST -H "Content-Type: application/json" -d '{ "prompt":"Tell me about Yourself" ,"url":"https://www.google.com"}' http://127.0.0.1:8000/api/gemini/prompt-with-image-url
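
The same request can be sent from Node itself. The sketch below builds the request options for the built-in fetch (Node 18+). The helper name buildPromptRequest and the example image URL are hypothetical, and the port and path are taken from the curl command above.

```javascript
// Builds the same POST request as the curl command, for use with fetch.
const buildPromptRequest = (prompt, url) => ({
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt, url }),
});

// Usage (assumes the server from this article is running on port 8000):
// fetch('http://127.0.0.1:8000/api/gemini/prompt-with-image-url',
//     buildPromptRequest('Describe this image', 'https://example.com/photo.jpg'))
//     .then((res) => res.json())
//     .then((json) => console.log(json));
```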

We have finally completed the APIs that use the Gemini API with images as input!

Let's summarize:

  1. We began by crafting two utility functions for loading images and converting data into the format required by Gemini.

  2. Then we created an API that loads an image from the server and sends it with the prompt.

  3. We implemented another API that fetches images from URLs and uses them in prompts.

Stay tuned for more such blogs!
