A Comprehensive Guide to LinkedIn Scraping APIs
Hey there, LinkedIn explorers! Tired of the endless scrolling?
Gone are the days of endless profile browsing and manual copy-pasting. We're entering the era of the API, where data flows like espresso at a networking event (without the awkward silence, thank goodness).
This blog is your roadmap to the hidden world of LinkedIn scraping APIs. We'll unveil the top contenders, exploring their strengths, weaknesses, and use cases. Whether you're a seasoned data extractor seeking peak efficiency or a curious newcomer to the world of LinkedIn data, we'll equip you with the knowledge to choose the perfect API for your professional intelligence goals. All code examples in this post use Python.
1. ProxyCurl (Rating: 8/10)
Streamline LinkedIn Data Extraction with Proxycurl's API. Uncover valuable insights from profiles, jobs, and more with unparalleled speed and ease. Ditch the manual labor and empower your data-driven strategies with this high-performance tool. Explore Proxycurl today and unlock the potential of your LinkedIn data.
Here's how you can use it!
import requests
import time

api_key = 'Your ProxyCurl API_KEY'  # Put your API Key here
headers = {'Authorization': 'Bearer ' + api_key}
api_endpoint = 'https://nubela.co/proxycurl/api/v2/linkedin'
params = {
    'linkedin_profile_url': 'https://www.linkedin.com/in/pradipnichite/',
    'skills': 'include',
    'use_cache': 'if-recent',
    'fallback_to_cache': 'never',
}

start_time = time.time()
response = requests.get(api_endpoint, params=params, headers=headers)
end_time = time.time()
latency = end_time - start_time
print("Time taken:", latency)

data = response.json()
print(data)
Response:
{'public_identifier': 'pradipnichite',
'profile_pic_url': 'https://s3.us-west-000.backblazeb2.com/proxycurl/person/pradipnichite/profile?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=0004d7f56a0400b0000000001%2F20231221%2Fus-west-000%2Fs3%2Faws4_request&X-Amz-Date=20231221T062406Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=0032cb56e455b805568193d9712cb1547484fa04e7447b6cc5025cb45ba089c3',
'background_cover_image_url': 'https://s3.us-west-000.backblazeb2.com/proxycurl/person/pradipnichite/cover?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=0004d7f56a0400b0000000001%2F20231221%2Fus-west-000%2Fs3%2Faws4_request&X-Amz-Date=20231221T062406Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=5409bcd64031af889774383e4aed7d74fb85d048057d7f3a5d6ac75be6c88eef',
'first_name': 'Pradip',
'last_name': 'Nichite',
'full_name': 'Pradip Nichite',
'follower_count': 20901,
'occupation': 'Founder & Lead Data Scientist at FutureSmart AI',
'headline': 'Top Rated Plus - NLP Freelancer | Custom NLP Solutions | GPT-4 | AI Demos',
'summary': "๐ I'm a Top Rated Plus NLP freelancer on Upwork with over $100K in earnings and a 100% Job Success rate. This journey began in 2022 after years of enriching experience in the field of Data Science.\n\nhttps://www.upwork.com/freelancers/pradipnichite\n\n๐ Starting my career in 2013 as a Software Developer focusing on backend and API development, I soon pursued my interest in Data Science by earning my M.Tech in IT from IIIT Bangalore, specializing in Data Science (2016 - 2018).\n\n๐ผ Upon graduation, I carved out a path in the industry as a Data Scientist at MiQ (2018 - 2020) and later ascended to the role of Lead Data Scientist at Oracle (2020 - 2022).\n\n๐ Inspired by my freelancing success, I founded FutureSmart AI in September 2022. We provide custom AI solutions for clients using the latest models and techniques in NLP.\n\n๐ฅ In addition, I run AI Demos, a platform aimed at educating people about the latest AI tools through engaging video demonstrations.\n\n๐งฐ My technical toolbox encompasses:\n๐ง Languages: Python, JavaScript, SQL.\n๐งช ML Libraries: PyTorch, Transformers, LangChain.\n๐ Specialties: Semantic Search, Sentence Transformers, Vector Databases.\n๐ฅ๏ธ Web Frameworks: FastAPI, Streamlit, Anvil.\nโ๏ธ Other: AWS, AWS RDS, MySQL.\n\n๐ In the fast-evolving landscape of AI, FutureSmart AI and I stand at the forefront, delivering cutting-edge, custom NLP solutions to clients across various industries.\n\nUpwork Profile: https://www.upwork.com/freelancers/~014fdabc6436bf9bd4?viewMode=1"
...
Priced at $49 for 2,500 credits ($0.020 per credit) on the monthly plan, and with a latency of around 2,648ms, ProxyCurl extracts information such as full name, follower count, occupation, headline, summary, country, state, experiences, education, languages, certifications, and projects, among many others. However, it does not return the complete skills list, cannot scrape the target person's posts, and does not scrape data in real time.
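Once the JSON comes back, the fields listed above are plain dictionary keys on the response. Here is a minimal sketch of pulling a few of them out of the data dict from the snippet above; the key names follow the sample response and field list, and .get() is used so a missing field simply comes back as None rather than raising:
# Continuing from the ProxyCurl snippet above: `data` is response.json().
profile_summary = {
    "name": data.get("full_name"),
    "headline": data.get("headline"),
    "followers": data.get("follower_count"),
    "occupation": data.get("occupation"),
    "experience_count": len(data.get("experiences") or []),
    "education_count": len(data.get("education") or []),
}
print(profile_summary)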
2. Fresh LinkedIn Profile Data (Rating: 7.5/10)
Scrape valuable data from LinkedIn profiles like a pro with Fresh LinkedIn Profile Data API. This powerful tool unlocks a treasure trove of information, from contact details and skills to experience and education. Say goodbye to endless scrolling and hello to targeted data extraction, all with a simple API call.
Here's how you can use it!
import requests
import time

url = "https://fresh-linkedin-profile-data.p.rapidapi.com/get-linkedin-profile"
querystring = {
    "linkedin_url": "https://www.linkedin.com/in/pradipnichite/",
    "include_skills": "true",
}
headers = {
    "X-RapidAPI-Key": 'Your Rapid API_KEY',  # Put your API Key here
    "X-RapidAPI-Host": "fresh-linkedin-profile-data.p.rapidapi.com",
}

start_time = time.time()
response = requests.get(url, headers=headers, params=querystring)
end_time = time.time()
latency = end_time - start_time
print("Time taken:", latency)

data = response.json()
print(data)
Response:
{'data': {'about': "๐ I'm a Top Rated Plus NLP freelancer on Upwork with over $100K in earnings and a 100% Job Success rate. This journey began in 2022 after years of enriching experience in the field of Data Science.\n\nhttps://www.upwork.com/freelancers/pradipnichite\n\n๐ Starting my career in 2013 as a Software Developer focusing on backend and API development, I soon pursued my interest in Data Science by earning my M.Tech in IT from IIIT Bangalore, specializing in Data Science (2016 - 2018).\n\n๐ผ Upon graduation, I carved out a path in the industry as a Data Scientist at MiQ (2018 - 2020) and later ascended to the role of Lead Data Scientist at Oracle (2020 - 2022).\n\n๐ Inspired by my freelancing success, I founded FutureSmart AI in September 2022. We provide custom AI solutions for clients using the latest models and techniques in NLP.\n\n๐ฅ In addition, I run AI Demos, a platform aimed at educating people about the latest AI tools through engaging video demonstrations.\n\n๐งฐ My technical toolbox encompasses:\n๐ง Languages: Python, JavaScript, SQL.\n๐งช ML Libraries: PyTorch, Transformers, LangChain.\n๐ Specialties: Semantic Search, Sentence Transformers, Vector Databases.\n๐ฅ๏ธ Web Frameworks: FastAPI, Streamlit, Anvil.\nโ๏ธ Other: AWS, AWS RDS, MySQL.\n\n๐ In the fast-evolving landscape of AI, FutureSmart AI and I stand at the forefront, delivering cutting-edge, custom NLP solutions to clients across various industries.\n\nUpwork Profile: https://www.upwork.com/freelancers/~014fdabc6436bf9bd4?viewMode=1",
'city': 'Mumbai',
'company': 'FutureSmart AI',
'company_domain': 'futuresmart.ai',
'company_employee_range': '1',
'company_industry': 'IT Services and IT Consulting',
'company_linkedin_url': 'https://www.linkedin.com/company/futuresmartai',
...
This API returns the company name, about section, education, past experience, and job titles, among other important fields, in the response body. The latency is around 4,031ms. Unlike many APIs, it can also list all of the profile's skills.
Priced at $45.00/month for the Pro plan (3,000 requests per month), it scrapes data in real time.
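The interesting fields sit one level down, under the top-level 'data' key of the response. A minimal sketch of reading them out of the data dict from the snippet above; key names follow the sample response, and since the exact shape of the skills field isn't shown in the excerpt, the sketch just prints whatever comes back:
# Continuing from the Fresh LinkedIn Profile Data snippet above.
profile = data.get("data", {})
print("Company:", profile.get("company"))
print("Industry:", profile.get("company_industry"))
print("City:", profile.get("city"))
# Requested via include_skills="true"; shape not shown in the excerpt, so print as-is.
print("Skills:", profile.get("skills"))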
3. Scraping-bot.io (Rating: 9/10)
Streamline your web data extraction with ScrapingBot. This efficient tool effortlessly retrieves the precise information you need from any website, eliminating time-consuming manual copying and complex coding. Focus on your core tasks while ScrapingBot handles the data retrieval seamlessly. Discover effortless web harvesting and bid farewell to tedious data collection methods.
Here's how you can use it!
import requests
import json
import time

username = 'Your USERNAME'  # Put your username here
apiKey = 'Your Scraping-bot.io API_KEY'  # Put your API Key here
scraper = 'linkedinProfile'
url = 'https://www.linkedin.com/in/pradipnichite/'
apiEndPoint = "http://api.scraping-bot.io/scrape/data-scraper"
apiEndPointResponse = "http://api.scraping-bot.io/scrape/data-scraper-response?"
payload = json.dumps({"url": url, "scraper": scraper})
headers = {
    'Content-Type': "application/json"
}

start_time = time.time()
response = requests.request("POST", apiEndPoint, data=payload, auth=(username, apiKey), headers=headers)
if response.status_code == 200:
    print(response.json())
    print(response.json()["responseId"])
    responseId = response.json()["responseId"]
    pending = True
    count = 0
    while pending:
        count += 1
        # Sleep 5s between polls: social-media scraping can take quite long to complete,
        # so there is no point calling the API more often (it returns an error if you do).
        time.sleep(5)
        finalResponse = requests.request(
            "GET",
            apiEndPointResponse + "scraper=" + scraper + "&responseId=" + responseId,
            auth=(username, apiKey),
        )
        result = finalResponse.json()
        if type(result) is list:
            # A list means the scrape has finished.
            pending = False
            print(finalResponse.text)
        elif type(result) is dict:
            if "status" in result and result["status"] == "pending":
                print(result["message"])
                continue
            elif result["error"] is not None:
                pending = False
                print(json.dumps(result, indent=4))
else:
    print(response.text)
end_time = time.time()
latency = end_time - start_time
print("Time taken:", latency)
Response:
[{'url': 'https://www.linkedin.com/in/pradipnichite/?_l=en',
'name': 'Pradip Nichite',
'current_company': {'name': 'FutureSmart AI',
'link': 'https://in.linkedin.com/company/futuresmartai?trk=public_profile_topcard-current-company'},
'avatar': 'https://media.licdn.com/dms/image/D4D03AQFU1AiD1jO0fg/profile-displayphoto-shrink_200_200/0/1674710874602?e=2147483647&v=beta&t=KWT0V2ZwkvcBxBK3vACgifZijNQ9JcQrlJdnW6n4yF8',
'about': "๐ I'm a Top Rated Plus NLP freelancer on Upwork with over $100K in earnings and a 100%โฆ",
'city': 'Mumbai, Maharashtra, India',
'followers': '21K followers',
'following': '500+ connections',
'educations_details': 'International Institute of Information Technology โ Bangalore',
'posts': [{'title': 'How does a machine learning algorithm learn? (with intuition and math that you already know )',
'attribution': 'By Pradip Nichite',
'img': 'https://static.licdn.com/scds/common/u/img/pic/pic_pulse_stock_article_9.jpg',
'link': 'https://www.linkedin.com/pulse/how-does-machine-learning-algorithm-learn-intuition-math-nichite?trk=public_profile_article_view',
'created_at': '2021-09-09T00:00:00.000Z'}],
'experience': [],
...
Beyond capturing standard biographical information such as projects, company, and certifications, this API can also monitor a target user's social media engagement: it tracks the posts they share and their reactions to other people's posts, offering insight into their interests, preferences, and online behavior. However, it does not extract the user's skills, and its output for the education and experience fields is inconsistent.
With a competitive price of €0.00039 per scrape, this innovative tool makes a compelling case for real-time LinkedIn data extraction.
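Because a finished scrape comes back as a list containing a single profile dict (see the sample response above), reading the activity data looks roughly like the sketch below; key names are taken from that sample, and result is the final value from the polling loop:
# Continuing from the Scraping-bot.io polling loop above.
profile = result[0]
print(profile["name"], "|", profile["followers"])
# The engagement data this API is strongest at: the user's recent posts/activity.
for post in profile.get("posts", []):
    print("-", post.get("title"), "|", post.get("created_at"))
    print(" ", post.get("link"))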
4. Linkedin Data Scraper (Rating: 8.5/10)
Enter Linkedin Data Scraper API, your secret weapon for scraping profiles faster than a recruiter on Red Bull. Extract experience, education, certifications, and courses, among many others, with the ease of a point-and-click adventure, and leave the data hoarding to the API. Let's dive in and turn your prospecting pan into a data-powered gold mine!
Here's how you can use it!
import requests
import time

url = "https://linkedin-data-scraper.p.rapidapi.com/person"
payload = {"link": 'https://www.linkedin.com/in/pradipnichite/'}
headers = {
    "content-type": "application/json",
    "X-RapidAPI-Key": 'Your Rapid API_KEY',  # Put your API Key here
    "X-RapidAPI-Host": "linkedin-data-scraper.p.rapidapi.com",
}

start_time = time.time()
response = requests.post(url, json=payload, headers=headers)
end_time = time.time()
latency = end_time - start_time
print("Time taken:", latency)

data = response.json()
print(data)
Response:
{'success': True,
'status': 200,
'data': {'data': {'firstName': 'Pradip',
'lastName': 'Nichite',
'fullName': 'Pradip Nichite',
'publicIdentifier': 'pradipnichite',
'headline': 'Top Rated Plus - NLP Freelancer | Custom NLP Solutions | GPT-4 | AI Demos',
'associatedHashtags': 'Talks about #nlp, #gpt3, #freelancing, #machinelearning, and #artificialintelliegence',
'connections': 1,
'followers': 21286,
'emailRequired': False,
'creatorWebsite': {'name': 'My Youtube Channel ',
'link': 'https://www.youtube.com/c/PradipNichiteAI'},
'openConnection': True,
'urn': 'ACoAAA0aCz0B_r8k5MLp8w-N_giV2qCoIIYco6w',
'updates': [{'postText': "I've earned over $100K ๐ฐ and achieved Expert-Vetted status (top 1% ๐) on Upwork, specializing in #NLP and #generativeai . Interested in NLP or generative AI? \n\nExplore my YouTube tutorials ๐บ, featuring a range of topics: Hugging Face Transformers, SentenceTransformers, OpenAI's ChatGPT, GPT-4, various LLM libraries including LangChain and LlamaIndex, Vector databases like Pinecone and Chroma DB, plus insights on deploying LLM applications ๐\n\nChannel: https://lnkd.in/dR5x4A3y\n\nNLP Roadmap 2023: Step-by-Step Guide with Resources\nhttps://lnkd.in/gYw59y4T\n\nLearn How to use Hugging face Transformers Library\nhttps://lnkd.in/gf-j-CXr\n\nFine Tune Transformers Model like BERT on Custom Dataset.\nhttps://lnkd.in/gAdbr-9T\n\nSentence Transformers: Sentence Embedding, Sentence Similarity, Semantic Search and Clustering\nhttps://lnkd.in/gispjP44\n\nVector Database Beginer hands on Tutorial\nhttps://lnkd.in/gB5pVnac\n\nSemantic Search with Open-Source Vector DB: Chroma DB | Pinecone Alternative\nhttps://lnkd.in/gWTtzBC5\n\nBuilding a Document-based Question Answering System with LangChain, Pinecone, and LLMs like GPT-4.\nhttps://lnkd.in/gvsSFptJ\n\nChatbot Answering from Your Own Knowledge Base: Langchain, ChatGPT, Pinecone, and Streamlit\nhttps://lnkd.in/gx2WBatQ\n\nLangChain, SQL Agents & OpenAI LLMs: Query Database Using Natural Language\nhttps://lnkd.in/gdeygpUb\n\nMastering LlamaIndex : Create, Save & Load Indexes, Customize LLMs, Prompts & Embeddings\nhttps://lnkd.in/gfkqWNH4\n\nNL2SQL with LlamaIndex: Querying Databases Using Natural Language\nhttps://lnkd.in/g6v3a6MG\n\nUsing OpenAI's ChatGPT API to Build a Conversational AI Chatbot\nhttps://lnkd.in/g58GWQyM\n\nBuild Chatbot using OpenAI's Latest Assistants API - A Beginner's Guide\nhttps://lnkd.in/ghRcZgnF\n\nOpenAI Function Calling Explained: Chat Completions & Assistants API\nhttps://lnkd.in/gSBY6spe\n\nFine-Tuning GPT-3.5 on Custom Dataset: A Step-by-Step Guide\nhttps://lnkd.in/gDCcNBnz\n\nDeploy FastAPI & Open AI ChatGPT on AWS EC2: A Comprehensive Step-by-Step Guide\nhttps://lnkd.in/gCFXWT_r\n\nDeploy GPT Streamlit App on AWS EC2 | OpenAI | AWS Tutotrials\nhttps://lnkd.in/gqMZASjn",
'image': 'https://media.licdn.com/dms/image/sync/D4D27AQEBngiDneaqxQ/articleshare-shrink_800/0/1701979150910?e=1703750400&v=beta&t=RbrowfFjx78SD_HZwYYG-fi3gxw2hQbOZS1ox-cSHfk',
'postLink': 'https://www.linkedin.com/feed/update/urn:li:activity:7133477037373104128?updateEntityUrn=urn%3Ali%3Afs_feedUpdate%3A%28V2%2Curn%3Ali%3Aactivity%3A7133477037373104128%29',
'numLikes': 551,
'numComments': 22,
'reactionTypeCounts': [{'count': 510, 'reactionType': 'LIKE'},
{'count': 17, 'reactionType': 'PRAISE'},
{'count': 13, 'reactionType': 'EMPATHY'},
{'count': 7, 'reactionType': 'INTEREST'},
{'count': 4, 'reactionType': 'APPRECIATION'}]},
{'postText': "๐Wow, it's an honor to have been featured on Darshil Parmar's podcast! I'll never forget how his tips on writing proposals helped shape my journey as a data science freelancer in my early days. \n\n๐ I'm grateful for the opportunity to share my story and pay it forward.\n\n๐",
'image': 'https://media.licdn.com/dms/image/sync/D4D27AQHlqqpakeq6Pg/articleshare-shrink_800/0/1702179391806?e=1703750400&v=beta&t=4nZLOx2bJBnkSYUIRXYioOr1t4wBtylwyjD3_ZHVOpc',
...
With a latency of around 1,897ms, this API extracts fields such as experiences, education, licenseAndCertificates, honorsAndAwards, and languages in real time. Priced at $25.00/month for 20,000 requests per month on the Pro subscription, it can also scrape the skills and featured posts of the target user.
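Note the double 'data' nesting in the sample response above. A minimal sketch of reading the profile and its updates out of the data dict from the snippet (key names follow that sample):
# Continuing from the Linkedin Data Scraper snippet above.
profile = data["data"]["data"]
print(profile["fullName"], "-", profile["headline"])
print("Followers:", profile.get("followers"))
# Recent updates with their engagement numbers.
for update in profile.get("updates", []):
    text = (update.get("postText") or "")[:80]
    print("-", text, "| likes:", update.get("numLikes"), "| comments:", update.get("numComments"))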
Comparative Analysis:
| Parameters | ProxyCurl | Fresh LinkedIn Profile Data | Scraping-bot.io | Linkedin Data Scraper |
| --- | --- | --- | --- | --- |
| Speed | 2,648ms | 4,031ms | ~22,299ms (includes the 5s polling sleeps; varies by profile) | 1,897ms |
| Recency | Does not scrape in real time | Real-time scraping | Real-time scraping | Real-time scraping |
| Pricing | $0.020/credit | $45.00/month for 3,000 requests per month | €0.00039 per scrape | $25.00/month for 20,000 requests per month |
| Accuracy | 10/10 | 10/10 | 10/10 | 10/10 |
| Completeness | 8/10 | 7.5/10 | 9/10 | 8.5/10 |
| Comments | Not all skills are included, and posts are not shown. The projects of the target user are displayed. | Skills or top skills are shown. Posts are not shown. | Skills or top skills are not displayed. The API tracks user activity over roughly the last four days, such as sharing or liking posts, and also displays the target user's projects. Note: Scraping-bot.io retrieves the titles of the user's posts, their reactions, and their shares of other posts; the full post content can be obtained by modifying the query. | Skills are shown, but not all are included. Posts are not shown; featured posts are shown. |
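The Speed row is simply wall-clock time measured around a single request, exactly like the time.time() pairs in the snippets above. If you want to benchmark the APIs against your own profiles, a small reusable helper could look like this (a sketch, not part of any of these APIs):
import time
import requests

def timed_get(url, **kwargs):
    # Wrap a single GET call with wall-clock timing, reported in milliseconds.
    start = time.time()
    response = requests.get(url, **kwargs)
    latency_ms = (time.time() - start) * 1000
    print(f"Time taken: {latency_ms:.0f}ms")
    return response

# Usage, e.g. with the ProxyCurl call from section 1:
# response = timed_get(api_endpoint, params=params, headers=headers)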
Applications:
Some notable use cases for data scraped from LinkedIn profiles using APIs can be:
1. Lead Generation and Sales:
Identify potential customers: Target individuals matching your ideal customer profile based on their job titles, industries, skills, and interests (a small filtering sketch follows this list).
Personalize outreach: Craft highly relevant messages that resonate with their professional needs and aspirations.
2. Recruitment and Talent Acquisition:
Build talent pools: Create a database of potential candidates for future roles, saving time and effort in the long run.
Analyze candidate trends: Identify in-demand skills, experience levels, and salary expectations within specific industries or regions.
3. Competitive Intelligence:
Understand competitor strengths and weaknesses: Analyze their employee profiles to uncover expertise, growth areas, and potential vulnerabilities.
Track industry thought leaders: Identify influential individuals within your industry to stay abreast of key trends and developments.
4. Learning and Development:
Identify skill gaps: Assess employees' existing skills and compare them to industry trends to identify areas for training and development.
Connect employees with mentors and experts: Leverage LinkedIn's vast network to facilitate knowledge sharing and professional development opportunities.
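As an illustration of the lead-generation use case above, here is a hypothetical filter over profiles you have already scraped with any of the APIs in this post. The field names (headline, city, full_name/fullName) mirror the sample responses; scraped_profiles and the keyword list are purely illustrative:
# Hypothetical: `scraped_profiles` is a list of profile dicts collected
# with any of the APIs above; field names mirror their sample responses.
def match_leads(profiles, keywords, location=None):
    """Return profiles whose headline mentions any keyword (and whose city matches, if given)."""
    hits = []
    for p in profiles:
        headline = (p.get("headline") or "").lower()
        city = (p.get("city") or "").lower()
        if any(kw.lower() in headline for kw in keywords):
            if location is None or location.lower() in city:
                hits.append(p)
    return hits

# Example: NLP-focused profiles based in Mumbai.
leads = match_leads(scraped_profiles, keywords=["NLP", "GPT-4"], location="Mumbai")
for lead in leads:
    print(lead.get("full_name") or lead.get("fullName"), "-", lead.get("headline"))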
Conclusion:
So, there you have it! From pulling skills and featured posts with Linkedin Data Scraper to mining real-time engagement data with Scraping-bot.io, the LinkedIn API landscape offers a treasure trove of insights for savvy prospectors. Remember, choose the right tool for the job, wield it responsibly, and unlock the true potential of your social media data for success. For further reading, you can refer to this repository: https://github.com/PradipNichite/FutureSmart-AI-Blog/tree/main/Linkedin%20Scraping
If your company is looking to embark on a similar journey of transformation and you're in need of a tailored NLP solution, we're here to guide you every step of the way. Our team at FutureSmart AI specializes in crafting custom NLP applications, including generative NLP, RAG, and ChatGPT integrations, tailored to your specific needs.
For a practical guide to using ProxyCurl API, feel free to refer to this video: Best Way to Scrape LinkedIn Profiles with ProxyCurl API | Python | Code
Say goodbye to manual data scrapingโunlock automation, efficiency, and a deeper understanding of your target audience through LinkedIn Scraping APIs. Reach out to us at contact@futuresmart.ai, and let's discuss how we can build a smarter, more efficient LinkedIn scraping system for personalized outreach and growth, driving your business ahead. Join the ranks of forward-thinking companies leveraging the best of AI, and witness the difference for yourself!
Stay Connected with FutureSmart AI for the Latest in AI Insights
Eager to stay informed about the cutting-edge advancements and captivating insights in the field of AI? Explore AI Demos, your ultimate destination for staying abreast of the newest AI tools and applications. AI Demos serves as your premier resource for education and inspiration. Immerse yourself in the future of AI today by visiting aidemos.com.
Happy scraping!