The Chanakya - An Intelligent Fake News Detector

Vaibhav Chauhan

Overview

The Chanakya is a fake news detection solution built during the KAVACH 2023 National-Level Cybersecurity Hackathon. It addresses a problem statement presented by the Innovation Cell of India's Ministry of Education.


Problem Statement

Design and develop a technological solution/software tool for Tracking and tracing Fake News and its origin using official sources as the input filter. The solution should have a mechanism to mitigate the impact of the spread of Fake News by auto-populating the fake news spreaders’ inboxes with the official/authenticated news content.


Solution

  • For labeled datasets: use a supervised learning approach.

  • For the real-time dataset:

    • Fetch news articles with Bing News API based on user queries.

    • Preprocess text for TF-IDF vectorization.

    • Calculate TF-IDF and cosine similarity to measure relevance.

    • Combine sentiment analysis with relevance score.

    • Sort articles by combined score and return the top results.
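
The real-time steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the Bing News API call is left out (articles are passed in as plain strings), TF-IDF and cosine similarity are written with the standard library only, and the tiny sentiment lexicon, the `sentiment_weight` parameter, and the weighted-sum formula for combining the two scores are all assumptions, since the post does not say how the scores are combined.

```python
import math
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only; a real system would use a
# trained sentiment model or an established lexicon.
POSITIVE = {"confirms", "official", "verified", "safe"}
NEGATIVE = {"hoax", "fake", "panic", "scam"}


def tokenize(text):
    """Preprocess: lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF so that terms present in every document keep some weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sentiment(tokens):
    """Toy polarity in [-1, 1] based on the hypothetical lexicon above."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def rank_articles(query, articles, sentiment_weight=0.2, top_k=3):
    """Score each article against the query and return the top results."""
    docs = [tokenize(a) for a in articles]
    vectors = tfidf_vectors(docs + [tokenize(query)])
    query_vec = vectors[-1]
    scored = []
    for article, tokens, vec in zip(articles, docs, vectors):
        relevance = cosine(query_vec, vec)
        # Assumed combination rule: a simple weighted sum of the two scores.
        combined = (1 - sentiment_weight) * relevance + sentiment_weight * sentiment(tokens)
        scored.append((combined, article))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Calling `rank_articles("vaccine safe", list_of_article_texts)` returns `(score, article)` pairs, highest combined score first. The weight given to sentiment is a tuning choice; setting `sentiment_weight=0` ranks purely by relevance.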

Why not use a synthetic dataset?

  • Lack of realism: Synthetic data might not perfectly capture the complexities and variations present in real-world data, potentially leading to a gap in performance when the model is applied to real data.

  • Transferability: A model trained on synthetic data might struggle to generalize well to real-world data due to the differences between the two domains.

  • Overfitting: Depending on how the synthetic data is generated, there's a risk that the model could overfit to the synthetic data distribution and not perform well on unseen real data.

The major problem with following a supervised learning approach on a labeled dataset is the lack of a real-time dataset.

We have introduced a crowd-polling methodology for preparing the real-time dataset. Please refer to the attachments and images below for a better understanding.
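
As a rough illustration of how crowd-polled votes could be turned into labels for such a dataset, here is a minimal sketch. The post does not describe the aggregation rule, so the majority-vote logic, the function name `aggregate_votes`, and the `min_votes` and `min_agreement` thresholds are all assumptions.

```python
from collections import Counter


def aggregate_votes(votes, min_votes=3, min_agreement=0.7):
    """Return the majority label, or None if the crowd is too small or too split.

    `votes` is a list of labels such as "fake" or "real" submitted by users.
    Both thresholds are hypothetical and would need tuning on real data.
    """
    if len(votes) < min_votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None
```

Items that come back as `None` would stay unlabeled until more votes arrive, which keeps low-confidence labels out of the training data.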

Capabilities of Google's Bard (LLM)

  1. Ability to detect news accurately: It can be used to quickly identify the key points of an article or document. This can be helpful for fake news detection.

  2. Access to information from across the internet: Within this solution, it is the only autonomous mechanism for collecting real-time data.

  3. Large language model: Bard is a large language model, which means it has been trained on a massive dataset of text and code. This allows Bard to understand the nuances of human language and to spot inconsistencies and errors in text that might be indicative of fake news.


Technology stack


Benefits

  • Supporting real-time fact-checking

  • Accurate information dissemination

  • Safeguarding public safety

  • Preserving trust in media

  • Assisting citizens in determining whether the news is credible or not


Glimpses of solution


Take a note
The project is currently in the development phase, so we are exploring efficient large language model methods to make the system intelligent enough to search the internet and assess the truthfulness of news.

Team Pratyagam welcomes any suggestions or discussion about the project. We are always ready to collaborate 🚀🤝.


Written by

Vaibhav Chauhan

I am a full-stack developer from India and an ML & Cloud enthusiast, always looking forward to solving real-life problems.