The Chanakya - An Intelligent Fake News Detector
Overview
The Chanakya is a fake news detection solution built during the KAVACH 2023 National-Level Cybersecurity Hackathon. It addresses a problem statement presented by the Innovation Cell of India's Ministry of Education.
Problem Statement
Design and develop a technological solution/software tool for Tracking and tracing Fake News and its origin using official sources as the input filter. The solution should have a mechanism to mitigate the impact of the spread of Fake News by auto-populating the fake news spreaders’ inboxes with the official/authenticated news content.
Solution
Using a supervised learning approach for labeled datasets
For real-time dataset
Fetch news articles with the Bing News API based on user queries.
Preprocess the article text for TF-IDF vectorization.
Compute TF-IDF vectors and the cosine similarity between the query and each article to measure relevance.
Combine a sentiment analysis score with the relevance score.
Sort the articles by the combined score and return the top results.
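The ranking steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's actual code: it skips the Bing News API call and works on a hard-coded list of article strings, it implements TF-IDF and cosine similarity with the standard library only, and the tiny sentiment lexicon, the `sentiment_weight` parameter, and the weighted-sum way of combining the two scores are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon (assumption); a real system would use a trained sentiment model.
POSITIVE = {"confirms", "official", "verified", "safe"}
NEGATIVE = {"hoax", "fake", "panic", "scam"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, so terms that occur in every document keep a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (cnt / len(toks)) * idf[t] for t, cnt in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sentiment(text):
    """Crude polarity in [-1, 1] based on lexicon hits; 0.0 when nothing matches."""
    toks = tokenize(text)
    pos = sum(t in POSITIVE for t in toks)
    neg = sum(t in NEGATIVE for t in toks)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def rank_articles(query, articles, sentiment_weight=0.2, top_k=3):
    """Score each article by a weighted sum of relevance and sentiment (assumed formula)."""
    vectors = tfidf_vectors([query] + articles)
    query_vec = vectors[0]
    scored = []
    for article, vec in zip(articles, vectors[1:]):
        relevance = cosine(query_vec, vec)
        combined = (1 - sentiment_weight) * relevance + sentiment_weight * sentiment(article)
        scored.append((combined, article))
    # Highest combined score first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    sample = [
        "Ministry confirms vaccine is safe in official statement",
        "Cricket team wins the final match",
        "Viral vaccine hoax spreads panic",
    ]
    for score, article in rank_articles("vaccine safe", sample):
        print(f"{score:.3f}  {article}")
```

In practice the vectorization step would more likely use an existing library, and the weight given to sentiment would need tuning against real data; the sketch only shows how the relevance and sentiment scores can feed one sortable number.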
Why not use a synthetic dataset?
Lack of realism: Synthetic data might not perfectly capture the complexities and variations present in real-world data, potentially leading to a gap in performance when the model is applied to real data.
Transferability: A model trained on synthetic data might struggle to generalize well to real-world data due to the differences between the two domains.
Overfitting: Depending on how the synthetic data is generated, there's a risk that the model could overfit to the synthetic data distribution and not perform well on unseen real data.
The major problem with following a supervised learning approach on a labeled dataset is the lack of a real-time dataset.
We have introduced a crowd-polling methodology for preparing the real-time dataset. Please refer to the attachments and images below for a better understanding.
Capabilities of Google's Bard (LLM)
Ability to detect news accurately: It can be used to quickly identify the key points of an article or document. This can be helpful for fake news detection.
Able to draw on information from across the internet: In this solution, it is the only autonomous mechanism for collecting real-time data.
Large language model: Bard is a large language model, which means it has been trained on a massive dataset of text and code. This helps Bard pick up on the nuances of human language and spot inconsistencies and errors in text that might indicate fake news.
Technology stack
Benefits
Supporting real-time fact-checking
Accurate information dissemination
Safeguarding public safety
Preserving trust in media
Assisting citizens in determining whether the news is credible or not
Glimpses of solution
Take a note
Team Pratyagam welcomes any suggestions or discussion about the project. We are always ready to collaborate 🚀🤝.
Written by
Vaibhav Chauhan
I am a full-stack developer from India and an ML and cloud enthusiast, always looking forward to solving real-life problems.