Earlier this year, I read a fantastic blog/newsletter by Jack Vanlightly about contextual data quality. He shared his thoughts on a research paper “Big data quality framework”. The original authors argue that we often focus on data quality metrics like completeness and consistency but overlook contextual data quality issues. For example, in healthcare, a patient visit record might have perfect data quality, valid patient ID, correct ICD-10 code format (Z34.90 for pregnancy care), proper visit date, and complete provider information. However, if that pregnancy care code is assigned to a 67-year-old male patient, there's a contextual data quality issue that traditional validation completely misses. Your EMR system might show green across all data quality dashboards while simultaneously creating impossible medical scenarios that could lead to incorrect treatments, insurance claim rejections, and regulatory reporting errors. This is where LLMs are great, they understand that pregnancy codes and elderly male patients don't make clinical sense together, catching the semantic inconsistencies that rigid rule based validation never could.

I highly encourage you to read his blog and subscribe to his newsletter "Humans of the Data Sphere.”

Ref: Big data quality framework: a holistic approach to continuous quality management | Journal of Big Data | Full Text

In this multi-part series, I will try to explore how AI/LLMs can possibly be used along with the rule-based DQ checks. In this first blog, I will focus on the first pass analysis, getting it set up using AI Functions in Microsoft Fabric and in the following blogs, I will operationalize it to create a robust evaluation framework.

Fabric AI Functions

I have written about AI Functions before. It allows you to use AI seamlessly in your data engineering applications with single line of code. In this introductory blog, I will show how we can use AI Functions for checking contextual DQ issues.

Data

The Consumer Financial Protection Bureau (CFPB) has comprehensive data on consumer complaints about financial products and services. It includes details like the product, sub-product, issue, complaint description, and more. When consumers log complaints, they select the product, sub-product, and issue categories. On the website, trends are shown for each product, sub-product, and issue based on the categories chosen by consumers. However, the website does not provide definitions for these categories, making it easy for consumers to assign their complaints to the wrong category. So, even though the data might be complete, the category assigned may not match the issue described which will also lead to incorrect trend analysis.

I downloaded the last 6 months data from the website (you can use API as well) with complaint narrative (~490K records).

To analyze the contextual DQ issues, we will:

Use LLM to review the Product, Sub-product, complaint and analyze if the complaint matches the product & sub-product assigned by the user
Return true or false based on the analysis
Return the suggested category
Return issue type (product mismatch, sub product mismatch or both)
Brief explanation
Flag for human evaluation if the LLM is unsure

As I mentioned above, this will be a first pass analysis and in the following blogs I will refine with evaluation harness to improve the process.

Prompt

The prompt follows the similar template I have used in previous blogs - instructions, examples, guidelines with tags and return the JSON in defined schema.

<INSTRUCTIONS>
You are a data quality expert analyzing CFPB consumer complaints for product categorization accuracy.
Your task is to determine if the complaint narrative matches the assigned product/sub-product categories.
</INSTRUCTIONS>

<VALID_PRODUCTS_SUBPRODUCTS>
Credit reporting or other personal consumer reports:
- Credit reporting
- Other personal consumer report

Debt collection:
- I do not know
- Credit card debt
- Other debt
- Telecommunications debt
- Rental debt
- Medical debt
- Auto debt
- Payday loan debt
- Federal student loan debt
- Private student loan debt
- Mortgage debt

Checking or savings account:
- Checking account
- Other banking product or service
- Savings account
- CD (Certificate of Deposit)

Credit card:
- General-purpose credit card or charge card
- Store credit card

Money transfer, virtual currency, or money service:
- Domestic (US) money transfer
- Mobile or digital wallet
- Virtual currency
- International money transfer
- Money order, traveler's check or cashier's check
- Check cashing service
- Foreign currency exchange

Student loan:
- Federal student loan servicing
- Private student loan

Mortgage:
- Conventional home mortgage
- FHA mortgage
- VA mortgage
- Home equity loan or line of credit (HELOC)
- Other type of mortgage
- USDA mortgage
- Manufactured home loan
- Reverse mortgage

Vehicle loan or lease:
- Loan
- Lease

Payday loan, title loan, personal loan, or advance loan:
- Installment loan
- Payday loan
- Personal line of credit
- Title loan
- Other advances of future income
- Earned wage access
- Pawn loan
- Tax refund anticipation loan or check

Prepaid card:
- General-purpose prepaid card
- Government benefit card
- Gift card
- Payroll card
- Student prepaid card

Debt or credit management:
- Debt settlement
- Credit repair services
- Mortgage modification or foreclosure avoidance
- Student loan debt relief
</VALID_PRODUCTS_SUBPRODUCTS>

<EXPECTED_OUTPUT>
{
"is_correctly_categorized": true/false,
"issue_type": "correct|product_mismatch|subproduct_mismatch|both_incorrect",
"explanation": "brief explanation",
"suggested_product": "correct product if wrong",
"suggested_subproduct": "correct sub-product if wrong",
"should_review": true/false
}
</EXPECTED_OUTPUT>

<REVIEW_CRITERIA>
Set should_review to true if:
- The complaint narrative is ambiguous or could fit multiple categories
- The technical/financial terminology is unclear or inconsistent
- Multiple financial products are mentioned making categorization difficult
- The complaint lacks sufficient detail to make a confident assessment
- You are uncertain about the correct categorization
</REVIEW_CRITERIA>

Analyze the complaint and return ONLY valid JSON.

Code

In my case, I used the Python notebook. However, if you use the Pyspark notebook, as announced recently, you do not need to install any libraries. AI Functions is now part of the Fabric RT1.3.


%pip install -q --force-reinstall openai==1.30 2>/dev/null

%pip install -q --force-reinstall https://mmlspark.blob.core.windows.net/pip/1.0.12-spark3.5/synapseml_core-1.0.12.dev1-py2.py3-none-any.whl 2>/dev/null

%pip install -q --force-reinstall https://mmlspark.blob.core.windows.net/pip/1.0.12.2-spark3.5/synapseml_internal-1.0.12.2.dev1-py2.py3-none-any.whl 2>/dev/null

import synapse.ml.aifunc as aifunc
import pandas as pd
import openai
from synapse.ml.aifunc import Conf

cols = ['Date received','Product','Sub-product','Issue','Sub-issue','Consumer complaint narrative', 'Complaint ID']
df = pd.read_csv("/lakehouse/default/Files/dq_aifunction/complaints-2025-08-01_12_34.csv", usecols =cols)
df['Date received'] = pd.to_datetime(df['Date received'],errors='coerce')
df = df.sort_values(by = ['Date received'], ascending=False)

Steps:

Pre process the data
Analyze the categories using AI functions
Parse JSON
Prepare output df

import pandas as pd
import json
import re

def clean_narrative_text(text):
 if pd.isna(text):
     return ""
 cleaned = re.sub(r'XXXX+', 'XXXX', str(text))
 cleaned = re.sub(r'\s+', ' ', cleaned).strip()
 return cleaned

def check_qlty_prmpt():
 prompt = """
<INSTRUCTIONS>
You are a data quality expert analyzing CFPB consumer complaints for product categorization accuracy.
Your task is to determine if the complaint narrative matches the assigned product/sub-product categories.
</INSTRUCTIONS>

<VALID_PRODUCTS_SUBPRODUCTS>
Credit reporting or other personal consumer reports:
- Credit reporting
- Other personal consumer report

Debt collection:
- I do not know
- Credit card debt
- Other debt
- Telecommunications debt
- Rental debt
- Medical debt
- Auto debt
- Payday loan debt
- Federal student loan debt
- Private student loan debt
- Mortgage debt

Checking or savings account:
- Checking account
- Other banking product or service
- Savings account
- CD (Certificate of Deposit)

Credit card:
- General-purpose credit card or charge card
- Store credit card

Money transfer, virtual currency, or money service:
- Domestic (US) money transfer
- Mobile or digital wallet
- Virtual currency
- International money transfer
- Money order, traveler's check or cashier's check
- Check cashing service
- Foreign currency exchange

Student loan:
- Federal student loan servicing
- Private student loan

Mortgage:
- Conventional home mortgage
- FHA mortgage
- VA mortgage
- Home equity loan or line of credit (HELOC)
- Other type of mortgage
- USDA mortgage
- Manufactured home loan
- Reverse mortgage

Vehicle loan or lease:
- Loan
- Lease

Payday loan, title loan, personal loan, or advance loan:
- Installment loan
- Payday loan
- Personal line of credit
- Title loan
- Other advances of future income
- Earned wage access
- Pawn loan
- Tax refund anticipation loan or check

Prepaid card:
- General-purpose prepaid card
- Government benefit card
- Gift card
- Payroll card
- Student prepaid card

Debt or credit management:
- Debt settlement
- Credit repair services
- Mortgage modification or foreclosure avoidance
- Student loan debt relief
</VALID_PRODUCTS_SUBPRODUCTS>

<EXPECTED_OUTPUT>
{
"is_correctly_categorized": true/false,
"issue_type": "correct|product_mismatch|subproduct_mismatch|both_incorrect",
"explanation": "brief explanation",
"suggested_product": "correct product if wrong",
"suggested_subproduct": "correct sub-product if wrong",
"should_review": true/false
}
</EXPECTED_OUTPUT>

<REVIEW_CRITERIA>
Set should_review to true if:
- The complaint narrative is ambiguous or could fit multiple categories
- The technical/financial terminology is unclear or inconsistent
- Multiple financial products are mentioned making categorization difficult
- The complaint lacks sufficient detail to make a confident assessment
- You are uncertain about the correct categorization
</REVIEW_CRITERIA>

Analyze the complaint and return ONLY valid JSON.
"""
 return prompt

def analyze_category(df, sample_size=None):
 if sample_size:
     sample_df = df.sample(min(sample_size, len(df)), random_state=42).copy()
 else:
     sample_df = df.copy()

 sample_df['cleaned_narrative'] = sample_df['Consumer complaint narrative'].apply(clean_narrative_text)
 sample_df = sample_df[sample_df['cleaned_narrative'].str.len() > 10].copy()

 sample_df['analysis_input'] = sample_df.apply(lambda row: 
     f"Current Product: {row['Product']}\n"
     f"Current Sub-product: {row['Sub-product']}\n" 
     f"Issue: {row['Issue']}\n"
     f"Complaint: {row['cleaned_narrative'][:800]}..."
 , axis=1)

 prompt = check_qlty_prmpt()
 sample_df['llm_analysis'] = sample_df[['analysis_input']].ai.generate_response(prompt, , conf=Conf(seed=0, max_concurrency=25))

 return sample_df

def parse_json(df):
 def parse_json_response(response_text):
     try:
         json_match = re.search(r'\{.*\}', response_text, re.DOTALL)
         if json_match:
             json_str = json_match.group(0)
             return json.loads(json_str)
         return None
     except:
         return None

 df['parsed_analysis'] = df['llm_analysis'].apply(parse_json_response)
 df['analysis_valid'] = df['parsed_analysis'].notna()

 valid_df = df[df['analysis_valid']].copy()

 if len(valid_df) > 0:
     valid_df['is_correctly_categorized'] = valid_df['parsed_analysis'].apply(
         lambda x: x.get('is_correctly_categorized', None) if x else None)
     valid_df['issue_type'] = valid_df['parsed_analysis'].apply(
         lambda x: x.get('issue_type', None) if x else None)
     valid_df['explanation'] = valid_df['parsed_analysis'].apply(
         lambda x: x.get('explanation', None) if x else None)
     valid_df['suggested_product'] = valid_df['parsed_analysis'].apply(
         lambda x: x.get('suggested_product', None) if x else None)
     valid_df['suggested_subproduct'] = valid_df['parsed_analysis'].apply(
         lambda x: x.get('suggested_subproduct', None) if x else None)
     valid_df['should_review'] = valid_df['parsed_analysis'].apply(
         lambda x: x.get('should_review', None) if x else None)

 return valid_df

def check_quality(df, sample_size=None):
 if sample_size:
     print(f"Analyzing {sample_size} complaints for categorization:")
 else:
     print(f"Analyzing all {len(df)} complaints for categorization:")

 analyzed_df = analyze_category(df, sample_size)
 results_df = parse_json(analyzed_df)

 output_cols = [
     'Date received', 'Product', 'Sub-product', 'Issue', 
     'Consumer complaint narrative', 'is_correctly_categorized',
     'issue_type', 'explanation', 'suggested_product', 'suggested_subproduct',
     'should_review'
 ]

 available_cols = [col for col in output_cols if col in results_df.columns]
 final_df = results_df[available_cols].copy()

 print(f"{len(final_df)} complaints analyzed.")
 print(f"Incorrectly categorized: {(~final_df['is_correctly_categorized']).sum()}")
 print(f"Flagged for review: {final_df['should_review'].sum()}")

 return final_df

## first 250 complaints for demo purposes
results_df = check_quality(df.head(250)) 
display(results_df)

Output:

Out of the 250 sample complaints analyzed, 113 were labeled as incorrectly categorized and 64 were flagged for further human evaluation. Thus we can see that we can “potentially” use AI Functions for catching data quality issues. I said potentially because more evaluation needs to be done (future blog) to use this effectively and reduce errors.

Let’s look at the very first example:

Complaint:

I am filing this new complaint because the debt collector, First Financial XXXX XXXX, and the creditor, XXXX XXXX ( XXXX XXXX, XXXX ), have provided me with debt verification documents that are factually inaccurate. This is about my previous, closed complaint ( Complaint ID # [ XXXX ] ). The 'Vehicle Valuation Report ' sent to me as verification of the debt is based on an incorrect vehicle mileage of XXXX miles. This is false. The official rental agreement ( # XXXX ) confirms that the vehicle 's actual mileage at the time of rental was XXXX miles. This discrepancy of over XXXX miles has artificially and significantly inflated the vehicle 's valuation, which is the entire basis for the debt they claim I owe. By knowingly or negligently providing a valuation based on false information and continuing to demand payment based on it, they are making false representations in an attempt to collect a debt. This is a serious issue that goes beyond a simple dispute over methods. I have sent a formal letter disputing this invalid verification and demand that this illegitimate debt be waived.

The user submitted this issue under Debt Collection → Other Debt whereas AI Functions categorized as Debt Collection → Auto Debt. This complaint is clearly about debt related a vehicle.

Some complaints were flagged for further review, e.g.

I canceled a Google Play subscription on XX/XX/XXXX through my XXXX device, but I was still charged on XX/XX/XXXX. I did not receive the service and have not received a refund or the service restored. I attempted to resolve the issue through Google Play support, but my request was denied or unresolved. They sent back several emails saying my card info had to be updated but it has been the same info since I uploaded the card which covers XX/XX/XXXX. Further, they said that restoring the subscription would have to come from the app developer, XXXX XXXX. XXXX XXXX rebutted this and said XXXX handles subscriptions. What I want : I want a full refund for the charge and for my cancellation to be properly, and promptly, honored.

This was categorized by the user as Money transfer, virtual currency, money service → digital wallet. However, there complaints is about unauthorized subscription, credit card billing rather than money transfer or bitcoin etc. One could argue that Google play uses Google wallet and hence digital wallet is the right option. So not all categorizations are straightforward and LLM can identify complaints that should be reviewed further and thus improving the DQ.

This was one example, but this approach can also be used for tabular data to catch inconsistencies and errors. One of the very first Power BI report I ever built 10 years ago was actually for catching errors Oracle ERP system. We saw more than 50% of the codes assigned were incorrect leading to delayed product shipments, incorrect processing of orders, warranty claims etc. I think using AI Functions for such tasks is a very viable solution.

Challenges

As always, we need to remember that, like rule-based checks, LLMs cannot be completely predictable. We can manage their behavior to some degree and measure their accuracy, but it's important to always verify and validate by creating baselines and evaluation tools that you can improve over time. I will write about in future blogs.

References

Transform and enrich data seamlessly with AI functions - Microsoft Fabric | Microsoft Learn

Submit a complaint | Consumer Financial Protection Bureau

About Me — Jack Vanlightly

Unstructured to Structured : Extracting Data From Messy Excel Sheets Using Fabric AI Function

Big data quality framework: a holistic approach to continuous quality management | Journal of Big Data | Full Text

Unstructured To Structured : Using Fabric AI Functions For Contextual Data Quality Check