Stop my temporary table from filtering out data unprompted


Question:
I'm aggregating US county cost of living data, but I realized my temporary table is only returning rows for families without kids for some reason. For every county, there's a row for families from no kids to 4 kids for both 1 and 2 parent homes. So 10 rows per county.
Earlier on to test something I did have a 0 child family filter in the 2nd SELECT at the bottom, but its long gone and the sessions restarted. I have quintuple checked that the original dataset HAS data with kids and is unscathed, and I've also tried adding the following:
WHERE CAST(REGEXP_EXTRACT(family_member_count, r'p(\d+)c') AS INT64) > 0
OR CAST(REGEXP_EXTRACT(family_member_count, r'p(\d+)c') AS INT64) < 1 ;
But to no avail. Family information in the original data is a string where X parents and Y kids is displayed as "XpYc"
This is all the relevant code for this particular temp table:
--Create temp table to manipulate
CREATE OR REPLACE TEMP TABLE temp_us_col AS
SELECT *
FROM `capstone-project-456214.us_col_county.us_cost_of_living_county`
WHERE CAST(REGEXP_EXTRACT(family_member_count, r'p(\d+)c') AS INT64) > 0 OR CAST(REGEXP_EXTRACT(family_member_count, r'p(\d+)c') AS INT64) < 1 ;
--this where statement wasn't in the original code
--add columns for family count and whether or not they have children
ALTER TABLE temp_us_col
ADD COLUMN family_count INT64,
ADD COLUMN children INT64;
--updates columns with family counts and whether they have children or not
UPDATE temp_us_col
SET
family_count = CAST(REGEXP_EXTRACT(family_member_count, r'(\d+)p') AS INT64) +
CAST(REGEXP_EXTRACT(family_member_count, r'p(\d+)c') AS INT64),
children = CAST(REGEXP_EXTRACT(family_member_count, r'p(\d+)c') AS INT64)
WHERE TRUE;
SELECT * FROM temp_us_col
I've also tried changing the where statement before the 2nd Select
, but still nothing. I'm unsure what else to try. When I have a where statement to filter for just the families WITH kids it only gets the families with 1 or 2 kids which confuses me more. Is there a glaring issue I'm missing?
Answer:
From the explanation, here are the issues:
1. Logical issue in WHERE clause
Your WHERE clause:
\> 0 OR < 1
This is always true except when the extracted value is NULL. So effectively, you're just excluding rows where the REGEXP_EXTRACT() fails to match.
2. REGEXP_EXTRACT() returns NULL on no match
If the regex pattern doesn’t match the string, REGEXP_EXTRACT() returns NULL, and any math or comparison with NULL will also be NULL (i.e., filtered out).
If your pattern doesn’t match p(\d+)c exactly, the child count becomes NULL, and those rows get silently skipped.
3. Regex misalignment
You were extracting parent counts using
REGEXP_EXTRACT(family_member_count, r'(\d+)p')
That might work, but for better precision, use
REGEXP_EXTRACT(family_member_count, r'^(\d+)p')
To explicitly anchor the match to the beginning of the string, the following contains a clean and safe version of your code that:
- Uses SAFE_CAST() to avoid NULL-related errors
- Extracts parents and children correctly
- Computes total family count
- Flags whether a family has children
Step 1: Create temp table with extracted fields
CREATE OR REPLACE TEMP TABLE temp_us_col AS
SELECT *,
SAFE_CAST(REGEXP_EXTRACT(family_member_count, r'^(\d+)p') AS INT64) AS num_parents,
SAFE_CAST(REGEXP_EXTRACT(family_member_count, r'p(\d+)c') AS INT64) AS num_children
FROM capstone-project-456214.us_col_county.us_cost_of_living_county
WHERE REGEXP_CONTAINS(family_member_count, r'\dp\dc'); -- Ensure valid format
Step 2: Add computed columns
ALTER TABLE temp_us_col
ADD COLUMN family_count INT64,
ADD COLUMN has_children BOOL;
Step 3: Populate them
UPDATE temp_us_col
SET family_count = num_parents + num_children,
has_children = num_children > 0
WHERE TRUE;
Step 4: View your results
SELECT * FROM temp_us_col
ORDER BY county_name, family_member_count;
Note
Use SAFE_CAST() and REGEXP_CONTAINS() to protect against regex match failures. Be cautious with logical expressions that rely on imperfect extraction. Consider normalizing the schema long-term to avoid parsing strings like 2p3c.
If you're frequently manipulating structured strings or temp tables like this, a visual tool like dbForge Data Compare can really help.
Subscribe to my newsletter
Read articles from Gabriella Barajas directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by

Gabriella Barajas
Gabriella Barajas
A database Q&A blog for those who’d rather debug queries than read documentation. Practical answers to real questions — mostly SQL Server and MySQL, with a pinch of sarcasm and a love for clean code.