Stop my temporary table from filtering out data unprompted

Question:

I'm aggregating US county cost of living data, but I realized my temporary table is only returning rows for families without kids for some reason. For every county, there's a row for families from no kids to 4 kids for both 1 and 2 parent homes. So 10 rows per county.

Earlier on to test something I did have a 0 child family filter in the 2nd SELECT at the bottom, but its long gone and the sessions restarted. I have quintuple checked that the original dataset HAS data with kids and is unscathed, and I've also tried adding the following:

WHERE CAST(REGEXP_EXTRACT(family_member_count, r'p(\d+)c') AS INT64) > 0 
OR CAST(REGEXP_EXTRACT(family_member_count, r'p(\d+)c') AS INT64) < 1 ;

But to no avail. Family information in the original data is a string where X parents and Y kids is displayed as "XpYc"

This is all the relevant code for this particular temp table:

--Create temp table to manipulate
CREATE OR REPLACE TEMP TABLE temp_us_col AS
SELECT *
FROM `capstone-project-456214.us_col_county.us_cost_of_living_county`
WHERE CAST(REGEXP_EXTRACT(family_member_count, r'p(\d+)c') AS INT64) > 0 OR CAST(REGEXP_EXTRACT(family_member_count, r'p(\d+)c') AS INT64) < 1 ;
--this where statement wasn't in the original code

--add columns for family count and whether or not they have children
ALTER TABLE temp_us_col
    ADD COLUMN family_count INT64,
    ADD COLUMN children INT64;

--updates columns with family counts and whether they have children or not
UPDATE temp_us_col
SET 
  family_count = CAST(REGEXP_EXTRACT(family_member_count, r'(\d+)p') AS INT64) +
                 CAST(REGEXP_EXTRACT(family_member_count, r'p(\d+)c') AS INT64),
  children = CAST(REGEXP_EXTRACT(family_member_count, r'p(\d+)c') AS INT64)
WHERE TRUE;

SELECT * FROM temp_us_col

I've also tried changing the where statement before the 2nd Select, but still nothing. I'm unsure what else to try. When I have a where statement to filter for just the families WITH kids it only gets the families with 1 or 2 kids which confuses me more. Is there a glaring issue I'm missing?

Answer:

From the explanation, here are the issues:

1. Logical issue in WHERE clause

Your WHERE clause:

\> 0 OR < 1

This is always true except when the extracted value is NULL. So effectively, you're just excluding rows where the REGEXP_EXTRACT() fails to match.

2. REGEXP_EXTRACT() returns NULL on no match

If the regex pattern doesn’t match the string, REGEXP_EXTRACT() returns NULL, and any math or comparison with NULL will also be NULL (i.e., filtered out).

If your pattern doesn’t match p(\d+)c exactly, the child count becomes NULL, and those rows get silently skipped.

3. Regex misalignment

You were extracting parent counts using

REGEXP_EXTRACT(family_member_count, r'(\d+)p')

That might work, but for better precision, use

REGEXP_EXTRACT(family_member_count, r'^(\d+)p')

To explicitly anchor the match to the beginning of the string, the following contains a clean and safe version of your code that:

  • Uses SAFE_CAST() to avoid NULL-related errors
  • Extracts parents and children correctly
  • Computes total family count
  • Flags whether a family has children

Step 1: Create temp table with extracted fields

CREATE OR REPLACE TEMP TABLE temp_us_col AS 
SELECT *, 
   SAFE_CAST(REGEXP_EXTRACT(family_member_count, r'^(\d+)p') AS INT64) AS num_parents, 
   SAFE_CAST(REGEXP_EXTRACT(family_member_count, r'p(\d+)c') AS INT64) AS num_children 
FROM capstone-project-456214.us_col_county.us_cost_of_living_county 
WHERE REGEXP_CONTAINS(family_member_count, r'\dp\dc');  -- Ensure valid format

Step 2: Add computed columns

ALTER TABLE temp_us_col 
 ADD COLUMN family_count INT64, 
 ADD COLUMN has_children BOOL;

Step 3: Populate them

UPDATE temp_us_col 
SET family_count = num_parents + num_children, 
 has_children = num_children > 0 
WHERE TRUE;

Step 4: View your results

SELECT * FROM temp_us_col 
ORDER BY county_name, family_member_count;

Note
Use SAFE_CAST() and REGEXP_CONTAINS() to protect against regex match failures. Be cautious with logical expressions that rely on imperfect extraction. Consider normalizing the schema long-term to avoid parsing strings like 2p3c.

If you're frequently manipulating structured strings or temp tables like this, a visual tool like dbForge Data Compare can really help.

0
Subscribe to my newsletter

Read articles from Gabriella Barajas directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Gabriella Barajas
Gabriella Barajas

A database Q&A blog for those who’d rather debug queries than read documentation. Practical answers to real questions — mostly SQL Server and MySQL, with a pinch of sarcasm and a love for clean code.