Change Duplicate Contents to NULL — From SQL to SPL #36

esProcesProc
2 min read

Problem Description & Analysis:

The first two columns of a certain database table have duplicate values, such as the 2nd-3rd records and the 1st record being duplicated below.

Task: Now we need to change all duplicate values to null. In other words, after grouping by the first two fields (or one of them), only the first record in the group remains unchanged, and the first two fields of other records are changed to null.

Code Comparisons:

SQL

SELECT CASE a_rn WHEN 1 THEN column_a END AS column_a,
       CASE b_rn WHEN 1 THEN column_b END AS column_b,
       column_c
FROM   (
  SELECT column_a,
         column_b,
         column_c,
         ROW_NUMBER() OVER (PARTITION BY column_a ORDER BY column_b, column_c)
           AS a_rn,
         ROW_NUMBER() OVER (PARTITION BY column_a, column_b ORDER BY column_c)
           AS b_rn
  FROM   table_name)

SQL does not have natural row numbers, nor does it have row numbers within a group, and can only generate them using window functions, making the code relatively cumbersome.

SPL has built-in row numbers, including row numbers within a group.

✨✅ Try.DEMO

A1: Load data.

A2: Group by Column_A, modify the data of each group, and when the row number within the group is not 1, change Column_A and Column_B to null; Finally, union the data from each group. # indicates the row number within the group.

If the data amount is not large, it is also possible not to group and union, and directly compare the Column_A of the current record with all previous Column_As. If the latter includes the former, then set Column_A and Column_B to null. The code is as follows:

\=A1.run(if(Column_A[:-1].contain(Column_A), (Column_A=Column_B=null)))

[: -1] represents the set from the first record to the previous record of the current record.


Powerful Data Processing Made Simple — Try It FREE 🚀👉🏻 esProc SPL FREE Download.

10
Subscribe to my newsletter

Read articles from esProc directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

esProc
esProc

esProc SPL is a JVM-based programming language designed for structured data computation, serving as both a data analysis tool and an embedded computing engine. FREE download👉🏻: https://www.esproc.com/download-esproc