Custom Split Delimited String into Columns in SQL Server

Rajender SinghRajender Singh
2 min read

After ending my blog oraclebrains in 2009. I am excited to finally share my first post in years. I am motivated to continue writing and sharing my thoughts, experiences, and insights with others.

I wanted to share a fun little challenge that I have recently come across with the hope it may help someone.

During the recent Data Migration task, I faced a challenge while uploading files to the Database. The file names contain metadata that I must extract and process.

Data is stored in table ( RD_DocStore ) with structure as follows:

rawdata.RD_DocStore { DocID, DocName }

Sample Data Set for “RD_DocStore->DocName”, RD_DocStore->DocID*- Self Generated Surrogate Key.*

128978787_1272896_20240524_Care_Plan_Diagram_Form_1045645.PDF

12897234_1272688_20240623_Care_Plan_Diagram_104524348.PDF

1289789_12723436_20240704_Care_Form_104589.PDF

128946_127235_20220408_Care_Form_2023423__2045605234_104456.PDF

Need to Parse “DocName” as follows

Parsing Requirement

For each line(record):

1st box = Field No1, (this is easy got fixed position for ‘_’)
2nd box = Field No2, (this is easy got fixed position for ‘_’)
3rd box = Field No3, (this is easy got fixed position for ‘_’)
4th box = Field No4, (this is little bit complicated as ‘_’ is not fixed, can be any number of ‘_’ involved)
5th box = Field No5, (little complicated due to “Field No4” but ok)
6th box = Field No 6, (this is easy, got fixed start position after ‘.’), -- Not taking in consideration in my initial solution

First I got important index positions using the CHARINDEX function for all important characters which can help me achieve my objective, then I got reverse strings of the document name using REVERSE function as follows:

SELECT DocName
      ,CHARINDEX('_',([DocName]),  0) l1
      ,CHARINDEX('_',([DocName]),  (CHARINDEX('_',([DocName]),  0))+1) l2
      ,CHARINDEX('_',([DocName]),  (CHARINDEX('_',([DocName]),  (CHARINDEX('_',([DocName]),  0))+1))+1) l3
      ,CHARINDEX('_',([DocName]),  (CHARINDEX('_',([DocName]),  (CHARINDEX('_',([DocName]),  (CHARINDEX('_',([DocName]),  0))+1))+1))+1) l4
      ,REVERSE([DocName]) RvName
      ,CHARINDEX('.',REVERSE([DocName]),0) rv_start
      ,CHARINDEX('_',REVERSE([DocName]),0) rv_end
      ,CHARINDEX('.',([DocName]),  0) l5    
FROM [rawdata].[RD_DocStore]

Using above query as my inner query I was able to parse the “DocName” as follows:

SELECT docname
, SUBSTRING(docname,0, L1) value1
, SUBSTRING(docname,L1+1, L2-l1-1) value2
, SUBSTRING(docname,L2+1, L3-l2-1) value3
, SUBSTRING(docname,L3+1, L5-l3-(rv_end-rv_start+1)) value4
,REVERSE(SUBSTRING( RvName, rv_start+1, rv_end-rv_start-1)) value5
FROM (
SELECT TOP (1000) [DocID]
      ,[DocName]
      ,[DocExt]
         ,CHARINDEX('_',([DocName]),  0) l1
         ,CHARINDEX('_',([DocName]),  (CHARINDEX('_',([DocName]),  0))+1) l2
         ,CHARINDEX('_',([DocName]),  (CHARINDEX('_',([DocName]),  (CHARINDEX('_',([DocName]),  0))+1))+1) l3
         ,CHARINDEX('_',([DocName]),  (CHARINDEX('_',([DocName]),  (CHARINDEX('_',([DocName]),  (CHARINDEX('_',([DocName]),  0))+1))+1))+1) l4
         ,REVERSE([DocName]) RvName
         , CHARINDEX('.',REVERSE([DocName]),0) rv_start
         , CHARINDEX('_',REVERSE([DocName]),0) rv_end
         ,CHARINDEX('.',([DocName]),  0) l5    
  FROM [rawdata].[RD_DocStore]) a1

Result was as follows:

Sample Results

This initial rough solution became the basis for my final solution. Thank you for reading it, and I hope it will help somebody.

0
Subscribe to my newsletter

Read articles from Rajender Singh directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Rajender Singh
Rajender Singh