Custom Split Delimited String into Columns in SQL Server
After ending my blog oraclebrains in 2009. I am excited to finally share my first post in years. I am motivated to continue writing and sharing my thoughts, experiences, and insights with others.
I wanted to share a fun little challenge that I have recently come across with the hope it may help someone.
During the recent Data Migration task, I faced a challenge while uploading files to the Database. The file names contain metadata that I must extract and process.
Data is stored in table ( RD_DocStore ) with structure as follows:
rawdata.RD_DocStore { DocID, DocName }
Sample Data Set for “RD_DocStore->DocName”, RD_DocStore->DocID*- Self Generated Surrogate Key.*
128978787_1272896_20240524_Care_Plan_Diagram_Form_1045645.PDF
12897234_1272688_20240623_Care_Plan_Diagram_104524348.PDF
1289789_12723436_20240704_Care_Form_104589.PDF
128946_127235_20220408_Care_Form_2023423__2045605234_104456.PDF
Need to Parse “DocName” as follows
Parsing Requirement
For each line(record):
1st box = Field No1, (this is easy got fixed position for ‘_’)
2nd box = Field No2, (this is easy got fixed position for ‘_’)
3rd box = Field No3, (this is easy got fixed position for ‘_’)
4th box = Field No4, (this is little bit complicated as ‘_’ is not fixed, can be any number of ‘_’ involved)
5th box = Field No5, (little complicated due to “Field No4” but ok)
6th box = Field No 6, (this is easy, got fixed start position after ‘.’), -- Not taking in consideration in my initial solution
First I got important index positions using the CHARINDEX function for all important characters which can help me achieve my objective, then I got reverse strings of the document name using REVERSE function as follows:
SELECT DocName
,CHARINDEX('_',([DocName]), 0) l1
,CHARINDEX('_',([DocName]), (CHARINDEX('_',([DocName]), 0))+1) l2
,CHARINDEX('_',([DocName]), (CHARINDEX('_',([DocName]), (CHARINDEX('_',([DocName]), 0))+1))+1) l3
,CHARINDEX('_',([DocName]), (CHARINDEX('_',([DocName]), (CHARINDEX('_',([DocName]), (CHARINDEX('_',([DocName]), 0))+1))+1))+1) l4
,REVERSE([DocName]) RvName
,CHARINDEX('.',REVERSE([DocName]),0) rv_start
,CHARINDEX('_',REVERSE([DocName]),0) rv_end
,CHARINDEX('.',([DocName]), 0) l5
FROM [rawdata].[RD_DocStore]
Using above query as my inner query I was able to parse the “DocName” as follows:
SELECT docname
, SUBSTRING(docname,0, L1) value1
, SUBSTRING(docname,L1+1, L2-l1-1) value2
, SUBSTRING(docname,L2+1, L3-l2-1) value3
, SUBSTRING(docname,L3+1, L5-l3-(rv_end-rv_start+1)) value4
,REVERSE(SUBSTRING( RvName, rv_start+1, rv_end-rv_start-1)) value5
FROM (
SELECT TOP (1000) [DocID]
,[DocName]
,[DocExt]
,CHARINDEX('_',([DocName]), 0) l1
,CHARINDEX('_',([DocName]), (CHARINDEX('_',([DocName]), 0))+1) l2
,CHARINDEX('_',([DocName]), (CHARINDEX('_',([DocName]), (CHARINDEX('_',([DocName]), 0))+1))+1) l3
,CHARINDEX('_',([DocName]), (CHARINDEX('_',([DocName]), (CHARINDEX('_',([DocName]), (CHARINDEX('_',([DocName]), 0))+1))+1))+1) l4
,REVERSE([DocName]) RvName
, CHARINDEX('.',REVERSE([DocName]),0) rv_start
, CHARINDEX('_',REVERSE([DocName]),0) rv_end
,CHARINDEX('.',([DocName]), 0) l5
FROM [rawdata].[RD_DocStore]) a1
Result was as follows:
Sample Results
This initial rough solution became the basis for my final solution. Thank you for reading it, and I hope it will help somebody.
Subscribe to my newsletter
Read articles from Rajender Singh directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by