Saturday, November 26, 2016

BULK INSERT with a file containing linefeeds gotcha


I blogged about this before but someone ran into this issue again at work so I thought it made sense to revisit this.

I prefer to use BULK INSERT or bcp as much as possible, this is especially true if all I need to do is dump the file into a table. If there is a need to do complex stuff,  I will use SQL Server Integration Services or Informatica. These days files are generated by all kinds of systems, these can be Windows, *nix, Mac, Raspberry Pis, IoT systems and many other systems.

If you get an error importing one of these files, your first course of action is to open one of these files in something like Notepad++ or Editplus.  Notepad++ and Editplus have more functionality and are many times faster than notepad. The nice thing about either of these is that you can see control characters.



See how you can see the linefeeds here? You cannot do that in notepad




So let's say you get a file where the row terminator is a linefeed, how would you specify that as a row terminator in BULK INSERT?

You can try n which stands for newline

BULK INSERT SomeTable
 FROM 'D:\JunkdrawImportMe.txt'
 WITH (FIELDTERMINATOR = 't',
 FIRSTROW =2,
 ROWTERMINATOR = 'n')

Nope, that doesn't work, you get 0 rows inserted


You can try r which stands for carriage return

BULK INSERT SomeTable
 FROM 'D:\JunkdrawImportMe.txt'
 WITH (FIELDTERMINATOR = 't',
 FIRSTROW =2,
 ROWTERMINATOR = 'r')

Nope, that doesn't work either, you get 0 rows inserted


What about l for linefeed?

BULK INSERT SomeTable
 FROM 'D:\JunkdrawImportMe.txt'
 WITH (FIELDTERMINATOR = 't',
 FIRSTROW =2,
 ROWTERMINATOR = 'l')

You get an error

Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 2, column 1 (SomeDate).



What about if you try a CHAR(10) which is a linefeed

BULK INSERT SomeTable
 FROM 'D:\JunkdrawImportMe.txt'
 WITH (FIELDTERMINATOR = 't',
 FIRSTROW =2,
 ROWTERMINATOR = CHAR(10) )

You get this error

Msg 102, Level 15, State 1, Line 5
Incorrect syntax near 'CHAR'.



How about if you embed it by using Dynamic SQL

DECLARE @cmd varchar(1000)
SET @cmd = 'BULK INSERT SomeTable
FROM ''D:\JunkdrawImportMe.txt''
WITH ( FIELDTERMINATOR = ''t'',
 FIRSTROW =2,
 ROWTERMINATOR = '''+CHAR(10)+''')'
EXEC(@cmd)

The dynamic SQL solution works, that was my original answer to this.

What about if we use the hex code for line feed 0x0a?

BULK INSERT SomeTable
FROM 'D:\JunkdrawImportMe.txt'
WITH (FIELDTERMINATOR = 't',
FIRSTROW =2,
ROWTERMINATOR = '0x0a') --CHAR(10) or ‘l’ does not work 
-- 0x0a is the hex code for linefeed CHAR(10)

Bingo, that works as well.  I actually prefer this, who wants to muck around with Dynamic SQL if there is an easier way....

That's all, keep this in mind next time you get a file with a linefeed and you are struggling importing that file

No comments: