Check file newline characters

Sooner or later you can run into a problem with newline characters: in your project, in a system configuration file, or elsewhere. We don’t keep this in mind all the time, so it is easy to overlook the obvious explanation. Most of the time we don’t even check which newline characters our files use. But there are cases where these characters are the root cause of a bug.

I ran into such an issue the other day and would like to share the solution.

Everyone knows (or should know) that there are several kinds of newline characters. Which one you get depends on your platform, your IDE settings, and some other factors.
You can easily find tables and descriptions on Wikipedia or elsewhere on the internet. I’ll just put a short table here to remind you what we are dealing with. It’s not a complete list of all newline sequences, only the most popular ones.

Abbreviation  HEX   Escape sequence
CR            0D    \r
LF            0A    \n
CRLF          0D0A  \r\n
LF+CR         0A0D  \n\r
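
To make the table concrete, here is a minimal sketch that derives each HEX code from its escape sequence. Python is my choice for illustration, and the names NEWLINES and hex_code are hypothetical helpers, not part of any tool mentioned in this post.

```python
# Map each abbreviation from the table to its raw bytes.
NEWLINES = {
    "CR": b"\r",
    "LF": b"\n",
    "CRLF": b"\r\n",
    "LF+CR": b"\n\r",
}


def hex_code(abbreviation: str) -> str:
    """Return the upper-case HEX code of a newline sequence."""
    return NEWLINES[abbreviation].hex().upper()


if __name__ == "__main__":
    for name in NEWLINES:
        print(name, hex_code(name))
```

Running it prints one line per row of the table, so you can check the HEX column for yourself.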

Dealing through the IDE

Most IDEs can show which newline characters your files or project use and let you switch between them. That’s clean and easy, so I won’t describe it in detail. Just keep in mind that you can change these settings in your IDE. If your IDE doesn’t support these options, or you are working on a server over SSH (which usually means you only have a terminal), welcome to the next section.

Dealing through the terminal

Let’s say you have a tool that generates a report file, for example a CSV. You then publish that file through a message broker to distribute it to your consumers, who use it in ways only they know. Then a few consumers complain that they have trouble parsing the file because they see a different newline character: they expected the escape sequence \n but somehow got \r\n.
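
As one illustration of how such a mismatch can appear (purely an assumption for the sake of example; the tool in this story is not necessarily written in Python), Python’s csv module ends every row with \r\n unless told otherwise. The helper name render_csv is hypothetical.

```python
import csv
import io


def render_csv(rows, **writer_options):
    """Render rows to a CSV string; extra options go to csv.writer."""
    buffer = io.StringIO()
    csv.writer(buffer, **writer_options).writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [["id", "amount"], ["1", "42"]]
    # repr() makes the otherwise invisible terminators visible.
    print(repr(render_csv(rows)))
    print(repr(render_csv(rows, lineterminator="\n")))
```

The first call uses the module’s default terminator, the second one overrides it with lineterminator. Other languages and libraries have their own defaults, so check what your generator really writes.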

A bug is raised and you investigate the issue. Now imagine that it’s fixed, and you need to check whether the file is generated correctly.
How do you do that? Of course, if you can check it inside your IDE, you are lucky and can stop here. But what if the case can only be reproduced on a staging environment where you only have SSH access? Believe it or not, there are plenty of cases where retesting is much harder than fixing.

Okay, suppose you were clever and saved the original file before the fix, and you are sure that its newline sequence is \r\n. How do you check that? You have probably heard of the file terminal command. I prepared two files, one with CRLF and one with LF newline characters.

eignatik@Evgenys-MacBook-Pro ~/myGeneratedReports> file report-20180312.csv
report-20180312.csv: ASCII text, with CRLF line terminators

But if the line separator is plain LF, the command says nothing about line terminators, so it gives you no explicit confirmation:

eignatik@Evgenys-MacBook-Pro ~/myGeneratedReports> file report-20180320.csv
report-20180320.csv: ASCII text

What can we do here? Did you notice that the table earlier lists a HEX code for each separator? That is helpful.
We can take a HEX dump and check whether those bytes are present.

xxd <filename> is an appropriate command for that. It prints the whole HEX dump to standard output, so I prefer to redirect the output to a file and then open it in vim (or whatever editor you like).

eignatik@Evgenys-MacBook-Pro ~/myGeneratedReports> xxd report-20180312.csv > report-CRLF.txt
eignatik@Evgenys-MacBook-Pro ~/myGeneratedReports> xxd report-20180320.csv > report-LF.txt
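
To show what such a dump contains, here is a rough sketch that imitates the layout: offset, bytes in HEX grouped in pairs, then the same bytes as text. It is an approximation under my own assumptions, not a reimplementation of xxd (for example, it does not pad a short last line), and the name hexdump is hypothetical.

```python
def hexdump(data: bytes, width: int = 16):
    """Yield dump lines: offset, HEX bytes in pairs, printable text."""
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_bytes = chunk.hex()
        # Group the HEX digits four at a time (two bytes per group).
        groups = " ".join(
            hex_bytes[i:i + 4] for i in range(0, len(hex_bytes), 4)
        )
        # Non-printable bytes, including 0d and 0a, are shown as dots.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        yield f"{offset:08x}: {groups}  {text}"


if __name__ == "__main__":
    for line in hexdump(b"a,b\r\nc,d\r\n"):
        print(line)
```

One thing to watch for in any dump grouped like this: the 0d and 0a bytes can land in different groups (620d 0a) or even on different lines, so searching the dump for the literal string 0d0a can miss some occurrences.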

Take a look at the following screenshot. Both files are opened side by side in vim. On the left, I highlighted the 0a sequences, while on the right you can see 0d0a highlighted. As you know from the table above, 0a is LF and 0d0a is CRLF.

We obtained HEX dumps of those files. For your information, each line of a dump has three sections: the offset, the bytes in HEX, and the same bytes as text. We are interested in the second and third sections. Each byte in the second section is represented by its HEX code, and non-printable bytes such as newline characters appear as dots in the third section, which helps you spot the end of each line. So, comparing the two dumps, you can confirm that one of the files uses CRLF and the other LF.
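
If Python 3 happens to be available on the host (an assumption; nothing above requires it), the same comparison can be sketched as a byte count instead of reading a dump by eye. The function names count_terminators and count_file_terminators are mine, for illustration only.

```python
import sys


def count_terminators(data: bytes) -> dict:
    """Count CRLF pairs, lone LF bytes, and lone CR bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        # Every CRLF pair contains one LF and one CR; subtract them
        # so that only the stand-alone bytes remain.
        "LF": data.count(b"\n") - crlf,
        "CR": data.count(b"\r") - crlf,
    }


def count_file_terminators(path: str) -> dict:
    """Read a file in binary mode so no newline translation happens."""
    with open(path, "rb") as handle:
        return count_terminators(handle.read())


if __name__ == "__main__":
    # Usage: python3 <this script> report-20180312.csv report-20180320.csv
    for path in sys.argv[1:]:
        print(path, count_file_terminators(path))
```

A file that uses only LF should show zero for CRLF and CR, which is the positive confirmation the file command does not give. Note that this sketch reports an LF+CR pair as one lone LF plus one lone CR.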
