
*nix tricks explained – removing new lines with sed

27 Sep

In my last blog post, I mentioned the following command line to replace new lines in text with something else:

sed -e ':a;N;$!ba;s/\n/replace new lines with this/g'

A line like this looks complex, but once you understand it, you begin to see the full power of stream editors like sed:

sed -e
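This invokes sed, the stream editor, which runs a small program against an input stream, one line at a time. The “-e” flag tells sed that the program to run follows as the next command-line argument; here that’s the ‘:a;N;$!ba;s/\n/replace new lines with this/g’. Separating the commands with semicolons like this works in GNU sed; other sed implementations may need each command passed in its own “-e”.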
:a;
This declares a label named “a”, like the labels used with goto in C, assembler and BASIC. Unlike C labels, sed labels begin with a colon rather than ending with one. The semicolon ends the command, much as it ends a statement in C, and program control moves to the next command.
N;
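Append the next line of input to the buffer (which sed calls the pattern space), joined to what is already there by a new line character, and advance the line counter. This is the magic right here. Normally, sed operates on a single line at a time, loading only that line into the buffer. The ‘N’ command adds the next line to the buffer as well, letting you work with more than one line at once. It is also why the later substitution has a \n to find: the new line only shows up inside the buffer once lines have been joined this way.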
$!
Only perform the next command if not on the last line of the input. Sed lets you restrict a command to particular lines by putting an address in front of it; for example, $ means the last line of the input. Following an address with a ! tells sed that you want it to match the opposite of that address: in this case, every line but the last.
ba;
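Branch to label a. This is just like a goto. The full command is “$!ba”, meaning branch to a if not on the last line of the input. The loop therefore keeps appending lines until the whole input is sitting in the buffer, and only then falls through to the substitution.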
s/\n/replace new lines with this/g
This one is slightly complex. “s” means substitute: it replaces anything that matches the pattern between the first pair of slashes with the text between the second pair. Here the pattern is \n, the new line characters that now sit between the joined lines. Normally, substitution only works on the first appropriate target in the buffer. By specifying ‘g’ (global) at the end, the substitution will be performed for every matching position within the buffer.
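
To see the whole program at work, here is a minimal sketch that runs it over three short lines. The sample input and the “, ” replacement are made up for illustration, and the single-string form assumes GNU sed. The second command is the same program with one command per “-e”, which should also suit sed implementations that dislike a semicolon straight after a label:

```shell
# Sample input (made up for illustration): three short lines.
# Assumes GNU sed, which accepts ';' straight after a label.
joined=$(printf 'one\ntwo\nthree\n' | sed -e ':a;N;$!ba;s/\n/, /g')
echo "$joined"

# The same program, one command per -e.
portable=$(printf 'one\ntwo\nthree\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/, /g')
echo "$portable"
```

Both commands should print one, two, three on a single line.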

Command line stories: The annoying lines

26 Sep

The problem: Removing new lines from text, optionally appending some content to each old line

The solution: Use the common tools ‘tr’ and ‘sed’ to do all the heavy lifting for you

It’s 16:52 on Friday. You’re looking forward to the weekend, and your colleague has just packed up and left the office. Then you get the following email from a client:

Hey Michelle,
Can you send that incident report for that outage out to the following people right away?

jeff.blogs@client.co
jane.bloggs@client.co
playon@personaladdress.org
nighthawk@thinkstheyreahacker.com
fred@client.co
techteam@client.co
seniormanagement@client.co
everyone@client.co
reallyeveryone@client.co
[and it goes on like that for another 50 addresses]

Thankfully, you finished that report earlier, so that’s no issue. But your email client can’t exactly handle a string of email addresses, one per line. Your colleagues might have tried pulling them together by hand, but you know better.

Enter your handy command line, and ‘tr’, the translation or deletion (trim, as I like to think) tool. tr is very simple – it replaces characters matching the first set with those from the second:

tr '\n' \;

So you run the above*, feeding in the email list**, and you get the below output:

jeff.blogs@client.co;jane.bloggs@client.co;playon@personaladdress.org;nighthawk@thinkstheyreahacker.com;fred@client.co;techteam@client.co;seniormanagement@client.co;everyone@client.co;reallyeveryone@client.co;[and so on]

There are no spaces between the addresses, but that’s what tr does – it just removes and replaces, one character for one character. No matter – it might be ugly, but most email systems should be able to manage that.
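
As a small sketch of that step, the snippet below feeds three of the addresses from the email through tr. Using printf to supply the input is just a convenience for the example:

```shell
# Three of the client's addresses, one per line, fed in with printf.
emails=$(printf 'jeff.blogs@client.co\njane.bloggs@client.co\nfred@client.co\n' | tr '\n' ';')
echo "$emails"
```

Every new line becomes a semicolon, including the one after the last address, so the result ends with a trailing “;”.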

But what if you wanted to add those spaces? Or that client asks for something like the below:

Hey Michelle,

Sorry to bother you again so late, but since we’re launching that new campaign tonight, the whole senior management team want text alerts for any issues. Can you set them up to go out to the following?

07700 955095
07700 933404
07700 966227
07700 967067
07700 934567
07700 910984
07700 958368
07700 957390
07700 967390
[and there's another twenty numbers]

Thankfully, text alerts are something you already do – you’ve got a lovely email-to-SMS gateway set up that takes an email address in the form [phonenumber]@textnow.michelledisraeli.com. So all you need to do is remove that space from each line, turn them all into email addresses, and remove all the new lines.

Removing the spaces is easy:
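tr -d '[:blank:]'
tr supports some predefined character classes, of which [:blank:] represents horizontal whitespace, like spaces and tabs. Then you just add the “-d” flag to tell tr that you want to delete characters, not replace them. The class is quoted so that the shell doesn’t try to treat the square brackets as a file name pattern.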
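
A quick sketch of the deletion on two of the numbers from the email, again using printf to stand in for the pasted list:

```shell
# Strip the space from each phone number; the new lines are left alone.
numbers=$(printf '07700 955095\n07700 933404\n' | tr -d '[:blank:]')
echo "$numbers"
```

The numbers still come out one per line, because a new line is not part of [:blank:].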

To add the email address details, we can’t exactly use tr any more – it can only replace a single character at a time. You know, however, that sed is able to do far more complex replacements through the magic of regular expressions:

sed -e 's/what-to-replace/replace-with-this/g'
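
As a hedged illustration of what the trailing “g” changes, here is the same substitution with and without it. The input a-b-c is made up for the example:

```shell
# Without g, only the first match on each line is replaced.
first_only=$(printf 'a-b-c\n' | sed -e 's/-/+/')
echo "$first_only"

# With g, every match on the line is replaced.
every_match=$(printf 'a-b-c\n' | sed -e 's/-/+/g')
echo "$every_match"
```

The first command should print a+b-c and the second a+b+c.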

Unfortunately, sed reads its input one line at a time and never gets to see the new lines between them, so you first replace all the new lines with a character that appears nowhere else in the text, and then have sed replace that instead:

tr '\n' \#

This gets us the ugly text:

07700955095#07700933404#07700966227#07700967067#07700934567#07700910984#07700958368#07700957390#07700967390#

Which we can then work on via sed with something like:

sed -e 's/#/@textnow.michelledisraeli.com; /g'

Which finally gives us the output we’ve been wanting:

07700955095@textnow.michelledisraeli.com; 07700933404@textnow.michelledisraeli.com; 07700966227@textnow.michelledisraeli.com; 07700967067@textnow.michelledisraeli.com; 07700934567@textnow.michelledisraeli.com; 07700910984@textnow.michelledisraeli.com; 07700958368@textnow.michelledisraeli.com; 07700957390@textnow.michelledisraeli.com; 07700967390@textnow.michelledisraeli.com;
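
Strung together as one pipeline, the three steps might look like the sketch below. It uses three of the numbers from the email, with printf standing in for the pasted list:

```shell
# 1. delete the spaces, 2. turn new lines into '#', 3. turn each '#' into the gateway suffix.
addresses=$(printf '07700 955095\n07700 933404\n07700 966227\n' \
  | tr -d '[:blank:]' \
  | tr '\n' '#' \
  | sed -e 's/#/@textnow.michelledisraeli.com; /g')
echo "$addresses"
```

The order matters: the spaces have to go before the new lines do, otherwise there is nothing left to tell sed where one number ends and the next begins.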

But if sed can manage to replace regular text, why can’t it replace new lines as well? Well, the problem is that sed is a stream editor, so it normally treats each new line as the boundary between one piece of work and the next. To get sed to do this, you need to use a more arcane command:

sed -e ':a;N;$!ba;s/replace/withthis/g'

This command looks crazy, but once you figure out how it works, the entire power of sed becomes visible. And that’s what I’ll explain in the next article.
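
For the curious, here is a rough sketch of how that command could stand in for the tr and “#” detour on the phone numbers. The gateway variable is just a name picked for this example, and the program is split across several “-e” flags so the loop can stay in single quotes while the substitutions use the variable. One wrinkle: once the lines are joined there is no new line after the last number, so a second substitution adds the final suffix.

```shell
# Hypothetical variable name; the domain is the gateway from the story.
gateway='@textnow.michelledisraeli.com'

addresses=$(printf '07700 955095\n07700 933404\n' \
  | tr -d '[:blank:]' \
  | sed -e ':a' -e 'N' -e '$!ba' \
        -e "s/\n/${gateway}; /g" \
        -e "s/\$/${gateway};/")
echo "$addresses"
```

This should print both numbers as addresses on one line, separated by “; ” and ending in “;”.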

*Note the need to escape the semi-colon, otherwise the shell you’re using will think you’re talking to it rather than instructing tr. To escape characters, just type a backslash (‘\’) before them. Some normal letters do special things when escaped: ‘\n’ means new line, and ‘\t’ means tab. To get tr to actually look for new lines, you need to quote the ‘\n’ as in the above example, so that the shell passes the backslash through untouched and tr gets to interpret the escape itself.
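
A small sketch of the difference, using made-up input. The first two commands are equivalent ways of handing tr a literal semicolon; the third shows what happens when the \n is left unquoted:

```shell
# Escaping and quoting the semicolon give tr the same argument.
escaped=$(printf 'x\ny\n' | tr '\n' \;)
quoted=$(printf 'x\ny\n' | tr '\n' ';')
echo "$escaped" "$quoted"

# Unquoted, the shell strips the backslash, so tr sees a plain 'n'
# and replaces the letter n rather than the new line.
unquoted=$(printf 'banana\n' | tr \n X)
echo "$unquoted"
```

The last command should print baXaXa, which is a handy way to spot a missing pair of quotes.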

** Need a quick means to just paste in a pile of text and then feed that through a pipe into some unix tool? Try using the below, then typing ‘END’ on a line of its own once you’re done:
cat <<END | [some tool]
cat copies a file, or standard input, straight to standard output. “<<END” redirects a program’s standard input to read whatever text you enter, all the way up to a line containing only END. You can replace the word END with something else if END appears in your source material.
Whilst you can use stream input on the command you’re running directly, I find using cat and piping it in to be safest when you’re working with a string of pipes to make sure the data all enters at the correct point. It also means your workflow will be much the same when working on the command line as when working with files.
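
Here is a minimal sketch of that pattern with tr as the tool on the other end of the pipe, using two of the addresses from the story. Wrapping it in $( ) just captures the result in a variable for the example:

```shell
# Everything between the two END markers becomes cat's standard input.
joined=$(cat <<END | tr '\n' ';'
jeff.blogs@client.co
jane.bloggs@client.co
END
)
echo "$joined"
```

If the pasted text contains dollar signs or backticks, writing the marker as <<'END' stops the shell from trying to expand them.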