Delete Lines Matching Regular Expression From Multiple Files

Today, I had to remove 650 instances of a line matching a certain pattern, scattered across 145 different XML files. Not a pleasant task. (If you’re wondering, I’m deprecating a field in the XML DTD and wanted to remove all current instances.)

Just to save you all the searching and debugging, here is the final form and my notes.

egrep -rl '^<pattern>$' * | xargs sed -i .bak '/^<pattern>$/d'


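Note a difference in the regex syntax between egrep and sed: in sed, grouping parentheses are escaped, like \(.*\), as are forward slashes, like \/, because the slash delimits the regex. However, since you’re just deleting an entire line, parentheses probably aren’t needed at all. Also note that -i .bak (with a space) is the BSD/macOS sed form; GNU sed wants the suffix attached, as in -i.bak, and the attached form works with both.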
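To make the escaping difference concrete, here is a small sketch run against a throwaway directory. The directory, the file name and the sample lines are made up for illustration, and -i.bak (no space) is used on the assumption that you want something that runs under both GNU and BSD sed.

```shell
# Scratch directory with hypothetical sample data, so nothing real is touched.
demo_dir=$(mktemp -d)
printf '<a>\n<length>12</length>\n<b>\n' > "$demo_dir/sample.xml"

# grep -E (egrep): parentheses group without a backslash, and / is an ordinary character.
grep_hits=$(grep -Ec '^<length>([0-9]*)</length>$' "$demo_dir/sample.xml")

# sed (basic regex): the group is written \( \), and / is escaped because it delimits the regex.
sed -i.bak '/^<length>\([0-9]*\)<\/length>$/d' "$demo_dir/sample.xml"

# Count the lines that survived the delete; the untouched original is in sample.xml.bak.
remaining=$(grep -c '' "$demo_dir/sample.xml")
echo "matched: $grep_hits, lines left: $remaining"
```

If the escaped slashes get hard to read, sed also accepts another delimiter for an address when it is introduced with a backslash, e.g. '\|^<length>[0-9]*</length>$|d'.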

For bonus points, you can count the number of instances of a pattern scattered across a number of files using

egrep -rc '^<pattern>$' * | awk -F: '{print $2}' | awk '{sum += $1} END {print sum}'
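If you only need the grand total, the pipeline can be shortened. The sketch below compares two variants on made-up files (the directory, file names and the "drop" pattern are placeholders, not from my real data): one sums the per-file counts in a single awk call, the other has grep print the matching lines without file names and counts them.

```shell
# Scratch directory with hypothetical sample files.
demo_dir=$(mktemp -d)
printf 'keep\ndrop\n' > "$demo_dir/a.txt"
printf 'drop\ndrop\nkeep\n' > "$demo_dir/b.txt"

# Variant 1: per-file counts summed by one awk call.
# $NF is the last colon-separated field, so a colon in a file name does not break the sum.
total_awk=$(grep -rcE '^drop$' "$demo_dir" | awk -F: '{sum += $NF} END {print sum}')

# Variant 2: -h suppresses file names; grep -c '' then counts the lines it receives.
total_lines=$(grep -rhE '^drop$' "$demo_dir" | grep -c '')

echo "awk total: $total_awk, line total: $total_lines"
```

Both variants count matching lines, which equals the number of instances here because the pattern is anchored to the whole line.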

And to finish it off, here’s a real live example!

egrep -rl '^.*<length>(.*)</length>\w*$' * | xargs sed -i .bak '/^.*<length>\([0-9]*\)<\/length>\w*$/d'
 
egrep -rc '^.*<length>(.*)</length>\w*$' * | awk -F: '{print $2}' | awk '{sum += $1} END {print sum}'
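One caveat with piping file names into xargs: a name containing spaces gets split into several arguments. A more defensive variant might look like the sketch below, assuming your grep supports --null and your xargs supports -0 (GNU and BSD versions both do, as far as I know). The file name and contents are invented, and the pattern is a simplified stand-in for the one above.

```shell
# Scratch directory with a hypothetical file whose name contains a space.
demo_dir=$(mktemp -d)
printf '<track>\n  <length>241</length>\n</track>\n' > "$demo_dir/my track.xml"

# --null ends each file name with a NUL byte; xargs -0 splits on NUL, so the space survives.
grep -rlE --null '^.*<length>[0-9]*</length>$' "$demo_dir" |
  xargs -0 sed -i.bak '/^.*<length>[0-9]*<\/length>$/d'

# Count the lines left in the edited file; the backup keeps the original three.
lines_left=$(grep -c '' "$demo_dir/my track.xml")
echo "lines left: $lines_left"
```

Once you have checked the result, the .bak files can be removed by hand or with find.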

Posted in Tutorials.


One Response


  1. FocusedWolf says

    Thanks this was helpful. Also, I had to replace “-i .bak” with “-i.bak” for this to work for me.


