

Delete Lines Matching Regular Expression From Multiple Files

Today, I had to remove 650 instances of a line matching a certain pattern, scattered across 145 different XML files. Not a pleasant task. (If you’re wondering, I’m deprecating a field in the XML DTD and wanted to remove all current instances.)

Just to save you all the searching and debugging, here is the final form and my notes.

egrep -rl '^<pattern>$' * | xargs sed -i .bak '/^<pattern>$/d'


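Note a difference in the regex syntax between grep and sed: egrep uses extended regular expressions, where plain parentheses group, while sed defaults to basic regular expressions, where grouping parentheses are escaped, like \(.*\). Forward slashes, which delimit the regex in sed, must also be escaped, like \/. However, since you’re just deleting an entire line, parentheses probably won’t be needed. Also note that -i .bak (with a space) is the BSD/macOS form of the option; GNU sed expects the suffix attached, as in -i.bak.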
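To make the escaping difference concrete, here is a minimal sketch that runs the same delete on a scratch directory. The file names and contents are made up for illustration, and it uses grep -E in place of egrep and the attached -i.bak form, which both GNU and BSD sed accept.

```shell
# Scratch directory with two hypothetical XML files (names are made up).
demo=$(mktemp -d)
printf '<a>\n<length>12</length>\n<b/>\n</a>\n' > "$demo/one.xml"
printf '<a>\n<b/>\n</a>\n' > "$demo/two.xml"

# grep -E (extended regex): bare parentheses group, slashes are ordinary.
# sed (basic regex): groups are written \( \), and / is escaped as \/
# because it delimits the expression.
grep -rlE '^<length>([0-9]*)</length>$' "$demo" \
  | xargs sed -i.bak '/^<length>\([0-9]*\)<\/length>$/d'

cat "$demo/one.xml"   # the <length> line is gone
ls "$demo"            # one.xml, one.xml.bak, two.xml
```

Only files that grep lists are handed to sed, so two.xml is left alone and gets no .bak copy.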

For bonus points, you can count the number of instances of a pattern scattered across a number of files using

egrep -rc '^<pattern>$' * | awk -F: '{print $2}' | awk '{sum += $1} END {print sum}'
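The count can also be sketched with a single awk step. This is a variation on the command above, not a replacement for it: it sums the last colon-separated field ($NF) so that a colon in a file path does not throw off the total. The sample files are hypothetical.

```shell
# Three hypothetical files: two, one, and zero matching lines.
demo=$(mktemp -d)
printf '<length>1</length>\n<x/>\n<length>2</length>\n' > "$demo/a.xml"
printf '<length>3</length>\n' > "$demo/b.xml"
printf '<x/>\n' > "$demo/c.xml"

# grep -c prints one "file:count" line per file, including zero counts.
# awk sums the last field; "sum + 0" prints 0 instead of an empty line
# when there is no input at all.
total=$(grep -rcE '^<length>[0-9]*</length>$' "$demo" \
  | awk -F: '{sum += $NF} END {print sum + 0}')
echo "$total"   # prints 3
```

Keep in mind that grep -c counts matching lines rather than individual matches. Because the pattern is anchored to the whole line, the two are the same here.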

And to finish it off, here’s a real live example!

egrep -rl '^.*<length>(.*)</length>\w*$' * | xargs sed -i .bak '/^.*<length>\([0-9]*\)<\/length>\w*$/d'
 
egrep -rc '^.*<length>(.*)</length>\w*$' * | awk -F: '{print $2}' | awk '{sum += $1} END {print sum}'
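Putting the two commands together gives a simple before-and-after check. The sketch below is one way to do it, under a few assumptions of mine: the sample files are invented, count is a hypothetical helper name, [[:space:]]* is used to tolerate trailing whitespace, and grep supports --include (GNU and BSD grep do).

```shell
# Two hypothetical track files; the second has trailing spaces after </length>.
demo=$(mktemp -d)
printf '<track>\n  <length>215</length>\n  <title>x</title>\n</track>\n' > "$demo/t1.xml"
printf '<track>\n  <length>98</length>  \n</track>\n' > "$demo/t2.xml"

# count (hypothetical helper): total matching lines in files ending in $1.
count() {
  grep -rcE --include="*$1" '^.*<length>[0-9]*</length>[[:space:]]*$' "$demo" \
    | awk -F: '{sum += $NF} END {print sum + 0}'
}

before=$(count .xml)
grep -rlE --include='*.xml' '^.*<length>[0-9]*</length>[[:space:]]*$' "$demo" \
  | xargs sed -i.bak '/^.*<length>[0-9]*<\/length>[[:space:]]*$/d'
after=$(count .xml)
backed_up=$(count .bak)

echo "before=$before after=$after backed_up=$backed_up"
# prints: before=2 after=0 backed_up=2
```

If the count in the .bak files matches the count from before and the live files report zero, the backups can be deleted with reasonable confidence.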
