Today, I had to remove 650 instances of a line matching a certain pattern scattered across 145 different XML files. Not a pleasant task. (If you’re wondering, I’m deprecating a field in the XML DTD and wished to remove all current instances).
Just to save you all the searching and debugging, here is the final form and my notes.
egrep -rl '^<pattern>$' * | xargs sed -i .bak '/^<pattern>$/d'
Note a difference in the regex as used in grep and sed: in sed, the parentheses are escaped, like \(.*\), as are forward slashes, which delimit the regex, like \/. However, since you’re just deleting an entire line, parentheses probably aren’t needed.
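To make the escaping difference concrete, here is a minimal sketch run against a throwaway file. The temp directory, file name, and sample contents are made up for illustration. It also shows sed's `\|...|` address form, which picks a different delimiter so the slashes need no escaping at all.

```shell
# Throwaway sample file; path and contents are hypothetical.
tmpdir=$(mktemp -d)
printf '%s\n' '<item>' '  <length>42</length>' '</item>' > "$tmpdir/sample.xml"

# grep -E (equivalent to egrep): parentheses group without backslashes.
ere_hits=$(grep -Ec '<length>(.*)</length>' "$tmpdir/sample.xml")

# sed basic regex: groups are written \( \), and the / delimiter is escaped as \/.
bre_left=$(sed '/<length>\(.*\)<\/length>/d' "$tmpdir/sample.xml" | wc -l | tr -d ' ')

# Alternative: start the address with \| to use | as the delimiter instead of /.
alt_left=$(sed '\|<length>.*</length>|d' "$tmpdir/sample.xml" | wc -l | tr -d ' ')

echo "$ere_hits $bre_left $alt_left"
rm -rf "$tmpdir"
```

Both sed forms delete the same line; the second is just easier to read when the pattern is full of closing tags.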
For bonus points, you can count the number of instances of a pattern scattered across a number of files using
egrep -rc '^<pattern>$' * | awk -F: '{print $2}' | awk '{sum += $1} END {print sum}'
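The counting pipeline works because `grep -rc` prints one `file:count` line per file, including files with zero matches. As a sketch on made-up sample files, the two awk stages can also be folded into one. Summing the last field (`$NF`) rather than `$2` is my own variation, chosen so that a colon in a file name doesn't throw the count off.

```shell
# Three hypothetical files holding 2, 1, and 0 matching lines.
tmpdir=$(mktemp -d)
printf '%s\n' '<a>' '<length>1</length>' '<length>2</length>' > "$tmpdir/one.xml"
printf '%s\n' '<length>3</length>' '<b>' > "$tmpdir/two.xml"
printf '%s\n' '<c>' > "$tmpdir/three.xml"

# One awk pass: split each "file:count" line on ':' and sum the last field.
# "sum + 0" forces a numeric 0 if grep produced no lines at all.
total=$(grep -rEc '^<length>.*</length>$' "$tmpdir" | awk -F: '{sum += $NF} END {print sum + 0}')

echo "$total"
rm -rf "$tmpdir"
```

Running the count before and after the deletion is a cheap sanity check that the sed pattern removed exactly what grep found.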
And to finish it off, here’s a real live example!
egrep -rl '^.*<length>(.*)</length>\w*$' * | xargs sed -i .bak '/^.*<length>\([0-9]*\)<\/length>\w*$/d'
egrep -rc '^.*<length>(.*)</length>\w*$' * | awk -F: '{print $2}' | awk '{sum += $1} END {print sum}'
Thanks this was helpful. Also, I had to replace “-i .bak” with “-i.bak” for this to work for me.
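On that last point: the spaced form `-i .bak` is the BSD/macOS sed spelling, while GNU sed expects the suffix attached to the flag. As far as I know, the attached form `-i.bak` is accepted by both, so it is the safer one to put in a shared script. A minimal sketch, using a made-up sample file:

```shell
# Hypothetical sample file in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' '<item>' '<length>42</length>' '</item>' > "$tmpdir/sample.xml"

# Attached suffix: edits the file in place and keeps the original as sample.xml.bak.
sed -i.bak '/^<length>[0-9]*<\/length>$/d' "$tmpdir/sample.xml"

# The edited file loses one line; the backup keeps all three.
kept=$(wc -l < "$tmpdir/sample.xml" | tr -d ' ')
backup=$(wc -l < "$tmpdir/sample.xml.bak" | tr -d ' ')

echo "$kept $backup"
rm -rf "$tmpdir"
```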