Grepping in XML and other structured files
Latest revision as of 21:49, 4 January 2013
Sometimes you need to grep in an HTML file, an XML file, or some other kind of file that is not line-based. This is usually hard or painful with your standard grepping tools. Luckily there's a wonderful tool called sgrep which does exactly this sort of thing. You'll find it in apt if you're using Ubuntu, and probably in ports if you're using a BSD. If not, you're being really difficult, but its homepage might be [1].
Here's an example to show how you might yank all the VirtualHost blocks out of an httpd.conf:
grep -v '^#' httpd.conf | sgrep -i '"<virtualhost".."</virtualhost>"'
(Note how you must use regular grep to remove comments first, because they are line-based.)
Yay!
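If sgrep isn't around, a rough line-based stand-in for this particular case can be sketched with plain awk. This is not what sgrep does internally, just an approximation that assumes every opening and closing tag sits on its own line (which is the usual layout in an httpd.conf). The file name `sample-httpd.conf`, its contents, and the `vhosts` helper are all made up for illustration. Unlike the grep above, it also drops indented comment lines.

```shell
# Hypothetical sample config; the host name is made up for illustration.
cat > sample-httpd.conf <<'EOF'
# Global settings
ServerName example.test
<VirtualHost *:80>
    # comment inside the block
    DocumentRoot /var/www/a
</VirtualHost>
Listen 80
EOF

# Print every line from an opening <VirtualHost ...> through the next
# closing tag, case-insensitively, skipping comment lines along the way.
vhosts() {
    awk '
        /^[[:space:]]*#/ { next }                       # drop comment lines
        tolower($0) ~ /<virtualhost/ { inside = 1 }     # opening tag starts a block
        inside { print }
        tolower($0) ~ /<\/virtualhost>/ { inside = 0 }  # closing tag ends it
    ' "$@"
}

vhosts sample-httpd.conf
```

On the sample file this should print just the three non-comment lines of the VirtualHost block. It falls over as soon as a tag shares a line with other content, which is exactly where sgrep earns its keep.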
Also in apt I found a tool called xgrep (homepage probably [2]) which is less neat than sgrep but might work better for some cases when you have well-formed XML files (which httpd.conf certainly isn't), because it allows you to specify tag ancestry and such.
And a small tip for dealing with ugly XML files: libxml2 comes with a tool called xmllint which can reformat them like so:
xmllint --format - <uglymess.xml
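xmllint can also select by tag ancestry through its `--xpath` option, which covers some of the same ground as xgrep on well-formed XML. Here's a minimal sketch, assuming your xmllint is recent enough to have `--xpath`. The file `sample.xml`, its element names, and the `server_name` helper are invented for the example.

```shell
# Hypothetical sample file; the element names are made up for illustration.
cat > sample.xml <<'EOF'
<config><server><name>alpha</name></server><client><name>beta</name></client></config>
EOF

# Print the text of the <name> whose parent is <server>, ignoring the
# <name> that lives under <client>.
server_name() {
    xmllint --xpath 'string(/config/server/name)' "$1"
}

server_name sample.xml
```

With the sample above this should give you alpha and not beta, because the path spells out which parent the name has to sit under.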
That'll be $5.