Grepping in XML and other structured files

From WTFwiki

Sometimes you need to grep in an HTML file, an XML file, or some other kind of file that is not line-based. This is usually hard or painful with your standard grepping tools. Luckily there's a wonderful tool called sgrep which does exactly this sort of thing: it searches for regions of text between a start and an end string instead of matching single lines. You'll find it in apt if you're using Ubuntu, and probably in ports if you're using a BSD. If not, you're being really difficult, but its homepage might be [1].

Here's an example to show how you might yank all the VirtualHost blocks out of an httpd.conf:

grep -v '^#' httpd.conf | sgrep -i '"<virtualhost".."</virtualhost>"'

(Note how you must use regular grep to strip the comments first, because comments are line-based.)
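
If sgrep isn't installed, the same two-step idea (strip comment lines, then print everything between an opening and a closing tag) can be roughly approximated with grep and awk. This is only a line-based sketch, not what sgrep does internally: it assumes each VirtualHost tag sits on its own line, the sample config and the function name extract_vhosts are made up for illustration, and unlike the command above it also drops indented comments.

```shell
# Build a small sample config (hypothetical content, for illustration only).
conf=$(mktemp)
cat > "$conf" <<'EOF'
# <VirtualHost *:80> commented out, must not match
ServerName default.example.com
<VirtualHost *:80>
    ServerName www.example.com
    # DocumentRoot /old/path
    DocumentRoot /var/www/html
</VirtualHost>
Listen 80
EOF

# extract_vhosts FILE (hypothetical helper): drop comment lines, then print
# every line from an opening <VirtualHost ...> through its </VirtualHost>.
extract_vhosts() {
    grep -v '^[[:space:]]*#' "$1" |
        awk '
            tolower($0) ~ /<virtualhost/    { inside = 1 }  # block starts
            inside                          { print }       # print while inside
            tolower($0) ~ /<\/virtualhost>/ { inside = 0 }  # block ends
        '
}

vhosts=$(extract_vhosts "$conf")
printf '%s\n' "$vhosts"
rm -f "$conf"
```

The tolower() calls play the role of sgrep's -i flag. For anything where the tags can appear mid-line, sgrep remains the better fit.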


Also in apt I found a tool called xgrep (homepage probably [2]), which is less neat than sgrep but might work better in some cases where you have well-formed XML files (which httpd.conf certainly isn't), because it lets you specify tag ancestry and such.

And a small tip for dealing with ugly XML files: libxml2 comes with a tool called xmllint which is able to reformat files thus:

xmllint --format - <uglymess.xml
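
xmllint can also do the "tag ancestry" kind of matching mentioned above, via XPath. The following is a minimal sketch, assuming your xmllint build supports the --xpath option. The sample document and the helper name name_under are invented for illustration, and xgrep's own syntax is not shown here.

```shell
# A tiny well-formed sample document (hypothetical content).
doc=$(mktemp)
cat > "$doc" <<'EOF'
<config><server><name>web1</name></server><client><name>laptop</name></client></config>
EOF

# name_under PARENT FILE (hypothetical helper): print the text of the <name>
# element whose parent is <PARENT>, ignoring <name> elements found elsewhere.
name_under() {
    xmllint --xpath "string(/config/$1/name)" "$2"
}

# Only run the queries if xmllint is actually installed.
if command -v xmllint >/dev/null 2>&1; then
    server_name=$(name_under server "$doc")
    client_name=$(name_under client "$doc")
    printf 'server: %s\nclient: %s\n' "$server_name" "$client_name"
fi
rm -f "$doc"
```

Both elements are called name, so a plain grep for the tag would return both. The path in the XPath expression is what tells them apart.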

That'll be $5.