Hi all.
This is to announce a pre-alpha release of my Hypertext Parsing Suite.
It can "clean" HTML ( or other markup languages, depending on the user-defined rule table ) to get an output which matches the rules in your table, create a DOM tree for use in various applications and a lot of other stuff.
It is in the same class as Tidy ( tidy.sourceforge.net ), but has very different goals. For one, it does NOT try to produce standards-compliant HTML; it cleans and parses to create something useful for applications ( information retrieval and extraction, for example ). It reads its rules from a user-supplied table, and the default table resembles HTML very closely.
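To give a rough feel for what a rule table means, here is a minimal sketch using only the standard library. The table format and tag names below are invented for illustration; this is NOT HyPar's actual rule-table format or API.

// Sketch of a user-defined rule table: which tags to keep in the output.
// Invented for illustration only -- not HyPar's table format.
#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
    // Hypothetical rule table: tag name -> keep it in the cleaned output?
    std::map<std::string, bool> rules;
    rules["p"]    = true;
    rules["a"]    = true;
    rules["font"] = false;   // e.g. presentational tags get dropped

    // A pre-tokenized stream of tag names ( real input would be raw markup ).
    std::vector<std::string> tags;
    tags.push_back("p");
    tags.push_back("font");
    tags.push_back("a");

    for (std::vector<std::string>::const_iterator it = tags.begin();
         it != tags.end(); ++it) {
        std::map<std::string, bool>::const_iterator r = rules.find(*it);
        if (r != rules.end() && r->second)
            std::cout << "keep <" << *it << ">\n";
        else
            std::cout << "drop <" << *it << ">\n";
    }
    return 0;
}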
The suite includes a sample.cpp file showing how to use the library, and the library can be linked into your own applications too, for example to extract particular parts of your markup files, such as links or tables.
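As an illustration of that kind of application, a link-extraction pass over a DOM-like tree could look something like the sketch below. The Node structure and field names are made up for this example and are not taken from HyPar's API.

// Hypothetical illustration: walk a DOM-like tree and collect the href
// attributes of <a> nodes. Node layout and names are made up for this
// sketch, not taken from HyPar.
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Node {
    std::string name;                            // tag name, e.g. "a"
    std::map<std::string, std::string> attrs;    // attribute -> value
    std::vector<Node*> children;
};

void collect_links(const Node* n, std::vector<std::string>& out) {
    if (n->name == "a") {
        std::map<std::string, std::string>::const_iterator it =
            n->attrs.find("href");
        if (it != n->attrs.end())
            out.push_back(it->second);
    }
    for (size_t i = 0; i < n->children.size(); ++i)
        collect_links(n->children[i], out);
}

int main() {
    // Build a tiny tree by hand: <html><a href="http://example.org">...</a></html>
    Node link;  link.name = "a";     link.attrs["href"] = "http://example.org";
    Node root;  root.name = "html";  root.children.push_back(&link);

    std::vector<std::string> links;
    collect_links(&root, links);
    for (size_t i = 0; i < links.size(); ++i)
        std::cout << links[i] << '\n';
    return 0;
}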
It is written in C++, and does not *yet* use the auto* tools. Simple make commands build and install it ( in your $HOME ).
We have run this code on huge amounts of data ( > 1 GB ), and found it to be stable and without memory leaks. There are other related applications written around this code which I'm planning to release soon.
Please download and test.
URL: http://www.it.iitb.ac.in/~jaju/hypar/
Online docs generated with Doxygen can be found on the site.
On Thu, 5 Sep 2002, Ravindra Jaju wrote:
> It is written in C++, and does not *yet* use the auto* tools. Simple make commands build and install it ( in your $HOME ).
If you want someone to do the autotools stuff for you, you know where to find me.
On Thu, Sep 05, 2002 at 12:49:36PM +0530, Philip S Tellis wrote:
> On Thu, 5 Sep 2002, Ravindra Jaju wrote:
> > It is written in C++, and does not *yet* use the auto* tools. Simple make commands build and install it ( in your $HOME ).
> If you want someone to do the autotools stuff for you, you know where to find me.
Thanks! :)
Though it's not currently at the top of my priority list, I'll keep it in mind.
Right now, the list looks like this:
1] Some API stabilization/changes and bug-hunting ( a case where the memory required shoots up disproportionately, as reported by 'top' )
2] Making it compile with GCC [2|3].x ( it does, but a few unresolved issues remain )
3] Auto-tools ....
The released code has been tested only with GCC 2.96.x ( the RH version ).
Thanks again.
On Thu, 5 Sep 2002, Ravindra Jaju wrote:
> 1] Some API stabilization/changes and bug-hunting ( a case where the memory required shoots up disproportionately, as reported by 'top' )
use gprof.
> 2] Making it compile with GCC [2|3].x ( it does, but a few unresolved issues remain )
3.0 has issues. Try 3.0.1.
On Thu, Sep 05, 2002 at 12:39:26PM +0530, Ravindra Jaju wrote:
> Hi all.
> This is to announce a pre-alpha release of my Hypertext Parsing Suite.
Addendum:
Lest I give the impression that this was all done alone ....
Kunal Punera - He wrote the entire DOM tree creation part.
Soumen Chakrabarti - Guide and chief trouble-shooter. Most importantly, whatever good structure the code has ( including API/attribute names ) is due to him, apart from a *lot* more.