XML - SAX Parser using JAXP API
Introduction to XML Parser
In computing terms, a parser is a program
that takes input in the form of sequential
instructions, tags, or some other defined
sequence of tokens, and breaks it up into
easily manageable parts.
An XML parser is used to read, update, create,
and manipulate an XML document. As the
document is processed, the parser recognizes
each XML structure and responds to it by taking
a specified action based on that structure.
XML parsers can be validating or nonvalidating.
A validating parser checks the contents of a
document against a set of specific rules, for
example which elements are allowed and in what
order they must appear. These rules are given
either as an optional XML structure called a
document type definition (DTD) or as an XML
Schema. Nonvalidating parsers are smaller and
faster, but they do not check documents against
the DTD. They only check whether the XML
document is structurally well formed.
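The difference between the two kinds of parser can be sketched with JAXP's SAXParserFactory. This is only an illustration under stated assumptions: the class name ValidationDemo, the helper countValidityErrors, and the sample document INVALID_BOOK are hypothetical and not part of the original article. The sample is well formed but breaks its own DTD, so a nonvalidating parser should accept it silently while a validating parser should report at least one error to the handler.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ValidationDemo {
    // Hypothetical sample: well formed, but invalid against its own DTD,
    // which allows only a <title> child inside <book>.
    static final String INVALID_BOOK =
        "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE book [<!ELEMENT book (title)><!ELEMENT title (#PCDATA)>]>"
        + "<book><author>Unknown</author></book>";

    // Parses the text and returns how many validity errors were reported.
    static int countValidityErrors(String xml, boolean validating) {
        final int[] errors = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors[0]++; // validity violations arrive as recoverable errors
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(validating); // true = check against the DTD
            SAXParser parser = factory.newSAXParser();
            parser.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Well-formedness failures (fatal errors) end up here.
            throw new IllegalStateException("Parsing failed", e);
        }
        return errors[0];
    }

    public static void main(String[] args) {
        System.out.println("nonvalidating errors: "
            + countValidityErrors(INVALID_BOOK, false));
        System.out.println("validating errors:    "
            + countValidityErrors(INVALID_BOOK, true));
    }
}
```

The same document goes through both runs; only the setValidating flag changes, which is why the nonvalidating run is expected to report nothing.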
Parsing XML Documents
To manipulate an XML document, an XML parser is
needed. The parser loads the document into
the computer's memory. Once the document is
loaded, its data can be manipulated.
We will soon discuss APIs and parsers for
accessing XML documents in serial access
mode (SAX) and random access mode (DOM).
The specifications used to ensure the validity of
XML documents are DTDs and XML Schemas.
DOM (Document Object Model)
The XML Document Object Model (XML DOM)
defines a standard way to access and manipulate
XML documents using any programming
language (and a parser for that language).
The DOM presents an XML document as a tree
structure (a node tree), with the elements,
attributes, and text defined as nodes. DOM
provides access to the information stored in
your XML document as a hierarchical object model.
The DOM converts an XML document into a
collection of objects arranged in a tree
structure, which can be manipulated in any
way. The textual information in the XML document
is turned into tree nodes, and a
user can easily traverse any part of
the object tree at any time. This makes it easier
to modify the data, to remove it, or to insert
new data. This mechanism is also known as
the random access protocol.
DOM is very useful when the document is small.
DOM reads the entire XML structure and holds
the object tree in memory, so it is much more
CPU and memory intensive than serial access.
The DOM is most suited for interactive
applications because the entire object model is
present in memory, where it can be accessed and
manipulated by the user.
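As a rough sketch of the random access described above, the following uses JAXP's DocumentBuilder to load a small document into memory and then modify, remove, and insert nodes in no particular order. The class DomDemo, its methods load and edit, and the sample library document are hypothetical names chosen for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class DomDemo {
    // Hypothetical sample document.
    static final String LIBRARY =
        "<library><book>Java</book><book>XML</book></library>";

    // Loads the whole document into memory as a node tree.
    static Document load(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not build the DOM tree", e);
        }
    }

    // Modifies, removes, and inserts nodes in the in-memory tree.
    static void edit(Document doc) {
        Element root = doc.getDocumentElement();
        NodeList books = root.getElementsByTagName("book");
        // Take references first: the NodeList is live and changes with the tree.
        Node first = books.item(0);
        Node second = books.item(1);

        second.setTextContent("XML and DOM"); // modify the second book directly
        root.removeChild(first);              // remove the first book

        Element added = doc.createElement("book"); // insert a new book
        added.setTextContent("SAX");
        root.appendChild(added);
    }

    public static void main(String[] args) {
        Document doc = load(LIBRARY);
        edit(doc);
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            System.out.println(books.item(i).getTextContent());
        }
    }
}
```

Because the whole tree stays in memory, the second book can be changed before the first is touched, which a serial parser cannot do.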
SAX (Simple API for XML)
This API was an innovation developed on the
XML-DEV mailing list through collaboration
among its members, rather than being a product
of the W3C.
SAX (Simple API for XML), like DOM, gives access
to the information stored in XML documents
using any programming language (and a parser
for that language).
This standard API works in serial access mode
to parse XML documents. It is a very
fast-to-execute mechanism for reading and
writing XML data compared with alternatives
such as DOM.
SAX tells the application what is in the
document by notifying it through a stream of
parsing events. The application then processes
those events to act on the data.
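The stream of parsing events can be sketched with JAXP's SAXParser and a handler that extends DefaultHandler. This is a minimal illustration, not the article's own listing: the class SaxDemo, the method events, and the sample document are hypothetical. The handler simply records each event it receives, so the returned list shows the order in which the parser reports the document's contents.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxDemo {
    // Parses the text serially and returns the events in arrival order.
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // Text may arrive in several chunks, so it is buffered
            // until the enclosing element ends.
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                text.setLength(0);
                log.add("start:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String content = text.toString().trim();
                if (!content.isEmpty()) {
                    log.add("text:" + content);
                }
                text.setLength(0);
                log.add("end:" + qName);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // The handler is registered here; the parser calls it back
            // as it reads through the input.
            parser.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return log;
    }

    public static void main(String[] args) {
        // Hypothetical sample document.
        for (String event : events("<library><book>Java</book></library>")) {
            System.out.println(event);
        }
    }
}
```

Nothing is kept in memory apart from what the handler chooses to store, which is the main design difference from the DOM approach.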
SAX is also called an event-driven protocol,
because the application registers a handler
whose callback methods are invoked
whenever an event is generated. An event is
generated when the parser encounters a new
XML tag, encounters an error, or has anything
else to report. SAX is memory-efficient to a
Feb 2008 | Java Jazz Up |24