Working with XML in newLISP
Version 9.3 of newLISP has been released and has new functions to facilitate working with XML. newLISP has had the ability to parse XML data into an association list since version 6.10. 9.3 adds the ability to more easily traverse a parsed document and edit its contents.
What makes newLISP’s XML handling great?
Most modern languages take an object-oriented approach to XML: they parse the document and store it in an object. The ugliest expose the functions defined in the DOM specification. A few offer more expressive interfaces, such as PHP’s simplexml, or merely novel ones, such as Python’s event-based handling.
Each of these languages has list structures and an excellent, established array of functions for dealing with them. So why use a complex, arcane interface? Why not simply translate the XML into a list?
The fundamental structure in lisp is the linked list. Lists can be of arbitrary length and depth and can be used to create arbitrarily complex structures. newLISP parses an XML string and converts it into a list.
Lisp’s strength is list processing. Working with XML as a list is therefore a simple task. Elements can be matched and located using match and unify or by use of a recursive function.
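As a rough sketch of the recursive approach, the helper below walks a parsed list and collects every element with a given tag name. find-elements is a made-up name, not a built-in, and the sample data is invented; the code assumes each element is a list whose first item is its tag name, as in the parsed output shown later in this article.

```newlisp
; Hypothetical helper, not part of newLISP: collect every element named
; tag from a parsed tree, assuming each element is a list whose first
; item is its tag name.
(define (find-elements tag node , found)
  (set 'found '())
  (if (and (list? node) (not (empty? node)))
      (begin
        ; keep this node if its tag matches
        (if (= (first node) tag) (push node found -1))
        ; then search everything nested inside it
        (dolist (child (rest node))
          (set 'found (append found (find-elements tag child))))))
  found)

; made-up sample data in the same shape as a parsed document
(set 'tree '("album" ()
             ("media" () ("id" () "1"))
             ("media" () ("id" () "2"))))

(println (length (find-elements "media" tree)))
```

Because the search also descends into attribute lists, a tag name that happens to equal an attribute name would match there too; a stricter version would skip the second item of each element.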
What doesn’t work?
newLISP lacks a function to convert a list back to XML. However, this is not too difficult to achieve on our own.
A concrete example
I recently built an application to locally cache information from another program, which stores photos and publishes their metadata via XML. The application that uses the cached data is written in Python, but I have never liked the way that Python (like most languages) deals with XML, so I decided to write this module in newLISP.
To download the remote document:
(define (get-xml-data url)
  (let ((xml-data (get-url url 3000)))
    (if (regex "ERR: (.+?)$" xml-data)
        (throw-error (format "download error: %s" $1)))
    xml-data))
This function downloads the document at a URL and returns it as a string. If the host times out after 3000 milliseconds or there is another error (which get-url reports in its return value, prefixed with “ERR:”), we throw an error. We match this using regex so that any error text can be captured in the system variable $1. Otherwise the XML is returned as a string.
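Because get-xml-data signals failures with throw-error, a caller that wants to keep running can wrap the call in catch. The sketch below is only an illustration: fake-download is a made-up stand-in that always fails, so the example runs without a network connection, and the URL is a placeholder.

```newlisp
; Hypothetical stand-in for get-xml-data that always fails,
; so the example needs no network access.
(define (fake-download url)
  (throw-error (format "download error: %s" "timeout")))

; catch with a symbol argument returns nil when an error is thrown
; and stores the error message in that symbol; otherwise it returns
; true and stores the result of the expression.
(if (catch (fake-download "http://example.invalid/album.xml") 'result)
    (println "downloaded " (length result) " bytes")
    (println "download failed: " result))
```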
We can then define a function to parse the result and verify its validity:
(define (parse-xml-data str)
  (xml-type-tags nil "CDATA" nil nil)
  (let ((xml (xml-parse str 5)))
    (or xml (throw-error (xml-error)))))
Let’s walk through this function. First, xml-type-tags defines how we want xml-parse to label TEXT, CDATA, COMMENT, and ELEMENT nodes. By default, these type tags are included as strings in each node. I only want to know if something is CDATA and may contain oddball characters, so the other three tags are suppressed with nil.
The xml-parse function’s second parameter is formed by adding the values of various options together. Granted, this is a very C-like approach and not extremely expressive, but hey, life isn’t fair. Here is a list of the options and values (from the newLISP documentation):
1 suppress whitespace text tags
2 suppress empty attribute lists
4 suppress comment tags
8 translate string tags into symbols
16 add SXML (S-expression XML) attribute tags
From this we can see that passing 5 (1 + 4) removes whitespace text tags and comments. I do not recommend suppressing empty attribute lists; without them there is no easy way to tell whether the first item in an element is its attribute list or a child element. The function then evaluates to either the parsed XML or, if xml is nil (meaning there was an error parsing the XML), throws an error.
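To make the arithmetic explicit, here is a small sketch that builds the option value from named parts. The symbol names and the sample document are made up for illustration; only the numeric values come from the option list.

```newlisp
; Hypothetical names for two of the documented option values.
(set 'suppress-whitespace 1)
(set 'suppress-comments 4)

; made-up sample document containing whitespace and a comment
(set 'sample "<a> <!-- note --> <b>text</b> </a>")

; 1 + 4 = 5, the value passed to xml-parse in parse-xml-data
(set 'options (+ suppress-whitespace suppress-comments))
(println options)
(println (xml-parse sample options))
```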
We now have our data. Here is a fragment of the XML document and its resulting parsed list:
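<album>
  <media type="photo">
    <id>9374</id>
    <media_root>/some/path/on/the/server</media_root>
    <fullsize>something.jpg</fullsize>
    <thumbnail>something_thumb.jpg</thumbnail>
    <authored>January 9, 2008 4:55 PM EST</authored>
    <submitted>January 9, 2008 5:20 PM EST</submitted>
  </media>
  <media type="photo">
    <id>9375</id>
    <media_root>/some/path/on/the/server</media_root>
    <fullsize>something2.jpg</fullsize>
    <thumbnail>something_thumb2.jpg</thumbnail>
    <authored>January 9, 2008 4:56 PM EST</authored>
    <submitted>January 9, 2008 5:21 PM EST</submitted>
  </media>
</album>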
'(("album" ()
   ("media" (("type" "photo"))
    ("id" () "9374")
    ("media_root" () "/some/path/on/the/server")
    ("fullsize" () "something.jpg")
    ("thumbnail" () "something_thumb.jpg")
    ("authored" () "January 9, 2008 4:55 PM EST")
    ("submitted" () "January 9, 2008 5:20 PM EST"))
   ("media" (("type" "photo"))
    ("id" () "9375")
    ("media_root" () "/some/path/on/the/server")
    ("fullsize" () "something2.jpg")
    ("thumbnail" () "something_thumb2.jpg")
    ("authored" () "January 9, 2008 4:56 PM EST")
    ("submitted" () "January 9, 2008 5:21 PM EST"))))
The similarities between XML and lisp lists are obvious (as is the wordiness of XML).
Accessing the first media element is simple: (assoc (xml "album" "media")). Note that the full path is used to find the nested association. However, that will return only the first association found for the key “media”. What if we want all media elements?
(define (media xml , lst)
  (while (assoc (xml "album" "media"))
    (push (pop-assoc (xml "album" "media")) lst -1))
  lst)
This function shows how pop-assoc can be used to reduce a list. pop-assoc returns the popped association, which is then pushed onto the list that gets returned. The result of this function:
'((("media"
    ("id" "9374")
    ("media_root" "/some/path/on/the/server")
    ("fullsize" "something.jpg")
    ("thumbnail" "something_thumb.jpg")
    ("authored" "January 9, 2008 4:55 PM EST")
    ("submitted" "January 9, 2008 5:20 PM EST"))
   ("media"
    ("id" "9375")
    ("media_root" "/some/path/on/the/server")
    ("fullsize" "something2.jpg")
    ("thumbnail" "something_thumb2.jpg")
    ("authored" "January 9, 2008 4:56 PM EST")
    ("submitted" "January 9, 2008 5:21 PM EST"))))
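As an alternative sketch, the same elements can be collected without popping anything, by filtering the children of the album element. This assumes the parsed shape shown earlier and uses the plain single-key form of assoc; media-elements and the sample data are hypothetical.

```newlisp
; Hypothetical alternative to the media function: return every "media"
; child of the "album" element without modifying the list.
(define (media-elements xml)
  (filter (fn (node)
            (and (list? node)
                 (not (empty? node))
                 (= (first node) "media")))
          (rest (assoc "album" xml))))

; made-up sample data shaped like the parsed document
(set 'sample '(("album" ()
                ("media" () ("id" () "9374"))
                ("media" () ("id" () "9375")))))

(println (length (media-elements sample)))
```

The empty? check matters because the empty attribute list () is itself one of the items following the tag name.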
Exporting a list to XML
This is an area where newLISP is lacking: there is no function to turn a list back into XML. To that end, I’ve written a basic XML module that parses XML in a defined manner and then writes the list back to XML. One neat thing I have done in this module is to make the xml-type-tags symbols evaluate to macros that each know how to render a node as a string.
So long as the SXML conventions are maintained and empty attribute lists are provided (as (@)), conversion back and forth is straightforward. A full element has the form (element "tag-name" (@ attribute-list) child-node-list). For example, (XML:element "a" (@ ("href" "https://artfulcode.net")) ((XML:text "An excellent website"))) => <a href="https://artfulcode.net">An excellent website</a>
If the XML string that is parsed has a declaration, it will be maintained. Otherwise, it uses a generic UTF-8 declaration. Here is the code:
(context 'XML)

(set 'declaration {<?xml version="1.0" encoding="UTF-8" ?>})

(define-macro (text)
  "Formats plain text."
  (string (args 0)))

(define-macro (comment)
  "Formats comments."
  (format {<!-- %s -->} (string (args 0))))

(define-macro (cdata)
  "Formats CDATA nodes."
  (format "<![CDATA[%s]]>" (string (args 0))))

(define-macro (@ attr)
  "Turns an s-xml attribute list into an XML attribute list."
  (cond
    ((null? attr) "")
    ((list? attr)
     (format { %s="%s"%s}
             (string (attr 0))
             (string (attr 1))
             (eval (cons @ (args)))))))

(define-macro (element tag-name attributes children)
  "Recursively evaluates the contents of the s-xml list."
  (let ((string-format {<%s%s>%s</%s>})
        (items (list (string tag-name)
                     (eval attributes)
                     (join (map eval children) "")
                     (string tag-name))))
    (format string-format items)))

(define (xml->list xml-string)
  "Parses XML string xml-string. Returns a list in s-xml format."
  ; keep the document's own declaration if it starts with one
  (if (and (starts-with xml-string "<?xml")
           (regex {^(<\?xml .+?\?>)} xml-string 16))
      (set 'declaration $1))
  (xml-type-tags 'text 'cdata 'comment 'element)
  (first (xml-parse xml-string 17)))

(define (list->xml xml-lst)
  "Evaluates an s-xml-formatted list to XML."
  (format "%s\n%s" declaration (eval xml-lst)))

(context MAIN)
The declaration can be set manually via the symbol XML:declaration. The XML node symbols element, text, comment, cdata, and @ are all expected to be XML context symbols. The main public functions are xml->list and list->xml. They are used, simply enough, like this:
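(set 'xml-string {
<album>
  <media>
    <id>9374</id>
    <media_root>/some/path/on/the/server</media_root>
    <fullsize>something.jpg</fullsize>
    <thumbnail>something_thumb.jpg</thumbnail>
    <authored>January 9, 2008 4:55 PM EST</authored>
    <submitted>January 9, 2008 5:20 PM EST</submitted>
  </media>
  <media>
    <id>9375</id>
    <media_root>/some/path/on/the/server</media_root>
    <fullsize>something2.jpg</fullsize>
    <thumbnail>something_thumb2.jpg</thumbnail>
    <authored>January 9, 2008 4:56 PM EST</authored>
    <submitted>January 9, 2008 5:21 PM EST</submitted>
  </media>
</album>
})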
(set 'xml-list (XML:xml->list xml-string))
Now, xml-list is:
(XML:element "album" (@)
  ((XML:element "media" (@)
     ((XML:element "id" (@) ((XML:text "9374")))
      (XML:element "media_root" (@) ((XML:text "/some/path/on/the/server")))
      (XML:element "fullsize" (@) ((XML:text "something.jpg")))
      (XML:element "thumbnail" (@) ((XML:text "something_thumb.jpg")))
      (XML:element "authored" (@) ((XML:text "January 9, 2008 4:55 PM EST")))
      (XML:element "submitted" (@) ((XML:text "January 9, 2008 5:20 PM EST")))))
   (XML:element "media" (@)
     ((XML:element "id" (@) ((XML:text "9375")))
      (XML:element "media_root" (@) ((XML:text "/some/path/on/the/server")))
      (XML:element "fullsize" (@) ((XML:text "something2.jpg")))
      (XML:element "thumbnail" (@) ((XML:text "something_thumb2.jpg")))
      (XML:element "authored" (@) ((XML:text "January 9, 2008 4:56 PM EST")))
      (XML:element "submitted" (@) ((XML:text "January 9, 2008 5:21 PM EST")))))))
We can convert back with XML:list->xml:
(set 'new-xml-string (XML:list->xml xml-list))
Which makes new-xml-string:
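<?xml version="1.0" encoding="UTF-8" ?>
<album><media><id>9374</id><media_root>/some/path/on/the/server</media_root><fullsize>something.jpg</fullsize><thumbnail>something_thumb.jpg</thumbnail><authored>January 9, 2008 4:55 PM EST</authored><submitted>January 9, 2008 5:20 PM EST</submitted></media><media><id>9375</id><media_root>/some/path/on/the/server</media_root><fullsize>something2.jpg</fullsize><thumbnail>something_thumb2.jpg</thumbnail><authored>January 9, 2008 4:56 PM EST</authored><submitted>January 9, 2008 5:21 PM EST</submitted></media></album>
Apart from the whitespace between elements, which is dropped during parsing because option 1 is included in 17, the only difference is that the new string has an XML declaration.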