Using rnginline From Python¶
Firstly, ensure you’ve installed rnginline.
Basic Usage¶
The example files used here are shown at the bottom of this section.
RELAX NG files on the filesystem can be referenced by path:
>>> import rnginline, os
>>> sorted(os.listdir('.'))
['external.rng', 'schema.rng']
>>> rnginline.inline('schema.rng')
<lxml.etree.RelaxNG object at ...>
But lxml
can already do that; the real utility is in loading multi-part
schemas from locations other than the filesystem, which lxml
can’t do.
We can load multi-part schemas stored in Python packages (which may be stored in zip files on disk). Here’s how to load one of the schemas from rnginline’s test suite:
>>> import rnginline
>>> from rnginline.urlhandlers import pydata
>>> url = pydata.makeurl('rnginline.test',
... 'data/testcases/external-ref-1/schema.rng')
>>> url
'pydata://rnginline.test/data/testcases/external-ref-1/schema.rng'
>>> rnginline.inline(url)
<lxml.etree.RelaxNG object at ...>
Data Sources¶
The first argument to inline()
defines the location to load the top-level
schema from. It can be a filesystem path, a URL, a file-like object or an
lxml.etree
element.
If you don’t want inline
to guess which input you’re providing, you can pass
the input as a specific type using one of the url
, path
,
file
or etree
keyword args instead.
URLs¶
When you use a URL as the input, it’s retrieved using the same machinery that
fetches external schemas during the inlining process. By default two types of
URLs are supported.
file:
URLs referencing the local filesystem, and pydata:
URLs
referencing data in a Python package. Note that the pydata:
scheme is a
proprietary/unregistered scheme created for use in rnginline.
Note
You can add support for URLs other than these, see the URL Handlers section for details.
>>> rnginline.inline('pydata://rnginline.test/data/testcases/include-1/schema.rng')
<lxml.etree.RelaxNG object at ...>
>>> from rnginline import urlhandlers
>>> url = urlhandlers.pydata.makeurl('rnginline.test', 'data/testcases/include-1/schema.rng')
>>> url
'pydata://rnginline.test/data/testcases/include-1/schema.rng'
>>> rnginline.inline(url)
<lxml.etree.RelaxNG object at ...>
Filesystem Paths¶
When you pass a filesystem path, it’s converted into a scheme-less URL path which is resolved against the default base URL, which by default is the current working directory.
>>> os.link('schema.rng', 'Not valid URL path.rng')
>>> rnginline.inline('Not valid URL path.rng')
<lxml.etree.RelaxNG object at ...>
>>> url = urlhandlers.file.makeurl('Not valid URL path.rng')
>>> url
'Not%20valid%20URL%20path.rng'
>>> rnginline.inline(url)
<lxml.etree.RelaxNG object at ...>
>>> os.unlink('Not valid URL path.rng')
File-like Objects¶
You may pass a file-like object as the input source. URLs inside the input’s
schema document will be relative to the default base URI (current directory)
unless you use the base_uri
keyword arg to inline()
to specify a the
base URI of the file object.
>>> os.mkdir('foo')
>>> os.chdir('foo')
>>> with open('../schema.rng') as f:
... # schema.rng references external.rng, which would fail to
... # resolve unless we provide a base URI
... rnginline.inline(f, base_uri='../schema.rng')
<lxml.etree.RelaxNG object at ...>
>>> os.chdir('..')
>>> os.rmdir('foo')
lxml.etree
Element¶
You can pass pre-parsed XML as an lxml.etree
element. The base URI of the
elements in the document is respected, and you should most likely ensure it’s
defined to be something sensible to allow references to external files to be
resolved correctly.
The base URI of an element is by default the URL of the location the parser read
the document from. It can be overridden from within the XML document using
the xml:base
attribute as well.
>>> os.mkdir('foo')
>>> os.chdir('foo')
>>> from lxml import etree
>>> doc = etree.parse('../schema.rng')
>>> doc.docinfo.URL
'../schema.rng'
>>> rnginline.inline(doc)
<lxml.etree.RelaxNG object at ...>
>>> with open('../schema.rng') as f:
... schema_content = f.read()
>>> elem = etree.fromstring(schema_content)
>>> elem.getroottree().docinfo.URL is None
True
>>> rnginline.inline(elem, base_uri='../schema.rng')
<lxml.etree.RelaxNG object at ...>
>>> elem = etree.fromstring(schema_content, base_url='../schema.rng')
>>> rnginline.inline(elem)
<lxml.etree.RelaxNG object at ...>
>>> os.chdir('..')
>>> os.rmdir('foo')
Note
If you use etree.XML()
/etree.fromstring()
, the XML won’t have
a base URI set unless you use the base_url
keyword arg.
URL Handlers¶
URLs encountered in <include href="…">
/ <externalRef href="…">
elements
are fetched using the URL Handlers registered with the Inliner whose
inline()
method has been called. As mentioned above, handlers for file:
and pydata:
URLs are provided and activated by default.
Handlers for other URL schemes can be created and used quite easily. Say you wanted to inline a schema referencing sub parts via HTTP. You could do it like this:
>>> from rnginline.urlhandlers import ensure_parsed
>>> import requests # http://python-requests.org
>>> class HTTPUrlHandler(object):
... def can_handle(self, url):
... print('Calling can_handle() w/ {}'.format(url))
... return ensure_parsed(url).scheme == 'http'
...
... def dereference(self, url):
... print('Calling dereference() w/ {}'.format(url))
... return requests.get(url).content
>>> from http.server import HTTPServer, SimpleHTTPRequestHandler
>>> import threading
>>> # Start an HTTP server serving the schemas in our cwd
>>> httpd = HTTPServer(('localhost', 8000), SimpleHTTPRequestHandler)
>>> threading.Thread(target=httpd.serve_forever).start()
>>> rnginline.inline('http://localhost:8000/schema.rng',
... handlers=[HTTPUrlHandler()])
Calling can_handle() w/ http://localhost:8000/schema.rng
Calling dereference() w/ http://localhost:8000/schema.rng
Calling can_handle() w/ http://localhost:8000/external.rng
Calling dereference() w/ http://localhost:8000/external.rng
<lxml.etree.RelaxNG object at ...>
>>> httpd.shutdown()
Example Files¶
The preceding examples in this section assume the following files exist in the directory the examples are run from.
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
<include href="external.rng">
<!-- Override foo -->
<define name="foo">
<element name="foo">
<value>abc</value>
</element>
</define>
</include>
<start>
<element name="root">
<ref name="foo"/>
<ref name="bar"/>
</element>
</start>
</grammar>
<rng:grammar xmlns:xyz="x:/my/ns" xmlns:rng="http://relaxng.org/ns/structure/1.0">
<rng:define name="foo">
<rng:element name="foo">
<rng:notAllowed/> <!-- No foo for you! -->
</rng:element>
</rng:define>
<rng:define name="bar">
<rng:element name="xyz:bar">
<rng:text/>
</rng:element>
</rng:define>
</rng:grammar>