Sparkles everywhere, CubicWeb gets fizzy

http://www.logilab.org/file/9845/raw/sparkling.jpg

Last week, we finally took a few days to dive into SPARQL in order to transform any CubicWeb application into a potential SPARQL endpoint.

The first step was to get a parser. Fortunately, the W3C provides a grammar definition and around 200 test cases. There were a few interesting options out there: we tried to reuse rdflib, rasqal, the sparql.g grammar designed for antlr3, and SimpleParse, but after two days of work we had nothing that worked well enough. We decided it was not worth pursuing and switched to yapps, since we already knew it and rql already depended on it.

We may reconsider the choice of parser later, but the priority was to get something working as soon as possible. We ended up with a version of fyzz that passes 90% of the W3C test suite (of course, there might be some false positives).

Fyzz parses the SPARQL query and generates something we decided to call an AST, although it's still a bit rough for now. Fyzz understands simple triples, DISTINCT, LIMIT, OFFSET and other basic features.

Please note that fyzz is totally independent of cubicweb and it can be reused by any project.

Here's an example of how to use fyzz:

>>> from fyzz.yappsparser import parse
>>> ast = parse("""PREFIX doap: <http://usefulinc.com/ns/doap#>
... SELECT ?project ?name WHERE {
...    ?project a doap:Project;
...         doap:name ?name.
... }
... ORDER BY ?name LIMIT 5 OFFSET 10
... """)
>>> print ast.selected
[SparqlVar('project'), SparqlVar('name')]
>>> print ast.prefixes
{'doap': 'http://usefulinc.com/ns/doap#'}
>>> print ast.orderby
[(SparqlVar('name'), 'asc')]
>>> print ast.limit, ast.offset
5 10
>>> print ast.where
[(SparqlVar('project'), ('', 'a'), ('http://usefulinc.com/ns/doap#', 'Project')),
(SparqlVar('project'), ('http://usefulinc.com/ns/doap#', 'name'), SparqlVar('name'))]
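
To make the shape of ast.where easier to read: each term is either a variable or a (namespace URI, local name) pair, and prefixed names such as doap:Project appear expanded with the declared prefixes. Here is a minimal standalone sketch of that expansion. Note that expand_qname is a hypothetical helper written for illustration; it is not part of fyzz and only mimics the pairs shown above.

```python
def expand_qname(qname, prefixes):
    """Expand 'prefix:local' into a (namespace URI, local name) pair.

    Hypothetical helper, not part of fyzz: it only mimics the pairs
    that show up in ast.where.
    """
    prefix, sep, local = qname.partition(':')
    if not sep:
        # No prefix at all (e.g. the 'a' keyword): empty namespace.
        return ('', qname)
    return (prefixes[prefix], local)


prefixes = {'doap': 'http://usefulinc.com/ns/doap#'}
print(expand_qname('doap:Project', prefixes))
print(expand_qname('a', prefixes))
```

The two calls give ('http://usefulinc.com/ns/doap#', 'Project') and ('', 'a'), which are the same pairs as in the first triple of ast.where.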

This AST is then transformed into an RQL query, which CubicWeb can process directly.

Here's what can be done in a cubicweb-ctl shell session of our forge cube (of course, this can also be done in the web application):

>>> from cubicweb.spa2rql import Sparql2rqlTranslator
>>> query = """PREFIX doap: <http://usefulinc.com/ns/doap#>
... SELECT ?project ?name WHERE {
...    ?project a doap:Project;
...         doap:name ?name.
... }
... ORDER BY ?name LIMIT 5 OFFSET 10
... """
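>>> # `schema` is assumed to be the instance schema exposed by the shell
>>> translator = Sparql2rqlTranslator(schema)
>>> qinfo = translator.translate(query)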
>>> rql, args = qinfo.finalize()
>>> print rql, args
Any PROJECT, NAME ORDERBY NAME ASC LIMIT 5 OFFSET 10 WHERE PROJECT name NAME, PROJECT is Project {}

From the above example, we can notice two things. First, for cubicweb to understand the doap namespace, we have to declare the correspondence between the standard doap vocabulary and our internal schema. This is done with yams.xy:

>>> from yams import xy
>>> xy.register_prefix('http://usefulinc.com/ns/doap#', 'doap')
>>> xy.add_equivalence('Project', 'doap:Project')
>>> xy.add_equivalence('Project name', 'doap:Project doap:name')

Secondly, for now, case is not preserved during the transformation: ?project becomes PROJECT in the RQL query. This is probably something that we'll need to tackle quickly.
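
To illustrate both points, here is a toy sketch of the idea: a lookup table plays the role of the declared equivalences, and variable names are upper-cased on the way to RQL. Everything in it (the TYPES and RELATIONS dicts, rql_var, triples_to_rql, and the simplified string triples) is made up for the example; it is not how yams.xy or cubicweb.spa2rql are implemented, and it ignores ORDER BY, LIMIT and OFFSET.

```python
# Toy equivalence tables, loosely mirroring the xy.add_equivalence() calls.
# Assumption: this dict layout is invented for the sketch; it is not how
# yams.xy stores its data.
TYPES = {'doap:Project': 'Project'}
RELATIONS = {'doap:name': 'name'}


def rql_var(sparql_var):
    """Turn '?project' into 'PROJECT' (this is where case is lost)."""
    return sparql_var.lstrip('?').upper()


def triples_to_rql(selected, triples):
    """Build an RQL string from simplified (subject, predicate, object) triples."""
    restrictions = []
    for subj, pred, obj in triples:
        if pred == 'a':
            # An rdf:type triple becomes an "is" restriction.
            restrictions.append('%s is %s' % (rql_var(subj), TYPES[obj]))
        else:
            # Any other predicate is looked up in the relation table.
            restrictions.append('%s %s %s' % (rql_var(subj), RELATIONS[pred],
                                              rql_var(obj)))
    return 'Any %s WHERE %s' % (', '.join(rql_var(v) for v in selected),
                                ', '.join(restrictions))


triples = [('?project', 'a', 'doap:Project'),
           ('?project', 'doap:name', '?name')]
print(triples_to_rql(['?project', '?name'], triples))
```

This prints "Any PROJECT, NAME WHERE PROJECT is Project, PROJECT name NAME". It also shows why losing case matters: with this naive upper-casing, ?project and ?Project would end up as the same RQL variable.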

We've also added a few views in CubicWeb to wrap all this. They will be available in the upcoming version 3.4.0 and are already available in our public mercurial repository.

The door is now open, but the path is still long. Stay tuned!

image under creative commons by beger (original)