Query formulation is a complex task, especially when building a federated query that targets multiple heterogeneous, complex datasets. Granatum ReVeaLD aims to simplify this process by letting users select the concepts whose instances they wish to retrieve, along with literal attributes of those instances. Users can also set constraints on either the concepts or the literal attributes: for example, only those drugs which treat diseases, or only molecules with a molecular weight under 400. The selection is done through an intuitive interface that provides visual cues about the possible relationships and attributes of pre-defined concepts. Each step of the process triggers a dynamic reload of the browser page with new URL parameters, which lets users share their queries simply by exchanging URLs (currently a temporary solution).
To evaluate the intuitiveness and interactivity of the Granatum ReVeaLD platform, we gave the evaluators, who were mainly bio-medical researchers, five tasks. As ReVeaLD is a platform for knowledge discovery from the Linked Bio-medical Data Space, the evaluators were asked to formulate five real-life queries, for example,
We ran the evaluation as a game rather than a dull survey, to keep it competitive and to encourage more users to actually take part. Users were scored on the number of correctly selected nodes and constraints, with negative marking for any node or constraint that was not required. The final ranking was based on the score and the time taken to formulate the query. Because the URL in the address bar changes at every step of the query formulation process, we decided to use Google Analytics, which logs almost everything: the time spent on any particular page of the website, the number of views of each page, and the client system configuration of the users.
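The scoring and ranking described above can be sketched roughly as follows. This is only an illustration in Python: the post does not state the actual weights or how score and time were combined, so the +1/-1 marking and the "score first, time as tie-breaker" rule are assumptions, and the users and selections are made-up example data.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    user: str
    selected: set    # nodes and constraints the user picked
    seconds: float   # time taken to formulate the query


def score(selected: set, required: set) -> int:
    """Assumed marking: +1 per required item selected, -1 per item not required."""
    correct = len(selected & required)
    unneeded = len(selected - required)
    return correct - unneeded


def rank(attempts: list, required: set) -> list:
    """Assumed ranking: highest score first, ties broken by shortest time."""
    return sorted(attempts, key=lambda a: (-score(a.selected, required), a.seconds))


# Hypothetical task: drugs which treat diseases, with molecular weight under 400.
required = {"Drug", "Disease", "treats", "molecularWeight<400"}
attempts = [
    Attempt("alice", {"Drug", "Disease", "treats"}, 95.0),
    Attempt("bob", {"Drug", "Disease", "treats", "Gene"}, 60.0),
    Attempt("carol", {"Drug", "Disease", "treats"}, 70.0),
]
for attempt in rank(attempts, required):
    print(attempt.user, score(attempt.selected, required), attempt.seconds)
```

Here "bob" is the fastest but loses a point for the unneeded "Gene" node, so he ranks below the two users who made no wrong selection.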
The TABLE_ID has the format ga:xxxx, where xxxx is the ID of your profile on Google Analytics. In this construct, I retrieve all the possible steps taken after a particular QUERY_STEP_URL within a given time span. It is worth noting that only 10 concurrent requests are allowed at a time, and 10,000 requests per profile per day. The first limitation can cause problems if you are using a simple ‘for’ loop. The approach we took was to use the window.setInterval() method and call the construct from within it, so that each iteration runs roughly 3000 milliseconds after the previous one.
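The idea of spacing requests out, rather than firing them all from a tight loop, can be sketched as below. This is a Python stand-in for the browser-side setInterval() approach, not the code we actually ran: `fetch_throttled`, `fake_fetch` and the step URLs are hypothetical names, and the real Analytics call is deliberately replaced by a stub.

```python
import time
from typing import Callable, Iterable, List


def fetch_throttled(step_urls: Iterable[str],
                    fetch: Callable[[str], dict],
                    delay_seconds: float = 3.0,
                    sleep: Callable[[float], None] = time.sleep) -> List[dict]:
    """Call fetch(url) once per URL, one at a time, pausing between calls."""
    results = []
    for i, url in enumerate(step_urls):
        if i > 0:
            sleep(delay_seconds)  # wait before every request except the first
        results.append(fetch(url))
    return results


def fake_fetch(url: str) -> dict:
    # Hypothetical stand-in for the real Analytics query; returns no real data.
    return {"step": url, "rows": []}


# Record the pauses instead of really sleeping, so the demo finishes instantly.
pauses = []
out = fetch_throttled(["/q?step=1", "/q?step=2", "/q?step=3"],
                      fake_fetch, sleep=pauses.append)
print(len(out), pauses)
```

Because each request finishes before the next one starts, only one request is ever in flight, which stays well under the concurrency limit at the cost of a slower overall run.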
Once we had the data about all the query steps taken in the evaluation, and the number of users jumping from one query step to the next, the remaining steps were simple. We created a data structure in which each query step is rendered as a ‘circular node’ whose radius is the square root of the time spent on that step (in seconds), with nodes grouped by task (one color per task). Hovering over a node shows its description, obtained by running a simple parser over the URL parameters, along with the time spent on the step. Nodes are linked using the information fetched previously, namely which query steps followed each step. The thickness of each ‘edge’ is proportional to the number of ‘pageViews’ on the next step that came directly from a particular step. The nodes can be arranged in a force-directed layout (as in our case) or rendered as a Sankey diagram (D3JS uses the same data structure for both layouts).
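A minimal sketch of such a data structure, written in Python for illustration, might look like this. The field names (`name`, `group`, `radius`, `value`) follow the nodes/links shape used in D3's standard force-directed and Sankey examples; the URL parameters, timings and page-view counts are invented, and `describe` is only a guess at what a "simple parser" over the URL parameters could do.

```python
import json
import math
from urllib.parse import urlparse, parse_qs


def describe(step_url: str) -> str:
    """Turn a step URL's query parameters into a short hover description."""
    params = parse_qs(urlparse(step_url).query)
    return ", ".join(f"{key}={'|'.join(values)}"
                     for key, values in sorted(params.items()))


def build_graph(steps: list, transitions: list) -> dict:
    """steps: dicts with url, task, seconds; transitions: (from_url, to_url, pageviews)."""
    index = {step["url"]: i for i, step in enumerate(steps)}
    nodes = [{
        "name": describe(step["url"]),
        "group": step["task"],                  # one color per task
        "seconds": step["seconds"],
        "radius": math.sqrt(step["seconds"]),   # radius = sqrt(time on step)
    } for step in steps]
    links = [{"source": index[src], "target": index[dst], "value": views}
             for src, dst, views in transitions]  # value drives edge thickness
    return {"nodes": nodes, "links": links}


# Hypothetical query steps and transition counts.
steps = [
    {"url": "/reveald?concept=Drug", "task": 1, "seconds": 16},
    {"url": "/reveald?concept=Drug&constraint=treats", "task": 1, "seconds": 25},
]
transitions = [(steps[0]["url"], steps[1]["url"], 7)]
graph = build_graph(steps, transitions)
print(json.dumps(graph, indent=2))
```

Links refer to nodes by their position in the `nodes` array, so the order of `steps` must stay fixed between building the index and emitting the nodes.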
The final visualization can be seen, in a crude layout, in the image below. You can easily pick out the main paths taken to formulate each query, and the query steps at which users diverge. I will soon link to the interactive version of this visualization, though it is just the default force-directed layout bundled with D3JS.
Although we never used this information when presenting the evaluation results, because it was overkill and beyond the previously decided scope of the evaluation, it gave me an unusual idea: developing interactive sitemaps based on Google Analytics tracking data and a simple force-directed graph layout using D3JS (since the Visitor Flow can only be accessed by users authenticated on the Analytics account). These sitemaps would let users see which pages are ‘trending’ depending on the user’s region and various other parameters set by the moderators. They could also help a user decide which article to read next, based on what other users went on to read. The sitemap could be filtered and grouped according to user requirements and customization, and clicking a node would take you to the page it represents. For Analytics to generate accurate results, one would first need to establish an initial training set for the API to log. Also, since it takes time to build the data structure for the visualization, it would be practical to rebuild it at regular intervals without manual intervention and save it to a local file, using server-side PHP or Java. This functionality could be packaged as a Drupal or WordPress plugin and integrated into sites with large content bases that are accessed from various regions of the world.
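The "build at regular intervals and save to a local file" step could look something like the sketch below. It is written in Python purely for illustration (the paragraph above suggests PHP or Java on the server), the file name `sitemap.json` is hypothetical, and the scheduling itself is assumed to be left to something like a cron job that calls `save_graph` with a freshly built graph.

```python
import json
import os
import tempfile


def save_graph(graph: dict, path: str) -> None:
    """Write the graph as JSON, replacing any earlier snapshot in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(graph, handle)
    os.replace(tmp_path, path)  # readers never see a half-written file


def load_graph(path: str) -> dict:
    """Read back the most recent snapshot for the visualization to render."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Demo with a throwaway directory and a tiny placeholder graph.
snapshot_dir = tempfile.mkdtemp()
snapshot_path = os.path.join(snapshot_dir, "sitemap.json")  # hypothetical name
save_graph({"nodes": [{"name": "home"}], "links": []}, snapshot_path)
print(load_graph(snapshot_path))
```

Writing to a temporary file and then renaming it means the page can keep serving the previous snapshot while a new one is being built.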