Getting through the first task

Hello, I’m here again with some good news. I finally completed the first task: I successfully traversed the DOM with the help of the DOMTraverser class, a pre-order, depth-first DOM traversal helper.

The DOMTraverser is a class that visits every node in the DOM. Typically, as you visit each node you will want to do different things depending on the task at hand. One of the DOMTraverser’s methods is addHandler(nodeName, action). With this method, the DOMTraverser passes every node with the given name (the ‘a’ node in this case) to the handler registered for it, and the handler is called once for each such node. Since we register a handler for ‘a’ nodes, we only get ‘a’ nodes and not ‘p’ nodes or anything else. In our case, we want to check whether the <a> tag is an external link (more on external links below) that has a wikilink in it, and if so, output the outerHTML and the DSR entries for the external link. Let me walk you through the process.
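To make the idea of a pre-order, depth-first traversal with per-name handlers concrete, here is a minimal sketch. MiniTraverser, its traverse() method and the plain-object node shape { nodeName, children } are hypothetical stand-ins made up for illustration; this is not Parsoid’s DOMTraverser, which works on real DOM nodes and an env object.

```javascript
// Hypothetical, minimal pre-order depth-first traverser. It only mimics the
// idea behind Parsoid's DOMTraverser.addHandler(nodeName, action); it is NOT
// Parsoid's code. Nodes are plain objects: { nodeName, children }.
class MiniTraverser {
  constructor() {
    this.handlers = [];
  }

  // Register `action` to be called for every node whose name matches `nodeName`.
  addHandler(nodeName, action) {
    this.handlers.push({ nodeName: nodeName.toLowerCase(), action });
  }

  // Visit `node` first, then each child subtree left to right (pre-order DFS).
  traverse(node) {
    const name = node.nodeName.toLowerCase();
    for (const handler of this.handlers) {
      if (handler.nodeName === name) {
        handler.action(node);
      }
    }
    for (const child of node.children || []) {
      this.traverse(child);
    }
  }
}

// Usage: a <p> containing two <a> tags and a <span>.
const tree = {
  nodeName: 'BODY',
  children: [{
    nodeName: 'P',
    children: [
      { nodeName: 'A', id: 'ext' },
      { nodeName: 'A', id: 'wiki' },
      { nodeName: 'SPAN' },
    ],
  }],
};

const seen = [];
const t = new MiniTraverser();
t.addHandler('a', (node) => seen.push(node.id));
t.traverse(tree);
console.log(seen); // [ 'ext', 'wiki' ]
```

Each node reaches its handlers before any of its children are visited, which is what “pre-order” means here; because only ‘a’ is registered, the P and SPAN nodes are visited but never handed to the handler.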

Here was my task: write a standalone file, say test.js, that reads HTML from an input file, say test.html, parses the HTML into a DOM, and uses the DOMTraverser to process only ‘a’ nodes; for each external link, output its outerHTML and its DSR value. Check out what external links are here and what wikilinks are here.

Great, with that stated, let’s walk through my solution. First, we read the contents of test.html:

const fs = require('fs');
// Read the raw HTML input as a UTF-8 string.
const htstring = fs.readFileSync('/home/doudouf/parsoid/lib/wt2html/pp/processors/test.html', 'utf8');
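The absolute path above ties the script to one machine. One way to make it portable, sketched here with a hypothetical readHtml helper (not part of Parsoid), is to resolve the input file relative to the script’s own directory using Node’s built-in path module:

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: resolve `fileName` relative to `baseDir` (by default the
// directory containing this script) and read it as UTF-8 text.
function readHtml(fileName, baseDir = __dirname) {
  return fs.readFileSync(path.join(baseDir, fileName), 'utf8');
}

// Usage inside test.js, with test.html sitting next to the script:
// const htstring = readHtml('test.html');
```

Using __dirname rather than the current working directory means the script finds test.html no matter where it is launched from.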

Next, we parse the string into a DOM:

let dom = ContentUtils.ppToDOM(htstring);

The DOMTraverser accepts an env object as input, so we create a mock one and pass it in:

const env = new MockEnv({});
const t = new DOMTraverser(env);

The project is about detecting links inside links and adding a new category for them. Links inside links occur when <a> tags are nested, as with a wikilink inside an external link. For example:

[ This is [[Google]]'s search page] 

Here the wikilink [[Google]] sits inside an external link. Parsing this gives:

<p data-parsoid='{"dsr":[0,52,0,0]}'>
<a rel="mw:ExtLink" class="external text" href="" data-parsoid='{"targetOff":19,"contentOffsets":[19,51],"dsr":[0,52,19,1]}'>This is </a>
<a rel="mw:WikiLink" href="./Google" title="Google" data-parsoid='{"stx":"simple","a":{"href":"./Google"},"sa":{"href":"Google"},"dsr":[27,27,2,2],"misnested":true}'>Google</a>
<span data-parsoid='{"dsr":[27,27],"misnested":true}'>'s search page</span>
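The dsr entries above are arrays of offsets into the wikitext. Assuming the layout is [start, end, open-tag width, close-tag width] (an assumption, but one that matches the sample: the external link’s dsr [0,52,19,1] gives exactly its contentOffsets [19,51]), the content range can be recovered with a small hypothetical helper:

```javascript
// Hypothetical helper. Assumed DSR layout: [start, end, openWidth, closeWidth],
// all offsets into the original wikitext.
function contentRange(dsr) {
  const [start, end, openWidth, closeWidth] = dsr;
  // Skip the opening syntax at the front and the closing syntax at the back.
  return [start + openWidth, end - closeWidth];
}

// The external link from the parsed sample above.
console.log(contentRange([0, 52, 19, 1])); // [ 19, 51 ]
```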

From the output above, we see that when two <a> tags appear in succession, we can suspect that a wikilink was nested inside an external link. So, given a node (an <a> tag), we first check whether node.nextSibling exists, and if it does, whether node.nextSibling is an ‘A’ node (an <a> tag).
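A minimal sketch of this sibling check, using hypothetical plain objects (makeNode, linkSiblings) in place of real DOM nodes so it runs without Parsoid or a DOM library:

```javascript
// Hypothetical stand-in for a DOM node: only nodeName and nextSibling are modelled.
function makeNode(nodeName) {
  return { nodeName, nextSibling: null };
}

// Link nodes as consecutive siblings, the way children of one parent are linked.
function linkSiblings(nodes) {
  for (let i = 0; i + 1 < nodes.length; i++) {
    nodes[i].nextSibling = nodes[i + 1];
  }
  return nodes;
}

// Return node.nextSibling if it exists and is an <a> element; otherwise null.
// In an HTML DOM, element nodeNames are upper-case ('A'), and a text node
// between the tags would have nodeName '#text', so it would not match.
function nextLinkSibling(node) {
  const sibling = node.nextSibling;
  if (sibling === null) return null; // last child: nothing follows
  return sibling.nodeName === 'A' ? sibling : null;
}

const [a1, a2, span] = linkSiblings([makeNode('A'), makeNode('A'), makeNode('SPAN')]);
console.log(nextLinkSibling(a1) === a2); // true
console.log(nextLinkSibling(a2));        // null (next sibling is a SPAN)
console.log(nextLinkSibling(span));      // null (no next sibling)
```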

Next, it’s good to write a handler function that checks whether a link is an external one. From the output above, we see that external links have a rel attribute of ‘mw:ExtLink’, while wikilinks have a rel attribute of ‘mw:WikiLink’. I also tried several inputs: wikilinks in external links, wikilinks beside wikilinks, external links beside wikilinks, and external links beside external links. I noticed that for a wikilink inside an external link, dsr[1] (the end offset) of the external link is greater than that of the wikilink; in the other scenarios this was not true. So we now have three conditions for concluding that we have a wikilink in an external link: the node is an external link, its next sibling is an <a> tag that is a wikilink, and the external link’s dsr[1] is greater than the wikilink’s.
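The three conditions can be sketched as a single predicate over plain data. The object shape { nodeName, rel, dsr } and the name isWikilinkInExtlink are hypothetical; in the real handler these values come from getAttribute('rel') and DOMDataUtils.getDataParsoid(...).dsr. The first pair of links reuses the DSR values from the parsed sample above; the second pair uses made-up offsets for an external link followed by a separate wikilink.

```javascript
// Hypothetical plain-data view of a link: { nodeName, rel, dsr }.
// Returns true only when all three conditions from the post hold.
function isWikilinkInExtlink(link, next) {
  return (
    link.rel === 'mw:ExtLink' &&   // 1. the node is an external link
    next !== null &&
    next.nodeName === 'A' &&       // 2. its next sibling is an <a> ...
    next.rel === 'mw:WikiLink' &&  //    ... that is a wikilink
    link.dsr[1] > next.dsr[1]      // 3. the external link ends after the wikilink
  );
}

// DSR values taken from the parsed sample above.
const ext = { nodeName: 'A', rel: 'mw:ExtLink', dsr: [0, 52, 19, 1] };
const wiki = { nodeName: 'A', rel: 'mw:WikiLink', dsr: [27, 27, 2, 2] };
console.log(isWikilinkInExtlink(ext, wiki)); // true

// Made-up offsets: an external link followed by (not containing) a wikilink.
const extBefore = { nodeName: 'A', rel: 'mw:ExtLink', dsr: [0, 30, 19, 1] };
const wikiAfter = { nodeName: 'A', rel: 'mw:WikiLink', dsr: [31, 41, 2, 2] };
console.log(isWikilinkInExtlink(extBefore, wikiAfter)); // false
```

Keeping the check as a pure function like this makes it easy to try the conditions on hand-written inputs before wiring them into a DOMTraverser handler.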

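// Called by the DOMTraverser for every <a> node.
const myHandler = (node) => {
  const sibling = node.nextSibling;
  // Conditions 1 and 2: an external link immediately followed by a wikilink <a>.
  if (sibling !== null &&
      node.getAttribute('rel') === 'mw:ExtLink' &&
      sibling.nodeName === 'A' &&
      sibling.getAttribute('rel') === 'mw:WikiLink') {
    const dsr = DOMDataUtils.getDataParsoid(node).dsr;
    // Condition 3: the external link's range ends after the wikilink's (dsr[1]).
    if (dsr[1] > DOMDataUtils.getDataParsoid(sibling).dsr[1]) {
      console.log(`wikilink-in-extlink
DSR: [${dsr}]
OuterHTML: ${node.outerHTML}`);
    }
  }
  // Return true on every path so the traverser carries on with normal processing.
  return true;
};
t.addHandler('a', myHandler);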

Great! That’s it. Remember that all of this lives in a standalone file; next, I have to migrate the code into lib/wt2html/pp/processors/Linter.js. This task took more time than expected because of the challenges I faced, but I am looking forward to the day I can give you the link to the wiki help page for the new category. See you next time.
