Create an XML Transformer
- Updated on 29 Oct 2024
- 5 Minutes to read
The Brainspace XML Connector offers a great deal of flexibility for ingesting a wide variety of custom data formats. Before importing, you must create an XML Transformer tailored to the XML file you plan to import; the transformer formats that data for ingestion into Brainspace.
How to Create an XML Transformer
You will need an XML editor to make creating your XML Transformer easier. A common editor for Windows is Notepad++ (https://notepad-plus-plus.org/); use the XML Tools plugin to display your contents correctly. To install the plugin, go to Plugins > Plugins Admin (Plugins > Plugin Manager > Show Plugin Manager in older versions) and install XML Tools. Once it is installed, format a file with Plugins > XML Tools > Pretty Print (libXML), or press Ctrl+Alt+Shift+B.
On Mac you can use Sublime Text (https://www.sublimetext.com/), which can be evaluated for free. Sublime Text automatically highlights the XML markup as you'll see in this guide, similar to the XML Tools plugin for Notepad++.
--------------------------------
The function of the XML Transformer is to take the tags of the XML file (called flags in this guide, such as <content>) and translate them into something that can be processed in Brainspace.
Here's a sample XML File that has data for import into Brainspace:
#(/data/brainspace/example-data.xml)
<DOC>
<id>33254</id>
<type>E-mail</type>
<domain>http://www.example.com</domain>
<title>Here's an Example</title>
<link>http://www.example.com/33254</link>
<author>Thomas P.</author>
<publishedat>01-01-2018 18:45:28</publishedat>
<description>Example</description>
<content>I made an example post as a demonstration!</content>
<updatetime>01-02-2018 16:50:02</updatetime>
<hash>027ad5f56936cb6a31b574dc7a49b300</hash>
</DOC>
<DOC>
<id>33255</id>
<type>RSS</type>
<domain>http://www.testing.com</domain>
<title>This is a Test</title>
<link>http://www.example.com/33255</link>
<author>Sam E.</author>
<publishedat>01-01-2018 19:15:44</publishedat>
<description>Test</description>
<content>Please ignore, this is a test.</content>
<updatetime>01-01-2018 19:15:44</updatetime>
<hash>fa6a5a3224d7da66d9e0bdec25f62cf0</hash>
</DOC>
A few things stand out in the XML file above. The data is divided into chunks by <DOC> and </DOC> flags, and both chunks contain the same flags, such as <id>, <type>, <domain>, and <title>. These flags, the labels inside each <>, are specific to your XML file, so knowing your data will help in configuring your XML Transformer.
In the example above, the <DOC> flag marks the beginning of a full entry or post, and the </DOC> flag marks its end. Think of everything between <DOC> and </DOC> as one document (hence DOC), with its content marked by other descriptive flags. Every block of content (each <DOC> ... </DOC> chunk) will have the same identifying flags, because that is the layout, or format, of your data.
A great example is the <title> flag, which acts as the Subject or (as the flag suggests) Title of that particular content. Each <DOC> will have a <title>, and you can use that to make sure Brainspace handles that information correctly. This is where the XML Transformer comes into play.
Here's a basic default template you can use as an XML Transformer file:
#(/var/lib/brains/.brainspace/transformers/example-transformer.xsl)
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*"/> <!-- Remove empty space -->
<xsl:template match="doc">
<doc>
<id><xsl:value-of select="id"/></id>
<date><xsl:value-of select="date"/></date>
<from><xsl:value-of select="from"/></from>
<to><xsl:value-of select="to"/></to>
<bodytext><xsl:value-of select="bodytext"/></bodytext>
<title><xsl:value-of select="subject"/></title>
<ticketType><xsl:value-of select="ticketType"/></ticketType>
<organization><xsl:value-of select="organization"/></organization>
</doc>
</xsl:template>
</xsl:stylesheet>
Think of the XML Transformer as a filter or translator for your XML file, so only the information you want is brought into Brainspace. Let's further explore the important parts that should be modified for your data.
<xsl:template match="doc">
This 'template match' line searches for the element named "doc", which marks the beginning (<doc>) and end (</doc>) of each full block of content. Element names in XML are case-sensitive, so the match value must use the same case as the flag in your file; the sample data above uses <DOC>, so its match value would be "DOC". Let's take a look at the XML Transformer's <doc> chunk:
<doc>
<id><xsl:value-of select="id"/></id>
<date><xsl:value-of select="date"/></date>
<from><xsl:value-of select="from"/></from>
<to><xsl:value-of select="to"/></to>
<bodytext><xsl:value-of select="bodytext"/></bodytext>
<title><xsl:value-of select="subject"/></title>
<ticketType><xsl:value-of select="ticketType"/></ticketType>
<organization><xsl:value-of select="organization"/></organization>
</doc>
Looks a little familiar, right? We see flags such as <date>, <to>, and <from>, and within the flags there's some extra text we might not be familiar with, but that's okay. Our main priority is the flags, such as <date>, and the quoted content, such as "date". Let's take the first line as an example:
<id><xsl:value-of select="id"/></id>
Let's break down what this line is doing. First, the <id> flag is what Brainspace will label this field once it's ingested, and select="id" names the flag that the XML Transformer looks for in your XML data file. This works out well because what Brainspace is looking for and what the XML file has already line up for <id>. But what about this next line?
<date><xsl:value-of select="date"/></date>
Here we see that this will be the DATE field within Brainspace, and it's looking for a <date> flag in your XML file. However, the XML file has no <date> flag; instead there are two flags that could serve as dates, <updatetime> and <publishedat>. This is completely okay and is the entire purpose of the XML Transformer. If you want the date that the post was created, you can use the <publishedat> flag. Simply modify that entry in the XML Transformer file so Brainspace will see it:
<date><xsl:value-of select="publishedat"/></date>
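If the last-modified time is the more meaningful date for your dataset, the same line could point at <updatetime> instead. This is only a sketch; it assumes <updatetime> records when an entry was last changed, which the sample data suggests but does not state.

```xml
<!-- Assumption: <updatetime> holds the last-modified timestamp of the entry -->
<date><xsl:value-of select="updatetime"/></date>
```

Only one of the two source flags should feed the <date> field, so pick whichever date you expect to search and filter on.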
Whether you're new to Brainspace or a seasoned pro at creating Datasets, it's important to remember that Brainspace is optimized for E-mail, which is why you see flags for <to>, <from>, and <title>. This doesn't mean that the data you're bringing in needs to be e-mails; in fact, the XML we're importing looks to be more centered around news or notifications contributed in different ways. With this in mind, you'll want to label your flags in a way that will select the right context when importing:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*"/> <!-- Remove empty space -->
<xsl:template match="DOC"> <!-- Case-sensitive: the sample data uses <DOC> -->
<doc>
<id><xsl:value-of select="id"/></id>
<date><xsl:value-of select="publishedat"/></date>
<from><xsl:value-of select="author"/></from>
<to><xsl:value-of select="domain"/></to>
<bodytext><xsl:value-of select="content"/></bodytext>
<subject><xsl:value-of select="title"/></subject>
<method><xsl:value-of select="type"/></method>
<description><xsl:value-of select="description"/></description>
</doc>
</xsl:template>
</xsl:stylesheet>
In this XML Transformer a number of the flag names have changed, such as <subject><xsl:value-of select="title"/></subject>. Why? As mentioned previously, Brainspace works great with E-mail, and when you're selecting context within Brainspace you'll see Subject as a field. The flag has been changed from <title> to <subject> to better fit Brainspace, and the XML Transformer is telling Brainspace "If you see a flag in my XML file that is <title>, that means contextually it's a <subject> in my data." The XML Transformer template flags, such as <from>, <ticketType>, and <organization>, can be changed to whatever you like to make it easier for you to match them up to Brainspace's ingest terms.
As an example, look at the <bodytext> and <method> flags. If you want <content> to match up to Body Text in Brainspace, keep the label <bodytext> so the match is easy to make in Brainspace. The <method> flag doesn't really have a Brainspace equivalent, which is acceptable; you can set it as an Enumeration or just Text. As long as you know what your data is and what you want Brainspace to do with it, you can set up your XML Transformer to handle it however you see fit.
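As a rough illustration, here is what the customized transformer should produce for the first entry of the sample data. This assumes a standard XSLT processor and a template whose match value lines up with the file's <DOC> flag (element names are case-sensitive). The indentation is added for readability; the exact whitespace depends on the processor.

```xml
<doc>
  <id>33254</id>
  <date>01-01-2018 18:45:28</date>
  <from>Thomas P.</from>
  <to>http://www.example.com</to>
  <bodytext>I made an example post as a demonstration!</bodytext>
  <subject>Here's an Example</subject>
  <method>E-mail</method>
  <description>Example</description>
</doc>
```

Each output flag carries the text of the source flag named in its select attribute, and any source flag the transformer does not select is left out.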
Let’s take a final look at a <DOC> section from the XML file.
<DOC>
<id>33255</id>
<type>RSS</type>
<domain>http://www.testing.com</domain>
<title>This is a Test</title>
<link>http://www.example.com/33255</link>
<author>Sam E.</author>
<publishedat>01-01-2018 19:15:44</publishedat>
<description>Test</description>
<content>Please ignore, this is a test.</content>
<updatetime>01-01-2018 19:15:44</updatetime>
<hash>fa6a5a3224d7da66d9e0bdec25f62cf0</hash>
</DOC>
Comparing the XML Transformer and the XML data file, you can see that <link>, <updatetime>, and <hash> are not in the XML Transformer file. If required, you could easily add one of them, for example <hash>, with an additional line like <hash><xsl:value-of select="hash"/></hash>, and Brainspace will bring in Hash. If hash doesn't seem like useful information for your dataset, you can omit it from the XML Transformer so it's not imported into Brainspace.
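Putting that together, a transformer that also brings in the hash and the link might look like the sketch below. The <sourcelink> label is a hypothetical name chosen for illustration, not a Brainspace term, and the match value assumes the uppercase <DOC> flag used in the sample data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*"/> <!-- Remove empty space -->
  <!-- Match value is case-sensitive; the sample data uses <DOC> -->
  <xsl:template match="DOC">
    <doc>
      <id><xsl:value-of select="id"/></id>
      <date><xsl:value-of select="publishedat"/></date>
      <from><xsl:value-of select="author"/></from>
      <to><xsl:value-of select="domain"/></to>
      <bodytext><xsl:value-of select="content"/></bodytext>
      <subject><xsl:value-of select="title"/></subject>
      <method><xsl:value-of select="type"/></method>
      <description><xsl:value-of select="description"/></description>
      <!-- Added fields: previously dropped from the sample data -->
      <hash><xsl:value-of select="hash"/></hash>
      <!-- "sourcelink" is a hypothetical label; rename it to suit your dataset -->
      <sourcelink><xsl:value-of select="link"/></sourcelink>
    </doc>
  </xsl:template>
</xsl:stylesheet>
```

Removing either added line drops that field from the import again, so the transformer doubles as the list of fields you have chosen to keep.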