Dataset Build Advanced Configuration
  • 29 Oct 2024

When adding or configuring a dataset build, Advanced Configuration provides a granular set of options for tuning ingested text to refine the dataset analytics. Modifying these settings can drastically change the results of the analytics within your Brainspace visualizations.

  • Filter Words – This option allows you to upload a list of terms that will be excluded from the analytics. These should be terms that are common within your dataset but provide little insight into your data.

  • EMT Containment Threshold – This determines whether an email is considered to be part of the inclusive email. Inclusive emails are the messages within an email thread that represent the overall conversation.

  • Setting this to 1 would mean an email’s content would need to be 100% contained within the inclusive email. This is the most conservative setting but will provide the greatest number of inclusive emails for review.

    Note

    Lowering this setting (to 0.8, for example) allows a message to be considered part of the inclusive email when a smaller proportion of its content is contained within that email. This can accelerate the review, but the results could be less reliable.

  • EMT Related Threshold – This determines whether an email is considered to be part of an email thread. An email thread is the group of emails composed of the original message, responses, and forwarded messages.

  • Setting this to 1 would mean an email’s content would need to be 100% contained within another email for it to be considered "related" to that email and get assigned to the same thread. This is the most conservative setting but will create the greatest number of email threads for review.

    Note

    Lowering this setting (to 0.8, for example) results in fewer email threads, because emails that share less of their content can still be assigned to the same thread. This can accelerate the review, but the results could be less reliable.

  • EMT Enhanced BCC Handling – If this is set, Brainspace will mark all emails with a populated BCC field as inclusive emails.

  • *Boilerplate Max Lines – The maximum number of lines in a group of text that Brainspace will consider to be boilerplate (for example, a confidentiality notice in an email).

  • *Boilerplate Min Frequency – The minimum number of times a group of text must occur across documents in a dataset before it is considered boilerplate.

    Note

    Boilerplate is a set of repeated lines that Brainspace can filter out prior to natural language processing.

  • Optional Analytics – These settings allow you to disable the Brainspace analytics features listed below. Note that changing these settings will disrupt major functionality.

    • Brains, Clustering – Concept Searching.

    • Email Threading – A feature in Brainspace that identifies the unique messages belonging to the same email thread, marks the unique content of each message, and determines the sort order and hierarchy of the messages in each thread.

    • Graph Data – Communication and conversation analysis.

  • Language & Stop Words – Words that are filtered out prior to natural language processing. Typically, these are more universal terms such as articles or prepositional phrases.
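
To illustrate how a threshold such as EMT Containment Threshold or EMT Related Threshold behaves, the following is a minimal Python sketch. It is not Brainspace's implementation: the word-overlap measure and the names containment and is_contained are assumptions chosen only to show why a setting of 1 is stricter than 0.8.

```python
def containment(message: str, candidate: str) -> float:
    """Fraction of the message's distinct words that also appear in the candidate.

    Hypothetical measure for illustration; Brainspace's actual comparison
    is not described in this article.
    """
    message_words = set(message.lower().split())
    if not message_words:
        return 0.0
    candidate_words = set(candidate.lower().split())
    return len(message_words & candidate_words) / len(message_words)


def is_contained(message: str, candidate: str, threshold: float = 1.0) -> bool:
    """True when enough of the message is found inside the candidate email."""
    return containment(message, candidate) >= threshold


earlier = "can we meet monday to review the draft"
later = "yes monday works can we meet monday to review the contract"

score = containment(earlier, later)           # 7 of 8 distinct words are shared
strict = is_contained(earlier, later, 1.0)    # threshold of 1: not contained
relaxed = is_contained(earlier, later, 0.8)   # threshold of 0.8: contained
```

In this sketch, the stricter setting leaves the earlier message standing on its own, which mirrors why a threshold of 1 yields more items to review, while 0.8 folds it into the later message.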

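
The interaction between Boilerplate Max Lines and Boilerplate Min Frequency can be sketched in the same way. The Python below assumes, purely for illustration, that a "group of text" is a block of lines separated by blank lines and that frequency is counted once per document; the function find_boilerplate and its parameters are hypothetical and do not reflect how Brainspace detects boilerplate internally.

```python
from collections import Counter


def find_boilerplate(documents, max_lines, min_frequency):
    """Return text blocks that are short enough and repeated often enough.

    Assumption: a block is a run of lines separated by blank lines, and each
    block is counted at most once per document.
    """
    counts = Counter()
    for text in documents:
        blocks = {block.strip() for block in text.split("\n\n") if block.strip()}
        counts.update(blocks)
    return {
        block
        for block, seen_in in counts.items()
        if seen_in >= min_frequency and len(block.splitlines()) <= max_lines
    }


notice = "CONFIDENTIAL: This message is intended\nonly for the named recipient."
docs = [
    "Hi Ann, the draft is attached.\n\n" + notice,
    "Budget approved.\n\n" + notice,
    "Lunch on Friday?\n\n" + notice,
]

found = find_boilerplate(docs, max_lines=5, min_frequency=3)
too_long = find_boilerplate(docs, max_lines=1, min_frequency=3)
too_rare = find_boilerplate(docs, max_lines=5, min_frequency=4)
```

Here the two-line confidentiality notice is flagged only when the line limit is at least 2 and the required frequency is no higher than 3, the number of documents that contain it.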
