Native Ingest Troubleshooting
  • 29 Oct 2024

To view a native under the "view native doc" slider, the native files must still reside on the server. For example, if the .pst file that contains an attachment is missing, the attachment will not load (developer tools show a 404), because the application resolves the attachment through a path that includes the .pst file (/./native/andy_zipper_000_1_1.pst/2197988/NGLPlaye.xls).

Supported MIME types for ingest are listed in the /var/lib/brains/.brainspace/plugins/natives-plugin-6.1.0/native-doc-whitelist.txt file.
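
As a quick way to check a MIME type against the whitelist, the lookup can be sketched as below. This is a minimal sketch, not the plugin's own logic: it assumes the whitelist file holds one MIME type per line, which is not confirmed here, and the function names `load_whitelist` and `is_whitelisted` are hypothetical.

```python
def load_whitelist(lines):
    """Build a set of MIME types from whitelist lines.

    ASSUMPTION: one MIME type per line; blank lines are ignored.
    The real file layout may differ.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def is_whitelisted(mime_type, whitelist):
    """Return True if the MIME type is in the set (case-insensitive match)."""
    return mime_type.strip().lower() in whitelist


# Illustrative entries only; these are not taken from the real whitelist.
example_lines = ["application/pdf\n", "\n", "message/rfc822\n"]
whitelist = load_whitelist(example_lines)
print(is_whitelisted("application/fits", whitelist))
```

To run it against the real file, pass an open file handle for native-doc-whitelist.txt to `load_whitelist` in place of `example_lines`.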

All files that were not imported, along with the reason each was not imported, are listed in the ingestion-error-report.csv file in the /data/brainspace/builds/<dataset hash>/reports/plugin folder. Example output:

"/./native/fitswcs_maps.tar.gz/fits_wcs.tar/1904-66_COO.fits",b3518b4,"Document mime-type not in whitelist, see native-doc-whitelist.txt for list of whitelisted mime-types . Document mime-type: 'application/fits' Document fileName: '1904-66_COO.fits'"
"/./native/mjt-cal-complex-no-dupe.pst/2097508/",6550229,"Document mime-type not in whitelist, see native-doc-whitelist.txt for list of whitelisted mime-types . Document mime-type: 'application/octet-stream' Document fileName: ''"
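
When the report is large, it can help to tally the rejected MIME types before deciding what to do about them. The following is a minimal sketch that assumes every row has the three columns seen in the example output above (path, ID, message) and that the message keeps the `Document mime-type: '...'` wording. The function name `count_rejected_mime_types` is hypothetical.

```python
import csv
import io
import re
from collections import Counter

# Matches the quoted MIME type in the message column of the example rows.
MIME_RE = re.compile(r"Document mime-type: '([^']*)'")


def count_rejected_mime_types(csv_text):
    """Count rejected documents per MIME type in an ingestion error report."""
    counts = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        match = MIME_RE.search(row[2])
        if match:
            counts[match.group(1)] += 1
    return counts


# The two example rows from the report above.
sample = '''"/./native/fitswcs_maps.tar.gz/fits_wcs.tar/1904-66_COO.fits",b3518b4,"Document mime-type not in whitelist, see native-doc-whitelist.txt for list of whitelisted mime-types . Document mime-type: 'application/fits' Document fileName: '1904-66_COO.fits'"
"/./native/mjt-cal-complex-no-dupe.pst/2097508/",6550229,"Document mime-type not in whitelist, see native-doc-whitelist.txt for list of whitelisted mime-types . Document mime-type: 'application/octet-stream' Document fileName: ''"
'''

for mime_type, count in count_rejected_mime_types(sample).most_common():
    print(mime_type, count)
```

For a real report, read the contents of ingestion-error-report.csv into a string and pass that to the function instead of `sample`.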

Additional information on documents that failed to import can be found in ingestion-error-detail-report.txt, in the same folder as the ingestion-error-report.csv file. Example output:

-------- Native Ingest Error Detail Report --------
Report created: 2018-07-30T16:00:41.919Z
Total count: 14554 documents
Error count: 114 documents
Sent to Analytics count: 14440 documents

-------- Exception Counts ---------
NativeWhiteListException count : 111 exceptions
TikaException count : 3 exceptions

-------- Stack Traces (Stack Trace ID : Stack Trace) ---------
929d93e : com.brainspace.natives.NativeWhiteListException: Document mime-type not in whitelist, see native-doc-whitelist.txt for list of whitelisted mime-types . Document mime-type: 'application/xml' Document fileName: 'XMLClien.xml'
at com.brainspace.natives.command.commands.IngestCommand.lambda$whiteListCheck$31085786$1(IngestCommand.java:275)
at io.vavr.control.Try.mapTry(Try.java:616)
at com.brainspace.natives.command.commands.IngestCommand.whiteListCheck(IngestCommand.java:259)
at com.brainspace.natives.command.commands.IngestCommand.lambda$ingest$1(IngestCommand.java:93)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.Streams$StreamBuilderImpl.forEachRemaining(Streams.java:419)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:270)
at com.brainspace.natives.extractors.PstExtractor.tryAdvance(PstExtractor.java:127)
at java.util.Spliterator.forEachRemaining(Spliterator.java:326)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:270)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at com.brainspace.natives.command.commands.IngestCommand.ingest(IngestCommand.java:104)
at com.brainspace.natives.command.NativeIngestCommand.routeCommand(NativeIngestCommand.java:85)
at com.brainspace.natives.command.NativeIngestCommand.main(NativeIngestCommand.java:55)
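
The summary counts at the top of the detail report can be sanity-checked against each other: in the example above, the error count plus the sent-to-Analytics count equals the total, and the exception counts add up to the error count. Below is a minimal sketch of that check, assuming the `<label> count: <n> documents|exceptions` line layout shown in the example. The function name `parse_counts` is hypothetical.

```python
import re

# Matches lines such as "Total count: 14554 documents" and
# "TikaException count : 3 exceptions" from the example report.
COUNT_RE = re.compile(r"^(.+?) count\s*: (\d+) (?:documents|exceptions)$")


def parse_counts(report_text):
    """Return a dict mapping each count label to its integer value."""
    counts = {}
    for line in report_text.splitlines():
        match = COUNT_RE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


# Header of the example report above (stack traces omitted).
sample_report = """-------- Native Ingest Error Detail Report --------
Report created: 2018-07-30T16:00:41.919Z
Total count: 14554 documents
Error count: 114 documents
Sent to Analytics count: 14440 documents

-------- Exception Counts ---------
NativeWhiteListException count : 111 exceptions
TikaException count : 3 exceptions
"""

counts = parse_counts(sample_report)
# Every document is either an error or sent to Analytics.
balanced = counts["Total"] == counts["Error"] + counts["Sent to Analytics"]
# Labels ending in "Exception" are the per-exception breakdown.
exception_total = sum(v for k, v in counts.items() if k.endswith("Exception"))
print(balanced, exception_total == counts["Error"])
```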

The whitelist check uses the detected MIME type, not just the file extension. This was tested with PNG files that had been renamed with a .fits extension. These files did not show up in the ingestion error report; instead, they showed up as not-analyzed documents because they had no data in the body text, as in the other .png ticket.
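
To illustrate why renaming a file does not change its detected type, the sketch below identifies a PNG by the fixed 8-byte signature at the start of its content rather than by its name. It illustrates content-based detection in general and is not the plugin's actual detection code. The function name `sniff_is_png` and the file name `picture.fits` are hypothetical.

```python
import os
import tempfile

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sniff_is_png(path):
    """Return True if the file content starts with the PNG signature."""
    with open(path, "rb") as handle:
        return handle.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE


with tempfile.TemporaryDirectory() as tmp:
    # Write PNG-like content under a misleading .fits extension.
    renamed = os.path.join(tmp, "picture.fits")
    with open(renamed, "wb") as handle:
        handle.write(PNG_SIGNATURE + b"placeholder image data")
    result = sniff_is_png(renamed)

print(result)  # True: the content, not the extension, decides the type
```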

If there is an issue ingesting a PST file:

Office 365 includes a built-in tool that can be used to verify the PST file. It reports the number of folders and the number of documents contained in the PST.

PS C:\Program Files (x86)\Microsoft Office\root\Office16> pwd

Path
----
C:\Program Files (x86)\Microsoft Office\root\Office16

Application name is SCANPST.EXE.

Output on the sample PST file:

