Native Ingest Troubleshooting
- Updated on 29 Oct 2024
To view a native under the "view native doc" slider, the native files must still exist on the server. For example, if the .pst file that contains an attachment is missing, the attachment will not load (the browser's developer tools show a 404), because the path the application uses for the attachment runs through the .pst file (/./native/andy_zipper_000_1_1.pst/2197988/NGLPlaye.xls).
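When a native returns a 404, one quick check is whether the top-level file named in its path is still on disk. The sketch below is a minimal illustration, not part of the product: it assumes the first path segment after /native/ is the file stored on the server, and `natives_root` is a hypothetical parameter for wherever your natives are kept.

```python
import os

def top_level_container(native_path):
    """Return the first segment after 'native' in a native document path."""
    # Drop empty segments and the '.' in the leading '/./'.
    segments = [s for s in native_path.split("/") if s not in ("", ".")]
    if len(segments) < 2 or segments[0] != "native":
        raise ValueError("unexpected native path: %r" % native_path)
    return segments[1]

def container_exists(natives_root, native_path):
    """Check that the container file is present under natives_root.

    natives_root is an assumption: pass the directory where your
    deployment stores the original native files.
    """
    container = top_level_container(native_path)
    return os.path.isfile(os.path.join(natives_root, container))

path = "/./native/andy_zipper_000_1_1.pst/2197988/NGLPlaye.xls"
print(top_level_container(path))
```

If `container_exists` returns False for the path shown in the 404, the missing source file is the likely cause.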
Supported MIME types for ingest are listed in the /var/lib/brains/.brainspace/plugins/natives-plugin-6.1.0/native-doc-whitelist.txt file.
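To check ahead of time whether a given MIME type would pass, the whitelist can be loaded into a set and queried. This is a sketch only: it assumes the file holds one MIME type per line, which should be confirmed against the actual file, and the sample entries below are hypothetical rather than copied from the whitelist.

```python
def load_whitelist(lines):
    """Build a set of MIME types from an iterable of lines.

    Assumption: one MIME type per line; blank lines are ignored.
    """
    allowed = set()
    for line in lines:
        entry = line.strip().lower()
        if entry:
            allowed.add(entry)
    return allowed

def is_whitelisted(mime_type, allowed):
    """Case-insensitive membership test against the loaded whitelist."""
    return mime_type.strip().lower() in allowed

# Hypothetical entries for illustration; read the real file on the server instead.
sample_lines = ["application/pdf\n", "message/rfc822\n", "\n"]
allowed = load_whitelist(sample_lines)
print(is_whitelisted("application/fits", allowed))
```

On the server, the same function can be fed the open file object for native-doc-whitelist.txt in place of `sample_lines`.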
All files that were not imported, along with the reason each was skipped, are listed in the ingestion-error-report.csv file in the /data/brainspace/builds/<dataset hash>/reports/plugin folder. Example output:
"/./native/fitswcs_maps.tar.gz/fits_wcs.tar/1904-66_COO.fits",b3518b4,"Document mime-type not in whitelist, see native-doc-whitelist.txt for list of whitelisted mime-types . Document mime-type: 'application/fits' Document fileName: '1904-66_COO.fits'"
"/./native/mjt-cal-complex-no-dupe.pst/2097508/",6550229,"Document mime-type not in whitelist, see native-doc-whitelist.txt for list of whitelisted mime-types . Document mime-type: 'application/octet-stream' Document fileName: ''"
Additional information on documents that failed to import can be found in the ingestion-error-detail-report.txt file, in the same folder as ingestion-error-report.csv. Example output:
-------- Native Ingest Error Detail Report --------
Report created: 2018-07-30T16:00:41.919Z
Total count: 14554 documents
Error count: 114 documents
Sent to Analytics count: 14440 documents
-------- Exception Counts ---------
NativeWhiteListException count : 111 exceptions
TikaException count : 3 exceptions
-------- Stack Traces (Stack Trace ID : Stack Trace) ---------
929d93e : com.brainspace.natives.NativeWhiteListException: Document mime-type not in whitelist, see native-doc-whitelist.txt for list of whitelisted mime-types . Document mime-type: 'application/xml' Document fileName: 'XMLClien.xml'
at com.brainspace.natives.command.commands.IngestCommand.lambda$whiteListCheck$31085786$1(IngestCommand.java:275)
at io.vavr.control.Try.mapTry(Try.java:616)
at com.brainspace.natives.command.commands.IngestCommand.whiteListCheck(IngestCommand.java:259)
at com.brainspace.natives.command.commands.IngestCommand.lambda$ingest$1(IngestCommand.java:93)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.Streams$StreamBuilderImpl.forEachRemaining(Streams.java:419)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:270)
at com.brainspace.natives.extractors.PstExtractor.tryAdvance(PstExtractor.java:127)
at java.util.Spliterator.forEachRemaining(Spliterator.java:326)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:270)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at com.brainspace.natives.command.commands.IngestCommand.ingest(IngestCommand.java:104)
at com.brainspace.natives.command.NativeIngestCommand.routeCommand(NativeIngestCommand.java:85)
at com.brainspace.natives.command.NativeIngestCommand.main(NativeIngestCommand.java:55)
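Both reports are regular enough to summarize with a short script. The sketch below tallies rejected MIME types from ingestion-error-report.csv and pulls the counts out of the detail report. It assumes the CSV has no header row and three columns (path, identifier, message), and that count lines keep the "<label> count: <number>" shape shown in the examples; the function names are illustrative.

```python
import csv
import io
import re
from collections import Counter

# MIME type quoted inside the message column of the CSV report.
MIME_RE = re.compile(r"Document mime-type: '([^']*)'")
# Lines such as "Total count: 14554 documents" or "TikaException count : 3 exceptions".
COUNT_RE = re.compile(r"^(.+?) count\s*:\s*(\d+)")

def rejected_mime_types(csv_file):
    """Count rejected documents per MIME type in the CSV error report."""
    counts = Counter()
    for row in csv.reader(csv_file):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        match = MIME_RE.search(row[2])
        if match:
            counts[match.group(1)] += 1
    return counts

def summary_counts(report_text):
    """Extract the labelled counts from the detail report text."""
    counts = {}
    for line in report_text.splitlines():
        match = COUNT_RE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts

# Rows and lines copied from the example output in this article.
csv_sample = (
    "\"/./native/fitswcs_maps.tar.gz/fits_wcs.tar/1904-66_COO.fits\",b3518b4,"
    "\"Document mime-type not in whitelist, see native-doc-whitelist.txt for list of "
    "whitelisted mime-types . Document mime-type: 'application/fits' "
    "Document fileName: '1904-66_COO.fits'\"\n"
    "\"/./native/mjt-cal-complex-no-dupe.pst/2097508/\",6550229,"
    "\"Document mime-type not in whitelist, see native-doc-whitelist.txt for list of "
    "whitelisted mime-types . Document mime-type: 'application/octet-stream' "
    "Document fileName: ''\"\n"
)
detail_sample = """Total count: 14554 documents
Error count: 114 documents
Sent to Analytics count: 14440 documents
NativeWhiteListException count : 111 exceptions
TikaException count : 3 exceptions
"""

mime_counts = rejected_mime_types(io.StringIO(csv_sample))
summary = summary_counts(detail_sample)
print(dict(mime_counts))
print(summary)
```

A useful sanity check on the detail report is that the error count plus the sent-to-Analytics count equals the total, and that the per-exception counts add up to the error count, as they do in the example (114 + 14440 = 14554 and 111 + 3 = 114).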
The whitelist check uses the detected MIME type, not just the file extension. In a test, PNG files were renamed with a .fits extension. These files did not appear in the ingestion error report; instead, they showed up as not-analyzed documents because they had no body text, as in the other .png ticket.
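The difference between extension-based and content-based detection can be illustrated as follows. This sketch does not reproduce how the plugin detects types; it only shows that a PNG file keeps its 8-byte PNG signature no matter what extension it is given, whereas Python's standard `mimetypes` module looks at the file name alone.

```python
import mimetypes

# Every PNG file begins with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def looks_like_png(data):
    """Content-based check: inspect the leading bytes, ignore the name."""
    return data.startswith(PNG_SIGNATURE)

def extension_guess(file_name):
    """Name-based check: mimetypes only maps the extension to a type."""
    mime_type, _encoding = mimetypes.guess_type(file_name)
    return mime_type

# Simulated leading bytes of a PNG that has been renamed to 'image.fits'.
header = PNG_SIGNATURE + b"\x00\x00\x00\rIHDR"

print(looks_like_png(header))        # content still identifies a PNG
print(extension_guess("image.png"))  # name-based guess for the original name
```

What a name-based guess returns for the renamed image.fits depends on the local MIME tables, which is why a content check is the more reliable way to predict how such a file will be treated.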
If there is an issue ingesting a PST file:
Office 365 includes a built-in tool that can be used to verify the PST file. It reports the number of folders and the number of documents contained in the PST.
PS C:\Program Files (x86)\Microsoft Office\root\Office16> pwd
Path
----
C:\Program Files (x86)\Microsoft Office\root\Office16
The application name is SCANPST.EXE, located in the Office16 folder shown above.
Output on the sample PST file: