

Hey guys,

Is there any package or code sample for implementing Elasticsearch for content and PDFs?

Regards Dhanesh:)

For content only, see https://our.umbraco.com/packages/website-utilities/novicellexamineelasticsearch/. For PDFs, you could tap into the media save events and inject the PDF items into the index yourself.

Regards

Ismail

Yeah, so you don't have to write all the code to get the Umbraco content into Elasticsearch. You don't have to write the client to query it either.

Basically, we have Examine, which is the search and indexing layer in Umbraco. It wraps around Lucene.Net and is extensible.

The package link I sent you is an Examine provider for Elasticsearch, so instead of the underlying engine being Lucene it is now Elasticsearch (although Elasticsearch is itself powered by Lucene). There is also an Azure Search provider, ExamineX, although that one is paid.

The Elasticsearch one only does content and media. For media it only indexes stub information such as file name, size and extension, not the actual content of the file. So in theory you can use the Examine events and test whether the item currently being indexed is a media item. If it is, test whether it is a PDF, and if so extract the text with the PDF library of your choice and inject the extracted content into the index. That way you get the actual PDF content.
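To make the flow above concrete, here is a rough, language-neutral sketch in Python. It is not the Examine or Elasticsearch provider API: the handler name `on_indexing_item`, the field names (`indexType`, `extension`, `path`, `fileTextContent`) and the `fake_extractor` stand-in are all made up for illustration. In a real site you would do the same checks inside an Examine indexing event handler and call a real PDF library.

```python
from typing import Callable, Dict


def on_indexing_item(item: Dict[str, str],
                     extract_text: Callable[[str], str]) -> Dict[str, str]:
    """Mimic an indexing event handler: enrich PDF media items only."""
    # Step 1: is the item being indexed a media item? (hypothetical field name)
    if item.get("indexType") != "media":
        return item
    # Step 2: is it a PDF? Compare the extension case-insensitively.
    if item.get("extension", "").lower() != "pdf":
        return item
    # Step 3: extract the text and inject it as an extra field,
    # working on a copy so the original item is left untouched.
    enriched = dict(item)
    enriched["fileTextContent"] = extract_text(item["path"])
    return enriched


def fake_extractor(path: str) -> str:
    # Stand-in for a real PDF text extraction library (assumption).
    return f"text extracted from {path}"


content_item = {"indexType": "content", "nodeName": "Home"}
image_item = {"indexType": "media", "extension": "jpg", "path": "/media/1000/logo.jpg"}
pdf_item = {"indexType": "media", "extension": "PDF", "path": "/media/1001/report.pdf"}

for candidate in (content_item, image_item, pdf_item):
    result = on_indexing_item(candidate, fake_extractor)
    print(candidate.get("path", candidate.get("nodeName")), "fileTextContent" in result)
```

Passing the extractor in as a function keeps the media/PDF checks separate from whichever extraction library you end up choosing.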

Regards

Ismail

I have done PDF extraction before, see https://our.umbraco.com/packages/website-utilities/cogumbracoexaminemediaindexer/. You could look at the code for this and the libraries I used, and then reuse that.

I did create a composition for the Examine PDF indexer, which uses iTextSharp, and I swapped out the iTextSharp engine for Apache Tika. Apache Tika can extract text from most file formats. It is a bit on the heavy side, as it is written in Java and runs on .NET through IKVM, but it works really well. See https://www.nuget.org/packages/TikaOnDotNet/