How to get the short intro from an attachment (vpp file)

Fredrik von Werder
Member since: 2005
 

Hi,

I would like to fetch the first, let's say, 150 characters from a file hit (Word, PDF).

When I search with a search term I can use x.SearchAttachment().AsHighlighted(...), but when no search query is used I would just like to get the introduction of the file instead.

(and I do not mean the SearchSummary())

Is this possible?

#61809 Oct 02, 2012 13:45
  •  

    Hi Fredrik,

    Later versions of the .NET API have an AsCropped method for attachments (there has always been one for strings), so you can use x.SearchAttachment().AsCropped. See an example (although for a string field) in my reply here.

    #61810 Oct 02, 2012 14:06
  • Fredrik von Werder
    Member since: 2005
     

    Yes, AsCropped() is there now, and it produces an excerpt without needing a search term.

    However, for PDF files the excerpt is not much use; it mostly seems to come back empty or as just the digit "0".

    Have I misunderstood something, or is this possibly a bug?

    When there is a search term, on the other hand, AsHighlighted() works fine.

    #63026 Nov 07, 2012 13:47
  • Fredrik von Werder
    Member since: 2005
     

    bump

    No search phrase provided

    Excerpt = x.SearchAttachment().AsCropped(2000) (the PDF contains text and images)

    gives me nothing, or just a "0"

    Is this a bug or am I doing it the wrong way?

     

     

    #63326 Nov 15, 2012 10:11
  • Fredrik von Werder
    Member since: 2005
     
    #63327 Nov 15, 2012 10:11
  •  

    Hi,

    It seems to be working for me with PDFs. All the tests we have for cropping PDFs pass as well. Is it a specific PDF that causes the problem, or is it every PDF you have tried?

    PDF indexing is hard, and some PDFs produce really odd indentation when the content is extracted. Are you sure the result is always "0", or could it be whitespace?

    #63372 Nov 16, 2012 9:57
  • Fredrik von Werder
    Member since: 2005
     

    It could be whitespace, but shouldn't the AsCropped method take care of that?

    Remember that this only happens when no search term is provided.

    #64420 Dec 20, 2012 11:22
  •  

    For attachments where the "cropped" text contains new lines, the AsCropped function might fail. We will fix this issue in the backend (it won't require any action from the affected users).

    /Henrik

    #64773 Jan 09, 2013 16:11
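
The thread points at whitespace and new lines in the extracted PDF text as the likely cause. As an illustration only, here is a minimal sketch of collapsing whitespace and cropping on the client side. It assumes you already have the attachment text as a plain string; how you obtain it is outside this sketch. ExcerptHelper and CropPlainText are hypothetical names, not part of the Find API, and this is not how AsCropped is implemented.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the Find .NET API.
public static class ExcerptHelper
{
    // Collapses whitespace (including new lines) and returns at most
    // maxLength characters, cutting at a word boundary when one exists.
    public static string CropPlainText(string text, int maxLength)
    {
        if (maxLength < 0)
        {
            throw new ArgumentOutOfRangeException("maxLength");
        }

        if (string.IsNullOrWhiteSpace(text))
        {
            return string.Empty;
        }

        // Replace every run of spaces, tabs and new lines with a single space.
        string normalized = Regex.Replace(text, @"\s+", " ").Trim();

        if (normalized.Length <= maxLength)
        {
            return normalized;
        }

        // Look backwards from the limit for a space so words stay whole.
        int cut = normalized.LastIndexOf(' ', maxLength);

        // No usable space found: fall back to a hard cut at the limit.
        if (cut <= 0)
        {
            cut = maxLength;
        }

        return normalized.Substring(0, cut).TrimEnd();
    }
}
```

For example, ExcerptHelper.CropPlainText(extractedText, 150) would give the "first 150 characters" style intro asked for in the original question, where extractedText is whatever plain-text content you have for the file.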