Best way to index multiple attachments?

Fredrik Schultz
Member since: 2001
 

Indexing a single attachment together with a page works well if you add a property of the Attachment type to the page model, as described here:
http://world.episerver.com/documentation/Items/Developers-Guide/EPiServer-Find/9/DotNET-Client-API/File-attachments/

But how should multiple attachments be handled when their number varies from page to page?

(The reason for this question is that I want a page to show up as a search hit based also on the contents of the files in its LinkItemCollection.)

#143331 Jan 20, 2016 13:54
  • Kristoffer Olsson
    Member since: 2012
     

    All right.
    I know that this question is very old (relatively speaking).
    But I just had this problem and could not find any solution to it while searching. Creating an IEnumerable<Attachment> is out of the picture, so something else had to be done.

    The solution that I came up with is not the most straightforward one, but it fits my immediate needs.

    What I finally settled on was to create an in-memory zip archive and add all my documents to it. Once all documents were added to the "virtual zip file", I attached the zip file, pretty much like this:

    page.Attachments = new Attachment(() => ZipFileStream)

    The reason for creating a zip file, rather than simply streaming all the files into a single MemoryStream, is that concatenating streams only works for plain text files. So I had to go down the route of an in-memory zip archive.

    The following is a rough implementation of the solution I propose, and is basically what we ended up using:

    private void ContentEvents_PublishedContent(object sender, EPiServer.ContentEventArgs e)
    {
        if (e.Content is PageWithMultipleDocuments page)
        {
            var paths = new List<string> { @"F:\randomplaintext.txt", @"F:\randomplaintext2.txt", @"F:\randompdf2.pdf" };

            // GetZipedAttachments returns null when none of the files exist.
            using (var zipFile = GetZipedAttachments(paths))
            {
                if (zipFile != null)
                {
                    page.Attachments = new Attachment(() => zipFile);
                }
                SearchClient.Instance.Index(page);
            }
        }
    }

    private Stream GetZipedAttachments(List<string> paths)
    {
        var existingPaths = paths.Where(File.Exists).ToList();
        if (!existingPaths.Any())
        {
            return null;
        }

        var outFile = new MemoryStream();

        // leaveOpen: true keeps outFile usable after the archive is disposed.
        // Disposing the archive is what writes the zip's central directory.
        using (var zipArchive = new ZipArchive(outFile, ZipArchiveMode.Create, true))
        {
            foreach (var path in existingPaths)
            {
                // Use the file name as the entry name rather than the full path.
                var entry = zipArchive.CreateEntry(Path.GetFileName(path), CompressionLevel.Fastest);
                using (var entryStream = entry.Open())
                {
                    var fileBytes = File.ReadAllBytes(path);
                    entryStream.Write(fileBytes, 0, fileBytes.Length);
                }
            }
        }

        outFile.Position = 0;
        return outFile;
    }

    Note that GetZipedAttachments returns null when none of the files exist, so the result needs a null check before it is attached.
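
    One detail worth checking with an in-memory archive: in Create mode, ZipArchive only writes the zip's central directory when it is disposed, so the archive has to be disposed before the stream is handed over, and it needs leaveOpen: true so the MemoryStream is not closed with it. Below is a minimal stand-alone sketch of that round trip, using only System.IO.Compression. The class name ZipRoundTrip, the helper BuildZip and the sample file names are made up for illustration and are not part of the solution above.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.IO.Compression;
    using System.Text;

    public static class ZipRoundTrip
    {
        // Hypothetical helper: packs named byte arrays into an in-memory zip.
        public static MemoryStream BuildZip(IDictionary<string, byte[]> files)
        {
            var output = new MemoryStream();

            // leaveOpen: true keeps 'output' usable after the archive is disposed;
            // disposing the archive is what writes the zip's central directory.
            using (var archive = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
            {
                foreach (var file in files)
                {
                    var entry = archive.CreateEntry(file.Key, CompressionLevel.Fastest);
                    using (var entryStream = entry.Open())
                    {
                        entryStream.Write(file.Value, 0, file.Value.Length);
                    }
                }
            }

            output.Position = 0;
            return output;
        }

        public static void Main()
        {
            // Made-up sample content standing in for real documents.
            var files = new Dictionary<string, byte[]>
            {
                ["a.txt"] = Encoding.UTF8.GetBytes("first document"),
                ["b.txt"] = Encoding.UTF8.GetBytes("second document"),
            };

            // Read the archive back to confirm every entry survived intact.
            using (var zip = BuildZip(files))
            using (var reader = new ZipArchive(zip, ZipArchiveMode.Read))
            {
                foreach (var entry in reader.Entries)
                {
                    using (var text = new StreamReader(entry.Open(), Encoding.UTF8))
                    {
                        Console.WriteLine($"{entry.FullName}: {text.ReadToEnd()}");
                    }
                }
            }
        }
    }
    ```

    If the archive is never disposed, reading the stream back this way should fail, which makes it a quick sanity check before passing the stream to new Attachment(...).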

    Edit:
    Instead of using a published-event listener as in the example above, a better way to handle it is to register a convention:

    SearchClient.Instance.Conventions.ForInstancesOf<PageWithMultipleDocuments>()
                    .IncludeField(x => GetAttachments(x));
    
    // GetAttachments basically returns new Attachment(() => zipFile)



    #188673 Edited, Feb 28, 2018 22:14
  • Fredrik Schultz
    Member since: 2001
     

    Interesting solution, thanks for sharing!

    #188688 Mar 01, 2018 8:45