Best way to index multiple attachments?


Indexing a single attachment together with a page, by adding a property of type Attachment to the page model as described here, works well:
http://world.episerver.com/documentation/Items/Developers-Guide/EPiServer-Find/9/DotNET-Client-API/File-attachments/

But how should multiple attachments be handled when their number varies from page to page?

(The goal is for a page to also get a search hit from the contents of the files linked in a LinkItemCollection.)

#143331
Jan 20, 2016 13:54

All right.
I know this question is very old (relatively speaking), but I just ran into this problem and could not find a solution while searching. Creating an IEnumerable<Attachment> is out of the picture, so something else had to be done.

The solution I came up with is not the most straightforward one, but it fits my immediate needs.

What I finally settled on was to create an in-memory ZIP archive and add all my documents to it. Once all documents were added to the "virtual" ZIP file, I attached it to the page, pretty much like this:

page.Attachments = new Attachment(() => ZipFileStream);

The reason for creating a ZIP file, rather than simply streaming all the files into a single MemoryStream, is that concatenation only works for plain text files; binary formats such as PDF do not survive being appended to one another in one stream. So I had to go down the route of an in-memory ZIP archive.

Following is a rough implementation of the proposed solution; it is basically what we ended up using:

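using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using EPiServer.Find;
using EPiServer.Find.Framework;

private void ContentEvents_PublishedContent(object sender, EPiServer.ContentEventArgs e)
{
    if (e.Content is PageWithMultipleDocuments page)
    {
        // Example paths; in practice these come from the page's linked documents
        var paths = new List<string> { @"F:\randomplaintext.txt", @"F:\randomplaintext2.txt", @"F:\randompdf2.pdf" };
        var zipFile = GetZipedAttachments(paths);
        if (zipFile == null)
            return; // none of the files exist, so there is nothing to attach

        using (zipFile)
        {
            page.Attachments = new Attachment(() => zipFile);
            SearchClient.Instance.Index(page);
        }
    }
}

private Stream GetZipedAttachments(List<string> paths)
{
    var existing = paths.Where(File.Exists).ToList();
    if (existing.Count == 0)
        return null;

    var outFile = new MemoryStream();

    // leaveOpen: true keeps outFile usable after the archive is disposed.
    // Disposing the archive is what writes the ZIP central directory,
    // so it must happen before the stream is read.
    using (var zipArchive = new ZipArchive(outFile, ZipArchiveMode.Create, true))
    {
        foreach (var path in existing)
        {
            // Use the file name as the entry name instead of the full path
            var entry = zipArchive.CreateEntry(Path.GetFileName(path), CompressionLevel.Fastest);
            using (var entryStream = entry.Open())
            {
                var fileBytes = File.ReadAllBytes(path);
                entryStream.Write(fileBytes, 0, fileBytes.Length);
            }
        }
    }

    outFile.Position = 0;
    return outFile;
}

Note that GetZipedAttachments returns null when none of the files exist, hence the null check in the event handler.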

Edit:
Instead of using a PublishedContent event listener as in the example above, this is a better way to handle it:

SearchClient.Instance.Conventions.ForInstancesOf<PageWithMultipleDocuments>()
    .IncludeField(x => GetAttachments(x));

// GetAttachments basically returns new Attachment(() => zipFile)
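A fuller sketch of this convention-based approach might look like the following. It is a hedged illustration, not the exact code we used: GetDocumentPaths() is a hypothetical method on the page type (for example, resolving the files in the LinkItemCollection to paths), and the class names AttachmentConventionInitialization and PageWithMultipleDocumentsExtensions are made up. The convention is registered once at startup in an initialization module, and the ZIP is built inside the Attachment factory, so each call to the factory produces a fresh stream instead of sharing one that may already have been read or disposed.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using EPiServer.Find;
using EPiServer.Find.ClientConventions;
using EPiServer.Find.Framework;
using EPiServer.Framework;
using EPiServer.Framework.Initialization;

// Hypothetical module name: registers the Find convention once at startup
[InitializableModule]
[ModuleDependency(typeof(EPiServer.Web.InitializationModule))]
public class AttachmentConventionInitialization : IInitializableModule
{
    public void Initialize(InitializationEngine context)
    {
        SearchClient.Instance.Conventions
            .ForInstancesOf<PageWithMultipleDocuments>()
            .IncludeField(x => x.GetAttachments());
    }

    public void Uninitialize(InitializationEngine context) { }
}

public static class PageWithMultipleDocumentsExtensions
{
    public static Attachment GetAttachments(this PageWithMultipleDocuments page)
    {
        // Assumption: GetDocumentPaths() is a hypothetical method returning
        // IEnumerable<string> with the file paths of the page's documents
        var paths = page.GetDocumentPaths().Where(File.Exists).ToList();
        if (paths.Count == 0)
            return null;

        // The ZIP is built inside the factory, so each call gets a new stream
        return new Attachment(() => BuildZip(paths));
    }

    private static Stream BuildZip(IEnumerable<string> paths)
    {
        var buffer = new MemoryStream();
        // leaveOpen: true so disposing the archive (which writes the
        // central directory) does not also close the MemoryStream
        using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, true))
        {
            foreach (var path in paths)
            {
                var entry = archive.CreateEntry(Path.GetFileName(path), CompressionLevel.Fastest);
                using (var entryStream = entry.Open())
                using (var file = File.OpenRead(path))
                {
                    file.CopyTo(entryStream);
                }
            }
        }
        buffer.Position = 0;
        return buffer;
    }
}
```

The extension method form lets IncludeField reference it in its expression; how the page exposes its document paths is up to the page type.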



#188673
Edited, Feb 28, 2018 22:14

Interesting solution, thanks for sharing!

#188688
Mar 01, 2018 8:45