Understanding a big Lucene index by inspecting a portion of it


I was wondering whether I could take a sample from several huge Lucene indexes and inspect it with Lukeall on my machine. I quickly realized that copying such indexes over the network would be time-consuming.

First, I googled for a ready-made solution that would let me copy only a few documents from the whole index into a separate (small) index. That way I could quickly understand the document structure. I came across this blog, which only describes how to back up a Lucene index, whereas my use case is to get just a portion of it. However, it also mentions how to use Lukeall to export an index (or a portion of it) in XML format. That seemed to be a step in the right direction, but there is no way to import the XML back into a Lucene index. That was a stumbling block. There is a defect open in Lukeall for precisely this feature here.

For my purpose, I have created a Scala script that copies the first few documents of an index into an output index. The script is located here for now. So now I can get the first few documents from an index on a remote machine and inspect them locally.
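For readers who just want the idea, here is a minimal sketch of how such a copy could look. It is not the script linked above: the object name `CopyFirstDocs` and its three command-line arguments are made up for illustration, and it assumes a Lucene 5 to 9 style API (`FSDirectory.open(Path)`, `DirectoryReader.open`, `IndexReader.document(int)`) together with a Lucene version that can read the source index. Note two limitations of this approach: only stored fields come back from `document(docId)`, so fields that were indexed but not stored will be missing from the sample, and deleted documents that have not been merged away yet are not skipped.

```scala
import java.nio.file.Paths

import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.index.{DirectoryReader, IndexWriter, IndexWriterConfig}
import org.apache.lucene.store.FSDirectory

// Hypothetical helper (not the script linked in the post).
// Copies the stored fields of the first <count> documents into a new index.
object CopyFirstDocs {
  def main(args: Array[String]): Unit = {
    if (args.length != 3) {
      System.err.println("usage: CopyFirstDocs <sourceIndexDir> <targetIndexDir> <count>")
      sys.exit(1)
    }
    val limit = args(2).toInt

    val srcDir = FSDirectory.open(Paths.get(args(0)))
    val dstDir = FSDirectory.open(Paths.get(args(1)))
    try {
      val reader = DirectoryReader.open(srcDir)
      try {
        // The analyzer is an assumption; it only affects how the copied
        // text fields are re-tokenized in the sample index.
        val writer = new IndexWriter(dstDir, new IndexWriterConfig(new StandardAnalyzer()))
        try {
          // Document ids run from 0 to maxDoc - 1, so never read past maxDoc.
          val n = math.min(limit, reader.maxDoc())
          for (docId <- 0 until n) {
            // document(docId) returns stored fields only.
            writer.addDocument(reader.document(docId))
          }
          println(s"Copied $n documents")
        } finally writer.close()
      } finally reader.close()
    } finally {
      srcDir.close()
      dstDir.close()
    }
  }
}
```

Run on the remote machine with the Lucene jars on the classpath, this leaves a small index in the target directory, which is cheap to copy over the network and open in Lukeall. Since the goal is only to understand the document structure, losing the non-stored fields seemed like an acceptable trade-off for a sketch like this.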

Are there any better ways?