Context: my father is a lawyer, so he has a bajillion PDF files that were digitised and stored on a server. I already have an idea of how to run OCR on all of them.
But after that, how can I make them easily searchable? (Keep in mind that, unfortunately, the directory structure is important information for classifying the files, i.e. you may have a path like clientABC/caseAV1/d.pdf.)
Maybe take a look at paperless-ngx; it will take care of the OCR for you and make everything searchable. I'm just not sure whether it will show the original path correctly.
Just want to search them for words? Try pdfgrep with its recursive flag, which is very easy to set up and try. If you feel like that's taking too long, you'll probably need to accept some simplifications or helper structures (e.g. a prebuilt index).
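A minimal sketch of that kind of recursive search, wrapped in Python purely for convenience (the archive root and the search term are placeholders; it assumes pdfgrep is installed and that the PDFs already carry an OCR text layer):

```python
import subprocess
from pathlib import Path

# Hypothetical archive root; point it at the real server mount.
ARCHIVE_ROOT = Path("/srv/scans")

def search_pdfs(term: str) -> str:
    """Recursively search every PDF under ARCHIVE_ROOT for `term`.

    -r recurses into subdirectories, -i matches case-insensitively,
    -n prints page numbers, and -H prefixes each hit with the file path
    (so the clientABC/caseAV1 part of the path stays visible).
    """
    result = subprocess.run(
        ["pdfgrep", "-r", "-i", "-n", "-H", term, str(ARCHIVE_ROOT)],
        capture_output=True,
        text=True,
    )
    return result.stdout

if __name__ == "__main__":
    print(search_pdfs("settlement"))
```

Note that pdfgrep only matches the embedded text layer, so this only finds anything after your OCR pass has run.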
If you want the search to be more flexible, e.g. handling stemming (so that pluralised and other word forms still match), you might want to put the extracted text into an Elasticsearch database.
You might run into problems with field length if these are long documents. A possible solution would be to put each page into its own field inside the document (one way to do that is sketched below).
If this is for a non-technical user to search, the Kibana interface should be relatively easy for anyone to use.
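A minimal sketch of that Elasticsearch approach, with a few assumptions: pypdf is used for text extraction, the host URL, index name, and archive path are placeholders, and each page is indexed as its own document rather than its own field, which achieves the same goal of keeping any single field short:

```python
from pathlib import Path

from elasticsearch import Elasticsearch  # pip install elasticsearch
from pypdf import PdfReader              # pip install pypdf

# Placeholder connection details and paths; adjust for your setup.
es = Elasticsearch("http://localhost:9200")
ARCHIVE_ROOT = Path("/srv/scans")
INDEX = "legal-docs"

# The "english" analyzer applies stemming, so "contracts" matches "contract".
es.indices.create(
    index=INDEX,
    mappings={
        "properties": {
            "path": {"type": "keyword"},   # e.g. clientABC/caseAV1/d.pdf
            "page": {"type": "integer"},
            "text": {"type": "text", "analyzer": "english"},
        }
    },
)

# One Elasticsearch document per PDF page instead of one huge field,
# which sidesteps length limits and gives page-level hits.
for pdf_path in ARCHIVE_ROOT.rglob("*.pdf"):
    reader = PdfReader(pdf_path)
    rel_path = str(pdf_path.relative_to(ARCHIVE_ROOT))
    for page_no, page in enumerate(reader.pages, start=1):
        es.index(
            index=INDEX,
            document={
                "path": rel_path,
                "page": page_no,
                "text": page.extract_text() or "",
            },
        )

# Example query: stemmed full-text match, with the path returned for context.
hits = es.search(index=INDEX, query={"match": {"text": "settlements"}})
for hit in hits["hits"]["hits"]:
    print(hit["_source"]["path"], "page", hit["_source"]["page"])
```

The path field keeps the clientABC/caseAV1 hierarchy attached to every hit, and Kibana (or any small search UI) can query the same index.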
Would Papra work for you? Personally I like it better than Paperless-NGX, which others have mentioned. But I'll admit I'm not sure it will fit your use case, since I'm feeding it newly scanned documents rather than an existing file/folder hierarchy.
It might be a little heavy-handed for your needs, but I've found paperless-ngx to be amazing.