Technology/Tool to capture metadata across multiple repositories with millions of documents


Minu Mittal
 

Looking for a metadata technology/tool that automates extracting metadata from document repositories, to help index documents and fill in the tags/fields with at least 85% accuracy. I have heard that such tools are widely used in the Consumer Goods and Healthcare industries. All thoughts/proposals are welcome.


Tammy Bearden
 

Hello Minu. 

Microsoft Project Cortex will help with that, but it is currently in preview, with a few months to go yet.

We took a serious look at SmartLogic, BA Insight, and Synaptica last year. They are listed here in order of my preference, based on their demos.

I've opted to use Microsoft Power Automate (formerly known as Flow) to do some of this auto-classification, and I'm eager to see what we'll be able to do with Project Cortex later this year.

Good luck,

Tammy

On Thursday, April 30, 2020, 04:55:24 PM CDT, Minu Mittal <minu.mittal@...> wrote:


Looking for metadata Technology/Tool to automate picking metadata from the document repositories, to help index and fill up the tags/fields with at least 85% accuracy. Have heard a great deal of use of such tools in Consumer Goods and Healthcare Industries. Welcome all thoughts/proposals. 


Daan Boom
 

Dear Mittal:

The Association for Intelligent Information Management (AIIM) just organized a webinar on this subject, including a brief report on data capture methods and tools. I was part of a team that did a major document repository conversion project with satisfactory results (not 100%). Metadata is also pulled automatically for newly added documents because, in our experience, document creators either don’t do it or do it very badly. You may check the website for details: AIIM.org

Daan



David Eddy
 

Minu -

Three questions:

1/ - What do you consider to be (or not to be) a "document"?

2/ - What is the technical environment (or environments)?

3/ - How big (in employees) is the organization, and how old is it?

- David


 

Hi Minu,

Together with René Zäch, we have created a tool that uses big-data semantic search for semi-automatic tagging. The technology enables us to tag large numbers of Office documents, mostly legacy documents, in a short time.

We stopped looking for a technology that does it all on its own and switched to a concept where the technology does 95% of the work and a subject-matter expert (SME) does the rest. It works great.
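
One way to picture this split between tool and SME is a confidence threshold: suggestions the tool is confident about are applied automatically, and the rest go to a review queue. This is only a minimal sketch under assumptions. The `Suggestion` record, the `triage` function, and the 0.9 cut-off are hypothetical illustrations, not how the Advanced Search Tool or any other product mentioned in this thread actually works.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    doc_id: str
    tag: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical tagger


def triage(suggestions, auto_accept=0.9):
    """Split suggestions into auto-accepted ones and ones needing SME review.

    The 0.9 threshold is an assumed starting point; in practice it would be
    tuned until the auto-accepted share meets the accuracy target.
    """
    accepted, review = [], []
    for s in suggestions:
        if s.confidence >= auto_accept:
            accepted.append(s)
        else:
            review.append(s)
    return accepted, review


if __name__ == "__main__":
    batch = [
        Suggestion("doc-1", "invoice", 0.97),
        Suggestion("doc-2", "contract", 0.55),
        Suggestion("doc-3", "invoice", 0.91),
    ]
    accepted, review = triage(batch)
    print(len(accepted), "auto-tagged;", len(review), "sent to an SME")
```

Raising the threshold sends more documents to the SME and lowers the risk of wrong tags; lowering it does the opposite.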

Happy to share more info. We call this tool «Advanced Search Tool». More at the bottom of https://www.aht.ch

Regards,
Pavel
 



Tom Reamy
 

Minu,

 

My company has done quite a few text analytics software evaluation projects, helping companies select the best fit for them (and I’ve written a book on the field, Deep Text).  That experience leads me to a few major points.

 

First, there is no such thing as the best text analytics software.  We partner with a number of leading vendors, and even they will admit that.  What is best for your organization depends on a variety of factors: the nature of your content (size, speed, variability, etc.), the nature of your project’s metadata (data like people and organizations, or subject “aboutness,” which is much harder but often more important), your semantic resources (taxonomies, ontologies, metadata design, etc.), your publishing process (where, when, and how to apply the metadata), your internal resources (IT, SMEs, etc.), and a number of other factors, including price.

 

Second, the software does not generate the metadata.  If anyone tells you that their software can just be pointed at a repository and will automatically tag your content, your best response is to run from the room.  All of these tools require a significant development process, especially to achieve 85%+ accuracy.  In other words, they require the development of rules.  These are either machine learning models (relatively quick to develop but unlikely to achieve your target accuracy) or linguistic-based rules, which typically take more effort (although not as much as ML vendors claim) but which, in my experience, are the only way to achieve 85% accuracy.  We have been getting 95% accuracy in the last couple of projects we’ve done.
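
To make "linguistic-based rules" a bit more concrete, here is a minimal sketch, assuming a rule is simply a set of phrase patterns per tag plus a minimum number of matches. The tags, patterns, and `min_hits` value are hypothetical examples, not rules from any vendor or project mentioned in this thread. Real rule sets are far richer (proximity, negation, document sections, taxonomy terms), which is where the development effort goes.

```python
import re

# Hypothetical rules: each tag maps to regex patterns that signal it.
RULES = {
    "contract": [r"\bagreement\b", r"\bindemnif\w+", r"\bgoverning law\b"],
    "invoice": [r"\binvoice\b", r"\bamount due\b", r"\bpayment terms\b"],
    "clinical": [r"\bpatient\b", r"\bdosage\b", r"\badverse event\b"],
}


def tag_document(text, rules=RULES, min_hits=2):
    """Return the tags whose patterns match at least `min_hits` times in total."""
    lowered = text.lower()
    tags = []
    for tag, patterns in rules.items():
        hits = sum(len(re.findall(pattern, lowered)) for pattern in patterns)
        if hits >= min_hits:
            tags.append(tag)
    return sorted(tags)


if __name__ == "__main__":
    sample = "Invoice 42: amount due in 30 days per payment terms."
    print(tag_document(sample))
```

Requiring more than one match per tag is one simple way to trade recall for precision; a single stray word such as "agreement" is then not enough to tag a document as a contract.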

 

Lastly, I’d add a few vendors to Tammy’s list (all good), in no order of preference: SAS, Expert System, MeaningCloud, Microsoft, Amazon, and others depending on your particulars. The number of vendors is quite large, and they range from very cheap but mostly useless to overpriced. While broad studies like the AIIM one are a good place to start, the real question is how those generic considerations apply to your organization.  Also, in my experience, the last step should always be one or more proofs of concept (POCs).
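
For a POC, the 85% target only means something if it is measured against a hand-tagged sample of your own documents. The sketch below assumes one possible definition of accuracy: a document counts as correct only when the tool's tag set exactly matches the tags an SME assigned. That definition, the function names, and the sample data are assumptions for illustration; a looser per-tag precision/recall measure may suit some projects better.

```python
def accuracy(predicted, gold):
    """Share of gold documents whose predicted tag set exactly matches the SME tags.

    `predicted` and `gold` map document IDs to collections of tags.
    Documents missing from `predicted` are treated as having no tags.
    """
    if not gold:
        raise ValueError("gold sample is empty")
    correct = sum(
        1
        for doc_id, tags in gold.items()
        if set(predicted.get(doc_id, ())) == set(tags)
    )
    return correct / len(gold)


def meets_target(predicted, gold, target=0.85):
    """True when the measured accuracy reaches the target (85% by default)."""
    return accuracy(predicted, gold) >= target


if __name__ == "__main__":
    # Hypothetical hand-tagged sample and tool output.
    gold = {"a": ["invoice"], "b": ["contract"], "c": [], "d": ["clinical", "invoice"]}
    tool = {"a": ["invoice"], "b": ["invoice"], "c": [], "d": ["invoice", "clinical"]}
    print(f"accuracy: {accuracy(tool, gold):.0%}, meets target: {meets_target(tool, gold)}")
```

Running the same sample through each shortlisted tool gives a like-for-like comparison before committing to one.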

 

I hope this helps, and I’d be happy to have a further, more in-depth conversation. The best way to reach me is tomr@...

 

 

Tom Reamy

Chief Knowledge Architect

Author: Deep Text

KAPS Group, LLC

www.kapsgroup.com

510-922-9554 (O)

510-333-2458 (M)

 

 

 



Atsu Sename
 

Thanks Tom,

I am very interested in what you are doing.
Is there a possibility of partnership with KAPS Group?

Regards.

Sociologist and Knowledge Management Specialist

Head of Training at the Agence Nationale du Volontariat au Togo (ANVT)

Email: senameatsu@...
Cel   :(228)90925617 
Skype : nardy.sename1
Lomé - Togo

Knowledge is the heart of sustainable development





On Fri, May 1, 2020 at 4:02 PM +0000, "Tom Reamy" <kapsgro@...> wrote:
