KM for Data Science #data-science #knowledge-graph


Abhi Karale
 

What tools do data science teams typically use to manage knowledge? The general problem is that data scientists struggle to find out whether any previous work has been done on a particular problem, which leads to rewriting code and rediscovering insights, both of which waste time and resources.


Airbnb had a knowledge problem and wrote an article about it in 2016 (https://medium.com/airbnb-engineering/scaling-knowledge-at-airbnb-875d73eff091) and even created an open-source tool called Knowledge Repo (which didn't really take off, probably because it wasn't rich enough). 


There seems to be a need for more intelligent knowledge management platforms for data science teams that, for example, allow one to upload URLs, datasets, code, etc. in one place and can analyze package use, comments, etc. 
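To make "analyze package use" concrete, here is a minimal sketch of one way such a platform might index which packages a set of uploaded Python scripts import. It relies only on the standard-library `ast` module; the function names (`imported_packages`, `package_usage`) and the example script names and contents are hypothetical and exist purely for illustration. They are not taken from Knowledge Repo or any other existing tool.

```python
import ast
from collections import Counter


def imported_packages(source):
    """Return the set of top-level package names imported by one Python source string."""
    packages = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import numpy.linalg as la" counts as a use of "numpy"
            packages.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from sklearn.cluster import KMeans" counts as "sklearn";
            # relative imports (level > 0) are skipped as project-internal
            packages.add(node.module.split(".")[0])
    return packages


def package_usage(scripts):
    """Count, for each package, how many of the given scripts import it."""
    usage = Counter()
    for source in scripts.values():
        usage.update(imported_packages(source))
    return usage


# Hypothetical inputs standing in for uploaded analysis scripts.
example_scripts = {
    "churn_model.py": "import pandas as pd\nfrom sklearn.cluster import KMeans\n",
    "pricing_eda.py": "import pandas as pd\nimport numpy.linalg as la\n",
}
usage = package_usage(example_scripts)
```

A count like this could feed a simple search index ("who else has used this package on a similar problem?"), though a real platform would also need to handle notebooks, other languages, and scripts that fail to parse.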


I was wondering if anyone was aware of current tools that are used for this purpose. 


Douglas Weidner
 

Here's a good question, which is a technique that is part of one of KM's other domains - innovation: "Ask Why five times."

If there were a primary data science function, would it be gathering/manipulating data, analyzing information to uncover insights (aka Knowledge), or something else?

If there is a primary DS function, that is where I would focus my initial KM efforts. 

Douglas Weidner
Chief CKM Instructor


Murray Jennex
 

Good points, Doug. I tell my students and colleagues that KM is actually the leading discipline, that Knowledge Systems (KS) has replaced IS as the core discipline, and that all other disciplines are really subsets of KM/KS. I look at data scientists in the same light. What matters is not how data scientists manage knowledge but how KM/KS uses data scientists to help discover knowledge. Managing the resulting knowledge is then done through KM/KS.

Murray Jennex




Richard Vines <plessons@...>
 

Whilst I agree with you Murray (and I do), for me, this matter raises the prospect of what might be a significant fear going forward. 

In the trenches, perhaps as a result of mass-scale self-interest, the way large-scale vendors are engaged across agencies through contracts and licensing arrangements, by default rather than by design, does not really create a level playing field where knowledge discourse can result in respectful engagement between subject matter experts, users and citizens, software engineers, and those with knowledge-brokering interests who could mediate transdisciplinary approaches to informatics (minor, perhaps emerging, examples here and here in the social sciences). Of course, this is a generalised statement and indicative of my own fears.

... fears that contextual knowledge itself is incrementally being marginalised by belief systems in quasi utopian forms of machine learning, that are designed not to augment human interpretative intelligence but to replace it. Are we seeing early signs of the emergence of a profound knowledge-based crisis? [DEADLY BOEING CRASHES RAISE QUESTIONS ABOUT AIRPLANE AUTOMATION].

In such a crisis, perhaps IS / KS or whatever (it does not matter which) are being subjugated to the functions of marketing and the historical forces related to communications / public relations, which now go back 100 years.

So your views of KM may clearly have a role, but perhaps these too, in a very different way, might have slightly utopian objectives in this real world of change and adaption? 



Richard






Murray Jennex
 

My views probably have a utopian point to them, but I state them the way I do to evoke reaction and discussion. I teach in an MIS department and I tell them IS is dead and KS is king, not because I totally believe it, but to get them to realize the world is changing.

It is also why I said earlier that my greatest fear was the push for political correctness overriding the quest for knowledge. Your discussion is sort of what I really had in mind. To go further, I think the mantra of climate change is shaping what we consider knowledge. Climate people tell us nuclear energy is dead, yet the US produced more electricity from nuclear power in 2018 than it ever has, and China is building 8 plants while India is building 6. Also, there is now concern that, by shutting down nuclear plants, countries will not be able to meet their carbon goals. I see this as one of many cases where political correctness has pushed away knowledge and experience, ultimately making what I fear is a real issue much worse.

Murray Jennex



Douglas Weidner
 

Sounds good to me!

Douglas


Douglas Weidner
 

I also agree, but recognize 'contractors/vendors' can have both good and bad influences.

One might be able to prove that KM was largely initiated by repository vendors in the mid-1990s.

It is our fault (KMers) that we let that single KM domain dominate much of KM for many years, and even today in many accounts.

But few have a solid enough comprehension of KM to stand up against such parochial interests.
And you can't blame a KM repository vendor for claiming KM is a repository.

Douglas Weidner
Chief CKM Instructor


Guillermo A. Galdamez
 

Hi,

I am not familiar with any specific tools, but a while back, one of my colleagues wrote a brief article on applying semantic graphs to support data science efforts:

I know this doesn't directly address your question, but I hope you find it useful.

Best,

Guillermo


Frank Guerino
 

Hi Abhinav,

 

Most responses you’ll get (as you can see from those that have come in) will focus on general KM frameworks and practices, so I’ll focus on something a little different: IT frameworks and tools that are known for successfully fostering better KM in software-related areas. And yes, Data Science is a software domain, because tools like Excel do only so much.

 

While I don’t know enough about your situation, keep in mind that one of the big problems with Data Science (a.k.a. Informatics) is that it is often allowed to occur outside of IT organizations (i.e. in the Business), which often does not know about or follow general IT best practices, which include both explicit and implicit KM best practices.  The goal will be to move Data Science/Informatics professionals towards operating far more like software development teams in IT.

 

As a result of the above, the first thing I’d point you to are common System/Software Development Life Cycles (SDLCs).  There are two types:

 

  • Traditional Waterfall (used with big legacy applications and with high risk applications)
  • Agile (used with smaller, more modern, and low risk applications). (Most Data Science/Informatics fits here.)

 

Setting SDLC implementation and governance practices as your internal policies and standards, and forcing Data Science/Informatics professionals to follow them just like all software developers, will cause your internal KM maturity to spike. Examples include registering solutions in centralized repositories and documenting requirements, plans, design specs, design reviews, testing specs, and deployment and support docs (and storing everything in centralized solutions that make cataloging, sharing and searching easy).

 

The second thing I’d point you to are actual SDLC related “tools/technologies” that facilitate better KM practices…

 

  • The Atlassian Technology Suite is used by most modern development teams…
    • Confluence: Built in centralized Wiki with very powerful documentation, persistence, linking, sharing and search features.
    • BitBucket (Coupled with FishEye for search): Source code control (built on Git) for historical registration, tracking, auditing, and reuse of Software Code
    • Jira: Issue/Defect/Task registration, assignment, sharing, tracking.
  • Artifactory: A centralized staging repository for deployable and reusable software artifacts (binaries, libraries, etc.) that espouses reuse and higher deployment, testing, and operational efficiencies.
  • HipChat: An integrated Instant Messaging platform (if you don’t already have a corporate alternative)
  • Bamboo: A continuous integration and continuous delivery (CI/CD) automation tool that is very much like Jenkins if you don’t already use it or something like it.

 

Establishing “consistent” policies and standards that require your Data Science/Informatics professionals to use the same tools for the same things at the same points in the SDLC will also cause your KM maturity to spike.

 

If your Data Science/Informatics professionals are, in fact, outside of IT, the challenge will be to force them to act like IT.  Just giving them the same tools that IT professionals use will yield “some” organic growth but not enough.  Forcing them to define and establish SDLC policies and standards will have to come from the top/leadership.

 

I hope this helps.

 

My Best,

 

Frank

--

Frank Guerino, Principal Managing Partner

The International Foundation for Information Technology (IF4IT)
http://www.if4it.com
1.908.294.5191 (M)

Guerino1_Skype (S)

 

 


Stan Garfield