Re: KM for Data Science #data-science #knowledge-graph
Most responses you’ll get (as you can see from those that have come in) will focus on general KM frameworks and practices, so I’ll focus on something a little different: IT frameworks and tools that are known for successfully fostering better KM in software-related areas. And yes, Data Science is a software domain, because tools like Excel only go so far.
While I don’t know enough about your situation, keep in mind that one of the big problems with Data Science (a.k.a. Informatics) is that it is often allowed to occur outside of IT organizations (i.e., in the Business), which often does not know about or follow general IT best practices, including both explicit and implicit KM best practices. The goal should be to move Data Science/Informatics professionals toward operating far more like software development teams in IT.
As a result of the above, the first thing I’d point you to are common System/Software Development Life Cycles (SDLCs). There are two types:
- Traditional Waterfall (used for big legacy applications and high-risk applications)
- Agile (used for smaller, more modern, lower-risk applications). Most Data Science/Informatics work fits here.
Setting SDLC implementation and governance practices as your internal policies and standards, and requiring Data Science/Informatics professionals to follow them just like all software developers, will cause your internal KM maturity to spike. For example: registering solutions in centralized repositories; documenting requirements, plans, design specs, design reviews, testing specs, and deployment & support docs; and storing everything in centralized solutions that make cataloging, sharing, and searching easy.
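To make “registering solutions in centralized repositories” concrete, here is a minimal sketch of what such a registry could look like. All names here (the classes, fields, and example URLs) are hypothetical illustrations, not a reference to any specific product:

```python
# Minimal sketch of a centralized solution registry: each Data Science
# solution is registered once, with metadata that makes it searchable.
# All names and URLs here are hypothetical, not a real product API.
from dataclasses import dataclass, field


@dataclass
class SolutionRecord:
    name: str
    owner: str
    repo_url: str                        # where the code lives (e.g., Bitbucket)
    docs_url: str                        # where the specs/design docs live (e.g., Confluence)
    tags: list = field(default_factory=list)


class SolutionRegistry:
    def __init__(self):
        self._records = []

    def register(self, record: SolutionRecord) -> None:
        self._records.append(record)

    def search(self, keyword: str) -> list:
        """Return records whose name or tags contain the keyword."""
        kw = keyword.lower()
        return [r for r in self._records
                if kw in r.name.lower() or any(kw in t.lower() for t in r.tags)]


registry = SolutionRegistry()
registry.register(SolutionRecord(
    name="churn-model",
    owner="data-science",
    repo_url="https://bitbucket.example.com/ds/churn-model",
    docs_url="https://confluence.example.com/ds/churn-model",
    tags=["churn", "classification"],
))
matches = registry.search("churn")
```

Even a registry this simple answers the “has anyone worked on this before?” question, which is the heart of the KM problem being discussed.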
The second thing I’d point you to are actual SDLC related “tools/technologies” that facilitate better KM practices…
- The Atlassian Technology Suite is used by many modern development teams…
- Confluence: Built in centralized Wiki with very powerful documentation, persistence, linking, sharing and search features.
- Bitbucket (coupled with Fisheye for search): Source code control (built on Git) for historical registration, tracking, auditing, and reuse of software code.
- Jira: Issue/Defect/Task registration, assignment, sharing, tracking.
- Artifactory: A centralized staging repository for deployable and reusable software artifacts (binaries, libraries, etc.) that promotes reuse and higher deployment, testing, and operational efficiencies.
- HipChat: An integrated instant-messaging platform (if you don’t already have a corporate alternative). Note that Atlassian has since discontinued HipChat in favor of Slack, so a current alternative may be needed.
- Bamboo: A continuous integration and continuous delivery (CI/CD) automation tool, very much like Jenkins, if you don’t already use Jenkins or something like it.
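As one concrete example of wiring these tools into the SDLC, tasks and defects can be registered in Jira programmatically through its REST API (POST to `/rest/api/2/issue`). This is a hedged sketch: the base URL, project key, and credentials are placeholders, and the actual network call is left commented out:

```python
# Sketch: registering a task in Jira via its REST create-issue endpoint.
# The base URL, project key, and credentials below are hypothetical placeholders.


def build_issue_payload(project_key: str, summary: str, description: str,
                        issue_type: str = "Task") -> dict:
    """Build the JSON body Jira's create-issue endpoint expects."""
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": summary,
            "description": description,
            "issuetype": {"name": issue_type},
        }
    }


payload = build_issue_payload(
    "DS", "Document churn-model design", "Link the design spec from Confluence.")

# Actual call (requires the `requests` package and real credentials):
# import requests
# resp = requests.post("https://jira.example.com/rest/api/2/issue",
#                      json=payload, auth=("user", "api-token"))
```

The point is that registration can be automated at fixed points in the SDLC rather than left to individual discipline.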
Establishing “consistent” policies and standards that require your Data Science/Informatics professionals to use the same tools for the same things at the same points in the SDLC will also cause your KM maturity to spike.
If your Data Science/Informatics professionals are, in fact, outside of IT, the challenge will be to get them to act like IT. Just giving them the same tools that IT professionals use will yield “some” organic growth, but not enough. Requiring them to define and follow SDLC policies and standards will have to come from the top/leadership.
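On the original question’s point about platforms that can “analyze package use”: even without a dedicated product, a small amount of tooling goes a long way. Here is a hedged sketch, using only the Python standard library, that extracts the packages a script imports — the kind of metadata a central catalog could index so prior work becomes findable:

```python
# Sketch: extract the set of top-level packages a Python source file imports,
# using only the standard library's ast module. A central catalog could index
# this per registered solution to make "who already used this library?" searchable.
import ast


def imported_packages(source: str) -> set:
    """Return top-level package names imported by the given source code."""
    tree = ast.parse(source)
    packages = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                packages.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 excludes relative imports within the project itself
            packages.add(node.module.split(".")[0])
    return packages


example = (
    "import pandas as pd\n"
    "from sklearn.linear_model import LogisticRegression\n"
)
packages = imported_packages(example)
```

Hooked into source control (e.g., run against each Bitbucket repository), this kind of scan turns code into searchable knowledge with very little effort.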
I hope this helps.
Frank Guerino, Principal Managing Partner
The International Foundation for Information Technology (IF4IT)
From: SIKM Leaders on behalf of SIKM Leaders
Reply-To: SIKM Leaders
Date: Tuesday, May 14, 2019 at 10:57 AM
To: SIKM Leaders
Subject: [sikmleaders] KM for Data Science
What tools do data science teams typically use to manage knowledge? The general problem is that data scientists struggle to find out whether any previous work has been done on a particular problem, leading to re-writing code and re-discovering insights — both of which waste time and resources.
Airbnb had a knowledge problem and wrote an article about it in 2016 (https://medium.com/airbnb-engineering/scaling-knowledge-at-airbnb-875d73eff091) and even created an open-source tool called Knowledge Repo (which didn't really take off, probably because it wasn't rich enough).
There seems to be a need for more intelligent knowledge management platforms for data science teams that, for example, allow one to upload URLs, datasets, code, etc. in one place and can analyze package use, comments, etc.
I was wondering if anyone was aware of current tools that are used for this purpose.