Teaching data curation / knowledge management for data scientists #data-science #curation #learning


 

Hi everyone …

 

I’ve been developing a syllabus for a 12-month taught master’s degree in data science. The course aims to train complete data scientists, who will learn how research works by applying analytical tools to appropriate case-studies.

 

The links to the first four lessons are below (out of an eventual 20 plus 4 electives). They cover a solid grounding in data lifecycle management and I’d appreciate your thoughts and feedback on the work to date.

 

  • Lesson 1: Introduction to data as a science (view)
  • Lesson 2: Research and experiments with data (view)
  • Lesson 3: Probability, randomness, and the risk of de-anonymization (view)
  • Lesson 4: Sampling, data distribution, and secure data custody (view)

 

Each lesson is structured around, and defined by, the following four topics:

 

  • Ethics: determine the social and behavioural challenges posed by a research question;
  • Curation: establish the research requirements for data collection and management;
  • Analysis: investigate, explore and analyse research data;
  • Presentation: prepare and present the results of analysis to promote a response.

 

The initial pedagogy and lesson outcomes were funded by a small grant from the Gates Foundation, and I’ve been developing the course slowly ever since as different clients fund its extension.

 

As background, Data as a Science is based on the Sloyd model of technical training. Each lesson starts with a research question and progresses by teaching a complete and practical set of skills. Case-studies and tutorials are drawn from biomedical science and public health, and the course is accessible to anyone with an interest in data. If you’re interested in all the lesson outcomes, you can see them in the GitHub repository for the course: https://github.com/whythawk/data-as-a-science/issues

 

Thanks, and I hope you find it interesting.

 

Regards

 

Gavin

 

 

>--------------------<

Gavin Chait is a data scientist and development economist at Whythawk.

uk.linkedin.com/in/gavinchait | twitter.com/GavinChait | gavinchait.com

 


Frank Guerino
 

Hi Gavin,


Congratulations on the grant and your progress.  I think this is great and very useful.  I’ll definitely look into it.

 

My Best,


Frank

--

Frank Guerino, Principal Managing Partner

The International Foundation for Information Technology (IF4IT)
http://www.if4it.com
1.908.294.5191 (M)

Guerino1_Skype (S)

 

 

From: <main@SIKM.groups.io> on behalf of Gavin Chait <whythawk@...>
Reply-To: <main@SIKM.groups.io>
Date: Thursday, November 12, 2020 at 9:37 AM
To: <main@SIKM.groups.io>
Subject: [SIKM] Teaching data curation / knowledge management for data scientists

 
