Re: Data-wrangling as a service #metadata #data-science
Hi Gavin,
Thanks for putting your solution out there. I'd be curious to hear how the features of your product compare to something like OpenRefine.
I helped a client with something a bit similar a couple of years back. We were aggregating and cleaning metadata from a range of highly diverse article catalogues as part of a semantic auto-classifier engine. It was only a proof of concept, so it was much less user-friendly: the wrangling was done through custom JSON configuration files and XPath selectors. With my project, there was always the intent to hand over the ongoing repository management responsibilities to a local staffer, but there just wasn't the expertise or commitment within the organisation to do so.
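For a sense of what that involved, here's a minimal sketch of the kind of per-catalogue mapping we maintained (the field names, namespace, and selectors below are invented for illustration, not the actual client configuration):

    # Illustrative sketch only -- the field names, namespace, and XPath
    # selectors are invented, not the actual client configuration.
    from lxml import etree

    # Each source catalogue got its own mapping of target schema fields
    # to XPath selectors, maintained as a JSON configuration file.
    MAPPING = {
        "title": "//dc:title/text()",
        "author": "//dc:creator/text()",
        "published": "//dc:date/text()",
    }
    NAMESPACES = {"dc": "http://purl.org/dc/elements/1.1/"}

    def extract_record(xml_path):
        """Pull one normalised metadata record out of a source catalogue file."""
        tree = etree.parse(xml_path)
        return {
            field: tree.xpath(selector, namespaces=NAMESPACES)
            for field, selector in MAPPING.items()
        }

Multiply that by every catalogue, each with its own quirks, and you can see why it needed someone with real data analysis experience to maintain.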
To be honest, my feeling is that the vast majority of people lack the data analysis experience to effectively configure any wrangling tool, whether it has a "no coding" interface or not.
The tool does seem like a nice productivity booster for a data analyst/wrangler, though, and you could hand off the data upload and validation process to an administrative person once the mappings were in place. To commercialise what you've got, I'd pitch it as more of a consultancy/maintenance engagement with SaaS provisioning of instances, rather than a pure SaaS solution.
Happy to take the discussion offline if you'd like to talk further.
Cheers,
Stephen.
PS. Curious to hear about your past Australian project since that's where I'm based!
====================================
Stephen Bounds
Executive, Information Management
Cordelta
E: stephen.bounds@...
M: 0401 829 096
====================================
Hi all, got a question and something to show …
One of the great challenges I face when implementing open data programs is the work involved in preparing data for public release. This often becomes a blocker when persuading data owners to commit to their own projects, and it results in less data being released.
Every quarter since 2016, I have collated about 300 different datasets from local authorities across the UK as part of my open data project, https://sqwyre.com, a service mapping business history in every commercial property across the UK. This is not scraping. It is tedious and repetitive data wrangling: converting multiple files, in multiple formats, into a single schema to permit analysis, comparison and further enrichment.
I have built a collection of tools that offers a simple, collaborative drag 'n drop interface to support our data wranglers. It creates a JSON file that allows the schema to be validated against the https://frictionlessdata.io/ standard, supports collaboration on wrangling workflows, and lets the methodology be repeated and tracked. My objectives are wrangling simplicity, complete audit transparency, and speed.
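As a rough sketch of what that JSON output enables (the field names below are invented for illustration, not Sqwyre's actual schema), a Frictionless Table Schema can be checked with the frictionless Python library like so:

    # Rough sketch only -- field names are invented for illustration, not
    # the actual Sqwyre schema. Requires `pip install frictionless`.
    import json
    from frictionless import validate

    # A minimal Frictionless Table Schema for one wrangled output table.
    schema = {
        "fields": [
            {"name": "property_ref", "type": "string"},
            {"name": "local_authority", "type": "string"},
            {"name": "rateable_value", "type": "number"},
            {"name": "occupation_date", "type": "date"},
        ],
        "missingValues": [""],
    }
    with open("schema.json", "w") as f:
        json.dump(schema, f)

    # Validate a wrangled CSV against the schema; the report lists any
    # cells that fail type or constraint checks.
    report = validate("wrangled_output.csv", schema="schema.json")
    print(report.valid)

Once the schema is captured in a file like this, validating next quarter's submissions is automatic rather than a fresh manual exercise.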
Here’s a brief video overview showing what that looks like in action: https://www.youtube.com/watch?v=HQw8IBLUnL4
I would like to extract this from my application and create a stand-alone SaaS drag 'n drop data wrangling tool, useful for data owners and managers, journalists, and researchers who want to rapidly, and continuously, normalise and validate any messy spreadsheet through a simple series of steps, without any coding knowledge.
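To give a flavour of what I mean by "a simple series of steps" (the action names here are purely illustrative, not the tool's actual vocabulary):

    # Purely illustrative -- these action names are invented for this
    # example, not the tool's actual vocabulary. Each step is declarative,
    # so a sequence can be saved, audited, and replayed on next quarter's
    # files without writing any code.
    steps = [
        {"action": "rename", "from": "Prop Ref", "to": "property_ref"},
        {"action": "delete_rows", "where": "property_ref is missing"},
        {"action": "convert", "column": "rateable_value", "type": "number"},
        {"action": "order", "columns": ["property_ref", "local_authority",
                                        "rateable_value", "occupation_date"]},
    ]

The drag 'n drop interface simply builds a sequence like this behind the scenes, which is what gives you the audit trail and repeatability.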
Is this something that might be of interest to the SIKM community? What do you currently use, or recommend, for simple, easy-to-use data wrangling? And if you are interested in my system, would anyone like to explore a potential collaboration?
Thanks
Gavin
>--------------------<
Gavin Chait is a data engineer and development economist at Whythawk.
uk.linkedin.com/in/gavinchait | twitter.com/GavinChait | gavinchait.com