Semantic Web + Big Data = ? #data-science #semantic


Matt Moore <innotecture@...>
 

If the conversation can remain civil, I want to push this issue further. Here are some thoughts.

1. Semantic tech has evolved a great deal over the last decade. I don't want to give the impression that I think this stuff has just appeared in the last year. However, it's not top of mind as a distinct category for many organisations. For example, analytics & BI, mobility and cloud are the top 3 CIO priorities for 2013 according to Gartner. And in Australia, I don't encounter a lot of people who are conversant in it - even among the tech community. It's just not mainstream. My impression is that it's more mainstream in the US and the UK. But unlike Big Data, mobile devices, social media or cloud, it hasn't had its day in the sun. We haven't hit an inflection point yet here.

2. Within enterprises, we're starting to see things move from two different directions. The first is ECM implementations, esp. SharePoint. Most Australian organisations have, or are developing, some kind of ECM platform - with SharePoint as the clear market leader. We're starting to see taxonomy development becoming mainstream. Organisations are beginning to think about using tools that automate different parts of the taxonomy management process. The other end is from the data analytics group - who traditionally have not handled unstructured data. Textual analytics tools (e.g. sentiment analysis, semantic analysis) are getting added into the mix. The former approach is centred around content management and the latter is centred around insight from content, and they are slowly meeting in the middle (which reminds me, I should drop a line to the folks at Pingar, haven't spoken to them for ages). This involves thinking about master data management across multiple kinds of systems - ECM (content), ERP (transactional) and others. I have yet to encounter many organisations in Australia that are mature in this space - and by "mature", I mean having done it in a systematic way for a while.

3. To me, Linked Open Data is something potentially very big (maybe I'm wrong on this). But it is perceived in Australia as being only a government thing - and only a minority of people in govt are interested. So far.

[As regards conferences, I would expect a semantic tech conference to be all over big data like a bad rash and a data-oriented conference to be dipping its toe into semantic & taxonomy. One is the hot sibling that everyone wants to date right now and the other isn't]

But the question here is what does all this mean for people on this list? I'd love to hear some responses.


Thomas Vander Wal
 

This past Fall I was back at a semantic web conference for the first time in 6 years and found very little had changed in the core technologies: they were still computationally expensive and the interfaces were quite poor. Many of the technologies that were put in the spotlight as "they have done it" used traditional sem web technology lightly, but used some natural language processing, pattern matching, and term trending for relevance. The RDF triple is an incredibly slow and expensive approach.

Many of the intelligent tools around the web/mobile have been doing this for quite a while. Taking known data and its semantics and putting it into JSON or other structured strings, much like XRD has been pitching and doing for around a decade, allows for much quicker parsing and analysis of the data. Where term validation is needed as a double check (person names or product names, for example), those terms and some other bits can be checked against a semantics repository, often using sem web practices to validate and add a contextual anchor.
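
A minimal sketch of the pattern described above: data travels as plain JSON for fast parsing, and only selected terms are checked against a semantics repository. Everything named here is an assumption for illustration. `VOCABULARY` is a hypothetical in-memory stand-in for a real repository (which might be a triple store or vocabulary service), and the field names, names and identifiers are invented.

```python
import json

# Hypothetical stand-in for a semantics repository: maps a normalised
# surface form to a canonical identifier (the "contextual anchor").
VOCABULARY = {
    "ada lovelace": "person:ada-lovelace",
    "widget pro": "product:widget-pro",
}


def validate_terms(record_json, fields=("author", "product")):
    """Parse a JSON record and attach an anchor for each checked field."""
    record = json.loads(record_json)
    anchors = {}
    for field in fields:
        value = record.get(field)
        if value is None:
            continue  # field not present in this record
        # Normalise case and whitespace before the lookup.
        key = " ".join(value.lower().split())
        # None means the term could not be verified.
        anchors[field] = VOCABULARY.get(key)
    record["anchors"] = anchors
    return record


example = validate_terms(
    '{"author": "Ada  Lovelace", "product": "Widget Pro", "text": "notes"}'
)
```

The bulk of the record is never touched by the lookup; only the fields that need double checking pay the cost of validation.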

I was working with a large US tech company, helping them with search around employee-contributed data, and they were missing much of the metadata around the contributions that would make it more relevant and useful. They had gone very simple as they had only been considering sem web for this, and it made for a very slow experience. I walked them through what metadata was needed, how to store it, and how to verify it in their algorithms. They found a solution that was more than quick enough after they rewrote their algorithms and models to use it, particularly for disambiguation of the same term with divergent meanings.
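
The company's actual solution is not described here, so the following is only a sketch of one simple way that contribution metadata could drive disambiguation: each sense of an ambiguous term carries a set of context keywords, and the sense sharing the most keywords with the contribution's tags wins. `SENSES`, the sense identifiers and the tags are all hypothetical.

```python
# Hypothetical senses for one ambiguous term, each with context keywords.
SENSES = {
    "jaguar": {
        "animal:jaguar": {"wildlife", "cat", "habitat"},
        "car:jaguar": {"engine", "dealer", "vehicle"},
    }
}


def disambiguate(term, metadata_tags):
    """Return the sense whose keywords overlap most with the tags, or None."""
    candidates = SENSES.get(term.lower(), {})
    tags = {tag.lower() for tag in metadata_tags}
    best, best_score = None, 0
    for sense, keywords in candidates.items():
        score = len(keywords & tags)  # number of shared keywords
        if score > best_score:
            best, best_score = sense, score
    return best  # None when no sense shares any keyword with the tags
```

For example, `disambiguate("Jaguar", ["engine", "dealer"])` picks the car sense, while a contribution with no matching tags is left unresolved rather than guessed.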

Sem web technologies have strong value, but not as much in the core, but in the verification.

Matt, on conferences: I rarely find that the people who understand what is under the technologies or behind the buzzwords are at conferences, particularly vendor-space and enterprise-focussed conferences. The folks with good knowledge aren't there, but those who can talk in generalities are. Personally, I like conferences that are in the defining and discovery phases, as people who know and are passionate about the subject are there. Big Data is well beyond that phase.



plessons@...
 

 

Interesting piece Matt. Got the discussion rolling. You invited responses.

Over the years, I have been frustrated as much as anyone else by the abstract thinking around terms like “semantic web”, without seeing any really meaningful solutions. The metaphor of "big data" and pregnancy pattern matching aside (which, I have to say, I find a somewhat curious and displaced metaphor in relation to the possibilities for big data), I think we are now just beginning to see what is possible as emergent – for example, in some current “small data” visualisation techniques that surround the traditional archival, cultural and social informatics worlds that support social research data management.

A particular example is available here

See http://128.250.230.47:8007/#

(Best viewed in the Google Chrome browser at the moment. It renders from an OAI-PMH compliant data set of four different social science resource collections.)

[Note: acknowledgement to my colleagues at the eScholarship Research Centre at the University of Melbourne for this example of an emerging infrastructure application of “small data”, which nonetheless highlights the substantial opportunities of big data.]

If one navigates through this contextual explorer, the John Arrowsmith collection of Australian historic maps is probably the easiest data set for understanding what is going on here. It takes a bit of playing around to really get the relationship between the different renderings, and the navigation between the visualisation of metadata exploration, the text-based context descriptions, the entity relationships and temporal descriptions, and the content.

This example is built on the application of the Encoded Archival Context for Corporate Bodies, Persons and Families (EAC-CPF) XML standard, which is one of the standards being applied through the Australian National Library’s TROVE metadata harvesting service.
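
For readers unfamiliar with OAI-PMH harvesting, here is a minimal sketch of reading records out of a `ListRecords`-style response. The sample XML is invented for illustration and carries simple Dublin Core metadata rather than EAC-CPF; the identifier and title are hypothetical. The namespace URIs are the standard OAI-PMH 2.0 and Dublin Core ones.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response; a real harvester would fetch this over HTTP.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:map-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical map record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_titles(xml_text):
    """Return (identifier, title) pairs for each record in the response."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.findall("oai:ListRecords/oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext("oai:metadata/oai_dc:dc/dc:title", namespaces=NS)
        results.append((identifier, title))
    return results


titles = list_titles(SAMPLE)
```

An EAC-CPF harvest would follow the same shape, with the metadata payload described in the EAC-CPF schema instead of Dublin Core.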

Representatives of the big data community have the potential to learn much from public data management agencies such as Australia’s National Library. The opportunity is to take these core developments and apply them through new expressions of public knowledge management. This is a bold claim, and may take several years, but I still reckon things will move in this direction despite my friction.

In the US, this sort of background infrastructure and thinking is, in part, being led out of the University of Virginia with the Social Networks and Archival Context Project. It is still influenced by the social research mindset, but these applications have potential in more mainstream ways. I have always intuitively felt that this focus on context entities described through Encoded Archival Context (EAC) principles could do a great deal to overcome the complexities of traditional approaches to taxonomies, the semantics of related semantic web type applications, and even the problems of “semantic interoperability”.

But this takes the immediate focus of big data out of the bedroom and into the realm of public and even public/ private expressions of knowledge management, including into the realm of sustainability. This shift in focus has significant implications for the sorts of innovations that will be pioneered in the coming half decade.

So even the choice of where to focus on “big data” means that we cannot avoid the drift into personal, organisational and societal politics (and gains) - indeed gender politics - with all that follows as a result.

KM and indeed “big data” continue to be blessed with these fraught attributes, which continue to affect our resilience in pursuing reform, if that is the perspective we focus on. Big data (including big e-science) has major ramifications for how we manage our economies in sustainable ways.

Richard

 





Matt Moore <innotecture@...>
 

Richard,
 
Interesting and detailed comments that I will respond to at length when I can. In the interim, I would note that AGIMO have just released a big data strategy paper & call for submissions: http://agimo.gov.au/2013/03/15/released-big-data-strategy-issues-paper/
 
Cheers,
 
Matt

From: "plessons@..."
To: sikmleaders@...
Sent: Friday, 15 March 2013 7:47 AM
Subject: Re: [sikmleaders] Semantic Web + Big Data = ?
 
 


Matt Moore <innotecture@...>
 

Tom,
 
Good stuff! Some of the technical details went over my head.
 
"Sem web technologies have strong value, but not as much in the core, but in the verification." That I think is interesting. Do you have a specific example of this?
 
BTW The session I was in with the data analytics guys was this one: http://www.meetup.com/datarati/events/101898682/
 
Now Tom is a hardcore geek (and I say that with awe rather than derision) with a background in statistics and modelling. A few things interested me about his session. He'd built an ontology on top of a disparate set of data systems in order to fuse the data together and also to explain the insights generated in KPI terms relevant to different departments*.
 
*And he also referenced the Cynefin framework and some sensemaking malarky.
 
Cheers,
 
Matt


plessons@...
 

 

Thanks Matt, 


Within the spirit of the heading: Semantic Web + big data =? ...........


As always, a short and to the point response (below) and embedded with somewhat uncharacteristic insights into your world of interests. Interesting discussion paper by AGIMO – in fact, I thought a very good one.  It is a tough landscape for government agencies like AGIMO to have to contend with – this trend towards open data, whilst recognising clearly how commitments to such noble values can also have unintended consequences. This will test us all, including governments in the coming decades.


For my part, I think we will need to develop new types of trust networks that cascade through our economies in order for the benefits of big data to be realised down to the level of “citizen”. But notions of cascading trust networks can be mightily misunderstood as well. At heart, I would argue that this should be a KM issue, because the truism that knowledge gives access to power (and potentially privilege) is true. KM has to be able to develop a coherent approach to the competing interests of different knowledge discourses, because without this, complex systems have a propensity to evolve into “laws of the jungle” or “impositions of hierarchy”. So idealised notions of a semantic web cannot necessarily bypass this challenge of entrenched cultural differences.


KM is therefore no panacea for expressions of an “ideal state” - or a semantic nanny state. Quite the reverse – if it is to develop effectively as a discipline, it has to be grounded in the mess, the chaos and the ambiguity of competing interests (differences) and competition, and the expressions of these at all levels of hierarchy, including at local, regional, national and international levels. So, at one level, there is a relationship between KM and economics, as Patrick Lambe elegantly argued last year.


But curiously, I think there is a stronger relationship between KM and the principles of big data, with its link back to different and evolving expressions of semantic and syntactic mark-up. However, I am not convinced the world of W3C, RDF and OWL is on the right track, and I hope there is not a stampede of corporate vendors into such approaches based on any perceived benefits of “data analytics” and slick marketing.


The AGIMO discussion paper appears to be quite considered about these matters. I hope, as it seems UTS is doing, that we can build the capacity of our educational institutions to better equip graduates to realise the benefits of big data. I don’t mean to be trite about this, but our collective futures might well depend on us getting this right. 


Thanks for the heads up and the contribution to "civic discussion".


 

Richard




----- Original Message -----
From: sikmleaders@...
To: "sikmleaders@..."
Sent: Mon, 18 Mar 2013 16:04:25 -0700 (PDT)
Subject: Fw: [sikmleaders] Semantic Web + Big Data = ?


 

   


Matt Moore <innotecture@...>
 

Richard,
 
"A particular example is available here
See http://128.250.230.47:8007/"
 
I like this tool. I will send you some comments off-list.
BTW I agree with you about the importance of visualisation to both small & big data.
 
Cheers,
 
Matt
