[iaoa-swao] Domain Experts paper - resubmission for review and replies to comments

Mon Jun 5 20:02:31 CEST 2017

With attachments

Andrea Westerinen
T: 425.891.8407
arwesterinen at gmail.com or andreaw at ninepts.com
organizingknowledge.blogspot.com

On Mon, Jun 5, 2017 at 1:47 PM, Andrea Westerinen <arwesterinen at gmail.com>
wrote:

> Attached.
>
> Andrea Westerinen
> T: 425.891.8407
> arwesterinen at gmail.com or andreaw at ninepts.com
> organizingknowledge.blogspot.com
>
>
-------------- next part --------------
- - - - -
KenB:

My review was brief.  Here it is:

The word "meryonymic" should be "meronymic"

<arw>Changed.</arw>

On page 4, near the bottom, "More just manipulating" should be "More than just manipulating"

<arw>Changed.</arw>

The survey of different tools for ontology development is the best part of the paper.  The subsequent part is work in progress that shows what the issues are and discusses possible solutions but does not completely solve any of the problems.  I grant that this is interesting, but it is rather too speculative for an archival journal.

<arw>As discussed at the last meeting, changed the text to be less about what we could do versus what we did and why.</arw>

It would be helpful to include a few sentences in the introduction that explain how this paper is relevant to the special issue.

<arw>I added the following paragraph to the Introduction, 'The role and engineering of ontologies for the Big Data and Linked Data communities were two of the basic problems addressed in the Ontology Summit 2014 Communiqué (Gruninger, Obrst et al., 2014). However, in order to use ontologies, they must be understandable and accessible to the members of the communities, and correctly reflect the necessary domain concepts. This requires that the concepts and relationships in an ontology be presented in a way that is familiar to the users and the experts.'</arw>

- - - - -
MikeB:

An important paper which sets out a vision for presenting ontology content to domain experts, something that is often ignored or underestimated in the field. The range and type of domain review artifacts are in line with my own experiences and efforts on FIBO. There are numerous grammatical and styling issues, along with the recurrent problem of assuming that the reader equates being an ontology with being an OWL ontology. That has to be rectified to be in line with the AO editorial policy, which also discourages detailed technical discussions; however, the technical content here is important for understanding how the visualization techniques are made to work. The final section sets out directions for future work, and the abstract and introduction should mention this. Also be careful about your assertions about UML - this has a very specific meaning and I think what you mean is UML-like.

<arw>Specific changes are discussed below.</arw>

Details below:

Ontology versus OWL Ontology

The introduction seems to conflate the notion of “what is an ontology” with “What is an ontology in OWL” as though OWL is the only applicable ontology language and that anyone who is engaging with ontology is doing so using the language of OWL. This needs to be reframed, starting with a clear statement of what the authors consider to be “an ontology” and describing the Protégé and OWL ecosystem as an example of widely used syntax and tooling to achieve these ends, not a necessary precondition to doing ontology. See my suggested introduction to Section 2.2, which could even be brought forward to the earlier introduction.

See e.g. Section 2.1 for the right kind of language: “…formal language such as OWL”

<arw>Rewrote some of the sentences in the Introduction, such as 'Even the use of the word, "ontology", conveys complexity and the need to learn new ways of expressing and representing concepts.  Description or common logic languages, modeling methodologies, and ontology development tools, while highly useful, take time to comprehend and require experience to use effectively.'</arw>

Section 2.2 Visualizing Ontologies
Preface this discussion with a statement that

“In this section, only visualization for OWL based ontologies is discussed. Other ontology formats may have their own visualization methods but the ubiquity of the OWL ecosystem for tooling and ontology modelling means that RDF and OWL visualization tools are of the most immediate interest for collaborative work on ontologies”

Or something like that.

<arw>I added an opening paragraph to Section 2.2 . . . 'There are various techniques to define ontologies in an unambiguous, computer-understandable way. Here, we focus on the use of the Resource Description Framework ("RDF", 2014) and Web Ontology Language ("OWL", 2012) due to the ubiquity of the OWL ecosystem for tooling and development. This means that RDF and OWL visualization tools are of the most immediate interest for collaborative work on ontologies.'</arw>

Language and Styling
Typos / Grammar
"The steps in the UPON Lite methodology are discussed in more detail in Section 4. And,
it is interesting to note that its supporting tooling is a spreadsheet, which is discussed in Section 2.3."

Lose the “And, “ construction, also the “which” is a bit vague. Try:

"The steps in the UPON Lite methodology are discussed in more detail in Section 4. It is interesting to note that its supporting tooling is a spreadsheet, the use of which is discussed in Section 2.3."

<arw>I removed the second sentence in response to BrandonW's comments below.</arw>

References
Should be moved from footnotes and added to the References section.

<arw>The guidelines state that 'For a passing reference to a website in text, the URL is sufficient; no reference list entry is needed.' So, I moved most of the references to the back section (since they are more than passing references) and for Protege, added the URL in-line (since it is only referenced once).</arw>

Style of references should be a number [1] corresponding to the references number in the References section. Where this is not a published paper or book, please give the web address along with the date this was retrieved (1. Blah de Blah, from www.abc.de retrieved on 03 March 2017)

<arw>Actually this is not what the APA Guidelines state. They require . . . 'APA in-text citations should include the author's last name followed by the year of publication. All publications cited in the text should be presented in an alphabetical list of references at the end of the manuscript.'</arw>

Specific Comments
2.3 Ontologies and Spreadsheets
Have you looked at ANZO from Cambridge Semantics? I know you say there are many more but this is one that is getting a lot of traction.

<arw>Yes, but it did not add anything and is not open-source. I did not want to start going down a path of vendor-specific products.</arw>

3. Ontograph

Where it says
"But, a single diagram, such as UML, can also generated."

Do you mean it generates a graph that is explicitly in UML (e.g. UML Class Diagram notation), or do you mean generating single diagrams in the same way UML does, but for OWL? I assume the latter but it reads like the former. Also a “be” is missing. Also don’t start a sentence with But. Try

"A single diagram, as with UML, can also be generated."

<arw>Changed the wording to 'It is also possible to generate a single, UML-style class diagram.'</arw>

P7:
"Note that the "Graph Type" selection, shown in the figures above, defines the content to
be graphed (class/inheritance information, or a property, instance/enumeration or UML
diagram). It is possible to select more than one type (in which case, all the generated . . ."

UML is a very specific modeling language and its semantics are not the semantics of OWL. There are deep and unresolved conversations within the OMG itself about whether e.g. OWL classes may be considered as extensions of UML classes or not. You will be opening this up to a whole world of pain if you simply say “UML Diagrams” unless they are somehow compliant with the UML metamodel. And if they are, they are likely not good ontology models. There are currently 2 ways of extending or aliasing UML to provide visual representation of OWL model content: the Ontology Definition Metamodel (ODM) and the Semantics for Information Modeling and Federation (SMIF) models. Any discussion of generating “UML diagrams” would have to relate to one or other of these or be a third alternative to what they have done – a very ambitious claim.

If you say “UML-style” diagrams throughout, this should insulate you from that.

Unless you provide XMI output that can be ingested into a UML tool – effectively transforming the OWL ontology into a UML data model?

<arw>Thanks for catching this.  I changed everything to "UML-style".</arw>

Section 4 Conclusions and Future Work
I think this is good, but since this is setting out possible future thinking, I would add something in the abstract and the introduction to the effect that this paper is also intended to stimulate discussion on future directions for the techniques described here. I think this is important work.

<arw>I added a sentence to the abstract, and concluded the introduction with this paragraph . . . 'The following paper reviews development efforts in this space. Related work is reviewed in Section 2. We describe our work and experiences with a custom graphing tool (OntoGraph) and supporting textual documentation in Section 3. Then, in Section 4, we discuss how the OntoGraph and spreadsheet tooling has evolved, and areas of investigation for future development. One of the goals in presenting this work is to stimulate discussion on the requirements, technologies and techniques involved in working on ontologies with domain experts.'</arw>

- - - - -
BrandonW:

Thank you for submitting the paper, "Ontology Development by Domain Experts (Without Using the "O" Word)". This paper suggests that giving domain experts the ability to create ontologies, which otherwise would have been created by ontology experts (ontology engineers), is a useful endeavour. The authors illustrate this through examples of tools developed by the authors (or within their LLC), specifically OntoGraph and a worksheet application which lays out classes, properties, relationships, etc. in a spreadsheet format for users (domain experts) to fill in. Of particular interest is using spreadsheets for ontology authoring and a web based graph generator for visualizing ontologies, and/or specific components (classes, properties, etc.) of an ontology. These tools were created and devised to reduce the gap between ontology authoring and editing tools and commonly used, or ubiquitous, tools like spreadsheets.

Overall, this paper raises some interesting questions as to what it means to create an ontology---and, of course, by extension, what an ontology actually is (the elusiveness continues). Learning how to create an ontology does have a fairly steep learning curve. In fact, a previous ontology summit was dedicated to delineating the topics a student would have to be exposed to in order to be considered knowledgeable in all the relevant topics associated with ontology development---it was not a small list.

I did find parts of this paper to be potentially contradictory. In many respects ontology development is simply an interdisciplinary exercise between someone who understands the logic and modeling methodologies (and likely RDF/OWL) and someone else focused and/or trained in a different domain. This appears evident in the paper as the domain experts described are not actually creating the ontology; they are simply populating it with values. There is a distinction. The domain experts are not making new relationships and developing the semantics of those newly created entities.

<arw>Having domain experts create new classes and properties is a future goal of the work, as noted in Section 4. 'The OntoGraph and spreadsheet tooling assumes that an appropriate, domain ontology exists for a domain, and is defined using an OWL syntax. We are extending the tooling to capture new concepts and properties (and their related data) from spreadsheets, as well as comments and tracked changes on existing concepts and properties. (Also important in this approach is to capture the provenance of the additions and changes.) The intent is to follow the workflow defined by UPON Lite . . .'</arw>

The authors allude to an advantage to increasing efficiency (and cost) by reducing complexity, but there is much work still needed to address the potential efficacy of this approach. It would be great if the authors addressed this somewhere in the work.

<arw>I am not sure where increased efficiency is mentioned (or alluded to). The only claim is about the need to have ontologies reviewed by domain experts, who are not versed in any of the formal logics or logic languages. If there is a specific text that needs clarification, I would be happy to change it.</arw>

My main concern regarding this submission is, what does this approach add, or do differently and/or better, as compared with other tools in this space---the ones mentioned as well as other spreadsheet (CSV, Excel, etc.) to RDF applications? An expert can set up a spreadsheet with arbitrary column headings that are not necessarily part of the RDF or OWL nomenclature. It's unclear what happens next, and how that is used and assessed (or validated).

<arw>In Conclusions, I clarified the first paragraph to read . . .'Section 3 focused on tooling to document existing OWL ontologies and share their content beyond ontology experts. The tooling is unique from that discussed in Section 2 in that it:

•	Combines both visual and spreadsheet (textual) output, versus being focused only on visualization or only using spreadsheet data
•	Provides visual output that can be customized to business or industry conventions, versus mandating a specific visualization format
•	Can simplify visualizations by collapsing multiple edges between two nodes to a single edge
•	Is general purpose, versus targeted at a particular industry or domain
•	Can create flexible spreadsheet outputs, versus restricting spreadsheet cells, columns or rows using templates and hard-coded conventions
•	Supports the current version of OWL (OWL 2)
•	Is available and maintained as open-source (https://github.com/NinePts)'</arw>
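
The edge-collapsing item in the list above can be illustrated with a minimal sketch. This is not OntoGraph's actual implementation, which is not shown in this thread; the function name `collapse_edges`, the (source, target, label) tuple layout, and the sample edges are all assumptions made for illustration.

```python
from collections import defaultdict


def collapse_edges(edges):
    """Merge all edges sharing a (source, target) pair into one edge.

    `edges` is an iterable of (source, target, label) tuples. The labels
    of merged edges are joined into a single comma-separated label.
    This layout is a hypothetical one chosen for the sketch.
    """
    grouped = defaultdict(list)
    for source, target, label in edges:
        labels = grouped[(source, target)]
        if label not in labels:  # skip exact duplicate labels
            labels.append(label)
    # Dicts preserve insertion order, so output follows first appearance.
    return [(s, t, ", ".join(labels)) for (s, t), labels in grouped.items()]


# Illustrative input only: two properties link the same pair of nodes.
edges = [
    ("Person", "Document", "made"),
    ("Person", "Document", "publications"),
    ("Person", "Person", "knows"),
]
collapsed = collapse_edges(edges)
```

Here the two Person-to-Document edges become one edge labelled "made, publications", so a diagram would draw a single line between those nodes.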

As an aside, it would be interesting to see how this method compares to ML classification or other frameworks like OntoLex and/or Lemon? This is perhaps stretching the essence of the paper, but acknowledgement might be appropriate if there were a brief discussion section.

<arw>The tooling really has nothing to do, at this point, with ML classification, NLP or lexicons/word senses.  I am not sure how to discuss these concepts as they are currently unrelated.</arw>

comments by section:

1. Introduction
-A good, simple introduction alluding to the amount of resources it takes to create an ontology using domain experts along with a knowledge engineer (or several of each).
-In addition to the "grey" literature, is there a link to the actual NASA ontology?

<arw>There are some articles by one of the scientists mentioned in Earley's paper, but I have not found a specific link to the NASA ontology. Also, this was only referenced as an example and the ontology is not used further in the paper.</arw>

-No need to call out Protege on the second page; the complexity issue holds with all ontology authoring tools---Protege, TopBraid, NeOn, etc., as well as other "vocabulary" or CV tools, for example VocBench.

<arw>Removed the reference to Protege.</arw>

-In the third paragraph, the issue with validation is conflated here. The validation is both ways. The domain expert is checking that the model is describing the semantics of the domain, and the ontologist is (likely) ensuring the logic is valid. It's a symbiotic relationship illustrating accuracy vs precision.

<arw>Since this paper is focused on review of ontologies by domain experts (and not logic review by ontologists), I purposely did not raise the reverse validation as it would not have added anything.</arw>

2. Related Work
-Clearly states it's an overview.
-The characteristics of success defined by Bada et al., 2004 are interesting
-In the second paragraph, what is the "backing methodology"?
-Similarly, which method did UPON Lite extend?

<arw>Rewrote the second paragraph to read . . . '. . . choice of an ontology development methodology is extremely important and will impact the entire scope of a project. Many methodologies have been proposed, catalogued and extended over the years (Corcho, Fernandez-Lopez, & Gomez-Perez, 2003, and Sure, Tempich & Vrandecic, 2006). One such methodology, UPON Lite (De Nicola & Missikoff, 2016), extends the Unified Process for Ontology building (UPON) and directly relates to the problems described above. UPON is a cyclical ontology-building method that utilizes Unified Process (UP) and the Unified Modeling Language ("UML", n.d.) for iterative, incremental, and use-case driven development (De Nicola, Missikoff, & Navigli, 2005). Even so, UPON is still dependent on ontology engineers and requires extensive knowledge of modeling techniques. UPON Lite was developed to address the "growing need for simpler, easy-to-use methods for ontology building and maintenance, conceived and designed for end users, … reducing the role of (and dependence on) ontology engineers".'</arw>

-It would be great to name some of the development methodologies.

<arw>If the reader is really interested, they could go to the referenced paper. I am not sure what value a list of methodologies (by name) would provide.</arw>

-The last sentence in 2.1 is re-stated in 2.3. It seems to fit better with the content in 2.3.

<arw>Removed the sentence from Section 2.1.</arw>

-The visualization section is informative. I definitely learned something.

-The second paragraph in section 2.3 isn't needed as it is redundant.

<arw>The goal of the paragraph was to indicate that this is definitely NOT an exhaustive list of tools. I feel that it is important to state this.</arw>

-A CSV is not a spreadsheet. This information does seem relevant as CSV is a fairly ubiquitous storage/transfer format. Perhaps amending the section heading would help.

<arw>Clarified the paragraph to read . . . 'A simple approach to importing or exporting data to a spreadsheet is as a set of comma-separated values (CSVs).' Personally, I have done my fair share of exporting a spreadsheet to CSV and converting to an ontology. :-)</arw>

-There is a fair bit of text regarding ROBOT that seems to me to be extraneous.

<arw>The text about ROBOT was the concept of a "template". This is key to understanding how ROBOT works, and also what makes it difficult to set up and different from the OntoGraph/Spreadsheet tooling.</arw>

-Is it fair to say that ROBOT, RightField and Populous add DB style restrictions to spreadsheet cells? It seems that is what is being stated. Making that explicit would be helpful.

<arw>I would not say that the listed tools add "DB style restrictions". For me, the latter means integrity constraints and specific datatype requirements. But, the tools provide ways to convert CSV and tab-separated data from spreadsheets to OWL. The templates are the conventions that allow this. For example, for ROBOT, the first line/row must contain names for every column used in the data. The second line/row contains template strings for each column that will be used in the OWL conversion. Then, the remaining rows correspond to OWL classes or individuals. This is stated in the paper.</arw>
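
The three-part row layout described in the reply above (column names, then template strings, then data rows) can be sketched as follows. This is only an illustration of reading such a layout with Python's standard `csv` module, not ROBOT itself; the function name `read_template_sheet`, the sample column names, the identifiers, and the template strings shown are assumptions, and the real template syntax should be checked against the ROBOT documentation.

```python
import csv
import io

# Hypothetical sample sheet: row 1 names every column, row 2 holds one
# template string per column, and each later row describes one entity.
SAMPLE = """Identifier,Name,Parent
ID,LABEL,SC %
ex:Dog,Dog,ex:Animal
ex:Cat,Cat,ex:Animal
"""


def read_template_sheet(text):
    """Pair every data cell with its column's template string."""
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < 2:
        raise ValueError("expected a column-name row and a template row")
    names, templates, data = rows[0], rows[1], rows[2:]
    if len(names) != len(templates):
        raise ValueError("every named column needs a template string")
    records = []
    for row in data:
        # Each record maps column name -> (template string, cell value).
        records.append(
            {name: (template, value)
             for name, template, value in zip(names, templates, row)}
        )
    return records


records = read_template_sheet(SAMPLE)
```

Each record keeps the template string next to the cell value, which is the information a converter would need to decide what kind of OWL statement a cell stands for.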

3. OntoGraph and Spreadsheet Tooling for Domain Experts
-OntoGraph seems like it addresses some common issues with ontology visualization---especially as it relates to published artefacts.
-Did the settings in figure 1 create the image in figure 3, and do figures 2 and 4 have a similar relationship? It would be great if that were explicit---or if there were an example like that for the user to follow.

<arw>Yes. Modified the text to indicate the customization options that were used.</arw>

-figures 3 and 4 have about 10 prefixes present. I think only three are used, and some nodes do not have a prefix (i.e. :SpatialThing)

<arw>Fixed the missing prefixes (they were not originally defined in FOAF). The code lists all prefixes. They are all used in the properties diagram.</arw>

-I would really like figure 4 and 7 to be much bigger---i.e. readable.

<arw>Me too, but it won't fit on a page.</arw>

The OntoGraph and Spreadsheet Tooling functionality should be in their own sections.

<arw>I separated the discussion into 2 sub-sections.</arw>

-The spreadsheet tooling needs unpacking. The domain experts aren't really developing an ontology, they are simply filling in values to a structure that has been previously set up using rdf and owl components. 

<arw>Yes, this is addressed further in Section 4.</arw>

Also, does the SPARQL query set up the headers for the spreadsheet, or is there meant to be class and relationship information in there already for the domain experts to use (as a go by)?

<arw>Section 3 is clarified to state that the query output was manipulated by a specific program, OntoSheet, which is discussed further in Section 4.</arw>

-How is the information in the spreadsheet pulled back out of the spreadsheet and into the ontology---CSV to OWL, for example?

<arw>Discussion added to Section 4. 'We are extending the OntoSheet tooling to capture new concepts and properties (and their related data) added to the worksheets, as well as any review comments defined for existing concepts and properties. (Also important in this approach is to capture the provenance of the additions and changes.)'</arw>

-What is the benefit of using this approach compared to some of the other approaches, either listed in section 2 or other CSV to RDF and/or OWL tools?

<arw>Modified the start of Section 4 to clearly state the benefits and differences.</arw>

-How might a user add new information to the structure (similar to what Populous addressed)?

<arw>Noted in the new text about OntoSheet in Section 4 - as this is still in development.</arw>

4. Conclusion or Future Work
-where are the tools being released as open source projects---github, sourceforge, gitlab, etc.?

<arw>Provided details. Source is on schedule to be available in May.</arw>

-why is figure 7 in section 4? Surely this should be included in the previous section.

<arw>Moved to Section 3.</arw>

In the third paragraph, the first sentence states, "The OntoGraph *and* spreadsheet tooling assumes that an appropriate, domain ontology exists for a domain, and is defined using an OWL syntax". This is key information and should be one of the first things in section 3.

<arw>Modified the discussion of OntoGraph to read . . . 'For visualization, we created the OntoGraph program ("OntoGraph", 2016-2017). It was designed to provide documentation on existing OWL ontologies, developed for our consulting customers.' And modified the spreadsheet sub-section to begin with . . . 'Beyond graphs, we also generate domain documentation for OWL ontologies, based on a simple spreadsheet format.'</arw>

general notes:
citations---I assume these will be numbered and the full author date was for reviewing purposes. If that is not the case, it probably should be.

<arw>As stated above, the submission requires the use of the APA guidelines for citations.  These were followed (for good or for bad).</arw>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: WorkingOnOntologies-RevisedApril.pdf
Type: application/pdf
Size: 1605732 bytes
Desc: not available
URL: <http://listserv.ovgu.de/pipermail/iaoa-swao/attachments/20170605/a75b545e/attachment-0001.pdf>