diff --git a/hacktober b/hacktober new file mode 100644 index 00000000..bcf7283b --- /dev/null +++ b/hacktober @@ -0,0 +1,1553 @@ +Submitted 9 October 2016 +Accepted 7 August 2017 +Published 2 October 2017 +Corresponding author +Silvio Peroni, silvio.peroni@unibo.it +Academic editor +Ciro Cattuto +Additional Information and +Declarations can be found on +page 31 +DOI 10.7717/peerj-cs.132 +Copyright +2017 Peroni et al. +Distributed under +Creative Commons CC-BY 4.0 +OPEN ACCESS +Research Articles in Simplified HTML: +a Web-first format for HTML-based +scholarly articles +Silvio Peroni1 +, Francesco Osborne2 +, Angelo Di Iorio1 +, Andrea Giovanni +Nuzzolese3 +, Francesco Poggi1 +, Fabio Vitali1 +and Enrico Motta2 +1 Digital and Semantic Publishing Laboratory, Department of Computer Science and Engineering, University +of Bologna, Bologna, Italy +2 Knowledge Media Institute, Open University, Milton Keynes, United Kingdom +3 +Semantic Technologies Laboratory, Institute of Cognitive Sciences and Technologies, Italian National +Research Council, Rome, Italy +ABSTRACT +Purpose. This paper introduces the Research Articles in Simplified HTML (or RASH), +which is a Web-first format for writing HTML-based scholarly papers; it is accompanied +by the RASH Framework, a set of tools for interacting with RASH-based articles. The +paper also presents an evaluation that involved authors and reviewers of RASH articles +submitted to the SAVE-SD 2015 and SAVE-SD 2016 workshops. +Design. RASH has been developed aiming to: be easy to learn and use; share scholarly +documents (and embedded semantic annotations) through the Web; support its +adoption within the existing publishing workflow. +Findings. The evaluation study confirmed that RASH is ready to be adopted in +workshops, conferences, and journals and can be quickly learnt by researchers who +are familiar with HTML. +Research Limitations. 
The evaluation study also highlighted some issues in the +adoption of RASH, and in general of HTML formats, especially by less technically savvy +users. Moreover, additional tools are needed, e.g., for enabling additional conversions +from/to existing formats such as OpenXML. +Practical Implications. RASH (and its Framework) is another step towards enabling +the definition of formal representations of the meaning of the content of an article, +facilitating its automatic discovery, enabling its linking to semantically related articles, +providing access to data within the article in actionable form, and allowing integration +of data between papers. +Social Implications. RASH addresses the intrinsic needs related to the various users +of a scholarly article: researchers (focussing on its content), readers (experiencing new +ways for browsing it), citizen scientists (reusing available data formally defined within +it through semantic annotations), publishers (using the advantages of new technologies +as envisioned by the Semantic Publishing movement). +Value. RASH helps authors to focus on the organisation of their texts, supports them in +the task of semantically enriching the content of articles, and leaves all the issues about +validation, visualisation, conversion, and semantic data extraction to the various tools +developed within its Framework. +How to cite this article Peroni et al. (2017), Research Articles in Simplified HTML: a Web-first format for HTML-based scholarly articles. PeerJ Comput. Sci. 
3:e132; DOI 10.7717/peerj-cs.132 +Subjects Digital Libraries, World Wide Web and Web Science +Keywords Document conversion, XSLT, RASH, Semantic Publishing, Digital Publishing, +Semantic Web +INTRODUCTION +In the last months of 2014, several posts within technical mailing lists of the Web +(https://lists.w3.org/Archives/Public/public-lod/2014Nov/0003.html) and Semantic Web +(https://lists.w3.org/Archives/Public/public-lod/2014Oct/0058.html) community have +discussed an evergreen topic in scholarly communication, i.e., how could authors of research +papers submit their works in HTML rather than, say, PDF, MS Word or LaTeX. Besides +the obvious justification of simplification and unification of data formats for drafting, +submission and publication, an additional underlying rationale is that the adoption of +HTML would ease the embedding of semantic annotations, thus improving research +communications thanks to already existing W3C standards such as RDFa (Sporny, 2015), +Turtle (Prud’hommeaux & Carothers, 2014) and JSON-LD (Sporny, Kellogg & Lanthaler, +2014). This opens complex and exciting scenarios that the Semantic Publishing community +has promised us in terms of increased discoverability, interactivity, openness and usability +of the scientific works (Bourne et al., 2011; Shotton et al., 2009). +Nonetheless, HTML is still primarily used as an output format only: the authors +write their papers in LaTeX or MS Word and submit sources to the typesetters, who are +responsible for producing the final version, that eventually will be published and read on +the Web. Appropriate tools in the publishing toolchain are used to convert papers among +multiple formats. +The interest in Web-first research papers—that are natively designed, stored and +transferred in HTML—is increasing. 
Just to cite a few research efforts: Scholarly HTML +(http://scholarlyhtml.org) defines a set of descriptive rules for adopting a defined subset of +HTML to describe the metadata and content of scholarly articles; Dokieli (http://dokie.li) +is a Web application that allows authors to create HTML-based scholarly articles directly +on the browser, adding annotations and many other sophisticated features. +This paper introduces a novel approach towards the same goal: providing authors with a +subset of HTML for Web-first papers. The format is called RASH, Research Articles in Simplified HTML, and consists of 32 HTML elements only. This format is also accompanied by +the RASH Framework, a set of specifications and tools for RASH documents (Peroni, 2017). +There are two key differences between RASH and other similar proposals. First of all, +RASH adopts a simplified pattern-based data model. The number of markup elements to +be used by authors was reduced down to the bare minimum, and the elements themselves +were chosen in order to minimize the cognitive effort of authors when writing documents. +Secondly, RASH does not come with a full authoring environment but is expected to be +produced from MS Word, ODT and LaTeX sources. The basic idea is to allow authors +to keep using the word processors on which they routinely write their papers and to +Peroni et al. (2017), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.132 2/35 +provide them with multi-format converters. These converters are included in the RASH +Framework, whose architecture is modular and extensible for handling new formats in +the future. +RASH is in fact intended to help authors in focussing on the organisation of their +texts and supports them in the task of semantically enriching the content of articles, +delegating all the issues about validation/presentation/conversion of RASH documents +to the various tools developed within its Framework. 
This is a well-known principle in +scientific publishing, even if not yet fully applied: clear separation of concerns. The authors +should focus on organising the content and structure only, and the format should not +require authors to worry about how the content will be presented on screen and in print. +The publishers will then take care of creating the final formatting to best render the +content in the style of their publications, or authors could use self-publishing platforms as +promoted by Linked Research (http://linkedresearch.org). +Such a separation of concerns can be pushed much forward. Pettifer et al. (2011) +explained well the difference between an article as ‘‘an instance of scholarly thought’’ and +‘‘a representation for consumption by human or machine’’, and showed how multiple +representations can be combined, integrated with external data, enhanced and interacted +with in order to provide scholars with sophisticated tools directly within their articles. +Another critical requirement for any HTML-based language used for scientific writing +is good rendering and acceptance by the publishers. Any new HTML-based format should +be beneficial for publishers as well. Of course, publishers, conferences, and workshop +organisers, would like to manage new formats in the same way as for the formats +they already support, such as LaTeX. To this end, these formats should support tools +for their conversion and for rendering the content in specific layouts, such as ACM +ICPS (http://www.acm.org/sigs/publications/proceedings-templates) and Springer LNCS +(http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0). RASH adopts a +pragmatic approach to this issue: while we are interested in a full-fledged native RASH +authoring environment, we implemented a set of converters, in the RASH Framework, +that are easily integrable (and were integrated) with existing publishing platforms. 
+The goal of this paper is to describe the outcomes of some experiments on the use of RASH, so as to understand:
+1. if it can be adopted as an HTML-based submission format in academic venues (workshops, conferences, journals);
+2. if it is easy to learn and use;
+3. if it can be used to add semantic annotations, and which vocabularies are most widely adopted in RASH papers.
+The rest of the paper is structured as follows. In ‘Related Works’ we introduce some of the most relevant related works in the area and provide a functional comparison of them. In ‘Which “Web-first” Format for Research Articles?’ we introduce the rationale for the creation of a new Web-first format for scholarly publications, discussing the importance of minimality. In ‘Writing Scholarly Articles in HTML with RASH’ and ‘The RASH Framework’ we introduce the theoretical background of RASH, and then provide an introduction to the language and the main tools included in its Framework. In ‘RASH and SAVE-SD: an Evaluation’ we present, as a case study, an analysis of the adoption of RASH at the SAVE-SD 2015 (http://cs.unibo.it/save-sd/2015/index.html) and SAVE-SD 2016 (http://cs.unibo.it/save-sd/2016/index.html) workshops. Finally, in ‘Conclusions’ we conclude the paper by sketching out some future developments.
+RELATED WORKS
+The growing interest in the publication of Web-first research papers has resulted in the release of some interesting projects related to RASH. In the following subsections, we discuss the most important contributions in this area by splitting them into two main categories: (i) HTML-based formats and (ii) WYSIWYG editors for HTML documents.
+Note that we do not discuss in detail some other recent efforts based on non-HTML languages, even if they are equally relevant for the community.
+ScholarlyMarkdown (http://scholarlymarkdown.com/) (Lin & Beales, 2015), for instance, +is a syntax to produce scholarly articles according to a Markdown (http://daringfireball.net/ +projects/markdown/) input. ShareLaTeX (https://www.sharelatex.com/) is a Web-based +real-time collaborative editor for LaTeX documents. +In Table 1 we briefly summarise the features and capabilities of the formats presented, +in order to highlight the main differences between them. +HTML-based formats +One of the first documented contributions that proposed an HTML-based format for +scholarly articles was Scholarly HTML (http://scholarlyhtml.org). It is not defined as a +formal grammar, but as a set of descriptive rules which allows one to specify just a reduced +amount of HTML tags for describing the metadata and content of a scholarly article. It is the +main intermediate format used in ContentMine (http://contentmine.org) for describing +the conversion of PDF content into HTML. +Along the same lines, PubCSS (https://github.com/thomaspark/pubcss/) is a project +which aims at pushing the use of HTML+CSS for writing scholarly articles. It does not +define a formal grammar for the HTML element set to use. Rather it provides some +HTML templates according to four different CSS styles, which mimic four LaTeX styles for +Computer Science articles, i.e., ACM SIG Proceedings, ACM SIGCHI Proceedings, ACM +SIGCHI Extended Abstracts, and IEEE Conference Proceedings. +HTMLBooks (https://github.com/oreillymedia/HTMLBook/) is an O’Reilly’s +specification for creating HTML documents (books, in particular) by using a subset of all +the (X)HTML5 elements. This is one of the first public works by a publisher for pushing +HTML-like publications, even if the status of its documentation (and, consequently, of its +schema) is still ‘‘unofficial’’. 
+Table 1 A comparison among existing HTML-oriented formats for scholarly papers according to seven distinct categories.
+Format | Syntax | Doc | Formal grammar | Semantic annotations | CSS for different formats | WYSIWYG editor | Conversion from | Conversion to
+RASH [a] | HTML | Available online [b] | RelaxNG [c] | RDFa, RDF/XML, Turtle, JSON-LD | Web-based and Springer LNCS | Apache OpenOffice, Microsoft Word, RASH Javascript Editor (RAJE) | ODT, DOCX | LaTeX: ACM ICPS, ACM Journal Large, PeerJ CS, Springer LNCS
+Scholarly HTML (2011) [d] | HTML | Available online [e] | None | RDFa | None | None | PDF (via ContentMine Norma [f]) | None
+PubCSS [g] | HTML | Available online [h] | Informal (via HTML templates) | None | ACM SIG Proceedings, ACM SIGCHI Proceedings, ACM SIGCHI Extended Abstracts, and IEEE Conference Proceedings | None | None | PDF (via browser interface)
+HTMLBooks [i] | HTML | Available online [j] | XML Schema [k] | None | CSS files for PDF printing and EPUB/MOBI-compatible device visualisations | None | None | None
+Scholarly HTML (2015) [l] | HTML | Available online [m] | None | RDFa, JSON-LD | Web-based | Microsoft Word (as referenced online [n]) and their online platform (free access not guaranteed as of 20 June 2017) | DOCX | None
+Scholarly HTML (2016) [o] | HTML | Available online [p] | None | RDFa, JSON-LD | Web-based | None | None | None
+dokieli format | HTML | Available online [q] | Informal (via HTML templates and patterns) | RDFa, Turtle, JSON-LD, TRiG | Web-based (Native and Basic), Springer LNCS, ACM ICPS | dokieli [r] | None | PDF (via browser interface)
+Fiduswriter format | HTML | None | None | None | Web-based | Fiduswriter [s] | None | HTML, EPUB, LaTeX
+Authorea format | HTML | None | None | None | Web-based | Authorea [t] | DOCX, LaTeX | DOCX, LaTeX (according to several stylesheets), PDF, Zipped structure with HTML
+Notes.
+[a] https://github.com/essepuntato/rash/
+[b] http://cs.unibo.it/save-sd/rash
+[c] https://raw.githubusercontent.com/essepuntato/rash/master/grammar/rash.rng
+[d] http://scholarlyhtml.org/
+[e] http://scholarlyhtml.org/core-specification/
+[f] https://github.com/ContentMine/norma
+[g] https://github.com/thomaspark/pubcss/
+[h] http://thomaspark.co/2015/01/pubcss-formatting-academic-publications-in-html-css/
+[i] https://github.com/oreillymedia/HTMLBook/
+[j] http://oreillymedia.github.io/HTMLBook/
+[k] https://raw.githubusercontent.com/oreillymedia/HTMLBook/master/schema/htmlbook.xsd
+[l] https://github.com/scienceai/scholarly.vernacular.io
+[m] http://scholarly.vernacular.io/
+[n] https://science.ai/overview
+[o] https://github.com/w3c/scholarly-html
+[p] https://w3c.github.io/scholarly-html/
+[q] https://dokie.li/docs
+[r] http://dokie.li
+[s] https://www.fiduswriter.org
+[t] https://www.authorea.com
+[1] The main aim of the LinkedResearch project is to propose principles for enabling researchers to share and reuse research knowledge by means of existing Web and Semantic Web technologies towards a future world where researchers can publish and consume human-friendly and machine-readable (e.g., by using RDFa (Sporny, 2015)) scholarly documents.
+Another project, which shares its name with one of the previous ones, Scholarly HTML (https://github.com/scienceai/scholarly.vernacular.io), is a work by the science.ai (http://science.ai) company that aims at providing a domain-specific data format based on open standards (among which HTML5) for enabling “the interoperable exchange of scholarly articles in a manner that is compatible with off-the-shelf browsers” (Berjon & Ballesteros, 2015). While the format is not defined by any particular formal grammar, it has
+well-described documentation (Berjon & Ballesteros, 2015) that explains how to produce scholarly documents by using a fairly large set of HTML tags, accompanied by schema.org (http://schema.org) annotations for describing specific structural roles of documents as well as basic metadata of the paper. The company also provides services that enable the conversion of Microsoft Word documents into the ScholarlyHTML format.
+One of the authors of the previous work is also the chair of a W3C community group called “Scholarly HTML” (https://www.w3.org/community/scholarlyhtml/) which aims at developing an HTML vernacular (https://github.com/w3c/scholarly-html) for the creation of a Web-first format for scholarly articles. It involves several people from all the aforementioned specifications (including RASH), and the group's work should result in the release of a community-proposed interchange HTML format. As of September 22, 2017, the online documentation (https://w3c.github.io/scholarly-html/) is mainly a fork of the Scholarly HTML specification proposed by science.ai discussed above.
+HTML-oriented WYSIWYG editors
+One of the most important and recent proposals, which is compliant with the principles introduced as part of the Linked Research (http://linkedresearch.org) project [1], is dokieli (https://dokie.li) (Capadisli et al., 2017). Dokieli is a web application (still under development) that allows the creation of HTML-based scholarly articles directly in the browser, and implements several features, among which are annotations (in RDF) and a notification system. The application also makes available some HTML templates and a series of widgets for navigating, visualising (in different formats) and printing research documents easily by using common browsers.
+Fidus Writer (https://www.fiduswriter.org/) is another Web-based application for creating HTML scholarly documents by means of a word-processor-like interface.
While the particular format used is not explicitly specified, it allows the conversion of the HTML documents created within the application into two different formats, i.e., EPUB and LaTeX (alongside HTML).
+Authorea (https://www.authorea.com) is a Web service that allows users to write papers by means of a clear and effective interface. It enables the inclusion of the main components of scientific papers such as inline elements (emphasis, quotations, etc.) and complex structures (figures, equations, etc.), and allows the use of Markdown and LaTeX for adding more sophisticated constructs. In addition, Authorea is able to export the document in four different formats (PDF, LaTeX, DOCX, and a zipped archive with several HTML files) and according to a large number of stylesheets used in academic venues.
+WHICH “WEB-FIRST” FORMAT FOR RESEARCH ARTICLES?
+The term “Web-first” format indicates the possibility of using HTML as a primary format to write, store and transfer research articles, and not only to make these articles available on the Web. Some questions naturally arise in this context: shall we use the full HTML? If we impose a limited subset, which elements should we consider? Shall we demand specific rules for using the language?
+[2] Note that accepting HTML as a format for submissions in conferences/workshops is a totally different issue, since this choice is normally taken by the organisers. For instance, see the SAVE-SD 2015 call for papers (http://cs.unibo.it/save-sd/2015/submission.html) and the various editions of SePublica (http://ceur-ws.org/Vol-1155/).
+Some works, e.g., Capadisli, Riedl & Auer (2015), suggest not to force any particular HTML structure for research papers.
This choice would allow authors to use whatever +HTML structure they want for writing papers and would reduce (even, eliminate) the fear +for the template bottleneck, i.e., the fact that users may not adopt a particular language if +they are compelled to follow specific rules. On the other hand, leaving to the authors the +freedom of using, potentially, the whole HTML specification may affect, in some way, +the whole writing and publishing process of articles. The author could adopt any kind of +HTML linearisation, e.g., using elements div instead of elements section, using elements +table for their presentational behaviour (i.e., how they are rendered by browsers or other +software readers) and not for presenting tabular data, and the like. This freedom could +result in two main kinds of issues: +• visualisation bottleneck—it may affect the correct use of existing, well-developed and +pretty standard CSSs (e.g., Capadisli’s CSSs developed for Dokieli (https://dokie.li)) +for both screen and print media, in having to write new codes for handling paper +visualisation correctly; +• less focus on the research content—the fact that a certain paper is not visualised in a +browser very well (or, worse, in a way that is not the one the author expects) could bring +the author to work on the presentation of the text, rather than on focussing on the actual +research content of the text. +Another point against the use of any HTML syntax for writing papers concerns the +possibility of enabling an easy way for sharing the paper with others (e.g., co-authors) who, +potentially, may not use HTML in the same way. If all the co-authors of a paper are able +to use the full HTML, they may not understand other users’ specific use of some HTML +tags —‘‘why did they use the elements section instead of div?’’; ‘‘what is this freaky use +of elements table?’’. Hence, the advantages of using a common HTML format is quite +evident: only one syntax and only one possible semantics. 
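To make the benefit of a single shared subset concrete, the following is a minimal sketch (not part of the RASH Framework) of how a tool could report elements that fall outside an agreed element set. The allowed names are copied from Table 2 later in this paper; the names `RASH_ELEMENTS`, `SubsetChecker` and `elements_outside_subset` are hypothetical and introduced only for illustration. The sketch uses Python's standard `html.parser` and looks at element names only, so it says nothing about whether an allowed element is used correctly.

```python
from html.parser import HTMLParser

# Element names grouped by structural pattern, copied from Table 2 of the paper.
RASH_ELEMENTS = {
    "inline": {"a", "code", "em", "math", "q", "span", "strong", "sub", "sup", "svg"},
    "block": {"figcaption", "h1", "p", "pre", "th"},
    "container": {"blockquote", "body", "figure", "head", "html", "li", "ol",
                  "section", "table", "td", "tr", "ul"},
    "field": {"script", "title"},
    "milestone": {"img"},
    "meta": {"link", "meta"},
}
ALLOWED = set().union(*RASH_ELEMENTS.values())


class SubsetChecker(HTMLParser):
    """Collects the names of start tags that are not in the allowed subset."""

    def __init__(self):
        super().__init__()
        self.unexpected = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag not in ALLOWED:
            self.unexpected.append(tag)


def elements_outside_subset(html_text):
    """Return, in document order, the element names not listed in Table 2."""
    checker = SubsetChecker()
    checker.feed(html_text)
    checker.close()
    return checker.unexpected
```

For example, `elements_outside_subset("<section><div><p>Text</p></div></section>")` reports the `div` that a co-author might have used in place of `section`. One known limitation of this sketch: the children of `math` and `svg` (MathML and SVG markup) would also be reported, so a real checker would need to skip the content of those two elements.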
+[3] OASIS LegalDocumentML is the standardisation of AkomaNtoso (http://www.akomantoso.org/), which is a set of simple technology-neutral electronic representations in XML format of parliamentary, legislative and judiciary documents, and has already been adopted by several parliaments in the European Union, Africa, and South America.
+There is a further issue worth mentioning. Having a shared, unambiguous and simple format would facilitate conversions from/into other complex ones (e.g., ODT (JTC1/SC34 WG 6, 2006), OOXML (JTC1/SC34 WG 4, 2011), DocBook (Walsh, 2009), JATS (National Information Standards Organization, 2012)), thus enabling authors to use their own text editors or word-processors to modify the articles. The conversion is instead much more complex, error-prone and imprecise on the full HTML.
+To complicate an already complex scenario there is the necessary involvement of publishers. Allowing the authors to use their own HTML format could be counterproductive from a publisher’s perspective, in particular when we speak about the possibility of adopting such HTML formats for regular conference/journal camera-ready submissions. From a recent discussion on the Force11 mailing list (https://groups.google.com/forum/#!topic/forcnet/g4BNAOOMjMM), it emerges that publishers are willing to adopt HTML for submissions if and only if it is a clear community need. It means that they will include HTML formats in the publishing workflow only once a number of conference organisers decide to include HTML among the accepted formats for paper submissions [2]. However, using one clear Web-first format, rather than a plethora of possible variations allowed by the full HTML schema, would certainly lighten the burden of publishers for including HTML within their publishing workflow.
This inclusion could be additionally favoured by +the availability of services (e.g., editors, converters, enhancers, visualisers) for facilitating +the use of such a Web-first format within the existing publishing environments. +Last but not least, using a controlled subset of HTML is more appropriate for +Semantic Publishing applications (Shotton et al., 2009; Peroni, 2014b). The development +of scripts and applications to extract, for instance, RDF statements directly from the +markup structure of the text is a sort of nightmare if different authors use HTML in +different manners. For instance, what happens when trying to extract the rhetorical +organisation of a scientific paper according to the Document Component Ontology +(DoCO) (http://purl.org/spar/doco) (Constantin et al., 2016) from two HTML documents +that use HTML tags in different ways? Is an HTML element table an actual table +(containing tabular data)? Which are the tags identifying sections? These analyses are +all easier within a controlled and unambiguous subset of HTML. +WRITING SCHOLARLY ARTICLES IN HTML WITH RASH +The subset of HTML we propose in RASH is strictly compliant to a patterns theory we have +developed over the past few years. Patterns are widely accepted solutions to handle recurring +problems. Firstly introduced for architecture and engineering problems (Alexander, 1979), +they have been successfully deployed in computer science and in particular in software +engineering (Gamma et al., 1994). In this section, we briefly introduce our patterns for +document engineering and then we go into the details of RASH. +Theoretical foundations: structural patterns +While we have plenty of tools and languages for creating new markup languages +(e.g., RelaxNG (Clark & Makoto, 2001) and XMLSchema Gao, Sperberg-McQueen & +Thompson, 2012), these usually do not provide any particular guideline for fostering +the development of robust and well-shaped document languages. 
In order to fill that gap, in the last decade we have experimented with the use of a theory of structural patterns for markup documents (Di Iorio et al., 2014), which has since been applied in several national and international standards, among which OASIS LegalDocumentML (https://www.oasis-open.org/committees/legaldocml/) [3], a legal document standard for the specification of parliamentary, legislative and judicial documents, and for their interchange between institutions in different countries.
+The basic idea behind this theory is that each element of a markup language should comply with one and only one structural pattern, depending on whether the element:
+• can or cannot contain text (+t in the first case, −t otherwise);
+• can or cannot contain other elements (+s in the first case, −s otherwise);
+• is contained by another element that can or cannot contain text (+T in the first case, −T otherwise).
+By combining all these possible values (i.e., ±t, ±s, and ±T) we obtain eight core structural patterns, listed below with a plausible example among the HTML elements:
+1. inline [+t+s+T], e.g., the element em;
+2. block [+t+s−T], e.g., the element p;
+3. popup [−t+s+T], e.g., the element aside;
+4. container [−t+s−T], e.g., the element section;
+5. atom [+t−s+T], e.g., the element abbr;
+6. field [+t−s−T], e.g., the element title;
+7. milestone [−t−s+T], e.g., the element img;
+8. meta [−t−s−T], e.g., the element link.
+Instead of defining a large number of complex and diversified structures, the idea is that a small number of structural patterns are sufficient to express what most users need for defining the organisation of their documents. Therefore, the two main aspects related to such patterns are:
+• orthogonality: each pattern has a specific goal and fits a specific context.
It makes it +possible to associate a single pattern to each of the most common situations in document +design. Conversely, for every situation encountered in the creation of a new markup +language, the corresponding pattern is immediately selectable and applicable; +• assemblability—each pattern can be used only in some contexts within other patterns. +This strictness provides expressiveness and non-ambiguity in the patterns. By limiting +the possible choices, patterns prevent the creation of uncontrolled and misleading +content structures. +Such patterns allow authors to create unambiguous, manageable and well-structured +markup languages and, consequently, documents, fostering increased reusability (e.g., +inclusion, conversion, etc.) among different languages. Also, thanks to the regularity they +provide, it is possible to perform easily complex operations on pattern-based documents +even when knowing very little about their vocabulary (automatic visualisation of document, +inferences on the document structure, etc.). In this way, designers can implement more +reliable and efficient tools, can make a hypothesis regarding the meanings of the document +fragments, can identify singularities and can study the global properties of a set of +documents, as described in Di Iorio et al. (2012) and Di Iorio et al. (2013). +HTML does not use the aforementioned patterns in a systematic way, as it allows the +creation of arbitrary and, sometimes, quite ambiguous structures. To apply the structural +pattern guidelines for RASH, we restricted HTML by selecting a good subset of elements +expressive enough to capture the typical components of a scholarly article while being also +well-designed, easy to reuse and robust. +RASH: Research Article in Simplified HTML +The Research Articles in Simplified HTML (RASH) format is a markup language that +restricts the use of HTML (http://www.w3.org/TR/html5/) elements to only 32 elements +for writing academic research articles. 
It allows authors to use embedded RDF annotations. In addition, RASH strictly follows the Digital Publishing WAI-ARIA Module 1.0 (Garrish et al., 2016) for expressing structural semantics on various markup elements used.
+[4] Please refer to the official RASH documentation, available at https://rawgit.com/essepuntato/rash/master/documentation/index.html, for a complete introduction of all the elements and attributes that can be used in RASH documents.
+[5] The following prefixes are always mandatory in any RASH document:
+• schema: http://schema.org/
+• prism: http://prismstandard.org/namespaces/basic/2.0/
+All RASH documents begin as a simple HTML5 document [4] (Hickson et al., 2014), by specifying the generic HTML DOCTYPE followed by the document element html with the usual namespace (“http://www.w3.org/1999/xhtml”) and with additional (and mandatory) prefix declarations through the attribute prefix [5]. The element html contains the element head for defining metadata of the document according to the DCTERMS (http://dublincore.org/documents/dcmi-terms/) and PRISM (http://www.prismstandard.org/) standards and the element body for including the whole content of the document. The element head of a RASH document must include some information about the paper, i.e., the paper title (element title) and at least one author, while other related information (i.e., affiliations, keywords and categories, included using the elements meta and link) is optional. The element body mainly contains textual elements (e.g., paragraphs, emphases, links, and quotations) for describing the content of the paper, and other structural elements (e.g., abstract, sections, references, and footnotes) used to organise the paper in appropriate blocks and to present specific complex structures (e.g., figures, formulas, and tables).
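The document skeleton described above can be checked mechanically. The following is a minimal sketch, assuming the document is serialised as well-formed XHTML so that Python's standard `xml.etree.ElementTree` can parse it; the function names `parse_prefix_attribute` and `skeleton_problems` are hypothetical, and this is not the validator shipped with the RASH Framework. It checks only what the paragraph states explicitly: the root element and its namespace, the two mandatory prefixes, a non-empty title, and the presence of body. The "at least one author" requirement is deliberately left out, because the text above does not say how authors are encoded.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Mandatory prefixes, as listed in the paper's note on prefix declarations.
MANDATORY_PREFIXES = {
    "schema": "http://schema.org/",
    "prism": "http://prismstandard.org/namespaces/basic/2.0/",
}


def parse_prefix_attribute(value):
    """Turn 'p1: iri1 p2: iri2' (RDFa prefix syntax) into a dict."""
    tokens = value.split()
    return {name.rstrip(":"): iri for name, iri in zip(tokens[0::2], tokens[1::2])}


def skeleton_problems(xhtml_text):
    """Return a list of human-readable problems; an empty list means none found."""
    root = ET.fromstring(xhtml_text)
    problems = []
    if root.tag != f"{{{XHTML_NS}}}html":
        problems.append("root element is not html in the XHTML namespace")
    declared = parse_prefix_attribute(root.get("prefix", ""))
    for name, iri in MANDATORY_PREFIXES.items():
        if declared.get(name) != iri:
            problems.append(f"missing or wrong prefix declaration: {name}")
    title = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")
    if title is None or not (title.text or "").strip():
        problems.append("head has no non-empty title")
    if root.find(f"{{{XHTML_NS}}}body") is None:
        problems.append("no body element")
    return problems
```

Returning a list of problems rather than a single boolean is a design choice of this sketch: it lets an author see every missing piece of the skeleton in one pass. A complete check would instead validate the document against the RelaxNG grammar of RASH.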
+In the following subsection, we provide a quick discussion about usage patterns in +RASH, and introduce the tools used for developing its grammar. +Development and patterns +The development of RASH started from the whole HTML5 grammar, and proceeded by +removing and restricting the particular use of HTML elements, to make them expressive +enough for representing the structures of scholarly papers and to have the language totally +compliant with the theory on structural patterns for XML documents (Di Iorio et al., 2014) +introduced in ‘Theoretical foundations: structural patterns’. +The systematic use of these structural patterns is an added value in all stages of the +documents’ lifecycle: they can be guidelines for creating well-engineered documents and +vocabularies, rules to extract structural components from legacy documents, indicators +to study to what extent documents share design principles and community guidelines. +All these characteristics have allowed us to simplify, at least to some extent, the handling +of all the requirements introduced in ‘Introduction’ and ‘Which ‘‘Web-first’’ Format for +Research Articles?’ in RASH. Table 2 shows what is the current pattern assignment for each +element in RASH. +Notice that we do not use two patterns presented in ‘Theoretical foundations: structural +patterns’, namely atom and popup. The elements compliant with the former pattern are +contained in discursive blocks (e.g., paragraphs) and contain only textual content with no +additional elements. This is very infrequent in scholarly writings since any element used for +emphases, links, and other in-sentence elements can always contain additional elements +(e.g., an emphasis can contain a link). +A different discourse can be done for the pattern popup, which is meant to represent +complex substructures that interrupt but do not break the main flow of the text, such +as footnotes (Di Iorio et al., 2014). 
An element compliant with the popup pattern, while still
+not directly allowing text content inside itself, is found in elements with a mixed context
+[t+s+]. In particular, in developing RASH, we discussed which of the following two
+possible approaches for defining footnotes was more adequate to our needs.
+Table 2 The use of structural patterns in RASH.
+Pattern     RASH element
+inline      a, code, em, math, q, span, strong, sub, sup, svg
+block       figcaption, h1, p, pre, th
+popup       none
+container   blockquote, body, figure, head, html, li, ol, section, table, td, tr, ul
+atom        none
+field       script, title
+milestone   img
+meta        link, meta
+The first option was a container-based behaviour, also suggested by JATS (National
+Information Standards Organization, 2012) by means of the element fn-group and not
+included in HTML specifications, that allows the authors to specify footnotes (through
+the element fn) by using a tag that is totally separated from the main text from which it is
+referenced (usually through XML attributes), as shown in the following excerpt:
+<!-- A paragraph referring to a footnote -->

+<p>
+  In this paragraph there is an explicit reference to the
+  second footnote <xref rid="n2"></xref>.
+</p>
+<!-- The group containing all the footnotes -->
+<fn-group>
+  <fn id="n1">
+    <p>This is a paragraph within a footnote.</p>
+  </fn>
+  <fn id="n2">
+    <p>This is a paragraph in another footnote.</p>
+    <p>
+      All the footnotes are contained in a group, so as
+      to collect them together.
+    </p>
+  </fn>
+  ...
+</fn-group>
+The alternative was a popup-based behaviour, used by default in LaTeX (through the
+marker \footnote{}) and even possible in JATS (which is a very permissive language
+by design), where a paragraph can be abruptly interrupted by one or more paragraphs
+specified in a footnote, as shown in the following excerpt:
+<!-- A paragraph containing a footnote -->

+<p>
+  In this paragraph the footnote <fn id="n3"><p>That is
+  what we call popup-based behaviour!</p></fn> has been
+  defined directly within it.
+</p>

+We considered the latter approach a bit confusing, since it actually decreases the
+readability of the HTML source where footnotes are needed. We thus decided to adopt a
+solution similar to the JATS fn-group element, extending the HTML5 section element
+with @role set to doc-endnotes and doc-endnote:
+<!-- A paragraph referring to a footnote -->
+<p>
+  In this paragraph there is an explicit reference to the
+  second footnote <a href="#fn2"></a>.
+</p>
+<!-- The group containing all the footnotes -->
+<section role="doc-endnotes">
+  <section role="doc-endnote" id="fn1">
+    <p>This is the text of a footnote.</p>
+  </section>
+  <section role="doc-endnote" id="fn2">
+    <p>This is the text of another footnote.</p>
+  </section>
+  ...
+</section>
+Grammar and peculiarities
+The formal grammar of RASH
+(https://raw.githubusercontent.com/essepuntato/rash/7ef4c2f2ea63575fb32f17e826d60333543eda67/grammar/rash.rng)
+(current version: 0.6.1)
+has been developed by means of RelaxNG (Clark & Makoto, 2001), which is a simple,
+easy to learn, and powerful schema language for XML. The grammar has been
+organised in four distinct logical blocks of syntactic rules, defining respectively elements,
+attributes, content models[6] for the elements and their related attribute lists, as summarised
+in the following excerpt:
+...
+<define name="p">
+  <element name="p">
+    <ref name="attributes_html_element_no_role"/>
+    <ref name="cm_inline"/>
+  </element>
+</define>
+...
+<define name="aClass">
+  <attribute name="class">
+    <data type="NMTOKENS"/>
+  </attribute>
+</define>
+...
+<define name="cm_inline">
+  <zeroOrMore>
+    <choice>
+      <text/>
+      <ref name="a"/>
+      <ref name="aRef"/>
+      <ref name="img"/>
+      <ref name="svg"/>
+      <ref name="math"/>
+      <ref name="img_math"/>
+      <ref name="span_latex"/>
+      <ref name="span"/>
+      <ref name="code"/>
+      <ref name="sub"/>
+      <ref name="sup"/>
+      <ref name="em"/>
+      <ref name="strong"/>
+      <ref name="q"/>
+    </choice>
+  </zeroOrMore>
+</define>
+...
+<define name="attributes_html_element_no_role">
+  <ref name="attributes_html_generic"/>
+  <optional>
+    <ref name="aClass"/>
+  </optional>
+  <ref name="attributes_rdfa"/>
+</define>
+...
+[6] The content model of an element is the particular organisation of its content in
+terms of text, attributes and elements that it can contain.
+[7] In the paper, for the sake of clarity, we use the prefix “@” when we name attributes
+(e.g., the attribute named “role” is introduced as @role), while we just name
+elements with their name (e.g., section).
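To make the idea of a content model more concrete, the cm_inline rule of the grammar excerpt can be approximated in a few lines. The sketch below is not a RelaxNG validator: it only checks the direct children of a fragment against a fixed set of element names, and it assumes (this is not stated in the excerpt) that the rules aRef, img_math and span_latex are specialisations of the elements a, img and span respectively. The function name inline_violations is hypothetical.

```python
import xml.etree.ElementTree as ET

# Element names admitted by the cm_inline excerpt. Assumption: aRef, img_math
# and span_latex are specialisations of a, img and span, so they add no names.
CM_INLINE = {"a", "img", "svg", "math", "span", "code",
             "sub", "sup", "em", "strong", "q"}


def inline_violations(fragment):
    """Return the names of direct child elements not allowed by CM_INLINE.

    Text nodes are always allowed (the <text/> branch of the choice), so
    only element children of the fragment root are inspected.
    """
    root = ET.fromstring(fragment)
    return [child.tag for child in root if child.tag not in CM_INLINE]


if __name__ == "__main__":
    ok = '<p>Some <em>text</em> and a reference <a href="#fn2"></a>.</p>'
    bad = "<p>Text <section><h1>Nested</h1></section></p>"
    print(inline_violations(ok))
    print(inline_violations(bad))
```

A paragraph containing only text, em and a yields an empty list, whereas a paragraph that wrongly nests a section is reported. Full validation, including attributes and nested content models, still requires the RelaxNG grammar itself.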
+Starting from the latest versions of the language, there has been a clear shift towards +an extended use of HTML5 semantic elements, despite the fact they are not backwards +compatible with their more generic alternatives in HTML4 (Raggett, Le Hors & Jacobs, +1999). In particular, the elements section, figure, and figcaption have been adopted +so as to clearly refer to paper sections and boxes with tables, figures, listings and formulas, +accompanied by a particular caption. +While this choice has fostered the readability of the source, the use of these HTML5 +elements was not enough to provide proper semantics and accessibility to the RASH source. +Thus, in order to improve the user experience in terms of accessibility of such HTML-based +papers, RASH reuses some items from the W3C Accessible Rich Internet Applications 1.1 +(Diggs et al., 2015), and also exploits several roles introduced in the Digital Publishing +WAI-ARIA Module 1.0 (Garrish et al., 2016), which allows the ‘‘digital publishers to +apply the structural semantics they need to drive the authoring process while getting free +accessibility’’ (https://lists.w3.org/Archives/Public/public-dpub-aria/2016Feb/0032.html). +The use of such semantics is implemented by means of the attribute @role7 +, that can be +used on certain RASH elements, e.g., sections, and it is very useful for specifying a clear +structural semantics where it is not formally defined. For instance, all the references are +organised in a list within a special section defined by using the element section with the +attribute @role set to ‘‘doc-bibliography’’. This special section contains one list with a +bibliographic reference for each list item (i.e., the element li accompanied by the attribute +@id for referencing to it within the text and the attribute @role set to ‘‘doc-biblioentry’’), +as shown in the following excerpt: +< section role =" doc - bibliography " > +

References +
    +< li id =" Per2014 " role =" doc - biblioentry " > +

    Write here the reference entry .

    + +... + + +Formulas require special consideration, since there are different ways to implement them. +The standard specification for representing mathematics on the Web is MathML (Carlisle, +Ion & Miner, 2014). Even if MathML is the best accessible way for writing mathematical +formulas, the organisation of the elements for defining even a simple formula is quite +verbose and this is a reasonable obstacle to its direct adoption, as shown in the following +excerpt for describing the formula πr +2 +: +< math xmlns =" http :// www . w3 . org /1998/ Math / MathML " > +π + +< msup > +r +2 + + +To help the creator of RASH documents in dealing with formulas, RASH adds two other +ways for writing formulas in addition to MathML. The first one is to use an image (element +img), which is a very simple way to include maths in a paper. On the other hand, it is +not accessible at all since the various elements of the formula are not marked-up properly +Peroni et al. (2017), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.132 14/35 +so as to distinguish them. Another option is to use LaTeX (or, alternatively, ASCIIMath: +http://asciimath.org), which is one of the most common ways to write formulas in many +scientific papers. Both options are specifiable in RASH by using either the element img +or the element span respectively, accompanied by the attribute @role set to ‘‘math’’, as +shown in the following excerpt: +<-- Specifying a formula through the element 'img ' --> +< img role =" math " src =" formula . png " alt =" r ^2" / > +<-- Specifying a formula in LaTeX through the element 'span ' --> +< span role =" math " >\ pi r ^2 +The rendering of any LaTeX or ASCIIMath formula and the multi-browser support +for MathML is implemented by using MathJax (https://www.mathjax.org/), which is a +Javascript display engine for mathematics that works in most modern browsers. 
Of course, +it is necessary to explicitly import MathJax in the element head if any rendering of formulas +is actually needed, as shown in the following: + +< script src =" https :// cdnjs . cloudflare . com / ajax / libs / mathjax /2.7.1/ MathJax . js ? config = TeX - AMS - +MML_HTMLorMML " > + +RASH has been developed in order to allow anyone to add RDFa annotations (Sporny, +2015) to any element of the document. For instance, this paragraph contains the following +RDF statement (in Turtle (Prud’hommeaux & Carothers, 2014)): +@prefix cito : < http :// purl . org / spar / cito / > . +<> cito : credits < http :// www . w3 . org / TR / rdfa - syntax / > . +That was implemented by using specific RDFa attributes (@property and @resource, +in this case) within the paragraph content, while the prefixes were defined in the element +html, as shown in the following excerpt: +< html prefix =" cito : http :// purl . org / spar / cito /" > +... +

    +RASH has been developed in order to allow anyone to add +< span +property =" cito : credits " +resource =" http :// www . w3 . org / TR / rdfa - syntax /" > RDFa +annotations to any element of the document . +

    +... + +In addition to RDFa, RASH makes available another way to inject RDF statements +(Cyganiak, Wood & Lanthaler, 2014) to the document, by means of an element script +(within the element head): +• with the attribute type set to ‘‘text/turtle’’ for adding plain Turtle content +(Prud’hommeaux & Carothers, 2014); +• with the attribute type set to ‘‘application/ld+json’’ for adding plain JSON-LD content +(Sporny, Kellogg & Lanthaler, 2014); +• with the attribute type set to ‘‘application/rdf+xml’’ for adding plain RDF/XML +content (Gandon & Schreiber, 2014). +An example of the use of the script for Turtle and JSON-LD statements is shown in +the following excerpt: +Peroni et al. (2017), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.132 15/35 +< script type =" text / turtle " > +@prefix pro : & lt ; http :// purl . org / spar / pro /& gt ; . +@prefix foaf : & lt ; http :// xmlns . com / foaf /0.1/& gt ; . +@prefix sd : & lt ; https :// w3id . org / scholarlydata / person /& gt ; . +sd : silvio - peroni a foaf : Person ; +foaf : givenName " Silvio " ; +foaf : familyName " Peroni " ; +foaf : homepage & lt ; http :// www . essepuntato . it & gt ; ; +pro : holdsRoleInTime [ +a pro : RoleInTime ; +pro : withRole pro : author ; +pro : relatesToDocument & lt ;& gt ; +] . + +< script type =" application / ld + json " > +{ +" @context ": +{ +" nick ": " http :// xmlns . com / foaf /0.1/ nick " , +" sd ": " https :// w3id . org / scholarlydata / person /" +} , +" @id ": " sd : silvio - peroni " , +" nick ": [" S ." , " essepuntato "] +} + +It is worth noticing that RASH does not require any particular vocabulary for introducing +RDF statements, except three properties from schema.org (http://schema.org) for defining +author’s metadata (see the RASH documentation (https://rawgit.com/essepuntato/rash/ +master/documentation/index.html#metadata) for additional details). 
For instance, in
+this document (in particular, in its RASH version
+(https://w3id.org/people/essepuntato/papers/rash-peerj2016.html)) we mainly use CiTO
+(Peroni & Shotton, 2012) and other
+SPAR Ontologies (Peroni, 2014a) for creating citation statements about the paper itself,
+but alternative and/or complementary vocabularies are freely usable as well.
+THE RASH FRAMEWORK
+One of the issues we had to face, and in general anyone has to face when proposing a new
+markup language, was to provide tools for writing papers in RASH. It is undeniable that:
+• not all the potential authors are able (or willing) to write scholarly articles in HTML,
+even within the Web community;
+• not all the potential authors are able (or willing) to manually add additional semantic
+annotations, even within the Semantic Web community.
+The authorial activity of writing an article by using RASH, but also any other new
+Web-first format, must be supported by appropriate interfaces and tools to reach a broad
+adoption.
+A possible solution was to implement a native HTML authoring environment, so that
+authors did not have to deal directly with the new language. However, this solution would
+have forced all co-authors to use the same tool and introduced a variety of technical
+difficulties, since it is not easy to create and support a user-friendly and flexible work
+environment. We believe that a more liberal approach, which allows each author to keep
+using her/his preferred tools, even off-line, is more practical.
+Figure 1 RASH Framework. The RASH Framework and its main components.
+This is the idea behind the RASH Framework (https://github.com/essepuntato/rash)
+(Peroni, 2017): a set of specifications and writing/conversion/extraction tools for
+writing articles in RASH. In this section, we give a brief description of all the tools
+we have developed in the framework.
All the software components are distributed
+under an ISC License (http://opensource.org/licenses/ISC), while the other components
+are distributed under a Creative Commons Attribution 4.0 International License
+(http://creativecommons.org/licenses/by/4.0/). A summary of the whole framework is
+introduced in Fig. 1.
+Validating RASH documents
+RASH has been developed as a RelaxNG grammar (Clark & Makoto, 2001), i.e., using a
+well-known schema language for XML documents. All the markup items it defines are fully
+compatible with the HTML5 specifications (Hickson et al., 2014).
+In order to check whether a document is compliant with RASH, we developed a script
+(https://github.com/essepuntato/rash/blob/master/tools/rash-check.sh) to enable RASH
+users to check their documents both against the specific requirements
+in the RASH RelaxNG grammar and against the HTML specification through the W3C
+Nu HTML Checker (http://validator.w3.org/nu/). This will hopefully help RASH users to
+detect and fix any mistakes in their documents in a timely manner. This script also checks
+datatype microsyntaxes.
+In addition to the aforementioned script, we developed a Python application
+(https://github.com/essepuntato/rash/tree/master/tools/rash-validator) that enables one to
+validate RASH documents against the RASH grammar. This application also provides
+a Web interface for visualising all the validation issues retrieved in RASH documents.
+Visualising RASH documents
+The visualisation of a RASH document is rendered by the browser by means of appropriate
+CSS3 (http://www.w3.org/Style/CSS/specs.en.html) stylesheets (Atkins Jr, Etemad &
+Rivoal, 2017) and Javascript developed for this purpose.
+RASH adopts external libraries, such as Bootstrap (http://getbootstrap.com/) and JQuery
+(http://jquery.com/), in order to provide the current visualisation and include additional
+tools for the user.
Examples include the footbar with statistics about the paper (i.e., number of
+words, figures, tables and formulas) and a menu to change the actual layout of the page[8],
+the automatic reordering of footnotes and references, the visualisation of the metadata of
+the paper, etc.
+[8] The layouts currently available are Web-based and Springer's Lecture Notes in
+Computer Science (http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0);
+the latter is based on the Springer LNCS CSS included in dokieli
+(http://dokie.li) (Capadisli et al., 2017).
+Note that this kind of automatic rendering of paper items, such as references to a
+bibliographic entry or a figure, reduces the cognitive effort of an author when writing a
+RASH paper. For instance, a piece of text referencing a table, e.g., “as shown in Table 2”, is
+created without caring about the particular text to specify for that reference (“Table 2” in
+the example), since RASH only requires an empty link to the object one wants
+to refer to, as shown in the following excerpt:

    ... as shown in ... +For these objects, the Javascript scripts decide which is the most suitable text to put there +according to the type of the item referenced. +Converting RASH into LaTeX styles +We spent some effort in preparing XSLT 2.0 documents (Kay, 2007) for converting +RASH documents into different LaTeX styles, such as ACM ICPS (http://www.acm. +org/sigs/publications/proceedings-templates) and Springer LNCS (http://www.springer. +com/computer/lncs?SGWID=0-164-6-793341-0), among the others. We believe this is +essential to foster the use of RASH within international events and to easily publish RASH +documents in the official LaTeX format currently required by the organisation committee +of such events. Obviously, the full adoption of RASH or any other Web-first format would +make these stylesheets not necessary but, currently, they are fundamental for the adoption +of the overall approach. +Producing RASH from ODT and DOCX +We also developed two XSLT 2.0 documents to perform conversion from Apache +OpenOffice documents (https://github.com/essepuntato/rash/blob/master/xslt/fromodt.xsl) and Microsoft Word documents (https://github.com/essepuntato/rash/blob/ +master/xslt/from-docx.xsl) into RASH documents. The RASH documentation provides a +detailed description of how to use Apache OpenOffice (https://rawgit.com/essepuntato/ +rash/master/documentation/rash-in-odt.odt) and Microsoft Word (https://rawgit. +com/essepuntato/rash/master/documentation/rash-in-docx.docx) for writing scientific +documents that can be easily converted to the RASH format. The standard features of these +two editors (e.g., styles, document properties, etc.), elements (e.g., lists, pictures, captions, +footnotes, hyperlinks, etc.) and facilities (e.g., mathematical editor, cross-reference editor, +etc.) can be used to produce fully compliant RASH documents. 
A web-based service
+for converting documents online (presented in ‘ROCS’) and two Java applications
+for ODT (https://github.com/essepuntato/rash/tree/master/tools/odt2rash) and DOCX
+(https://github.com/essepuntato/rash/tree/master/tools/docx2rash) documents (that can
+be downloaded and used offline on the local machine) were developed to facilitate the
+conversion process of Apache OpenOffice and Microsoft Word documents into the RASH
+format.
+In the past few years, as a sort of alpha-testing, we have used these conversion approaches
+with many internal projects in the Digital and Semantic Publishing Laboratory of
+the Department of Computer Science and Engineering at the University of Bologna.
+Moreover, our co-authors and collaborators from different disciplines (e.g., business
+and management, humanities, medicine, etc.) have successfully used this approach for
+producing their documents, giving us a chance to have fruitful feedback, comments, and
+suggestions. In particular, we have been able to convert with reasonable success several ODT
+and DOCX files of research papers, PhD theses, documentation, and project proposals
+and deliverables.
+ROCS
+Figure 2 ROCS. The architecture of ROCS.
+We created an online conversion tool called ROCS (RASH Online Conversion Service)
+(http://dasplab.cs.unibo.it/rocs) (Di Iorio et al., 2016) for supporting authors in writing
+RASH documents and preparing submissions that could be easily processed by journals,
+workshops, and conferences. ROCS integrates the tools introduced in the previous sections.
+The abstract architecture of the tool is shown in Fig. 2. ROCS allows converting either
+an ODT document or a DOCX document, written according to specific guidelines, into
+RASH and, then, into LaTeX according to the following layouts: Springer LNCS, ACM
+ICPS, ACM Journal Large, PeerJ.
Such guidelines, introduced in ‘Producing RASH from +ODT and DOCX’, are very simple and use only the basic features available in Apache +OpenOffice Writer and in Microsoft Word, without any external tool or plug-in. +ROCS allows users to upload four kinds of file, i.e., an ODT document, a DOCX +document, an HTML file compliant with RASH, and a ZIP archive which contains an +Peroni et al. (2017), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.132 19/35 +9The source code and binaries of SPAR +Xtractor are available at https://github. +com/essepuntato/rash/tree/master/sources/ +spar-xtractor and https://github.com/ +essepuntato/rash/tree/master/tools/sparxtractor, respectively. +10The prefix po: stands for the namespace +http://www.essepuntato.it/2008/12/ +pattern#. +HTML file compliant with RASH and related files (i.e., CSSs, javascript files, fonts, +images). It returns a ZIP archive containing the original document plus all its converted +versions, i.e., RASH, if an ODT/DOCX file was given, and the LaTeX file. +The main advantage of having the paper both in RASH and in LaTeX is that it is fairly +easy for RASH to be adopted by workshops, conferences or journals. Since the program +committee, the reviewers, and the editors have also access to a LaTeX or a PDF version of +the paper, the RASH file is an addition that does not preclude any current workflows. Of +course, the hope is that the inherent advantages of an HTML-based format such as RASH +will eventually persuade stakeholders to adopt the HTML version whenever it is possible, +keeping the alternatives as fall-back options. 
+Enriching RASH documents with structural semantics +Another development of the RASH Framework concerns the automatic enrichment of +RASH documents with RDFa annotations defining the actual structure of such documents +in terms of the FRBR-aligned Bibliographic Ontology (FaBIO) (http://purl.org/spar/fabio) +and the Document Component Ontology (DoCO) (http://purl.org/spar/doco) (Constantin +et al., 2016). More in detail, we developed a Java application called SPAR Xtractor suite9 +. +SPAR Xtractor is designed as a one-click tool able to add automatically structural semantics +to a RASH document. SPAR Xtractor takes a RASH document as input and returns a new +RASH document where all its markup elements have been annotated with their actual +structural semantics by means of RDFa. The tool associates a set of FaBIO or DoCO types +with specific HTML elements. The set of HTML elements and their associations with +FaBIO or DoCO types can be customised according to specific needs of expressivity. The +default association provided by the current release of SPAR Xtractor is the following: +• the root html element is mapped to an individual of the class fabio:Expression +(http://purl.org/spar/fabio/Expression). The class fabio:Expression identifies the +specific intellectual or artistic form that a work takes each time it is realised; +• the body element is mapped to an individual of the class doco:BodyMatter +(http://purl.org/spar/doco/BodyMatter). 
The class doco:BodyMatter is the central
+principal part of a document: it contains the real document content, and it is subdivided
+hierarchically by means of sections;
+• p elements are represented as individuals of the class doco:Paragraph
+(http://purl.org/spar/doco/Paragraph), i.e., self-contained units of discourse that deal with a
+particular point or idea;
+• figure elements containing the element img within a paragraph are represented as
+individuals of the class doco:FigureBox (http://purl.org/spar/doco/FigureBox), which
+is a space within a document that contains a figure and its caption;
+• section elements are mapped to individuals of the class doco:Section
+(http://purl.org/spar/doco/Section), which represents a logical division of the text. Sections
+can be organised according to a variable level of nested sub-sections. Accordingly, SPAR
+Xtractor reflects this structural behaviour by representing the containment relation
+by means of the object property po:contains
+(http://www.essepuntato.it/2008/12/pattern#contains)[10]. For example, a certain section
+element with a nested section
+element produces two individuals of the class doco:Section (e.g., :section_outer
+a doco:Section and :section_inner a doco:Section) related by the property
+po:contains (e.g., :section_outer po:contains :section_inner).
+In addition to these semantic annotations, which come from the actual structure
+of a document, the tool is also able to automatically detect sentences and annotate
+them as individuals of the class doco:Sentence (http://purl.org/spar/doco/Sentence). A
+doco:Sentence denotes an expression in natural language forming a single grammatical
+unit.
For the sentence detection task, SPAR Xtractor relies on the sentence detection +module of the Apache OpenNLP project (https://opennlp.apache.org/), which provides +a machine learning based toolkit for the processing of natural language text. By default, +SPAR Xtractor is released to support English only. However, it is possible to extend it with +new languages by adding their corresponding models for Apache OpenNLP, most of which +are available with an open licence (http://opennlp.sourceforge.net/models-1.5/). +We remark that the object property po:contains is used for representing any kind +of containment relation among the structural components that SPAR Xtractor deals +with. Hence, the usage of such a property is not limited to the individuals of the class +doco:Section only. In fact, the property po:contains can be used, for example, for +expressing the containment relation between a doco:BodyMatter and a doco:Section or +between a doco:Section and a doco:Sentence. For example, let us consider the following +code snippets that provide a sample HTML document. +< html > +... +< body > +... +< section > < h1 > A section +... +

    This is a sentence . This is another sentence of this paragraph .

    +... +< section > < h1 > A sub - section ... +... + +... + + +The HTML document in the snippet above is enriched by SPAR Xtractor resulting in +the document reported in the snippet below. +< html +resource =" expression " +typeof =" http :// purl . org / spar / fabio / Expression " > +... +< body resource =" body " +typeof =" http :// purl . org / spar / doco / BodyMatter " +property =" http :// www . essepuntato . it /2008/12/ pattern # contains " > +... +< section resource =" section_outer " +typeof =" http :// purl . org / spar / doco / Section " +property =" http :// www . essepuntato . it /2008/12/ pattern # contains " > +< h1 resource =" section_outer / title " +typeof =" http :// purl . org / spar / doco / SectionTitle " > +< span property =" http :// purl . org / spar / c4o / hasContent " > +A section + + +... +

    +Peroni et al. (2017), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.132 21/35 +< span property =" http :// www . essepuntato . it /2008/12/ pattern # contains " +resource =" section_outer / paragraph -1/ sentence -1" +typeof =" http :// purl . org / spar / doco / Sentence " > +< span property =" http :// purl . org / spar / c4o / hasContent " > +This is a sentence . + + +< span property =" http :// www . essepuntato . it /2008/12/ pattern # contains " +resource =" section_outer / paragraph -1/ sentence -2" +typeof =" http :// purl . org / spar / doco / Sentence " > +< span property =" http :// purl . org / spar / c4o / hasContent " > +This is another sentence of this paragraph . + + +

    +... +< section resource =" section_inner " +typeof =" http :// purl . org / spar / doco / Section " +property =" http :// www . essepuntato . it /2008/12/ pattern # contains " > +< h1 resource =" section_inner / title " +typeof =" http :// purl . org / spar / doco / SectionTitle " " > +< span property =" http :// purl . org / spar / c4o / hasContent " > +A sub - section + + +... + +... + +... + + +Writing RASH documents with a native editor +A recent development of RASH is the RASH Javascript Editor (RAJE) (https://github.com/ +essepuntato/rash/tools/RAJE) (Spinaci et al., 2017), a multiplatform What You See Is What +You Get (WYSIWYG) word processor for writing scholarly articles in HTML, according +to the RASH format. In particular RAJE allows authors to write research papers in HTML +natively by means of a user-friendly interface, instead of writing raw markup with an IDE, +a text editor or any external word processor +RAJE guarantees to its users the benefits of a word processor combined with those given +by an HTML-based format, i.e., interactiveness, accessibility and easiness to be processed +by machine. In addition, RAJE uses the GitHub API (https://api.github.com/) so as to +allow authors to store their articles online, to keep track of changes by means of the GitHub +services, and to share the articles with others. +RASH AND SAVE-SD: AN EVALUATION +The true validation for RASH as a format for research papers rests on its adoption by +authors and workshops and its integration in the publishing process. For this reason, +RASH was first released in conjunction with the Semantics, Analytics, Visualisation: +Enhancing Scholarly Data (SAVE-SD 2015) workshop (http://cs.unibo.it/save-sd/2015/ +index.html), co-located with WWW 2015. It was subsequently adopted by a number of +workshops and conferences (https://github.com/essepuntato/rash/#rash-papers-acceptedin-scholarly-venues). 
In this section, we will present an evaluation of RASH based on +the analysis of questionnaires completed by authors and reviewers of SAVE-SD 2015 and +SAVE-SD 2016 (http://cs.unibo.it/save-sd/2016/index.html) workshops and a study on +RDF annotations in the relevant papers. +Peroni et al. (2017), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.132 22/35 +The users were asked to fill a questionnaire which included a section about their +background, a SUS questionnaire and six open questions about their experience with +RASH. We will first introduce the two workshops and then discuss and compare the +evaluation results. Finally, we will present an analysis of the most frequent vocabularies and +entities in RASH papers. The completed questionnaires and the outcomes of the analysis are +available at Osborne & Peroni (2016), while the RDF annotations considered in the study are +embedded in the RASH papers available in the SAVE-SD 2015 and SAVE-SD 2016 websites. +We used the online version of the RDFa 1.1 Distiller (https://www.w3.org/2012/pyRdfa/) +for extracting the RDF annotations from the RASH papers. +It is worth noting that in 2015 there were no converters in the RASH framework, and +ROCS was introduced immediately before SAVE-SD 2016. Thus, in both years authors +wrote RASH papers with plain text editors or XML editors, apart from one author that +used ROCS in 2016. In general, the authors appreciated RASH and the tools in the RASH +framework, even if the editing environment and the converters are still limited. +SAVE-SD 2015 and 2016 +SAVE-SD 2015 was organized by some of the authors of this paper with the aim of bringing +together publishers, companies, and researchers in order to bridge the gap between the +theoretical/academic and practical/industrial aspects in regard to scholarly data. 
It was thus a multifaceted workshop which drew researchers from a number of heterogeneous fields, such as Document and Knowledge Engineering, Semantic Web, Natural Language Processing, Scholarly Communication, Bibliometrics and Human–Computer Interaction. Since many of the interested researchers were keen on experimenting with novel technologies regarding semantic publishing, it was a natural choice for the debut of RASH. For this reason, SAVE-SD 2015 allowed authors to submit papers using either RASH or PDF, explicitly encouraging authors to test the new format. To this end, the organisers introduced a special award for the best submission in RASH, according to the quality of the markup, the number of RDF statements defined in RDFa, and the number of RDF links to LOD datasets. The possibility of submitting in RASH was also advertised on social media (e.g., Twitter (https://twitter.com/savesdworkshop), Facebook (https://www.facebook.com/savesdworkshop)) and during various international events (e.g., DL 2014 (http://www.city.ac.uk/digital-libraries-2014), EKAW 2014 (http://www.ida.liu.se/conferences/EKAW14/home.html), FORCE 2015 (https://www.force11.org/meetings/force2015)).

The initiative had substantial success: six of the 23 submissions received by the workshop were in RASH, and after the review process an additional author chose to prepare the camera-ready paper in RASH. Out of these seven final submissions, three were research papers, one was a position paper, and three were posters/demos. These papers were submitted by 16 authors from Switzerland, Italy, Germany, the Netherlands, the United Kingdom, Ireland, and the USA.

At the time of the workshop submission deadline, there were no public tools available for converting other formats into RASH. However, the authors were able to learn it by themselves simply by referring to the documentation page, confirming that computer scientists have no particular problem in handling it directly.
The conversion of the RASH submissions into the ACM format requested by the Sheridan publisher (responsible for the publication of all WWW proceedings) was handled by the organisers through a semi-automatic process. In particular, they used the XSLT files introduced in ‘Converting RASH into LaTeX styles’ and had to fix only a few layout misalignments.

Table 3 User background for SAVE-SD 2015, SAVE-SD 2016.
Year  MS Word  OO Writer  LaTeX  HTML  XML   RelaxNG  SW   RDFa  Turtle  JSON-LD
2015  33%      33%        83%    83%   100%  67%      83%  100%  100%    50%
2016  57%      0%         71%    71%   71%   29%      57%  57%   57%     43%
AVR   40%      13%        67%    67%   73%   40%      60%  67%   77%     40%

Six authors and four reviewers involved in SAVE-SD 2015 participated in our evaluation.

SAVE-SD 2016 had the same characteristics and goals as its predecessor. In order to give authors full freedom, the organisers decided to accept not only RASH, but any kind of HTML-based format. Since it was not possible to handle the conversion of all possible HTML-based formats to the publisher layout, the authors of alternative formats were asked to prepare a PDF of the camera-ready version according to the publisher's needs.

SAVE-SD 2016 received 6 out of 16 submissions in RASH from 14 authors from Italy, Sweden, Greece, Germany, Belgium, and the USA. In total, five out of the 14 accepted papers were in RASH, including two full papers, two demos, and one position paper. Even if no author chose to submit in other HTML-based formats, this possibility will be kept open in future editions. Differently from the previous edition, the proceedings were published as a dedicated LNCS volume. The conversion of RASH papers to PDF documents in the Springer LNCS layout was handled automatically by ROCS.

As in the previous workshop, we evaluated RASH by conducting the same study (with exactly the same questions) on ten people.
Seven authors of RASH papers and three reviewers participated in the survey.

User background

It is useful to first assess the background of RASH pioneer users in terms of their knowledge of relevant technologies and software. For this reason, the first section of the survey included a number of statements about the user's expertise (e.g., “I have extensive experience in writing academic papers with LaTeX”) and allowed five response options, from “Strongly Agree” to “Strongly Disagree”. Table 3 shows the percentage of users who claimed to be familiar with a range of technologies (by selecting “Agree” or “Strongly Agree”).

In 2015, the authors were mainly from the Semantic Web community and therefore familiar with technologies such as RDFa and Turtle. Most of them knew how to correctly annotate an HTML file and understood the advantages of including semantic relationships in the paper. They also commonly used LaTeX rather than Microsoft Word or OpenOffice Writer. This suggests that they were acquainted with non-WYSIWYG editors and had experience with complex formats. A qualitative analysis of the survey answers confirms this intuition; for example, an author remarked: “I am used to writing papers in LaTeX so I do not want to bother with formatting and in that sense RASH is similar”.

In 2016 the situation changed and only 57% of the users were familiar with semantic technologies. In addition, even if most of them knew how to use LaTeX, the majority of them also had experience with Microsoft Word. It thus seems that RASH started to attract less technical users with different research backgrounds as well.

User survey

We assessed the strengths and weaknesses of RASH by means of six open questions. We summarize here the answers of both authors and reviewers for the 2015 and 2016 workshops. The reviewers answered only questions 2, 3, 4 and 5.
Note that the questions were exactly the same in both editions and none of the users participated in both surveys.

SAVE-SD 2015 survey

• [Q1] Why did you choose the RASH format for your paper?
Four authors answered that the main reason was to try it out, mostly because they “supported the idea of publishing academic papers as HTML” and were convinced that “PDF should be replaced”. Two of them added that they were motivated by the possibility of adding semantic annotations to their papers.
• [Q2] How effectively did RASH support you in writing/reviewing the paper?
The majority of the authors suggested that some tasks, such as setting up the bibliography, were still cumbersome. They added that the development of tools that could solve these issues and hide the technical details from common users would be very important for a broader adoption. The reviewers remarked that their experience was very similar to reviewing a paper in PDF format and did not present any particular challenge (e.g., “did not have many features that would distinguish it from a PDF”, “it met all of my needs and was easy to use”).
• [Q3] What were the most useful features of RASH to help you writing/reviewing the paper?
The authors listed a number of functionalities including the multiple graphical layouts (two authors), the support of RDFa annotations (two) and the built-in validation (one). The ability to display the paper according to different layouts was also praised by reviewers.
• [Q4] What were the main weaknesses that RASH exhibited in supporting the writing/reviewing of the paper?
Most authors suggested that the handling of bibliography, figures and captions should be improved. Half of them also pointed out that the manual insertion of semantic annotations was cumbersome and that a large amount of RDFa “introduces a bit of confusion in the paper”.
An author observed that using the word count as a limit in the RASH venues rather than the number of pages introduces the issue of possibly exceeding the editor limits. Most reviewers did not report any problem in using RASH for assessing a paper. However, one of them noted that it still lacked a menu for easily navigating the different sections, a feature that PDF files support.
• [Q5] Can you think of any additional features to be included in RASH that would have helped you to write/review the paper?
The majority of authors suggested that the aforementioned limitations were mainly due to the use of an HTML editor, requesting the development of a WYSIWYG editor or a tool for converting from ODT to RASH. A user also suggested developing a tool for graphically showing the semantic annotations, as “what is linked to what, in order to check the correctness of assertions”, and a reviewer advised implementing a way to easily access the different sections of the document.
• [Q6] Would you use RASH regularly for writing your academic papers?
Five out of six authors answered they would like to keep using RASH. Most of them, however, added that this would also depend on the creation of a better editor and a solid array of tools for managing technical details and converting standard formats for writing a research paper to and from RASH.

SAVE-SD 2016 survey

• [Q1] Why did you choose the RASH format for your paper?
As with the 2015 results, the majority of the authors (four) claimed that they adopted it to try a new format, three authors because they were motivated by the workshop and three because they actively support the ideas behind RASH.
• [Q2] How effectively did RASH support you in writing/reviewing the paper?
Five users wrote the papers directly in RASH and only one used Open Office and then converted it with ROCS.
In the first group, one user was positive, one neutral, and three suggested the need for a WYSIWYG editor, since “writing in html is not so effective” and “not everyone [of the co-authors] knew how to validate against the schema”. In particular, the need for a Microsoft Word converter was raised, since the ODT produced by Microsoft Word could not be processed by ROCS. As in 2015, the reviewers did not find many differences with respect to PDF papers. One of them claimed to actually prefer RASH since it “makes better use of the page space”.
• [Q3] What were the most useful features of RASH to help you writing/reviewing the paper?
The authors mentioned a variety of different features including the formatting semantics (“no worries about section and layout”), the bibliographic reference management and the ability to display the paper according to different layouts. A reviewer also praised the ability to convert RASH to PDF.
• [Q4] What were the main weaknesses that RASH exhibited in supporting the writing/reviewing of the paper?
Differently from 2015, the authors had no particular problem with the handling of bibliography, figures, and captions. However, most of them (five) remarked that directly writing the HTML code was not trivial. Three of them suggested solving the problem by introducing a WYSIWYG editor, while two of them suggested creating new converters to translate LaTeX and Microsoft Word into RASH. One user also flagged that the visualization of RASH documents can change in different browsers. The reviewers, as in 2015, did not report any particular problem in using RASH.
• [Q5] Can you think of any additional features to be included in RASH that would have helped you to write/review the paper?
Consistently with the aforementioned weaknesses and the 2015 results, the users called for the creation of a WYSIWYG editor (3) and a way to convert from LaTeX and Microsoft Word (3). In addition, a user suggested a tool for automatically generating a bibliography, similar to BibTeX.
• [Q6] Would you use RASH regularly for writing your academic papers?
Three authors asserted that they would be happy to keep using RASH, two that they were ready to use it again, depending on its development, and only one was negative about it.

RASH usability

We also performed a quantitative analysis of the usability of RASH, using the System Usability Scale (SUS) questionnaire (Brooke, 1996). The scores are acceptable, though not very high, especially if we consider that all authors but one edited RASH files directly with text/XML editors. Users perceived even a ‘vanilla RASH’ as acceptable, though they need more sophisticated converters, as remarked in the open questions of the survey.

RASH yielded a mean score of 62.7 ± 11.9, slightly lower than the average SUS score (68). However, SUS scores varied dramatically according to the person's background. Figure 3 shows the results for different categories of expertise in HTML, LaTeX, and Semantic Web Technologies (SWT), which appear correlated with the average SUS scores (respectively r = 0.78, 0.97, 0.99). (Footnote 11: the authors who answered “Strongly Agree” to the background questions were classified as “Experts”, the ones who answered “Agree” as “Familiars”, and all the others as “Not familiar”.) Users with strong expertise in LaTeX and SWT yielded significantly better SUS scores than the other authors, while authors with HTML expertise yielded only slightly better scores. For this reason, authors from 2015, who as previously discussed had higher expertise in these categories, obtained an average SUS score of 69.6 ± 11.9, while the authors from 2016 yielded 57.1 ± 9.7.
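
For readers unfamiliar with how SUS scores such as those above are obtained, the following sketch applies the standard scoring rule of Brooke (1996) to one respondent's ten answers and then summarises a group of scores. It is illustrative only: the function names are hypothetical, it is not the script used in this study, and reading the ‘±’ values as sample standard deviations is an assumption.

```python
from statistics import mean, stdev


def sus_score(responses):
    """Return the 0-100 SUS score for one respondent.

    `responses` holds the ten item answers on a 1-5 scale
    (1 = Strongly Disagree, 5 = Strongly Agree), in questionnaire order.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten answers, each between 1 and 5")
    total = 0
    for item, answer in enumerate(responses, start=1):
        if item % 2 == 1:
            # Odd items are positively worded: contribution is answer - 1.
            total += answer - 1
        else:
            # Even items are negatively worded: contribution is 5 - answer.
            total += 5 - answer
    # The summed contributions (0-40) are rescaled to 0-100.
    return total * 2.5


def summarize(scores):
    """Return (mean, sample standard deviation) of a list of SUS scores.

    Assumption: the '±' figures in the text are standard deviations.
    """
    return mean(scores), stdev(scores)
```

A respondent who answers 3 to every item obtains a score of 50, the midpoint of the scale; the benchmark of 68 mentioned above is a commonly cited average across SUS studies.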
However, the difference is not statistically significant because the two samples are small and the test power is low.

These results further confirm that most users with limited expertise in non-WYSIWYG editors and semantic technologies find it unfeasible to write HTML directly, even in a simplified form.

Figure 3 Expertise vs. perceived usability. User expertise in HTML, LaTeX and Semantic Web Technologies versus average SUS score.

Analysis of RDF annotations in RASH documents

To complete the previous analysis, we also studied the nature of the semantic annotations in RASH papers. We focused on a sample of 1,751 annotations obtained from 11 papers published in SAVE-SD 2015 and 2016. The number of statements in a single paper was found to range from 24 to 903, yielding a median value of 46 (25th percentile 34, 75th percentile 175). We extracted all the RDF statements by running the W3C RDFa 1.1 Distiller service (https://www.w3.org/2012/pyRdfa/) on each article. We then considered only the statements that used http-based entities as predicates, or their objects if used for typing resources. The data are organised in several CSV files and have been obtained by running a Python script we developed for gathering the data used in this evaluation. The script and all the aforementioned data have been made available at Osborne & Peroni (2016).

The first goal of the study was to determine the prevalent vocabularies and how much they were used in the average paper. The left panel of Fig. 4A shows the common vocabularies. Schema.org and PRISM are actually enforced by RASH: the first is used for standard metadata such as emails, affiliations and organization names and the second for keywords.

Figure 4 Average number of statements per vocabulary. Percentage of papers and average number of statements using a vocabulary.
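
The selection rule described in this section (take the predicate of each statement, or its object when the statement types a resource, and keep it only if it is an HTTP-based entity) can be sketched as follows. This is a minimal illustration and not the script released at Osborne & Peroni (2016): the function names, the representation of statements as plain string triples, and the small namespace table are all assumptions made for the example.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Illustrative namespace table (an assumption, not the study's full list).
VOCABULARIES = {
    "http://purl.org/dc/terms/": "Dublin Core",
    "http://xmlns.com/foaf/0.1/": "FOAF",
    "http://purl.org/spar/fabio/": "FaBiO",
    "http://purl.org/spar/cito/": "CiTO",
}


def relevant_entity(triple):
    """Return the entity to count for one (subject, predicate, object) triple.

    For typing statements the object (the class) is counted; otherwise the
    predicate is. Entities that are not HTTP IRIs are discarded (None).
    """
    _subject, predicate, obj = triple
    entity = obj if predicate == RDF_TYPE else predicate
    if entity.startswith(("http://", "https://")):
        return entity
    return None


def vocabulary_of(entity):
    """Map an entity IRI to a vocabulary name via its namespace prefix."""
    for namespace, name in VOCABULARIES.items():
        if entity.startswith(namespace):
            return name
    return "other"


def count_vocabularies(triples):
    """Count, per vocabulary, the statements of one paper that use it."""
    counts = Counter()
    for triple in triples:
        entity = relevant_entity(triple)
        if entity is not None:
            counts[vocabulary_of(entity)] += 1
    return counts
```

Running `count_vocabularies` on the statements of each paper and averaging the resulting counts over the papers would give figures of the kind plotted in Fig. 4, under the assumptions stated above.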
In addition, a number of RDF statements were automatically extracted when processing DPUB-ARIA roles (Garrish et al., 2016). Thus we will not consider such vocabularies in the rest of the evaluation. The other common vocabularies are Dublin Core, which appears in 82% of the papers, FOAF (27%) and the SPAR ontologies (Peroni, 2014a), such as FABIO (36%) and CITO (27%) (Peroni & Shotton, 2012). The right panel of Fig. 4B illustrates the average number of statements for each of these vocabularies. Dublin Core accounts for the highest number of annotations (9.4), followed by FOAF (7.4) and FABIO (6.4).

We also performed a more fine-grained analysis considering the number of entities of these vocabularies within the various RDF statements. The goal was to understand the percentage of contribution that the various entities provide (on average) to the statements of the documents analysed. As expected, the entities that contribute to about 60% of the statements are either those that are required by RASH (prism:keyword 6.9%, schema:affiliation 5.7%, schema:name 5.3%, and schema:email 4.7%) or those automatically extracted by processing the DPUB roles that must be included in the documents (xhtml:role 38%). Excluding these, the following top ten entities, shown in Fig. 5, cover about 20% of the statements.

Figure 5 Average number of entities per vocabulary. Average percentage of vocabulary entities in a RASH paper (excluding the mandatory ones).

Among these entities, there are three classes describing three diverse but interlinked kinds of objects, i.e., people (foaf:Person) authoring a research work (fabio:ResearchPaper) and the sentences (doco:Sentence) therein contained.
The other seven entities are three object properties (two of them, pav:authoredBy and pattern:contains, provide the links between the three aforementioned classes, while the third, cito:cites, describes citation links between papers) and four data properties, used for providing additional metadata about the entities (dcterms:title, dcterms:bibliographicCitation, foaf:name) and for describing chunks of textual content of the sentences (c4o:hasContent).

Discussion

The evaluation study confirmed that RASH is ready to be adopted in workshops, conferences, and journals and can be quickly learnt by researchers who are already familiar with HTML. However, it also highlighted some issues in the adoption of HTML formats, especially by less technically savvy users.

Interestingly, the 2016 survey showed that RASH is also being tried by users unfamiliar with semantic web technology. While the expansion of the user base represents a positive development, it also yields a number of challenges. The many authors accustomed to WYSIWYG editors such as Microsoft Word or OpenOffice Writer tend to have difficulties with HTML editors. In addition, since research papers are often written by multiple authors, it is usually simpler to use the most well-known solutions. For these reasons, we need to offer the authors who currently cannot or do not want to change their workflow the tools for converting their favourite format into RASH and annotating the resulting paper. While ODT was a first step in this direction, it is imperative to be able to process DOCX (which has already been implemented) and LaTeX as well. A second important issue is that authors who are not expert in semantic technologies can find it hard to correctly annotate their papers.
Hence, we also need to use and/or develop simple tools for helping authors in this phase, such as the OpenLink Structured Data Editor (http://osde.openlinksw.com/). The introduction of these solutions will be critical for motivating users to adopt HTML-based approaches and for creating a robust framework that can be used by expert and common users alike.

As far as the analysis of the RDF annotations in RASH documents is concerned, the outcomes highlighted that the users decided to adopt a few well-known standard vocabularies, rather than using a multiplicity of different solutions. The most used vocabularies other than Schema.org and PRISM (used by default by RASH) are Dublin Core, FOAF, and the SPAR ontologies. However, the outcomes of our evaluation generally show quite a low number of statements specified by the authors. This behaviour could derive from the lack of appropriate support for the annotation of RASH papers with RDF data. In addition, this low number seems not to be related to the research community the authors work in. For instance, several of the papers written by Semantic Web experts do not include any RDF statements other than those enforced by RASH.

CONCLUSIONS

In this paper we have introduced RASH, a markup language defined as a subset of HTML for writing scientific articles, and the RASH Framework, a set of specifications and tools for writing articles in RASH. In particular, we have discussed the rationale behind the development of RASH, and we have presented the language and the validation/visualisation/conversion/extraction/editing tools developed so far.

The goal of the paper was also to investigate the applicability and the potential of RASH, through the evaluation of its adoption in two SAVE-SD workshops. To the best of our knowledge, this is the first empirical evaluation of the adoption of HTML-based languages for writing scientific papers.
The experiments proved that RASH can be successfully used for workshops and conferences, with a good acceptance by the authors and a smooth integration in the existing publishing process.

As immediate future developments, we plan to develop tools for automating the process of semantic enrichment of RASH documents. For instance, we are currently working on the automatic identification of section rhetorics and citation functions so as to describe them according to two SPAR Ontologies (Peroni, 2014a), i.e., the Document Component Ontology (DoCO) (http://purl.org/spar/doco) and the Citation Typing Ontology (CiTO) (http://purl.org/spar/cito) respectively.

We intend to further develop the RASH framework. In the first instance, we are working on more sophisticated authoring tools and converters. For instance, we are currently developing additional XSLT documents in order to convert RASH documents into several different LaTeX formats for scholarly communications (such as IEEE conference proceedings and IOS Press) as well as into EPUB for easing its (offline) portability on mobile devices, which would guarantee better archival and accessibility of the whole document including its figures, CSS files, and JS scripts.

We are also experimenting with techniques for automatically generating accessible graphs from data contained in a referenced CSV file. Some results of this experimentation are already discussed in Di Mirri et al. (2017).

ACKNOWLEDGEMENTS

We would like to thank Sarven Capadisli (http://csarven.ca/) for our inspiring discussions on the topic, all the authors and the reviewers of the accepted papers of the SAVE-SD 2015 (http://cs.unibo.it/save-sd/2015/accepted-papers.html) and the SAVE-SD 2016 (http://cs.unibo.it/save-sd/2016/accepted-papers.html) workshops for having provided us with useful suggestions and insights for improving RASH and the related tools, as well as all the other early adopters of RASH. We would also like to thank the other two organisers of the past two editions of SAVE-SD, i.e., Jun Zhao (https://sites.google.com/site/junzhaohome/) and Alejandra Gonzalez-Beltran (http://www.oerc.ox.ac.uk/people/alejandra), for supporting us in the adoption of RASH as a possible HTML submission format. In addition, we are particularly grateful to all the GitHub users who suggested and introduced new features to RASH and developed the tools included in its Framework: Alberto Nicoletti (https://twitter.com/illbexyz), Vincenzo Rubano (https://twitter.com/titengodocchio), Mike Smith (https://sideshowbarker.net/), Gianmarco Spinaci (https://twitter.com/spino9330), Ruben Verborgh (http://ruben.verborgh.org).

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
The authors received no funding for this work.

Competing Interests
Silvio Peroni is an Academic Editor for PeerJ Computer Science.

Author Contributions
• Silvio Peroni conceived and designed the experiments, performed the experiments, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, performed the computation work.
• Francesco Osborne conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, performed the computation work.
• Angelo Di Iorio wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.
• Andrea Giovanni Nuzzolese and Francesco Poggi contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, performed the computation work.
• Fabio Vitali and Enrico Motta reviewed drafts of the paper.

Data Availability
The following information was supplied regarding data availability:
Osborne, Francesco; Peroni, Silvio (2017): Outcomes of SAVE-SD 2015 and 2016 questionnaires on RASH and analysis of RDF annotations in the RASH papers. figshare. https://dx.doi.org/10.6084/m9.figshare.3980463.v5.

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/peerj-cs.132#supplemental-information.

REFERENCES

Alexander C. 1979. The timeless way of building. Oxford: Oxford University Press.
Atkins Jr T, Etemad EJ, Rivoal F. 2017. CSS Snapshot 2017. W3C Working Group Note 31 January 2017. World Wide Web Consortium. Available at https://www.w3.org/TR/css3-roadmap/.
Berjon R, Ballesteros S. 2015. What is scholarly HTML? Available at http://scholarly.vernacular.io/.
Bourne PE, Clark T, Dale R, De Waard A, Herman I, Hovy EH, Shotton D. 2011. FORCE11 White Paper: improving The Future of Research Communications and e-Scholarship. White Paper, 28 October 2011. FORCE11. Available at https://www.force11.org/white_paper.
Brooke J. 1996. SUS-A quick and dirty usability scale. Usability Evaluation in Industry 189(194):4–7.
Capadisli S, Guy A, Verborgh R, Lange C, Auer S, Berners-Lee T. 2017. Decentralised authoring, annotations and notifications for a read-write web with dokieli. In: Proceedings of the 17th international conference on web engineering. Cham: Springer, 469–481 DOI 10.1007/978-3-319-60131-1_33.
Capadisli S, Riedl R, Auer S. 2015. Enabling accessible knowledge. In: Proceedings of the 2015 International conference for e-democracy and open government (CeDEM 2015). Krems: Universität Krems. Available at http://csarven.ca/enabling-accessible-knowledge.
Carlisle D, Ion P, Miner R. 2014. Mathematical Markup Language (MathML) Version 3.0. 2nd edition. W3C Recommendation 10 April 2014. World Wide Web Consortium. Available at http://www.w3.org/TR/MathML3/.
Clark J, Makoto M. 2001. RELAX NG specification. Committee specification, 3 December 2001. OASIS. Available at http://relaxng.org/spec-20011203.html.
Constantin A, Peroni S, Pettifer S, Shotton D, Vitali F. 2016. The Document Component Ontology (DoCO). Semantic Web 7(2):167–181 DOI 10.3233/SW-150177.
Cyganiak R, Wood D, Lanthaler M. 2014. RDF 1.1 concepts and abstract syntax. W3C recommendation 25 February 2014. World Wide Web Consortium. Available at http://www.w3.org/TR/rdf11-concepts/.
Di Iorio A, González Beltrán A, Osborne F, Peroni S, Poggi F, Vitali F. 2016. It ROCS!: the RASH online conversion service. In: WWW (Companion Volume) 2016. New York: ACM, 25–26 DOI 10.1145/2872518.2889408.
Di Iorio A, Peroni S, Poggi F, Vitali F. 2012. A first approach to the automatic recognition of structural patterns in XML documents. In: Proceedings of the 2012 ACM symposium on document engineering. New York: ACM, 85–94 DOI 10.1145/2361354.2361374.
Di Iorio A, Peroni S, Poggi F, Vitali F. 2014. Dealing with structural patterns of XML documents. Journal of the American Society for Information Science and Technology 65(9):1884–1900 DOI 10.1002/asi.23088.
Di Iorio A, Peroni S, Poggi F, Vitali F, Shotton D. 2013. Recognising document components in XML-based academic articles. In: Proceedings of the 2013 ACM symposium on document engineering. New York: ACM, 181–184 DOI 10.1145/2494266.2494319.
Di Mirri S, Peroni S, Rubano V, Salomoni P, Vitali F. 2017. Towards accessible graphs in HTML-based scientific articles. In: Proceedings of the 2nd international workshop on accessible devices and services (ADS 2017). IEEE DOI 10.1109/CCNC.2017.7983287.
Diggs J, Craig J, McCarron S, Cooper M. 2015. Accessible rich internet applications (WAI-ARIA) 1.1. W3C Candidate Recommendation 27 October 2016. World Wide Web Consortium. Available at http://www.w3.org/TR/wai-aria-1.1/.
Gamma E, Helm R, Johnson R, Vlissides J. 1994. Patterns: elements of reusable object-oriented software. New York: Addison-Wesley.
Gandon F, Schreiber G. 2014. RDF 1.1 XML syntax. W3C recommendation 25 February 2014. World Wide Web Consortium. Available at https://www.w3.org/TR/rdf-syntax-grammar/.
Gao S, Sperberg-McQueen CM, Thompson HS. 2012. W3C XML schema definition language (XSD) 1.1 Part 1: structures. W3C recommendation, 5 April 2012. World Wide Web Consortium. Available at https://www.w3.org/TR/xmlschema11-1/.
Garrish M, Siegman T, Gylling M, McCarron S. 2016. Digital publishing WAI-ARIA module 1.0. W3C candidate recommendation 15 December 2016. World Wide Web Consortium. Available at https://www.w3.org/TR/dpub-aria-1.0/.
Hickson I, Berjon R, Faulkner S, Leithead T, Doyle Navara E, O'Connor E, Pfeiffer S. 2014. HTML5: a vocabulary and associated APIs for HTML and XHTML. W3C recommendation 28 October 2014. World Wide Web Consortium. Available at http://www.w3.org/TR/html5/.
JTC1/SC34 WG 4. ISO/IEC 29500-1:2011 - Information technology - Document description and processing languages - Office Open XML File Formats - Part 1: Fundamentals and Markup Language Reference. 2011. Geneva: International Organization for Standardization. Available at http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=59575.
JTC1/SC34 WG 6. ISO/IEC 26300:2006 - Information technology - Open Document Format for Office Applications (OpenDocument) v1.0. 2006. Geneva: International Organization for Standardization. Available at http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=43485.
Kay M. 2007. XSL transformations (XSLT) version 2.0. W3C recommendation 23 January 2007. World Wide Web Consortium. Available at http://www.w3.org/TR/xslt20/.
Lin TTY, Beales G. 2015. ScholarlyMarkdown Syntax Guide. Guide, 31 January 2015. Available at http://scholarlymarkdown.com/Scholarly-Markdown-Guide.html.
National Information Standards Organization. 2012. JATS: journal article tag suite. American national Standard No. ANSI/NISO Z39.96-2012, 9 August 2012. Available at http://www.niso.org/apps/group_public/download.php/10591/z39.96-2012.pdf.
Osborne F, Peroni S. 2016. Outcomes of SAVE-SD 2015 and 2016 questionnaires on RASH and analysis of RDF annotations in RASH papers. Figshare. DOI 10.6084/m9.figshare.3980463.
Peroni S. 2014a. The semantic publishing and referencing ontologies. In: Semantic web technologies and legal scholarly publishing. Cham: Springer, 121–193.
Peroni S. 2014b. Semantic web technologies and legal scholarly publishing. In: Law, governance and technology series 15. Cham: Springer.
Peroni S. 2017. RASH framework 0.6.1. Zenodo DOI 10.5281/zenodo.815603.
Peroni S, Shotton D. 2012. FaBiO and CiTO: ontologies for describing bibliographic resources and citations. Web Semantics 17:33–43 DOI 10.1016/j.websem.2012.08.001.
Pettifer S, McDermott P, Marsh J, Thorne D, Villeger A, Attwood TK. 2011. Ceci n'est pas un hamburger: modelling and representing the scholarly article. Learned Publishing 24(3):207–220 DOI 10.1087/20110309.
Prud'hommeaux E, Carothers G. 2014. Turtle: Terse RDF triple language. W3C recommendation 25 February 2014. World Wide Web Consortium. Available at http://www.w3.org/TR/turtle/.
Raggett D, Le Hors A, Jacobs I. 1999. HTML 4.01 specification. W3C recommendation, 24 December 1999. World Wide Web Consortium. Available at http://www.w3.org/TR/html401/.
Shotton D, Portwin K, Klyne G, Miles A. 2009. Adventures in semantic publishing: exemplar semantic enhancements of a research article. PLOS Computational Biology 5(4):e1000361 DOI 10.1371/journal.pcbi.1000361.
Spinaci G, Peroni S, Di Iorio A, Poggi F, Vitali F. 2017. The RASH Javascript editor (RAJE): a wordprocessor for writing Web-first scholarly articles. In: Proceedings of the 17th ACM symposium on document engineering (DocEng 2017). New York: ACM, 85–94 DOI 10.1145/3103010.3103018.
Sporny M. 2015. HTML+RDFa 1.1: support for RDFa in HTML4 and HTML5. W3C recommendation 17 March 2015. World Wide Web Consortium. Available at http://www.w3.org/TR/rdfa-in-html/.
Sporny M, Kellogg G, Lanthaler M. 2014. JSON-LD 1.0: a JSON-based serialization for linked data. W3C Recommendation 16 January 2014. World Wide Web Consortium. Available at https://www.w3.org/TR/json-ld/.
Walsh N. 2009. The DocBook Schema Version 5.0. OASIS Standard, 1 November 2009. Burlington: Organization for the Advancement of Structured Information Standards. Available at http://docs.oasis-open.org/docbook/specs/docbook-5.0-spec-os.html.