Matches in Nanopublications for { ?s <http://purl.org/spar/c4o/hasContent> ?o ?g. }
- paragraph-1 hasContent snippet-paragraph-1 assertion.
- abstract hasContent "In this paper we examine the use of crowdsourcing as a means to master Linked Data quality problems that are difficult to solve automatically. We base our approach on the analysis of the most common errors encountered in Linked Data sources, and a classification of these errors according to the extent to which they are likely to be amenable to crowdsourcing. We then propose and compare different crowdsourcing approaches to identify these Linked Data quality issues, employing the DBpedia dataset as our use case: (i) a contest targeting the Linked Data expert community, and (ii) paid microtasks published on Amazon Mechanical Turk. We secondly focus on adapting the Find-Fix-Verify crowdsourcing pattern to exploit the strengths of experts and lay workers. By testing two distinct Find-Verify workflows (lay users only and experts verified by lay users) we reveal how to best combine different crowds’ complementary aptitudes in quality issue detection. The results show that a combination of the two styles of crowdsourcing is likely to achieve more efficient results than each of them used in isolation, and that human computation is a promising and affordable way to enhance the quality of Linked Data." assertion.
- paragraph hasContent "Many would consider Linked Data (LD) to be one of the most important technological trends in data man- agement of the last decade [16]. However, seamless consumption of LD in applications is still very lim- ited given the varying quality of the data published in the Linked Open Data (LOD) Cloud [18,44]. This is the result of a combination of data- and process- related factors. The data sets being released into the LOD Cloud are – apart from any factual flaws they may contain – very diverse in terms of formats, struc- ture, and vocabulary. This heterogeneity and the fact that some kinds of data tend to be more challenging to lift to RDF than others make it hard to avoid errors, especially when the translation happens automatically." assertion.
- paragraph hasContent "Simple issues like syntax errors or duplicates can be easily identified and repaired in a fully automatic fash- ion. However, data quality issues in LD are more challenging to detect. Current approaches to tackle these problems still require expert human intervention, e.g., for specifying rules [14] or test cases [21], or fail due to the context-specific nature of quality assessment, which does not lend itself well to general workflows and rules that could be executed by a computer pro- gram. In this paper, we explore an alternative data cu- ration strategy, which is based on crowdsourcing." assertion.
- paragraph hasContent "Crowdsourcing [19] refers to the process of solving a problem formulated as a task by reaching out to a large network of (often previously unknown) people. One of the most popular forms of crowdsourcing are ‘microtasks’ (or ‘microwork’), which consists on di- viding a task into several smaller subtasks that can be independently solved. Conditional on the tackled prob- lem, the level of task granularity can vary (microtasks whose results need to be aggregated vs. macrotasks, which require filtering to identify the most valuable contributions); as can the incentive structure (e.g., pay- ments per unit of useful work vs. prizes for top par- ticipants in a contest). Another major design decision in the crowdsourcing workflow is the selection of the crowd. While many (micro)tasks can be performed by untrained workers, others might require more skilled human participants, especially in specialized fields of expertise, such as LD. Of course, expert intervention usually comes at a higher price; either in monetary re- wards or in the form of effort to recruit participants in another setting, such as volunteer work. Microtask crowdsourcing platforms such as Amazon Mechanical Turk (MTurk) 1 on the other hand offer a formidable and readily-available workforce at relatively low fees." assertion.
- paragraph hasContent "In this work, we crowdsource three specific LD quality issues. We did so building on previous work of ours [43] which analyzed common quality prob- lems encountered in Linked Data sources and classi- fied them according to the extent to which they could be amenable to crowdsourcing. The first research ques- tion explored is hence: RQ1: Is it feasible to detect quality issues in LD sets via crowdsourcing mecha- nisms? This question aims at establishing a general un- derstanding if crowdsourcing approaches can be used to find issues in LD sets and if so, to what degree they are an efficient and effective solution. Secondly, given the option of different crowds, we formulate RQ2: In a crowdsourcing approach, can we employ unskilled lay users to identify quality issues in RDF triple data or to what extent is expert validation needed and desirable? As a subquestion to RQ2, we also examined which type of crowd is most suitable to detect which type of quality issue (and, conversely, which errors they are prone to make). With these questions, we are interested (i) in learning to what extent we can exploit the cost- efficiency of lay users, or if the quality of error detec- tion is prohibitively low. We (ii) investigate how well experts generally perform in a crowdsourcing setting and if and how they outperform lay users. And lastly, (iii) it is of interest if one of the two distinct approaches performs well in areas that might not be a strength of the other method and crowd." assertion.
- paragraph hasContent "To answer these questions, we (i) first launched a contest that acquired 58 experts knowledgeable in Linked Data to find and classify erroneous RDF triples from DBpedia (Section 4.1). They inspected 68, 976 triples in total. These triples were then (ii) submitted as paid microtasks on MTurk to be examined by workers on the MTurk platform in a similar way (Section 4.2). Each approach (contest and paid microtasks) makes several assumptions about the audiences they address (the ‘crowd’) and their skills. This is reflected in the design of the crowdsourcing tasks and the related in- centive mechanisms. The results of both crowds were then compared to a manually created gold standard." assertion.
- paragraph hasContent "The results of the comparison of experts and turkers, as discussed in Section 5, indicate that (i) untrained crowdworkers are in fact able to spot certain quality issues with satisfactory precision; that (ii) experts per- form well locating two but not the third type of qual- ity issues given, and that lastly (iii) the two approaches reveal complementary strengths." assertion.
- paragraph hasContent "Given these insights, RQ3 was formulated: How can we design better crowdsourcing workflows using lay users or experts for curating LD sets, beyond one-step solutions for pointing out quality flaws? To do so, we adapted the crowdsourcing pattern known as Find-Fix- Verify, which has been originally proposed by Bern- stein et al. in [3]. Specifically, we wanted to know: can (i) we enhance the results of the LD quality issue de- tection through lay users by adding a subsequent step of cross-checking (Verify) to the initial Find stage? Or is it (ii) even more promising to combine experts and lay workers by letting the latter Verify the results of the experts’ Find step, hence drawing on the crowds’ complementary skills for deficiency identification we recognized before?”" assertion.
- paragraph hasContent "Accordingly, the results of both Find stages (ex- pert and workers) – in the form of sets of triples iden- tified as incorrect, marked with the respective errors – were fed into a subsequent Verify step, carried out by MTurk workers (Section 4.3). The task consisted solely of the rating of a formerly indicated quality is- sue for a triple as correctly or wrongly assigned. This Verify step was, in fact, able to improve the preci- sion of both Find stages substantially. In particular, the experts’ Find stage results could be improved to precision levels of around 0.9 in the Verify stage for two error types which showed to score much lower for an expert-only Find approach. The worker-worker Find-Verify strategy yielded also better results than the Find-only worker approach, and for one error type even reached slightly better precision than the expert-worker model. All in all, we show that (i) a Find-Verify combination of experts and lay users is likely to pro- duce the best results, but that (ii) they are not superior to expert-only evaluation in all cases. We demonstrate also that (iii) lay users-only Find-Verify approaches can be a viable alternative for detection of LD qual- ity issues if experts are not available and that they cer- tainly outperform Find-only lay user workflows." assertion.
- paragraph hasContent "Note that we did not implement a Fix step in this work, as correcting the greatest part of the found errors via crowdsourcing is not the most cost-efficient method of addressing these issues. Thus, we argue in Section 4, a majority of errors can and should be addressed already at the level of individual wrappers leveraging datasets to LD." assertion.
- paragraph hasContent "To understand the strengths and limitations of crowdsourcing in this scenario, we further executed automated baseline approaches to compare them to the results of our crowdsourcing experiments. We show that while they may be amenable to pre-filtering RDF triple data for ontological inconsistencies (thus poten- tially decreasing the amount of cases necessary to be browsed in the Find stage), a substantial part of quality issues can only be addressed via human intervention." assertion.
- paragraph hasContent "This paper is an extension to previous work of ours [1], in which we presented the results of combining LD experts and lay users from MTurk when detecting quality issues in DBpedia. The novel contributions of our current work can be summarized as follows:" assertion.
- section-introduction-contributions-title hasContent "Contributions" assertion.
- paragraph hasContent "In Section 2, we discuss the type of LD quality is- sues that are studied in this work. Section 3 briefly in- troduces the crowdsourcing methods and related con- cepts that are used throughout the paper. Our approach is presented in Section 4, and is empirically evaluated in Section 5. In Section 6 we summarize the findings of our experimental study and provide answers to the for- mulated research questions. Related work is discussed in Section 7. Conclusions and future work are pre- sented in Section 8." assertion.
- section-introduction-structure-of-the-paper-title hasContent "Structure of the paper" assertion.
- section-number hasContent "1" assertion.
- section-introduction-title hasContent "Introduction" assertion.
- paragraph hasContent "The Web of Data spans a network of data sources of varying quality. There are a large number of high- quality data sets, for instance, in the life-science do- main, which are the result of decades of thorough curation and have been recently made available as Linked Open Data 2 . Other data sets, however, have been (semi-)automatically translated into RDF from their primary sources, or via crowdsourcing in a decen- tralized process involving a large number of contrib- utors, for example DBpedia [23]. While the combina- tion of machine-driven extraction and crowdsourcing was a reasonable approach to produce a baseline ver- sion of a greatly useful resource, it was also the cause of a wide range of quality problems, in particular in the mappings between Wikipedia at" assertion.
- paragraph hasContent "Our analysis of Linked Data quality issues focuses on DBpedia as a representative data set for the broader Web of Data due to the diversity of the types of er- rors exhibited and the vast domain and scope of the data set. In our previous work [44], we compiled a list of data quality dimensions (criteria) applica- ble to Linked Data quality assessment. Afterwards, we mapped these dimensions to DBpedia [43]. A sub-set of four dimensions of the original framework were found particularly relevant in this setting: Ac- curacy, Relevancy, Representational-Consistency and Interlinking. To provide a comprehensive analysis of DBpedia quality, we further divided these four cate- gories of problems into sub-categories. For the purpose of this paper, from these categories we chose the fol- lowing three triple-level quality issues." assertion.
- paragraph hasContent "Object incorrectly/incompletely extracted. Consider the triple: (dbpedia:Rodrigo Salinas, dbpedia-owl:birthPlace, dbpe- dia:Puebla F.C.) . The DBpedia resource is about the person ‘Rodrigo Salinas’, with the incorrect value of the birth place. Instead of extracting the name of the city or country from Wikipedia, the stadium name Puebla F.C , is extracted." assertion.
- paragraph hasContent "Datatype or language tag incorrectly extracted. This category refers to triples with an incorrect datatype for a typed literal. For example, consider the triple: (dbpe- dia:Oreye, dbpedia-owl:postalCode, “4360”@en) . The datatype of the literal “4360” is incorrectly identified as english instead of integer." assertion.
- paragraph hasContent "Incorrect link. This category refers to RDF triples whose association between the subject and the object is incorrect. Erroneous interlinks can associate values within a dataset or between several data sources. This category of quality issues also includes links to external Web sites or other external data sources such as Wikimedia, Freebase, GeoSpecies or links generated via the Flickr wrapper are incorrect; that is, they do not show any related content pertaining to the resource." assertion.
- paragraph hasContent "These categories of quality problems occur pervasively in DBpedia. These problems might be present in other data sets which are extracted in a similar fashion as DBpedia. Given the diversity of the situations in which they can be instantiated (broad range of datatypes and object values) and their sometimes deeply contextual character (interlinking), assessing them automatically is challenging. In the following we explain how crowdsourcing could support quality assessment processes." assertion.
- section-number hasContent "2" assertion.
- section-2-title hasContent "Linked Data Quality Issues" assertion.
- paragraph hasContent "The term crowdsourcing was first proposed by Howe [19] that consists on a problem-solving mechanism in which a task is performed by an “an undefined (and generally large) network of people in the form of an open call.” Nowadays, many different forms of crowdsourcing have emerged, e.g., microtask, contests, macrotask, crowdfunding, among others; each form of crowdsourcing is designed to target particular types of problems and reaching out to different crowds. In the following we briefly describe contest- based and microtask crowdsourcing, the two crowdsourcing methods studied in this work." assertion.
- paragraph hasContent "A contest reaches out to a crowd to solve a given problem and rewards the best ideas. It exploits competition and intellectual challenge as main drivers for participation. The idea, originating from open innovation, has been employed in many domains, from creative industries to sciences, for tasks of varying complexity (from designing logos to building sophisticated algorithms). In particular, contests as means to successfully involve experts in advancing science have a long-standing tradition in research, e.g., the Darpa challenges 3 and NetFlix. 4 Usually, contests as crowdsourcing mechanisms are open for a medium to long period of time in order to attract high quality contributions. Contests may apply different reward models, but a common modality is to define one main prize for the contest winner." assertion.
- paragraph hasContent "We applied this contest-based model to mobilize an expert crowd consisting of researchers and Linked Data enthusiasts to discover and classify quality issues in DBpedia. The reward mechanism applied in this contest was “one-participant gets it all”. The winner was the participant who covered the highest number of DBpedia resources." assertion.
- paragraph hasContent "This form of crowdsourcing is applied to problems which can be broken down into smaller units of work (called ‘microtasks’). Microtask crowdsourcing works best for tasks that rely primarily on basic human abilities, such as visual and audio cognition or natural language understanding, and less on acquired skills (such as subject-matter knowledge)." assertion.
- paragraph hasContent "To be more efficient than traditional outsourcing (or even in-house resources), microtasks need to be highly parallelized. This means that the actual work is executed by a high number of contributors in a decentralized fashion; 5 this not only leads to significant improvements in terms of time of delivery, but also offers a means to cross-check the accuracy of the answers (as each task is typically assigned to more than one person). Collecting answers from different workers allow for techniques such as majority voting (or other aggregation methods) to automatically identify accurate responses. The most common reward model in microtask crowdsourcing implies small monetary payments for each worker who has successfully solved a task." assertion.
- paragraph hasContent "In our work, we used microtask crowdsourcing as a fast and cost-efficient way to examine the three types of DBPedia errors described in Section 2. We provided specific instructions to workers about how to assess RDF triples according to the three previous quality issues. We reached out to the crowd of the microtask marketplace Amazon Mechanical Turk (MTurk). In the following we present a summary of the relevant MTurk terminology:" assertion.
- paragraph hasContent "The Find-Fix-Verify pattern [3] consists on dividing a complex human task into a series of simpler tasks that are carried out in a three-stage process. Each stage in the Find-Fix-Verify pattern corresponds to a verifi- cation step over the outcome produced in the immediate previous stage. The first stage of this crowdsourcing pattern, Find, asks the crowd to identify portions of data that require attention depending on the task to be solved. In the second stage, Fix, the crowd corrects the elements belonging to the outcome of the previous stage. The Verify stage corresponds to a final quality control iteration." assertion.
- paragraph hasContent "Originally, this crowdsourcing pattern was introduced in Soylent [3], a human-enabled word processing interface that contacts microtask workers to edit and improve parts of a document. The tasks studied in Soylent include: text shortening, grammar check, and unifying citation formatting. For example, in the Soylent text shortening task, microtasks workers in Find stage are asked to to identify portions of text that can potentially be reduced in each paragraph. Candidate portions that meet certain consensus degree among workers move on to the next step. In the Fix stage, workers must shorten the previously identified portions of paragraphs. All the rewrites generated are assessed by workers to select the most appropriate one without changing the meaning of the original text." assertion.
- paragraph hasContent "The Find-Fix-Verify pattern has proven to produce reliable results since each stage exploits independent agreement to filter out potential low-quality answers from the crowd. In addition, this approach is efficient in terms of the number of questions asked to the paid microtask crowd, therefore the costs remain competitive with other crowdsourcing alternatives." assertion.
- paragraph hasContent "In scenarios in which crowdsourcing is applied to validate the results of machine computation tasks, question filtering relies on specific thresholds or historical information about the likelihood that human in- put will significantly improve the results generated algorithmically. Find-Fix-Verify addresses tasks that initially can be very complex (or very large), like in our case the discovery and classification of various types of errors in DBpedia." assertion.
- paragraph hasContent "The Find-Fix-Very pattern is highly flexible, since each stage can employ different types of crowds, as they require different skills and expertise [3]." assertion.
- section-number hasContent "3.1" assertion.
- section-3.1-title hasContent "Types of Crowdsourcing" assertion.
- section-number hasContent "3.1.1" assertion.
- section-3.1.1-title hasContent "Contest-based Crowdsourcing" assertion.
- section-number hasContent "3.1.2" assertion.
- section-3.1.2-title hasContent "Microtask Crowdsourcing" assertion.
- section-number hasContent "3.2" assertion.
- section-3.2-title hasContent "Crowdsourcing Pattern Find-Fix-Verify" assertion.
- section-number hasContent "3" assertion.
- section-3-title hasContent "Crowdsourcing Preliminaries" assertion.
- paragraph hasContent "Our work on human-driven Linked Data quality assessment focuses on applying crowdsourcing techniques to annotate RDF triples with their corresponding quality issue. Given a set of quality issues Q and a set T of RDF triples to be assessed, we formally define the annotation of triples with their corresponding quality issues as follows." assertion.
- definition hasContent "Definition 1" assertion.
- paragraph hasContent "Definition 1. (Problem Definition: Mapping RDF Triples to Quality Issues). Given T a set of RDF triples and Q a set of quality issues, a mapping of triples to quality issues is defined as a partial function φ : T → 2 Q . φ(t) denotes the quality issues associated with t ∈ T . In particular, when φ(t) 6 = ∅ the triple t is considered ‘incorrect’, otherwise it can be affirmed that t is ‘correct’." assertion.
- paragraph hasContent "In order to provide an efficient crowdsourcing solution to the problem presented in Definition 1, we applied a variation of the crowdsourcing pattern Find- Fix-Verify [3]. As discussed in Section 3, this crowdsourcing pattern allows for increasing the overall quality of the results while maintaining competitive monetary costs when applying other crowdsourcing approaches. Our implementation of the Find-Fix-Verify pattern is tailored to assess the quality of Linked Data sets that are automatically created from other sources. Such is the case of DBpedia [24], a data set created by extracting knowledge from Wikipedia via declarative mediator/wrapper pattern. The wrappers are the result of a crowdsourced community effort of contributors to the DBpedia project. To crowdsource the assessment of triples of data sets like DBpedia, we devise a two-fold approach including the following stages: Find and Verify. In the Find stage, the crowd was requested to detect LD quality issues in a set of RDF triples, and annotate them with the corresponding issue(s) if applicable. We define the Find Stage as follows:" assertion.
- paragraph hasContent "Definition 2. (Find Stage). Given a set T of RDF triples and a set Q of quality issues, the Find stage consists on crowdsourcing the mappings φ̇ : T → 2 Q . The input of the Find stage is represented as F i = (T , Q), and the output F o = (T , φ̇(T ))." assertion.
- paragraph hasContent "The outcome of this stage – triples judged as ‘incorrect’ – is then assessed in the Verify stage, in which the crowd confirms/denies the presence of quality issues in each RDF triple processed in the previous stage. We define the Verify Stage as follows:" assertion.
- paragraph hasContent "Definition 3. (Verify Stage). Given a set T of RDF triples and mappings φ̇(T ), the Verify stage consists on crowdsourcing mappings as follows φ̈ : φ̇(T ) → φ̇(T ). The input of the Verify stage is represented as, V i = (T , φ̇(T )) which corresponds to the output of the Find stage (V i = F o ), and the output of the Verify stage is represented as V o = (T , φ̈(T ))." assertion.
- paragraph hasContent "The Fix stage originally proposed in the Find-Fix-Verify pattern is out of the scope of this paper, since the main goal of this work is identifying quality issues. In addition, since our work is designed for data sets extracted automatically via wrappers, it is highly probable that the quality issues detected for a certain triple might also occur in the set of triples that were generated via the same wrapper. Therefore, a more efficient solution to implement the Fix stage could consist of adjusting the wrappers that caused the issue in the first place, instead of crowdsourcing the correction of each triple which increases the overall monetary cost." assertion.
- paragraph hasContent "In the implementation of the Find and Verify stages in our approach, we explore two different crowdsourcing workflows combining different types of crowds. The first workflow combines LD experts and microtask workers: This workflow leverages the expertise of Linked Data experts in a contest to find and classify erroneous triples according to a predefined quality taxonomy, while the workers verify the outcome of the contest. The second workflow entirely relies on microtask crowdsourcing to perform the Find and Verify stages. As discussed in Section 3, these crowdsourcing approaches exhibit different characteristics in terms of the types of tasks they can be applied to, the way the results are consolidated and exploited, and the audiences they target. Therefore, in this work we study the impact on involving different types of crowd to detect quality issues in RDF triples: LD experts in the contest and workers in the microtasks. Table 1 presents a summary of the two approaches as they have been used in this work for LD quality assessment purposes." assertion.
- paragraph hasContent "Figure 1 depicts the steps carried out in each of the stages of the two crowdsourcing workflows studied in this work. In the following sections, we provide more details about the implementation of the variants of the Find and Verify stages." assertion.
- paragraph hasContent "In this implementation of the Find stage, we reached out to an expert crowd of researchers and Linked Data enthusiasts via a contest. The tasks in the contest con- sist on identifying and classifying specific types of Linked Data quality problems in DBpedia triples. To collect the contributions from this crowd, in previous work [43], we developed a web-based tool called TripleCheckMate 6 (cf. Figure 2). TripleCheckMate [22] allows users to select RDF resources, identify issues related to triples of the resource and classify these issues according to a pre-defined taxonomy of data quality problems [44]. A prize was announced for the user submitting the highest number of (real) quality problems." assertion.
- paragraph hasContent "The workflow starts when a user signs into the TripleCheckMate tool to participate in the contest, as shown in Figure 1. As a basic means to avoid spam, each user first has to login with his Google account through OAuth2. Then she is presented with three op- tions to choose a resource from DBpedia: (i) ‘Any’, for random selection; (ii) ‘Per Class’, where she may choose a resource belonging to a particular class of her interest; and (iii) ‘Manual’, where she may pro- vide a URI of a resource herself. Once a resource is selected following one of these alternatives, the user is presented with a table in which each row corresponds to an RDF triple of that resource. The next step is the actual quality assessment at triple level. The user is provided with the link to the corresponding Wikipedia page of the given resource in order to offer more con- text for the evaluation. If she detects a triple containing a problem, she checks the box ‘Is Wrong’. Moreover, she assigns specific quality problems (according to the classification devised in [43]) to troublesome triples, as depicted in Figure 2. The user can assess as many triples from a resource as desired, or select another re- source to evaluate." assertion.
- paragraph hasContent "The TripleCheckMate tool only records the triples that are identified as ‘incorrect’. This is consistent with the definition of Find stage from the original Find-Fix- Verify pattern, where the crowd exclusively detects the problematic elements; while the remaining data is not taken into consideration. In addition, this tool mea- sures inter-rater agreements. This means that DBpedia resources are typically checked multiple times. This redundancy mechanism is extremely useful to analyze the performance of the users (as we compare their responses against each other), to identify quality problems which are likely to be real (as they are confirmed by more than one opinion) and to detect unwanted be- havior (as users are not ‘rewarded’ unless their assess- ments are ‘consensual’)." assertion.
- paragraph hasContent "The outcome of this contest corresponds to a set of triples T judged as ‘incorrect’ by LD experts and classified according to the detected quality issues in Q." assertion.
- section-number hasContent "4.1" assertion.
- section-4.1-title hasContent "Find Stage: Contest-based Crowdsourcing" assertion.
- paragraph hasContent "This Find stage applies microtasks solved by lay users from a crowdsourcing platform. In order to perform a fair comparison between the performance of LD experts and crowd workers, in this variant of the Find stage we aimed at implementing a similar workflow (including a similar user interface) for the crowd workers as the one provided to the LD experts. Therefore, in this stage the crowd is enquired for identifying quality issues on a set of RDF triples associated with RDF resources from the DBpedia data set. However, given that crowd workers are not necessarily knowledgeable about RDF or Linked Data, each microtask was augmented with human-readable information associated with the RDF triples. Formally, in our approach, a microtask is defined as follows." assertion.
- paragraph hasContent "Definition 4. (Microtask). A microtask m is a set of 3-tuples (t, h t , Q), where t is an RDF triple, h t corre- sponds to human-readable information that describes t, and Q is the set of quality issues to be assessed on triple t." assertion.
- paragraph hasContent "Following the MTurk terminology (cf. Section 3), each 3-tuple (t, h t , Q) corresponds to a question while m is a HIT (Human Intelligence Task) with granularity (number of questions) equals to |m|." assertion.
- paragraph hasContent "The execution of this stage, as depicted in Figure 1, starts by generating the microtasks from F i , i.e., the sets of RDF triples T and quality issues Q to crowdsource. In addition, a parameter α can be specified as a threshold on the number of questions to include in a single microtask. Algorithm 1 presents the procedure to create the microtasks. The algorithm firstly per- forms a pruning step (line 2) to remove triples that do not require human assessment. For instance, in our im- plementation, the function prune discards RDF triples whose URIs could not be dereferenced. The algorithm then proceeds to build microtasks such that each microtask only contains triples associated with a specific resource, similar to the interfaces of the TripleCheck-Mate tool used in the contest. The set S contains all the resources that appear as subjects in the set of triples T (line 3). For each subject, the algorithm builds the set of triples T 0 associated with the subject (line 5), and the creation of microtasks begins (line 6). From the pool T 0 , a triple t is selected (line 8) and the corresponding human-readable information is extracted (line 9). In this stage, similar to the TripleCheckMate, each microtask requires the workers to browse all the possible quality issues, therefore, the set of issues to assess on triple t is equal to Q in each microtask cre- ated (line 10). In case that the number of questions in the current microtask exceeds the threshold α, a new microtask is then created. The definition of the parameter α allows for avoiding the creation of very long tasks, i.e., when the number of triples with the same subject is large; appropriate values of α enables the creation of tasks than can still be solved in a reasonable time, consistent with the concept of microtask (a short task). Algorithm 1 continues creating microtasks for all the triples of a resource (lines 7-16), for all the resources (lines 4-18). The outcome of the algorithm is a set M of microtasks to assess the quality of the triples in T according to the issues in Q." assertion.
- paragraph hasContent "The generated microtasks are then submitted to the crowdsourcing platform. When a worker accepts a microtask or HIT, she is presented with a table that con- tains triples associated to an RDF resource, as shown in Figure 1. For each triple, the worker determines whether the triple is ’incorrect’ with respect to a fixed set of quality issues Q (cf. Section 2): object incorrectly/incompletely extracted, datatype incorrectly extracted or incorrect link, abbreviated as ‘Value’, ‘datatype’, and ‘Link’, respectively. Once the worker has assessed all the triples within a microtask, she proceeds to submit the HIT. Consistently with the Find stage implemented with a contest, the outcome of the microtasks corresponds to a set of triples T judged as ‘incorrect’ by workers and classified according to the detected quality issues in Q." assertion.
- figure hasContent "Figure 3" assertion.
- footnote hasContent "Footnote 8" assertion.
- paragraph hasContent "An important aspect when generating microtasks from RDF data (or machine-readable data in general) is developing useful human-understandable interfaces (Algorithm 1, line 9) for the target non-expert crowds. In microtasks, optimal user interfaces reduce ambiguity as well as the probability to retrieve erroneous answers from the crowd due to a misinterpretation of the task. Therefore, before start to resolve one of our tasks, the crowd workers were instructed with details and examples about each quality issue. After reading the instructions, workers proceed to resolve the given task. Figure 3 depicts the interface of a microtask generated for the Find stage in our approach. To display each triple, we retrieved the values of the foaf:name or rdfs:label properties for subjects, predicates, and datatypes. The name of languages in language-tagged strings were parsed using a conversion table from the best current practices BCP 47 [7], as suggested by the RDF specification 7 . Language tags and datatypes of objects were highlighted, such that workers can easily identify them [Footnote 8] . Additionally, in order to provide contextual information, we implemented a simple wrapper which extracts the corresponding data encoded in the infobox of the Wikipedia article associated with the resource – specified via foaf:isPrimaryTopicOf . The crowd has the possibility to select one or several quality issues per triple." assertion.
- paragraph hasContent "Further microtask design criteria related to spam detection and quality control; we used different mechanisms to discourage low-effort behavior which leads to random answers and to identify accurate answers (see Section 5.2.2)." assertion.
- section-number hasContent "4.2" assertion.
- section-4.2-title hasContent "Find Stage: Paid Microtask Crowdsourcing" assertion.
- paragraph hasContent "In this stage, we applied microtask crowdsourcing in order to verify quality issues in RDF triples identified as problematic during the Find Stage (see Figure 1). To ensure that in this stage a proper validation is executed on each triple, the microtasks are simplified with respect to the ones from the Find stage: (i) each microtask focuses on a specific quality issue, (ii) the number of triples per microtask is reduced." assertion.
- paragraph hasContent "The generation of microtasks in this stage is presented in Algorithm 2. This algorithm groups the triples in T obtained from the previous stage by quality issue, which enables the workers to focus on one quality issue at the time. The input of this stage is the set of triples to assess T and their mappings to quality issues φ̇(.). The parameter β specifies the number of questions to include in a single microtask. The algorithm firstly performs a pruning step (line 2) to remove certain triples. For instance, a triple t that was considered ‘correct’ in the Find stage ( φ̇(t) = ∅) is discarded, consistently with the definition of the Find-Fix-Verify pattern [3]. Also, in our implementation, the function prune discards answers whose inter-rater agreement were not higher than a certian value. The algorithm then proceeds to build microtasks such that each microtask only contains triples associated with a specific quality issue. For each answer from the previous stage, the algorithm decomposes the set of quality issues φ̇(t) of a triple t into singletons (lines 3-7). The set Q contains all the quality issues present in the set of triples T (line 8). For each quality issue q (line 9), the algorithm processes all triples associated with that quality issue (lines 11-18). The algorithm extracts human-readable information about the triples (line 12) and appends it to the microtask (line 13). In case that the number of questions in the current microtask exceeds the threshold β, a new microtask is then created. The outcome of the algorithm is a set M of microtasks to assess the quality of the triples in T according to the issues iden- tified in the Find stage ( φ̇(.))." assertion.
- paragraph hasContent "Based on the classification of LD quality issues explained in Section 2, we created three different inter- faces for the microtasks. Each microtask contains the description of the procedure to be carried out to complete the task successfully. We provided the worker examples of incorrect and correct examples along with four options (as shown in Figure 1): (i) ‘Correct’; (ii) ‘Incorrect’; (iii) ‘I cannot tell/I don’t know’; (iv) ‘Data doesn’t make sense’. The third option was meant to allow the user to specify when the question or values were unclear. The fourth option referred to those cases in which the presented data was truly unintelligible. Furthermore, the workers were not aware that the presented triples were previously identified as ‘incorrect’ in the Find stage and the questions were designed such that workers could not foresee the right answer. We describe the particularities of the interfaces of the microtask generated for the Verify stage in the following." assertion.
- paragraph hasContent "Incorrect/incomplete object value. In this type of microtask, we asked the workers to evaluate whether the value of a given RDF triple from DBpedia is correct or not. We displayed human-readable information retrieved by dereferencing the URIs of the subject and predicate of the triple. In particular, we selected the values of the foaf:name or rdfs:label properties for each subject and predicate. Additionally, we extracted the values from the infobox of the Wikipedia article associated with the subject of the triple using the wrapper implemented in the Find stage (cf. Section 4.2). Figure 4 depicts the interface of the resulting tasks." assertion.
- paragraph hasContent "In the task presented in Figure 4a, the worker must decide whether the place of birth of “Rodrigo Salinas” is correct. According to the DBpedia triple, the value of this property is Puebla F.C , while the information extracted from Wikipedia, suggests that the right value is Apizaco . Therefore, the right answer to this tasks is: the DBpedia data is incorrect." assertion.
- paragraph hasContent "An example of a DBpedia triple whose value is cor- rect is depicted in Figure 4b. In this case, the worker must analyze the date of birth of “Elvis Presley”. According to the information extracted from Wikipedia, the date of birth of Elvis Presley is January 8, 1935 , while the DBpedia value is 1935-01-08 . Despite the dates are represented in different formats, semantically the dates are indeed the same, thus the DBpedia value is correct." assertion.
- paragraph hasContent "Incorrect datatypes & language tags. This type of microtask consists of detecting those DBpedia triples whose object datatype or language tags were not correctly assigned. The generation of the interfaces for these tasks was very straightforward, by dereferencing the URIs of the subject and predicate of each triple and displaying the values for the foaf:name or rdfs:label ." assertion.
- paragraph hasContent "In the description of the task, we introduced the crowd the concept of data type of a value and provided two simple examples. The first example illustrates when the language tag ( rdf:langString ) is incorrect while analyzing the entity “Torishima Izu Islands”: Given the property “name”, is the value “鳥島” of type “English”? A worker does not need to understand that the name of this island is written in Japanese, since it is evident that the language type “English” in this example is incorrect. In a similar fashion, we provided an example where the language tag is assigned correctly by looking at the entity “Elvis Presley”: Given the property “name”, is the value “Elvis Presley” of type “English”? According to the information from DBpedia, the value of the name is written in English and the type is correctly identified as English." assertion.
- paragraph hasContent "Incorrect links. In this type of microtask, we asked the workers to verify whether the content of the external page referenced from the Wikipedia article corresponds to the subject of the RDF triple. For the interface of the HITs, we provided the worker a preview of the Wikipedia article and the external page by implementing HTML iframe tags. In addition, we retrieved the foaf:name of the given subject and the link to the corresponding Wikipedia article using the predicate foaf:isPrimaryTopicOf ." assertion.
- figure hasContent "Figure 5" assertion.
- paragraph hasContent "Examples of this type of task are depicted in Figure 5. In the first example (see Figure 5a), the workers must decide whether the content in the given external web page is related to “John Two-Hawks”. It is easy to observe that in this case the content is not directly associated to the person “John Two-Hawks”. Therefore, the right answer is that the link is incorrect. On the other hand, we also exemplified the case when an interlink presents relevant content to the given subject. Consider the example in Figure 5b, where the subject is the plant “Pandanus boninensis” and the external link is a web page generated by the DBpedia Flickr wrapper. The web page indeed shows pictures of the subject plant. Therefore, the correct answer is that the link is correct." assertion.
- section-number hasContent "4.3" assertion.
- section-4.3-title hasContent "Verify Stage: Paid Microtask Crowdsourcing" assertion.
- paragraph hasContent "Given that the contest settings are handled through the TripleCheckMate tool, in this section we expose the properties of the proposed microtask crowdsourcing approaches. First, we demonstrate that the algorithms for microtask generation in the Find and Verify stages are efficient in terms of time." assertion.
- paragraph hasContent "Proposition 1. The time complexity of the microtask generators is O(|T |) for the Find stage and O(|T ||Q|) for the Verify stage." assertion.
- paragraph hasContent "Proof. The algorithm of the Find stage iterates over all the triples associated with each distinct triple subject in T , therefore the complexity of this stage is O(|T |). In the Verify stage, the algorithm firstly iterates over the answers obtained from the previous stage, which corresponds to T . Next, the algorithm iterates over the quality issues detected in the Find stage; in the worst case, each quality issue is found in at least one triple, then, the set Q 0 is equal to Q. For each quality is- sue, the algorithm processes the triples annotated with that quality issue, which again in the worst case is T (all the triples present all the quality issues). There- fore, the complexity of the Find stage is calculated as O(|T | + |T ||Q|), then O(|T ||Q|). " assertion.
- paragraph hasContent "One important aspect when applying paid microtask crowdsourcing is the number of generated tasks, since this determines the overall monetary cost. The following theorem states the efficiency of Algorithms 1 and 2 in terms of the number of crowdsourced microtasks." assertion.
- proposition hasContent "Proposition 2" assertion.
- paragraph hasContent "Proposition 2. The number of microtasks generated in each stage is linear with respect to the number of triples assessed." assertion.
- paragraph hasContent "Proof. In the Find stage, a microtask is generated when the number of triples within task exceeds the threshold α. Since in this stage each microtask groups triples by subjects, then l the number of microtaks m i ,p,o)∈T }| per subject is given by |{(p,o)|(s α , where {(p, o)|(s i , p, o) ∈ T } corresponds to triples with sub- ject s i . In total, in the Find stage, l the exact number m of P i ,p,o)∈T }| microtasks generated is s i ∈S |{(p,o)|(s α , which is less than |T | (for α >1). In the Verify stage, each microtask groups RDF triples with the same quality issue. When considering β as the maximum number of triples contained within a microtask, then the number l of microtasks created per quality issue m |{t|t∈T ∧ q i ∈ φ̇(t)}| . Therefore, the exact q i ∈ Q is β number l of microtasks generated in the Verify stage is m P |{t|t∈T ∧ q i ∈ φ̇(t)}| , which is ≤ |T ||Q|. Con- q i ∈Q β sidering that the set Q is considerably smaller than T , we can affirm that the number of microtasks generated in the Verify stage is linear with respect to T ." assertion.
- paragraph hasContent "When analyzing the number of microtasks gener- ated in each stage, the Verify stage seems to produce more tasks than the Find stage. This is a consequence of simplifying the difficulty of the microtasks in the Verify stage, where workers have to assess only one type of quality issue at the time. However, in practice, the number of microtasks generated in the Verify stage is not necessarily larger. For instance, in our experiments with LD experts and crowd workers, we ob- served that large portions of the triples are not anno- tated with quality issues in the Find stage. Since Algorithm 2 prunes triples with no quality issues (consistently with the definition of the Find-Fix-Verify pattern), the subset of triples crowdsourced in the Verify stage is considerably smaller than the original set, hence the number of microtasks to verify is reduced." assertion.
- paragraph hasContent "A summary of our microtask crowdsourcing approach implemented for the Find and Verify stages is presented in Table 2." assertion.
- section-number hasContent "4.4" assertion.
- section-4.4-title hasContent "Properties of Our Approach" assertion.
- section-number hasContent "4" assertion.
- section-4-title hasContent "Our Approach: Crowdsourcing Linked Data Quality Assessment" assertion.