Crowdsourcing Satellite Imagery (part 2): Iterative vs. parallel human organisations

In Crowdsourcing Satellite Imagery (part 1) I discussed the use of a parallel model to organise people in the analysis of satellite imagery, its redundancy property, and some meritocratic techniques to produce quality data. Here I discuss the qualitative and quantitative differences between two organisational models, the parallel model and the iterative model (Wikipedia’s style), in a paper written for the upcoming GIScience 2012 conference.

Note: you can find the slides and the PDF at the end of the post.


This work draws on two domains: human computation, a paradigm for using human processing power to solve problems that computers cannot solve, and collective problem solving, which emphasises a collective process for solving problems.

The general idea is to apply well-known algorithmic processes from computer science to human organisations: iterative versus parallel processing of information.

Parallel vs. Iterative human organisations

In a parallel model, a set of volunteers independently performs the same task, and an aggregation function is used to generate a collective output.

In an iterative model, a chain of volunteers is used to iteratively improve the work of previous workers (Wikipedia’s style).
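To make the two definitions concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the function names, the representation of an annotation as a set of map-cell identifiers, the majority-vote aggregation, and the toy workers. It is not the code used in the paper.

```python
from collections import Counter
from typing import Callable, List, Set

Cells = Set[str]  # hypothetical: identifiers of map cells marked as buildings


def parallel_model(workers: List[Callable[[str], Cells]], task: str,
                   aggregate: Callable[[List[Cells]], Cells]) -> Cells:
    """Every worker solves the same task independently; answers are aggregated."""
    answers = [worker(task) for worker in workers]
    return aggregate(answers)


def iterative_model(workers: List[Callable[[str, Cells], Cells]],
                    task: str) -> Cells:
    """Each worker receives the previous result and returns an improved one."""
    result: Cells = set()
    for worker in workers:
        result = worker(task, result)
    return result


def majority(answers: List[Cells]) -> Cells:
    """One possible aggregation function: keep cells chosen by more than half."""
    counts = Counter(cell for answer in answers for cell in answer)
    return {cell for cell, n in counts.items() if n > len(answers) / 2}


# Toy workers (invented data, for illustration only).
parallel_workers = [
    lambda task: {"a", "b"},
    lambda task: {"a"},
    lambda task: {"a", "c"},
]
iterative_workers = [
    lambda task, previous: previous | {"a"},           # starts the work
    lambda task, previous: previous | {"b"},           # adds an annotation
    lambda task, previous: (previous - {"b"}) | {"c"}, # corrects and extends
]

parallel_output = parallel_model(parallel_workers, "tile-1", majority)
iterative_output = iterative_model(iterative_workers, "tile-1")
print(parallel_output, iterative_output)
```

In the parallel case only the cell that most workers agree on survives the aggregation, whereas in the iterative case the output is simply whatever the last worker leaves behind.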

On the properties of each model

Qualitative differences

Problem divisibility

  • The nature of the problem and its divisibility can restrict the choice of one approach over the other. In the parallel model each participant solves the problem independently, and thus alone. A problem too complex to be solved by one person must therefore be divided into easier pieces.
  • This constraint does not apply to the iterative model: the whole complexity of the problem can be presented at once. One volunteer can start the problem without completing it, and the next participants improve the result.

Diffusion of information: exploration / exploitation trade-off

  • A common issue in collective problem solving [2][1] is the exploration-exploitation trade-off that emerges from the structure of the organisation. Networked organisations such as iterative models can benefit from the experience of others through the diffusion of knowledge. But exploiting previously discovered solutions can lead to premature convergence on suboptimal solutions.
  • In the parallel model, on the other hand, individuals are unable to copy one another, which leads to a broader exploration of the search space and thus to a greater diversity of solutions.

Mechanism enforcing quality

  • The concept of the wisdom of crowds rests on empirical evidence that the aggregation of diverse, independently deciding individuals is likely to make certain types of decisions and predictions better than a few experts would. An unbiased approach like the parallel model therefore supports this property better than the iterative model. However, the critical question of how to aggregate the individual answers remains to be considered.
  • The iterative model integrates the notion of improvement more naturally, but it is very sensitive to vandalism (e.g. spamming in Wikipedia). Furthermore, as discussed above, social influence can negatively affect the collective output [3] because of the path dependency effect [4]: once past decisions have become sufficiently informative, later members simply copy those around them.

Task and effort

In the human computation field, [5] categorises tasks into two types: the generation of information, and the evaluation and selection of information. In the parallel model, the human effort (potentially including the task of aggregating annotations) goes into creation tasks, whereas the iterative model also enables reviewing. The effort required can therefore differ: producing an output from scratch requires a priori more effort than reviewing or improving a previous result.

Experiment with MechanicalTurk

Read the paper (pdf) for more information about the experiment, which used MechanicalTurk as a low-cost human simulator. I report here only the findings.

Parallel model:

  • Linus’s law, studied for OpenStreetMap (an iterative model; see this blog), has limited validity in the parallel model: beyond a given threshold, adding more volunteers does not change the representativeness of opinion and thus does not change the consensual output.
  • Furthermore, we showed that varying the decision threshold in the voting process significantly affects the global quality (F-measure). This threshold should be chosen carefully, especially with regard to any bias at the individual level. In our case, applying the majority rule produced sub-optimal performance because of such a common bias.
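The effect of the decision threshold can be sketched as follows. This is an illustrative toy, not the experiment from the paper: the function names, the set-of-cells representation, and the data are all assumptions. The data is deliberately built so that volunteers share a common bias (each one misses most buildings), which is one way a majority rule can end up sub-optimal.

```python
from collections import Counter
from typing import List, Set


def vote(annotations: List[Set[str]], min_votes: int) -> Set[str]:
    """Keep the cells annotated by at least `min_votes` volunteers."""
    counts = Counter(cell for a in annotations for cell in a)
    return {cell for cell, n in counts.items() if n >= min_votes}


def f_measure(predicted: Set[str], truth: Set[str]) -> float:
    """Harmonic mean of precision and recall (F1)."""
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Invented data: four real buildings, five volunteers who each miss several.
truth = {"a", "b", "c", "d"}
volunteers = [
    {"a", "b"},
    {"a", "c"},
    {"a", "b", "e"},  # "e" is a wrong annotation
    {"a", "d"},
    {"a", "c"},
]

# Quality of the consensual output for each possible decision threshold.
scores = {k: f_measure(vote(volunteers, k), truth) for k in range(1, 6)}
for k, score in scores.items():
    print(f"min_votes={k}: F-measure={score:.3f}")
```

With this toy data the majority rule (at least 3 votes out of 5) keeps only the one obvious building, so a lower threshold scores higher. With a different bias, such as volunteers who over-annotate, the best threshold would move the other way.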

Iterative model:

  • We observed that the first iterations have a high impact on the final results because of a path dependency effect: a stronger commitment during the first steps is thus a primary concern when using such a model (e.g. asking expert/committed users to start).

On the performance of each model

We investigated the quality of both organisational models according to two aspects: the accuracy (type I and type II errors) and the consistency of the results. We concluded the following:

Accuracy – type I errors:

The parallel strategy, which generates only consensual results, corrects type I errors (wrong annotations) more significantly than the iterative model. However, in difficult areas (e.g. map 3), it does not mitigate disagreements well. Thanks to the accumulation of knowledge, the iterative model is thus more appropriate for handling ambiguous cases, or problems that are hard to divide into smaller and easier tasks (participants perform better than in a parallel model when ambiguous cases have to be considered to settle a decision). So the iterative model outperforms the parallel one for difficult/complex areas, but with a potential path dependency effect: mistakes can be propagated, generating type I errors more easily as the iterations proceed.

Accuracy – type II errors:

We observed that the iterative model reduces type II errors (i.e. it improves the spatial coverage) from one iteration to the next. It outperforms the parallel model thanks to the accumulation of knowledge, which enables the next users to focus their attention on ‘fresh’ areas. The lower spatial coverage usually seen with the parallel model is due to the nature of the strategy: because the work is independent, the nth volunteer might well annotate the same obvious building for the nth time, without bringing new information at the collective level. This results in a waste of time for the volunteer and the community.
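The redundancy of independent work can be illustrated with a small sketch. The data and the function name are invented for the example; it only counts, for each successive volunteer in a parallel setting, how many annotated cells had not already been reported by an earlier volunteer.

```python
from typing import List, Set


def novel_contributions(annotations: List[Set[str]]) -> List[int]:
    """For each volunteer, in order, count the cells nobody reported before."""
    seen: Set[str] = set()
    novelty = []
    for cells in annotations:
        novelty.append(len(cells - seen))
        seen |= cells
    return novelty


# Invented data: everyone annotates the obvious building "a".
volunteers = [
    {"a", "b"},
    {"a", "c"},
    {"a", "b", "e"},
    {"a", "d"},
    {"a", "c"},
]

novelty = novel_contributions(volunteers)
total = sum(len(cells) for cells in volunteers)
redundant = total - sum(novelty)  # annotations that repeat earlier ones
print(novelty, redundant)
```

Here more than half of the individual annotations repeat something already known. In an iterative setting a volunteer would see those cells as already done and could spend the same effort on uncovered areas.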

About the consistency of the result

The parallel model provides an output that is more reliable than that of a basic iterative model. The reason is that the latter is sensitive to vandalism and to the destruction of knowledge.

Future work

Other, more unusual organisations (e.g. human cellular automata) could be tested and compared to these two models.

Slides

References

  • [1] – David Lazer and Allan Friedman. The network structure of exploration and exploitation. Administrative Science Quarterly, 52(4):667–694, 2007.
  • [2] – Christina Fang, Jeho Lee, and Melissa A Schilling. Balancing exploration and exploitation through structural design: The isolation of subgroups and organization learning. Organization Science, 21(3):625–642, 2010.
  • [3] – Jan Lorenz, Heiko Rauhut, Frank Schweitzer, and Dirk Helbing. How social influence can undermine the wisdom of crowd effect. Proceedings of the National Academy of Sciences of the United States of America, 108(22):9020–9025, 2011.
  • [4] – Massimo Egidi and Alessandro Narduzzo. The emergence of path-dependent behaviors in cooperative contexts. International Journal of Industrial Organization, 15(6):677–709, 1997.
  • [5] – Thomas W Malone, Robert Laubacher, and Chrysanthos Dellarocas. Harnessing crowds: Mapping the genome of collective intelligence. MIT Center for Collective Intelligence, 2009.

Download

PDF file

Bibtex (old school link/hypertext)

@incollection {springerlink:10.1007/978-3-642-33024-7_9,
author = {Maisonneuve, Nicolas and Chopard, Bastien},
affiliation = {Computer Science Department, University of Geneva, Switzerland},
title = {Crowdsourcing Satellite Imagery Analysis: Study of Parallel and Iterative Models},
booktitle = {Geographic Information Science},
series = {Lecture Notes in Computer Science},
editor = {Xiao, Ningchuan and Kwan, Mei-Po and Goodchild, Michael and Shekhar, Shashi},
publisher = {Springer Berlin / Heidelberg},
isbn = {978-3-642-33023-0},
keyword = {Computer Science},
pages = {116-131},
volume = {7478},
url = {http://dx.doi.org/10.1007/978-3-642-33024-7_9},
note = {10.1007/978-3-642-33024-7_9},
year = {2012}
}
