Measuring Editor Collaborativeness With Economic Modelling

## Max Klein [@notconfusing](https://twitter.com/notconfusing)
## Wikimania 2014

Audience Poll

  • How many people here are:
    • Wikipedia researchers?
    • familiar with network/graph theory?
    • sort of understand PageRank?

New Editors

  • Leave when they encounter uncooperative Wikipedians

Suggestions Need Input

  • I wanted to make an input-less suggester
    • So new editors can browse it.

What To Show New Editors?

  • Can we fit them into the more functional parts of Wikipedia?

Collaboration

  • What is collaboration on a Wikipedia page?
  • How do you measure it?

Collaboration

  • How does more experience among contributing editors affect article quality?
    • More experience is not necessarily better.

Collaboration Flipside

  • File this away for later
  • How does editing more articles affect editor expertise?

Principle

  • They both use a network science algorithm on a bipartite graph to rank countries' economic performance.

Key Insight

  • Lower GDPs
    • Just agriculture
    • Ubiquitous products only
  • Switzerland
    • Agriculture & watches
    • Ubiquitous and rare products

So what?

  • Infer the GDP rankings of the world economy just by knowing
    • Which countries
    • export which products
      • not even the quantities of the exports

It's notconfusing

  • "Unlike laws and sausages, those who like Wikis and Tofu should inquire into how they are being made."

Bipartite Network

  • A bipartite network is a graph with two distinct types of nodes, where edges only connect nodes of different types.
  • In this case, countries and products.
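As a minimal sketch of that structure, the network can be stored as a list of (country, product) pairs, from which each node's number of connections follows directly. The function name and the toy pairs are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def bipartite_degrees(exports):
    """Count connections on both sides of a country-product network.

    `exports` is an iterable of (country, product) pairs. Quantities are
    deliberately ignored: only who exports what matters.
    """
    products_of = defaultdict(set)   # country -> products it exports
    countries_of = defaultdict(set)  # product -> countries exporting it
    for country, product in exports:
        products_of[country].add(product)
        countries_of[product].add(country)
    diversification = {c: len(p) for c, p in products_of.items()}
    ubiquity = {p: len(c) for p, c in countries_of.items()}
    return diversification, ubiquity

# Hypothetical toy data, not real trade figures.
toy_exports = [
    ("Country A", "agriculture"),
    ("Country B", "agriculture"),
    ("Country B", "watches"),
]
diversification, ubiquity = bipartite_degrees(toy_exports)
```

Here "agriculture" is the ubiquitous product (two exporters) and "watches" the rare one (a single exporter).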

Basically, it's the PageRank Algorithm

  • Except we have two node-types
  • And an extra variable for the importance of highly connected nodes
    • I'll explain more later

Lay terms

  • If you know who exports what
    • Then you can rank Countries (In economics)

Translations

  • Instead of
    • countries exporting products
  • What about
    • editors writing articles

Translations

  • Instead of
    • rich countries
  • What about
    • super users

Translations

  • Instead of
    • ubiquitous products
  • What about
    • highly edited articles

Translations

  • Instead of
    • global economy
  • What about
    • a wikipedia category

Editor Article Matrix

  • It's Triangular
    • The power users edit most of the articles in the category (example shown: Feminist writers).
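A rough sketch of how such a matrix could be built: record which editor touched which article, then sort editors by how many articles they edited and articles by how many editors they attracted. The function name and editor/article labels are hypothetical, and this is not necessarily how the original figure was produced.

```python
def editor_article_matrix(pairs):
    """Build a binary editor-by-article matrix, busiest rows and columns first."""
    edited = set(pairs)
    editors = sorted({e for e, _ in edited})
    articles = sorted({a for _, a in edited})
    # Sort by number of connections, descending (ties stay alphabetical).
    editors.sort(key=lambda e: -sum((e, a) in edited for a in articles))
    articles.sort(key=lambda a: -sum((e, a) in edited for e in editors))
    matrix = [[1 if (e, a) in edited else 0 for a in articles] for e in editors]
    return editors, articles, matrix

# Hypothetical toy category with three editors and three articles.
toy_edits = [
    ("ann", "x"), ("ann", "y"), ("ann", "z"),
    ("bob", "x"), ("bob", "y"),
    ("cat", "x"),
]
editors, articles, matrix = editor_article_matrix(toy_edits)
for row in matrix:
    print(row)
```

When the ones cluster in the upper-left corner after sorting, the matrix has the triangular shape described above: the most active editors cover nearly every article, and occasional editors mostly touch the popular ones.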

Iterative Algorithm

  • A nonmathematical explanation:
    • Imagine everyone in the room starts with £1
    • Distribute your money evenly to all your friends
    • Round 2, some people may have more or less than £1
      • but again distribute all your money evenly to all your friends.
    • Repeat over and over again.
    • Eventually converges.

http://www.scottaaronson.com/blog/?p=1820
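The money-passing picture can be tried out in a few lines. This is only a sketch under stated assumptions: friendships are mutual, the network is connected, and it is not strictly two-sided (otherwise the amounts can keep swapping between the two sides instead of settling). The names and the friendship network are hypothetical.

```python
def pass_money(friends, rounds=200):
    """Everyone starts with 1.0 and repeatedly gives it all away, split evenly."""
    money = {person: 1.0 for person in friends}
    for _ in range(rounds):
        received = {person: 0.0 for person in friends}
        for person, amount in money.items():
            share = amount / len(friends[person])
            for friend in friends[person]:
                received[friend] += share
        money = received
    return money

# Hypothetical room of four people; C is the best-connected.
friends = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
final = pass_money(friends)
for person, amount in sorted(final.items()):
    print(person, round(amount, 3))
```

No money is created or destroyed, and the amounts settle in proportion to the number of friends each person has, so the final balance works as a ranking.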

Iterative Algorithm - One More Variable

  • Same scenario as above except:
    • You don't distribute your money evenly
    • You can give your popular friends a disproportionately larger share
      • or a disproportionately smaller one.

Iterative Algorithm - Notation

  • In this experiment, those biases are controlled by two exponents:
    • \(\alpha\) (article popularity exponent) and
    • \(\beta\) (editor portfolio size exponent).
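One way to picture the two exponents is a biased version of the money-passing game on the editor-article network. The sketch below is an assumption-laden illustration, not the talk's exact formulation: it assumes editors hand money to articles with weights proportional to (number of editors of the article) raised to `alpha`, and articles hand it back with weights proportional to (number of articles in the editor's portfolio) raised to `beta`. The sign convention and normalisation in the actual model may differ, and all names are hypothetical.

```python
from collections import defaultdict

def biased_rounds(pairs, alpha, beta, rounds=100):
    """Biased money passing between editors and articles (illustrative only)."""
    articles_of = defaultdict(set)
    editors_of = defaultdict(set)
    for editor, article in pairs:
        articles_of[editor].add(article)
        editors_of[article].add(editor)

    editor_money = {e: 1.0 for e in articles_of}
    article_money = {a: 0.0 for a in editors_of}
    for _ in range(rounds):
        # Editors pay articles, weighted by article popularity ** alpha.
        article_money = {a: 0.0 for a in editors_of}
        for e, money in editor_money.items():
            weights = {a: len(editors_of[a]) ** alpha for a in articles_of[e]}
            total = sum(weights.values())
            for a, w in weights.items():
                article_money[a] += money * w / total
        # Articles pay editors, weighted by portfolio size ** beta.
        editor_money = {e: 0.0 for e in articles_of}
        for a, money in article_money.items():
            weights = {e: len(articles_of[e]) ** beta for e in editors_of[a]}
            total = sum(weights.values())
            for e, w in weights.items():
                editor_money[e] += money * w / total
    return editor_money, article_money

# Hypothetical toy category.
toy_edits = [
    ("ann", "x"), ("ann", "y"), ("ann", "z"),
    ("bob", "x"), ("bob", "y"),
    ("cat", "x"),
]
even, _ = biased_rounds(toy_edits, alpha=0, beta=0)
biased, _ = biased_rounds(toy_edits, alpha=0, beta=1)
```

With both exponents at 0 this reduces to the even split. In this sketch, a positive `beta` shifts money towards editors with large portfolios, and a negative one shifts it away from them.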

Editors rise and fall over time

Convergence

End Result of Algorithm

  • A ranking for Editors
  • A ranking for Articles

Exogenous Rankings

  • Getting unrelated metrics for:
    • Editors
    • Articles

Exogenous Editor Rankings

  • Edit count is a poor proxy
  • Use @halfak and @staeiou's [__"Labour Hours"__](http://www-users.cs.umn.edu/~halfak/publications/Using_Edit_Sessions_to_Measure_Participation_in_Wikipedia/geiger13using-preprint.pdf)
    • Labour Hours: Sum of the lengths of an editor's Edit Sessions
    • Edit Session: A run of edits in which each edit falls within 1 hour of the previous one, measured from the first edit to the last.
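A simplified sketch of that definition, assuming the only input is a list of one editor's edit timestamps. The function name is hypothetical, and the published method may include refinements (for example, crediting time for a session's first edit) that are left out here.

```python
from datetime import datetime, timedelta

def labour_hours(timestamps, gap=timedelta(hours=1)):
    """Sum the lengths of edit sessions, in hours.

    A new session starts whenever more than `gap` has passed since the
    previous edit. A session with a single edit has length zero here.
    """
    times = sorted(timestamps)
    if not times:
        return 0.0
    total = timedelta()
    start = prev = times[0]
    for t in times[1:]:
        if t - prev > gap:
            total += prev - start  # close the current session
            start = t              # open a new one
        prev = t
    total += prev - start          # close the final session
    return total.total_seconds() / 3600

# Hypothetical edit history: two sessions on the same day.
edits = [
    datetime(2014, 8, 8, 10, 0),
    datetime(2014, 8, 8, 10, 30),
    datetime(2014, 8, 8, 11, 15),
    datetime(2014, 8, 8, 15, 0),
    datetime(2014, 8, 8, 15, 20),
]
hours = labour_hours(edits)
```

The first three edits form one session of 1 hour 15 minutes and the last two a session of 20 minutes, so five edits translate into roughly 1.58 labour hours regardless of how the raw edit count looks.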

Exogenous Article Rankings

  • Mix of:
    1. ratio of mark-up to readable text
    2. number of headings
    3. article length
    4. citations per article length
    5. outgoing intrawiki links.

Calibration

  • Find the values of \(\alpha\) and \(\beta\) which maximize:
    • The rank correlation between model and exogenous rankings
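The calibration step can be sketched as a plain grid search scored by Spearman rank correlation. Everything here is illustrative: `model_scores` stands in for whatever produces the model's scores for a given \(\alpha\) and \(\beta\), the grid values are arbitrary, and the simple Spearman formula assumes there are no tied scores.

```python
from itertools import product

def ranks(values):
    """Rank positions (0 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position
    return result

def spearman(xs, ys):
    """Spearman rank correlation for tie-free data."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def calibrate(model_scores, exogenous_scores, alphas, betas):
    """Return (best_rho, alpha, beta) over the grid of exponent values."""
    best = None
    for alpha, beta in product(alphas, betas):
        rho = spearman(model_scores(alpha, beta), exogenous_scores)
        if best is None or rho > best[0]:
            best = (rho, alpha, beta)
    return best

# Hypothetical stand-in for the model: scores for three editors.
def toy_model(alpha, beta):
    return [alpha, 0.5, beta]

exogenous = [1.0, 2.0, 3.0]  # e.g. labour hours, made up for the example
best = calibrate(toy_model, exogenous, alphas=[0.0, 1.0], betas=[0.2, 0.9])
```

In practice the grid would be finer and the search would be run twice, once against the exogenous editor ranking and once against the exogenous article ranking.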

Calibration on Feminist Writers

High Correlations

  • We find correlations around
    • .6 to .9
  • Even better than the Economics GDP papers around
    • .4

Snapshotting

  • Took 13 Snapshots of each Category

Rank Accuracy

  • This really works...
  • Increases over time

Most collaborative

  • Question: in which category do power editors improve article quality?
    1. American male novelists
    2. 2013 films
    3. American women novelists
    4. Nobel Peace Prize laureates
    5. Sexual acts
    6. Economic theories
    7. Feminist writers
    8. Yoga
    9. Military history of the US
    10. Counterculture festivals
    11. Computability theory
    12. Bicycle parts

Most collaborative

  • Question: in which category do power editors improve article quality?
    1. Military history of the US

Least collaborative

  • Question: in which category do power editors hurt article quality?
    1. American male novelists
    2. 2013 films
    3. American women novelists
    4. Nobel Peace Prize laureates
    5. Sexual acts
    6. Economic theories
    7. Feminist writers
    8. Yoga
    9. Military history of the US
    10. Counterculture festivals
    11. Computability theory
    12. Bicycle parts

Least collaborative

  • Question: in which category do power editors hurt article quality?
    1. Sexual acts

Full Category Rankings

Edit Count or Touches

Forest Not Trees

  • If you accept this \(\beta\) measure as a collaborativeness measure, how can we use it?

Detect dysfunction

  • For learning
    • Arguing is not necessarily bad.
  • For intervention?

Detect Where The Wiki is Working

  • At least where your time invested relates to article quality
    • Even superlinearly so

A Potential Use

  • Make a carousel of friendly places for new users
