Difference between revisions of "CSE503 SoftwareIdol"

Revision as of 20:13, 5 February 2008

Project Ideas

We want to get involved in comparing OSS projects; there are a few different aspects of them that we could compare, and a few different metrics we could (try to) use for actually doing the comparison.

After meeting with Dr. Notkin (1/29), we're currently focusing on making a list of software and software sites that provide ratings and download frequencies. We'll try running easy metrics against that and then reconvene.

  • Aspects we could compare, from hard to easy:
    • Cost of building/maintaining
    • Successfulness
    • Popularity
    • Ratings
  • Metrics we could use:
    • Machine-learning of differentiating features (training a classifier)
    • "Englishyness"
    • Size (lines of code, number of revisions, etc.)
    • Churn (rate of changes over time given repository change tracking)
    • Number of contributors
    • Other (many)
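
As a rough illustration of the churn and contributor-count metrics listed above, here is a minimal sketch. It assumes the repository history has already been exported into simple per-commit records; the `Commit` layout, the function names, and the sample data are all hypothetical, and "churn" is taken here to mean lines added plus lines deleted per month, which is only one possible definition.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Commit:
    """One repository change (hypothetical record layout)."""
    author: str
    day: date
    lines_added: int
    lines_deleted: int


def monthly_churn(commits):
    """Return {(year, month): lines added + lines deleted in that month}."""
    churn = defaultdict(int)
    for c in commits:
        churn[(c.day.year, c.day.month)] += c.lines_added + c.lines_deleted
    return dict(churn)


def contributor_count(commits):
    """Return the number of distinct commit authors."""
    return len({c.author for c in commits})


# Made-up sample history, for illustration only.
history = [
    Commit("alice", date(2008, 1, 3), 120, 10),
    Commit("bob", date(2008, 1, 20), 30, 5),
    Commit("alice", date(2008, 2, 2), 8, 40),
]

print(monthly_churn(history))
print(contributor_count(history))
```

Keeping the metrics as plain functions over commit records means the same code could be pointed at any project once its change log has been parsed into that shape.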

We probably want to combine a hard aspect with an easy metric or vice versa.

Options:

  • Come up with a reasonable way of estimating the "cost" of an OSS project, possibly backed by a few metrics.
  • Come up with a reasonable way of defining "success"; analyze how a few not-too-deep metrics, such as size, or churn, or (whatever?) correspond to that split.
  • Take ratings or popularity, and measure some harder approaches (like Englishyness and churn) to try to come up with a correlation.
  • Take the easiest-to-measure aspects, get a good data set (ideally within a domain), and train a classifier using Weka or a similar toolkit; look at most-relevant features for something human-readable. http://en.wikipedia.org/wiki/Weka_(machine_learning)
    • A special case: Split a few code bases temporally and try to use classification of the repository for the *first year* of a project to relate to the success/cost/etc *eventually* (so as to come up with a predictor)
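
To make the classifier option more concrete, here is a minimal sketch of the "most relevant feature" idea. It is not Weka and does not use Weka's API; it swaps in a hand-written one-level decision stump (a single feature and threshold) as the simplest human-readable classifier. The feature names, the sample rows, and the success labels are invented for illustration.

```python
def best_stump(rows, labels, feature_names):
    """Find the single feature/threshold split that best separates labels.

    rows: list of equal-length numeric feature lists
    labels: list of booleans (True = "successful", however that is defined)
    Returns (feature_name, threshold, above_means_true, accuracy),
    or None if no feature takes more than one distinct value.
    """
    best = None
    for f, name in enumerate(feature_names):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds sit halfway between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            hits = sum((row[f] > threshold) == label
                       for row, label in zip(rows, labels))
            # A split may predict True above the threshold or below it.
            for above_means_true, correct in ((True, hits),
                                              (False, len(rows) - hits)):
                accuracy = correct / len(rows)
                if best is None or accuracy > best[3]:
                    best = (name, threshold, above_means_true, accuracy)
    return best


# Made-up projects: [thousands of lines of code, number of contributors].
feature_names = ["kloc", "contributors"]
rows = [[12, 2], [300, 3], [45, 40], [8, 25], [150, 1], [20, 60]]
labels = [False, False, True, True, False, True]

print(best_stump(rows, labels, feature_names))
```

On this toy data the stump picks the contributor count, which is the kind of readable result the option is after. A real experiment would need held-out data, since accuracy measured on the training set overstates how well a split generalizes.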

Also, this isn't a strict split, so we could compare one or more aspects against one another: how does cost relate to ratings and popularity? How does popularity relate to success?
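
One way to quantify how two such measures relate is a rank correlation. The sketch below computes Spearman's rank correlation (Pearson correlation applied to ranks, with tied values given their average rank) in plain Python. The choice of Spearman is an assumption made here because download counts tend to be heavily skewed; the numbers are invented.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up figures for four projects.
downloads = [1000, 50, 400, 7000]
churn = [300, 20, 500, 900]

print(round(spearman(downloads, churn), 3))
```

A value near 1 or -1 suggests a strong monotonic relationship and a value near 0 suggests none, though with only a handful of projects any such figure should be read cautiously.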

Cost of an OSS project

A great advantage of open-source software over commercial software is the price. Users get the software for free, but real work went into these products. What does it actually cost to make this software?

Ohloh has an "estimated number of person-years" and associated estimated cost.

Quotes

"Indeed, as we have repeatedly emphasized, the Internet is the primary enabler of the OSS development and distribution process, making it possible for widely distributed groups to share ideas and software extremely quickly at negligible cost." Understanding Open Source Software Development By Joseph Feller, Brian Fitzgerald

"But open source is a low-cost way of increasing the opportunity for surprise." Lessons from Open Source software development, Tim O'Reilly 1999

Success of Open Source Projects

Most downloaded on SourceForge: http://sourceforge.net/top/topalltime.php?type=downloads

Papers

Defining Open Source Software Project Success, Kevin Crowston, Hala Annabi, and James Howison, 2003 http://floss.syr.edu/publications/icis2003success.pdf This paper identifies a range of measures that can be used to assess the success of open source software (OSS) projects.

Information Systems Success in Free and Open Source Software Development: Theory and Measures http://floss.syr.edu/publications/crowston2006flossSuccessSPIPpre-print.pdf

Useful Links

Motivations of open-source developers:

Working for Free? Motivations for Participating in Open-Source Projects
http://mesharpe.metapress.com/media/fc22ht5ywp6vyndknrar/contributions/e/e/p/d/eepdf96rnt0geahv.pdf

Why Open Source software can succeed:
http://opensource.mit.edu/papers/rp-bonaccorsirossi.pdf

Case Studies: A Case Study of Open Source Software Development: The Apache Server
http://conway.isri.cmu.edu/~jdh/collaboratory/research_papers/apachefinal3.pdf