CSE503 SoftwareIdol

From PublicWiki

Revision as of 20:37, 5 February 2008

Project Ideas

We want to get involved in comparing open-source software (OSS) projects. There are a few different aspects of them that we could compare, and a few different metrics we could (try to) use for actually doing the comparison.

  • Aspects we could compare, from hard to easy:
    • Cost of building/maintaining
    • Successfulness
    • Popularity
    • Ratings
  • Metrics we could use:
    • Machine learning of differentiating features (training a classifier)
    • "Englishyness"
    • Size (lines of code, number of revisions, etc.)
    • Churn (rate of changes over time given repository change tracking)
    • Number of contributors
    • Other (many)
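
Churn, as defined in the list, can be approximated from per-commit line counts. Below is a minimal Python sketch. It assumes each commit has already been reduced to a date plus lines added and lines deleted; the `Commit` record and its field names are hypothetical. It sums added and deleted lines per calendar month.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable, NamedTuple


class Commit(NamedTuple):
    # Hypothetical per-commit summary; field names are illustrative only.
    day: date
    lines_added: int
    lines_deleted: int


def monthly_churn(commits: Iterable[Commit]) -> dict[tuple[int, int], int]:
    """Return {(year, month): lines added + lines deleted}, ordered by month."""
    churn: dict[tuple[int, int], int] = defaultdict(int)
    for c in commits:
        churn[(c.day.year, c.day.month)] += c.lines_added + c.lines_deleted
    return dict(sorted(churn.items()))
```

For a Git repository, `git log --numstat` prints added and deleted line counts per file for each commit, which could feed this. Other version-control systems would need their own extraction step. Counting added plus deleted lines is one common convention, not the only possible definition of churn.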

We probably want to combine a hard aspect with an easy metric or vice versa.

Options:

  • Come up with a reasonable way of estimating the "cost" of an OSS project, possibly by computing some metrics.
  • Come up with a reasonable way of defining "success"; analyze how a few shallow metrics (such as size, churn, or whatever else is easy to compute) correspond to that split.
  • Take ratings or popularity and measure some harder metrics (like Englishyness and churn) to look for a correlation.
  • Take the easiest-to-measure aspects, get a good data set (ideally within a domain), and train a classifier using Weka or a similar toolkit; look at the most relevant features to get something human-readable. Weka: http://en.wikipedia.org/wiki/Weka_(machine_learning)
    • A special case: split a few code bases temporally, and try to relate a classification of each repository's *first year* to the project's *eventual* success, cost, etc., so as to come up with a predictor.
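
The temporal-split idea could start from something like the following sketch. It computes features only from a project's first 365 days. It then pairs them with an eventual outcome label and writes CSV, a format Weka can import. The `Commit` record, the three feature names, and the label values are all hypothetical placeholders. Choosing the label is exactly the "success/cost" definition question in the options above.

```python
import csv
import io
from datetime import date, timedelta
from typing import NamedTuple


class Commit(NamedTuple):
    # Hypothetical commit record; real data would come from the repository log.
    day: date
    author: str
    lines_changed: int


def first_year_features(commits: list[Commit]) -> dict[str, float]:
    """Features computed only from the first 365 days after the earliest commit."""
    if not commits:
        raise ValueError("project has no commits")
    cutoff = min(c.day for c in commits) + timedelta(days=365)
    early = [c for c in commits if c.day < cutoff]
    return {
        "commits": float(len(early)),
        "contributors": float(len({c.author for c in early})),
        "churn": float(sum(c.lines_changed for c in early)),
    }


def to_csv(rows: list[tuple[dict[str, float], str]]) -> str:
    """One CSV row per project: sorted feature columns, then the outcome label."""
    if not rows:
        raise ValueError("no rows to write")
    out = io.StringIO()
    writer = csv.writer(out)
    names = sorted(rows[0][0])
    writer.writerow(names + ["label"])
    for features, label in rows:
        writer.writerow([features[n] for n in names] + [label])
    return out.getvalue()
```

Keeping the feature window strictly before the cutoff matters. If any later history leaks into the features, the classifier would partly be "predicting" from the outcome itself.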

Also, this isn't a strict split, so we could also compare aspects against one another: how does cost relate to ratings and popularity? How does popularity relate to success?
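
Relating one aspect to another (say, estimated cost against average rating across a set of projects) needs some correlation measure. One reasonable choice is Spearman's rank correlation, since ratings are ordinal and download counts tend to be very skewed. It only asks whether the two orderings agree. This is a plain-Python sketch of that choice, not a method the project has settled on.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series of at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))
```

A value near +1 means projects ranked high on one aspect tend to rank high on the other. With only a handful of projects, though, even a large value can be a coincidence.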

Current Work

We're currently focusing on popularity, as measured by ratings and downloads. As of 2/5, we're compiling a list of media-related software and running easy metrics against it.

Possible Software

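Ohloh (http://www.ohloh.net) seems to be a good source. Two examples of media software listings:

  • VLC on Ohloh: http://www.ohloh.net/projects/3975?p=VLC
  • guliverkli on Ohloh: http://www.ohloh.net/projects/3137?p=guliverkli

Ratings Sources

  • Ohloh (http://www.ohloh.net) provides average rating, downloads, and stacks (http://www.ohloh.net/about/faq#stack)
  • Freshmeat (http://freshmeat.net) provides rating (http://freshmeat.net/stats/#rating), vitality (http://freshmeat.net/stats/#vitality), and popularity (http://freshmeat.net/stats/#popularity)
  • Betanews (http://fileforum.betanews.com) provides rating and downloads
  • Download.com (http://www.download.com/) provides average editor ratings, user ratings, and downloads
  • Softsea (http://www.softsea.com/) provides ratings
  • Snapfiles (http://www.snapfiles.com/) provides editor ratings and user ratings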
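
Ratings from different sites may not share a scale (a 5-star scale versus a 10-point one, for instance), so combining them needs some normalization first. This sketch assumes the minimum and maximum of each site's scale are known and entered by hand. It maps each rating linearly onto [0, 1] and takes an unweighted mean. The `Rating` record is hypothetical. Whether editor and user ratings should be weighted equally is left open.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    source: str        # site name, e.g. "Freshmeat" (illustrative)
    value: float
    scale_max: float   # assumed top of that site's scale, e.g. 5 or 10
    scale_min: float = 0.0


def normalized(r: Rating) -> float:
    """Map a rating linearly onto [0, 1] using its site's assumed scale."""
    if r.scale_max <= r.scale_min:
        raise ValueError(f"bad scale for {r.source}")
    if not r.scale_min <= r.value <= r.scale_max:
        raise ValueError(f"{r.source} rating {r.value} is outside its scale")
    return (r.value - r.scale_min) / (r.scale_max - r.scale_min)


def combined_rating(ratings: list[Rating]) -> float:
    """Unweighted mean of normalized ratings from several sites."""
    if not ratings:
        raise ValueError("no ratings")
    return sum(normalized(r) for r in ratings) / len(ratings)
```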

More on Metrics

Search hits: Experiments using the same list to gather web hit counts are interesting, but several names cause problems. Audacity, Gimp, Pidgin, Eraser, etc. are words with multiple meanings. eMule, eMule morph, and eMule Xtreme Mod are all separate downloads, but their hit counts overlap. Some names are too short to search on reliably (such as "ABC" [Yet Another Bittorrent Client]).
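
Two of these problems can be flagged automatically before collecting hit counts. A name may appear as a whole word inside another listed name, so its count also includes the other project. A name may also be too short to search on. Below is a minimal sketch; the 4-character threshold is an arbitrary assumption. Names that are ordinary English words (Gimp, Eraser) are not caught here, since that would need a dictionary or manual review.

```python
import re


def hit_count_warnings(names: list[str], min_length: int = 4) -> dict[str, list[str]]:
    """Flag names whose web hit counts are likely to be unreliable."""
    warnings: dict[str, list[str]] = {}
    for name in names:
        issues = []
        if len(name) < min_length:
            issues.append("too short")
        # Whole-word, case-insensitive match, e.g. "eMule" inside "eMule morph".
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for other in names:
            if other != name and pattern.search(other):
                issues.append(f"contained in {other!r}")
        if issues:
            warnings[name] = issues
    return warnings
```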

OSS Resources

Cost of an OSS project

A great advantage of open-source software over commercial software is the price. Users get the software for free, but substantial work went into these products. What are the actual costs of making this software?

Ohloh reports an "estimated number of person-years" and an associated estimated cost.
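
One well-known way to turn code size into person-years is the Basic COCOMO model, in which effort in person-months is a·KLOC^b. This sketch assumes the organic-mode coefficients (a = 2.4, b = 1.05) and a caller-supplied yearly salary. It is not necessarily the formula or the parameters behind Ohloh's numbers.

```python
def basic_cocomo_person_months(kloc: float, a: float = 2.4, b: float = 1.05) -> float:
    """Basic COCOMO effort E = a * KLOC**b; defaults are the organic-mode coefficients."""
    if kloc <= 0:
        raise ValueError("kloc must be positive")
    return a * kloc ** b


def estimated_cost(kloc: float, yearly_salary: float) -> tuple[float, float]:
    """Return (person-years, cost); yearly_salary is an assumed input, not a known rate."""
    person_years = basic_cocomo_person_months(kloc) / 12
    return person_years, person_years * yearly_salary
```

Because b > 1, estimated effort grows slightly faster than code size. The cost figure is only as good as the salary assumption, and the model ignores who actually did the work (volunteers, paid developers, or both).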

Quotes

"Indeed, as we have repeatedly emphasized, the Internet is the primary enabler of the OSS development and distribution process, making it possible for widely distributed groups to share ideas and software extremely quickly at negligible cost." Understanding Open Source Software Development By Joseph Feller, Brian Fitzgerald

"But open source is a low-cost way of increasing the opportunity for surprise." Lessons from Open Source software development, Tim O'Reilly 1999

Success of Open Source Projects

Most downloaded projects on SourceForge (all time): http://sourceforge.net/top/topalltime.php?type=downloads

Papers

Kevin Crowston, Hala Annabi, and James Howison, "Defining Open Source Software Project Success", 2003. http://floss.syr.edu/publications/icis2003success.pdf This paper identifies a range of measures that can be used to assess the success of open source software (OSS) projects.

Information Systems Success in Free and Open Source Software Development: Theory and Measures http://floss.syr.edu/publications/crowston2006flossSuccessSPIPpre-print.pdf

Useful Links

Motivations of open-source developers:

Working for Free? Motivations for Participating in Open-Source Projects
http://mesharpe.metapress.com/media/fc22ht5ywp6vyndknrar/contributions/e/e/p/d/eepdf96rnt0geahv.pdf

Why Open Source software can succeed:
http://opensource.mit.edu/papers/rp-bonaccorsirossi.pdf

Case Studies: A Case Study of Open Source Software Development: The Apache Server
http://conway.isri.cmu.edu/~jdh/collaboratory/research_papers/apachefinal3.pdf