CSE503 SoftwareIdol
Revision as of 20:37, 5 February 2008
Project Ideas
We want to compare OSS projects. There are several aspects of them that we could compare, and several metrics we could (try to) use to make the comparison.
- Aspects we could compare, from hard to easy:
  - Cost of building/maintaining
  - Success
  - Popularity
  - Ratings
- Metrics we could use:
  - Machine learning of differentiating features (training a classifier)
  - "Englishyness"
  - Size (lines of code, number of revisions, etc.)
  - Churn (rate of change over time, taken from repository change tracking)
  - Number of contributors
  - Other (many)
We probably want to combine a hard aspect with an easy metric or vice versa.
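Churn is one of the easier metrics to compute once a repository's history is available. The sketch below is minimal and rests on two assumptions. First, commits have already been extracted as hypothetical (date, lines added, lines deleted) tuples. Second, "churn" means lines added plus lines deleted per calendar month. That is one common definition, not one we have settled on.

```python
from collections import defaultdict
from datetime import date

# Hypothetical record format: (commit date, lines added, lines deleted),
# one tuple per commit, already parsed from the version-control history.
Commit = tuple[date, int, int]


def monthly_churn(commits: list[Commit]) -> dict[tuple[int, int], int]:
    """Return lines added + lines deleted for each (year, month), in date order.

    Treating churn as added + deleted lines is an assumption; other
    definitions (e.g. counting modified files) are equally possible.
    """
    totals: dict[tuple[int, int], int] = defaultdict(int)
    for day, added, deleted in commits:
        totals[(day.year, day.month)] += added + deleted
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    sample = [
        (date(2007, 1, 3), 120, 10),
        (date(2007, 1, 20), 30, 5),
        (date(2007, 2, 2), 0, 40),
    ]
    print(monthly_churn(sample))  # {(2007, 1): 165, (2007, 2): 40}
```

The sample commits are made up and only show the shape of the output.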
Options:
- Come up with a reasonable way of estimating the "cost" of an OSS project; maybe compute some metrics against it.
- Come up with a reasonable way of defining "success"; analyze how a few shallow metrics, such as size, churn, or others, correspond to the resulting successful/unsuccessful split.
- Take ratings or popularity, and compute some harder metrics (like Englishyness and churn) to look for a correlation.
- Take the easiest-to-measure aspects, get a good data set (ideally within one domain), and train a classifier using Weka (http://en.wikipedia.org/wiki/Weka_(machine_learning)) or a similar toolkit; look at the most relevant features for something human-readable.
- A special case: split a few code bases temporally and use a classification of each repository's *first year* to relate it to the project's *eventual* success/cost/etc. (so as to come up with a predictor).
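The temporal special case in the last option needs a cut between a project's early history and everything after it. The sketch below assumes that "first year" means the 365 days starting at the first commit. The classifier features would then be computed only from the early part.

```python
from datetime import date, timedelta

# Assumption: "first year" means the 365 days starting at the first commit.
FIRST_YEAR = timedelta(days=365)


def split_first_year(commit_dates: list[date]) -> tuple[list[date], list[date]]:
    """Split a project's commit dates into (first year, everything after)."""
    if not commit_dates:
        return [], []
    ordered = sorted(commit_dates)
    cutoff = ordered[0] + FIRST_YEAR
    early = [d for d in ordered if d < cutoff]
    later = [d for d in ordered if d >= cutoff]
    return early, later


if __name__ == "__main__":
    dates = [date(2006, 3, 1), date(2005, 3, 1), date(2007, 1, 1), date(2005, 9, 1)]
    early, later = split_first_year(dates)
    # Features for the classifier would be computed from `early` only;
    # the label (eventual success, cost, ...) comes from the project's later state.
    print(len(early), len(later))  # 2 2
```

A fixed 365-day window treats every project the same way. A window measured in commits or releases would be an equally reasonable choice.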
These options are not strict alternatives, so we could also compare aspects against one another: how does cost relate to ratings and popularity? How does popularity relate to success?
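Several of these questions come down to correlating one measured quantity with another, for example churn against ratings. Ratings from different sites are ordinal and often tied, so a rank correlation seems a safer first choice than a plain linear one. The sketch below computes Spearman's rho as the Pearson correlation of average ranks. The choice of Spearman is our assumption, and the numbers in the example are made up.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of equal values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    churn = [10, 50, 30, 80]          # made-up monthly churn per project
    rating = [3.0, 4.5, 4.0, 4.8]     # made-up average rating per project
    print(spearman(churn, rating))    # 1.0 (same ordering in both series)
```

With only a handful of projects, any rho we get would be weak evidence. The sketch shows the computation, not a result.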
Current Work
We're currently focusing on popularity, as measured by ratings and downloads. As of 2/5, we're making a list of media-related software and running easy metrics against it.
Possible Software
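Ohloh seems to be a good source. Two example listings for media software:
- VLC on Ohloh: http://www.ohloh.net/projects/3975?p=VLC
- guliverkli on Ohloh: http://www.ohloh.net/projects/3137?p=guliverkli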
Ratings Sources
- Ohloh provides average rating, downloads, and stacks
- Freshmeat provides rating, vitality, and popularity
- Betanews provides rating and downloads
- Download.com provides average editor ratings, user ratings, and downloads
- Softsea provides ratings
- Snapfiles provides editor ratings and user ratings
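The sources above rate on their own scales, so their numbers have to be put on a common footing before they can be combined. The sketch below is minimal and maps each rating linearly onto [0, 1], then takes an unweighted mean. The source names "site_a" and "site_b" are hypothetical, and their (low, high) scales are placeholders rather than the actual scales of any site listed above.

```python
# Placeholder scales: the real (lowest, highest) rating on each site has to be
# checked by hand; "site_a"/"site_b" are hypothetical source names.
SCALES: dict[str, tuple[float, float]] = {
    "site_a": (0.0, 5.0),
    "site_b": (0.0, 10.0),
}


def to_unit_interval(rating: float, low: float, high: float) -> float:
    """Map a rating on the scale [low, high] linearly onto [0, 1]."""
    if high <= low:
        raise ValueError("scale must have high > low")
    return (rating - low) / (high - low)


def combined_rating(ratings: dict[str, float],
                    scales: dict[str, tuple[float, float]] = SCALES) -> float:
    """Unweighted mean of one project's ratings after normalizing each source."""
    if not ratings:
        raise ValueError("no ratings to combine")
    normalized = [to_unit_interval(r, *scales[src]) for src, r in ratings.items()]
    return sum(normalized) / len(normalized)


if __name__ == "__main__":
    print(combined_rating({"site_a": 4.0, "site_b": 8.0}))  # 0.8
```

An unweighted mean treats a site with three votes the same as one with three thousand. Weighting by vote count, where a site reports it, may be worth trying.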
More on Metrics
Search hits: Experiments with using the same list to collect web hit counts are interesting, but they raise several problems. Audacity, Gimp, Pidgin, Eraser, etc. are words with multiple meanings. eMule, eMule morph and eMule Xtreme Mod are separate downloads, but their hit counts overlap. Some names are too short to search for reliably (such as "ABC [Yet Another Bittorrent Client]").
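Two of these problems can be flagged automatically before any searches are run. One is names contained in other names, as with the eMule variants. The other is names that are very short. The sketch below is a heuristic only. The 4-character threshold is a guess, and it cannot detect ordinary-word names like Audacity or Eraser, which still need a manual check.

```python
def overlapping_names(names: list[str]) -> list[tuple[str, str]]:
    """Pairs (contained, containing) where one name occurs inside another.

    A web search for the contained name will also match pages about the
    containing one, so their hit counts overlap (case-insensitive check).
    """
    return [
        (a, b)
        for a in names
        for b in names
        if a != b and a.lower() in b.lower()
    ]


def too_short(names: list[str], min_length: int = 4) -> list[str]:
    """Names shorter than `min_length` characters (the threshold is a guess)."""
    return [n for n in names if len(n) < min_length]


if __name__ == "__main__":
    names = ["eMule", "eMule morph", "eMule Xtreme Mod", "ABC", "Audacity"]
    print(overlapping_names(names))
    # [('eMule', 'eMule morph'), ('eMule', 'eMule Xtreme Mod')]
    print(too_short(names))  # ['ABC']
```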
OSS Resources
Cost of an OSS project
A great advantage of open-source software over commercial software is the price. Users get the software for free, but real work went into building these products. What does it actually cost to make this software?
Ohloh has an "estimated number of person-years" and associated estimated cost.
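Ohloh's page does not spell out its estimation formula. One widely used way to turn code size into person-years is the basic COCOMO model. The sketch below uses its organic-mode coefficients (2.4 and 1.05) and then multiplies by a yearly salary that we pick ourselves. We have not checked that these match Ohloh's own method or numbers, and the 55,000 salary is only a placeholder.

```python
def basic_cocomo(lines_of_code: int, yearly_salary: float) -> tuple[float, float]:
    """Estimate (person-years, cost) with basic COCOMO, organic mode.

    effort in person-months = 2.4 * KLOC ** 1.05   (Boehm, 1981)
    cost = person-years * yearly_salary            (salary is our assumption)
    """
    kloc = lines_of_code / 1000
    person_months = 2.4 * kloc ** 1.05
    person_years = person_months / 12
    return person_years, person_years * yearly_salary


if __name__ == "__main__":
    # 100,000 lines and a placeholder salary, for illustration only.
    years, cost = basic_cocomo(100_000, yearly_salary=55_000)
    print(f"{years:.1f} person-years, ${cost:,.0f}")
```

The exponent above 1 means the estimate grows a bit faster than code size. This is the model's built-in assumption about coordination overhead, and it may not hold for distributed OSS teams.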
Quotes
"Indeed, as we have repeatedly emphasized, the Internet is the primary enabler of the OSS development and distribution process, making it possible for widely distributed groups to share ideas and software extremely quickly at negligible cost." Understanding Open Source Software Development, by Joseph Feller and Brian Fitzgerald
"But open source is a low-cost way of increasing the opportunity for surprise." Lessons from Open Source Software Development, by Tim O'Reilly, 1999
Success of Open Source Projects
Most downloaded projects on SourceForge: http://sourceforge.net/top/topalltime.php?type=downloads
Papers
Defining Open Source Software Project Success, Kevin Crowston, Hala Annabi, and James Howison, 2003 http://floss.syr.edu/publications/icis2003success.pdf This paper identifies a range of measures that can be used to assess the success of open source software (OSS) projects.
Information Systems Success in Free and Open Source Software Development: Theory and Measures http://floss.syr.edu/publications/crowston2006flossSuccessSPIPpre-print.pdf
Useful Links
Motivations of open-source developers:
Working for Free? Motivations for Participating in Open-Source Projects
http://mesharpe.metapress.com/media/fc22ht5ywp6vyndknrar/contributions/e/e/p/d/eepdf96rnt0geahv.pdf
Why Open Source software can succeed:
http://opensource.mit.edu/papers/rp-bonaccorsirossi.pdf
Case Studies:
A Case Study of Open Source Software Development: The Apache Server
http://conway.isri.cmu.edu/~jdh/collaboratory/research_papers/apachefinal3.pdf