CSE503 SoftwareIdol
Revision as of 20:43, 5 February 2008
Project Ideas
We want to compare OSS projects. There are several aspects of a project we could compare, and several metrics we could (try to) use for the comparison.
- Aspects we could compare, from hard to easy:
- Cost of building/maintaining
- Success
- Popularity
- Ratings
- Metrics we could use:
- Machine learning of differentiating features (training a classifier)
- "Englishyness"
- Size (lines of code, number of revisions, etc.)
- Churn (rate of change over time, measured from the repository's change history)
- Number of contributors
- Other (many)
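Churn is the metric in this list that most needs an operational definition. One minimal sketch totals the lines changed per calendar month. It assumes commit history has already been extracted from the repository as (date, lines added, lines deleted) records. The record layout and the name `monthly_churn` are illustrative assumptions and are not part of any tool:

```python
from collections import defaultdict
from datetime import date


def monthly_churn(commits):
    """Sum lines added + lines deleted per (year, month).

    `commits` is assumed to be an iterable of
    (commit_date, lines_added, lines_deleted) tuples, e.g. parsed from a
    version-control log (the parsing step is not shown).
    """
    churn = defaultdict(int)
    for day, added, deleted in commits:
        churn[(day.year, day.month)] += added + deleted
    # Return months in chronological order.
    return dict(sorted(churn.items()))


# Hypothetical commit records, for illustration only.
commits = [
    (date(2007, 1, 3), 120, 10),
    (date(2007, 1, 20), 30, 25),
    (date(2007, 2, 2), 5, 40),
]
print(monthly_churn(commits))  # {(2007, 1): 185, (2007, 2): 45}
```

Monthly buckets are an arbitrary choice. Weekly buckets or a per-commit average would be computed the same way.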
We probably want to combine a hard aspect with an easy metric or vice versa.
Options:
- Come up with a reasonable way of estimating the "cost" of an OSS project, and perhaps relate it to some metrics.
- Come up with a reasonable way of defining "success"; analyze how a few shallow metrics, such as size or churn (or others), correspond to that split.
- Take ratings or popularity, and apply some harder metrics (like Englishyness and churn) to look for a correlation.
- Take the easiest-to-measure aspects, get a good data set (ideally within a single domain), and train a classifier using Weka or a similar toolkit; then inspect the most relevant features to get something human-readable. http://en.wikipedia.org/wiki/Weka_(machine_learning)
- A special case: split a few code bases temporally, classify each repository using only its *first year* of history, and relate that to the project's *eventual* success, cost, etc. (yielding a predictor).
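The temporal-split idea needs a way to cut each project's history at the end of its first year. The sketch below assumes each commit is a tuple whose first field is a `datetime.date`. The helper name `first_year` and the 365-day window are illustrative choices:

```python
from datetime import date, timedelta


def first_year(commits, days=365):
    """Keep only commits made within `days` of the project's first commit.

    Each commit is assumed to be a tuple whose first element is a
    datetime.date; any further fields are carried through unchanged.
    """
    ordered = sorted(commits, key=lambda c: c[0])
    if not ordered:
        return []
    cutoff = ordered[0][0] + timedelta(days=days)
    # Strict "<": a commit exactly `days` after the first one is excluded.
    return [c for c in ordered if c[0] < cutoff]


# Hypothetical history, for illustration only.
history = [
    (date(2007, 6, 1), "later fix"),
    (date(2006, 3, 1), "initial import"),
    (date(2006, 12, 31), "first release"),
    (date(2007, 3, 1), "one year on"),
]
early = first_year(history)
# early holds the 2006-03-01 and 2006-12-31 commits; 2007-03-01 is exactly
# 365 days after the first commit and falls outside the window.
```

Features computed on `early` can then be paired with success measures taken from the full history.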
Also, the options above are not a strict split, so we could compare aspects against one another: how does cost relate to ratings and popularity? How does popularity relate to success?
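Several of these comparisons come down to asking whether one measured quantity tracks another. Download counts are likely to be heavily skewed, so a rank-based measure such as Spearman's correlation is a reasonable first check. This is our choice of technique and is not prescribed by the plan. The following dependency-free sketch uses made-up data:

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical numbers, for illustration only.
downloads = [120, 5000, 830, 40, 2600]
ratings = [3.9, 4.1, 4.6, 3.2, 4.4]
print(spearman(downloads, ratings))  # prints approximately 0.6
```

If SciPy is available, `scipy.stats.spearmanr` computes the same statistic.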
Current Work
We're currently focusing on popularity, as measured by ratings and downloads. As of 2/5, we're compiling a list of media-related software and running easy metrics against it.
OSS Software
Ohloh seems to be a good source. Examples of two media software listings:
Lists of the best open-source software:
- http://www.opensourcewindows.org/
- http://www.opensourcemac.org/
- http://lifehacker.biz/articles/best-open-source-software/
- http://www.opensourcelist.org/oss/suggestedapplications.html
The first program mentioned in all of the best open-source software lists (some are listed above) is Firefox. So I started looking at Firefox and realized that its "add-ons" are open source. One could argue that anything we find there might be a Firefox-specific characteristic. Even so, the add-ons are a good place to start because their scope is smaller.
The Firefox add-on browser can rank add-ons by popularity and by rating, and it shows clearly that the two need not correlate. For example, the top-rated download-management add-ons are far from the most popular ones. There are two simple reasons:
- Only a few people rate any one add-on.
- Ratings are not normalized by the number of people who rated the add-on.
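A Bayesian (damped) average is one common fix for the normalization problem. It blends each add-on's ratings with a prior mean, as if the add-on had received a fixed number of extra "pseudo-votes." This technique is swapped in here and does not describe how the add-on site computes its rankings. The prior mean and weight below are arbitrary illustrative values. In practice the prior mean would typically be the average rating across all add-ons in the category:

```python
def bayesian_average(ratings, prior_mean, prior_weight):
    """Damped mean rating.

    Acts as if every item had `prior_weight` extra ratings equal to
    `prior_mean`, so an item with few ratings stays near the prior while
    a heavily rated item keeps roughly its own average.
    """
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + len(ratings))


# Hypothetical add-ons: A has two perfect ratings, B has many good ones.
addon_a = [5, 5]
addon_b = [5, 4] * 100  # 200 ratings averaging 4.5

PRIOR_MEAN = 3.5   # assumed site-wide average rating
PRIOR_WEIGHT = 10  # assumed number of pseudo-votes

for name, r in [("A", addon_a), ("B", addon_b)]:
    raw = sum(r) / len(r)
    damped = bayesian_average(r, PRIOR_MEAN, PRIOR_WEIGHT)
    print(f"{name}: raw {raw:.2f}, damped {damped:.2f}")
# A: raw 5.00, damped 3.75
# B: raw 4.50, damped 4.45
```

A ranks above B on raw averages, but B ranks above A once the damping is applied. This matches the intuition that two ratings say little about an add-on.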
Ratings Sources
- Ohloh provides average rating, downloads, and stacks
- Freshmeat provides rating, vitality, and popularity
- Betanews provides rating and downloads
- Download.com provides average editor ratings, user ratings, and downloads
- Softsea provides ratings
- Snapfiles provides editor ratings and user ratings
More on Metrics
Search hits: We experimented with using the same list to gather web search hit counts, and the results were interesting but distorted by naming problems. Audacity, Gimp, Pidgin, Eraser, etc. are words with multiple meanings. eMule, eMule morph and eMule Xtreme Mod are all separate downloads, but their hit counts overlap. Some names are too short to search for reliably (such as "ABC [Yet Another Bittorrent Client]").
General OSS Resources
Cost of an OSS project
A major advantage of open-source software over commercial software is price. Users get the software for free, but considerable work goes into these products. What does it actually cost to build them?
Ohloh has an "estimated number of person-years" and associated estimated cost.
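Ohloh's figure is a model-based estimate, not a measured cost. We have not checked exactly which model Ohloh applies. As a swapped-in illustration, the sketch below uses the Basic COCOMO effort model (Boehm, "organic" project class), which estimates person-months from lines of code. The cost conversion and the yearly figure are hypothetical parameters:

```python
def basic_cocomo_effort(lines_of_code, a=2.4, b=1.05):
    """Basic COCOMO effort estimate in person-months.

    effort = a * KLOC ** b, where KLOC is thousands of source lines.
    The defaults are Boehm's coefficients for the "organic" class
    (small teams working in a familiar problem domain).
    """
    kloc = lines_of_code / 1000.0
    return a * kloc ** b


def estimated_cost(lines_of_code, yearly_cost_per_person):
    """Return (person_years, cost) for a code base of the given size.

    `yearly_cost_per_person` is a hypothetical, caller-supplied figure;
    no particular salary is implied here.
    """
    person_years = basic_cocomo_effort(lines_of_code) / 12.0
    return person_years, person_years * yearly_cost_per_person


# Illustrative inputs only: 50,000 lines at an assumed 60,000/year per person.
years, cost = estimated_cost(50_000, 60_000)
print(f"{years:.1f} person-years, about {cost:,.0f}")
```

The exponent above 1 makes estimated effort grow slightly faster than code size. Line counts therefore dominate any cost comparison built on this kind of model.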
Quotes
"Indeed, as we have repeatedly emphasized, the Internet is the primary enabler of the OSS development and distribution process, making it possible for widely distributed groups to share ideas and software extremely quickly at negligible cost." (Joseph Feller and Brian Fitzgerald, Understanding Open Source Software Development)
"But open source is a low-cost way of increasing the opportunity for surprise." (Tim O'Reilly, Lessons from Open Source Software Development, 1999)
Success of Open Source Projects
Most downloaded on sourceforge: http://sourceforge.net/top/topalltime.php?type=downloads
Papers
Defining Open Source Software Project Success, Kevin Crowston, Hala Annabi, and James Howison, 2003
http://floss.syr.edu/publications/icis2003success.pdf
This paper identifies a range of measures that can be used to assess the success of open source software (OSS) projects.
Information Systems Success in Free and Open Source Software Development: Theory and Measures
http://floss.syr.edu/publications/crowston2006flossSuccessSPIPpre-print.pdf
Useful Links
Motivations of open-source developers:
Working for Free? Motivations for Participating in Open-Source Projects
http://mesharpe.metapress.com/media/fc22ht5ywp6vyndknrar/contributions/e/e/p/d/eepdf96rnt0geahv.pdf
Why Open Source software can succeed:
http://opensource.mit.edu/papers/rp-bonaccorsirossi.pdf
Case Studies:
A Case Study of Open Source Software Development: The Apache Server
http://conway.isri.cmu.edu/~jdh/collaboratory/research_papers/apachefinal3.pdf
How to Evaluate Open Source Software / Free Software (OSS/FS) Programs:
http://www.dwheeler.com/oss_fs_eval.html