Challenge Chair
Annie Ying
McGill University
Canada
Committee Members
Nasir Ali | University of Waterloo, Canada |
Kelly Blincoe | University of Victoria, Canada |
Oscar Callau | University of Chile, Chile |
Bradley Cossette | McGill University, Canada |
Latifa Guerrouj | Concordia University, Canada |
Emitzá Guzmán | Technische Universität München, Germany |
Laura Inozemtseva | University of Waterloo, Canada |
Shane McIntosh | Queen's University, Canada |
Laura Moreno | University of Texas at Dallas, USA |
Sebastian Müller | University of Zurich, Switzerland |
Jaechang Nam | Hong Kong University of Science and Technology, China |
Luca Ponzanelli | University of Lugano, Switzerland |
Baishakhi Ray | University of California at Davis, USA |
Christoph Treude | Universidade Federal do Rio Grande do Norte, Brazil |
Important Dates
(all deadlines are at 23:59:59 AoE, Anywhere on Earth)
Challenge |
Papers due: | February 27, 2015 |
Author notification: | March 16, 2015 |
Camera ready: | March 30, 2015 |
EasyChair 2015
Submit papers through EasyChair: easychair.org/conferences/?conf=msr2015
Mining Challenge
The International Working Conference on Mining Software Repositories (MSR) has hosted a mining challenge since 2006. With this challenge, we invite everyone interested to apply their tools to a common data set, bringing research and industry closer together. The challenge is for researchers and practitioners who bravely put their mining tools and approaches to the test.
This year's challenge is about comparing and combining different information sources in the Stack Overflow data set. Stack Overflow is popular among users and researchers, and has even been the subject of contests (e.g., the MSR 2013 Mining Challenge and Kaggle). As a collaboratively edited question-and-answer site on computer programming, Stack Overflow naturally offers diverse information sources: natural language text in questions and post content; code fragments in the posts; votes on posts and the reputation of users; and metadata such as user-provided tags and post dates. We ask you to pose a problem and present results that compare at least two settings, each using either a single information source or a combination of sources.
For example, if you are interested in predicting the number of votes on a new Stack Overflow question, one possible submission would compare the predictive power of three settings: natural language text alone, code fragments alone, and the combination of text and code fragments. As another example, if you are interested in studying the readability of code fragments in Stack Overflow answers, a possible submission could investigate how the readability of the surrounding text and the reputation of the users are each associated with the readability of the code fragments.
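Settings such as "text alone" and "code fragments alone" require separating prose from code in each post. The following is a minimal Python sketch, assuming (hypothetically) that code fragments in a post's HTML body are wrapped in <code> elements, as in <pre><code> blocks. TextCodeSplitter and split_body are illustrative names, not part of the challenge data or any official tool.

```python
from html.parser import HTMLParser


class TextCodeSplitter(HTMLParser):
    """Separate text inside <code> elements from the surrounding prose.

    Assumption: code fragments in a post body are marked up with <code>
    tags (e.g. <pre><code>...</code></pre> or inline <code>...</code>).
    """

    def __init__(self):
        # convert_charrefs=True turns entities such as &lt; into "<".
        super().__init__(convert_charrefs=True)
        self._code_depth = 0   # > 0 while inside a <code> element
        self.text_parts = []   # natural-language fragments
        self.code_parts = []   # one string per <code> element

    def handle_starttag(self, tag, attrs):
        if tag == "code":
            self._code_depth += 1
            self.code_parts.append("")

    def handle_endtag(self, tag):
        if tag == "code" and self._code_depth > 0:
            self._code_depth -= 1

    def handle_data(self, data):
        if self._code_depth > 0:
            self.code_parts[-1] += data
        else:
            self.text_parts.append(data)


def split_body(body_html):
    """Return (text, code_fragments) for one post body (hypothetical helper)."""
    parser = TextCodeSplitter()
    parser.feed(body_html)
    parser.close()
    # Collapse runs of whitespace left behind where code was removed.
    text = " ".join(" ".join(parser.text_parts).split())
    return text, parser.code_parts
```

This sketch counts inline <code> spans as code; a study that only wants multi-line code blocks could track <pre> elements instead. Either choice affects the "text alone" and "code alone" settings and is worth stating in the report.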
How to Participate in the Challenge
Participating in the challenge requires you to:
1. Download the data.
2. Report your findings in a four-page document.
3. Submit your report on or before February 27, 2015.
4. If your report is accepted, present your awesome findings at MSR 2015!
Challenge Data
We provide you with the latest official data dump of Stack Overflow content (updated on September 26, 2014), made available by Stack Exchange on the Internet Archive. This data, in XML format, includes the history of question and answer posts, tags, votes on the posts, and the reputation of the posters. For the schema, you can refer to a post on Stack Exchange. Another useful resource is the Stack Exchange Data Explorer, which lets you issue SQL queries directly against an online copy of the data. The Stack Overflow data is licensed under the Creative Commons BY-SA 3.0 license; a blog post by David Fullerton from Stack Exchange provides more information about the license.
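Because the dump is large, a streaming parser avoids loading a whole XML file into memory. The following is a minimal Python sketch, assuming that the posts file (e.g. Posts.xml) contains one <row> element per post with attributes named Id, PostTypeId (1 for questions, 2 for answers), Score, Tags (in the form <java><android>), and Body (the post's HTML). These names are assumptions to check against the schema post; iter_posts and parse_tags are hypothetical helpers.

```python
import re
import xml.etree.ElementTree as ET


def parse_tags(tags_attr):
    """Turn a Tags value such as '<java><android>' into ['java', 'android']."""
    return re.findall(r"<([^<>]+)>", tags_attr or "")


def iter_posts(source):
    """Yield one dict per <row> element of a Posts.xml-style file.

    `source` is a filename or a binary file object. Attribute names are
    assumptions about the dump schema, not a confirmed specification.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the enclosing root element, e.g. <posts>
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            yield {
                "id": int(elem.get("Id")),
                "is_question": elem.get("PostTypeId") == "1",
                "score": int(elem.get("Score", "0")),
                "tags": parse_tags(elem.get("Tags")),
                "body": elem.get("Body", ""),
            }
            # Drop already-processed rows so memory use stays bounded.
            root.clear()
```

For example, iterating over iter_posts("Posts.xml") and keeping only rows where is_question is true gives the questions together with their scores and tags; the body field can then be split into text and code for per-setting comparisons.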
When you use the data provided by the MSR 2015 challenge, please cite it as follows:
@inproceedings{MSRChallenge2015,
author = {Annie T. T. Ying},
title = {Mining Challenge 2015: Comparing and combining different information
sources on the Stack Overflow data set},
booktitle = {The 12th Working Conference on Mining Software Repositories},
year = {2015},
pages = {to appear}
}
Challenge Report
The challenge report should describe the results of your work by providing an introduction to the problem being addressed, the information sources being compared and combined, the approach and tools used, your results and their implications, and conclusions. Keep in mind that the report will be evaluated by a jury. Make sure your report highlights the contributions and the importance of your work.
Challenge reports must be at most 4 pages long and must conform, at the time of submission, to the ICSE (and MSR) 2015 Format and Submission Guidelines.
Submission Details
Submit your challenge report (maximum 4 pages) to EasyChair on or before February 27, 2015, selecting the "MSR 2015 Challenge Track". Authors will be notified by March 16, 2015, and camera-ready versions are due March 30, 2015.
Papers submitted for consideration should not have been published elsewhere and should not be under review, or submitted for review, elsewhere while under consideration. ACM plagiarism policies and procedures will be followed in cases of double submission.
Upon notification of acceptance, all authors of accepted papers will be asked to complete an ACM copyright form and will receive further instructions for preparing their camera-ready versions. At least one author of each paper is expected to present the results at the MSR 2015 conference. All accepted contributions will be published in the electronic conference proceedings.
Prize
We are grateful to IBM Research for sponsoring this year's Mining Challenge. The best team will be awarded $200 worth of IBM Bluemix cloud platform usage and a $200 ThinkGeek gift certificate.