MSR 2015 May 16–17. Florence, Italy
The 12th Working Conference on Mining Software Repositories

Challenge Chair

Annie Ying
McGill University

Committee Members

Nasir Ali University of Waterloo, Canada
Kelly Blincoe University of Victoria, Canada
Oscar Callau University of Chile, Chile
Bradley Cossette McGill University, Canada
Latifa Guerrouj Concordia University, Canada
Emitzá Guzmán Technische Universität München, Germany
Laura Inozemtseva University of Waterloo, Canada
Shane McIntosh Queen's University, Canada
Laura Moreno University of Texas at Dallas, USA
Sebastian Müller University of Zurich, Switzerland
Jaechang Nam Hong Kong University of Science and Technology, China
Luca Ponzanelli University of Lugano, Switzerland
Baishakhi Ray University of California at Davis, USA
Christoph Treude Universidade Federal do Rio Grande do Norte, Brazil

Important Dates

(All deadlines are 23:59:59 AoE, Anywhere on Earth.)

Papers due: February 27, 2015
Author notification: March 16, 2015
Camera ready: March 30, 2015

EasyChair 2015

Submit papers through EasyChair.

Mining Challenge

The International Working Conference on Mining Software Repositories (MSR) has hosted a mining challenge since 2006. With this challenge, we invite everyone interested to apply their tools to a common data set, bringing research and industry closer together. The challenge is for researchers and practitioners who are willing to put their mining tools and approaches to the test.

This year's challenge is about comparing and combining different information sources in the Stack Overflow data set. Stack Overflow is popular among users, researchers, and even contests (e.g., the MSR Challenge in 2013 and Kaggle). As a collaboratively edited question-and-answer site on computer programming, Stack Overflow naturally lends itself to diverse information sources: natural language text from the question and the post content; code fragments in the posts; votes and reputation of the users; and metadata such as tags provided by users, date of the posts, etc. We ask you to come up with a problem and present results that compare at least two settings, each involving either a single information source or a combination of information sources.

For example, if you are interested in predicting the number of votes on a new Stack Overflow question, one possible challenge submission is to compare the predictive power of three settings: natural language text alone, code fragments alone, and the combination of text and code fragments. Here is another example: if you are interested in studying the readability of the code fragments in a Stack Overflow answer, a possible challenge submission is to investigate how the readability of the surrounding text and the reputation of the users are each associated with the readability of the code fragments.
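As a minimal sketch of how the three settings in the first example might be prepared, the Python snippet below splits a post body into natural language text and code fragments. It assumes that post bodies are HTML and that code fragments appear inside <code> elements; the names BodySplitter, split_body, build_settings, and SAMPLE_BODY are hypothetical and are not part of any challenge tooling.

```python
from html.parser import HTMLParser

# Hypothetical post body, used only to illustrate the split.
SAMPLE_BODY = '<p>How do I print?</p><pre><code>print(&quot;hi&quot;)\n</code></pre><p>Thanks</p>'


class BodySplitter(HTMLParser):
    """Collects text inside <code> elements separately from all other text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.code_depth = 0
        self.text_parts = []
        self.code_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "code":
            self.code_depth += 1

    def handle_endtag(self, tag):
        if tag == "code" and self.code_depth > 0:
            self.code_depth -= 1

    def handle_data(self, data):
        # Route character data by whether we are currently inside <code>.
        target = self.code_parts if self.code_depth > 0 else self.text_parts
        target.append(data)


def split_body(body_html):
    """Return (text, code) for one post body (assumed to be HTML)."""
    parser = BodySplitter()
    parser.feed(body_html)
    parser.close()
    # Collapse whitespace in prose; keep line structure inside code.
    text = " ".join(" ".join(parser.text_parts).split())
    code = "\n".join(parser.code_parts).strip()
    return text, code


def build_settings(body_html):
    """Build the inputs for the three settings compared in the example."""
    text, code = split_body(body_html)
    return {
        "text_only": text,
        "code_only": code,
        "text_and_code": (text + "\n" + code).strip(),
    }


if __name__ == "__main__":
    for name, value in build_settings(SAMPLE_BODY).items():
        print(name, repr(value))
```

Each of the three strings could then be fed to the same feature extractor and model, so that any difference in predictive power is attributable to the information source rather than to the pipeline.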

How to Participate in the Challenge

Participating in the challenge requires you to:

1. Download the data.

2. Report your findings in a four-page document.

3. Submit your report on or before February 27, 2015.

4. If your report is accepted, present your awesome findings at MSR 2015!

Challenge Data

We provide you with the latest official data dump of Stack Overflow content (updated on September 26, 2014), made available by Stack Exchange on the Internet Archive. This data includes the history of question and answer posts, tags, votes on the posts, and the reputation of the posters, all in XML format. For the schema, you can refer to a post on Stack Exchange. Another useful resource is the Stack Exchange Data Explorer, which allows you to issue SQL queries directly against a copy of the data online. The Stack Overflow data is licensed under the Creative Commons BY-SA 3.0 license. A blog post by David Fullerton from Stack Exchange provides more information about the license.
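Because the dump is large, it may help to stream the XML rather than load it whole. The Python sketch below shows one way to do that with the standard library. It assumes that posts are stored as <row> elements whose attributes include Id, PostTypeId (with "1" marking a question), Score, Tags, and Body; please verify these names against the schema post before relying on them. The function iter_questions and the SAMPLE_XML data are hypothetical illustrations.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical two-row excerpt in the assumed layout of the posts file.
SAMPLE_XML = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b"<posts>\n"
    b'<row Id="1" PostTypeId="1" Score="5" Tags="&lt;python&gt;" '
    b'Body="&lt;p&gt;Hi&lt;/p&gt;" />\n'
    b'<row Id="2" PostTypeId="2" Score="3" Body="&lt;p&gt;Answer&lt;/p&gt;" />\n'
    b"</posts>"
)


def iter_questions(source):
    """Yield one dict per question row; source is a file path or a binary file object."""
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            if elem.get("PostTypeId") == "1":  # assumption: "1" marks a question
                yield {
                    "id": int(elem.get("Id")),
                    "score": int(elem.get("Score", "0")),
                    "tags": elem.get("Tags", ""),
                    "body": elem.get("Body", ""),
                }
            root.clear()  # drop processed rows so memory use stays flat


if __name__ == "__main__":
    for question in iter_questions(io.BytesIO(SAMPLE_XML)):
        print(question["id"], question["score"], question["tags"])
```

To run this against the real dump, pass the path of the extracted posts file in place of the in-memory sample.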

When you use the data provided by the MSR 2015 challenge, we ask you to cite it as follows:

@inproceedings{MSRChallenge2015,
  author = {Annie T. T. Ying},
  title = {Mining Challenge 2015: Comparing and combining different information sources on the Stack Overflow data set},
  booktitle = {The 12th Working Conference on Mining Software Repositories},
  year = {2015},
  pages = {to appear}
}

Challenge Report

The challenge report should describe the results of your work by providing an introduction to the problem being addressed, the information sources being compared and combined, the approach and tools used, your results and their implications, and conclusions. Keep in mind that the report will be evaluated by a jury. Make sure your report highlights the contributions and the importance of your work.

Challenge reports must be at most 4 pages long and must conform, at the time of submission, to the ICSE (and MSR) 2015 Format and Submission Guidelines.

Submission Details

Submit your challenge report (maximum 4 pages) to the "MSR 2015 Challenge Track" on EasyChair on or before February 27, 2015. Author notification and camera-ready dates are March 16, 2015 and March 30, 2015, respectively.

Papers submitted for consideration should not have been published elsewhere and should not be under review or submitted for review elsewhere while under consideration. ACM plagiarism policies and procedures shall be followed for cases of double submission.

Upon notification of acceptance, all authors of accepted papers will be asked to complete an ACM Copyright form and will receive further instructions for preparing their camera-ready versions. At least one author of each paper is expected to present the results at the MSR 2015 conference. All accepted contributions will be published in the conference electronic proceedings.


We are grateful for IBM Research's sponsorship of the Mining Challenge this year. The best team will be awarded $200 worth of usage of the IBM Bluemix cloud platform and a $200 ThinkGeek gift certificate.


The Stack Overflow data was provided by Stack Exchange and is hosted on the Internet Archive.