Usage and attribution of Stack Overflow code snippets in GitHub projects


Sebastian Baltes

Nov 11, 2016, 4:51:36 AM
to inspectIT
Dear inspectIT team,

We are a group of researchers from the University of Trier in Germany
doing research on the usage and attribution of code snippets from Stack
Overflow in GitHub projects. The licensing of these code snippets is a
controversial topic (see the discussions listed under [3] below). Currently,
all content on Stack Overflow, including source code, is licensed under
Creative Commons Attribution-ShareAlike 3.0 (CC BY-SA 3.0 [1]). Among
other requirements, this license requires attribution and requires
derivative works to be published under the same license.

Using regular expressions, we searched for copies of ten popular Java
snippets in the Google BigQuery GitHub data set [2]. We are
contacting you because we found one or more matches of these
snippets in your GitHub repositories. A match consists of a code snippet
from Stack Overflow together with the corresponding code snippet found in
one of your repositories.
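As a rough illustration only (the email does not describe the actual regular expressions used), one way to find copies of a snippet despite formatting differences is to quote each whitespace-separated token of the snippet and allow arbitrary whitespace between tokens. The class and method names below are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: not the matcher used in the study.
public final class SnippetMatcher {

    private SnippetMatcher() {
    }

    /** Builds a pattern matching the snippet's tokens in order, ignoring whitespace differences. */
    public static Pattern toWhitespaceTolerantPattern(String snippet) {
        String regex = Arrays.stream(snippet.trim().split("\\s+"))
                .map(Pattern::quote)                  // treat each token literally
                .collect(Collectors.joining("\\s*")); // allow any (or no) whitespace between tokens
        return Pattern.compile(regex);
    }

    public static void main(String[] args) {
        Pattern p = toWhitespaceTolerantPattern("int unit = si ? 1000 : 1024;");
        String candidate = "    int unit=si ? 1000 : 1024;\n";
        System.out.println(p.matcher(candidate).find());
    }
}
```

This is deliberately loose: it tolerates reformatting, but not renamed variables or reordered statements.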

Below, you will find links to the following files: (1) a CSV file
containing a list of all matches (all_matches.csv) and (2) a Java file
with the corresponding snippets from Stack Overflow and your repository
(all_snippets.java). You may use this data to add proper attribution to
the affected files. The CSV file has a column named "referenced",
indicating whether we found a link to Stack Overflow in the file.
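A flag like "referenced" could, for example, be computed by searching each file for a Stack Overflow question or answer link. The sketch below is an assumption about how such a check might work, not the researchers' implementation; its pattern only covers stackoverflow.com URLs of the forms /questions/, /q/ and /a/ followed by a numeric ID.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a "does this file link to Stack Overflow?" check.
public final class SoReferenceCheck {

    // Optional scheme and "www.", then a question/answer path with a numeric ID.
    private static final Pattern SO_LINK = Pattern.compile(
            "(?:https?://)?(?:www\\.)?stackoverflow\\.com/(?:questions|q|a)/\\d+",
            Pattern.CASE_INSENSITIVE);

    private SoReferenceCheck() {
    }

    /** Returns true if the given file content contains at least one Stack Overflow link. */
    public static boolean referencesStackOverflow(String fileContent) {
        return SO_LINK.matcher(fileContent).find();
    }

    public static void main(String[] args) {
        String attributed = "/* based on http://stackoverflow.com/questions/3758606/how-to-convert */";
        String plain = "int unit = si ? 1000 : 1024;";
        System.out.println(referencesStackOverflow(attributed));
        System.out.println(referencesStackOverflow(plain));
    }
}
```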

Further, we would really appreciate it if you could provide feedback
using the online form linked below (takes about 5 to 10 minutes). The
first question in this form refers to one of the matches, which we
copied into a separate file (see file (3), snippet_for_survey.java). We
do not want to judge or blame developers who copy code from Stack
Overflow; we are simply interested in why developers do (or do not)
indicate the origin of Stack Overflow code snippets. Using your
feedback, we want to further improve our matching approach and ultimately
build a tool that supports developers in attributing and maintaining code
copied or adapted from Stack Overflow.

https://www.unipark.de/uc/snippets-survey/?code=ca0ed09744fb12b6

Best regards,
Sebastian Baltes

-------------------------------
Sebastian Baltes
Software Engineering Group
University of Trier (Germany)
-------------------------------
web: sbaltes.com
e-mail: s.ba...@uni-trier.de
twitter: s_baltes
-------------------------------

Links to files:

(1)
http://st.uni-trier.de/~baltes/survey-snippets/4ce6eb5b924b75e64a8f3cb2ac4ad81d/all_matches.csv
(2)
http://st.uni-trier.de/~baltes/survey-snippets/4ce6eb5b924b75e64a8f3cb2ac4ad81d/all_snippets.java
(3)
http://st.uni-trier.de/~baltes/survey-snippets/4ce6eb5b924b75e64a8f3cb2ac4ad81d/snippet_for_survey.java

References:

[1] https://creativecommons.org/licenses/by-sa/3.0/
[2] https://cloud.google.com/bigquery/public-data/github
[3] Discussions about license of Stack Overflow code snippets:

"Do I have to worry about copyright issues for code posted on Stack
Overflow?"
(http://meta.stackexchange.com/q/12527)

"Can we get some explicit clarification on the *intended* legal
usage of code from SO answers?"
(http://meta.stackoverflow.com/q/286582)

"What is up with the source code license on Stack Overflow?"
(http://meta.stackexchange.com/q/25956)

"What is the license status of StackOverflow code snippets?"
(https://legalict.com/software/what-is-the-license-status-of-stackoverflow-code-snippets/)

"The MIT license - clarity on using code on Stack Overflow and Stack
Exchange"
(http://meta.stackexchange.com/q/271080)

"A new code license: The MIT, this time with attribution required"
(http://meta.stackexchange.com/q/272956)

Stefan Siegl

Nov 11, 2016, 5:27:28 AM
to inspectIT, s.ba...@uni-trier.de
Hi Sebastian,

Thank you for your email. We take licensing very seriously: the project went from closed source to AGPLv3 and is now licensed under Apache v2. We took special care to be legally correct and even hired legal consultants to help us out. The piece of code you found is indeed taken from Stack Overflow (in fact, it is exactly 4 lines in the whole project :)). We also pointed this out in the code. The documentation of the method in question reads:
	/**
	 * Returns the human readable bytes number.
	 * <p>
	 * <b>IMPORTANT:</b> The method code is copied/taken/based from <a href=
	 * "http://stackoverflow.com/questions/3758606/how-to-convert-byte-size-into-human-readable-format-in-java"
	 * >stackoverflow</a>. Original author is aioobe. License info can be found
	 * <a href="http://creativecommons.org/licenses/by-sa/3.0/">here</a>.
	 *
	 * @param bytes
	 *            Bytes to transform.
	 * @return Human readable string.
	 */
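For illustration, here is how a self-contained method could carry an attribution comment in the style quoted above. The class name ByteFormat, the method signature and the loop-based body are hypothetical stand-ins; they are neither the inspectIT source nor the verbatim Stack Overflow answer, which this thread does not reproduce.

```java
import java.util.Locale;

public final class ByteFormat {

    private ByteFormat() {
    }

    /**
     * Returns the human readable bytes number.
     * <p>
     * <b>IMPORTANT:</b> Adapted from an answer by aioobe on <a href=
     * "http://stackoverflow.com/questions/3758606/how-to-convert-byte-size-into-human-readable-format-in-java"
     * >stackoverflow</a>. License info can be found
     * <a href="http://creativecommons.org/licenses/by-sa/3.0/">here</a>.
     *
     * @param bytes
     *            Bytes to transform (assumed non-negative).
     * @return Human readable string using binary (1024-based) units.
     */
    public static String humanReadableByteCount(long bytes) {
        // Hypothetical loop-based body; not the verbatim Stack Overflow answer.
        if (bytes < 1024) {
            return bytes + " B";
        }
        String units = "KMGTPE";
        double value = bytes;
        int exp = 0;
        // Divide by 1024 until the value is small enough. The 1023.95 threshold
        // keeps values like 1048575 from being printed as "1024.0 KiB" after rounding.
        while (value >= 1023.95 && exp < units.length()) {
            value /= 1024;
            exp++;
        }
        return String.format(Locale.ROOT, "%.1f %ciB", value, units.charAt(exp - 1));
    }

    public static void main(String[] args) {
        System.out.println(humanReadableByteCount(512));
        System.out.println(humanReadableByteCount(1536));
        System.out.println(humanReadableByteCount(1048576));
    }
}
```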

I believe this is the correct way to handle it. Do you disagree?

We are always happy to help and have filled out the survey. One small note: I could not set the number of core contributors in your survey (I got an error saying that the number has to be greater than 0).

Best regards,
Stefan