Detecting License Inconsistencies in Large-scale Open Source Projects

Empirical study exposes license infringements in open source software and identifies the reasons behind these inconsistencies

Reduce, reuse, and recycle is a popular approach to conserving resources, and it extends to the world of technology in the form of free and open source software (FOSS) distributed under OSS licenses. Katsuro Inoue at Osaka University's Graduate School of Information Science and Technology and Daniel M. German at the University of Victoria's Department of Computer Science have used Ninka, a sentence-based license detection tool, to analyze source code files and identify license inconsistencies among files that share the same origin.

FOSS can be freely used, modified, and redistributed by anyone. It offers the benefits of reducing development time while increasing product quality, security, and stability. Android, Apache, Firefox, and Linux are all familiar examples of FOSS.

In their recent study, Inoue and his team conducted an empirical evaluation of how software licenses evolved during the development of the open source project Debian 7.5. This large-scale FOSS system comprises thousands of source code files, each containing a software license description that generally resides in the header comment. These licenses describe the requirements and conditions that must be followed when the code is reused, and they should not be changed without the copyright owner's permission unless the terms of the license allow it.

"It is important that FOSS developers check licenses so that they do not commit an infringement; but this is not a trivial task," Inoue explains. "License violation may occur when developers misunderstand the license of source files, which could result in legal disputes."

After identifying files that shared an origin, as demonstrated by their identical language and logic, the researchers applied Ninka, which Inoue and colleagues developed in 2010. Ninka can identify 110 licenses with 93% accuracy, and processes 600 files/minute. The research team found that license inconsistencies were not uncommon: out of 74,848 file groups, 5,359 (7.2%) contained at least one inconsistency.
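The two-step idea described above, grouping files that share an origin and then comparing their licenses, can be sketched roughly as follows. This is a minimal illustration, not the researchers' implementation: it assumes that two files share an origin when their code is identical once C-style comments and whitespace differences are removed, and `detect_license` is a hypothetical keyword-based stand-in for a real detector such as Ninka. All function names and sample files are invented for the example.

```python
import hashlib
import re
from collections import defaultdict


def strip_comments(source: str) -> str:
    """Remove C-style comments and collapse whitespace (a rough normalization)."""
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)  # block comments
    source = re.sub(r"//[^\n]*", " ", source)  # line comments (naive: ignores string literals)
    return " ".join(source.split())


def detect_license(source: str) -> str:
    """Hypothetical stand-in for a license detector; NOT how Ninka works."""
    header = source[:500]  # licenses usually sit in the header comment
    if "GNU General Public License" in header:
        return "GPL"
    if "MIT License" in header:
        return "MIT"
    return "UNKNOWN"


def find_inconsistent_groups(files: dict[str, str]) -> dict[str, dict[str, str]]:
    """Group files by normalized code; keep groups whose licenses disagree."""
    groups: dict[str, dict[str, str]] = defaultdict(dict)
    for path, source in files.items():
        key = hashlib.sha256(strip_comments(source).encode("utf-8")).hexdigest()
        groups[key][path] = detect_license(source)
    return {
        key: members
        for key, members in groups.items()
        if len(members) > 1 and len(set(members.values())) > 1
    }


# Invented sample: two copies of the same code under different licenses,
# plus one unrelated file.
SAMPLE = {
    "a/util.c": "/* GNU General Public License */\nint add(int a, int b) { return a + b; }\n",
    "b/util.c": "/* MIT License */\nint add(int a, int b)\n{\n    return a + b;\n}\n",
    "c/other.c": "/* MIT License */\nint sub(int a, int b) { return a - b; }\n",
}

inconsistent = find_inconsistent_groups(SAMPLE)
for members in inconsistent.values():
    print(sorted(members.items()))

# The share reported in the article: 5,359 of 74,848 file groups.
share_percent = round(5359 / 74848 * 100, 1)
```

In this sketch only the two `util.c` copies end up in a flagged group, because their code matches while their detected licenses differ. A flagged group is merely a candidate: deciding whether the difference is legitimate still requires looking at each file's history.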

The team manually examined the files' repository history to sort these inconsistencies into two categories: those caused by legitimate changes made by the copyright owner, and those caused by additions, modifications, or choices made by reusers. The highest proportion of inconsistencies, at 98.4%, resulted from the license changing during the process of license evolution.

"Our analysis exposes the difficulty of discovering license infringements, and highlights the usefulness of determining and maintaining source code provenance," Inoue says.

The research team intends to improve Ninka and develop new methods to analyze the history of each source file to improve the assessment of whether license inconsistencies are legitimate.

Related link

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Professor Katsuro Inoue
Department of Computer Science
Graduate School of Information Science and Technology
Osaka University

1. Wu, Y., Manabe, Y., Kanda, T., German, D.M., Inoue, K. A method to detect license inconsistencies in large-scale open source projects. IEEE/ACM 12th Working Conference on Mining Software Repositories, 324–333 (2015).


2. German, D.M., Manabe, Y., Inoue, K. A sentence-matching method for automatic license identification of source code files. Proceedings of the IEEE/ACM International Conference on Automated Software Engineering, 437–446 (2010).


open source software, software licenses, license inconsistencies, software reuse

This research project was supported by the Osaka University International Joint Research Promotion Program, which aims to further enhance research quality and promote globalization at Osaka University through advanced research with overseas collaborators. Professor Inoue conducted this research jointly with Professor Daniel M. German, University of Victoria, B.C., Canada.