Free software, free services but what about your data?
I care a lot about free software, not only as a Debian Developer. The use of software as a service matters as well because my principle free software development is on just such a project, licensed under the GNU Affero General Public License version 3. The AGPL helps by allowing anyone who is suitably skilled to install their own copy of the software and run their own service on their own hardware. As a project, we are seeing increasing numbers of groups doing exactly this and these groups are actively contributing back to the project.
So what is the problem? We've got an active project, an active community and everything is under a free software licence and regularly uploaded to Debian main. We have open code review with anonymous access to our own source code CI and anonymous access to project planning, open mailing list archives as well as an open bug tracker and a very active IRC channel (#linaro-lava on OFTC). We develop in the open, we respond in the open and we publish frequently (monthly, approximately). The code we write defaults to public visibilty at runtime with restrictions available for certain use cases.
What else can we be doing? Well it was a simple question which started me thinking.
The lava documentation has various example test scripts e.g. https://validation.linaro.org/static/docs/v2/examples/test-jobs/qemu-kernel-standard-sid.yaml
these have no licence information, we've adapted them for a Linux Foundation project, what licence should apply to these files?
Those are our own examples, contributed as part of the documentation and covered by the AGPL like the rest of the documentation and the software which it documents, so I replied with the same. However, what about all the other submissions received by the service?
LAVA acts by providing a service to authenticated users. The software runs your test code on hardware which might not be available to the user or which is simply inconvenient for the test writer to setup themselves. The AGPL covers this nicely.
What about the data contributed by the users? We make this available to other users who will, naturally, copy and paste for their own tests. In most cases, because the software defaults to public access, anonymous users also get to learn from the contributions of other test writers. This is a good thing and to be encouraged. (One reason why we moved to YAML for all submissions was to allow comments to help other users understand why the submission does specific things.)
Writing a test job submission or a test shell definition from scratch is a non-trivial amount of work. We've written dozens of pages of documentation covering how and how not to do it but the detail of how a test job runs exactly what the test writer requires can involve substantial effort. (Our documentation recommends using version control for each of these works for exactly these reasons.)
At what point do these works become software? At what point do these need licensing? How could that be declared?
I don't consider LAVA to be SaaSS although it is Software as a Service (SaaS). (Distinguishing between those is best left to the GNU document as it is an almighty tangle at times.)
The same problems affect trying to untangle sharing the test job data within LAVA.
Adding Licence text
The traditional way, of course, is simply to add twenty lines or so of comments at the top of every file. This works nicely for source code because the comments are hidden from the final UI (unless an explicit reference is made in the --help output or similar). It is less nice for human readable submissions where the first thing someone has to do is scroll passed the comments to get to what they want to see. At that point, it starts to look like a popup or a nagging banner - blocking the requested content on a website to try and get the viewer to subscribe to a newsletter or pay for the rest of the content. Let's not actively annoy visitors who are trying to get things done.
Adding Licence files
This can be done in the remote version control repository - then a single line in the submitted file can point at the licence. This is how I'm seeking to solve the problem of our own repositories. If the reference URL is included in the metadata of the test job submission, it can even be linked into the test job metadata and made available to everyone through the results API.
metadata: licence.text: http://mysite/lava/git/COPYING licence.name: BSD 3 clause
Metadata in LAVA test job submissions is free-form but if the example was adopted as a convention for LAVA submissions, it would make it easy for someone to query LAVA for the licences of a range of test submissions.
Currently, LAVA does not store metadata from the test shell definitions except the URL of the git repo for the test shell definition but that may be enough in most cases for someone to find the relevant COPYING or LICENCE file.
This could be a problem too. If users contribute data under unfriendly licences, what is LAVA to do? I've used the BSD 3 clause in the above example as I expect it to be the most commonly used licence for these contributions. A copyleft licence could be used, although doing so would require additional metadata in the submission to declare how to contribute back to the original author (because that is usually not a member of the LAVA project).
Why not Creative Commons?
Although I'm referring to these contributions as data, these are not pieces of prose or images or audio. These are instructions (with comments) for a specific piece of software to execute on behalf of the user. As such, these objects must comply with the schema and syntax of the receiving service, so a code-based licence would seem correct.
Finally, a word about what comes back from your data submission - the results. This data cannot be restricted by any licence affecting either the submission or the software, it can be restricted using the API or left as the default of public access.
If the results and the submission data really are private, then the solution is to take advantage of the AGPL, take the source code of LAVA and run it internally where the entire service can be placed within a firewall.
What happens next?
- Please consider editing your own LAVA test job submissions to add licence metadata.
- Please use comments in your own LAVA test job submissions, especially if you are using some form of template engine to generate the submission. This data will be used by others, it is easier for everyone if those users do not have to ask us or you about why your test job does what it does.
- Add a file to your own repositories containing LAVA test shell definitions to declare how these files can be shared freely.
- Think about other services to which you submit data which is either only
partially machine generated or which is entirely human created. Is that data
free-form or are you essentially asking the service to do a precise task on
your behalf as if you were programming that server directly? (Jenkins is a
classic example, closely related to LAVA.)
- Think about how much developer time was required to create that submission and how the service publishes that submission in ways that allow others to copy and paste it into their own submissions.
- Some of those submissions can easily end up in documentation or other published sources which will need to know about how to licence and distribute that data in a new format (i.e. modification.) Do you intend for that useful purpose to be defeated by releasing your data under All Rights Reserved?
I don't enable comments on this blog but there are enough ways to contact me and the LAVA project in the body of this post, it really shouldn't be a problem for anyone to comment.