Saturday 21 June 2014

Software Engineering - science or marketing?

The software development methodology: the essential tool of anyone who wants to organise a band of ragtag programmers into a team of professional engineers who produce a working product that is delivered on time and meets the requirements of the project. That last part is always a point of contention, as what was a requirement is no longer a requirement, and what wasn't a requirement is now completely critical to the success of the project. The common adage is that there are two things you can guarantee about software of any reasonable complexity: that it will be late and that it will contain bugs. Modern software development methodologies attempt to tackle this, and I would like to understand exactly how well they do so - if at all. We will talk about software in the context of experimental science rather than mathematical science, because that is by and large how software engineering is practised today outside of some highly specialised fields.

Let's talk about something else for a second. Let's talk about medicine. Medicine is the great bastion of empirical study in the 20th century. The use of experimental science in medicine has gone a long way in terms of treating disease. If you go to a hospital and your disease is relatively well understood, they will tell you how they will treat it and roughly how long it will take for that treatment to take effect or, in the case of surgery, how long you're likely to take to recover. To anybody in software, let's be honest, that's magic. The refinement of medicine as a field in this regard is entirely down to the state of research in the field. I'd highly recommend the book Bad Science if you're at all interested in finding out the consequences of ignoring research in a medical setting, and I leave it as an exercise to the reader to consider how that might be analogous to the same situation for developers.

So naturally this is a good time to look at the state of research in software engineering. Let's look at the high level. The replication study is an important part of experimental research - it is the thing that says "yes, somebody who didn't think of the original hypothesis is capable of getting those results too." It verifies that the results weren't just a fluke, the result of selection bias or some similar phenomenon. It has been shown that the state of replication studies in software engineering is improving, "but the absolute number of replications is still small" (da Silva, 2014). In a lot of regards, this conclusion makes sense. Replication studies take time to complete and software engineering is still a relatively new field. However, it's worth noting that this highlights the immaturity of the field.

Let's look at a more typical piece of research that tries to yield some meaningful conclusion about a particular software development methodology. This particular study, entitled "Realizing quality improvement through test driven development", was conducted by Microsoft Research and published in the journal Empirical Software Engineering. The conclusions drawn by the article are stark and seem to be highly in favour of the Test Driven Development methodology: it reduced defect rates by between 40% and 90% in the projects studied, while only increasing development time by 15-35%. Sounds like a good trade-off, right?

The paper was based on four case studies. This should ring some alarm bells - four projects is far too small a sample to give statistically significant results. Case studies are not worthless, but they are not particularly useful for drawing generalised conclusions that imply your project will act similarly. So as not to misrepresent the study: the researchers acknowledge this too, stating that "a family of case studies is likely not to yield statistically significant results, though Harrison (2004) observes that statistical significance has not been shown to impact industrial practice." They later go on to say that they hope the case studies contribute to the pool of empirical software engineering research, and that we will hopefully be able to draw conclusions from such studies being conducted in different contexts. The (cited!) statement about statistical significance not affecting industrial practice is an interesting one, though. Indeed, the article they cite there is entitled "Propaganda and software development". That article talks a lot about how peddlers of methodologies try to get people to "jump on the bandwagon", using emotive language rather than reason to sell a methodology.

So let's briefly talk about Agile. Agile seems to be the poster child of modern development methodology in industry. It represents the antithesis of everything that the waterfall model stood for. It is definitely sold on reason and language more than it is sold on evidence (in the sense of research-based evidence that I am talking about in this post). If you look at the Agile Manifesto, this is self-evident. It's marketed more than it is argued for. I'm not necessarily saying that Agile or its child methodologies are the wrong way to go, and this post isn't about the specifics of the Agile methodology but rather about what we really know about it. Somewhat more sinister is Scrum, an Agile-based development framework, which (successfully) sells people training courses. Its website makes some vague claims about following empiricism, but at a cursory glance it does not seem to link to any articles to that effect, although it does point at a few sciencey-looking graphs (with no citations to speak of). Again, this is marketing above research.
[Image: Graphs!]

But let's take off the tinfoil hat for a moment and go back to talking about research. So if we have some research, but realistically we cannot study large samples in one go, what can we do to start to draw generalised conclusions? That's right! The meta-analysis. This is the research tool used to aggregate many individual studies, so you can take many small studies and treat them as one large one, placing the most weight on the studies which have been conducted most rigorously. It turns out that there are some meta-analyses on the subject of software engineering, which is good. I managed to find a meta-analysis on the subject of Test Driven Development, which makes a nice comparison to the Microsoft study above. The conclusion that they came to was that the effects of TDD were relatively small, although larger if you only look at industrial studies rather than academic ones (Rafique, 2013).
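To make the idea of "treating many small studies as one large one" a bit more concrete, here is a minimal sketch of one common way of pooling results: fixed-effect, inverse-variance weighting, where each study counts in proportion to how precise its estimate is. Note that this weights by precision rather than by any judgement of study quality, and that the study names, effect sizes and standard errors below are made up purely for illustration - they are not taken from Rafique (2013) or any other real study.

```python
import math

# Hypothetical (name, effect size, standard error) triples.
# These numbers are invented for illustration only.
studies = [
    ("small academic study", 0.10, 0.30),
    ("large academic study", 0.05, 0.10),
    ("industrial study", 0.40, 0.20),
]


def pooled_effect(studies):
    """Fixed-effect, inverse-variance pooling of study effect sizes.

    Each study is weighted by 1 / standard_error**2, so precise
    (usually larger) studies count for more than noisy ones.
    """
    weights = [1.0 / se ** 2 for _, _, se in studies]
    total = sum(weights)
    estimate = sum(w * effect for w, (_, effect, _) in zip(weights, studies)) / total
    std_error = math.sqrt(1.0 / total)
    return estimate, std_error


estimate, std_error = pooled_effect(studies)
# Approximate 95% confidence interval under a normal assumption.
low, high = estimate - 1.96 * std_error, estimate + 1.96 * std_error
print(f"pooled effect {estimate:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

The thing to notice is that the pooled standard error is smaller than that of any single study, which is the whole appeal: conclusions that no individual small study could support can start to emerge from the collection. A real meta-analysis would also have to deal with studies that measure different things in different contexts, which this sketch ignores.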


What I'm not really trying to do here is to draw any conclusions about what we should or should not be doing with regard to any particular software engineering technique. I am rather trying to understand whether software engineering, as an industry, really knows what it's doing. I suspect that the answer is really "sort of" - in the sense that there are some subjects which have been reasonably well studied, but we're doing a lot of it because management thought it was a good idea or the marketing material for it seemed quite good. The research problems seen in software engineering are also seen in the social sciences, because it's very hard to reliably study groups of people. In that same vein, the other spin I had on this post was that I might have titled it "Software engineers are not sociologists, but should they be?" I imagine that all such subjects are rather like medicine was before germ theory, which is incredibly exciting when you consider what might be around the corner when we discover the 'next big thing' in terms of software engineering methodology - although worrying if you consider what the analogy says about what we're doing at the moment.


References to non-hyper-linked papers (can't link to the full text because I've accessed them through my university and distributing them would likely be naughty):

Silva, Fabio; Suassuna, Marcos; França, A.; Grubb, Alicia; Gouveia, Tatiana; Monteiro, Cleviton; Santos, Igor. Empirical Software Engineering, 2014, Vol. 19(3), pp. 501-55

Rafique, Yahya; Misic, Vojislav B. IEEE Transactions on Software Engineering, 2013

Monday 16 June 2014

Net Neutrality

Recently there has been a lot of noise about Net Neutrality. It is one of the few politically charged discussions that I actually care about.


There are lots of good reasons to keep net neutrality around, and to preserve the Internet's freedom from censorship by government bodies. There are also lots of good reasons, from the perspective of governments and internet service providers, to destroy it. The ability to convey information is power. The Internet is the pen of today. If the pen is mightier than the sword, the Internet is mightier than Fat Man.

Recently in Europe, the UK has decided to fight the EU's proposals to preserve Net Neutrality. The UK government has taken the reactionary stance of "what about the children!?" and it's ridiculous.


The two main points which are being made are:


1: We need the power to block websites to help parents and stop children from viewing inappropriate content!
Let me be extremely clear here. There is no reason whatsoever that such filtering needs to be done at the ISP level. In fact, I would argue that this is extremely poor infrastructure design: if people want to filter different content, they should be free to decide what content is filtered. It would be better to do filtering at the home network level, potentially with support from the ISP to set that up. The filtering could then be customised per user. This would really help parents, rather than assuming that the government knows best.
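As a rough illustration of how little is needed for per-user filtering at the home network level, here is a minimal sketch of the decision a home router or filtering proxy might make for each request. Everything in it is an assumption made for the example: the user names, the blocklist contents (placeholder domains on the reserved .test TLD) and the is_blocked function are hypothetical, not part of any real product.

```python
from urllib.parse import urlsplit

# Hypothetical per-user blocklists, chosen by the household
# rather than by the ISP. The domains are placeholders.
BLOCKLISTS = {
    "parent": set(),
    "child": {"example-adult-site.test", "example-gambling.test"},
}


def is_blocked(user, url, blocklists=BLOCKLISTS):
    """Return True if this user's own blocklist covers the URL's host.

    A blocklist entry matches the domain itself and any subdomain.
    Users with no configured list are not filtered at all (an
    assumption for this sketch; a household could pick the opposite).
    """
    host = (urlsplit(url).hostname or "").lower()
    blocked = blocklists.get(user, set())
    parts = host.split(".")
    # Check "www.a.test", then "a.test", then "test" against the list.
    return any(".".join(parts[i:]) in blocked for i in range(len(parts)))


print(is_blocked("child", "http://www.example-adult-site.test/page"))
print(is_blocked("parent", "http://www.example-adult-site.test/page"))
```

The same URL gets a different answer depending on who is asking, which is exactly what a single ISP-wide list cannot offer.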



2: We need to block child pornography!
Let's look at what this proposes to do. It tries to very indirectly reduce the amount of child abuse – the argument here being supply and demand – people demand it and the easiest way is to supply it over the Web. So lawmakers have decided that blocking child pornography at the ISP level is the most effective way of breaking that supply, and so hopefully the amount of child abuse related to producing such material should go down. There are a number of flaws, though.



  • Since the demand is already there, people will still find a way to supply it, whether that supply happens over the web or not.

  • Even if you block this material, your filtering is unlikely to be 100% accurate. People will still find some.
    • Worryingly, this is likely to be the most recent content which has not been added to the ISP's filtering list. There is sadly no guarantee that this will even reduce related child abuse.
  • Proxies. Anyone who wants to get around an ISP-level filter can route their traffic through one.
  • Traceability of material is still not affected meaningfully. People could still exchange such materials: compressed, encrypted images and videos can be made to look like anything. You could easily encode such data as a bunch of random words chosen from the x most frequently used English words.
So unfortunately we're left with a situation whereby you would have to pare down the internet to some shell of itself to enforce any of what Net Neutrality opponents suggest is necessary.
It's sort of insulting to everybody's intelligence to suggest that eroding net neutrality in this way is at all likely to reduce child abuse in any meaningful way. It is also highly manipulative, in that it implies that if you don't agree with the UK government on this front, you have no empathy for children or some bullshit like that. Unfortunately, we only have the ability to protect children on our own soil. There is no magic wand we can wave to end child abuse in other countries. There is no joining of hands, singing of songs and gently eroding our civil liberties that will change that.

The worrying thing to me is that despite all of this, the people who make these “child friendly” assertions are likely to be completely aware of how impotent their proposals are. This should be raising some eyebrows about what their aims actually are and how those aims coming to fruition will affect us further down the line. We cannot exchange our civil liberties for a false silver bullet.



So what do we do?
For self-styled techno-geek people like myself, I think the best thing we can do is be concrete. Words are cheap. If we can come up with demos and point at real technologies which are better alternatives, or which illustrate how ridiculous a proposal is, then we take away anybody's ability to legitimately argue to the contrary. So maybe future blog posts will feature quotes from politicians and Net Neutrality opponents, along with a piece of technology that renders their point moot.


And yes, this post does not even begin to go into traffic shaping. Maybe I'll talk about that another time, but I wanted to focus on censorship and why the government's claim to that power with regard to the Internet is absurd.

Anyway, what do you think? Let me know!

If it's good enough for Google it's good enough for me!

Some things come about every so often for me - like I'll want to install some free Unix-like operating system on my desktop, or I'll want to go through a Pokemon game. Sometimes that itch is to set up a blog for whatever reason. Usually what happens with these kinds of things is I'll first set it up on a simple blog website, decide it's not good enough, then register with some webhost, set all that up, decide that's a job well done and go do something else because I can't decide what to write about on whatever overly specific topic I decided on.

This time, I'm going to keep this quite general about technology, or whatever things happen to be going through my head that in true blog style I don't expect anybody to actually attempt to read.

As the title states, "if it's good enough for Google, it's good enough for me" - I have no particular delusions of grandeur about this. It's a blogspot blog so that I have somewhere to stick ideas and project updates. Sometimes I have lots of ideas, other times not so much. There's no real incentive to go and set up a full website for it at this point and I fully plan on forgetting about this.

The name is just something I thought sounded neat - I'm into vision and machine learning topics at the moment, so you might hear a bit about that. It'll probably mostly be technology related.

A bit about me: I'm currently a computer science undergraduate, general computer science enthusiast and full-time software engineer (student) for the next month or so.

Anyway, if you got through all that, congratulations!