
Stating the Obvious: Data Security & Requirements Analysis

Yesterday it was announced that SystmOne, an IT system used by the UK National Health Service (NHS), may have a design flaw potentially allowing patient data to be read by thousands of NHS workers. This story illustrates that, when assessing data security risks, there can be too much emphasis on security bugs and not enough on design flaws.

The potential design flaw in the NHS IT system may have led to 1 in 3 GP (General Practitioner) surgeries being vulnerable to a data breach, possibly exposing 26 million people's medical records to being read by strangers (Donnelly, 2017).

The system has an "enhanced data sharing" feature which, when switched on, allows an individual's data to be accessed by a local hospital. The idea behind this was sound: more effective treatment by giving local hospitals access to patient records.

However, when switched on, this data sharing feature meant that receptionists, clerks and care workers at care homes, other hospitals and even prisons could potentially read an individual's private medical data.

You would think that those who commissioned SystmOne and those who built it understood what "enhanced data sharing" meant. Why was it not recognized that data access needed to be restricted by role (e.g. a care worker's data access rights vs. a doctor's)?
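
To make the point concrete, here is a minimal sketch of role-based access control, the kind of restriction that was apparently never specified. The roles, permissions and checks below are invented for illustration only and do not reflect how SystmOne actually works.

```python
# Hypothetical sketch of role-based access control (RBAC) for patient records.
# Roles and permissions are invented for illustration; not the real SystmOne design.

ROLE_PERMISSIONS = {
    "doctor":       {"read_full_record", "write_clinical_notes"},
    "nurse":        {"read_full_record"},
    "receptionist": {"read_contact_details"},   # no clinical data
    "care_worker":  {"read_care_plan"},         # no clinical data
}

def can_access(role: str, permission: str) -> bool:
    """Return True only if the role explicitly grants the permission."""
    return permission in ROLE_PERMISSIONS.get(role, set())

# A doctor can read the full clinical record; a receptionist cannot.
assert can_access("doctor", "read_full_record")
assert not can_access("receptionist", "read_full_record")
```

The point of such a check is that access is denied by default: a role only sees data it has been explicitly granted, which is exactly the requirement that seems to have gone unstated.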

Arguably this was the result of poor requirements analysis, requirements analysis being the process by which user expectations for a product or system are identified. It should be the first of the six steps of the Software Development Lifecycle. This process is used in software development to help ensure that the product meets the needs of the business or organization that will use it.

One factor in poorly written requirements is the client believing that a requirement is so obvious that they do not need to state it. It may have seemed self-evident to those commissioning the NHS patient record system that only some types of NHS workers should be able to access the data. So it was never said, and data sharing restrictions by role were never built into the system.

So how can you get the client to verbalize all these implicit requirements? This should be done via requirements elicitation, for which there are a variety of methods, including:

  • Interface and documentation analysis, which may reveal an organization's data protection and security policies.
  • Conceptual aids such as business cases, storyboarding and scenario-driven strategizing, which can help identify previously unstated security requirements.
  • Observation, which may highlight users' interaction with data, revealing unspoken data access controls and policies within organizations.
  • Asking the client, in interviews and focus groups, "Why does the system/process have to be this way?", a generally good practice that helps separate a requirement from a design decision (Vincent, 2015).

Overall, the IT/Software Development industry must remember that without quality requirements, legitimate features of a system can be built that are themselves security vulnerabilities. Whilst identifying bugs is important, equal consideration should be given to preventing design flaws. Without this, the IT/Software Development industry contributes to the uphill battle organizations face in ensuring data protection and security.

 

The Challenges Of Big Data

I went to the Data Science Austin Pop up conference a couple of weeks ago. It is a day-long conference with talks designed for data scientists, developers and executives.

It was a very interesting series of talks. One presentation, called Back to The Future and Data Analytics, resonated with me.

The talk suggested that we have overestimated the power of Big Data. Big Data refers to large datasets. With the recent development of powerful data analytics that could process these large data sets, it was thought that deep insights into companies and industries could be produced, vastly increasing the efficiency and profitability of companies and industries.

However, an estimate offered during the talk was that data scientists and analysts spend 80% of their time on data scrubbing and only 20% of their time on analysis.

This resonates with me. Whilst I am not a data scientist or analyst, my job requires some data analysis. But getting clean data worth analyzing requires a lot of data scrubbing. I use the CONCATENATE, LEFT, RIGHT and PROPER formulas far more often than IF statements or VLOOKUPs. Most of my time is spent separating data into columns, combining numbers and words in columns and stripping away unnecessary information. I have often felt frustrated that most of my time is spent preparing data to be analyzed rather than actually analyzing it.
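
As a rough illustration only, the same spreadsheet-style steps (splitting a column apart, trimming and tidying text, recombining fields) can be sketched in a few lines of pandas. The column names and the record format below are invented for the example, not taken from any real dataset.

```python
import pandas as pd

# Invented example data: one messy column mixing an ID, a name and a note.
raw = pd.DataFrame({"record": ["1042 - Smith, John (follow-up)",
                               "1043 - Jones, Mary (new patient)"]})

# Split the single column into separate fields (like Text to Columns / LEFT / RIGHT).
parts = raw["record"].str.extract(r"^(?P<id>\d+)\s*-\s*(?P<name>[^(]+)\((?P<note>[^)]*)\)")

# Tidy the extracted text (like TRIM and PROPER).
parts["name"] = parts["name"].str.strip().str.title()

# Combine number and text fields back together when needed (like CONCATENATE).
parts["label"] = parts["id"] + " " + parts["name"]

print(parts)
```

Even in this toy case, most of the code is cleaning and reshaping rather than analysis, which mirrors the 80/20 split described in the talk.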

So is there any hope that in the future we can avoid having to do so much data scrubbing?

I think that the answer will depend on the industry. In some industries, performing any analysis will always require significant data scrubbing.

Healthcare is one of them. The variety of terms used to describe the same disease or treatment makes it very hard to perform automated data analysis on healthcare data. It may be obvious to any doctor that different terms refer to the same condition, but a computer may need to learn which terms map to the same condition or treatment. The "messiness" of the data slows a computer's processing down.
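
A toy example of the problem: before any analysis can run, free-text terms have to be normalized to a single canonical form. The terms and mapping below are made up for illustration; real systems rely on clinical terminologies such as SNOMED CT rather than a hand-built dictionary.

```python
# Hypothetical mapping of synonymous free-text terms to one canonical condition.
SYNONYMS = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
    "high blood pressure": "hypertension",
    "htn": "hypertension",
    "hypertension": "hypertension",
}

def normalize(term: str) -> str:
    """Map a free-text term to its canonical form; fall back to the cleaned input."""
    cleaned = term.strip().lower()
    return SYNONYMS.get(cleaned, cleaned)

records = ["Heart attack", "MI", "High blood pressure"]
print([normalize(r) for r in records])
# ['myocardial infarction', 'myocardial infarction', 'hypertension']
```

Building and maintaining that mapping is itself data scrubbing, which is why healthcare data is unlikely to escape the 80% problem any time soon.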

So even if we have powerful data analytics solutions that can process large data sets, if the data is of poor quality the analysis may not be accurate. Arguably, industries that rely less on human record keeping, such as those that use satellite imaging or measure seismic activity, may produce cleaner data and therefore better analysis.

But clean data is not enough to produce good analysis. You need three things: clean data, large data sets, and consistently accurate humans. And don't forget that tricky matter of interpretation. It is probably a lucky data scientist who gets to work with all of these present at once.

It will be interesting to see how the challenges of Big Data can be overcome. But right now the promise of Big Data is far from being realized. Perhaps we need to remember the old IT adage: "Garbage In, Garbage Out".