Data in the age of exponential growth

This post was originally published on the MAJOR website.

Data. So simple, right? A few words or numbers connected together. Easy peasy. Maybe a handful of fields representing some context. Even better. Perhaps there are many rows of that data. A small collection, enough to have some form of "record". Maybe 10 items of data. Good stuff.

We want to identify some trends though, so let's get that data recorded over a period of time. Excellent, now we have something to gain insights from. Already at 100 rows? Well, that grew quickly. I do just want a few more records to collect though. At 1000 items now? How's that? Ah, exponential scale, I see. A few more, multiplied by the time I wanted it for. No worries.

There are some extra fields that I want to record as well, so I'll get those added, and it's helpful to know these bits of information just in case. 10,000. Our use case has expanded now, so actually we need to connect a second stream of data with the first and track it more regularly. 500,000. There's some legacy data in there now, but we don't really want to change anything in case we break something. 3 million.

A year has passed and we've got some great usable data here now. 45 million. We've expanded the relationship between data a couple more times and increased our frequency to hourly instead of daily.

10 billion.

How on earth am I going to decipher anything from that?

It's so easy for data to balloon or scale beyond anything you could've imagined, so it's really important to be cognizant of what data you capture, what data you store and how you use it. Most of us capture data from a variety of online tools: website analytics, Customer Relationship Management systems, newsletter sign-ups, heatmap tracking, social media accounts, heck, even email.

There's what you must do from a legal standpoint (see GDPR), and then there's what you should do, more of an ethical view - this is the one I want to talk about.

Collection

Starting off at the beginning, collecting your data. When deciding what you want to collect, only collect what you unequivocally need. If you need to collect more data in the future, then add that data set and backfill if possible.
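As a rough illustration of "only collect what you need", here is a minimal Python sketch of an allow-list applied to a form submission before anything is stored. The field names and the `keep_needed_fields` helper are made up for the example; your own list would come from what you can genuinely justify collecting.

```python
# Hypothetical allow-list: the only fields this imaginary sign-up form needs.
ALLOWED_FIELDS = {"email", "display_name"}


def keep_needed_fields(submission: dict) -> dict:
    """Return a copy of the submission without any field that isn't on the allow-list."""
    return {key: value for key, value in submission.items() if key in ALLOWED_FIELDS}


# Example submission containing a field we have no real need for.
submission = {
    "email": "sam@example.com",
    "display_name": "Sam",
    "date_of_birth": "1990-01-01",
}
stored = keep_needed_fields(submission)
```

Starting from an allow-list rather than a block-list means a new field is dropped by default until someone decides it's actually needed.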

The number of forms I've filled out over the years where there's a field and I'm left wondering, "what on earth do you need that for?". If your users encounter that situation, or you think they might, be transparent about what you're collecting and why. If you feel like you can't say, that should speak volumes.

Consent

Whilst some data is collected implicitly for things such as system logs or security purposes, much of the data we interact with will be to do with a user or their actions. Anything specific to or identifiable about a user (see Personally Identifiable Information, PII) must only be collected once you have their informed consent.

Be honest and clear with your users around what you want to collect and why. The more limited the data set, the more likely the consent. If certain data sets are needed for "legitimate interest" purposes (where data isn't required by law but is of clear benefit to both parties, there's limited privacy impact on the individual and they should reasonably expect you to use their data in that way), then ensure that only the fields you genuinely need are marked as required.
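One way to make consent something your code checks rather than assumes is to record what each user agreed to and ask before collecting. This is only a sketch: `ConsentRecord`, `may_collect` and the purpose names are hypothetical, and real consent handling also needs things like timestamps, withdrawal and an audit trail.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Hypothetical record of the purposes a user has explicitly agreed to."""
    user_id: str
    purposes: set[str] = field(default_factory=set)


def may_collect(consent: ConsentRecord, purpose: str) -> bool:
    """Only allow collection for a purpose the user has consented to."""
    return purpose in consent.purposes


# This user agreed to the newsletter, and nothing else.
consent = ConsentRecord(user_id="user-123", purposes={"newsletter"})
newsletter_ok = may_collect(consent, "newsletter")
analytics_ok = may_collect(consent, "analytics")
```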

Security

Now that you've obtained consent and collected the data, it's your responsibility to look after it. Keeping it secure, away from unauthorised eyes, is paramount both for the user, who has trusted you with their data, and for your reputation with those that interact with you.

Use strong passwords, encryption (in transit and at rest), access controls and multi-factor authentication: whatever is appropriate for the sensitivity of the data stored or transmitted.

Minimisation

Reducing the data to only what you need is beneficial for a multitude of reasons. It's better for the environment (less data storage), it's better for your users (lower data footprint), it's lower risk for you if an access, deletion or portability request arrives, and it's lower cost (though not always for you of course).

This could be via the fields/parameters, rows/entries or the time/history of your data. Only store the data you need, purge erroneous data, and keep it only for as long as you actually need it.

Third-parties

Bringing other people or entities into the equation complicates things. Was your consent valid? Is it necessary to share it with them? How would they secure and protect the data?

Multiple parties introduce multiple points of failure - be cautious when sharing your data.

Intended purpose vs actual usage

More applicable to larger scale applications or tools, but how you want your data to be used versus how it is actually used is important to note as well.

Data practices may perpetuate racial or ethnic biases that can lead to discrimination, such as facial recognition trained predominantly on white, western faces. Or algorithms that lead to discriminatory behaviour against gender and sexual minorities. Or insecure data practices that put victims of domestic violence at risk by allowing abusers to access their information.

It's a priority to consider and address the potential impact on vulnerable groups when outlining and constructing data practices. This includes implementing robust privacy protections, conducting ethical data assessments, and considering the potential biases and consequences of data-driven decisions.

Education

Everyone in your organisation should be aware of their data, the data they collect or are responsible for, the potential for risks and how to mitigate them.

Training is hugely important: best practices to prevent inadvertent breaches, how to manage and maintain data, what should and shouldn't be obtained, and the compliance requirements of the applicable laws and regulations (e.g. GDPR, CCPA).

Build privacy by design into your workflows when dealing with data. What content implications are there? How should marketers build forms? How should developers implement features? How should designers consider data when crafting?

The age of exponential data growth

The digital age has bestowed upon us the power of data on an unprecedented scale. Even if the data you are collecting may seem small, collectively with all the other "small" samples of data this equates to a colossal volume of information stored online.

This power comes with an equally unparalleled responsibility. By embracing ethical data practices, transparency, consent, security, minimisation, and education, we can harness this power for the betterment of society while safeguarding the rights and dignity of individuals. In this era of data abundance, ethical stewardship is not merely an option; it is an imperative that defines our path forward.

Data is a modern currency. Some might argue it's more valuable than traditional fiat currencies. You wouldn't be flippant with large volumes of money or leave it lying around for anyone to pick up.
