What are Big Data and A.I.?

The term big data refers largely to the massive amounts of information and data that have come with the growth of the internet, and with the growth of the mobile internet in particular.

To understand big data, you really need to understand databases, and the relationship between information and how it is used.

What is a Database?

Imagine that you own a business which employs, say, 10 to 15 people, and that when each one joined the company they filled in an application form detailing, say, 30 to 40 specific pieces of information about themselves.

This information would include their surname, first names, date of birth, place of birth, previous jobs, start date, qualifications, skills etc.

You would probably also want to record information such as their starting salary, additional payments, date of annual appraisals, pension contributions etc.

Once you have collected the application forms for all the employees, you would want to have a system where you could record it all and access any of the information whenever you needed it.

The easiest way would probably be to set up a spreadsheet, where you allocate a row to each individual employee, and fill out each piece of information in a particular cell going along the row.

Once you had done this for all employees, you would have 10 to 15 rows going down the spreadsheet, one per employee, and maybe 30 or 40 columns going across it, each holding one specific piece of information for all the different employees.

That, quite simply, is a database.
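The spreadsheet described above can be sketched in a few lines of Python. This is only an illustration, assuming a small in-memory SQLite table; the employee records, the column names and the helper functions `build_database` and `started_before` are all made up for the example, and only five of the 30 or 40 fields are shown.

```python
import sqlite3

# Hypothetical employee records, invented purely for illustration.
# Each tuple is one "row" of the spreadsheet; each position is one "column".
employees = [
    ("Smith", "Anna", "1985-03-14", "2015-06-01", 28000),
    ("Jones", "David", "1990-11-02", "2017-01-16", 24500),
    ("Patel", "Priya", "1978-07-23", "2012-09-03", 35000),
]


def build_database(rows):
    """Create an in-memory table with one row per employee and one column per field."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees ("
        "surname TEXT, first_name TEXT, date_of_birth TEXT, "
        "start_date TEXT, starting_salary INTEGER)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def started_before(conn, date):
    """Return the surnames of employees whose start date is earlier than `date`."""
    cursor = conn.execute(
        "SELECT surname FROM employees WHERE start_date < ? ORDER BY surname",
        (date,),
    )
    return [row[0] for row in cursor]


conn = build_database(employees)
print(started_before(conn, "2016-01-01"))
```

Because every record has the same fields, a question such as "who started before 2016?" becomes a single query on one column, which is the kind of access the spreadsheet was set up to provide.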

Databases have been around pretty much since paper was invented, but have only really become significant with the advance of computational power, first with mainframes, then with PCs and the internet, and currently through cloud computing.

A database can be the information collected by a community organisation with three or four members, or a massive multinational with hundreds of thousands of employees scattered across the globe.

The common factor in most databases is that you have very specific areas of information, which can be stored in very logical ways, and itemised and analysed by virtue of their field or category.

Growth of Big Data

Many experts claim that 90% of all the information available in the world today has been generated in the last two years (as of 2018). Whilst this is a difficult claim to verify, the true figure is probably somewhere close to it. The growth in online information has come about through the massive expansion of the mobile Internet, and the different types of data that have been produced.

Big Data Types

When people talk about big data, what they are really referring to is the information that has been generated on smartphones, desktop computers, trading platforms and different learning machines. The various types of programs that have generated big data include blogs, video sharing platforms, social networks, podcasts etc.

The sheer volume of these combined posts, tweets and webpages is almost too big to comprehend at any meaningful level.

Big Data Analysis

Big data is not simply about the sheer volume of data and information that is generated at the moment (2018); it is also about how this information can be stored, used and analysed.

Aside from huge privacy issues, there are real questions about who has access to this information and what it can be used for.

Companies want to use it to target individuals specifically for advertising and products. Governments want to use it for a range of different purposes, some probably more devious than others.

The problem, from an analysis point of view, is that the information generated by way of social networks, tweets etc. does not fit into a traditional database as outlined above.

This has meant that manipulating the data with traditional database methods to draw out specific insights is virtually impossible, so other ways have had to be found to analyse the information in order to be able to use it as other people see fit.
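To make the contrast concrete, here is a minimal sketch of what "other ways" can look like at the smallest scale. A social media post has no surname or start date column to query, so any structure has to be pulled out of the free text itself. The posts and the `count_hashtags` helper below are invented for illustration, and counting hashtags is just one simple stand-in for the far more elaborate analysis used in practice.

```python
import re
from collections import Counter

# Hypothetical posts, invented for illustration. Unlike a spreadsheet row,
# each one is just free text with no fixed fields.
posts = [
    "Loving the new phone! #tech #gadgets",
    "Traffic is terrible again this morning...",
    "Big data talk tonight, who is going? #tech",
]


def count_hashtags(texts):
    """Pull hashtags out of free text and count how often each one appears."""
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in re.findall(r"#(\w+)", text))
    return counts


print(count_hashtags(posts).most_common())
```

Note that the second post yields nothing at all, which hints at the difficulty: much of the information in this kind of data sits in ordinary language rather than in anything that resembles a field or category.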

Internet of Things

Whatever the accurate figure is for the amount of information that has been generated as of 2018, it is going to be dwarfed by the amount that will be generated over the next five or ten years with the massive growth of the Internet of Things.

The significance of the Internet of Things in relation to big data is that it seems to be open season for virtually everything related to an individual’s life to be made wireless, so that companies and governments can get access to the information about how people live their lives.

This presents huge issues not only in terms of privacy, but also in terms of security. The more that people’s homes, cars, clothes, wearables, pets etc are connected to each other and to the Internet, the more at risk they are of some type of cybercrime, and the more need there is for some type of cyber security program and some type of cyber insurance to cover the risk.

The Four V’s of Big Data

Quite often reference is made to what are known as the Four V’s of big data. These are most commonly volume, variety, velocity and veracity.

Volume refers to the sheer scale of data and information that is generated minute by minute across the globe.

Variety refers to the different types of data and information that are generated, from audio to video to written text. With the advent of virtual worlds and 3-D worlds, this could change significantly.

Velocity refers to the sheer speed at which this information is generated, and the problems this creates for analysing it.

Veracity refers largely to the accuracy of the information or data that is produced. Given that companies and governments want to rely on this information in order to analyse it, there are real difficulties and problems in terms of verifying how accurate it is.

Hadoop

Hadoop is an open source software system, maintained by the Apache Software Foundation, that is currently the de facto way of analysing big data.

What it essentially does is break the data down into significantly smaller chunks and direct these chunks to a wide range of different computers, which can analyse them efficiently. These computers then send their results back to Hadoop, which collects them and generates the finished analysis.
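The split, analyse and collect pattern described above can be sketched in plain Python. This is not Hadoop's own API (Hadoop is written mainly in Java and runs across many machines); it is a toy illustration in which threads on a single computer stand in for the separate computers, and the function names `split_into_chunks`, `count_words` and `analyse` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_size):
    """Break a list into smaller lists of at most `chunk_size` items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def count_words(chunk):
    """The work done on one chunk: count the words in its lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def analyse(lines, chunk_size=2, workers=3):
    """Split the data, analyse the chunks in parallel, then merge the partial results."""
    chunks = split_into_chunks(lines, chunk_size)
    # Each worker thread plays the role of one computer in the cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    # Collect the partial results into one finished analysis.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


lines = [
    "big data is big",
    "data about data",
    "small data is still data",
]
result = analyse(lines)
print(result["data"], result["big"])
```

The point of the design is that no single worker ever needs to see the whole data set. Each one handles only its own chunk, and the final step simply adds the partial counts together.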

Machine Learning and Artificial Intelligence

Machine learning and artificial intelligence are often linked to big data, because it is recognised that it is virtually impossible for any human to effectively analyse and make sense of the data that has been produced.

This has given free rein to companies to produce some type of artificial intelligence process which can analyse and make sense of big data. The implications from a societal point of view, and from a privacy point of view, are pretty terrifying to a lot of people, but there seems to be no appetite on the part of any government or organisation to really try and put some sort of brake on it.

The supposed benefits of artificial intelligence are sold as being a legitimate reason for developing it at breakneck speed, with examples given such as Netflix and Amazon, and the ability of governments or cities to use data to improve public services within those cities.

These claims are at best highly dubious, yet they lend credibility to the speed with which this whole process is taking place. The issues of security and privacy seem to be completely ignored or marginalised, with those who raise them being looked at or talked about as almost Luddite.

It may well take some major catastrophe in terms of cyber security to wake people up to the reality of what is happening, and the inherent risks associated with this breakneck speed approach to technological change and advancement.

 

 
