Category: Big Data

What are Big Data and A.I.?

The term big data refers largely to the massive amounts of information and data that have come with the growth of the internet, and with the growth of the mobile internet in particular.

To understand big data, you really need to understand databases, and the relationship between information and how it is used.

What is a Database?

Imagine that you own a business which employs, say, 10 to 15 people, and that when each one joined the company they filled in an application form detailing, say, 30 to 40 specific pieces of information about themselves.

This information would be their surname, first names, date of birth, place of birth, previous jobs, start date, qualifications, skills etc.

You would probably also want to record information such as their starting salary, additional payments, date of annual appraisals, pension contributions etc.

Once you had collected all the application forms for all the employees, you would want to have a system where you could record it all and access any of the information whenever you needed it.

The easiest way probably would be to set up a spreadsheet, where you allocate a row to each individual employee, and fill out each piece of information in a particular cell going along the row.

Once you had done this for all employees, you would have 10 to 15 rows of information going along the spreadsheet, and maybe 30 or 40 columns going down the spreadsheet, which give you the collected information in each specific area for all the different employees.

That quite simply is a database.

Databases have been around pretty much since paper was invented, but have only really become significant with the advance of computational power, first with mainframes, later with PCs and the internet, and currently through cloud computing.

A database can be the information collected by a community organisation with three or four members, or by a massive multinational with hundreds of thousands of employees scattered across the globe.

The common factor in most databases is that you have very specific areas of information, which can be stored in very logical ways, itemised and analysed by virtue of their field or category.
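The employee example above can be sketched in a few lines of code. This is only a minimal illustration using Python's built-in sqlite3 module: the employee names, dates and salaries are invented, only five of the 30 to 40 fields are shown, and the function names are hypothetical.

```python
import sqlite3

# Hypothetical employee records; names and values are invented for illustration.
EMPLOYEES = [
    ("Smith", "Anna", "1985-03-12", "2015-06-01", 32000),
    ("Jones", "David", "1990-11-30", "2017-01-16", 28000),
    ("Patel", "Meera", "1978-07-04", "2012-09-03", 41000),
]


def build_database():
    """Create an in-memory table with one row per employee and one column per field."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees ("
        "surname TEXT, first_name TEXT, date_of_birth TEXT, "
        "start_date TEXT, starting_salary INTEGER)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", EMPLOYEES)
    return conn


def surnames_started_before(conn, date):
    """Return surnames of employees whose start date is earlier than `date`."""
    rows = conn.execute(
        "SELECT surname FROM employees WHERE start_date < ? ORDER BY surname",
        (date,),
    )
    return [row[0] for row in rows]


conn = build_database()
print(surnames_started_before(conn, "2016-01-01"))
```

Because every row has the same fields, a question such as "who joined before 2016?" becomes a one-line query against the start date column. That is the property the rest of this article relies on when it contrasts traditional databases with big data.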

Growth of Big Data

Many experts claim that 90% of all the information available in the world today has been generated in the last two years (as of 2018). Whilst this is a difficult claim to verify, a figure somewhere near this is probably reasonably accurate. The growth in online information has come about through the massive expansion of the mobile internet, and the different types of data that have been produced.

Big Data Types

When people talk about big data, what they are really referring to is the information that has been generated on smart phones, desktop computers, trading platforms and different learning machines. The various types of programs that have generated big data include blogs, video sharing platforms, social networks, podcasts etc.

The sheer volume of these combined posts and tweets and webpages is almost too big to comprehend at any meaningful level.

Big Data Analysis

Big data is not simply about the sheer volume of data and information that is generated at the moment (2018). It is also about how this information can be stored, used and analysed.

Aside from huge privacy issues, there are real questions about who has access to this information and what it can be used for.

Companies want to use it to target individuals specifically for advertising and products. Governments want to use it for a range of different purposes, some probably more devious than others.

The problem from an analysis point of view is that the information generated by way of social networks, tweets etc does not fit into a traditional database as outlined above.

This has meant that manipulating the data with traditional database tools, in order to focus on anything specific, is virtually impossible. Other ways have therefore had to be found to analyse the information so that it can be used as other people see fit.
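A small sketch may help show why this kind of information resists a fixed row-and-column layout. The three items below are invented, and every field name in them is an assumption made purely for illustration. The code simply counts how many items contain each field.

```python
import json

# Hypothetical social media items; every field name here is invented for illustration.
RAW_ITEMS = [
    '{"type": "tweet", "user": "anna", "text": "Loving the new phone!", "hashtags": ["tech"]}',
    '{"type": "video", "user": "dave", "title": "Holiday vlog", "duration_seconds": 312}',
    '{"type": "blog", "author": "meera", "title": "On privacy", "body": "Long text...", "comments": 14}',
]


def field_coverage(raw_items):
    """Map each field name to the number of items that actually contain it."""
    coverage = {}
    for raw in raw_items:
        for field in json.loads(raw):
            coverage[field] = coverage.get(field, 0) + 1
    return coverage


def shared_fields(raw_items):
    """Return the sorted field names present in every item."""
    coverage = field_coverage(raw_items)
    return sorted(name for name, count in coverage.items() if count == len(raw_items))


print(field_coverage(RAW_ITEMS))
print(shared_fields(RAW_ITEMS))
```

In this invented sample only one field is common to all three items, so a single table would consist mostly of empty cells. Real data is far messier than this, which is one reason different storage and analysis approaches are used.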

Internet of Things

Whatever the accurate figure is for the amount of information that has been generated as of 2018, it is going to be dwarfed by the amount of information that will be generated over the next five or ten years with the massive growth of the Internet of Things.

The significance of the Internet of Things in relation to big data is that it seems to be open season for virtually everything related to an individual’s life to be made wireless, so that companies and governments can get access to the information about how people live their lives.

This presents huge issues not only in terms of privacy, but also in terms of security. The more that people’s homes, cars, clothes, wearables, pets etc are connected to each other and to the Internet, the more at risk they are of some type of cybercrime, and the more need there is for some type of cyber security program and some type of cyber insurance to cover the risk.

The Four V’s of Big Data

Quite often reference is made to what are known as the Four V’s of big data. These are most commonly volume, variety, velocity and veracity.

Volume refers to the sheer scale of data and information that is generated minute by minute across the globe.

Variety refers to the different types of data and information that are generated, from audio to video to written. With the advent of virtual worlds and 3-D worlds this could change significantly.

Velocity refers to the sheer speed at which this information is generated, and the problems this creates for analysing it.

Veracity refers largely to the accuracy of the information or data that is produced. Given that companies and governments want to rely on this information in order to analyse it, there are real difficulties and problems in terms of verifying how accurate it is.

Hadoop

Hadoop is an open source software framework, maintained by the Apache Software Foundation, that is effectively the current de facto way of analysing big data.

What it essentially does is to break the data down into significantly smaller chunks, direct these chunks to a wide range of different computers which can analyse it efficiently, and then these computers send back the results to Hadoop, which collects it and generates the finished analysis.
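The split, distribute and collect idea described above can be sketched on a single machine. To be clear, this is not Hadoop code and does not use any Hadoop API. It is a toy illustration in plain Python, with threads standing in for separate computers, invented sample posts, and hypothetical function names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, chunk_size):
    """Break the full data set into smaller lists of at most `chunk_size` lines."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def count_words(chunk):
    """The work done by one 'computer': count the words in its own chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def analyse(lines, chunk_size=2, workers=3):
    """Send chunks out to workers, then merge the partial results into one total."""
    chunks = split_into_chunks(lines, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


# Hypothetical input data, invented for illustration.
posts = [
    "big data is big",
    "data about data",
    "small data",
    "big ideas",
    "ideas about data",
]
print(analyse(posts).most_common(2))
```

Each worker only ever sees its own chunk, and the final answer comes from combining the partial counts. A real cluster adds a great deal on top of this, such as moving data between machines and recovering from failures, none of which is shown here.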

Machine Learning and Artificial Intelligence

Machine learning and artificial intelligence are often linked to big data, because it is recognised that it is virtually impossible for any human to effectively analyse and make sense of the data that has been produced.

This has given free rein to companies to produce some type of artificial intelligence process which can analyse and make sense of big data. The implications from a societal point of view, and from a privacy point of view, are pretty terrifying to a lot of people, but there seems to be no appetite on the part of any government or organisation to really try and put some sort of brake on it.

The supposed benefits of artificial intelligence are sold as a legitimate reason for developing it at breakneck speed, with examples given such as Netflix and Amazon, and the ability of governments or cities to use data to improve public services.

These claims are at best highly dubious, yet they are used to give credibility to the speed with which this whole process is taking place. The issues of security and privacy seem to be completely ignored or marginalised, with those who raise them being looked at or talked about as almost Luddite.

It may well take some major catastrophe in terms of cyber security to wake people up to the reality of what is happening, and the inherent risks associated with this breakneck speed approach to technological change and advancement.

What do Big Data and Predictive Analysis Really Mean?

One of the problems with understanding big data is that the term itself means different things to different people.

Some of the most common questions asked are:

 – What is Big Data all about?

 – What is the Big Data Market?

 – Why is it important to use Data?

 – What is Strong Data?

 – What is Big Data Used For?

There are, however, two main areas of the term that most people recognise, and which between them cover the present reality. It is important to realise that we are really only just at the beginning of what big data means, what the implications are and its relationship to artificial intelligence.

The two main areas referred to above are the amount of information posted by and about people online, and the amount of information made available by governments, businesses etc. The scale of the information that is posted online is almost too big to quantify.

What is important to realise is that this type of information, i.e. blog posts, social media posts, videos etc, does not fit into the traditional format of a database, and as such cannot be analysed in the same way. In relation to big data, this means new ways have to be found to both store and analyse this information.

In terms of governmental information, big data is at its simplest just that: huge amounts of data and information produced by governments, businesses and other organisations, some of which is made public and some of which is kept private.

The issues around big data are complex and varied. The primary concerns have to be those of privacy and cyber insurance/cyber security.

The sheer volume of big data, however you may come to define that term, means that a significant number of different people and different networks will be involved in processing and using the information. The privacy issues of big data are significant.

However much the information is anonymised, the implications of the data being hacked and personal information on individuals being gathered are significant. Identity theft is a significant cyber insurance and cyber security issue, yet one that remains under the radar for many people.

In most instances, cyber security has at its core the issue of a data breach occurring, and the implications thereof. Given that big data implies a huge increase in the volume of data being processed, both structured and unstructured, the number of servers and networks involved is also going to increase significantly.

Logic therefore dictates that the risk of cyber security threats that would apply to one network within a company or organisation will be multiplied many times over.
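One rough way to see how the risk grows is a simple probability model. This is only a sketch under strong assumptions that do not come from any real data: each network is assumed to have the same chance of being breached, breaches are assumed to be independent of one another, and the 1% figure is invented.

```python
def breach_probability(per_network_probability, network_count):
    """Chance that at least one of `network_count` independent networks is breached."""
    return 1 - (1 - per_network_probability) ** network_count


# Hypothetical figure: a 1% chance of a breach per network per year.
for count in (1, 10, 50):
    print(count, round(breach_probability(0.01, count), 3))
```

Under these assumptions the chance of at least one breach rises steadily as more networks handle the data. Real networks are not independent and do not share the same level of risk, so the numbers are illustrative only.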

Big Data – Government

Most governments in the West actively encourage the release of big data relating to many areas of government and life generally. This in part is because governments believe it shows an openness in their storage of information, which may be true or not, and because it allows an unprecedented level of predictive analysis of trends and behaviour in society generally.

The US government website has huge amounts of big data available relating to a wide range of areas, listed below.

Agriculture

Climate

Consumer

Ecosystems

Education

Energy

Finance

Health

Local Government

Manufacturing

Maritime

Oceans

Public Safety

Science and research

BIG DATA and BUSINESS

Whether or not many businesses want to get involved with big data, often they do not really have a choice. It is more an issue of how they analyse and use the data that is flowing through them, both to enhance their business and also to promote their industry.

The issue is really about how to make sense of the huge volumes of data in ways that benefit their company as opposed to being overwhelmed by it.

Predictive Analysis

Predictive analysis is the phrase that has been given to the manipulation of data into formats and charts that make sense of the information in a way that is useful. Predictive analysis of the data has to add value to any organisation, government or business.

It has to help them understand potential future trends, both in their underlying business and in consumer or citizen habits. Predictive analysis will to a large extent also show likely developments in individuals’ lifestyles and behaviours, as part of a wider pattern.
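As a very simple illustration of spotting a trend and projecting it forward, the sketch below fits a straight line to a short series of figures and extends it by one period. The sales figures are invented, the function names are hypothetical, and a straight-line fit is only one of many techniques that fall under predictive analysis. It needs at least two data points to work.

```python
def fit_trend(values):
    """Least-squares straight line through (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(values, periods_ahead=1):
    """Extend the fitted line `periods_ahead` steps beyond the last observation."""
    slope, intercept = fit_trend(values)
    return intercept + slope * (len(values) - 1 + periods_ahead)


# Hypothetical monthly sales figures, invented for illustration.
monthly_sales = [100, 110, 120, 130, 140]
print(forecast(monthly_sales))
```

The more data points that feed a model like this, the more it can say, which is why predictive analysis tends to drive the collection of ever more information.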

Such analysis will inevitably mean more information being gathered from consumers and ordinary people in order for predictive analysis to have some meaning. This inevitably raises even more privacy concerns and cyber security threats, and increases the need for cyber insurance planning.

BIG DATA and ARTIFICIAL INTELLIGENCE

The whole area of artificial intelligence is relatively new, but one being heavily invested in by the major tech companies. The aim behind a lot of artificial intelligence research is to allow it to automatically analyse and manipulate the data by itself, without the need for human intervention.

The growth of artificial intelligence and robotics is one that will have a profound effect on the issue of big data and how it is used. The cyber security implications need to be part of any process regarding the storage, usage and predictive analysis of the data by whoever is storing it.

BIG DATA and HADOOP

Hadoop is an open source framework that can be used to store and manipulate sets of big data. Hadoop acts as a system that monitors clusters of computers and allocates different types and amounts of information across them in the most efficient manner possible.
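The idea of spreading data across a cluster can be sketched with a toy allocator. This is not how Hadoop actually decides where data goes, and it uses no Hadoop API. It is a simplified illustration in which each block of data is handed to whichever machine currently holds the least. The block sizes, node names and function name are all invented.

```python
def allocate_blocks(block_sizes, node_names):
    """Greedy allocation: give each block to the node holding the least data so far."""
    load = {name: 0 for name in node_names}
    placement = {name: [] for name in node_names}
    # Placing the largest blocks first tends to give a more even spread.
    for block_id, size in sorted(enumerate(block_sizes), key=lambda item: -item[1]):
        target = min(node_names, key=lambda name: load[name])
        placement[target].append(block_id)
        load[target] += size
    return placement, load


# Hypothetical block sizes (in MB) and node names, invented for illustration.
placement, load = allocate_blocks([128, 128, 64, 32, 128, 16], ["node-a", "node-b", "node-c"])
print(placement)
print(load)
```

A real cluster also keeps several copies of each block on different machines so that data survives a hardware failure, which this sketch leaves out.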