Big Data Analytics and Deep Learning are two high-focus areas of data science. Big Data has become important as many organizations, both public and private, have been collecting massive amounts of domain-specific information, which can contain useful information about problems such as national intelligence, cyber security, fraud detection, marketing, and medical informatics. Companies such as Google and Microsoft are analyzing large volumes of data for business analysis and decisions, impacting existing and future technology. Deep Learning algorithms extract high-level, complex abstractions as data representations through a hierarchical learning process. Complex abstractions are learnt at a given level based on relatively simpler abstractions formulated in the preceding level in the hierarchy. A key benefit of Deep Learning is the analysis and learning of massive amounts of unsupervised data, making it a valuable tool for Big Data Analytics where raw data is largely unlabeled and uncategorized. In the present study, we explore how Deep Learning can be utilized for addressing some important problems in Big Data Analytics, including extracting complex patterns from massive volumes of data, semantic indexing, data tagging, fast information retrieval, and simplifying discriminative tasks. We also investigate some aspects of Deep Learning research that need further exploration to address specific challenges introduced by Big Data Analytics, including streaming data, high-dimensional data, scalability of models, and distributed computing.
Four Applications of Deep Learning for Big Data:
Deep learning, with artificial neural networks at its core, is a new and powerful tool that can be used to derive value from big data. Most of the data today is unstructured, and deep learning algorithms are very effective at learning from, and generating predictions for, highly unstructured data. The following are four ways deep learning is being applied to big data sets.
- Text Classification and Automatic Tagging
Deep learning architectures including Recurrent Neural Networks and Convolutional Neural Networks are used to process free text, perform sentiment analysis and identify which categories or types it belongs to. This can help search through, organize and make use of huge unstructured datasets.
- Automatic Image Caption Generation
Deep learning is used to identify the contents of an image and automatically generate descriptive text, turning images from unstructured content into structured, searchable content. This involves the use of Convolutional Neural Networks, in particular very large networks like ResNet, to recognize the objects in the image, and then using Recurrent Neural Networks to write coherent sentences based on the recognized objects.
- Deep Learning in Finance
Today, most financial transactions are electronic. Stock markets generate huge volumes of data reflecting buy and sell actions, and the resulting financial metrics such as stock prices. Deep learning can ingest these huge data volumes, understand the current market position and create an accurate model of the probabilities of future price movements.
However, deep learning is mainly used for analyzing macro trends or making one-time decisions such as assessing the possibility of company bankruptcy; it is still limited in its ability to drive real-time buying decisions.
- Deep Learning in Healthcare
Deep learning, particularly computer vision, is used to help diagnose and treat patients. Deep learning algorithms analyze blood samples, track glucose levels in diabetic patients, detect heart problems, analyze images to detect tumors, help diagnose cancer, and can detect osteoarthritis from an MRI scan before damage is caused to bone structures.
Key Challenges of Deep Learning and Big Data
While deep learning has tremendous potential to help derive more value from big data, it is still in its infancy, and there are significant challenges facing researchers and practitioners. Some of these challenges are:
- Deep learning needs enough quality data: as a general rule, neural networks need more data to make more powerful abstractions. While big data scenarios have abundant data, the data is not always correct or of sufficiently high quality to enable training. Small variations or unexpected features of the input data can completely throw off neural network models.
- Deep learning has difficulty with changing context: a neural network model trained on a certain problem will find it difficult to answer very similar problems presented in a different context. For example, deep learning systems that can effectively recognize a set of images can be stumped when presented with the same images rotated or with different characteristics (grayscale vs. color, different resolution, etc.).
- Security concerns: deep learning needs to train and retrain on massive, realistic datasets, and during the process of developing an algorithm, that data needs to be transferred, stored, and handled securely. When a deep learning algorithm is deployed in a mission-critical environment, attackers can affect the output of the neural network by making small, malicious changes to inputs. This could change financial outcomes, result in wrong patient diagnosis, or crash a self-driving car.
- Real-time decisions: much of the world’s big data is streamed in real time, and real-time data analytics is growing in importance. Deep learning is difficult to use for real-time data analysis because it is very computationally intensive. For example, computer vision algorithms went through several generations over the course of two decades until they became fast enough to detect objects in a live video stream.
- Neural networks are black boxes: organizations that deal with big data need more than just good answers. They need to justify those answers and understand why they are correct. Deep learning algorithms rely on millions of parameters to reach decisions, and it is often impossible to explain “why” the neural network selected one label over another. This opacity limits the ability to use deep learning for critical decisions such as patient treatment in healthcare or large financial investments.
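The security concern listed above, that small malicious changes to an input can alter a model's output, can be illustrated on a much smaller model than a deep network. The following is a minimal sketch under stated assumptions: the three-feature logistic classifier, its weights, the input, and the perturbation budget `epsilon` are all made up for illustration. The perturbation moves each feature by at most `epsilon` in the direction that increases the loss, which is the idea behind the fast gradient sign method.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of class 1 under a logistic model (hypothetical stand-in for a network)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def sign(value):
    if value > 0:
        return 1.0
    if value < 0:
        return -1.0
    return 0.0


def sign_gradient_perturb(weights, x, true_label, epsilon):
    """Shift every feature by at most epsilon in the loss-increasing direction.

    For logistic loss, d(loss)/d(x_i) = (p - y) * w_i. Because p lies strictly
    between 0 and 1, (p - y) is negative when y == 1 and positive when y == 0,
    so the sign of the gradient depends only on the label and on sign(w_i).
    """
    direction = -1.0 if true_label == 1 else 1.0
    return [xi + epsilon * direction * sign(w) for xi, w in zip(x, weights)]


# Illustrative values only; none of these come from a real model.
weights = [2.0, -3.0, 1.5]
bias = 0.1
x = [0.3, 0.1, 0.2]
epsilon = 0.15

x_adv = sign_gradient_perturb(weights, x, true_label=1, epsilon=epsilon)

print("clean prediction:    ", predict(weights, bias, x))
print("perturbed prediction:", predict(weights, bias, x_adv))
```

With these illustrative values, no feature moves by more than 0.15, yet the predicted probability crosses the 0.5 decision threshold. Real networks are nonlinear, so an attacker would compute the gradient numerically rather than in closed form, but the principle is the same.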
Future Work on Deep Learning in Big Data Analytics:
In the prior sections, we discussed some recent applications of Deep Learning algorithms for Big Data Analytics and identified some areas where Deep Learning research needs further exploration to address specific data analysis problems observed in Big Data. Considering the relative immaturity of Deep Learning, we note that considerable work remains to be done.
In this section, we discuss our insights on some remaining questions in Deep Learning research, especially on work needed for improving machine learning and the formulation of the high-level abstractions and data representations for Big Data. An important problem is whether to utilize the entire Big Data input corpus available when analyzing data with Deep Learning algorithms. The general focus is to apply Deep Learning algorithms to train the high-level data representation patterns based on a portion of the available input corpus, and then utilize the remaining input corpus with the learnt patterns for extracting the data abstractions and representations.
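The sample-then-apply workflow described above can be sketched in a few lines. This is only an illustration under stated assumptions: k-means clustering stands in for the Deep Learning representation learner, the corpus is a small synthetic set of 2-D points, and taking the first 20 items as the training portion is an arbitrary choice. The function names `fit_centroids` and `encode` are hypothetical.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(centroids, point):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(centroids[i], point))


def fit_centroids(sample, k, iterations=10):
    """Learn k centroids from the sample only (stand-in for representation learning)."""
    centroids = [list(p) for p in sample[:k]]  # naive initialization: first k points
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in sample:
            groups[nearest(centroids, p)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old centroid if no point was assigned to it
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def encode(centroids, corpus):
    """Apply the learnt patterns to unseen data: each point becomes a cluster index."""
    return [nearest(centroids, p) for p in corpus]


# Synthetic corpus: points alternate between a group near (0, 0) and one near (5, 5).
corpus = []
for i in range(100):
    center = 0.0 if i % 2 == 0 else 5.0
    jitter = (i % 7) * 0.1
    corpus.append((center + jitter, center - jitter))

sample, remainder = corpus[:20], corpus[20:]  # train on a portion of the corpus
centroids = fit_centroids(sample, k=2)        # learn patterns from the sample
codes = encode(centroids, remainder)          # extract representations for the rest

print(centroids)
print(codes[:10])
```

The expensive learning step touches only the sample, while the cheaper encoding step scales to the rest of the corpus. How large the sample must be, and how it should be drawn so that it represents the full corpus, is not addressed by this sketch.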
Deep Learning has the advantage of potentially providing a solution to the data analysis and learning problems found in massive volumes of input data. More specifically, it aids in automatically extracting complex data representations from large volumes of unsupervised data. This makes it a valuable tool for Big Data Analytics, which involves data analysis from very large collections of raw data that is generally unsupervised and uncategorized. The hierarchical learning and extraction of different levels of complex data abstractions in Deep Learning provides a certain degree of simplification for Big Data Analytics tasks, especially for analyzing massive volumes of data, semantic indexing, data tagging, information retrieval, and discriminative tasks such as classification and prediction.
Future work should focus on addressing one or more of these problems often seen in Big Data, thus contributing to the Deep Learning and Big Data Analytics research corpus. We conclude by presenting insights into relevant future work by posing some questions, including defining data sampling criteria, domain adaptation modeling, defining criteria for obtaining useful data abstractions, improving semantic indexing, semi-supervised learning, and active learning.
Big data is the fuel of the modern economy. It is used by the world’s biggest companies to provide products and services that are changing the face of society. The digital economy, digital lifestyle, mobile devices and applications that have become an inseparable part of daily life, are all driven by big data.