
Guest: Justin Cobbett
Company: Akamai
Show: TFiR: T3M

Gartner, Inc. predicts that by 2025, 70% of organizations will shift their focus from big data to small and wide data. That is just two years away. In this episode of TFiR: T3M, Akamai Product Marketing Manager Justin Cobbett shares his insights on the current state of data (big, small, and wide) and where the industry is headed.

Highlights of this video interview:

  • The ChatGPT model was trained on 570 gigabytes of data from books, Wikipedia, websites, and articles. Reddit changed its API because it wants to be paid for its data, which the largest companies use to train their language models.
  • Thousands and thousands of medical images and charts are currently being processed to help doctors make diagnoses.
  • Small data refers to datasets with fewer than 1,000 rows or columns; you're dealing with megabytes rather than exabytes of data. The focus is on analytical techniques that extract useful information.
  • Wide data draws on a wide variety of sources, even though the amount of data from each individual source may be very small. Patterns are analyzed across those sources.
  • A study covered by Harvard Business Review showed that having coders train the AI on a small dataset actually improved the larger models it fed into.
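As a rough illustration of the "small data" scale described above: a dataset of a few hundred rows fits easily in memory and can be analyzed with nothing more than the standard library, with the emphasis on the analytical technique rather than on volume. The dataset below is invented purely for this sketch.

```python
import random
import statistics

# Hypothetical small dataset: 200 rows of (age, measurement) pairs,
# invented for illustration -- megabytes of data, not exabytes.
random.seed(42)
rows = [(random.randint(20, 80), random.gauss(120, 15)) for _ in range(200)]

ages = [age for age, _ in rows]
values = [value for _, value in rows]

# With small data, simple summary statistics already surface
# useful information without any heavy infrastructure.
print(f"rows: {len(rows)}")
print(f"mean age: {statistics.mean(ages):.1f}")
print(f"measurement mean: {statistics.mean(values):.1f}")
print(f"measurement stdev: {statistics.stdev(values):.1f}")
```

The same analysis at big-data scale would need distributed storage and compute; at this scale, a laptop and the standard library are enough.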

Benefits of small and wide data:

  • There are companies that are looking to get into the data game, but don’t have the budget, the capacity, or the expertise to do it. Small and wide data helps these enterprises and small/medium businesses.
  • There are pre-trained language models that have been compiled with deep learning. You can use something like ImageNet, essentially an open repository of labeled, structured images, for zero-shot and one-shot learning.
  • It’s going to fill gaps, complement big data, and allow people who have never been able to even sniff the industry in the first place to get involved.
  • Good for use cases such as forecasting natural hazards that rarely occur, where little to no data exists, or predicting disease risk in a population without health records, or for a rare disease where there simply isn't much information.
  • The analytical processes and philosophies that can go into small and wide data are going to provide a full picture of use cases and a lot more tools to work with.
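A minimal sketch of the one-shot idea mentioned above: given pre-computed feature embeddings (the toy vectors here are invented, standing in for the output of a pre-trained model such as one trained on ImageNet), a new example can be classified from a single labeled example per class using nearest-neighbor cosine similarity. All vectors and labels below are hypothetical.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# One labeled embedding per class -- the "one shot". These toy vectors
# stand in for features a pre-trained deep model would produce.
support = {
    "cat": [0.9, 0.1, 0.2],
    "dog": [0.1, 0.8, 0.3],
}

def classify(query_embedding):
    # Assign the label whose single support example is most similar.
    return max(support, key=lambda label: cosine_similarity(support[label], query_embedding))

print(classify([0.85, 0.15, 0.25]))  # closest to the "cat" support example
```

Because the expensive representation learning was already done by the pre-trained model, the team with "small" data only needs one example per class and trivial compute, which is exactly the cost advantage described above.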

What’s ahead for the industry:

  • Analysts will be able to do a lot more with some of the existing tools.
  • AI tools will evolve.
  • There are language models that have already been trained that you can buy or even obtain for free. Because these models are pre-trained, computational and GPU usage goes down, and storage requirements are next to nothing compared to big data.
  • Databases or repositories like ImageNet have a lot that developers and researchers can use.
  • Gaps will be filled for areas where we just don’t have the large data.
  • The mindset will shift from gathering as much data as possible to finding the best data and using it efficiently, with approaches like zero-shot and one-shot learning to get the most out of it.
  • Companies, especially those without the budget and computational power, will be getting a lot more for their money and will be able to make a bigger difference in their projects.
  • Contrary to the notion that AI will replace everyone, the human element is going to increase. AI does much of the grunt work of building deep learning models, which are then available for smaller teams to adjust. People still have to show the systems where they are doing well and where they need reinforcement, and the algorithms have to be adjusted accordingly.
  • There will be a need for people who know what to do with the data they’re getting.
  • There will be a boom in demand for people who are AI-friendly.
  • People who are able to work with things like ChatGPT are going to be more efficient than they would have been beforehand.
  • People who are able to help train these models and then use them to improve their jobs are going to be the future. For example, in one medical coding study, nurses and nurse practitioners worked alongside these models to help train them.

This summary was written by Camille Gregory.