What is a Data Lake

Written by Nexlogica Team

August 19, 2022


James Dixon described the data lake:

If you think of a data mart as a store of bottled water—cleansed and packaged and structured for easy consumption—the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.

A data lake is essentially a single data repository that holds all your data until it is ready for analysis, or possibly only the data that doesn’t fit into your data warehouse. Typically, a data lake stores data in its native file format, but the data may be transformed to another format to make analysis more efficient. The goal of having a data lake is to extract business or other analytic value from the data.

Data lakes can host binary data, such as images and video, unstructured data, such as PDF documents, and semi-structured data, such as CSV and JSON files, as well as structured data, typically from relational databases. Structured data is more useful for analysis, but semi-structured data can easily be imported into a structured form. Unstructured data can often be converted to structured data using intelligent automation.
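To make the step from semi-structured to structured data concrete, here is a minimal Python sketch. The sample events, the field names, and the `flatten` and `load_rows` helpers are all made up for illustration; the sketch only assumes newline-delimited JSON records whose fields may be nested or missing.

```python
import json

# Hypothetical semi-structured records, as they might sit in a data lake.
RAW_EVENTS = """
{"user": {"id": 1, "country": "PL"}, "event": "click", "ts": "2022-08-19T10:00:00Z"}
{"user": {"id": 2}, "event": "view", "ts": "2022-08-19T10:00:05Z", "page": "/blog"}
"""

# The fixed column set of the structured (tabular) form. Assumed for this example.
COLUMNS = ["user_id", "country", "event", "ts", "page"]


def flatten(record):
    """Map one nested JSON record onto the fixed columns; missing fields become None."""
    user = record.get("user", {})
    return {
        "user_id": user.get("id"),
        "country": user.get("country"),
        "event": record.get("event"),
        "ts": record.get("ts"),
        "page": record.get("page"),
    }


def load_rows(text):
    """Parse newline-delimited JSON and return one flat row per non-empty line."""
    return [flatten(json.loads(line)) for line in text.splitlines() if line.strip()]


rows = load_rows(RAW_EVENTS)
for row in rows:
    print([row[name] for name in COLUMNS])
```

Every row ends up with the same columns in the same order, which is what a relational table or a dataframe expects. Real pipelines would typically also handle malformed lines and type conversion.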

Data lake vs data warehouse

The major differences between data lakes and data warehouses:

  • Data sources: Typical sources of data for data lakes include log files, clickstream data, social media posts, and data from internet-connected devices. Data warehouses typically store data extracted from transactional databases, line-of-business applications, and operational databases for analysis.
  • Schema strategy: The database schema for a data lake is usually applied at analysis time, which is called schema-on-read. The database schema for an enterprise data warehouse is usually designed before the data store is created and applied to the data as it is imported. This is called schema-on-write.
  • Storage infrastructure: Data warehouses often have significant amounts of expensive RAM and SSDs in order to provide query results quickly. Data lakes often use cheap spinning disks on clusters of commodity computers. Both data warehouses and data lakes use massively parallel processing (MPP) to speed up SQL queries.
  • Raw vs curated data: The data in a data warehouse is supposed to be curated to the point where the data warehouse can be treated as the “single source of truth” for an organization. Data in a data lake may or may not be curated: data lakes typically start with raw data, which can later be filtered and transformed for analysis.
  • Who uses it: Data warehouse users are usually business analysts. Data lake users are more often data scientists or data engineers, at least initially. Business analysts get access to the data once it has been curated.
  • Type of analytics: Typical analysis for data warehouses includes business intelligence, batch reporting, and visualizations. For data lakes, typical analysis includes machine learning, predictive analytics, data discovery, and data profiling.
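The schema-on-read and schema-on-write strategies from the list above can be contrasted in a small Python sketch. The `Warehouse` and `Lake` classes, the `conform` helper, and the order schema are hypothetical names invented for this example; they are toy in-memory stand-ins, not the API of any real product.

```python
import json

# Hypothetical schema: column name -> function that casts a raw value to its type.
SCHEMA = {"order_id": int, "amount": float}


def conform(record, schema):
    """Cast a record to the schema; raise ValueError on a missing or badly typed field."""
    try:
        return {name: cast(record[name]) for name, cast in schema.items()}
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"record does not match schema: {record!r}") from exc


class Warehouse:
    """Schema-on-write: the schema is fixed up front and enforced on every import."""

    def __init__(self, schema):
        self.schema = schema
        self.rows = []

    def write(self, record):
        # Bad records are rejected here, before they are stored.
        self.rows.append(conform(record, self.schema))

    def read(self):
        return list(self.rows)


class Lake:
    """Schema-on-read: raw lines are stored untouched; a schema is applied per query."""

    def __init__(self):
        self.raw = []

    def write(self, line):
        # Nothing is validated at ingest time.
        self.raw.append(line)

    def read(self, schema):
        rows = []
        for line in self.raw:
            try:
                rows.append(conform(json.loads(line), schema))
            except ValueError:
                # Unparseable or non-conforming lines are skipped for this query only.
                continue
        return rows


lake = Lake()
lake.write('{"order_id": "7", "amount": "12.5"}')
lake.write("not json at all")
print(lake.read(SCHEMA))
```

In the sketch, the lake accepts anything and pays the validation cost on each read, while the warehouse refuses non-conforming records at write time so that reads are simple. Skipping bad lines at read time is one design choice among several; a real system might quarantine or log them instead.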

Nexlogica has the expert resources to support all your technology initiatives.
We are always happy to hear from you.
