A company receives information through many channels: email, social media, sales, accounting, vendor data, and so on. Stored together in an organized way, that information makes up a database. While it stays scattered across channels, people don’t know what to do with it, so the best approach is to centralize it in one place where it can be understood as a whole.
What is Big Data?
The concept of Big Data can be better understood if we first understand certain terms like the following:
- Unstructured data: Data that has no defined structure, such as free text, images, or audio.
- Structured data: Data that is organized into a defined format, such as rows and columns, so that it is easier to understand and query.
- Database: An organized set of structured information or data. It is generally managed by database software, also called an engine, that lets us store, process, and extract information.
- Cloud computing: A set of computing services offered over the Internet by a provider (the main ones are Amazon, Google, and Microsoft). It lets us store our data in the cloud and process it there.
- Data warehouse: A central repository that holds very large amounts of information, usually gathered from many sources so it can be analyzed.
- Machine learning: Techniques that let computers learn patterns from data and use those patterns to make forecasts.
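To make the difference between unstructured and structured data concrete, here is a minimal Python sketch. The message text, the field names, and the `to_structured` helper are all hypothetical and chosen only for illustration; real unstructured data usually needs far more robust parsing.

```python
import re

# Unstructured: free text with no predefined fields (hypothetical message).
email_body = "Hi, I'm Ana Lopez. I ordered 3 units on 2024-05-10 and they arrived late."


def to_structured(text: str) -> dict:
    """Pull a few fields out of free text into a fixed, named structure."""
    quantity = re.search(r"(\d+) units", text)
    date = re.search(r"\d{4}-\d{2}-\d{2}", text)
    return {
        "quantity": int(quantity.group(1)) if quantity else None,
        "order_date": date.group(0) if date else None,
        "raw_text": text,  # keep the original message alongside the fields
    }


# Structured: the same information, now with named fields that can be queried.
record = to_structured(email_body)
print(record["quantity"], record["order_date"])
```

Once the fields have names and types, the record can be stored in a database table and compared with records from other sources.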
Big Data is a large volume of information that comes from different sources and has different structures. It also changes quickly, with new data arriving constantly, which makes it almost impossible to analyze or process by traditional means. Managing it takes a lot of computing power, and cloud computing gives us more of that power at a lower cost.
The 5 V’s of Big Data
1. Volume: The amount of information is so large that it is difficult to process by traditional means.
2. Variety: The data takes different forms and comes from multiple sources.
3. Velocity: The data changes constantly and new data keeps arriving.
4. Veracity: We need to identify which data is true and reliable, and which isn’t.
5. Value: We need to determine how important the information is for the goal the company wants to achieve.
ETL (Extract, Transform, Load) process: The three-phase process through which all this data is managed.
- Extraction: The phase in which data is extracted or captured from all sources and centralized.
- Transformation: The phase in which data is standardized, that is to say, cleaned so that it looks as if it came from a single source.
- Load: The phase in which the transformed data is loaded, or saved, into its destination, such as a database or data warehouse.
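The three ETL phases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two inline sources (`SALES_CSV` and `SOCIAL_JSON`), their field names, and the `purchases` table are hypothetical, and an in-memory SQLite database stands in for the real destination.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw inputs standing in for two different sources.
SALES_CSV = "customer,amount\n Ana LOPEZ ,100.50\nbob smith,20\n"
SOCIAL_JSON = '[{"user": "Ana Lopez", "spend": "15.25"}]'


def extract():
    """Extraction: capture the rows from every source in one place."""
    sales = list(csv.DictReader(io.StringIO(SALES_CSV)))
    social = json.loads(SOCIAL_JSON)
    return sales, social


def transform(sales, social):
    """Transformation: standardize names and types so all rows share one shape."""
    rows = []
    for r in sales:
        rows.append((r["customer"].strip().title(), float(r["amount"]), "sales"))
    for r in social:
        rows.append((r["user"].strip().title(), float(r["spend"]), "social"))
    return rows


def load(rows, conn):
    """Load: save the standardized rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases (customer TEXT, amount REAL, source TEXT)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # assumption: a throwaway in-memory database
load(transform(*extract()), conn)

# After loading, rows from both sources can be queried as if from a single source.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM purchases GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Because the transformation step normalizes the customer names, the purchases that "Ana LOPEZ" made through sales and "Ana Lopez" made through social media end up under the same customer.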
Why is data so important?
To answer this question, let’s look at the data life cycle:
- Capture: First, we enter or capture the data that comes from the different sources.
- Storage: Second, once the data has gone through the ETL process, it is organized and stored in one place.
- Data processing and analysis: Third, we look for patterns in the data, guided by the question we want to answer.
- Exploration and visualization: Finally, we present the processed information so that decision-makers can base their decisions on it.
So, data is important because it reveals users’ behaviors, trends, forecasts, and so on. With that data in hand, we can make decisions that solve social or business problems.
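As a rough illustration of the last two stages, the Python sketch below totals a handful of already stored records and prints them as a text bar chart. The records, the channel names, and the `analyze` and `visualize` helpers are hypothetical; a real project would more likely use a dedicated analysis or dashboard tool.

```python
from collections import Counter

# Hypothetical stored records: (channel, amount) pairs after capture and storage.
stored = [
    ("email", 40.0),
    ("social", 15.0),
    ("email", 60.0),
    ("sales", 120.0),
    ("social", 5.0),
]


def analyze(records):
    """Processing and analysis: total amount per channel."""
    totals = Counter()
    for channel, amount in records:
        totals[channel] += amount
    return totals


def visualize(totals, scale=10.0):
    """Exploration and visualization: one text bar per channel, largest first."""
    lines = []
    for channel, total in totals.most_common():
        bar = "#" * int(total // scale)  # one '#' per `scale` units
        lines.append(f"{channel:<8}{bar} {total:.0f}")
    return "\n".join(lines)


totals = analyze(stored)
print(visualize(totals))
```

Even in this tiny example, the chart makes it obvious at a glance which channel brings in the most, which is the kind of answer a decision-maker is looking for.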
Thank you very much for reading this far. We hope this material has been useful! If so, don’t forget to share the blog with your colleagues, like the post on social media, or leave a comment.