Understanding data formats, data types/structures, and how data types are identified in fields/values

Azia Reed

Interested in finding out more about the different types of data? A commonly cited estimate is that roughly 2.5 quintillion bytes of data are collected every day. That covers a vast variety of data types and structures, and many, many ways of collecting data depending on its use or purpose. Although that may seem overwhelming, several great learning platforms help people, just like you and me, learn this material. You may be an aspiring Data Analyst, a Data Scientist, or just a data fanatic! Whatever your reason, of the many platforms that cover this information, the Google Data Analytics Certificate via Coursera does an excellent job of putting into scope what you need to know as a Data Analyst.

Google offers the Google Data Analytics Certificate so that individuals can get started “in the high-growth field of Data Analytics” and earn a professional certificate. What I particularly love about this Certificate is that it is composed of 8 courses, each focusing on a different aspect of the data analysis tools needed to thrive in an ever-growing data market. You receive a certificate for each course you complete, and once you finish all 8 courses you receive your Google Data Analytics Certificate. And that's just the start! You also get access to career resources, with the ability to share your resume with local and national employers.

I am currently taking the Google Data Analytics Certificate myself, and I am on Course 3, Prepare Data for Exploration. I wanted to share a bit of what I am learning in this course, as it may inspire someone or give them the confidence and information they need to pursue this Certificate. I will follow the way the Google Data Analytics course introduces concepts but will not go into depth on everything covered, since the goal here is just to build familiarity and basic knowledge of the topics discussed.

Each course is broken down into weeks, with each course covering about 4-5 weeks on average. This blog covers Week 1 of Prepare Data for Exploration. This week, the topics focus on collecting data, different data formats, data types and structures, and how data types are identified in fields and values.

Collecting Data

Just as a refresher, data is defined as facts and statistics collected together for reference or analysis. When collecting data, it is important to know the various places data is stored and where it can be collected from. A few data sources include:

- First-party data: data collected by an individual or group using their own resources. For example, say you work for an ice cream company and are asked: how much ice cream does your company sell? You can look at the data your company already holds, because only your company has access to that information.
- Second-party data: data collected by a group directly from its audience and then sold. For example, let’s say you work as a TV host. Company A comes to you and says: “We want you to ask your audience: who likes pizza, and who likes cheeseburgers?” You collect that data (via a survey) and sell it to Company A.
- Third-party data: data collected from outside sources who did not collect it directly. For example, continuing the previous example, suppose Company A later passes the survey results on to another company, Company B. Company A did not collect the data itself (the TV host did), so Company B is receiving it from an outside source that did not collect it directly. For Company B, it's considered third-party data.

As you study terminology in data (and the tech field in general), you will notice that many terms come in pairs with opposite meanings. So, if you can remember the definition of one concept, it will be easier to remember the definition of its counterpart. I hope this helps; we will start to see examples of what I mean in just a bit.

Different data formats and structures

Data formats are pretty much split into two categories: Quantitative and Qualitative. To help me remember the difference between the two, I highlight the beginning of each word, since each gives a hint to its meaning: Quantitative = quantity and Qualitative = quality.

Quantitative data can be counted or measured. You can break it down even further:

- Discrete data: data that is counted and has a limited number of values. For example, money is discrete: there is no monetary value between 1 cent and 2 cents, so the possible values are limited.
- Continuous data: data that is measured and can take almost any numeric value. For example, let’s use the movie Resident Evil. If Resident Evil’s running time is 110.0356 minutes, that is continuous data, because beyond whole minutes it can be broken down into seconds and even smaller components.

Qualitative data, on the other hand, cannot be measured or counted. You can break it down even further:

- Nominal data: data that is categorized without a set order. For example, asking people “Did you like the Resident Evil movie? Yes or no?”
- Ordinal data: can you guess what this would be defined as? You guessed it! It is data that is categorized with a set order or scale. For example, asking people “On a scale of 1-5, how much did you enjoy the Resident Evil movie?”

A few more data formats discussed in the course:

- Internal data: data that lives in a company’s own systems.
- External data: data that lives or is generated outside of an organization/company.
- Structured data: data that is organized in rows and columns, such as a spreadsheet or table.
- Unstructured data: the exact opposite, data that has no easily identifiable structure.

To touch a bit more on structured data, it typically follows a data model: a model for organizing data elements and how they relate to one another. Data elements are simply the pieces of information that go into a model. For example, people’s names, addresses, bank information, etc.
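To make these categories a little more concrete, here is a minimal Python sketch of a tiny data model. It is not from the course: the `MovieReview` class, its field names, and all of the values are made up for illustration. Each field is a data element, and the comments note which kind of data it holds.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class MovieReview:
    """One record in a small, hypothetical data model."""
    viewer_name: str        # data element stored as text
    tickets_bought: int     # quantitative, discrete: counted in whole numbers
    minutes_watched: float  # quantitative, continuous: measured, can be fractional
    liked_movie: str        # qualitative, nominal: "yes" or "no", no set order
    enjoyment: int          # qualitative, ordinal: a 1-5 scale with a set order


# Made-up sample records.
reviews = [
    MovieReview("Ana", 2, 110.0356, "yes", 5),
    MovieReview("Ben", 1, 95.5, "no", 2),
    MovieReview("Cy", 3, 110.0356, "yes", 4),
]

# Discrete data is counted, so adding it up makes sense.
total_tickets = sum(r.tickets_bought for r in reviews)

# Continuous data is measured, so an average can land on any value.
average_minutes = mean(r.minutes_watched for r in reviews)

# Nominal data has no order, so we can only count the categories.
yes_count = sum(1 for r in reviews if r.liked_movie == "yes")

# Ordinal data has an order, so a "middle" rating is meaningful.
median_enjoyment = median(r.enjoyment for r in reviews)

print(total_tickets, round(average_minutes, 2), yes_count, median_enjoyment)
```

The summary chosen for each field follows from its category: sums and averages for quantitative data, counts for nominal data, and order-based summaries such as the median for ordinal data.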

How data types are identified in fields and values

There are a variety of data types that help data professionals correctly identify what data is needed for what task. A data type is a specific kind of data attribute that tells you what kind of value the data is. Most of the data types discussed are the ones found in spreadsheets, because this is a Google learning platform, but much of what you will learn is transferable to other platforms such as Microsoft Excel. The data types mentioned are:

- Boolean: a data type with only 2 possible values. For example, Yes or No, or True or False.
- Number
- Text or string: a sequence of characters and punctuation that contains textual information.

When working with data tables, it's important to remember that rows = records and columns = fields. Columns are also described by attributes: an attribute is a piece of information that determines the properties of a field or tag in a database, or of a string of characters in a display. In other words, it's the title at the top of each column in a table.
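As a rough illustration of records, fields, and data types, here is a small Python sketch. The ice cream table, the field names, and the `data_type` helper are all hypothetical; the helper simply maps Python values onto the three data types named above and is not how a spreadsheet detects types.

```python
# A tiny table: each dict is one row (a record),
# and each key is a column title (a field / attribute).
rows = [
    {"flavor": "vanilla", "scoops_sold": 120, "in_stock": True},
    {"flavor": "mint", "scoops_sold": 85, "in_stock": False},
]


def data_type(value):
    """Name the data type of a single value (illustrative helper)."""
    # Check bool first: in Python, True and False also count as integers.
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, (int, float)):
        return "Number"
    if isinstance(value, str):
        return "Text"
    return "Unknown"


# The fields are the column titles of the table.
fields = list(rows[0].keys())

# Identify each field's data type from the value in the first record.
field_types = {field: data_type(rows[0][field]) for field in fields}

for field, kind in field_types.items():
    print(f"{field}: {kind}")
```

Reading the type from the first record is a shortcut that only works because every record in this made-up table uses the same type for a given field.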

Lastly, we covered the difference between wide data and long data. Wide data is data where every data subject has a single row, with multiple columns holding the values of the subject's various attributes. Long data is the opposite: each subject has data in multiple rows. The easiest way I like to think about it is: wide data continues across the row, meaning there are fewer rows, and long data continues down the column, meaning there are fewer columns.
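The difference is easier to see with a small example. The sketch below reshapes a wide table into a long one in plain Python. The data is made up, and `wide_to_long` is a hypothetical helper written for this post, not a function from the course or from any library.

```python
# Wide data: one row per subject, one column per year (made-up values).
wide_table = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def wide_to_long(rows, id_field, name_field, value_field):
    """Turn every non-id column of each wide row into its own long row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue  # the subject identifier is repeated on every long row
            long_rows.append(
                {id_field: row[id_field], name_field: column, value_field: value}
            )
    return long_rows


# Long data: each subject now appears on multiple rows, with fewer columns.
long_table = wide_to_long(wide_table, "country", "year", "population")

for row in long_table:
    print(row)
```

The wide table has 2 rows and 3 columns, while the long table has 4 rows and 3 columns. With more years the wide table would keep growing across, while the long table would keep growing down.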

These were the main concepts covered in Week 1 of the third course in the Google Data Analytics Certificate. I hope covering these topics and breaking them down helps anyone currently enrolled in the program. I also hope that any prospective students who are considering enrolling will take advantage of this great professional certificate program.
