Exam Essentials
Describe the characteristics of relational data. Relational databases are data storage technologies that organize data into tables that can be linked through columns they have in common. Database tables store data as rows organized into a fixed set of columns. Relationships between tables allow users to easily query data from multiple tables at the same time. These databases also enforce schema integrity and ACID guarantees, which makes them a good option for storing structured data. Relational databases are also critical for analytical workloads such as data warehouses because they structure data in a way that is easy to serve to reporting applications.
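As a concrete illustration, the minimal sketch below uses Python's built-in sqlite3 module to create two related tables and query them together with a join; the table and column names are invented for the example.

```python
import sqlite3

# Minimal sketch of relational concepts using SQLite
# (table and column names are illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    amount REAL)""")
conn.execute("INSERT INTO customers VALUES (1, 'Contoso')")
conn.execute("INSERT INTO orders VALUES (100, 1, 250.0)")

# The shared customer_id column lets one query span both tables.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers c JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.name
""").fetchall()
print(rows)  # [('Contoso', 250.0)]
```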
Describe the characteristics of nonrelational data. Nonrelational data is data that requires flexibility in the way it is stored. Semi-structured data such as JSON and XML and unstructured data such as images or videos are examples of nonrelational data. NoSQL databases can store data whose schema changes constantly without forcing it to conform to a fixed structure, which allows reads and writes against these databases to potentially perform much faster than they would against a relational database. Object storage can be used for unstructured data, such as binary files, that cannot be stored in a database.
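The short sketch below shows the kind of schema flexibility semi-structured data allows: two JSON documents with different shapes stored side by side. The field names are assumptions for illustration, not tied to any particular NoSQL product.

```python
import json

# Two "documents" describing devices; each has a different shape, which a
# fixed relational schema could only accommodate with nullable columns or
# schema changes. Field names are illustrative.
docs = [
    {"id": 1, "type": "thermostat", "temperature_c": 21.5},
    {"id": 2, "type": "camera", "resolution": "1080p", "night_vision": True},
]

# Semi-structured data is typically stored and exchanged as JSON text.
payload = json.dumps(docs)
for doc in json.loads(payload):
    print(doc["id"], doc["type"])
```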
Describe batch data. Batch processing is the practice of transforming groups, or batches, of data at rest. Data involved in batch processing solutions can come from any number of data stores, including relational databases, nonrelational databases, and files in object storage. Batch processing jobs can be scheduled to run at fixed intervals or triggered by an event, such as a new transaction being added to a transactional data store. These jobs can process large amounts of data at a time and can be relied on to produce very accurate results, since they can afford significant time to complete.
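A hypothetical batch job might look like the sketch below: read all the transaction files that accumulated during the day and produce per-product totals. The file path pattern and column names are assumptions; a scheduler or a file-arrival event would trigger the function.

```python
import csv
import glob
from collections import defaultdict

def run_daily_batch(input_glob="landing/transactions_*.csv"):
    """Aggregate all accumulated transaction files into per-product totals."""
    totals = defaultdict(float)
    for path in glob.glob(input_glob):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                totals[row["product_id"]] += float(row["amount"])
    return dict(totals)

if __name__ == "__main__":
    # In practice a scheduler (or an event trigger) would invoke this.
    print(run_daily_batch())
```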
Describe streaming data. Streaming data is a continuous flow of data from a source, and stream processing solutions process it in real time as it arrives. Each piece of data in a stream is often referred to as an event or a message and typically arrives in an unstructured or semi-structured format such as JSON. Streaming data solutions begin by ingesting data from a source such as an IoT sensor into a real-time message ingestion engine that queues data for live processing, or into an object store for on-demand processing. A stream processing engine such as Azure Stream Analytics then transforms the data, typically over a time window, and writes the transformed data either to an analytical data store or directly to a dashboard. Stream processing is ideal for solutions that require real-time analytics and do not need to process large amounts of data at once.
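To make the time-window idea concrete, here is a small sketch of a tumbling-window aggregation, the kind of transformation a stream processing engine applies. The event fields and the 60-second window size are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime

def aggregate_by_window(events, window_seconds=60):
    """Sum sensor readings into fixed (tumbling) time windows."""
    windows = defaultdict(float)
    for event in events:  # each event is a small JSON-like message
        ts = datetime.fromisoformat(event["timestamp"])
        window_start = ts.timestamp() // window_seconds * window_seconds
        windows[window_start] += event["reading"]
    return dict(windows)

events = [
    {"timestamp": "2024-07-01T10:00:05", "reading": 2.0},
    {"timestamp": "2024-07-01T10:00:45", "reading": 3.0},
    {"timestamp": "2024-07-01T10:01:10", "reading": 1.5},
]
print(aggregate_by_window(events))  # two windows: 5.0 and 1.5
```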
Describe the concepts of data processing. Data processing is the methodology used to ingest raw data and transform it into one or more informative business models. Data processing solutions will either ingest data in batches or as a stream and can either store the data in its raw form or begin transforming it. Data can undergo several transformations such as being filtered, normalized, and aggregated before it is ready to be reported on. Data processing pipelines must include activities that are repeatable and flexible enough to handle a variety of scenarios.
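The filter, normalize, and aggregate steps mentioned above can be pictured as a small in-memory pipeline like the sketch below; the record shape is an assumption for the example.

```python
# Sketch of a small transformation pipeline: filter out bad records,
# normalize units, then aggregate. Record fields are illustrative.
raw_records = [
    {"store": "A", "sales_usd_cents": 125000},
    {"store": "B", "sales_usd_cents": None},   # bad record, filtered out
    {"store": "A", "sales_usd_cents": 50000},
]

filtered = [r for r in raw_records if r["sales_usd_cents"] is not None]
normalized = [{"store": r["store"], "sales_usd": r["sales_usd_cents"] / 100}
              for r in filtered]

aggregated = {}
for r in normalized:
    aggregated[r["store"]] = aggregated.get(r["store"], 0.0) + r["sales_usd"]
print(aggregated)  # {'A': 1750.0}
```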
Describe extract, transform, and load (ETL) and extract, load, and transform (ELT) processing. Extract, transform, and load (ETL) is a data processing technique that extracts data from various sources, transforms the data according to business rules, and loads it into a destination data store. The transformation takes place in a dedicated transformation engine and can involve multiple operations. Before the data is loaded into production tables, it is typically held in staging tables while the transformations run.
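The staging-then-load pattern might look like the following sketch, which transforms rows outside the production table, stages them, and then promotes them. SQLite stands in for the destination store, and all names are illustrative.

```python
import sqlite3

# ETL sketch: extract from a source, transform in the pipeline, stage,
# then load into production. Table and column names are illustrative.
source_rows = [("2024-07-01", " widget ", "19.99"), ("2024-07-01", "gadget", "5.00")]

dest = sqlite3.connect(":memory:")
dest.execute("CREATE TABLE staging_sales (sale_date TEXT, product TEXT, amount REAL)")
dest.execute("CREATE TABLE sales (sale_date TEXT, product TEXT, amount REAL)")

# Transform happens before the load: trim strings, cast types.
transformed = [(d, p.strip(), float(a)) for d, p, a in source_rows]
dest.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", transformed)

# Once validated, promote staged rows to the production table.
dest.execute("INSERT INTO sales SELECT * FROM staging_sales")
print(dest.execute("SELECT COUNT(*) FROM sales").fetchone())  # (2,)
```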
Extract, load, and transform (ELT) is similar to ETL, differing only in where the transformations occur. Instead of using a separate transformation engine, ELT workflows transform data in the target data store. Data stored as flat files in scalable object storage such as Azure Data Lake Storage Gen2 is mapped to a schema in the destination data store. This schema-on-read approach allows the destination data store to perform the necessary transformations without duplicating the data. In these scenarios, the destination data store is typically a massively parallel processing (MPP) technology such as Spark or Azure Synapse Analytics, which can process large amounts of data at a time.
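A minimal ELT sketch, assuming a local PySpark installation, is shown below. The "raw/orders/*.csv" and "curated/daily_sales" paths are placeholders for locations in a data lake such as Azure Data Lake Storage Gen2.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("elt-sketch").getOrCreate()

# Extract + Load: raw files land in object storage unchanged.
raw = spark.read.option("header", "true").csv("raw/orders/*.csv")

# Transform: schema-on-read -- structure is applied inside the MPP engine
# at query time rather than in a separate transformation tool.
raw.createOrReplaceTempView("orders_raw")
daily = spark.sql("""
    SELECT order_date, SUM(CAST(amount AS DOUBLE)) AS total_sales
    FROM orders_raw
    GROUP BY order_date
""")
daily.write.mode("overwrite").parquet("curated/daily_sales")
```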
Describe how analytics tell the story of a business’s past, present, and future. The maturity of an organization’s data analytics journey can be summarized by how well it is able to implement each category of analytics. From easiest to hardest, the analytics categories are descriptive, diagnostic, predictive, prescriptive, and cognitive. Descriptive analytics answer questions about what has happened, diagnostic analytics answer questions about why things happened, predictive analytics answer questions about what will happen, prescriptive analytics answer questions about what actions should be taken to achieve a goal, and cognitive analytics draw inferences from existing data and patterns.
Describe data visualization techniques. Data visualization techniques can be broken down into three core categories: analytical, reporting, and dashboarding. Analytical tools let users access and manipulate very granular levels of data; data scientists can use them to create highly customized visualizations that display insights across several different scenarios. Reporting tools give analysts the ability to organize data into informational summaries that monitor how different areas of the organization are performing. Reports built with these tools can be either static or dynamic, depending on how interactive analysts want them to be. Dashboarding tools provide decision makers with quick overviews of the most relevant visuals, empowering them to act quickly on opportunities or threats as they arise.
Choosing the right type of visual to display information is critical to the success of a data analytics solution. Poorly chosen visualizations can be hard to interpret or, worse, convey the wrong message. The design of each visual is just as important: it is not enough to choose the correct chart or graph for the job; analysts must also keep the sizing and color scheme of their visuals consistent. This keeps end users focused on the insights being displayed rather than overwhelmed by clashing color patterns and inconsistent sizing.