Data Minimization: The New Challenge for Data Architectures

Netflixing your data: What is data minimization, what influence does it have on data architecture, and how does data virtualization enable you to reduce redundant data?

24 February 2022 (9-17h CEST)
Location: Live Online Event (@YOUR DIGITAL WORKPLACE)
Presented in English by Rick van der Lans
Price: 640 EUR (excl. 21% VAT)

This event is history, please check out the NEXT SESSION

Learning Objectives

Why do we Organise this One-day Seminar on Data Minimization?

Data minimization must be a dominant guiding principle for any new data architecture. When data minimization is applied, architects strive to reduce the number of data copies.

Studies indicate that quintillions of bytes of data are produced every day. However, most of it is not new or original data, but copied data. For example, data about a specific customer can be stored in a transactional system, a staging area, a data warehouse, several data marts, and a data lake. Even within one database, data can be stored several times to support different data consumers. Additionally, redundant copies of the data are stored in development and test environments. Business users copy data as well, for example from central databases to private files and spreadsheets. The growth of data within organizations is enormous, but a large part of it consists of non-unique, redundant data. Data infrastructures are currently made up of data lakes, data hubs, data warehouses, and data marts, and all these systems contain overlapping data. Finally, organizations exchange data with each other. Often, the receiving organizations store the data in their own systems, resulting in even more copies of the data.

And now, with new regulations concerning data privacy, such as GDPR, architects must be even more careful with storing copies of personally identifiable data. The more copies exist, the harder it becomes to comply with these regulations.

In the past, there were valid reasons to create data copies. But database server performance, cloud technology, and network speed have improved enormously, making such copies unnecessary in most cases. Unfortunately, new data architectures are still being designed in which data is stored redundantly. We think too casually about copying data, and this unrestrained duplication must stop. Copying data has many drawbacks and challenges:

Higher data latency
Missed opportunities
Complex data synchronization
More complex data security
More complex data privacy
Higher development costs
Higher maintenance costs
Higher technology costs
More complex database administration
More complex metadata administration
Reduced data quality

This is where data minimization comes in. It is a recommended design principle for any new data architecture. Data minimization rests on two pillars. Firstly, data-on-demand is preferable to data-by-delivery: users must be able to access data when they require it, without having to create copies of the data first. Secondly, accessing original data is preferable to accessing copied data.
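The difference between the two pillars and a copy-based approach can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the approach taught in the seminar: the table and view names (customers, mart_customers, v_customers) and the function name are hypothetical, and a plain SQL view in an in-memory SQLite database stands in for a data virtualization layer.

```python
import sqlite3


def compare_copy_and_view():
    """Return (value seen via a physical copy, value seen via a view)
    after the original data has changed.

    All table, view, and column names here are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # The original data, as it might live in a transactional system.
    cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT)")
    cur.execute("INSERT INTO customers VALUES (1, 'Antwerp')")

    # Data-by-delivery: a physical copy, as in a data mart load.
    cur.execute("CREATE TABLE mart_customers AS SELECT id, city FROM customers")

    # Data-on-demand: a view that reads the original data at query time.
    cur.execute("CREATE VIEW v_customers AS SELECT id, city FROM customers")

    # The original data changes after the copy was made.
    cur.execute("UPDATE customers SET city = 'Ghent' WHERE id = 1")

    copied = cur.execute(
        "SELECT city FROM mart_customers WHERE id = 1"
    ).fetchone()[0]
    on_demand = cur.execute(
        "SELECT city FROM v_customers WHERE id = 1"
    ).fetchone()[0]

    conn.close()
    return copied, on_demand


if __name__ == "__main__":
    copied, on_demand = compare_copy_and_view()
    print("copy:", copied)         # copy: Antwerp (stale until re-synchronized)
    print("on demand:", on_demand)  # on demand: Ghent (reads the original)
```

The copy keeps returning the old value until someone synchronizes it, which is the data latency and synchronization drawback listed above. The view stores no data of its own, so there is nothing to keep in sync.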

During this seminar, Rick van der Lans explains how you can work towards a data-on-demand architecture and with which solutions and technologies this becomes a reality. He will discuss, among other things, what data minimization is, what influence it has on data architectures, and how data virtualization enables you to reduce redundant data.

This masterclass is a LiveOnline event. This means that there will be an expert speaker in a virtual meeting room along with you and other highly interested participants. We will make your learning experience as immersive and interactive as we have done for the past 25+ years, but now in a live, online environment. Besides satisfying your appetite for knowledge and answering your questions, we will stimulate the interactivity between the speaker and the participants, and between participants.

Learning Objectives: What will you learn in this masterclass?

This masterclass has various modules:

How the design principle called data minimization is related to simpler data architectures
What the two pillars of data minimization mean: data-on-demand and accessing original data
What the real drawbacks of creating too many copies of the data are, including higher data latency, complex data synchronization, more complex data security and privacy, and higher development and maintenance costs
How new database, integration, and cloud technology can help to design simpler data architectures that contain less copied data
What the effect is of applying data minimization to data warehouse and data lake architectures
How managed-file-transfer can be replaced by data-on-demand, and how the number of data flows between organizations can be reduced
How data architectures should be designed from the perspective of data processing specifications and not data stores

Who should attend this masterclass?

This one-day masterclass is aimed at everyone who works with data storage, architecture, and processing in their organisation and who is dreaming of a data-on-demand architecture.

Your role can range from Chief Data Officer (CDO) and technology planner to ICT and enterprise architect, and from data analyst, data warehouse designer, data architect, and solution architect to data engineer, data scientist, and data consultant.

Full Programme

Presented by Rick van der Lans

8.45h - 9.00h

We welcome the participants in our digital lobby (Zoom waiting room)

9.00h

Introduction to this masterclass

1. Introduction

What is data minimization?
The influence of data minimization on data architectures
Pillars of data minimization
From data-by-delivery to data-on-demand
From copied data to original data
Reasons why data minimization is important
Risks of unrestrained copying and repeated storage of data
The business advantages of data minimization

2. New technologies can simplify data architectures

Analytical SQL database servers and their distributed, share-based architecture
Translytical database servers: combining transactions and analysis
Cloud technology offers the required stability and centralisation of data
Data virtualization enables reduction of redundant data
Messaging technology

3. Applying data minimization to current data architectures

From traditional data warehouse architectures to logical data warehouse architectures
From physical data lake with zones and tiers to virtual data lakes
From data lakehouses to logical data lakehouses
From data fabrics to logical data fabrics
How to transform a managed-file-transfer solution used between organizations to a query-based architecture?
Keeping track of data history only once
The impact of data minimization on data privacy

12.45h - 13.45h

Break for (Bring your own) Lunch

4. Data track diagrams for designing data architectures

What are data track diagrams?
Designing a data architecture based on data processing specifications
From data track diagrams to data minimization
Do not design from a database-centric point of view

5. From data-by-delivery to data-on-demand

Disadvantages of data exchange using files (data-by-mail)
Advantages of data-on-demand
Accessing geographically dispersed data sources
What can we learn from Netflix?

6. Closing Remarks

General recommendations for implementing data minimization
'Youtubing' your data

17.00h

End of this Masterclass, Opportunity to continue the conversation and Q&A until 17.30h

Speakers

Rick van der Lans (R20/Consultancy BV)

Rick van der Lans is a highly respected independent analyst, consultant, author, and internationally acclaimed lecturer specializing in data architectures, data warehousing, business intelligence, big data, and database technology. In 2018 he was selected the sixth most influential BI analyst worldwide by onalytica.com.

He has presented countless seminars, webinars, and keynotes at industry-leading conferences. For many years, he has served as the chairman of the annual Data Warehousing and Business Intelligence Summit in The Netherlands.

Rick helps clients worldwide to design their data warehouse, big data, and business intelligence architectures and solutions, and assists them with selecting the right products. He has been influential in introducing the new logical data warehouse architecture worldwide, which helps organisations to develop more agile business intelligence systems.

Over the years, Rick has written hundreds of articles and blogs for newspapers and websites and has authored many educational and popular white papers for a long list of vendors. He was the author of the first available book on SQL, entitled Introduction to SQL, which has been translated into several languages with more than 100,000 copies sold. Recently published books are Data Virtualization for Business Intelligence Systems and Data Virtualization: Selected Writings.

He presents seminars, keynotes, and in-house sessions on data architectures, big data and analytics, data virtualization, the logical data warehouse, data warehousing and business intelligence.
