Knowledge Graphs and Data Governance
When I first heard about knowledge graphs within Data Governance, I found the concept really hard to grasp, and it felt like stepping into uncharted territory. I think the difficulty was understanding how the abstract idea of knowledge graphs could translate into real-world benefits in the work we do with data in Data Governance specifically. Now, after some great discussions with Ed Mathia (one of my Guest Coaches, who is an expert on this topic), I can safely say I am in a much better place to talk about the importance of knowledge graphs in Data Governance - and I think it’s a really important topic for others working in Data Governance to grasp too.
So, having been inspired by this topic cropping up in one of my regular monthly sessions with my associates and expert guest coaches, let’s now have a closer look at knowledge graphs and Data Governance in this blog.
What is a knowledge graph?
Generally, a knowledge graph is a knowledge base (facts about the world) that is stored in a graph structure (not a table) so that computers can manipulate data based on its meaning. It is a powerful tool for organising and representing data, focusing on how different data points are connected. It allows users to easily visualise relationships and hierarchies within data, offering a more interconnected and insightful view of information.
However, there are two more specific meanings of graphs:
The first meaning of a graph refers to the underlying data structure, where the emphasis is on how data points are related. This version is often used in business contexts, where people rely on graphs to make sense of interconnected data. For example, a metadata graph (like a Data Catalog) can show how different data tables are connected or how one system feeds into another.
The second meaning, called a knowledge graph, was introduced by Tim Berners-Lee in 2009. A knowledge graph refers to the more advanced idea of the semantic web, where the meaning of the data is documented in a way that computers can “understand” and use. Tim Berners-Lee did this using something called RDF triples. Instead of regular text, information is set up as subject-predicate-object statements. For example, "Airplane X (subject) uses (predicate) Engine Y (object)." This format helps machines understand and work with the relationships between different pieces of data, and is very efficient. Let’s take a look at how this works.
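To make the subject-predicate-object idea a little more concrete, here is a minimal Python sketch. It is not RDF itself (real RDF uses URIs and dedicated tooling), just plain tuples that follow the same shape. Apart from "Airplane X uses Engine Y", every fact, name and helper below (Airline Z, Manufacturer W, the match function) is a hypothetical example of mine.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


# Hypothetical facts. Only the first one comes from the example in the text.
triples = [
    Triple("Airplane X", "uses", "Engine Y"),
    Triple("Airplane X", "operated_by", "Airline Z"),
    Triple("Engine Y", "manufactured_by", "Manufacturer W"),
]


def match(facts, subject=None, predicate=None, obj=None):
    """Return every triple that fits the parts given (None acts as a wildcard)."""
    return [
        t for t in facts
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "Which engines does Airplane X use?"
engines = [t.obj for t in match(triples, subject="Airplane X", predicate="uses")]
print(engines)
```

Because every fact has the same three-part shape, one small matching function can answer many different questions without knowing anything about aircraft in advance.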
In a knowledge graph, things like people, products, or places are called "nodes" or "classes," and the connections between them (like relationships between people or links between products and locations) are called "edges." These edges show how different things are connected, making the graph a useful tool for representing real-world relationships. Knowledge graphs are popular because they make it easier to understand and manage large amounts of data. Look at the image below as an example.
The top part is a table that shows two people with their occupation, school and spouse. But when we get to Einstein’s spouse we have a problem: he had two spouses and there is only room for one. We would have to change the table to add a second spouse column, or extract the spouse column into a new table. With the knowledge graph below, we don’t have to make big changes to the database; we just add another node and users will get both spouses when they search. This is a (very simplified) version of the Google Knowledge Graph. When I searched for Albert Einstein, I saw a page with information about his birth, death and spouses, and it suggested Marie Curie as someone I might be interested in because they are connected on the graph through the ‘scientist’ node (your results may vary). The Google Knowledge Graph enhances regular search because it allows Google to provide useful data based on the meaning of the data, not just special search terms.
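The table-versus-graph point can be sketched in a few lines of Python. This is only an illustration using nested dictionaries, not how Google or any graph database actually stores data, and the function names (add_fact, related) are my own assumptions.

```python
from collections import defaultdict

# graph[subject][predicate] is the set of objects linked by that predicate
graph = defaultdict(lambda: defaultdict(set))


def add_fact(subject, predicate, obj):
    """Add one edge to the graph."""
    graph[subject][predicate].add(obj)


add_fact("Albert Einstein", "occupation", "scientist")
add_fact("Albert Einstein", "spouse", "Mileva Marić")
add_fact("Marie Curie", "occupation", "scientist")
add_fact("Marie Curie", "spouse", "Pierre Curie")

# A second spouse is just one more edge: no new column, no new table.
add_fact("Albert Einstein", "spouse", "Elsa Einstein")

spouses = sorted(graph["Albert Einstein"]["spouse"])


def related(person, predicate):
    """Other subjects that share at least one object for this predicate."""
    targets = graph[person][predicate]
    return sorted(
        other for other, preds in graph.items()
        if other != person and preds.get(predicate, set()) & targets
    )


print(spouses)
print(related("Albert Einstein", "occupation"))
```

The related function is a very rough stand-in for the "you might also be interested in" suggestion: two people are linked because they both point at the same ‘scientist’ node.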
Knowledge graph use cases and Data Governance
Graphs are being used across many industries to improve data management. Some general examples include:
Retail: Graphs are used for product recommendations and upselling, tailoring suggestions based on customer preferences and purchase history.
Finance: In the financial industry, they help with anti-money laundering (AML) efforts and Know Your Customer (KYC) procedures by uncovering relationships between accounts and transactions.
Healthcare: Knowledge graphs aid medical research and improve diagnoses by connecting disparate medical data points, offering a big picture view of patient information or drug interactions.
Entertainment: Streaming platforms and media services use knowledge graphs to power recommendation engines, suggesting content based on user behaviour, preferences and connections to other media. For example, in a film knowledge graph, you could explore connections between actors and the movies they worked on together.
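As a small illustration of the film example, the Python sketch below finds which films pairs of actors have worked on together. The actors, films and variable names are all hypothetical placeholders, and a real recommendation engine would be far more involved.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical "acted in" edges between actor nodes and film nodes
acted_in = [
    ("Actor A", "Film 1"),
    ("Actor B", "Film 1"),
    ("Actor A", "Film 2"),
    ("Actor B", "Film 2"),
    ("Actor C", "Film 2"),
]

# Group actors by film
cast = defaultdict(set)
for actor, film in acted_in:
    cast[film].add(actor)

# For every pair of actors in the same cast, record the film they share
shared_films = defaultdict(set)
for film, actors in cast.items():
    for first, second in combinations(sorted(actors), 2):
        shared_films[(first, second)].add(film)

print(sorted(shared_films[("Actor A", "Actor B")]))
```

Pairs are stored in alphabetical order, so look them up as ("Actor A", "Actor B") rather than the other way round.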
Knowledge graphs offer a more flexible way to visualise data compared to static lists or tables. They help identify patterns, especially in fields like graph data science and machine learning. For example, in drug discovery, pharmaceutical companies use knowledge graphs to show connections between different molecules. By studying patterns from current antibiotics, graph machine learning models can find or predict new drugs with similar or better properties.
In Data Governance, knowledge graphs help organisations manage their data by showing how different datasets are related. They are an excellent choice for Data Catalogues since they make it easier to organise data, follow rules and ensure compliance. They give a clear view of how data sources interact, making it simpler to track where data comes from and automate compliance tasks. We’ll explore this more later in the blog.
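Tracking where data comes from is essentially a walk backwards along the edges of the graph. Here is a minimal Python sketch of that idea, assuming a made-up set of "feeds" relationships. The system and table names are hypothetical, and a real Data Catalogue would hold this metadata for you.

```python
from collections import deque

# Hypothetical "feeds" edges: each source maps to the things it feeds into
feeds = {
    "crm_system": ["customer_table"],
    "billing_system": ["invoice_table"],
    "customer_table": ["sales_report"],
    "invoice_table": ["sales_report"],
    "sales_report": ["board_dashboard"],
}


def upstream(target):
    """Return everything that feeds the target, directly or indirectly."""
    # Reverse the edges so we can walk from a target back to its sources
    parents = {}
    for source, destinations in feeds.items():
        for destination in destinations:
            parents.setdefault(destination, []).append(source)

    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(upstream("board_dashboard")))
```

Asking the same question of a set of unconnected tables would mean checking each one by hand, which is why the graph view is so useful for lineage.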
Benefits of using knowledge graphs in Data Governance
While knowledge graphs started out in specific industries, they are now used widely across many different fields. So, as is hopefully becoming clear, knowledge graphs offer a powerful way to manage, integrate and understand data, transforming how businesses approach Data Governance. By providing a structured yet flexible framework, knowledge graphs not only make data more accessible but also improve the ability to query and navigate complex relationships between different data entities.
Here are some of the more specific benefits of using knowledge graphs in Data Governance:
Understanding and Managing Data: Knowledge graphs give a complete picture of a company's data. They make it easier to see what data the company has, where it's stored, how it's shared and who is using it.
Integration of Multiple Sources: One of the main benefits of knowledge graphs is that they can combine data from different places. By linking data from various sources, companies can get a complete view of their information. This is really helpful for businesses with complex data, like aircraft manufacturers, where it’s important to understand how aircraft models, engines, and airlines are connected for the business to run smoothly each day.
Flexibility and Scalability: Unlike traditional databases that use fixed formats, knowledge graphs are flexible and can show connections between different types of data without needing a set structure. This flexibility makes it easier for organisations to make sense of large amounts of data.
Why you need Data Governance for knowledge graphs
While there are many benefits of knowledge graphs for Data Governance, it actually works both ways in that knowledge graphs also need the support of a strong Data Governance initiative to work well.
Without proper governance, there’s a risk of connecting wrong or misleading data, which can ruin the value of the whole knowledge graph. If the connections between data points are incorrect, the insights you get from the data can be wrong. Simply put, Data Governance and knowledge graphs work together: good governance keeps the knowledge graph accurate, and the knowledge graph helps you see how data is connected, making it easier to keep data clean, understood and well managed.
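One simple way governance can protect a knowledge graph is by checking each proposed connection against agreed rules before it is added. The Python sketch below shows the idea under my own assumptions: the node types, the allowed relationships and the check function are all hypothetical, and real tooling would typically use a formal schema or ontology instead.

```python
# Hypothetical governed vocabulary: which type each known node has
node_types = {
    "Airplane X": "Aircraft",
    "Engine Y": "Engine",
    "Airline Z": "Airline",
}

# Hypothetical rules: (subject type, predicate, object type) combinations allowed
allowed = {
    ("Aircraft", "uses", "Engine"),
    ("Airline", "operates", "Aircraft"),
}

candidate_edges = [
    ("Airplane X", "uses", "Engine Y"),
    ("Airline Z", "operates", "Airplane X"),
    ("Engine Y", "operates", "Airline Z"),  # wrong direction and wrong types
    ("Airplane X", "uses", "Engine Q"),     # Engine Q is not a known node
]


def check(edge):
    """Return a reason for rejecting the edge, or None if it passes."""
    subject, predicate, obj = edge
    if subject not in node_types or obj not in node_types:
        return "unknown node"
    if (node_types[subject], predicate, node_types[obj]) not in allowed:
        return "relationship not allowed"
    return None


accepted = [edge for edge in candidate_edges if check(edge) is None]
rejected = {edge: check(edge) for edge in candidate_edges if check(edge) is not None}

print(accepted)
print(rejected)
```

The rejected edges are kept along with a reason, so a data steward can review them instead of the bad connections quietly ending up in the graph.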
How knowledge graphs work in Data Governance
So, knowledge graphs play a crucial role in Data Governance by structuring data in a way that enhances efficiency. As we touched on at the start of the blog, at the core of a knowledge graph are RDF triples, which represent data in a machine-readable format. This structure is very supportive of Data Governance functions because it helps computers understand and process relationships between data points.
What's even better is that knowledge graphs are getting smarter with the help of artificial intelligence (AI). AI helps machines understand text better, find new connections and adjust to new information. This makes knowledge graphs perfect for situations where data from different sources needs to be analysed and shown based on what users are looking for. By clearly showing how data is related, knowledge graphs make it easier to check and improve data processes, supporting better Data Governance across the organisation.
It’s all about chatting and finding out
I, for one, am very glad that I now understand the basics of knowledge graphs in Data Governance. I feel it's something valuable for anyone involved in managing data to know, and I want to give a big thank you to Ed (connect with him on LinkedIn here) for his support with understanding this topic. In case you are wondering, he kindly agreed to review this blog to make sure that I’m not getting the message wrong!
And don’t forget - if you are a member of my DG Launch Pad or coaching programmes, you can schedule a coaching call with an expert guest coach. These personalised sessions offer a great opportunity to dig deeper, share ideas and learn from industry experts. This blog post is a perfect example of how our understanding of a topic improves through these discussions. So, if you're a client, reach out to schedule your next session. I'd love to see you in one soon!