The Rocky Horror Data Show: AI

Data shouldn’t be a wild and untamed thing, but sometimes it is just that - wild… and untamed. And unfortunately for our friend Tim, he’s about to find out just how wild and untamed data can be. This is ‘The Rocky Horror Data Show’… where the data is not what it seems.


Things at the Magical Wish Factory have been ticking along nicely since we last checked in with our Data Governance heroes, Janet and Tim – they’ve built and are in the process of implementing a Data Governance framework that’s had great senior stakeholder buy-in and things seem to be going well.

But, as with all things at the Magical Wish Factory, it’s not long before someone comes along to upset the apple cart – it wouldn’t be a Horror Data Show if not, eh?

Just when Tim thinks he’s finally got a handle on all the data the MWF produces, he’s called into a meeting with the board who drop a bombshell… they want to use AI to speed up the processing and granting of wishes!

Tim isn’t surprised the higher-ups are pushing for this move – trying to run before they could walk was what got the Magical Wish Factory into the data disaster that Tim has spent most of his time here untangling. There’s also a huge buzz around the technology just now, but is it right for every organisation? And, more specifically, is it right for the MWF? Tim doesn’t think so. In fact, he can see some major red flags – not least the fact that, although Data Governance is in full swing, it’s not yet established enough to introduce such new technologies into the mix.

Tim tries to explain: “It’s no surprise that there's a rush of companies trying to adopt this technology, Janet, but there’s only so much resource to go around, and that means that for some, the question of what should come first, AI or Data Governance, can be a little like the chicken and egg debate…”

Head of IT Janet picks up: “Artificial intelligence works by mimicking human processes by ingesting large amounts of ‘training’ data and analysing it for correlations and patterns and using these patterns to make predictions about future states.

“For example, a chatbot - the kind you may encounter on an online retailer’s website or in place of technical support - is fed examples of text chats and can learn to produce lifelike exchanges with people and provide help and assistance, or an image recognition tool can learn to identify and describe objects in images by reviewing millions of examples.

“All AI starts out as a program or algorithm written and taught by a highly skilled programmer. The learning aspect of AI programming focuses on acquiring data and creating rules for how the AI will turn the data into actionable information. These rules, which are called algorithms, provide computing devices with step-by-step instructions for how to complete a specific task.”

The reasoning aspect of AI programming focuses on choosing the right algorithm to reach your desired outcome and the self-correction process is designed to continually fine-tune algorithms and ensure they provide the most accurate results possible.

So, if you've got missing or inaccurate data, that data can and will guide these exciting technologies, which your organisation has spent a fortune on, in the wrong direction. They will make the wrong decisions, and the consequences could be costly and maybe even disastrous.

That all means one thing: AI needs the right data in order to learn. And when you’re halfway through a Data Governance initiative that is really not the right time to be introducing AI – the data simply isn’t ready for it.

For the Magical Wish Factory, that means the correct wishes may not get granted if AI decides on the wrong criteria against which to assess them, and some people may get wishes they do not deserve or more than their quota.

So, what will the senior stakeholders at the Magical Wish Factory decide to do…

Stay tuned for episode seven of The Data Governance Coach’s series ‘The Rocky Horror Data Show’ and follow the adventures of Tim and Janet as they try to implement a successful data governance initiative at the Magical Wish Factory.


How to identify Data Owners, where multiple areas of the organisation use the same data?

Identifying Data Owners is something that many people struggle with when they are starting out in Data Governance. When multiple different teams use that data it can often confuse matters, which is why I think this is a good question to answer.

It is something I come across an awful lot when helping my clients do data governance, and I'm sure you will too, because unless the data is only used by one area, it's often not clear-cut who the data owners should be.

If you have read any of my other articles about data ownership, you will know that, in my experience, it really is important that you only have one data owner per data set. If you have any more, you end up with a situation where a number of people get together and discuss and debate the data, but they rarely come to any conclusions and they just totally derail your data governance initiative. Simply, it doesn't work.

I believe very strongly that you should have just one data owner per data set and, yes, it can be challenging if you have multiple people using the same data and even more challenging when they all want to own the data but there are a number of different ways of dealing with this.

My preferred way is to see if I can break down the data and identify different chunks of it that can be split across multiple data owners. For example, for one organisation, which was an insurer, we had a big debate over who owned the customer data and the head of underwriting believed quite strongly that they owned it, but the head of marketing also believed that they owned customer data and when we sat and talked to them we actually agreed that they owned different subsets of it.

We broke it down so that customer risk details were owned by the underwriting area and customer contact details were owned by the marketing team, and that worked very well… for a few months.

They got on and did everything we asked them to do as part of doing data governance until we got to the day when somebody reported a data quality issue with postcode data, and my heart sank because I had this horrible feeling that I knew what was going to happen.

I was right: when we asked them who owned the postcode, they both believed it was them.

Interestingly, because we had split the data out, they’d both been doing the role and got their head around it for a few months. We got them back together and discussed it, and at that point, the head of marketing said “You know what, I actually don't think I am the data owner. I think I'm a key consumer of the data and I want my needs and requirements of that data to be considered, but I don't think I am the right person to be the data owner.” We then put all customer details back into one data set and had one data owner.

The moral of that story is… splitting it sometimes works, and sometimes it doesn't.

Simply, you have to be flexible and understand that you may need to change it again further down the line, but it doesn't stop you from trying.

The other way of doing it, if splitting the data into subsets isn't an option, is to look at which area or team really dictates the standards by which that data is captured. Do you have somebody who is setting the rules and saying this is how we do this? If so, they should be the data owner.

If you've got a number of other teams elsewhere in your organisation that are allowed to set the rules around that data (and that is a very rare circumstance) I sometimes come up with a two-level data owner model, but I prefer to use that as an absolute last resort because adding any complexity to your data governance framework makes it harder to embed and harder to make it successful.

Don't forget if you have any questions you’d like covered in future videos or blogs please email me - questions@nicolaaskham.com.

Or if you would like to know more about how I can help you and your organisation then please book a call using the button below.


Data Governance Interview with ChatGPT


Back in 2021 I wrote this blog about why you need to have Data Governance in place before your organisation embraces Artificial Intelligence (AI).

There have been so many advances in easily accessible AI technologies recently that I am getting lots of questions about why you need to worry about Data Governance first. So I started to draft a blog to address this and then decided that perhaps it would be interesting to ask a well-known AI tool, ChatGPT, for its thoughts on the matter. Please read on to discover our interesting discussion on the topic:

Please introduce yourself and explain why you are interested in Data Governance

Hello! I am ChatGPT, a large language model trained by OpenAI. I am interested in a wide range of topics, including data governance, because of the crucial role that data plays in our society and economy today.

Data is increasingly being recognized as a valuable asset, and organizations are investing heavily in collecting, storing, and analyzing data to gain insights and make better decisions. However, as the amount of data being generated grows exponentially, so do the challenges of managing, securing, and utilizing this data effectively.

Data governance provides a framework for addressing these challenges and ensuring that data is managed in a way that maximizes its value while minimizing risks. I am fascinated by the various aspects of data governance, including data quality, compliance, and ethics, and how they intersect with various industries and domains.

Furthermore, data governance is crucial for the development and deployment of AI, which is becoming increasingly prevalent in our lives. As an AI language model, I understand the importance of data governance in ensuring that AI models are developed and deployed ethically, responsibly, and with a focus on fairness and accountability.

In conclusion, I am interested in data governance because of its importance in managing data effectively, promoting ethical and responsible AI development, and ensuring that data is used in a way that benefits individuals, organizations, and society as a whole.

What are your thoughts on the value of Data Governance to an organisation?

As an AI language model, I can tell you that Data Governance is an essential practice for any organization that wants to effectively manage its data assets. Data governance is the process of managing the availability, usability, and integrity of the data used by an organization.

Here are some of the ways in which data governance can add value to an organization:

Improved Data Quality: With proper data governance, an organization can ensure that its data is accurate, complete, and consistent. This, in turn, can help to improve the quality of decision-making and enable the organization to achieve its goals more effectively.

Compliance: Data governance can help an organization to comply with data-related regulations and laws such as the GDPR or HIPAA. By establishing policies and procedures for data collection, usage, and retention, an organization can avoid legal and financial penalties associated with non-compliance.

Cost Reduction: Proper data governance can help to reduce the costs associated with managing data by eliminating duplicate or unnecessary data, ensuring that data is stored in the most appropriate and cost-effective way, and optimizing data management processes.

Overall, data governance is a critical practice that can help organizations to effectively manage their data assets, improve decision-making, comply with regulations, and reduce costs.

Is Data Governance important for AI?

Yes, data governance is extremely important for AI. This is because AI algorithms rely heavily on data to learn and make predictions. If the data used to train an AI model is inaccurate, incomplete, biased, or inconsistent, the model's predictions and recommendations can be unreliable or even harmful.

Proper data governance is therefore essential to ensure that the data used to train AI models is of high quality, accurate, and representative of the real world. This involves establishing policies and procedures for data collection, storage, and usage.

Additionally, data governance can help to address the issue of algorithmic bias in AI. Bias can be introduced into an AI model if the data used to train it is biased or if the model's algorithms are designed with inherent biases. Data governance can help to mitigate these issues by ensuring that data is diverse, representative, and unbiased, and by establishing ethical guidelines for AI development and deployment.

In summary, data governance is critical for AI as it ensures the quality and reliability of the data used to train AI models, reduces the risk of algorithmic bias, and promotes ethical AI development and deployment.

Thank you for taking the time to answer my questions

You're welcome! It was my pleasure to assist you. If you have any more questions in the future, don't hesitate to ask!



What is a Data Office?


The answer to this question is going to be quite short and succinct, much like the question itself… A ‘Data Office’, in my opinion, is just another term for having a central team that looks after Data Governance. 

The vast majority of clients that I have worked with over the years would probably call this a Data Governance team or a data quality team, but I do know a number of organisations who would call that a Data Office.

In my experience, a Data Office perhaps has a slightly wider scope than a traditional Data Governance team. It is the team that supports Data Governance and Data Quality within the organisation, but it may have a wider remit and also support Master Data Management and other data management activities.

So, it is perhaps a term that is not a direct comparison to a Data Governance team, but it's very similar.

If you have a central Data Governance team and you hear people talking about a Data Office, you're not lacking in something. As long as you've got a central support for your Data Governance Framework, it doesn't really matter what you call it.

But I find this is a good example of something that we do an awful lot as Data Governance professionals… we spend the majority of our days talking to business users and asking them to draft definitions for the data that they use, but at the same time, we use terminology that we don't always define or perhaps we use terminology that we don't use consistently and I think this is something that we definitely need to improve upon.

So, if you have a Data Office, fabulous. Make sure you explain what that means when you're sharing it with the business, and if you have a Data Governance team, likewise: explain what that means to the business and use that definition consistently.


Tiankai Feng - Data Governance Interview

Tiankai has been working in data all his career with experience in Data Analytics, Data Science and most recently, Data Governance. He’s most passionate about the “human” part of data – collaboration, communication & creativity – and uses entertaining unconventional ways to make data more fun, for example through his music.

How long have you been working in Data Governance?

I switched from Data Analytics to Data Governance in January 2022, leading a Data Governance team dedicated to Product Data as a domain.

Some people view Data Governance as an unusual career choice, would you mind sharing how you got into this area of work?

The hiring manager who recruited me for Data Governance explained to me that this role needs somebody who makes the area more visible and better understood by others – and I really liked that challenge.

My learning curve in my previous role was already no longer as steep as it had been, so this change was very welcome and I was excited to “switch sides”.

What characteristics do you have that make you successful at Data Governance and why?

I think I have three key characteristics that make me a different kind of Data Governance leader: people focus, communication and transparency.

People focus – I don’t only make sure all data governance team members are motivated by working on things they enjoy, I also try to make sure cross-functional collaboration is based on intrinsic motivation, which has to be built on relationships, not just a “transactional” view of business.

Communication – I like to communicate and get really creative in my communication, which Data Governance especially could benefit a lot from. Whether it’s a rap song about data governance, launching an internal talk show to upskill the wider organization or making a “data heroes” campaign to promote data ownership and data stewardship, I keep surprising people with Data Governance content and thereby make it memorable.

Transparency – Data Governance is all about transparency, but seeing small progress in Data Governance as wins is also part of it. Combining a “making progress” mindset with high transparency means showing impact continuously, thereby changing perception of Data Governance into being a value driver, not just a cost center.

Are there any particular books or resources that you would recommend as useful support for those starting out in Data Governance?

“Disrupting Data Governance” by Laura Madsen is a great book to have impulses for a new way of doing Data Governance. It especially shows the contrast of “old world” information vs “new world” data and how Data Governance has to be adapted. It inspired my Data Governance a lot.

What is the biggest challenge you have ever faced in a Data Governance implementation?

Resistance from misunderstandings – everybody agreeing that Data Governance should be implemented and embedded in business processes, but then realizing that everybody has very different expectations of what Data Governance should and should not do.

Spending a lot of effort on communication and upskilling is key.

Is there a company or industry you would particularly like to help implement Data Governance for and why?

I think different industries come with different challenges that are very exciting, so I would not say I am against any specific industry. I’m a continuous learner and optimist so I hope my career will surprise me!

What single piece of advice would you give someone just starting out in Data Governance?

I would have three points of advice:

  1. You cannot overcommunicate in data governance – do it from the very beginning and don’t stop

  2. Leadership sponsorship is a make or break – always keep that sponsorship relationship and ensure that your DG efforts are supported

  3. Build active advocates in your organization – external word of mouth is always more effective to drive collaboration than “you tooting your own horn”

Finally, I wondered if you could share a memorable data governance experience (either humorous or challenging)?

Recording a Data Governance themed Christmas Song with Nicola herself was an absolute highlight for me!

‘tis not the season right now, but the performance and our lyrics are still great. Have fun.



Is Access Management part of Data Governance?


I was emailed this question recently and, in my haste to provide an answer, I simply replied ‘no, it is not - Access Management is part of Data Security, so it's not a Data Governance activity’, and the asker very rightly came back to me and challenged me on this.

They said, “Oh, well, that's really odd because I thought that Data Owners were responsible for Access Management” and I realised that I should perhaps have given a fuller answer to the person who'd asked the question – so let’s do this now, first by looking at what Access Management really is.

What is Access Management?

Access management is primarily an information security, IT and data governance process used to grant access to valid users and prohibit invalid users.

Typically, AM is used in conjunction with identity access management (IAM). Identity management creates, provisions and controls different users, roles, groups and policies, whereas AM ensures that these roles and policies are followed.

An AM-based application/system stores the different user roles and their profiles, and processes user access requests based on the data/profile/roles.

A common misunderstanding

Access Management, deciding who should or shouldn't have access to our data, is definitely part of a Data Security activity, along with agreeing perhaps the levels of encryption and security classifications. But the worlds of Data Governance and Data Security do overlap.

Now, if you read the blog I posted a few months ago about the difference between Data Governance and Data Management you will remember that I talked about the DAMA DMBOK wheel.

Data Governance is in the middle of that wheel which lists all the other data management disciplines in it because Data Governance provides a foundation for many other data management disciplines and, in the case of Data Security, it is around the roles and responsibilities because we will have worked hard to find the right people to be Data Owners for the data.

Now, we might want to only talk to these people in terms of whether the data is good enough quality, whether we have definitions for it, and how we resolve issues with that data but if we've found the right people, then they're also the right people to make other decisions about that data.

It is quite often the case that I will work with the Data Security team at my clients to make sure that their approach is aligned with the Data Governance framework and that it is the Data Owner who is asked to approve access requests to their data. So, you can see how this confusion arises, because it would become the responsibility of the Data Owner to approve access requests. However, if we're being purist about it, it's not a Data Governance activity.
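To make that alignment a little more concrete, here is a minimal Python sketch of how an access request might be routed to the Data Owner for approval. It is purely illustrative: the `AccessRequest` class, the `DATA_OWNERS` register, the `approver_for` function and the names in the register are all hypothetical, and a real Access Management tool would do this in its own way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """A request by one person for access to one data set (hypothetical shape)."""
    requester: str
    data_set: str


# Hypothetical register agreed as part of Data Governance: one Data Owner per data set.
DATA_OWNERS = {
    "customer details": "Head of Underwriting",
}


def approver_for(request, owners=DATA_OWNERS):
    """Return the Data Owner who should be asked to approve this request."""
    try:
        return owners[request.data_set]
    except KeyError:
        # No owner recorded yet: a gap in the ownership register to follow up on.
        raise LookupError(f"No Data Owner recorded for {request.data_set!r}") from None


request = AccessRequest(requester="new analyst", data_set="customer details")
print(approver_for(request))
```

The point of the sketch is the division of labour: the ownership register comes from Data Governance, while the request handling itself stays with Data Security.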


Critical Data Elements and How Much Data Do We Have to Hold About Them?


Such are the demands of modern business it is important to understand that we cannot put the same level of data governance onto all data that we have in our organisation. There is probably neither the appetite nor the resources to do that and nor would it probably bring enough benefits.

Therefore, we need to make sure we are working out which is our most important data and putting most of our efforts into that. I believe there is quite often the temptation to believe that the data governance team are responsible for finding the critical data and I would challenge that.

Would you expect one person or a very small number of people in your organisation to know every single piece of data and to know which is the most important? Surely it is the people in the business that understand what data is the most important to them. So, I think it's really important that you get the business to identify what is important to them.

How do we know which data is the most important?

We need to do this in a consistent and logical manner. Your organisation should agree to some set criteria for how you're going to identify which is the most important data within your organisation and there are many ways of doing this.

I personally like to identify three different levels. For example, you could have critical, important and “everything else”, or a tiered system like high, medium and low.

But whatever you choose to call them, always aim to have three levels because this is how you are going to differentiate three different amounts of effort:

High is the really important data and that's where you put all your effort: you will want definitions, data quality rules and data lineage in place.

Medium is data that's fairly important. You'd like some controls and understanding of it, but you don't need to go quite so overboard as you do with the really important stuff.

And when it comes to everything else… Well, we might never get to do data governance on everything else. Maybe we will when we've sorted out the other things, but we've clearly identified that it's really low in the pecking order.

Once you’ve agreed your categories you can write some criteria according to what your organisation does and how they generally measure things anyway and allow business users to consistently evaluate their data against it to come up with the right classification for their data.

Setting out criteria for measuring the importance of data

This method means you need to work with different people across your whole business early on in your data governance journey. I call this stage Data Discovery.

You should run workshops with your key stakeholders for probably every function or department in your organisation and get them to tell you what data they use and what data they produce. Once you've done that, you can agree which data they may be the data owners for and you can also then ask the data owner to tell you which of that data is the most important to them.

Now, some data will be used by multiple people in different parts of your organisation and so it's important that this is done with every single function because one function might say “that data is absolutely critical to us” and another department may only refer to that data as more of a contextual reference and therefore it won’t be as high on their list of priorities.

So make sure you get all viewpoints so that you can identify its overarching level which will be the highest level that anybody has classified it as.
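If it helps to see that rule as logic, here is a minimal Python sketch of the "highest level anybody has classified it as" idea. The level names follow the critical / important / everything else example above; the `Criticality` enum, the `overarching_level` function and the two departments are hypothetical names used only for illustration.

```python
from enum import IntEnum


class Criticality(IntEnum):
    """Three levels of importance; a higher number means more important."""
    EVERYTHING_ELSE = 1
    IMPORTANT = 2
    CRITICAL = 3


def overarching_level(ratings):
    """Return the highest level any function has given to a piece of data.

    `ratings` maps a function or department name to the level it chose.
    """
    if not ratings:
        raise ValueError("No function has classified this data yet")
    return max(ratings.values())


# Hypothetical example: two departments rate the same data differently.
ratings = {
    "Finance": Criticality.CRITICAL,
    "Marketing": Criticality.IMPORTANT,
}
print(overarching_level(ratings).name)
```

Using an ordered enumeration keeps the comparison honest: the overall level is simply the maximum of the individual ratings, so one department calling the data critical is enough to make it critical overall.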

The second part of the question is, what do you hold about it?

This is the kind of information that you're going to document in your data glossary, and I've seen people go totally overboard in collecting all sorts of stuff.

As with all things that are data governance, I believe the best thing is to start with something really simple. This isn't an academic exercise.

We are documenting this data so that it can add some value, so that the organisation can understand what it is, where it is, what it is being used for and whom we should go to for decisions on it.

Cover your basics, capture that information and you should have a really good starting place. After that, talk to your business users and ask them what they would want to know if they were looking up that data to find out more about it, but don't start trying to capture everything. You'll be giving people lots of work to do and then they're going to rail against you, whereas, in reality, we want them to understand that this isn't too onerous a task and that it’s going to have some really valuable benefits in the long term.
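As a rough illustration of "cover your basics", here is a minimal Python sketch of a glossary entry that holds only the four things mentioned above: what the data is, where it is, what it is used for and whom to go to for decisions. The `GlossaryEntry` class, its field names, the `missing_basics` helper and the example values are all hypothetical; your own data glossary tool will have its own structure.

```python
from dataclasses import dataclass, fields


@dataclass
class GlossaryEntry:
    """The basics to hold about one piece of data (hypothetical field names)."""
    name: str
    definition: str = ""   # what it is
    location: str = ""     # where it is
    usage: str = ""        # what it is being used for
    data_owner: str = ""   # whom to go to for decisions


def missing_basics(entry):
    """List the basic fields that have not been filled in yet."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name).strip()]


# Hypothetical example entry with two of the basics still to capture.
entry = GlossaryEntry(
    name="Postcode",
    definition="Postal code of the customer's address",
    data_owner="Head of Underwriting",
)
print(missing_basics(entry))
```

Keeping the entry this small is deliberate: it is easier to add a field later because business users asked for it than to persuade people to fill in dozens of fields upfront.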


Who owns the Data in a Data Warehouse?

A lot of the confusion that arises around data governance is organisations trying to figure out who owns what data and who is responsible for that data – and I am willing to admit that even I made the odd mistake when I first started out. Thankfully, I have the benefit of hindsight and I’ve made some of the mistakes so that you don’t have to!

So, let’s look at data warehouses… I can understand that people think it would be so much easier if we could just have one person owning that data in the data warehouse. That's exactly what I did when I was first starting out in Data Governance because it was really easy. We already had a business owner of the data warehouse so I went to them and asked them if they would own the data as well - and they agreed!

However, it was only after that that I started finding that this actually caused more problems than it solved, because if you have one person owning that data how could they possibly understand everything about that data and know everything about it?

I would ask the person to make decisions on that data and they’d go “Oh, I don't know about it because I don't know why that data is collected. It just happens to be in the data warehouse.”

Another scenario that I would frequently see is the owner of the data warehouse coming to me to complain that some of the data in the data warehouse was poor quality. I'd go to the person who owns the data in System A where that data was coming from, and I'd say “the data warehouse data owner says you're sending rubbish data” and they would reply “No, I'm not. My data is wonderful. They're corrupting it when they load it and transform it into the data warehouse.”

And in this no-win scenario, the data governance officer is piggy in the middle, getting sent between systems whilst each side blames the other for the poor quality data, and nothing gets solved or sorted. This is why I realised the hard way that having one person owning the data in a data warehouse really doesn't help.

What we have to do is have one data owner for one data set wherever it is in your organisation, because we need one person who can make decisions about it and be held accountable for the quality of that data - not multiple people making different and potentially conflicting decisions about data that actually just happens to reside on multiple different systems.

Simply, find the most appropriate person to own the data – and do not just let this role fall on the owner of the data warehouse; often they don’t have a complete overview or understanding of the data.

It really is worth the effort of finding the right person upfront. It takes a little bit more effort, but you will be putting in place a good foundation for your Data Governance framework.
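The "one data owner for one data set, wherever it is" rule can be checked quite simply once ownership is written down. Here is a minimal Python sketch, assuming you keep a list of (system, data set, owner) records; the `conflicting_owners` function, the record layout and the example names are hypothetical and only there to illustrate the idea.

```python
from collections import defaultdict


def conflicting_owners(records):
    """Find data sets that have more than one owner across systems.

    `records` is an iterable of (system, data_set, owner) tuples.
    Returns a dict mapping each conflicting data set to its sorted owners.
    """
    owners_by_data_set = defaultdict(set)
    for _system, data_set, owner in records:
        owners_by_data_set[data_set].add(owner)
    return {
        data_set: sorted(owners)
        for data_set, owners in owners_by_data_set.items()
        if len(owners) > 1
    }


# Hypothetical records: the same data set is owned differently in two systems.
records = [
    ("System A", "customer details", "Head of Underwriting"),
    ("Data warehouse", "customer details", "Data warehouse owner"),
    ("Data warehouse", "policy details", "Head of Underwriting"),
]
print(conflicting_owners(records))
```

Anything this check returns is a conversation to have: the data set needs one accountable owner, and the system it happens to sit in should not decide who that is.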
