Hey there stranger!

Sign up to get access.

How to Find Duplicates and create a Duplicate Detector

About this Tutorial

Create a duplicate detector with a few google sheet formulas.

Video Transcript

 Hello, welcome. We're gonna find duplicates. This is an amazingly hard problem that comes up so often in Google Sheets. I've devoted an entire video to it, and I'm showing you a few ways to do it. Some very quick, some very bad, some very destructive. And I'll share with you an add-on you can get, but you don't even need to buy the add-on.

You can just do what I'm doing here just in case you don't wanna spend the extra money. So say we have a list of names. This is our test here. It could be any kind of data. We have a bunch of duplicates on the bottom, and we may not know that, or we know it, or we just don't want duplicates. One thing you can do is you can highlight this row.

You can use the built-in feature in Google Sheets called Remove Duplicates. You click that, you select your column, you say, yes, the data does have. Header row, you click move duplicates, and immediately we have no more duplicates. That's good. If you wanna be destructive and wanna just delete all the duplicates, totally fine.

But sometimes, and actually most of the time when we have duplicates, we wanna understand why our. Has duplicates. Maybe because multiple people enter data. Maybe we have a timestamp that says which one was came first. We might have extra data on one of these rows and not on the other. But we definitely wanna just see which duplicates we have.

So absolutely you can go and grab the power no, not power tools. Remove duplicates, add-on. It is from able bits, but it's $60 lifetime. $20 a year. Totally fine if you want to go do that. But let me show you a really quick, free way to do this. And because you're a member of Better Sheets, we can def definitely do this better.

How do we find this we are gonna do this. We're gonna, I'm gonna do this whole formula, but we're just gonna show a little text and say, Hey, we found a duplicate on Row X. How do that? First off, let me share with you the core of this. It's the match function. So match, and we're matching a 58.

We're gonna match it with all of column A, the column that it's in. We're gonna do comma and now. This is how like how much match we wanna do is zero. We want to get an exact match and now we have a row that it's in. So if we go up to 41, row 41, see there it is. Someone might not know that. Or if we pay copy and paste that well here, sorry.

Copy paste this. Match up and down the entire thing. We have now a bunch of numbers and this becomes a rather hard to see, right? We have to see 26. We have to match this in it with our eyes, right? 26, 27. Okay. All of these are matching with their row. Oh no, we have 37 here, but 54 here. We wanna quickly see this.

One way we can is by doing an if function and we want to take match and we wanna say equals row. We use rope, parentheses. To say, what is this row here? I'll show you what this function does. Equal row. It literally tells you what row we're on. So now we're taking this row function, comparing it to the match function and saying if they are true within, what should we do?

If row that's on equals match, honestly do nothing. So the first thing is, if it's true, do nothing. If it's false, however, If Roe does not equal the match function, we want to know it. So we wanna say match. We do the same thing. We do the same formula match A 58 A to a zero, and now we see it.

Now we can take this formula, copy pasted over here, copy pasted over all of these,

do all of them, and now it's all blank. Let's do the entire, so now it's in totally blank. If everything is fine, it's fine. We don't need to know if we, if it doesn't have a duplicate, we're fine. But now we have all these duplicates and we know the, that they're the second ones, right? We know this is the second occurrence and it's on row 35.

We can even add a little message here. We can go concatenate eight. It's hard to, Found a found duplicate on row and parentheses. Now we have a little message that says, Hey, we found a duplicate. And so this column now. Shows us we have duplicate. That's really cool, right? We can use this formula and we can, it's very core, just use the match.

But if you're using this time and time again, or you want to share this with someone else, if you wanna say, Hey, go check out column B, it'll tell you a message. If found a Du Pitt, you can use this as a number of ways to use this formula. But that's not all now shown that there's a duplicate. But say we have, say hundreds of cells or thousands or tens of thousands, and we don't really want to be moving our Cursor through here.

We don't have wanna have to literally visually go through all of these, right? We want just a short table that shows here's all of the duplicates. How do we then, here's what I did. One easy thing you might want to do is maybe first you wanna just know how many duplicates you have. So I use a unique row here.

I, I say, you can say equals unique. And you say A to a, and in that you just get all of the unique ones and then you can count it. So count A count, all unique and we have all of the unique, we wanna probably take out the header, so let's do a two and now we get 50. So that's what unique rows is.

Then we get total rows. We just say equals count all count A, A two to. That's 56. So here we have now unique rows, total rows, and we can go duplicates and we can say equals this minus this great six. Oops, great. Six. And so at least we have an idea, right? With a little. Couple account formulas and using this unique, we know how many there are, but we really wanna see which ones are duplicates, right?

We want to do a little bit better than that. So here I have this all in one. So I took all these two formulas out of them together. So we don't really need to know all of this, right? What we have to do is count all, then minus count, all of the unique, put that all in one. So we don't need all three.

So that makes it a little cleaner. And now we want to know where are the duplicates. So in the, in, in just a few minutes ago I showed you the match and I showed you that we want to show. Where the match is. But in this case, what I wanna do is I wanna know what row we are on. The second duplicate is on, so I need this column and we can do this again here.

I'll share with you. I'll go equal match A, we're gonna match A through A and zero there for the match. So now we have the match. If they're equal, right? We're go if the match is equal to the row, we want to know nothing. But if the match and the row are different, we wanna know the row that the second one is on so we go match.

So if match a two. Eight a zero is equal to row that it's on. You want the row now. If nothing happens here, nothing happens there. Nothing happens. Nothing happens until we get to a row that has a duplicate. And now if the match and the road do not match, we want to know, Hey, where's the duplicate? So it says here I'm on row 52.

So you use this column to filter. So we now have a filter and we say if the filter give me column A, but filter out everything that's not zero, so nothing, right? But if it's above zero, so if it's even one or two or row five or six, then give me what's in row A. Okay? And so we get this great list of here's the D.

And now we do that again. We do the filter, but we filter row B or column B, and we say in column B, give me every row that's above zero. And so now we have all of the rows that the duplicates are on and we can put that in the same filter. And as we go down, I'll show you something interesting happens. We can get rid of these, right?

We can say, let's go here, let's do that. Let me keep that in view and as we scroll down and scroll down here. Okay this one, I know we, we already had it, so let's delete that. And it takes it off of this filter so we can get rid of the duplicates. Or if we're like, Hey, Roseanne, we actually want to get rid of the first occurrence.

It's this is the problem. Okay, get rid of that. Don't need that anymore. And now there's no duplicate anymore. So even though the second one's still there, It's not it's a, it's fungible. It changes as we change things, right? So now it's very dynamic to what we're doing, right? We're getting rid of all of these, but we're getting rid of them where we need to get rid of them w in exactly the way we need to.

So this helps you find the duplicates, get rid of them in a much cleaner, nicer way than just marking them and having to find them gets rid of them in. Or even if you want to pass on this information, this is a much nicer way than giving them this whole big data set and say, Hey, go to column B, find the road that has the number and find this, Hey your duplicate.

You have two duplicates. They're on row 56 and 57. A nice little table. Much better way to share this, duplicate information with people you're working with. Hopefully that was, let me know what else you want to do with the kits and what other problems you may have with duplicates. I'm happy to help you solve more and more problems.

Thanks, bye.