Hello. We're working today on finding the most common words. I've already done this in this document, but it is really complicated and the number one reason is because there's too many Call too many things to look through. Too many words. This is article titles from a newsletter that I curate.
I has nine 1,974 rows. It ends up being about 130,000 words. And the process is the same, but it gets really complicated because it gets, it's over the limit of some of. Formulas that I'm using. So instead of walking you through this, basically what I did is I, I can walk you through the steps. I split the titles into words.
I then figured out that there are E to N. So this is equals column. This is 66 columns of words, 66, and there's nine 1,974 rows times 66, meaning there's 130,000 cells here. And what we're going to, what the idea is, is you take each of these columns and then you stack them into one column, sort of here.
But because a sheet doesn't do very well when you have tens of thousands of rows I had to split it up into three. So there's words, and then I had to get the unique words.
And then I did a count. So once I got the unique words, I did a count and now I skipped a bunch of steps because I had to do this three times and then combine each of those. So I had to find the unique words of each of the words, columns. Then the unique of all and then the unique of the unique. So, , to be quite honest, it's much easier to go through the simpler one where I just get to get the actual total and then I get the results like this, and I'll show you one pitfall that's gonna happen.
it's gonna happen to us too, is the, the words are, there's lowercase and uppercase and all types of cases, and then they're, they, they show up on the unique, but then they're counted all the same. So the count if looks at these all as the same, but the unique words, the unique formula looks at them as different.
So what we get is, The, the tutu and, and my newsletter's all about influencer marketing. So you see influencer use a lot, influencers Instagram, and then you get these tiny words on, in for marketing with how, and so our analysis is gonna be pretty our raw data is gonna be pretty not crummy. It's not bad data.
It, it's actually good data because this is actually the most common word is the, but that. You could get that if you just looked at the most common words in the English language. It's probably the so what we really need to do is get, we can pick through this, but I, I'll get you to this point. We're gonna do a, we're gonna, I'm gonna walk through every step and we're gonna do it with just a few of these right now.
So instead of 1,974, I got 34 rows here. We're gonna go through this process and and you'll see how this, I. So I'm gonna delete all the other rows first. And then I'm gonna actually say, okay, here's the titles. I'll call this titles. Then I'm gonna duplicate this actually,
because I don't want to have to delete all the rows and columns again. But I do wanna just delete all of that. I'm gonna do equals split. We're gonna split. We're gonna back over here. Titles, we're gonna split all this. Okay. What we're gonna split it by is going to be a space. So we're gonna say the, the eliminator is, what are you splitting it by, and we're gonna say true that it's gonna add the rows automatically or the columns automatically.
Oh, and we have to do one more thing is this is not usually. An array. So we need to do array formula around this and now we get it all. So we only have to do one. We don't have to copy and paste it all the way down. And so you saw that re, that referral error said reference error said we wanted to add the columns and it did automatically.
So let's look at it. Let's see. Okay, so I just fixed the formatting a little to show you everything on here. So what we have done now is we have each word in its individual cell. We have a lot of other stuff in here, but that will be fine. We don't have to do, you can clean your data beforehand, but I haven't done it yet.
You can clean it from all these like, sort of pipes and things you don't want. You should clean that data beforehand, but you. Really have to, if it, if it's not gonna be within the most top 10, 20 common words you use. All right. So just to re recap, I have a array formula split. That's all I've done so far.
Everything else has been formatting. I have now up to ab, so that is going to be, we can even check. That's 28 columns and there's something here. Yeah, this is the longest one here. Okay. So what we're gonna do, we're gonna stack each of those, let's call this words. . Now we're gonna find, now we're gonna put all the words, okay, now we want all the words into one column.
How do we do that? We go equals and we're gonna just put it in curly brackets. And inside the curly brackets we're gonna do words, exclamation point A to a and then semicolon words, exclamation point, b2b. And we're gonna do this all the way down. . There's a little bit of a weird thing that I'll do here to show you what's up, but here's the old I'm gonna show you once right here.
So here's all the words in columns A and B, columns A and B. Here we'll take a look at what's, where's Emily? It's probably gonna be let, I'm gonna guess it's gonna be somewhere in the high thirties row. Yeah. So it just stacked right on top of each other. All of these words, but it's only done so far. Words, A to a, columns A and B.
And I'm gonna show you a really scratchy, quick way, weird way to do this if we wanna get all of these columns. Okay. So the idea is we want to get columns A to a B, we want it to say words exclamation point A to a, words exclamation point B to. We're gonna do this on a scratch page cuz it doesn't have to do with anything.
We're gonna just get that the text. We're gonna do equals what do we need? We need a to a we want to then do equals actually we need to do, we're gonna get a rough error. So let's do this referral error cause it's referring to itself, so it needs to refer to itself. The other one then we go equals formula text B one.
And now, . We have it A So now we're gonna copy and paste it all the way down. We need a couple more. This is a really junky, quicky dirty way to get this text if you don't want to type it out all the way. We need just one more then we, we need, cause we're gonna need A to B. Me.
Okay. So we have A to B. Now we have all of this text here. A to A B. A B, we wanna do command C shift command V for paste value. So now it's not formula text, it's actually that is to text, but it thinks it's a formula because it equals sign. So we go we do this, we do command F,
click on the options we want to take away equals one that speci. I'll search within the place all. Z done. And now we have in one row all A to a, B to b, cdc, all that. All we need now we just do join and we're gonna join with words exclamation point. And then we're gonna do, in front of that, we're going to do a semicolon.
And what do we want to join with? This is, B, two to two. And now in one cell we have all of the, all of the columns we want, except for the first one, we have A to a. So how do we add on top of that? We go equals concatenate. And we just add words, exclamation point,
and we, Nate, the two. I don't need a question mark here. And now we have words, exclamation point A to a. We have all the texts we need. That's all we did on the scratch. We just got that text. So in all words, all we do is copy and paste inside of this curly brackets. Copy and paste that text and you'll see what happens.
Now it all terms, cuz those are actually the columns we want. We did a real quick and dirty. And it's gonna give us a reference there. It's gonna have, have to add a bunch of rows. It does it. And now we have all the way down here. We have hundreds and hundreds and hundreds and hundreds and hundreds of words.
In fact, we have, let's see, not some, what we wanna do, we want count. We have 410 words here. We have all the words that appeared in all of those titles, and now we're gonna do what? Now what we're gonna do is we're gonna find the unique words and then we're gonna count them. Okay? On this tab, all I have is the unique I'm calling this unique.
So what we wanna do is want to go unique. Unique, and we're gonna get all words, all words, exclamation point A to a, and we're gonna get those and it's gonna be in the order that they appear. So I actually wanted an alphabet. So let's go sort whether it's the range, that's the range. Who on the sort and true.
So now it's numbers and then words. And actually it has punctuation first, but let's do the count. We go equals count if, and our count if is gonna be all words, A to a, a. All here and we're gonna have a count. Now we can drag that all the way down. Actually we can go down to the end. We see there's only down to Zoella, so let's delete all this just so we don't have all of those zeros.
And so now you can see you, your YouTube YouTube's YouTuber was, was is the different punctuations. Again, you can clean that up if you want. There's a couple rejects replacements you can do if you just want pure words, but it gets a little muddy. You can start seeing duh, duh, the two different doess, but they have the same count.
And now we wanna sort it. Okay, then this is pretty simple. We're just gonna duplicate this and we're gonna call it sorted. And we're gonna just, instead of all of this, we're gonna delete all of that. We're gonna go equal sort. And what do we wanna sort? We wanna sort this range of a B, and actually we want to do a two and we wanna sort by column two ascending.
We don't, we want actually a descending from the highest to the lowest we. And now we should have all of the words, the influencer, influencers, Instagram. And so now we have all of the different types, and this is now where we can take this raw data and move it into like a summary. You can either do this by hand, you can say, okay, I just want the top 10, so I'm gonna take influencer.
I'm gonna add those together. Instagram. media, social. Maybe you want to keep the phrase social media, so you're like, oh, let me just find the amount of social, if this is probably five. Social media is, this is probably one phrase and this raw data will help you get, will get you to that final step of saying, here's the most used words or phrases.
But you need to do all this math to get to that point. I'm gonna share with you this this sheet. I am. Also going to share with you this sheet, if you wanna look at how if you want to be able to see how did I do more? Very complicated. And, and once you master this you'll be able to tell, you'll be able to see where it breaks.
It breaks sort of around when you have like 50,000 words. I would, I would say that's a good break point. If you have more than 50,000 words, then try to start breaking them into sections. It gets really, really complicated cuz you have to. not just the words, but the unique words. And then you have to add the uniques together and get a unique of the uniques.
And that's here at the uni uni I put here, but I'll share you on this sheet as well. So you can have both the simple version, which is, which you have less than 50,000 words or the co more complicated version when you have many, many, many, like hundreds of thousands of words. Hopefully this is helpful and it gets you where you're going.