Zunaid Asks: How to Get Meta from URLs? Answer: Use IMPORTXML()
About this Tutorial
Here you'll see every webpage. I got the title programmatically with IMPORTXML. I got the image with IMPORTXML and the IMAGE function. And then I got a meta description, or some type of description. Wikipedia articles don't have a meta description, and you'll see why that matters soon.
So how did I do this? Let's copy this and I'll show you from the beginning. If you can already tell how to do this, you don't have to watch the rest of this video, but I'm going to go step by step. Here we have just the URLs, and we want to get the title first, which is pretty simple.
The title is pretty simple. The image is not as simple, and the meta description is similarly not as simple, but you'll see how to do it and then you'll be able to riff on it as much as you want. So let's go to the SpaceX Wikipedia page. This is the link, and you can see it's the same link. What I want to do is right-click and choose View Page Source.
So that's over here, and this is the source. This will tell us how to get the information we want. We want the title, and in the source it is literally the title tag. So all we have to do is use IMPORTXML, which is generally how you capture HTML data. There is an IMPORTHTML function, but as you see here, it imports data from a table or list on an HTML page.
What we really want is IMPORTXML. The URL is going to be A2, which is over here to the left. Then comes the XPath query, and the title one is pretty simple. It's just slash slash title, so //title. And if you get it right on the first go, it'll show you the result here.
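Written out, the title formula would look something like the sketch below. It assumes the URL sits in cell A2, as in the video, and that the query spoken as "slash title" is the XPath //title.

```
=IMPORTXML(A2, "//title")
```

The first argument is the cell holding the URL and the second is the XPath query as a quoted string. Copying the formula down the column moves A2 to A3, A4, and so on.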
And that's it. That's the title. So we can grab that, copy and paste the formula down here, and we have the titles. The image is next, and that's harder. Just one more thing about the title: you can grab anything here. You can grab the entire body of the page. You can grab what IMPORTHTML grabs, like these tables.
But you can grab anything. You just need to know how to format the query, and for something like title it's pretty simple. Okay, next is the image. What I did is I went to the meta tags, and here's og:image. There might be other images for social shares. A page can have a Facebook image, a Twitter image.
They'll have different images here, but this page only has og:image. I just want to note what this says: meta property equals og:image, content equals this URL. If we queried just meta, it would give us the text outside of these angle brackets for every meta tag, but we actually want the text inside content, and we don't want all of the metas. We just want the meta with the property og:image.
So here's how we do it. I'll show you an example, and this is the end result. It says meta, and then in square brackets we put @property. The @ sign means attribute, so we're looking for the attribute named property where it is equal to og:image. The attribute is property, which was the part over here in orange or brown.
That's the attribute name, and we're looking for the one where it equals og:image. That's the only one we want. Then we close the bracket and add /@content. Here, I'll actually show you what this looks like if we don't put the @content. Remember, we want the image first.
We need to get the image URL, and then we can put it into the IMAGE function. So what if we don't ask for the content? This is what happens: nothing, because there's no text outside the angle brackets. Title gives us the title because that text is outside of the brackets, but there's nothing like that here, so we need to say, hey, we don't actually want the text of the meta tag with property og:image.
We don't want the text there. What we actually want is the text inside @content. So we add the content attribute and we get that. Now we can copy and paste that down here, and we have image URLs for each one, but we don't have the image itself in the sheet. You are more than welcome to stop here and grab the URL of the image.
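Put together, the image URL query described above would look something like this. It's a sketch that assumes the URL is in A2 and that the page has a meta tag with property og:image. A page without one will return an error instead of a URL.

```
=IMPORTXML(A2, "//meta[@property='og:image']/@content")
```

The part in square brackets filters the meta tags down to the one whose property attribute equals og:image, and /@content returns the value of its content attribute rather than the (empty) text of the tag.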
But we could also go one step further: type IMAGE and put the whole formula inside its parentheses. Now it actually loads the image into our Google Sheet. This is a pretty cool way to get some visual representation into our document. You can use it as a little thumbnail or something to look through.
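As a sketch, wrapping the image query in IMAGE would look like the first formula below, again assuming the URL is in A2. The second formula is an optional variation for pages that return more than one image URL: it uses INDEX to keep only the first result before passing it to IMAGE.

```
=IMAGE(IMPORTXML(A2, "//meta[@property='og:image']/@content"))

=IMAGE(INDEX(IMPORTXML(A2, "//meta[@property='og:image']/@content"), 1))
```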
It's a pretty cool way to bring some images into a Google Sheet. The meta description is the harder part, because there is no meta description here. You can see why: there's literally just no meta description tag. Wikipedia does something different. They have these notes, elements with role equals note.
Here, I'll show you what I did. I grabbed div, role equals note. So I went here and looked for role. Let's see. Yeah, here: role equals note. And I thought, one of these is probably it: "This article is about the rocket manufacturer." Okay, that's probably it. But there are many of them. So how do you parse through many of them?
This is the problem, and we use INDEX to solve it. Let me show you what the original output looks like, and then we'll use INDEX to pick the one we want. So we type equals, and here I want to get every div with, not a class, something else: every div that has the attribute role equal to note.
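The query for those notes would look something like this, assuming the URL is in A2 as before:

```
=IMPORTXML(A2, "//div[@role='note']")
```

It follows the same pattern as the og:image query: the element name, then an attribute filter in square brackets. There is no trailing /@content here because this time we want the text of the div itself.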
And when I do that, you'll see, oh my God, it's a lot of stuff, and it's spread across a lot of different rows and columns. The way we reference rows and columns is INDEX. Let's see, let's just grab this "Main article" one. So we wrap the formula in INDEX, and at the end we add a comma and say which row we want.
We want the third row. We want the second column, and that tells INDEX where it is. If you look over here, it shows "History of SpaceX". Boom, we got it. But now when we copy and paste that down, we get some reference errors. Why do we get that? It says that the row we asked for is not within the result. So let's take INDEX off this one and see what we get.
We only get one thing here. Here we only get two things. So each page is going to have its own unique layout. Using INDEX might not be the best way, but I'm just showing it to you. Okay, say we want INDEX one, one. That only gives us "Main article". Maybe we want INDEX row one, column two. Great. That is some kind of description.
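The INDEX step might be sketched like this. The first formula picks row 3, column 2 of the notes, which worked for the SpaceX page but not for the others. The second picks row 1, column 2, which exists on more of the pages. The third wraps the lookup in IFERROR so that a page with no matching cell shows a blank instead of an error. That last one is a suggestion, not something shown in the video.

```
=INDEX(IMPORTXML(A2, "//div[@role='note']"), 3, 2)

=INDEX(IMPORTXML(A2, "//div[@role='note']"), 1, 2)

=IFERROR(INDEX(IMPORTXML(A2, "//div[@role='note']"), 1, 2), "")
```

In INDEX, the second argument is the row and the third is the column, both counted from 1 within the imported result, not within the sheet.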
Obviously it is much, much easier if you actually have a meta description in a webpage. Wikipedia doesn't, but you can try to work something out with this formula here. And you're more than welcome to copy this and look up IMPORTXML, because I didn't do a great job of explaining exactly how this works.
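For pages that do have a meta description, the query would probably look like the sketch below. This formula isn't shown in the video. It assumes the page uses the common meta tag with name equal to description, and that the URL is in A2.

```
=IMPORTXML(A2, "//meta[@name='description']/@content")
```

It is the same shape as the og:image query, with the name attribute in place of property.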
I have to look it up almost every time that I use it. So this is looking up every div with the attribute role equal to note. I forgot what it's called, but it works the same way when it's class, or data, or item attributes. And then you're going to get a big list. With the title we didn't get a big list, because we didn't filter on any attributes there.
We just did title. With images, you may end up with multiple images, and that's fine. You can use INDEX there as well. But here we used IMAGE. Hopefully that was helpful to you and sufficed as an answer. If you want to go deeper, ask a follow-up question. If you are confused by anything, I'm more than happy to share some resources with you.
But do look into IMPORTXML and exactly how to use it, because this is using XPath, and you really need to just read the XPath documentation. Again, I have to read the XPath documentation almost every time I use it, because I don't use it super often. It's not like I'm importing XML information daily.
I used to, so it's a little fuzzy after three years of not using it every day. But the XPath documentation is spot on and helped me through all these problems, along with just troubleshooting and trial and error, because sometimes you think you've got it, and then you put something in brackets and you're like, oh, I shouldn't have used the brackets.
There are some common pitfalls here with IMPORTXML as well, because you'll get multiple things back. You can use INDEX to fix that. I think you can even use FILTER for that, but I usually use INDEX to get exactly the one I want. Hopefully this is helpful to you, and you're more than welcome to use this sheet.
I, I made sure.