Process of Importing XML

Performing a a quick search turns up with a site listing 100 blogs,

http://top-site-list.com/top-100-lists.php

It’s manual and time consuming having to click on each one to copy the link. I want to get the list of URLs into a spreadsheet as quick as possible.

1) First we take a look at the source code of the page and we see something like this:

import xml
import xml

You can see that all of the blogs are in the li element. Make a mental note!

2) Now fire up your G docs spreadsheet and add the URL in cell A1:

http://top-site-list.com/top-100-lists.php

(where A1 is the cell with the URL of the page). We get this back:

import xml
import xml

3) As you can see, it contains the blog names so we’re getting there. But our query is also getting a whole load of other stuff we don’t want. So let’s look in the code and see if we can isolate the list of blog items. I find the “inspect element” control in Google Chrome excellent for visualising this. As you hover over the code, it highlights the section of the page that applies to it.

 

We can see that all the blogs are a separate tr element so let’s try something like

=importxml(A1,"//tr")

Follow

Leave a Reply

Your email address will not be published. Required fields are marked *