1. Locate Elements By Name
It is standard practice to define unique ids for web elements in HTML code. However, there may be cases where these unique identifiers are not present but name attributes are; we can then use the name to select a web element.
Here is a code snippet that demonstrates the use of the <find_element_by_name> method. The code below opens Google in the browser and performs a text search.
from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://google.com")
driver.maximize_window()
time.sleep(5)
inputElement = driver.find_element_by_name("q")  # the Google search box has the name attribute "q"
inputElement.send_keys("Techbeamers")            # type the search text
inputElement.submit()                            # submit the search form
time.sleep(20)
driver.close()
If the HTML code has more than one web element with the same “name” attribute value, this method selects the first matching web element. If no match occurs, a NoSuchElementException is raised.
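Note that <find_element_by_name> (singular) returns only the first match, while <find_elements_by_name> (plural) returns every match as a list. Here is a minimal sketch of both behaviours, assuming the same driver session as above:
from selenium.common.exceptions import NoSuchElementException
# the plural form returns a list of every element whose name attribute is "q" (an empty list if none)
all_matches = driver.find_elements_by_name("q")
print("Elements named 'q':", len(all_matches))
# the singular form raises NoSuchElementException when nothing matches
try:
    first_match = driver.find_element_by_name("q")
except NoSuchElementException:
    print("No element with name 'q' was found")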
2. Locate Elements By ID
We use this method when the id attribute of the element is available. It is, in fact, the most reliable and fastest way to locate a particular web element on an HTML page. An id should always be unique for any object on a web page, so we should prefer the id attribute for locating elements over the other available options.
Here is a code snippet that demonstrates the use of the <find_element_by_id> method. The code below opens Google in the browser and performs a text search.
from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://google.com")
driver.maximize_window()
time.sleep(5)
inputElement = driver.find_element_by_id("lst-ib")
inputElement.send_keys("Techbeamers")
inputElement.submit()
time.sleep(20)
driver.close()
If more than one web element has the same value for the id attribute, this method returns the first element whose id matches. It raises a NoSuchElementException if there is no match.
3. Locate Elements By Link Text
We use this method to select hyperlinks from a web page. If multiple elements have the same link text, this method selects the first match. It works only on links (hyperlinks), which is why we call it the <Link Text locator>.
Here is a code snippet that demonstrates the use of the <find_element_by_link_text> method. The code below opens Google in the browser, performs a text search, and then opens a hyperlink with the link text “Python Tutorial.”
from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://google.com")
driver.maximize_window()
time.sleep(5)
inputElement = driver.find_element_by_name("q")
inputElement.send_keys("Techbeamers")
inputElement.submit()
time.sleep(5)
elem = driver.find_element_by_link_text("Python Tutorial")  # locate the hyperlink by its exact link text
elem.click()                                                # open the link
time.sleep(20)
driver.close()
4. Locate Elements By Partial Link Text
For locating an element with the link text method, we need to provide the complete link text. The partial link text method, however, lets us select a hyperlink by providing only a part of the link text.
If we use the partial link text method in the above example, the code becomes:
from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://google.com")
driver.maximize_window()
time.sleep(5)
inputElement = driver.find_element_by_name("q")
inputElement.send_keys("Techbeamers")
inputElement.submit()
time.sleep(5)
elem = driver.find_element_by_partial_link_text("Python")
elem.click()
time.sleep(20)
driver.close()
This code opens the same Python tutorial web page as the previous example.
5. Locate Elements By Xpath
Another useful method to locate an element is using an XPath expression. We use XPath when a proper id or name attribute is not present in the code to access that element.
XPath allows locating an element using either an Absolute XPath (not the preferred way) or a Relative XPath. An Absolute XPath determines the location of an object from the root (html). However, it is not an efficient method: even a slight change in the web page code changes the Absolute XPath, and the webdriver may no longer be able to locate the element with the old expression.
With a Relative XPath, we first locate a nearby element for which an id or name attribute is available (ideally a parent element) and then express the XPath of the target element relative to it. The chances of such an XPath changing are much lower, which makes our tests more robust.
Thus, both of these approaches help us locate an element that does not have an id or name attribute of its own.
XPath locators can also use attributes other than id and name for locating the element, as shown in the example at the end of this section.
To understand the Absolute and Relative path, let’s take the following HTML code for user SignUp.
<html>
<body>
<form id="signUpForm">
<input name="emailId/mobileNo" type="text" />
<input name="password" type="password" />
<input name="continue" type="submit" value="SignUp" />
<input name="continue" type="button" value="Clear" />
</form>
</body>
</html>
Now we will try locating different elements present on the page using XPath.
Here are the XPaths that will help Selenium Webdriver to locate the form element.
form_element = driver.find_element_by_xpath("/html/body/form[1]")
This is the Absolute XPath. It will break if we make any change to the HTML structure.
The following are the Relative XPaths.
form_element = driver.find_element_by_xpath("//form[1]")
It selects the first form element on the page.
form_element = driver.find_element_by_xpath("//form[@id='signUpForm']")
It uses the id attribute with the value “signUpForm” to locate the element.
We can locate the <emailId/mobileNo> element in a similar manner:
email_input = driver.find_element_by_xpath("//form[input/@name='emailId/mobileNo']")
It returns the first form element that has an ‘input’ child whose ‘name’ attribute has the value ‘emailId/mobileNo’.
email_input = driver.find_element_by_xpath("//form[@id='signUpForm']/input[1]")
It selects the first ‘input’ child of the form element whose ‘id’ attribute has the value ‘signUpForm’.
email_input = driver.find_element_by_xpath("//input[@name='emailId/mobileNo']")
It goes directly to the ‘input’ element whose ‘name’ attribute has the value ‘emailId/mobileNo’.
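As noted earlier, XPath locators can also use attributes other than id and name. For example, against the SignUp form above, the two buttons can be located through their value attributes. A small sketch, assuming the same driver object:
signup_button = driver.find_element_by_xpath("//input[@value='SignUp']")                    # the submit button
clear_button = driver.find_element_by_xpath("//input[@type='button' and @value='Clear']")   # the Clear button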
6. Locate Elements By CSS Selector
This method allows you to locate elements using a CSS selector, for example one based on the class attribute.
It returns the first element matching the selector. If the search fails, the method throws a NoSuchElementException.
For illustration, assume the below HTML code:
<html>
<body>
<div class="round-button">Click Here</div>
</body>
</html>
The above code has a single div element with the class “round-button”. To refer to a CSS class, you use the dot (.) symbol. The syntax below represents the CSS selector for the “round-button” class.
div.round-button
With the below code, you can locate the target div element following the CSS locator strategy.
get_div = driver.find_element_by_css_selector('div.round-button')
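CSS selectors are not limited to class names. Here are a few other common forms, sketched against the SignUp form from the XPath section (assuming that form is present on the current page):
form_by_id = driver.find_element_by_css_selector("form#signUpForm")                    # by id, using the # symbol
email_by_attr = driver.find_element_by_css_selector("input[name='emailId/mobileNo']")  # by an arbitrary attribute
first_input = driver.find_element_by_css_selector("form#signUpForm > input")           # first input child of the form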
7. Locate Elements By Tagname
This method allows you to find a web element by specifying its tag name.
It returns the first element with the specified tag name. If the search doesn’t succeed, the method throws a NoSuchElementException.
For illustration, assume the below HTML code:
<html>
<head>
<title>Hello Python</title>
</head>
<body>
<p>Learn test automation using Python</p>
</body>
</html>
The above code has a title tag with some text. You can find it using the below code.
title_element = driver.find_element_by_tag_name('title')
8. Locate Elements By Classname
This method allows you to locate elements based on the class name.
It will return the first element with the given class name. If the search doesn’t succeed, the method will throw the NoSuchElementException.
For illustration, assume the below HTML code:
<html>
<body>
<div class="round-button">Click Here</div>
</body>
</html>
The above code has a div element with the class “round-button”. You can find it using the code below.
get_div = driver.find_element_by_class_name('round-button')
Quick Wrap Up – Locate Elements Using Selenium Python
We hope that you now know how to use the locators and find elements using them.
If you found this tutorial useful, do share it with your colleagues. Also, connect to our social media (Facebook/Twitter) accounts to receive timely updates.
Best,
Introduction
When you surf online, you occasionally visit websites that show content such as videos or audio files which are loaded dynamically. This is usually done through AJAX calls or sessions where the URLs for these files are generated in a way that prevents you from saving them by normal means. For example, say you visit a web page and a video is loaded through a player like jwplayer after a second or two. You want to save it, but you can’t simply right-click and save it because the player doesn’t offer that option. Even a command line tool like wget or youtube-dl might not work, because some JavaScript functions still need to be executed before the video URL is generated on the site. So the question is: how would you download such files to your computer?
Well, there are different ways to download them. One way is to use a browser plugin or extension like Grab Any Media or FlashGot, which can handle such requests and might let you download the file. But suppose you want to download an entire set of videos that are loaded through different AJAX calls. The plugins or extensions might still work, but manually downloading each file takes a long time. A better method is to write a small script that automates the process. This tutorial aims to show you how to use the Selenium web driver to do simple tasks like downloading dynamically loaded content from a website using Python.
Prerequisites
For this tutorial, you need at least some knowledge of how to program in Python. If you don’t know anything about it, I would suggest checking out this site : http://www.tutorialspoint.com/python/. It is a great starting point for learning a new language, and you can quickly pick up the basics.
So, before we start, I would like to give a brief introduction to the modules I am going to use in my Python script. The system I’m using is Ubuntu Studio 14.04. To install the modules, you can use python-pip, and you might also need administrative privileges. Here are the modules :
- Selenium Web Driver : The Selenium framework is a suite of tools that can be used to test web applications and also automate browser tasks. Using its API, you can do simple tasks like automating administration work for a website or some website-related maintenance by sending commands to the browser. It is supported in various programming languages like Python, Java, JavaScript, PHP, C#, Perl, and Ruby. To install this framework for Python, just type the following command in the terminal :
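pip install selenium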
- BeautifulSoup : The bs4 module is an HTML/XML parser that does a great job at screen-scraping elements and getting information like tag names, attributes, and values. It also has a set of methods that allow you to do things like match certain instances of text and retrieve all the elements that contain it. You can install this module like this :
pip install beautifulsoup4
- Python-Wget : This module is a Python port of the wget command-line program. It’s easy to set up, and you can quickly download videos or files to your system using its API. You can also follow the “traditional” method of downloading files, like using the standard urllib module or making a subprocess call to the wget command-line program, but for this tutorial I will be using this module to get the job done. Here is how you install it :
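pip install wget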
- PhantomJS (Optional) : PhantomJS is a headless browser (it doesn’t have a front-end GUI and everything works at the backend) that is used for web page interaction. It is similar to the regular Selenium web drivers, with the difference that it is headless. For this tutorial, I will be using the basic Firefox web driver via Selenium, but you can try PhantomJS if you do not want a browser to pop up every time the script runs. You can download the phantomjs executable from its homepage, but I advise you to use an older version, as the latest one might not be compatible with the selenium module.
The Concept
The idea is basically :
- To get the web page using the selenium web driver.
- Parse and extract the video or audio urls from the html page using BeautifulSoup.
- Download the files to the system using wget.
Step 1
The first step is to import the necessary modules in the Python script or shell. This can be done as shown below :
import os
import sys
import wget
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
From the selenium module, we import the following things :
- webdriver : This submodule has the functionality to initialize the various browsers like Chrome, Firefox, IE, etc.
- Keys : This allows us to send key presses or inputs to the web driver.
- WebDriverWait : The WebDriverWait class is similar to the sleep function of the time module, but instead of pausing unconditionally it tells the webdriver to wait up to "n" seconds for a condition to be satisfied.
- expected_conditions : These are common conditions that the web driver can wait for, for example that an element appears in the browser or that the title of the page is some specific name. Here is a list of all the available conditions (a small usage sketch follows this list) :
- title_is
- title_contains
- presence_of_element_located
- visibility_of_element_located
- visibility_of
- presence_of_all_elements_located
- text_to_be_present_in_element
- text_to_be_present_in_element_value
- frame_to_be_available_and_switch_to_it
- invisibility_of_element_located
- element_to_be_clickable - the element is displayed and enabled.
- staleness_of
- element_to_be_selected
- element_located_to_be_selected
- element_selection_state_to_be
- element_located_selection_state_to_be
- alert_is_present
- By : The By class allows us to select HTML elements in the web driver by class name, id, XPath, name, link text, etc.
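As a quick illustration of how these imports fit together, the following one-liner (a sketch assuming a driver object already exists) waits up to 10 seconds for the page title to contain the word "Example":
WebDriverWait(driver, 10).until(EC.title_contains("Example"))  # raises TimeoutException if the title never matches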
Step 2
Now, according to the concept, for a single video URL that is loaded using an AJAX call, we first need to get the web page using the Selenium webdriver. This is done as follows :
driver = webdriver.Firefox() # if you want to use chrome, replace Firefox() with Chrome()
driver.get("http://www.example.com") # load the web page
When you execute this in the Python shell or via the script (after you import the modules), you will observe that a Firefox browser pops up and the page is loaded into it. If you want to use PhantomJS and stop the browser from popping up, just replace webdriver.Firefox() with webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true']).
Step 3
Here is the tricky part: you need to extract the video URLs from the web page. As every website is designed differently, there is no universal solution. You would need to manually check for a pattern or for the video element that is dynamically loaded, which can be done by looking at the browser’s developer console. From the previously mentioned scenario, let’s say the video is dynamically loaded using an AJAX call about a second after you visit the website. Then you would need to wait till the video is loaded and only then get the element. For that, you can write the script in this manner :
WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.ID, "the-element-id")))
src = driver.page_source
When you execute these two lines in the Python shell, the browser waits up to 50 seconds for the element with the specified id to appear or become visible on the screen, and then the HTML source is retrieved. If the element doesn’t have an ID, there are other ways you can get the specific element/tag. Let’s say the element has a certain class; then you can just replace By.ID with By.CLASS_NAME. Here is the entire list of attributes for the By class object :
- ID
- XPATH
- LINK_TEXT
- PARTIAL_LINK_TEXT
- NAME
- TAG_NAME
- CLASS_NAME
- CSS_SELECTOR
Step 4
Once you get the HTML source, you would need to parse it and extract the video tag from it. This is done as shown below :
parser = BeautifulSoup(src,"lxml")
list_of_attributes = {"class" : "some-class", "name" : "some-name"}
tag = parser.findAll('video',attrs=list_of_attributes)
The list_of_attributes is a Python dictionary with (key, value) pairs that specify the tag’s attributes. The parser.findAll() call searches the entire HTML source and gets the video tags with the specified attributes. This returns a list of matching tags, which is stored in the tag variable.
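If you are not sure which entry in tag corresponds to the video you want, you can first loop over the list and print each source URL. This sketch assumes each matched video tag carries its URL directly in a src attribute (some players nest it inside child <source> tags instead):
for index, video in enumerate(tag):
    print(index, video.get('src'))  # index and source URL of each matched video tag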
Step 5
The next step is to get the url from the video tag and finally download it using wget. We can do this by writing the script in this manner :
n = 0 # Specify the index of video element in the web page
url = tag[n]['src'] # get the src attribute of the video
wget.download(url,out="path/to/output/file") # download the video
Depending on the number of videos loaded in the web page, you can specify which video you want to download by changing the value of n.
Step 6
Finally, once the job is done, we close the driver :
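driver.close()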
The script
Now, when you put all the pieces together, along with some additional functionality to log in to a website, you will get something like this :
import os
import sys
import wget
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.get("http://www.example.com")
elem = driver.find_element_by_id("email")        # locate the login email field
elem.send_keys("youremail@example.com")          # replace with your login email
elem = driver.find_element_by_id("pwd")          # locate the password field
elem.send_keys("userpwd")                        # replace with your password
elem.send_keys(Keys.RETURN)                      # submit the login form
driver.get("http://www.example.com/path/of/video/page.html")
WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.ID, "the-element-id")))
src = driver.page_source
parser = BeautifulSoup(src,"lxml")
list_of_attributes = {"class" : "some-class", "name" : "some-name"}
tag = parser.findAll('video',attrs=list_of_attributes)
n = 0
url = tag[n]['src']
wget.download(url,out="path/to/output/file")
driver.close()
Conclusion
What we did in this tutorial is create a small script that automates the process of downloading a dynamically loaded file. The above script works for a single URL. If you want to download multiple files, you would need to manually grab the tags and dynamic content information for each website, store them in a json or xml file, and then read that file and pass it through a for loop. I created another small script that does this job. It’s not foolproof, but it is a good starting point to get an idea of how to do it. It also accepts command line arguments that let you either download a single video file or take in a file that contains all the video URLs. You can get the script here : https://gitlab.com/snippets/8921.
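As a rough illustration of that loop, assuming a plain text file named video_pages.txt with one page URL per line and a hypothetical download_video() helper that wraps Steps 2 to 5:
with open("video_pages.txt") as url_file:
    for page_url in url_file:
        download_video(page_url.strip())  # hypothetical helper wrapping Steps 2-5 for one page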
Please keep in mind that you will encounter some websites that are so secure that, no matter what you do, you just cannot download the video or file. This is because they are designed in such a way that the URLs for the files are generated with unique ids and embedded into the site.
I hope you find this tutorial useful and learned something new.
Source : https://www.techbeamers.com/locate-elements-selenium-python/
https://dvenkatsagar.github.io/tutorials/python/2015/10/26/ddlv/