1. Locate Elements By Name
It is standard practice to define unique ids for web elements in HTML code. However, there may be cases where these unique identifiers are not present but name attributes are; we can then use the name to select a web element.
Here is a code snippet that demonstrates the use of the <find_element_by_name> method. The code below opens Google in the browser and performs a text search.
from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://google.com")
driver.maximize_window()
time.sleep(5)
inputElement = driver.find_element_by_name("q")  # the Google search box has the name attribute "q"
inputElement.send_keys("Techbeamers")            # type the search text
inputElement.submit()                            # submit the search form
time.sleep(20)
driver.close()
If the HTML code has more than one web element with the same “name” attribute value, this method selects the first matching web element. If no match occurs, a NoSuchElementException is raised.
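Note that <find_element_by_name> (singular) returns only the first match, while <find_elements_by_name> (plural) returns every match as a list. Here is a minimal sketch of both behaviours, assuming the same driver session as above:
from selenium.common.exceptions import NoSuchElementException
# the plural form returns a list of every element whose name attribute is "q" (an empty list if none)
all_matches = driver.find_elements_by_name("q")
print("Elements named 'q':", len(all_matches))
# the singular form raises NoSuchElementException when nothing matches
try:
    first_match = driver.find_element_by_name("q")
except NoSuchElementException:
    print("No element with name 'q' was found")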
2. Locate Elements By ID
We use this method when the id attribute of the element is available. It is, in fact, the most reliable and fastest way to locate a particular web element on an HTML page. An id should always be unique for any object on a web page, so we should prefer the id attribute for locating elements over the other available options.
Here is a code snippet that demonstrates the use of the <find_element_by_id> method. The code below opens Google in the browser and performs a text search.
from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://google.com")
driver.maximize_window()
time.sleep(5)
inputElement = driver.find_element_by_id("lst-ib")
inputElement.send_keys("Techbeamers")
inputElement.submit()
time.sleep(20)
driver.close()
If more than one web element has the same value for the id attribute, this method returns the first element whose id matches. It raises a NoSuchElementException if there is no match.
3. Locate Elements By Link Text
We use this method to select hyperlinks from a web page. If multiple elements have the same link text, this method selects the first match. It works only on links (hyperlinks), which is why we call it the <Link Text locator>.
Here is a code snippet that demonstrates the use of the <find_element_by_link_text> method. The code below opens Google in the browser, performs a text search, and then opens a hyperlink with the link text “Python Tutorial.”
from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://google.com")
driver.maximize_window()
time.sleep(5)
inputElement = driver.find_element_by_name("q")
inputElement.send_keys("Techbeamers")
inputElement.submit()
time.sleep(5)
elem = driver.find_element_by_link_text("Python Tutorial")  # locate the hyperlink by its exact link text
elem.click()                                                # open the link
time.sleep(20)
driver.close()
4. Locate Elements By Partial Link Text
For locating an element with the link text method, we need to provide the complete link text. The partial link text method, however, lets us select a hyperlink by providing only a part of the link text.
If we use the partial link text method in the above example, the code becomes:
from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://google.com")
driver.maximize_window()
time.sleep(5)
inputElement = driver.find_element_by_name("q")
inputElement.send_keys("Techbeamers")
inputElement.submit()
time.sleep(5)
elem = driver.find_element_by_partial_link_text("Python")
elem.click()
time.sleep(20)
driver.close()
This code opens the same Python tutorial web page as the previous example.
5. Locate Elements By Xpath
Another useful method to locate an element is using an XPath expression. We use XPath when a proper id or name attribute is not present in the code to access that element.
XPath allows locating an element using either an Absolute XPath (not the preferred way) or a Relative XPath. An Absolute XPath determines the location of an object from the root (html). However, it is not an efficient method: even a slight change in the web page code changes the Absolute XPath, and the webdriver may no longer be able to locate the element with the old expression.
With a Relative XPath, we first locate a nearby element for which an id or name attribute is available (ideally a parent element) and then express the XPath of the target element relative to it. The chances of such an XPath changing are much lower, which makes our tests more robust.
Thus, both of these approaches help us locate an element that does not have an id or name attribute of its own.
XPath locators can also use attributes other than id and name for locating the element, as shown in the example at the end of this section.
To understand the Absolute and Relative path, let’s take the following HTML code for user SignUp.
<html>
<body>
<form id="signUpForm">
<input name="emailId/mobileNo" type="text" />
<input name="password" type="password" />
<input name="continue" type="submit" value="SignUp" />
<input name="continue" type="button" value="Clear" />
</form>
</body>
</html>
Now we will try locating different elements present on the page using XPath.
Here are the XPaths that will help Selenium Webdriver to locate the form element.
form_element = driver.find_element_by_xpath("/html/body/form[1]")
This is the Absolute XPath. It will break if we make any change to the HTML structure.
The following are the Relative XPaths.
form_element = driver.find_element_by_xpath("//form[1]")
It selects the first form element on the page.
form_element = driver.find_element_by_xpath("//form[@id='signUpForm']")
It uses the id attribute with the value “signUpForm” to locate the element.
We can locate the <emailId/mobileNo> element in a similar manner:
email_input = driver.find_element_by_xpath("//form[input/@name='emailId/mobileNo']")
It returns the first form element that has an ‘input’ child whose ‘name’ attribute has the value ‘emailId/mobileNo’.
email_input = driver.find_element_by_xpath("//form[@id='signUpForm']/input[1]")
It selects the first ‘input’ child of the form element whose ‘id’ attribute has the value ‘signUpForm’.
email_input = driver.find_element_by_xpath("//input[@name='emailId/mobileNo']")
It goes directly to the ‘input’ element whose ‘name’ attribute has the value ‘emailId/mobileNo’.
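As noted earlier, XPath locators can also use attributes other than id and name. For example, against the SignUp form above, the two buttons can be located through their value attributes. A small sketch, assuming the same driver object:
signup_button = driver.find_element_by_xpath("//input[@value='SignUp']")                    # the submit button
clear_button = driver.find_element_by_xpath("//input[@type='button' and @value='Clear']")   # the Clear button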
6. Locate Elements By CSS Selector
This method allows you to locate elements using a CSS selector, for example one based on the class attribute.
It returns the first element matching the selector. If the search fails, the method throws a NoSuchElementException.
For illustration, assume the below HTML code:
<html>
<body>
<div class="round-button">Click Here</div>
</body>
</html>
The above code has a single div element with the class “round-button”. To refer to a CSS class, you use the dot (.) symbol. The syntax below represents the CSS selector for the “round-button” class.
div.round-button
With the below code, you can locate the target div element following the CSS locator strategy.
get_div = driver.find_element_by_css_selector('div.round-button')
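CSS selectors are not limited to class names. Here are a few other common forms, sketched against the SignUp form from the XPath section (assuming that form is present on the current page):
form_by_id = driver.find_element_by_css_selector("form#signUpForm")                    # by id, using the # symbol
email_by_attr = driver.find_element_by_css_selector("input[name='emailId/mobileNo']")  # by an arbitrary attribute
first_input = driver.find_element_by_css_selector("form#signUpForm > input")           # first input child of the form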
7. Locate Elements By Tagname
This method allows you to find a web element by specifying its tag name.
It returns the first element with the specified tag name. If the search doesn’t succeed, the method throws a NoSuchElementException.
For illustration, assume the below HTML code:
<html>
<head>
<title>Hello Python</title>
</head>
<body>
<p>Learn test automation using Python</p>
</body>
</html>
The above code has a title tag with some text. You can find it using the below code.
title_element = driver.find_element_by_tag_name('title')
8. Locate Elements By Classname
This method allows you to locate elements based on the class name.
It will return the first element with the given class name. If the search doesn’t succeed, the method will throw the NoSuchElementException.
For illustration, assume the below HTML code:
<html>
<body>
<div class="round-button">Click Here</div>
</body>
</html>
The above code has a div element with the class “round-button”. You can find it using the code below.
get_div = driver.find_element_by_class_name('round-button')
Quick Wrap Up – Locate Elements Using Selenium Python
We hope that you now know how to use the locators and find elements using them.
If you found this tutorial useful, do share it with your colleagues. Also, connect to our social media (Facebook/Twitter) accounts to receive timely updates.
Best,
Introduction
When you surf online, you occasionally visit websites that show content such as videos or audio files which are loaded dynamically. This is usually done through AJAX calls or sessions where the URLs for these files are generated in a way that prevents you from saving them by normal means. For example, say you visit a web page and a video is loaded through a player like jwplayer after a second or two. You want to save it, but you can’t simply right-click and save it because the player doesn’t offer that option. Even a command line tool like wget or youtube-dl might not work, because some JavaScript functions still need to be executed before the video URL is generated on the site. So the question is: how would you download such files to your computer?
Well, there are different ways to download them. One way is to use a browser plugin or extension like Grab Any Media or FlashGot, which can handle such requests and might let you download the file. But suppose you want to download an entire set of videos that are loaded through different AJAX calls. The plugins or extensions might still work, but manually downloading each file takes a long time. A better method is to write a small script that automates the process. This tutorial aims to show you how to use the Selenium web driver to do simple tasks like downloading dynamically loaded content from a website using Python.
Prerequisites
For this tutorial, you need at least some knowledge of how to program in Python. If you don’t know anything about it, I would suggest checking out this site : http://www.tutorialspoint.com/python/. It is a great starting point for learning a new language, and you can quickly pick up the basics.
So, before we start, I would like to give a brief introduction to the modules I am going to use in my Python script. The system I’m using is Ubuntu Studio 14.04. To install the modules, you can use python-pip, and you might also need administrative privileges. Here are the modules :
- Selenium Web Driver : The Selenium framework is a suite of tools that can be used to test web applications and also automate browser tasks. Using its API, you can do simple tasks like automating administration work for a website or some website-related maintenance by sending commands to the browser. It is supported in various programming languages like Python, Java, JavaScript, PHP, C#, Perl, and Ruby. To install this framework for Python, just type the following command in the terminal :
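pip install selenium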
- BeautifulSoup : The bs4 module is an HTML/XML parser that does a great job at screen-scraping elements and getting information like tag names, attributes, and values. It also has a set of methods that allow you to do things like match certain instances of text and retrieve all the elements that contain it. You can install this module like this :
pip install beautifulsoup4
- Python-Wget : This module is a Python port of the wget command-line program. It’s easy to set up, and you can quickly download videos or files to your system using its API. You can also follow the “traditional” method of downloading files, like using the standard urllib module or making a subprocess call to the wget command-line program, but for this tutorial I will be using this module to get the job done. Here is how you install it :
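pip install wget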
- PhantomJS (Optional) : PhantomJS is a headless browser (it doesn’t have a front-end GUI and everything works at the backend) that is used for web page interaction. It is similar to the regular Selenium web drivers, with the difference that it is headless. For this tutorial, I will be using the basic Firefox web driver via Selenium, but you can try PhantomJS if you do not want a browser to pop up every time the script runs. You can download the phantomjs executable from its homepage, but I advise you to use an older version, as the latest one might not be compatible with the selenium module.
The Concept
The idea is basically :
- To get the web page using the selenium web driver.
- Parse and extract the video or audio urls from the html page using BeautifulSoup.
- Download the files to the system using wget.
Step 1
The first step is to import the necessary modules in the Python script or shell. This can be done as shown below :
import os
import sys
import wget
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
From the selenium module, we import the following things :
- webdriver : This submodule has the functionality to initialize the various browsers like Chrome, Firefox, IE, etc.
- Keys : This allows us to send key presses or inputs to the web driver.
- WebDriverWait : The WebDriverWait class is similar to the sleep function of the time module, but instead of pausing unconditionally it tells the webdriver to wait up to "n" seconds for a condition to be satisfied.
- expected_conditions : These are common conditions that the web driver can wait for, for example that an element appears in the browser or that the title of the page is some specific name. Here is a list of all the available conditions (a small usage sketch follows this list) :
- title_is
- title_contains
- presence_of_element_located
- visibility_of_element_located
- visibility_of
- presence_of_all_elements_located
- text_to_be_present_in_element
- text_to_be_present_in_element_value
- frame_to_be_available_and_switch_to_it
- invisibility_of_element_located
- element_to_be_clickable - the element is displayed and enabled.
- staleness_of
- element_to_be_selected
- element_located_to_be_selected
- element_selection_state_to_be
- element_located_selection_state_to_be
- alert_is_present
- By : The By class allows us to select HTML elements in the web driver by class name, id, XPath, name, link text, etc.
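As a quick illustration of how these imports fit together, the following one-liner (a sketch assuming a driver object already exists) waits up to 10 seconds for the page title to contain the word "Example":
WebDriverWait(driver, 10).until(EC.title_contains("Example"))  # raises TimeoutException if the title never matches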
Step 2
Now, according to the concept, for a single video URL that is loaded using an AJAX call, we first need to get the web page using the Selenium webdriver. This is done as follows :
driver = webdriver.Firefox() # if you want to use chrome, replace Firefox() with Chrome()
driver.get("http://www.example.com") # load the web page
When you execute this in the Python shell or via the script (after you import the modules), you will observe that a Firefox browser pops up and the page is loaded into it. If you want to use PhantomJS and stop the browser from popping up, just replace webdriver.Firefox() with webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true']).
Step 3
Here is the tricky part: you need to extract the video URLs from the web page. As every website is designed differently, there is no universal solution. You would need to manually check for a pattern or for the video element that is dynamically loaded, which can be done by looking at the browser’s developer console. From the previously mentioned scenario, let’s say the video is dynamically loaded using an AJAX call about a second after you visit the website. Then you would need to wait till the video is loaded and only then get the element. For that, you can write the script in this manner :
WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.ID, "the-element-id")))
src = driver.page_source
When you execute these two lines in the Python shell, the browser waits up to 50 seconds for the element with the specified id to appear or become visible on the screen, and then the HTML source is retrieved. If the element doesn’t have an ID, there are other ways you can get the specific element/tag. Let’s say the element has a certain class; then you can just replace By.ID with By.CLASS_NAME. Here is the entire list of attributes for the By class object :
- ID
- XPATH
- LINK_TEXT
- PARTIAL_LINK_TEXT
- NAME
- TAG_NAME
- CLASS_NAME
- CSS_SELECTOR
Step 4
Once you get the HTML source, you would need to parse it and extract the video tag from it. This is done as shown below :
parser = BeautifulSoup(src,"lxml")
list_of_attributes = {"class" : "some-class", "name" : "some-name"}
tag = parser.findAll('video',attrs=list_of_attributes)
The list_of_attributes is a Python dictionary with (key, value) pairs that specify the tag’s attributes. The parser.findAll() call searches the entire HTML source and gets the video tags with the specified attributes. This returns a list of matching tags, which is stored in the tag variable.
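If you are not sure which entry in tag corresponds to the video you want, you can first loop over the list and print each source URL. This sketch assumes each matched video tag carries its URL directly in a src attribute (some players nest it inside child <source> tags instead):
for index, video in enumerate(tag):
    print(index, video.get('src'))  # index and source URL of each matched video tag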
Step 5
The next step is to get the url from the video tag and finally download it using wget. We can do this by writing the script in this manner :
n = 0 # Specify the index of video element in the web page
url = tag[n]['src'] # get the src attribute of the video
wget.download(url,out="path/to/output/file") # download the video
Depending on the number of videos loaded in the web page, you can specify which video you want to download by changing the value of n.
Step 6
Finally, once the job is done, we close the driver :
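driver.close()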
The script
Now, when you put all the pieces together, along with some additional functionality to log in to a website, you will get something like this :
import os
import sys
import wget
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.get("http://www.example.com")
elem = driver.find_element_by_id("email")        # locate the login email field
elem.send_keys("youremail@example.com")          # replace with your login email
elem = driver.find_element_by_id("pwd")          # locate the password field
elem.send_keys("userpwd")                        # replace with your password
elem.send_keys(Keys.RETURN)                      # submit the login form
driver.get("http://www.example.com/path/of/video/page.html")
WebDriverWait(driver, 50).until(EC.visibility_of_element_located((By.ID, "the-element-id")))
src = driver.page_source
parser = BeautifulSoup(src,"lxml")
list_of_attributes = {"class" : "some-class", "name" : "some-name"}
tag = parser.findAll('video',attrs=list_of_attributes)
n = 0
url = tag[n]['src']
wget.download(url,out="path/to/output/file")
driver.close()
Conclusion
What we did in this tutorial is create a small script that automates the process of downloading a dynamically loaded file. The above script works for a single URL. If you want to download multiple files, you would need to manually grab the tags and dynamic content information for each website, store them in a json or xml file, and then read that file and pass it through a for loop. I created another small script that does this job. It’s not foolproof, but it is a good starting point to get an idea of how to do it. It also accepts command line arguments that let you either download a single video file or take in a file that contains all the video URLs. You can get the script here : https://gitlab.com/snippets/8921.
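As a rough illustration of that loop, assuming a plain text file named video_pages.txt with one page URL per line and a hypothetical download_video() helper that wraps Steps 2 to 5:
with open("video_pages.txt") as url_file:
    for page_url in url_file:
        download_video(page_url.strip())  # hypothetical helper wrapping Steps 2-5 for one page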
Please keep in mind that you will encounter some websites that are so secure that, no matter what you do, you just cannot download the video or file. This is because they are designed in such a way that the URLs for the files are generated with unique ids and embedded into the site.
I hope you find this tutorial useful and learned something new.
Source : https://www.techbeamers.com/locate-elements-selenium-python/
https://dvenkatsagar.github.io/tutorials/python/2015/10/26/ddlv/