Websites created on the EasyWeb platform are secure, yet easy to develop. This tutorial demonstrates how a developer can create a website that uses an uploaded comma-separated file to display temperature data to authorized users.
When the demo website is selected, the Console for the website is displayed. The developer can either visit the user-facing website or start managing it. The developer wants to update the starting page, so they select "Manage website pages".
Before the developer can modify code, a comma-separated file containing location temperatures must be uploaded. That can be done via the "Manage global files" console. The file data.csv has the following content:
The main difference between Scrapy and other commonly used libraries, such as Requests / BeautifulSoup, is that it is opinionated, meaning it comes with a set of rules and conventions, which allow you to solve the usual web scraping problems in an elegant way.
In this tutorial we will create two different web scrapers, a simple one that will extract data from an E-commerce product page, and a more "complex" one that will scrape an entire E-commerce catalog!
Scrapy comes with a built-in shell that helps you test and debug your scraping code in real time. You can quickly check your XPath expressions / CSS selectors with it. It's a very cool tool to write your web scrapers and I always use it!
In this case, there isn't any robots.txt, which is why we got a 404 HTTP code. If there were a robots.txt, Scrapy would by default follow its rule set. You can disable this behavior by changing ROBOTSTXT_OBEY in product_scraper/settings.py:
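A minimal version of that setting could look like this (True is Scrapy's default):

```python
# product_scraper/settings.py

# Scrapy obeys robots.txt by default; set this to False to ignore it.
ROBOTSTXT_OBEY = False
```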
You may wonder why the parse method can return so many different objects. It's for flexibility. Let's say you want to scrape an E-commerce website that doesn't have any sitemap. You could start by scraping the product categories, so this would be a first parse method.
This method would then yield a Request object for each product category, pointing to a new callback method, parse2(). For each category you would also need to handle pagination. Then, for each product, a third parse method would do the actual scraping and generate an Item.
With Scrapy you can return the scraped data as a simple Python dictionary, but it is a good idea to use the built-in Scrapy Item class. It's a simple container for our scraped data, and Scrapy will look at this item's fields for many things, like exporting the data to different formats (JSON / CSV...), the item pipeline, etc.
You can add several XPath expressions to the same Item field, and Scrapy will test them sequentially. By default, if more than one XPath expression matches successfully, Scrapy will load all of the results into a list. You can find many examples of input and output processors in the Scrapy documentation.
It's really useful when you need to transform or clean the data you extract. For example: extracting the currency from a price, converting one unit into another (centimeters to meters, Celsius to Fahrenheit)...
I also added a price_in field, which attaches an input processor that deletes the dollar sign from the price. I'm using MapCompose, a built-in processor that takes one or several functions to be executed sequentially. You can attach as many functions as you like. The convention is to add _in or _out to your Item field's name to attach an input or output processor to it.
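To make the sequential behavior concrete, here is a plain-Python stand-in that mimics what MapCompose does (the real class ships with Scrapy; this sketch only reproduces its "apply each function, in order, to every value" logic):

```python
def map_compose(*functions):
    """Minimal stand-in for Scrapy's MapCompose: apply each function,
    in order, to every extracted value."""
    def processor(values):
        for function in functions:
            values = [function(v) for v in values]
        return values
    return processor


def remove_dollar_sign(value):
    return value.replace("$", "")


# Equivalent in spirit to: price_in = MapCompose(remove_dollar_sign, float)
price_in = map_compose(remove_dollar_sign, float)

print(price_in(["$24.99", "$5.00"]))  # [24.99, 5.0]
```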
Pipelines are represented by plain classes which implement a process_item method. When your spider runs, it will call that method for each product item it earlier created and each configured pipeline instance will perform its validation and post-process steps on that item.
Should there be a price, it will first try to sanitize the price string and convert it to a float value. Finally it will check if the value is within our (arbitrary) limits of USD 10 and USD 100. If all that checks out, it will store the sanitized price value in our item and pass it back to the spider; otherwise it will throw a DropItem exception to signal to our spider that it should drop this specific item.
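The validation steps above can be sketched as follows. The class name PriceValidatorPipeline and the 10–100 USD limits come from the article; in a real project DropItem comes from scrapy.exceptions, but a stand-in is defined here so the snippet is self-contained:

```python
class DropItem(Exception):
    """Stand-in for scrapy.exceptions.DropItem."""


class PriceValidatorPipeline:
    MIN_PRICE = 10.0   # arbitrary lower limit, in USD
    MAX_PRICE = 100.0  # arbitrary upper limit, in USD

    def process_item(self, item, spider):
        raw_price = item.get("price")
        if raw_price is None:
            raise DropItem("Item has no price")
        # Sanitize the price string and convert it to a float.
        price = float(str(raw_price).replace("$", "").strip())
        if not self.MIN_PRICE <= price <= self.MAX_PRICE:
            raise DropItem(f"Price out of range: {price}")
        item["price"] = price
        return item
```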
Voilà, we have configured our pipeline class and the spider should now call it for each item and validate the price according to our business logic. All right, let's take that a step further and add a second pipeline to persist the data in a database, shall we?
For our example, we pick MySQL as our database. As always, Python has got us covered, of course, and there's a lovely Python MySQL package, which will provide us with seamless access to our database. Let's install it first with pip.
We first implemented the from_crawler() method, which allows us to initialize our pipeline from a crawler context and access the crawler settings. In our example, we need it to get access to the database connection information, which we have just specified in product_scraper/settings.py.
All right, from_crawler() initializes our pipeline and returns the instance to the crawler. Next, open_spider will be called, where we initialize our MySQL instance and create the database connection. Lovely!
Now, we just sit and wait for our spider to call process_item() for each item, just as it did earlier with PriceValidatorPipeline. We then take the item and run an INSERT statement to store it in our products table. Pretty straightforward, isn't it?
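Putting from_crawler(), open_spider(), and process_item() together, the database pipeline could look like the sketch below. The class name, the MYSQL_* settings keys, the products table schema, and the choice of the mysql-connector-python driver are all assumptions for illustration (the article doesn't name the exact package here); the driver import is placed inside open_spider() so the module loads even without it installed:

```python
class MySQLStorePipeline:
    # Settings keys, table name, and driver choice are assumptions.

    def __init__(self, host, user, password, database):
        self.host = host
        self.user = user
        self.password = password
        self.database = database

    @classmethod
    def from_crawler(cls, crawler):
        # Read the connection details from product_scraper/settings.py.
        settings = crawler.settings
        return cls(
            host=settings.get("MYSQL_HOST"),
            user=settings.get("MYSQL_USER"),
            password=settings.get("MYSQL_PASSWORD"),
            database=settings.get("MYSQL_DATABASE"),
        )

    def open_spider(self, spider):
        # Imported here so the module loads without the driver installed.
        import mysql.connector
        self.connection = mysql.connector.connect(
            host=self.host,
            user=self.user,
            password=self.password,
            database=self.database,
        )
        self.cursor = self.connection.cursor()

    def close_spider(self, spider):
        self.cursor.close()
        self.connection.close()

    def process_item(self, item, spider):
        # Persist the already-validated item in the products table.
        self.cursor.execute(
            "INSERT INTO products (name, price) VALUES (%s, %s)",
            (item.get("name"), item.get("price")),
        )
        self.connection.commit()
        return item
```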
When we now run our crawler, it should scrape the specified pages, pass the data to our price validator pipeline, where the price gets validated and properly converted to a number, and eventually pass everything to our MySQL pipeline, which will persist the product in our database.
Just like our original spider, the CrawlSpider will crawl the target website by starting with a start_urls list. Then for each URL, it will extract all the links based on a list of Rules. In our case it's easy: products share the same URL pattern /products/product_title, so we only need to filter for these URLs.
One thing to still keep in mind is rate limits and bot detection. Many sites use such features to actively stop scrapers from accessing their data. At ScrapingBee, we took all these issues into account and provide a platform to handle them in an easy, elegant, and scalable fashion.
In this post we saw a general overview of how to scrape the web with Scrapy and how it can solve your most common web scraping challenges. Of course we only scratched the surface and there are many more interesting things to explore, like middlewares, exporters, extensions, and pipelines!
If you've been doing web scraping more "manually" with tools like BeautifulSoup / Requests, it's easy to understand how Scrapy can help save time and build more maintainable scrapers. I hope you liked this Scrapy tutorial and that it will motivate you to experiment with it.
For further reading don't hesitate to look at the great Scrapy documentation. We have also published our custom integration with Scrapy, which allows you to execute JavaScript with Scrapy, so please feel free to check it out and provide us with any feedback you may have.
Forget the Java and the HTML!*
Just Click and Type to Create a
Really Cool Web Page Using
Netscape Composer:
A Tutorial
by gwynethanne bronwynne jones
library media/technology specialist
First of all I gotta say that there's a really good tutorial buried deep on Netscape's main site but it's sorta hard to find if you don't have the link. But, if you don't have it, hey... why not use mine? I frequently quote from it, plus I utilize a lot of the graphics that they use... (with full credit, of course!) Besides, you're already here, aren't you? Now, this tutorial is geared towards Netscape Composer 4.7 and higher, but that might change 'cause it seems like Netscape is coming out with new versions of Communicator every day... so stay tuned.
Table of Contents
"Composer helps you create and edit your own web pages and place (or "post") them on the web. Composer looks and acts like a word processing program, and is just as easy to use. Behind the scenes of each page on the web is code called HTML, which stands for Hypertext Markup Language. This name means that HTML is the computer language used for marking up documents with links in them. The code, called HTML source code, tells the web browser how a page should look and act. You don't need to know how to write HTML code to make a web page. Composer enables you to work in a WYSIWYG (what you see is what you get) environment - just like a word processor - and it automatically generates the HTML code you need to make your page do what you want it to do. The basic elements of a web page are text, pictures, and links. You can put all of these and more on your own pages, and then publish those pages on the web."
Saving, collecting, and organizing bookmarks within Netscape Communicator is a whole lot of fun... Really!
I mean, have you ever found a really *perfect* site only to go elsewhere and not be able to find your way back again? Doh!
The bookmark is your virtual breadcrumb to leave behind you so that you can go back again to a great site.
Well, here's how to organize and save those bookmarks so that you can always find that perfect site for that perfect lesson! And if you save them to a disk, you can share them with a colleague, take them to a conference, or even create a quick web page!
After you save several bookmarks (by going up to Bookmarks and choosing Add Bookmark) of the same topic you probably want to organize them and put them into a folder. Believe me, this is very helpful, easy to use, and it looks great.
To do this go up to Bookmarks on your toolbar and select Edit Bookmarks (or you can use the shortcut Open Apple and the letter "B"). This will open up your bookmarks file. (see bookmark screen capture)
How to Get Rid of a Bookmark:
"Many of the pages you see when you browse the web were made by ordinary people like you. You don't have to be a computer whiz - you can use Netscape Composer, part of Netscape Communicator, to create, edit, and publish your own web pages."