Creating a New Datapage

Now that you have installed WSP+ 4.0, read about data extraction strategy, and become familiar with the WSP+ 4.0 Console, it’s time to start extracting data. The tool you will use is the Datapage, WSP+ 4.0’s scraping engine. A Datapage is a template that defines what data you want to get and where you want to store it. WSP+ 4.0 stores this template so that the same extraction can be run over and over again against the same site-specific data. As its name suggests, a Datapage maps directly to a specific webpage layout. The Datapage wizard will guide you through the process.

A Datapage consists of two main components: the Dataset and the Field.

A Dataset maps groups of data on a webpage to a table (or worksheet) and its rows in a database (or spreadsheet). A Dataset determines the number of records, or rows, that you are going to extract. For example, an ecommerce product catalog with 10 products per page has 10 records per page, while a movie review site with 1 movie review per page has 1 record per page. A Dataset also defines where the data is stored: the database name is set in the Dataset properties, and the table is named after the Dataset by default.

A Field represents a column of data in the record. For example, in the ecommerce product catalog, all the information for one product makes up a record. In that record you would have fields such as SKU, Product Name, Product Description, and Price. If you are extracting contact information, example fields would be Name, Address, Phone Number, City, State, and Zip.
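
The relationship between a Datapage, its Datasets, and their Fields can be pictured with a small Python sketch. This is only an illustration of the concepts described above and is not how WSP+ 4.0 is implemented or configured: the class names, the to_rows helper, the database name "catalog.db", and the sample product values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Field:
    """One column of a record, for example SKU or Price (illustrative only)."""
    name: str


@dataclass
class Dataset:
    """A repeating group of data on a page, stored as rows in one table."""
    name: str
    database: str
    fields: List[Field] = field(default_factory=list)
    table: str = ""

    def __post_init__(self) -> None:
        # Mirror the default described in the text: the table takes the Dataset name.
        if not self.table:
            self.table = self.name

    def to_rows(self, records: List[Dict[str, str]]) -> List[List[str]]:
        # One row per record, one cell per Field; missing values become "".
        columns = [f.name for f in self.fields]
        return [[record.get(col, "") for col in columns] for record in records]


@dataclass
class Datapage:
    """A template for one webpage layout, holding one or more Datasets."""
    name: str
    datasets: List[Dataset] = field(default_factory=list)


# Hypothetical catalog page with two product records.
products = Dataset(
    name="Products",
    database="catalog.db",
    fields=[Field("SKU"), Field("Product Name"), Field("Price")],
)
page = Datapage(name="CatalogPage", datasets=[products])

rows = products.to_rows([
    {"SKU": "A-100", "Product Name": "Widget", "Price": "9.99"},
    {"SKU": "A-200", "Product Name": "Gadget"},
])
```

Here the two records become two rows in a table named "Products", and each Field supplies one column, which is the same mapping the Dataset and Field perform for a real page.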

For intermediate to advanced users, web pages with compound data structures can be extracted by creating multiple Datasets in each Datapage. Datapages can also be linked together to "walk" entire site hierarchies: the URL links extracted from one page are used as the source list for another Package Task, which uses a second Datapage.
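
The idea of linking Datapages can be sketched in Python as follows. This is a rough illustration using only the Python standard library, not WSP+ 4.0's own mechanism: the LinkCollector class, the extract_links function, the sample HTML, and the example.com URLs are all assumptions made for the example. The first stage collects links from a listing page, and the resulting list is what a second stage would use as its source list.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for key, value in attrs:
            if key == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> List[str]:
    """Stage one: pull the detail-page URLs out of a listing page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical listing page; in practice this would be the fetched page source.
listing_html = """
<ul>
  <li><a href="/products/1">Widget</a></li>
  <li><a href="/products/2">Gadget</a></li>
</ul>
"""

# Stage two would visit each URL in this list with a second template.
source_list = extract_links(listing_html, "https://example.com/catalog")
```

Each URL in source_list stands for one page that the second Datapage would be applied to, which is how a listing page and its detail pages can be walked in sequence.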

The following is a step-by-step tutorial on how to create a Datapage using the New Datapage Wizard.