A download task is a list of HTML pages or other files to be downloaded (and potentially processed by a Datapage extraction template). The verb "download" can be confusing, because the source of a download task does not have to be a location on the web. A download task can use any combination of Windows file paths, Windows folders (including subfolders), and Internet URLs as the source for its file list. This is important in the context of Packages, because a web crawler does a good job of downloading and organizing files, but it cannot link directly to a Datapage extraction template. So a web crawler is often used to download an entire website and categorize it by file name, and a download task is then used to extract the data to a spreadsheet or database.
It is important to note that a Datapage can only extract information from an HTML file. You can, however, also use a download task without a Datapage linked to it as a tool to download files and images into a local folder.
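To make the idea of a mixed source list concrete, here is a minimal Python sketch of how file paths, folders (with subfolders), and URLs could be expanded into one flat list. It is only an illustration: the function name build_source_list is hypothetical, and this is not how the product itself builds its list.

```python
from pathlib import Path
from urllib.parse import urlparse


def build_source_list(sources):
    """Expand a mixed list of sources into one flat list (hypothetical helper)."""
    items = []
    for source in sources:
        # Assumption: anything with an http/https scheme is an Internet URL.
        if urlparse(source).scheme in ("http", "https"):
            items.append(source)
            continue
        path = Path(source)
        if path.is_dir():
            # A folder contributes every file beneath it, subfolders included.
            items.extend(str(p) for p in sorted(path.rglob("*")) if p.is_file())
        elif path.is_file():
            # A single file path is added as-is.
            items.append(str(path))
    return items
```

A folder and a file inside that folder can both appear in the input, in which case the file is listed twice; a real tool might choose to remove such repeats.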
In the Task Type box, select "Download a list of web pages", check the "Extract content to a…" check box, and click "Next".
The Datapage Extraction Template box appears. From this box you can build a new Datapage or select a previously created Datapage. We are going to use a previously created Datapage.
Click "Select Datapage".
This brings up a window with a list of the current Datapages in the Console. We are going to browse to the Datapage we created earlier in the Datapage section. It is called Yahoo Finance Quotes. Select it and click "OK".
Notice that the selected Datapage is listed in the Select Datapage line.
The "Clear previous results from the destination Datastore before loading new data" check box clears any existing data from the database tables or spreadsheets defined in the Datapage’s Dataset(s). Remember that in the Dataset Wizard section we selected a destination Datastore. Our example had one Dataset for STOCK_PRICES that saved to Excel and one Dataset for HEADLINES that saved to SQL. If there are extraction results in these tables and spreadsheets, checking this box will delete them. Remember these points:
1. If you are adding Rows/Records of new extracted data to your table or spreadsheet of existing Rows/Records, do not check this box.
2. If you are replacing the existing Rows/Records of extracted data in your table or spreadsheet with new extracted data, check this box.
3. You can change this box’s selection at any time from the Datapage tab in the Package’s edit Task properties.
4. If this box is checked, the existing data in your destination Datastore will be erased.
5. If this box is unchecked and you extract the same set of data with the Datapage again, you will get duplicate Rows/Records.
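The difference between the checked and unchecked behavior can be sketched with a small Python example. It uses an in-memory SQLite table as a stand-in for the destination Datastore; the table name stock_prices, its columns, and the load_rows helper are assumptions made for illustration and do not describe the product's internals.

```python
import sqlite3


def load_rows(conn, rows, clear_previous):
    """Load rows into a table, optionally clearing it first (hypothetical helper)."""
    conn.execute("CREATE TABLE IF NOT EXISTS stock_prices (symbol TEXT, price REAL)")
    if clear_previous:
        # Box checked: existing rows are erased before the new load.
        conn.execute("DELETE FROM stock_prices")
    conn.executemany("INSERT INTO stock_prices VALUES (?, ?)", rows)
    conn.commit()
    # Return the number of rows now in the table.
    return conn.execute("SELECT COUNT(*) FROM stock_prices").fetchone()[0]


conn = sqlite3.connect(":memory:")
rows = [("AAA", 10.0), ("BBB", 20.0)]  # made-up sample values

load_rows(conn, rows, clear_previous=False)
appended = load_rows(conn, rows, clear_previous=False)  # same data again: 4 rows, duplicates
replaced = load_rows(conn, rows, clear_previous=True)   # table cleared first: 2 rows
```

Loading the same two rows twice with the box unchecked leaves four rows in the table, which is the duplication described in point 5.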
A sample idea for our Package is to collect information about the ticker symbols used in our example on a daily basis for comparison over time. Earlier we selected Metadata that included Package Start Time. We can put all of our extracted data in one table/spreadsheet and sort it by Symbol and then by time. This will be easier than opening a spreadsheet from each day to compare changes. We will leave this box unchecked.
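As a rough illustration of the sort described above, the following Python sketch orders accumulated rows by symbol and then by Package Start Time. The field names and all values are made-up sample data, not output from the example Datapage.

```python
from datetime import datetime

# Made-up rows, as if two daily runs had been appended to one table.
rows = [
    {"symbol": "BBB", "package_start_time": datetime(2024, 1, 2, 9, 0), "price": 21.0},
    {"symbol": "AAA", "package_start_time": datetime(2024, 1, 2, 9, 0), "price": 11.0},
    {"symbol": "AAA", "package_start_time": datetime(2024, 1, 1, 9, 0), "price": 10.0},
    {"symbol": "BBB", "package_start_time": datetime(2024, 1, 1, 9, 0), "price": 20.0},
]

# Sort by symbol first, then by run time, so each symbol's history is contiguous.
ordered = sorted(rows, key=lambda r: (r["symbol"], r["package_start_time"]))
prices = [r["price"] for r in ordered]
```

After sorting, each symbol's rows sit together in chronological order, so day-to-day changes can be read straight down the column.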
When using an Excel sheet as an output, if you want an individual workbook for each day, the best way is to rename the Excel file after each day’s extraction, adding the date of extraction to the file name. The Package will then recreate the Excel sheet with the Dataset Table Name each time the Package runs.
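The rename step could be automated outside the product with a short script. The sketch below is one possible approach in Python; the function name archive_workbook and the file name pattern (name, underscore, ISO date) are assumptions chosen for illustration.

```python
from datetime import date
from pathlib import Path


def archive_workbook(workbook, run_date):
    """Rename a workbook so its name carries the extraction date (hypothetical helper)."""
    workbook = Path(workbook)
    # Build a new name such as quotes_2024-01-02.xls in the same folder.
    target = workbook.with_name(f"{workbook.stem}_{run_date.isoformat()}{workbook.suffix}")
    if target.exists():
        # Refuse to overwrite an earlier day's workbook.
        raise FileExistsError(f"{target} already exists")
    workbook.rename(target)
    return target
```

Calling archive_workbook with the output file's path and date.today() after each run would leave the original file name free for the next extraction.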