A powerful extractor utility


Extract targeted company contact data (emails, phone and fax numbers) from the web for responsible B2B communication. Extract URLs and meta tags (title, description, keywords) for website promotion, search directory creation, and web research.

We are proud to introduce Web Data Extractor Professional, a powerful and easy-to-use application that automatically extracts the specific information from web pages that you need in your day-to-day internet/email marketing or SEO activities.

With Web Data Extractor Professional you can automatically collect lists of meta tags, email addresses, and phone and fax numbers, and store them in different formats for future use.

A range of precise settings and filters makes Web Data Extractor Professional a highly universal and flexible data extraction application.

Web Data Extractor Professional supports custom data extraction. This lets you extract distinct, structured items of information. For example, suppose you need to build a list of the products in a particular online store: with the help of the Visual Expression Builder you can create such a list and then use it for your own website, research, etc.
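As a rough illustration of the idea behind custom extraction expressions (the tool builds them visually; this hand-written Python sketch, with an invented HTML snippet and pattern, is not the program's own code):

```python
import re

# Invented sample of a store page; the real tool fetches live pages.
html = """
<div class="product"><span class="name">Blue Mug</span><span class="price">$4.99</span></div>
<div class="product"><span class="name">Red Mug</span><span class="price">$5.49</span></div>
"""

# A custom extraction expression: one capture group per field.
pattern = re.compile(
    r'<span class="name">(.*?)</span><span class="price">(.*?)</span>'
)

products = pattern.findall(html)
print(products)  # [('Blue Mug', '$4.99'), ('Red Mug', '$5.49')]
```

Each match yields one structured record, which is the kind of list the Visual Expression Builder lets you assemble without writing the expression by hand.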

v3.7 (Released 28.02.2017):

  • Improved the "Search Engines" mode
  • Improved the "Remove HTML Tags" and "Page must contain the following text to extract data" filters
  • Added a "Use country IP filter" option that excludes results from servers whose geolocation does not match the country selected in the "Search Engines" option
  • Significantly improved the email parser and the "Custom Builder" parser
  • General improvements in data detection and extraction
  • Various minor changes and improvements based on feedback from our customers

v3.6 (Released 22.08.2016):

  • Added a "Get redirected URL" checkbox on the "Custom Data Editor" form to extract URLs (e.g. website addresses) that are reached through a redirect
  • Added a "Mark Non-Responding Proxies Like Inactive Automatically" checkbox: if a proxy server is determined to be bad (not working) during the session, it is automatically marked as inactive and no longer used in that session
  • Added a new "Use single line merge" option to merge data into a single string. For example, you can export T-shirt colors as: "T-Shirt", "Black, Yellow, Red, Green"
  • Significantly improved the loading of public proxy servers from the internet
  • Improved the "Human Factor" option
  • Improved the parser for email addresses hidden by JavaScript
  • Improved handling of the Google CAPTCHA when searching for data via Google
  • Various minor changes and improvements based on feedback from our customers
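The "Use single line merge" option above collapses several extracted values for one item into a single exported string. Conceptually it works like this (a sketch, not the tool's implementation; the sample rows are invented):

```python
from collections import defaultdict

# Extracted rows: (item, attribute) pairs, as a scraper might produce them.
rows = [("T-Shirt", "Black"), ("T-Shirt", "Yellow"),
        ("T-Shirt", "Red"), ("T-Shirt", "Green")]

# Group the values by item.
merged = defaultdict(list)
for item, value in rows:
    merged[item].append(value)

# One exported line per item, values joined into a single string.
for item, values in merged.items():
    print(f'"{item}", "{", ".join(values)}"')  # "T-Shirt", "Black, Yellow, Red, Green"
```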

v3.5 (Released 28.10.2015):

  • Significantly improved the mechanism for searching data through search engines (added handling of the Google CAPTCHA, etc.)
  • Added the ability to capture cookies (new "Capture Cookie" button) and run a session with them (very useful when search form parameters are passed through cookies)
  • Added the ability to import proxy servers from a service that posts fresh proxies every 30 minutes. Each import fetches about 100-140 proxies and replaces the previously downloaded list. During the session, any server that becomes completely inoperative is automatically marked inactive, so only working servers remain in the list
  • Added a new parser to decode email addresses hidden by JavaScript
  • Revised and improved server error handling, which has a positive impact on working through proxy servers
  • Fixed the email and fax number parser
  • Various minor improvements

v3.4 (Released 03.09.2015):

  • Improved the parser for JavaScript-protected email addresses; added 2 new decoders
  • Improved the algorithm for merging data for export
  • Added an "Add in results" checkbox to the "URL Filter: Page must contain the following text to extract data" filter. When enabled, the results table also shows which of the filter's keywords matched when the data was retrieved
  • Improved the link parser; added handling of unquoted links in page sources
  • Improved performance when working with large data sets
  • Improved the data export mechanism
  • Improved the "Url List" filter mode
  • Added recognition of servers that cannot serve uncompressed content, and correct request formation for such servers
  • Added a new search engine: IXQUICK. It does not save your IP and queries the main search engines, so you can spider for days without being blocked
  • Fixed an "Object null reference" issue
  • Various minor additions/fixes

v3.3 (Released 05.05.2015):

  • Improved the parser for JavaScript-protected email addresses
  • Improved handling of network errors: temporarily unavailable pages (for example, due to high load on the server) are now recognized more reliably
  • Added support for regular expressions in filters. To have a filter treated as a regular expression, enclose it between the symbols "^" and "$"
  • Added detection of German-specific characters in URLs
  • Added a "Recovery" button in the settings. It allows you to export all collected data for a selected date range, even if the program's main database has been damaged for some reason
  • Added the ability to export data to the Excel file format
  • Added the ability to save results across multiple files when there are too many of them. For example, you can specify a file size of 10,000 lines (supported range: 1 - 1,000,000) and get a main file "Results.(txt|csv|xlsx)" plus automatically generated files "Results_XXXX.(txt|csv|xlsx)" for each additional 10,000 lines
  • Greatly improved the algorithm for traversing large sites containing millions or tens of millions of links
  • Various fixes/additions based on your feedback
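The regular-expression convention in the v3.3 filters (a filter enclosed between "^" and "$" is treated as a regex, anything else as plain text) can be mimicked in a few lines. This is an illustrative Python sketch, not the program's actual filter code:

```python
import re

def filter_matches(filter_text: str, page_text: str) -> bool:
    """Treat a filter enclosed in ^...$ as a regular expression,
    otherwise as a plain substring test (illustrative only)."""
    if filter_text.startswith("^") and filter_text.endswith("$"):
        return re.search(filter_text, page_text) is not None
    return filter_text in page_text

print(filter_matches("contact", "please contact us"))             # True
print(filter_matches(r"^.*\d{3}-\d{4}.*$", "Call 555-1234 today"))  # True
```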

v3.1 (Released 05.09.2014):

  • Added the ability to edit URL and email filters in a stopped session and then continue with the edited filters
  • Added the ability to load proxy server lists from text files (*.txt), including files in "host:port" format
  • Added progress percentages for requests. The list of requests now updates very quickly
  • For running requests, the "Title" field now shows the name of the proxy through which the request is being sent
  • Improved proxy dispatching; it now works much more efficiently with a large list of proxies

v3.0 (Released 23.06.2014):

  • Added support for working with proxy server lists
  • Various small fixes/additions based on your feedback

v2.3 (Released 08.01.2014):

  • Added the ability to retrieve data while preserving custom HTML markup (the "Remove HTML tags" checkbox)
  • Improved extraction of meta tags
  • Fixed errors when exporting to CSV
  • Various small fixes/additions based on your feedback

v2.2 (Released 15.05.2013):

  • Bug fixes and improvements based on customer requests
  • Added a "Remove duplicates" option to the "Email Filter"
  • New regular expression builder for custom data search. You can now choose one or two similar text blocks; it works even with only one block chosen. This is useful when you need to extract company details (for example, address, phone numbers, etc.) and the web page contains information for only one company.

v2.1 (Released 27.12.2012):

  • Significantly enhanced the phone and fax number parser
  • Added additional filters for phone and fax numbers. You can now indicate which digits a phone number must contain, as well as the maximum length of a phone/fax number
  • Session settings can now be saved to and loaded from a file
  • Added command-line argument support: you can start a session from saved settings and specify a file of input records
  • Added "advanced" merging of custom data results, so you can now gather structured data from online shops and other sites
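The phone/fax filters described for v2.1 boil down to two checks: which characters a candidate number may contain and how long it may be. A minimal sketch of that kind of filter in Python (the character set and limits below are arbitrary examples, not the program's actual rules):

```python
def is_plausible_phone(candidate: str, max_len: int = 20) -> bool:
    """Keep a candidate phone/fax number only if it is short enough,
    has enough digits, and contains only digits and common separators."""
    allowed = set("0123456789 +-()./")
    digit_count = sum(c.isdigit() for c in candidate)
    return (
        len(candidate) <= max_len
        and digit_count >= 7              # too few digits is likely noise
        and all(c in allowed for c in candidate)
    )

print(is_plausible_phone("+1 (555) 123-4567"))        # True
print(is_plausible_phone("order #20250115-99887766"))  # False
```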

v2.0 (Released 29.08.2012):

  • Visual Expression Builder: configuring your custom extraction expressions has never been easier
  • Updated help with many use cases and examples
  • Merging of results when saving to a file or copying to the clipboard
  • Improved results filtering options
  • New feature: "Collect Domains Without Emails"
  • Many visual and engine changes/fixes

v1.2 (Released 07.06.2012):

  • Added the ability to scan RSS feeds
  • Added resilience to physical damage of the database
  • Improved stream control, which has a positive impact on overall performance
  • Added the ability to detect email addresses written as info[at]mail.com and info(at)mail.com
  • Added decoding of email addresses hidden using JavaScript
  • Scan time is now displayed without split seconds and includes a days indication
  • Improved handling of large keyword lists in "Search Engines" mode
  • Added support for quotes in keywords to search for exact phrases and words in Google
  • Reworked the algorithm for determining scan depth (Url Depth)
  • Improved the filter for screening out potentially incorrect phone and fax numbers
  • Added the ability to set the "Fixed Number Pages" in "Search Engines" mode
  • Added ability to define tag "
  • SQLite engine updated
  • Added search support for new countries: Arabia, Argentina, Chile, the Philippines, and Singapore. Also reviewed and corrected the existing list of search queries
  • Various fixes
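The info[at]mail.com-style addresses mentioned in the v1.2 notes use a common obfuscation trick. A hedged sketch of how such addresses can be normalized before ordinary email parsing (this is not the program's own parser):

```python
import re

# Normalize common "[at]" / "(at)" obfuscations back to "@"
# before running a normal email regex (illustrative only).
AT_PATTERN = re.compile(r"\s*[\[\(]\s*at\s*[\]\)]\s*", re.IGNORECASE)

def deobfuscate(text: str) -> str:
    return AT_PATTERN.sub("@", text)

print(deobfuscate("info[at]mail.com"))     # info@mail.com
print(deobfuscate("sales (AT) mail.com"))  # sales@mail.com
```

JavaScript-hidden addresses (the other case v1.2 mentions) require actually decoding the script output, which is beyond a substitution like this.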

The current version of 'Web Data Extractor Professional' is distributed as shareware.

Requirements: Windows XP/Vista/7/8/10, 32 MB RAM, 50 MB Hard Disk Space.