WebHarvy's Pattern Recognition Extraction engine identifies repeating page structures (product grids, directory listings, search result rows) and automatically applies a capture template to every matching element on the page. For an operator scraping an e-commerce category page, this means clicking once on a product name and once on a price, then letting WebHarvy propagate those selections to every matching item without further input. The engine handles both grid and list layouts and adjusts when the number of elements varies between pages in the same session.
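The general idea of propagating one sample selection to every matching item can be sketched in Python. This is only an illustration of the concept: WebHarvy is a point-and-click tool, and nothing here reflects its internal matching logic. The sample markup, the `SiblingCollector` class, and the `propagate` function are hypothetical names, and the sketch assumes that repeating items share the same tag and CSS class.

```python
from html.parser import HTMLParser

# Hypothetical category page: three items with the same repeating structure.
PAGE = """
<div class="product"><h2 class="name">Kettle</h2><span class="price">$24.99</span></div>
<div class="product"><h2 class="name">Toaster</h2><span class="price">$39.50</span></div>
<div class="product"><h2 class="name">Blender</h2><span class="price">$54.00</span></div>
"""


class SiblingCollector(HTMLParser):
    """Collect the text of every element that shares a sample's tag and class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.matches = []
        self._depth = 0      # > 0 while inside a matching element
        self._buffer = []    # text fragments of the current match

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Track nesting of the same tag so the right end tag closes the match.
            if tag == self.tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth and tag == self.tag:
            self._depth -= 1
            if self._depth == 0:
                self.matches.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def propagate(html, tag, css_class):
    """Apply one sample selection (tag + class) to all matching elements."""
    collector = SiblingCollector(tag, css_class)
    collector.feed(html)
    collector.close()
    return collector.matches


# One "click" on a name and one on a price yields a full table of rows.
names = propagate(PAGE, "h2", "name")
prices = propagate(PAGE, "span", "price")
rows = list(zip(names, prices))
```

Because the match is defined by structure rather than position, the same two selections would keep working on a page with more or fewer items.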
Regex-Based Capture extends the tool's reach to data that does not live in clean, isolated HTML elements. Phone numbers embedded in text blocks, product codes mixed into description strings, and inconsistently formatted pricing data can all be extracted by defining a regular expression pattern that WebHarvy applies to the raw text content of a captured field. The regex editor includes a live match preview window so operators can validate pattern accuracy against real page content before running a full extraction job.
Automatic Pagination Handling detects common pagination patterns (numbered page links, next-button navigation, and infinite scroll triggers) and follows them without additional configuration in most cases. For sites using non-standard JavaScript-driven pagination, the manual pagination rule editor allows operators to define the exact navigation logic and specify a maximum page count or end-of-results stop condition.
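To make the regex-based capture described above concrete, the following Python sketch applies patterns to the raw text of a captured field. The patterns, sample strings, and the `capture` helper are illustrative assumptions rather than WebHarvy defaults; WebHarvy's regex editor may use a different regex flavor, so a pattern should be checked in its live preview before a full run.

```python
import re

# Hypothetical patterns for the three cases mentioned: phone numbers,
# product codes, and prices embedded in free text.
PHONE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b")
SKU = re.compile(r"\b[A-Z]{2,4}-\d{4,6}\b")              # assumed code format, e.g. KT-20415
PRICE = re.compile(r"\$\s?(\d+(?:,\d{3})*(?:\.\d{2})?)")  # group 1 = amount without "$"


def capture(pattern, text):
    """Return the first match (or its first group, if the pattern has one), else None."""
    match = pattern.search(text)
    if match is None:
        return None
    return match.group(1) if pattern.groups else match.group(0)


# Raw text as it might come from a captured description field.
phone = capture(PHONE, "Call us at (555) 123-4567 before 5pm.")
sku = capture(SKU, "Stainless kettle, model KT-20415, 1.7 L")
price = capture(PRICE, "Now only $1,299.00 (was $1,499.00)")
missing = capture(PRICE, "Price available on request")
```

Returning `None` for a non-matching field, instead of raising an error, mirrors how an extraction job would leave a cell empty when a page lacks the expected data.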