Real-Time Crawler was among the first, and still one of the few, large-scale web scraping solutions built for business. Oxylabs has recently revamped it into Scraper APIs, marking a strategic shift.
We talked with Gabrielė Montvilė, Head of Account Management at Oxylabs, who provided insights into her career journey, as well as key trends, solutions, and challenges associated with web scraping.
Let’s start simple. Tell us how you got involved with the technology industry and web scraping, and what you find most interesting about them.
Before joining Oxylabs, I worked within the technology sector for several years, mainly focusing on driving digital transformations through existing SaaS or custom business solutions.
My roles revolved around B2B customers, service sales, and strategic account management. In other words, I was responsible for creating successful digital transformation projects, finding the right business value-driven solutions, and establishing long-term relationships with clients.
I started in a service sales role at Microsoft, positioning MS Dynamics solutions and implementation projects for B2B clients. Later, I moved to Salesforce and worked in a couple of different customer-focused roles there. I then continued my journey at TeleSoftas (a bespoke IT software delivery company) as a Key Account Manager.
After being an individual contributor for several years, I was looking for a new challenge that would allow me to move into a management role while remaining in the technology industry. This is how I joined Oxylabs, where I currently lead our Account Management department.
What’s particularly interesting about working with web scraping is that you have an opportunity to work with some of the largest global companies and understand their data collection needs and processes. These, in turn, often influence their most important business decisions. For example, alternative data collected online often determines dynamic pricing strategies in e-commerce.
What main trends are shaping Oxylabs’ customer needs?
One of the overarching factors determining our client needs is the classic build versus buy decision. That is, whether a company decides to build their own web scraping solution or to outsource it to a third party.
Those companies that put emphasis on human resource efficiency, operational excellence, and scalability (usually, large enterprises) often prefer to outsource web scraping operations and, therefore, use our Scraper APIs. These solutions help address some of the most frequent challenges associated with web scraping, such as a lack of in-house expertise, low request success rates, inability to scale, or high maintenance costs.
On the other hand, some businesses prioritize building their own technical expertise and choose to concentrate their web scraping operations in-house. These clients usually use our IaaS solutions, such as Datacenter or Residential Proxies. While this approach has advantages, such as high levels of customization and security, it also brings some challenges.
For example, you have to ensure that you have the right skills on your team, manage the associated infrastructure costs, and dedicate time to web scraper maintenance. This can be difficult, since web scrapers constantly need to be adapted to ever-evolving bot protection mechanisms.
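To make that maintenance burden concrete, here is a minimal sketch of the kind of retry-and-rotate logic an in-house team typically ends up writing and constantly re-tuning. The proxy endpoints, block signals, and target URL are all hypothetical placeholders, not part of any specific provider's setup.

```python
import random
import time

import requests

# Hypothetical proxy pool; a real setup would use a provider's
# rotating gateway or a managed rotation service.
PROXIES = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
]

# Status codes that commonly signal anti-bot countermeasures.
BLOCK_SIGNALS = {403, 429, 503}

def fetch(url: str, max_attempts: int = 5) -> str:
    """Fetch a page, rotating proxies and backing off when blocked."""
    for attempt in range(1, max_attempts + 1):
        proxy = random.choice(PROXIES)
        try:
            resp = requests.get(
                url,
                proxies={"http": proxy, "https": proxy},
                headers={"User-Agent": "Mozilla/5.0"},  # rotated in practice
                timeout=15,
            )
        except requests.RequestException:
            continue  # network error: try another proxy
        if resp.status_code in BLOCK_SIGNALS:
            time.sleep(2 ** attempt)  # exponential backoff before retrying
            continue
        resp.raise_for_status()
        return resp.text
    raise RuntimeError(f"Gave up on {url} after {max_attempts} attempts")
```

Every time a target site changes its defenses, the block signals, headers, and backoff strategy in a script like this have to be revisited, which is exactly the maintenance cost described above.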
To sum up, our clients’ needs are often driven by the state of their internal data collection operations.
So, what would you say are the primary challenges for those who don’t use Scraper APIs?
Since web scraping is a rather complicated process, our clients face multiple challenges. However, I do believe that our Scraper APIs address most of them.
To start with, proxy acquisition costs are high and not always predictable. While most businesses that engage in web scraping use proxy providers, they are often billed for traffic rather than per IP address.
This means that costs can increase substantially if a client is experiencing any challenges while scraping data online (e.g., failed scraping attempts due to target website layout changes or existing anti-bot measures). It’s a fairly frequent occurrence, especially if the company doesn’t have the deep technical expertise that is required for web scraping.
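A back-of-the-envelope calculation, with purely hypothetical numbers, shows how quickly traffic-based billing amplifies failures: every blocked or failed attempt still consumes billable bandwidth, so the effective cost per successfully scraped page scales inversely with the success rate.

```python
# Hypothetical figures for illustration only.
PRICE_PER_GB = 10.0   # USD per GB of proxy traffic
AVG_PAGE_MB = 2.0     # average page weight in megabytes

for success_rate in (0.95, 0.60):
    # Expected attempts per successful page (retries included),
    # assuming each attempt succeeds independently with this rate.
    attempts_per_success = 1 / success_rate
    mb_per_success = AVG_PAGE_MB * attempts_per_success
    cost = PRICE_PER_GB * mb_per_success / 1024  # MB -> GB
    print(f"success rate {success_rate:.0%}: "
          f"${cost:.4f} per successfully scraped page")
```

Dropping from a 95% to a 60% success rate raises the effective per-page cost by roughly 58%, with nothing to show for the extra spend.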
Data quality is another issue. I’ve seen what the regular output of a web scraper looks like, and it’s not pretty. A lot of development work goes into extracting data properly, and even small errors can lead to low-quality data. As a result, a lot of effort has to go into ensuring the quality of scraped data.
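As a rough sketch of what that effort looks like in practice, consider a parser for a hypothetical product page. The selectors and price format below are assumptions; the point is that every field has to be normalized and validated, because a silent layout change otherwise flows straight into the dataset as bad records.

```python
from bs4 import BeautifulSoup

def parse_product(html: str) -> dict:
    """Extract and validate product fields from a hypothetical page layout."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.select_one("h1.product-title")
    price = soup.select_one("span.price")
    record = {
        "title": title.get_text(strip=True) if title else None,
        "price_raw": price.get_text(strip=True) if price else None,
        "price": None,
    }
    # Normalize instead of trusting raw text: "$1,299.00" -> 1299.0.
    # Anything unparseable is flagged rather than silently kept.
    if record["price_raw"]:
        cleaned = record["price_raw"].replace("$", "").replace(",", "")
        try:
            record["price"] = float(cleaned)
        except ValueError:
            pass  # selector drifted or layout changed: leave price as None
    return record
```

Multiply that validation logic by dozens of fields and hundreds of target layouts, and the scale of the quality-assurance work becomes clear.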
To summarize, all these issues can largely be solved by using our Scraper APIs. Our extensive experience allows us to reduce scraping and proxy acquisition costs, while our innovative machine learning solutions make it easier to maintain data quality.
What kind of feedback do you receive about Scraper APIs (or Real-Time Crawler back in the day)? Do you have any success stories?
It’s interesting that you note both names of our solution. Once we decided to go ahead with the rebrand, we collected in-depth feedback from our partners and customers. Our switch to Scraper APIs has been regarded favorably since it made it easier to understand the purpose of the solution and clarified the value-add.
I recall two related success stories. First, there was a business intelligence company that was scraping e-commerce data and constantly experienced issues with frequent blocks and loss of data.
Due to these challenges, they used to dedicate most of their time to bug fixing rather than delivering real business value. Once they implemented one of our Scraper APIs, they were able to extract the data without failures and focus on data analysis instead, which is one of the key value drivers for their company.
Another success story has been with trivago, a travel fare aggregator. They had previously used an in-house solution powered by our Residential Proxies. Unfortunately, their data collection process was highly complicated, as they needed data from over 200 countries. Transitioning to our Web Scraper API allowed them to solve their existing web scraping challenges and improve data collection outcomes.
Who are the most frequent clients for Scraper APIs? Are there any unique use cases?
Well, most of the Scraper APIs do exactly what they say. I think the easiest to outline is the E-commerce Scraper API, which is primarily used by e-commerce companies, platforms, and websites. Competitor analysis stands at the forefront with use cases like dynamic pricing and product catalog mapping.
SERP Scraper API usage is dominated by marketing and SEO agencies. They usually scrape data from the major search engines, which gives them insights into keyword rankings and trends, ad performance, brand monitoring, and many other areas.
Finally, our Web Scraper API, which has the widest range of possible uses, is mostly used for fraud protection and travel fare aggregation.
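For readers curious what the outsourced model looks like in code, here is a minimal sketch of a call to a realtime scraper API: a single authenticated request returns the scraped content, with retries, proxy rotation, and unblocking handled on the service side. The endpoint, payload fields, and response shape follow the general pattern Oxylabs documents for its Scraper APIs, but treat them as assumptions and consult the current documentation before relying on them.

```python
import requests

# Assumed payload shape: a generic "universal" source pointed at a URL,
# with optional JavaScript rendering requested from the service.
payload = {
    "source": "universal",
    "url": "https://example.com/",
    "render": "html",
}

resp = requests.post(
    "https://realtime.oxylabs.io/v1/queries",  # assumed realtime endpoint
    auth=("USERNAME", "PASSWORD"),             # account credentials
    json=payload,
    timeout=90,
)
resp.raise_for_status()

# Assumed response shape: a list of results with the page content inside.
print(resp.json()["results"][0]["content"][:200])
```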
What would you do if you could get any publicly available data from the internet? The possibilities afforded by web scraping are nearly endless.