9 Web Scraping Legal Questions for 2022

In 2021, the two big stories in the world of web scraping and the law were the Supreme Court’s decision in Van Buren v. United States, and the Supreme Court’s decision to vacate the hiQ Labs opinion from the 9th Circuit in 2019 and remand it for further consideration in light of Van Buren.

As I wrote when those cases first were published, those decisions left more questions unanswered than they answered. If anything, it feels as if there had been a pendulum in the law that was shifting toward greater permissiveness with web scraping, and that the pendulum has now swung in the other direction.

I fear there is greater uncertainty in the law surrounding web scraping today than there was in 2020.

So where does that leave us in 2022? This year, there does not appear to be anything as consequential as Van Buren on the horizon. But there are plenty of important cases out there, and plenty of significant issues that I expect to see on a recurring basis throughout this year (and beyond). Here are nine of the most notable web-scraping legal questions I hope to see answered in 2022.

What will the 9th Circuit decide for hiQ Labs II?

After its decision in Van Buren, the Supreme Court told the Ninth Circuit that it had to reconsider its prior decision in the matter of hiQ Labs, Inc. v. LinkedIn Corp. Previously, the 9th Circuit’s decision in hiQ Labs was the most favorable web scraping legal opinion in US history.

I didn’t see anything in the Van Buren opinion that made me think that a remand was necessary. But the Supreme Court disagreed.

My prediction is that the result on the CFAA claim will be the same, but that the 9th Circuit will have to provide a different rationale for its opinion that more closely follows from the reasoning of Van Buren.

From the perspective of pro-scraping advocates, the question is whether the language of hiQ Labs II will be as pro-open internet as hiQ Labs I. My guess is that it won’t be. Not because the 9th Circuit isn’t generally an advocate of an open internet, but because the complexities related to this issue have become more apparent in the intervening years.

After Van Buren, what’s a “gate” and what isn’t?

The Supreme Court in Van Buren said that liability under both clauses of the CFAA stems from a “gates-up-or-gates-down” inquiry. What the Court didn’t tell us is what types of technologies and policies constitute “gates.” That is for the lower courts to decide.

To date, lower courts have been reluctant to provide this clarity. Based on my research, I’m not seeing any meaningful reduction of liability risk under the CFAA in the civil context after Van Buren, as many of us had hoped. If anything, courts seem more eager to punt on it after Van Buren. The language of Van Buren is just so indirect and full of hedges, and questions related to the CFAA are usually so technical, that courts don’t want to make any inferential leaps, no matter how logical they might seem. Either way, you could read footnotes 8 & 9 of the Supreme Court’s opinion in Van Buren and then read the facts of any web scraping case and Rorschach test your way into just about any conclusion.

And the conclusion that most lower courts have reached thus far is that they don’t want to deal with it.

Will any lower court stick its neck out and say that accessing publicly available data is per se legal?

One of the key legal arguments in favor of web scraping is that everyone should have a legal right to access publicly available data that is not protected by a login or authentication barrier, assuming that this information is not otherwise protected by copyright.

That seemed to be the logical direction that the law was heading after hiQ Labs in 2019. But as I said before, the trend seems to be shifting in the other direction now.

So far, to my knowledge, only one court has answered the question of whether there is a per se rule permitting access to publicly available data after Van Buren. And that case answered the question in the negative. Will any court reach the opposite conclusion in 2022?

Are there limits to the enforceability of browsewrap contracts in the “utter absence of assent?”

So much of the attention related to web scraping and the law is focused on the CFAA, and for good reason. It’s a federal law and it contains a criminal component. But as a practical matter, if you’re a company that’s in the business of web-scraping, the law that should probably make you the most nervous is breach of contract.

In just about every jurisdiction in the United States, with few exceptions, if you continue to scrape a website after receiving a cease-and-desist letter, you’re likely subject to a breach of contract claim.

There are countless arguments why that should not be the case. But right now, in most places in the United States, it still is. Will courts start to reel in the worst excesses of this regime in 2022?

Is this the year that privacy regulators get tough on web scrapers?

If you’re a web-scraping business that collects PII, it’s not easy to comply with the GDPR (Europe’s privacy law), the CCPA (California’s privacy law), the VCDPA (Virginia’s privacy law), or the soon-arriving CPA (Colorado’s privacy law).

But to date, we haven’t seen a ton of aggressive regulatory actions against web scrapers that collect PII. Whether that’s because they’re not on regulators’ radars, because they’re building their cases against them, or for some other reason, I could not tell you. But sooner or later we’re going to see some high-profile scraping-related regulatory actions for privacy issues. It’s not a question of if but when. Will it happen in 2022?

Will trespass to chattels claims experience a revival this year?

One of the dumbest laws that relates to web scraping is trespass to chattels. It’s a law that says that if you burden someone’s servers, you could be held liable for doing so. This law isn’t dumb because it exists, but rather because of the way that it’s applied.

In 2022, most companies’ web infrastructure is hosted on services like AWS. And the reality is that most web scrapers are perfectly capable of collecting the data they need to collect without having any impact whatsoever on a company’s web infrastructure.

Nonetheless, in 2021, the Northern District of California in the hiQ Labs case found that the collective impact of web scrapers (over 95 million requests per day) on LinkedIn was sufficient to establish a cause of action for trespass to chattels, even though there was no real evidence that the defendant’s own scraping had any individual impact on LinkedIn’s servers.

This was a terrible decision, in my opinion. By that logic, any web scraper, no matter how infinitesimally small their impact on a plaintiff’s infrastructure, could be liable for trespass to chattels. That’s like holding an individual liable for destruction of public property for driving the speed limit on a public road, simply because the cumulative impact of drivers on roads leads to wear and tear on public property. That’s not how the law should work.

But, if plaintiffs can argue that in court with success, there’s nothing to dissuade plaintiffs from pursuing those claims. If that’s the case, it’s only logical that we might expect a return of these claims in 2022.

Is this the year antitrust cases against the big data monopolies finally gain traction?

In 2019, the 9th Circuit wrote in hiQ Labs, Inc. v. LinkedIn Corp.:

Although there are significant public interests on both sides, the district court properly determined that, on balance, the public interest favors hiQ’s position. We agree with the district court that giving companies like LinkedIn free rein to decide, on any basis, who can collect and use data—data that the companies do not own, that they otherwise make publicly available to viewers, and that the companies themselves collect and use—risks the possible creation of information monopolies that would disserve the public interest.

This was powerful language, in that it gave serious legal consideration to the recurrent power dynamics that are almost always at play in web-scraping litigation. Namely, a plaintiff that is one of the biggest power players in a given industry, pursuing litigation against an up-and-comer that is doing new and innovative things with data hosted on their sites.

Historically, courts have sided with the incumbents in controlling who does and does not have a right to access data on their sites. But there are very real policy benefits to providing broader access to publicly available data: consumer choice, lower prices, greater innovation, reduced monopolistic and rent-seeking policies, etc.

The 9th Circuit recognized that dynamic and provided redress for the web scraper in that case. Will we see other courts latch on to or build on this premise in 2022?

Is this the year third-party scraping services get caught up in major litigation?

Most of the major web-scraping services on the internet are located outside the United States and have thus far managed to avoid the brunt of major litigation in the United States.

But, since what they are doing is at times (and in some contexts and circumstances) illegal in certain jurisdictions in the United States, it’s a matter of time before one or more of these companies get dragged into court in the United States.

It hasn’t happened yet, in large part because usually it’s direct competitors who get brought into litigation. It’s not the ones who do the scraping that get caught in the net; it’s the ones that are doing things with the scraped data. But sooner or later this will happen, and there will be fierce jurisdictional battles fought over who litigates where, and under what circumstances.

Will there be a bifurcation of the law with respect to research and commercial uses of scraping?

One thing that matters in certain parts of the law is not just what you’re doing, but the context in which you’re doing it. For example, in the context of copyright infringement and fair use, it matters whether you’re using something in the context of commercial competition or whether you’re doing it for a non-commercial reason. A high school teacher who excerpts from a copyrighted work is going to be given far more leeway to borrow from that work than a potential competitor who is looking to sell something that competes with the holder of a valid copyright.

Similarly, there are commercial and research applications of web scraping. And some scholars have called for clear legal exemptions for scholarly uses of public data.

One of the reasons that web scraping is such a tough legal issue is that there are both pro-social and anti-social uses of the technology. The same technology that fuels muckraking journalism and the COVID-tracking project is responsible for hacking and publishing private data on the dark web.

One way for courts to “split the baby,” so to speak, is to create a differing standard for pro-social and anti-social uses of the technology. The problem with that approach is that it may prejudice certain “neutral” commercial uses of the technology.

But it is one possible solution that I could envision.

Either way, I think we are far from a stable equilibrium in terms of the law that governs web scraping. The technology is too common and too important to be made “illegal” in any absolute way. But there are too many anti-social applications of the technology to make it legal without nuance, either.

Before we can achieve any potential equilibrium, courts will have to think deeply on these issues and come up with meaningful, long-term solutions to these challenges that will serve us in the decades to come. I don’t know what those solutions are yet, but I feel confident in saying that we’re not yet there.