Web scraping has recently come under scrutiny from the regulatory agencies that enforce securities laws.
Increasingly, investment firms have been using web scraping as an alternative form of data collection. To bolster more traditional data sets like SEC filings and financial statements, investment firms have been going directly to websites and online resources to get information for their investment decisions. This situation raises unique questions as to the legality of using that information for investing decisions.
In 2020, for the first time, the SEC’s Office of Compliance Inspections and Examinations (OCIE) included “alternative data” sourcing among its Examination Priorities, stating:
“Examinations will focus on firms’ use of these data sets and technologies to interact with and provide services to investors, firms, and other service providers and assess the effectiveness of related compliance and control functions.”
Current SEC examinations are applying this rubric, the SEC’s upcoming 2021 Examination Priorities will likely reiterate this priority, and an SEC Risk Alert concerning firms’ use of alternative data is believed to be forthcoming. This focus on alternative data brings into sharper relief the potential risks that web scraping poses concerning material non-public information (MNPI) and privacy.
The language of the Examination Priority emphasizes compliance and control of alternative data sets. Among these compliance obligations, Section 204A of the Investment Advisers Act requires all investment advisers, even those exempt from SEC registration, to “establish, maintain, and enforce written policies and procedures” to prevent the misuse of MNPI. Because the definition of an investment adviser includes anyone who issues reports regarding securities for compensation, a web scraping firm that supplies data to an investment firm could be subject to the same procedural and policy standards regarding MNPI as investment firms, if not necessarily the same level of SEC scrutiny.
Web scraping raises potential insider-trading liability under Section 10(b):
Web scraping or data mining for investment information potentially runs afoul of Section 10(b) of the Securities Exchange Act and its implementing regulation, SEC Rule 10b-5. These laws prohibit the use of “manipulative or deceptive devices” or “artifices to defraud” in connection with the purchase or sale of securities and are commonly used to police insider trading. When an insider, say a board member, trades based on MNPI not known to the general public, that trading is treated as a deceptive device prohibited under Section 10(b).
However, as articulated by SEC v. Dorozhko, a 2009 case out of the influential Second Circuit, the plain language of Section 10(b) prohibits all deceptive devices in trading and does not limit liability to insiders. Under this interpretation, anyone who obtains access to and trades based on MNPI could be liable under Section 10(b) regardless of whether that person owes a duty to the source of information.
Although there have been no published opinions examining web-scraper liability under 10(b), the conduct examined in Dorozhko can be analogized to web scraping. That case concerned a hacker who gained access to a health company’s private internal network and used the information obtained to trade in the company’s stock. Like a hacker, many web scrapers access information they are not authorized to access.
Unfortunately, Dorozhko produced no final ruling on whether the purported hacker had engaged in deceptive conduct under 10(b). The Second Circuit remanded to the lower court to determine this question, at which point Dorozhko “disappeared” and the SEC obtained summary judgment. However, the Second Circuit did leave this useful guidance: someone who “misrepresent[s] one’s identity in order to gain access to information that is otherwise off limits and then steal[s] that information” is engaging in “plainly deceptive” conduct. 574 F.3d 42, 51 (2d Cir. 2009).
However, deceptively accessing information is not likely by itself a violation of 10(b). The information accessed must be MNPI, that is, both material and non-public, for the web scraper to gain an unfair advantage over other investors. The distinction between accessible public information and non-public information likely turns on whether access to the information is restricted for general users. A court interpreting this issue may look to Dorozhko, or to hiQ Labs, Inc. v. LinkedIn Corp., decided by the Ninth Circuit, and Facebook Inc. v. BrandTotal Ltd., decided by the Northern District of California, and ask whether the information accessed by the web scraper has been demarcated as private by an authorization system (e.g., a company’s internal network) or by password protection (e.g., a social media profile).
Additionally, the information scraped must be material, in that it would influence a reasonable investor’s decision, to be punishable under 10(b). Data scraped from a single profile is not likely material unless that person has real influence over a company or industry, but profile data aggregated from many thousands of users may be. And, of course, most web-scraping firms are aggregating data for investment firms for the specific purpose of influencing investment decisions.
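For firms that want to build these questions into their written policies and procedures, the three factors discussed above (deceptive access, non-public status, and materiality) can be sketched as an internal screening step. The Python below is only an illustration under stated assumptions: the `ScrapeSource` fields, the flag names, and the `AGGREGATION_THRESHOLD` value are all hypothetical, and no court or regulator has adopted any such numeric test. It is a way to route data sets to compliance review, not a way to decide legality.

```python
from dataclasses import dataclass

# Hypothetical cutoff for "many thousands of users"; not a legal standard.
AGGREGATION_THRESHOLD = 1000


@dataclass
class ScrapeSource:
    """Hypothetical description of one scraped data set."""
    name: str
    requires_login: bool                 # gated by password or authorization system
    access_misrepresents_identity: bool  # e.g. fake accounts or borrowed credentials
    records_aggregated: int              # profiles or pages combined into the data set
    subject_is_influential: bool = False # person with real sway over a company or industry


def review_flags(src: ScrapeSource) -> list[str]:
    """Return the screening flags raised by a data set."""
    flags = []
    if src.access_misrepresents_identity:
        flags.append("deceptive_access")
    if src.requires_login:
        flags.append("possibly_non_public")
    if src.subject_is_influential or src.records_aggregated >= AGGREGATION_THRESHOLD:
        flags.append("possibly_material")
    return flags


def needs_escalation(src: ScrapeSource) -> bool:
    """Escalate when the data may be both non-public and material (possible MNPI)."""
    flags = set(review_flags(src))
    return {"possibly_non_public", "possibly_material"} <= flags


public_blog = ScrapeSource("public blog posts", False, False, 50_000)
gated_profiles = ScrapeSource("logged-in profiles", True, True, 50_000)

print(review_flags(public_blog), needs_escalation(public_blog))
print(review_flags(gated_profiles), needs_escalation(gated_profiles))
```

In this sketch, a large public data set raises only the materiality flag, while a password-gated data set of the same size raises all three and is escalated. A real program would leave the final judgment to counsel.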
If your business scrapes data and either uses or sells that data for investment decisions, it is imperative that your business comply with the laws of the jurisdictions where this information is scraped. The risks of failing to do so have never been higher.