Using HTTP proxy with Puppeteer


I had a requirement to evaluate remote JavaScript using Headless Chrome, but requests had to be routed through an internal proxy, and different proxies had to be used for different URLs. A convoluted requirement perhaps, but the last part describes an important feature that Puppeteer lacks: switching the HTTP proxy for each Page/Request.

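However, it turns out that even though the feature is lacking, it is easy to implement entirely custom HTTP request/response handling using Puppeteer. All you need is to:

1. Enable request/response interception
2. Intercept the request
3. Make the request using Node.js
4. Return the response to Chrome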

This way Chrome itself never makes an outgoing HTTP request and all requests can be handled using Node.js.

The basic functionality is simple to implement:

import puppeteer from 'puppeteer';
import got from 'got';
import HttpProxyAgent from 'http-proxy-agent';
 
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
 
  // 1. Enable request/response interception
  await page.setRequestInterception(true);
 
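  // 2. Intercept every request. With interception enabled, a request that
  //    is never answered stalls, so listen with `on` rather than `once`.
  page.on('request', async (request) => {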
    // 3. Make request using Node.js
    const response = await got(request.url(), {
      // HTTP proxy. Recent got versions expect a per-protocol map of agents.
      agent: {
        http: new HttpProxyAgent('http://127.0.0.1:3000'),
      },
      body: request.postData(),
      headers: request.headers(),
      method: request.method(),
      retry: 0,
      throwHttpErrors: false,
    });
 
    // 4. Return response to Chrome
    await request.respond({
      body: response.body,
      headers: response.headers,
      status: response.statusCode,
    });
  });
 
  await page.goto('http://gajus.com');
})();
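
The basic version above does not handle failures: if the request made from Node.js rejects (for example, because the proxy is unreachable), nothing answers the intercepted request and it stays pending. Below is a minimal sketch of one way to deal with that, assuming a hypothetical `respondOrAbort` helper (not part of Puppeteer or got) that aborts the intercepted request when the upstream call fails. The `fetchResponse` parameter is also an assumption: any async function that resolves to an object with `body`, `headers` and `statusCode`, such as the `got` call shown earlier.

```javascript
// Hypothetical helper, not part of Puppeteer or got.
// `request` only needs Puppeteer's respond() and abort() methods.
// `fetchResponse` is an assumed async function that performs the proxied
// request and resolves to { body, headers, statusCode }.
const respondOrAbort = async (request, fetchResponse) => {
  let response;

  try {
    response = await fetchResponse(request);
  } catch (error) {
    // The upstream request failed; tell Chrome the request failed
    // instead of leaving it pending.
    await request.abort('failed');

    return;
  }

  await request.respond({
    body: response.body,
    headers: response.headers,
    status: response.statusCode,
  });
};
```

In the request handler this would stand in for steps 3 and 4, for example `page.on('request', (request) => respondOrAbort(request, (r) => got(r.url(), options)))`, where `options` stands for the got options shown earlier.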

It gets a bit trickier if you need to support HTTPS, error handling and cookies. However, as of last night, there is a package for that: puppeteer-proxy.

puppeteer-proxy abstracts HTTP proxy handling for Puppeteer, including HTTPS support, error and cookie handling. Using it is simple:

import puppeteer from 'puppeteer';
import {
  createPageProxy,
} from 'puppeteer-proxy';
 
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
 
  const pageProxy = createPageProxy({
    page,
  });
 
  await page.setRequestInterception(true);
 
  // Use `on` rather than `once` so that every intercepted request is proxied
  page.on('request', async (request) => {
    await pageProxy.proxyRequest({
      request,
      proxyUrl: 'http://127.0.0.1:3000',
    });
  });
 
  await page.goto('http://gajus.com');
})();
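
Coming back to the original requirement of using different proxies for different URLs: because the proxy is chosen inside the request handler, it can be picked per request. The following is only a sketch under assumptions. `pickProxyUrl` and `proxyRules` are hypothetical names, and the hostnames and ports in the table are placeholders rather than anything puppeteer-proxy provides.

```javascript
// Hypothetical routing table: hostname suffix -> proxy URL.
// The hostnames and ports are placeholders for illustration only.
const proxyRules = [
  { hostnameSuffix: 'internal.example', proxyUrl: 'http://127.0.0.1:3001' },
  { hostnameSuffix: 'example.com', proxyUrl: 'http://127.0.0.1:3002' },
];

// Returns the proxy URL of the first rule whose suffix matches the
// request's hostname, or `defaultProxyUrl` when no rule matches.
const pickProxyUrl = (requestUrl, rules, defaultProxyUrl) => {
  const { hostname } = new URL(requestUrl);

  const match = rules.find((rule) => {
    return (
      hostname === rule.hostnameSuffix ||
      hostname.endsWith('.' + rule.hostnameSuffix)
    );
  });

  return match ? match.proxyUrl : defaultProxyUrl;
};
```

Inside the handler, the fixed `proxyUrl` value would then become something like `pickProxyUrl(request.url(), proxyRules, 'http://127.0.0.1:3000')`.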