Centris.ca Automation

This project is an automation pipeline that scrapes real estate listings from Centris.ca and stores the results in structured JSONL files alongside the downloaded property images. It was designed for fast, reliable large-scale extraction using Scrapy and Playwright.

PROJECT DETAILS

ROLE

Automation Developer

CHALLENGES

The main challenge was handling authenticated sessions and dynamic website interactions. Playwright was used to automate login flows and browser actions, while rotating proxies and custom user agents helped reduce blocking during large-scale scraping.
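
The rotation idea can be sketched as follows. This is a minimal illustration, not the project's actual code: the class name `RequestIdentityRotator`, the proxy URLs, and the user agent strings are all hypothetical placeholders. It cycles through proxies in order and picks a user agent at random for each request.

```python
import itertools
import random

# Hypothetical placeholder values; a real deployment would load these from configuration.
PROXIES = [
    "http://proxy-a.example:8000",
    "http://proxy-b.example:8000",
]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/2.0",
]


class RequestIdentityRotator:
    """Hands out a (proxy, User-Agent) pair for each outgoing request."""

    def __init__(self, proxies, user_agents, seed=None):
        if not proxies or not user_agents:
            raise ValueError("at least one proxy and one user agent are required")
        # Proxies are used round-robin so load is spread evenly.
        self._proxies = itertools.cycle(proxies)
        self._user_agents = list(user_agents)
        # A seeded generator makes the user agent choice reproducible in tests.
        self._rng = random.Random(seed)

    def next_identity(self):
        """Return the proxy and headers to attach to the next request."""
        return {
            "proxy": next(self._proxies),
            "headers": {"User-Agent": self._rng.choice(self._user_agents)},
        }


rotator = RequestIdentityRotator(PROXIES, USER_AGENTS, seed=0)
identity = rotator.next_identity()
```

In Scrapy, a pair like this would typically be applied by setting `request.meta["proxy"]` and the `User-Agent` header, for example from a downloader middleware.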

SOLUTION

I built two automation scripts for the client. The first lets the client enter keywords such as a location, a title, or search filters, then automatically scrapes every matching property from the platform. The second reads the user's list of liked (favorited) properties and extracts all of the saved listings automatically. In both cases the scraped data is stored as JSONL files and the property images are downloaded into organized folders.
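
The storage step described above can be sketched like this. It is an illustration under assumptions: the helper names (`slugify`, `image_path`, `append_jsonl`), the one-folder-per-listing layout, and the record fields are hypothetical and not taken from the project.

```python
import json
import re
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def slugify(text):
    """Turn an arbitrary identifier into a safe folder name."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "untitled"


def image_path(root, listing_id, index, url):
    """Build <root>/<listing>/<NNN><ext> for the index-th image of a listing."""
    # Take the extension from the URL path only, ignoring any query string.
    suffix = PurePosixPath(urlparse(url).path).suffix or ".jpg"
    return Path(root) / slugify(listing_id) / f"{index:03d}{suffix}"


def append_jsonl(path, record):
    """Append one record as a single JSON line, creating folders as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        # ensure_ascii=False keeps accented place names readable in the file.
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Appending one JSON object per line means a long scrape can be interrupted and resumed without rewriting the file, and each line can be parsed independently.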

PERFORMANCE

Used Scrapy for fast concurrent scraping
Integrated Playwright for automated login and dynamic page handling
Implemented rotating proxies and custom user agents

TECH STACK

Python

Scrapy

Playwright

JSONL

Proxy Rotation

Requests

ARCHITECTURE

The automation system combines Scrapy for high-performance crawling with Playwright for browser automation and login handling. Extracted data is processed and written to JSONL datasets, and property images are downloaded automatically into organized folders.
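
How the logged-in session moves from Playwright to Scrapy is not described here. One common approach, shown below purely as an assumption, is to export the cookies from the Playwright browser context after login and attach them to Scrapy requests. The function name `playwright_cookies_to_scrapy` and the sample cookie values are hypothetical.

```python
def playwright_cookies_to_scrapy(cookies, domain):
    """Select cookies for `domain` (and its subdomains) as a name -> value dict.

    `cookies` is expected in the shape Playwright returns from
    `context.cookies()`: a list of dicts with "name", "value", and "domain".
    """
    selected = {}
    for cookie in cookies:
        # Cookie domains may carry a leading dot, e.g. ".centris.ca".
        cookie_domain = cookie.get("domain", "").lstrip(".")
        if cookie_domain == domain or cookie_domain.endswith("." + domain):
            selected[cookie["name"]] = cookie["value"]
    return selected


# Sample data standing in for a real browser context export.
exported = [
    {"name": "session", "value": "abc123", "domain": ".centris.ca"},
    {"name": "tracker", "value": "1", "domain": "ads.example"},
]
session_cookies = playwright_cookies_to_scrapy(exported, "centris.ca")
```

The resulting dict can be passed to `scrapy.Request(url, cookies=session_cookies)`, so the fast crawling stage reuses the session that the browser stage established.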
