A serverless streaming pipeline that processes ETL queues, parses JSON logs, and loads warehouse aggregates in under 5 seconds.
The client, a digital analytics firm, collected user clickstreams and events from hundreds of publisher websites. Their legacy analytics database processed data in hourly batches, meaning analytical dashboards lagged by up to 20 minutes, preventing clients from running real-time marketing optimization campaigns.
As event volume grew to over 1 million events per hour, the batch processing server became overloaded, causing resource starvation and database crashes. The client needed a highly scalable, serverless data pipeline that could ingest, process, and make clickstream logs queryable within seconds.
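For context on the ingest load, 1 million events per hour averages about 278 events per second. The sketch below estimates how many Kinesis shards that needs in provisioned mode, using the published per-shard write limits of 1,000 records per second and 1 MB per second. The average event size and the peak-to-average factor are assumptions the caller supplies; the case study does not state either.

```python
import math

# Per-shard write limits for Kinesis Data Streams in provisioned mode.
# 1 MB is treated here as 10**6 bytes, a conservative reading.
SHARD_MAX_RECORDS_PER_SEC = 1_000
SHARD_MAX_BYTES_PER_SEC = 1_000_000


def shards_needed(events_per_hour: int, avg_event_bytes: int, peak_factor: float = 1.0) -> int:
    """Estimate the shard count for a given ingest rate.

    avg_event_bytes and peak_factor are assumptions, not measured values.
    """
    records_per_sec = events_per_hour / 3600 * peak_factor
    bytes_per_sec = records_per_sec * avg_event_bytes
    # A shard must satisfy both limits, so take the larger requirement.
    by_records = math.ceil(records_per_sec / SHARD_MAX_RECORDS_PER_SEC)
    by_bytes = math.ceil(bytes_per_sec / SHARD_MAX_BYTES_PER_SEC)
    return max(1, by_records, by_bytes)


if __name__ == "__main__":
    # Hypothetical 2 KB events: the average load fits in one shard,
    # but a 5x traffic spike needs three (byte limit dominates).
    print(shards_needed(1_000_000, 2_000))
    print(shards_needed(1_000_000, 2_000, peak_factor=5))
```

Sizing for peaks rather than the hourly average is what keeps the stream from throttling during traffic bursts; on-demand capacity mode avoids this calculation at a different price point.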
Ankur Weldtech India designed and implemented a serverless real-time ETL pipeline using Amazon Kinesis Data Streams and AWS Lambda. Kinesis Data Streams receives incoming clickstream events and buffers the raw payloads. AWS Lambda functions, invoked automatically as new records arrive on the stream, parse, sanitize, and validate the JSON payloads.
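A minimal sketch of the parse, sanitize, and validate step follows. It relies on the standard Lambda event shape for Kinesis, where each record's payload arrives base64-encoded in `Records[].kinesis.data`. The field names in `REQUIRED_FIELDS` and the epoch-seconds `ts` format are hypothetical, since the case study does not describe the event schema, and forwarding the clean rows downstream is left out.

```python
import base64
import json
from datetime import datetime, timezone
from typing import Optional

# Hypothetical schema: the real clickstream field names are not documented.
REQUIRED_FIELDS = {"event_id", "event_type", "url", "ts"}


def parse_record(record: dict) -> Optional[dict]:
    """Decode one Kinesis record; return a sanitized row, or None if invalid."""
    raw = base64.b64decode(record["kinesis"]["data"])
    try:
        payload = json.loads(raw)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None  # malformed JSON: retrying will not fix it
    if not isinstance(payload, dict) or not REQUIRED_FIELDS.issubset(payload):
        return None
    # Normalize the (assumed) epoch-seconds timestamp to UTC ISO-8601.
    try:
        ts = datetime.fromtimestamp(float(payload["ts"]), tz=timezone.utc)
    except (TypeError, ValueError, OverflowError, OSError):
        return None
    # Sanitize: keep only known fields and trim whitespace.
    return {
        "event_id": str(payload["event_id"]).strip(),
        "event_type": str(payload["event_type"]).strip().lower(),
        "url": str(payload["url"]).strip(),
        "ts": ts.isoformat(),
    }


def handler(event: dict, context) -> dict:
    """Lambda entry point for a Kinesis event source mapping."""
    clean, rejected = [], 0
    for record in event.get("Records", []):
        row = parse_record(record)
        if row is None:
            rejected += 1
        else:
            clean.append(row)
    # Sending `clean` to the batching/Parquet stage is omitted in this sketch.
    print(json.dumps({"accepted": len(clean), "rejected": rejected}))
    # Only honored when ReportBatchItemFailures is enabled on the mapping;
    # invalid records are dropped, not reported, so they are never retried.
    return {"batchItemFailures": []}
```

Treating malformed payloads as rejected rather than failed is a deliberate choice here: reporting them as failures would make Lambda retry the same bad bytes and stall the shard.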
To reduce storage costs and speed up queries, the processed logs are batched, converted to Parquet, and written to Amazon S3. The files are cataloged with AWS Glue, which makes them queryable through Amazon Athena. We defined all infrastructure in Terraform, so the client can tear down and recreate entire test environments.
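One common way to lay out such files is Hive-style partitioning (`dt=.../hour=...` prefixes), which both Glue crawlers and Athena recognize as partition columns. The sketch below assumes that layout; the `clickstream` prefix, the hourly granularity, the `batch_id` naming, and the rows-per-file cap are hypothetical, as the case study does not describe the actual S3 structure. It only groups rows and assigns object keys; the Parquet encoding itself would typically be done with a library such as `pyarrow.parquet.write_table`, which is outside the standard library and omitted here.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical S3 key prefix; the real bucket layout is not documented.
PREFIX = "clickstream"


def partition_key(iso_ts: str) -> str:
    """Map an ISO-8601 UTC timestamp to a Hive-style partition path."""
    ts = datetime.fromisoformat(iso_ts)
    return f"{PREFIX}/dt={ts:%Y-%m-%d}/hour={ts:%H}"


def batch_by_partition(rows, batch_id: str, max_rows_per_file: int = 50_000):
    """Group rows by partition and split each group into file-sized chunks.

    Returns a list of (s3_key, rows) pairs, one pair per Parquet file.
    batch_id must be unique per invocation so keys never collide.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[partition_key(row["ts"])].append(row)
    batches = []
    for prefix in sorted(groups):
        part = groups[prefix]
        for n, start in enumerate(range(0, len(part), max_rows_per_file)):
            key = f"{prefix}/{batch_id}-{n:05d}.parquet"
            batches.append((key, part[start:start + max_rows_per_file]))
    return batches
```

Partitioning by date and hour lets Athena skip every file outside a query's time range, which is where much of the query-cost saving comes from. New partitions still have to be registered, for example by a Glue crawler, `MSCK REPAIR TABLE`, or Athena partition projection.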
By migrating to DataSphere Pipeline, the client replaced batch processing with a real-time analytics system. Advertisers now see clickstream metrics within seconds, so they can adjust ad spending dynamically, and the client spends less on server maintenance.