  • This event has passed.

[Repeat] Introduction to Big Data AppHub: Kafka to HDFS Filter Application

July 6, 2017 @ 12:00 pm - 1:00 pm


To make critical business decisions in real time, many businesses today rely on a variety of data, which arrives in large volumes. Variety and volume together make big data applications complex operations. Big data applications require businesses to combine transactional data with structured, semi-structured, and unstructured data for deep and holistic insights. 

And time is of the essence: to derive the most valuable insights and drive key decisions, large amounts of data have to be continuously ingested into Hadoop data lakes as well as other destinations. As a result, data ingestion is the first challenge that businesses must overcome before embarking on data analysis.

With its various Application Templates for ingestion, DataTorrent allows users to:

  • Ingest vast amounts of data with enterprise-grade operability and performance guarantees provided by its underlying Apache Apex framework. Those guarantees include fault tolerance, linear scalability, high throughput, low latency, and end-to-end exactly-once processing.
  • Quickly launch template applications to ingest raw data, with an easy and iterative way to add business logic and processing logic such as parse, dedupe, filter, transform, enrich, and more to ingestion pipelines.
  • Visualize various metrics on throughput, latency, and app data in real time throughout execution.

In this webinar, we will also show you how seamless it is to download and run the app template on your AWS account with the AWS integration scripts.

Template description: 

Kafka to HDFS Filter Application: This application continuously reads string messages separated by ‘|’ from the configured Kafka topic(s), filters them based on the filter criteria, and writes each message that passes as a line in HDFS file(s). The application uses PoJoEvent as an example schema; this can be replaced with a custom schema based on specific needs.
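The parse, filter, and write flow described above can be sketched in a few lines of plain Python. This is only an illustration of the logic, not the template's actual Apex code: the three PoJoEvent fields, the function names, and the filter condition are all assumptions made for the example, an in-memory list stands in for the Kafka topic, and a text stream stands in for the HDFS file.

```python
import io
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional, TextIO


@dataclass
class PoJoEvent:
    """Hypothetical three-field schema; the real template's fields may differ."""
    account_id: int
    name: str
    amount: float


def parse(message: str, delimiter: str = "|") -> Optional[PoJoEvent]:
    """Parse one delimited message; return None if it is malformed."""
    parts = message.strip().split(delimiter)
    if len(parts) != 3:
        return None
    try:
        return PoJoEvent(int(parts[0]), parts[1], float(parts[2]))
    except ValueError:
        return None


def filter_messages(
    messages: Iterable[str], keep: Callable[[PoJoEvent], bool]
) -> Iterator[str]:
    """Yield the original text of each well-formed message that passes the filter."""
    for message in messages:
        event = parse(message)
        if event is not None and keep(event):
            yield message.strip()


def write_lines(lines: Iterable[str], sink: TextIO) -> int:
    """Write each message as one line to the sink; return the number written."""
    count = 0
    for line in lines:
        sink.write(line + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Stand-in for messages read from a Kafka topic.
    incoming = ["1|alice|250.0", "2|bob|20.5", "not-a-record", "3|carol|900.0"]
    # Stand-in for an HDFS output file.
    sink = io.StringIO()
    write_lines(filter_messages(incoming, lambda e: e.amount > 100), sink)
    print(sink.getvalue(), end="")
```

In this sketch, malformed messages are silently dropped; a production pipeline would more likely route them to an error destination so they can be inspected.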


Sanjay Pujare, Engineer at DataTorrent

Please register for the webinar at: 


After registering, you will receive a confirmation email containing information about joining the webinar.

To reduce time to market and total cost of ownership, look at the operable solutions factory: application templates that you can quickly import and launch. Examples: HDFS to HDFS & HDFS-Line-Copy (backup, replication, disaster recovery, distcp replacement); Kafka to HDFS (ingest, transform); S3 to HDFS (cloud to on-prem); HDFS to Kafka (data lake to event stream, big data log streaming); Database to HDFS (db offload); Database to Database (change data capture, customer 360); Kafka to Database (ingest, transform & load); Kinesis to S3 (cloud ingest, transform & load).

Templates include the ability to parse, error-check, transform, and act on data before loading. Additionally, you can add or modify your own custom logic for transforms, alerts, and actions. Templates also include real-time dashboards for instant and historical views.

Free DataTorrent Enterprise Edition for qualifying startups. Check it out!

Free DataTorrent Enterprise Edition for Universities. Check it out!

Brought to you by DataTorrent, creators of Apache Apex.


Chicago, IL, US


Next Gen Native Hadoop Big Data Apex Users Group, Chicago