Data Wrangler

Quickly and Easily Search Raw Data and Convert It into a Visual Format

Data Wrangler converts collected raw data into a user-friendly, Excel-like format, saving the time and effort needed to sort and analyze data. Each data processing step, from data search to conversion, is visualized, and a single click shows data profile information and the types and requirements of joins and data sources.




Service Architecture

  • User → Request/Distribute Products → (Data Wrangler Image & Chart Repository: Data Wrangler → Kubernetes Engine) ← Process Data ← Data Engineer
  • Data Wrangler Image & Chart Repository
  • Data Wrangler

    - Wrangler UI
    - Wrangler Service
    - Spark Driver, Spark Executor
    - Metadata Database

  • Kubernetes Engine: Containers

Key Features

  • Easy installation

    - Apply Data Wrangler and Kubernetes Engine together, all at once
    - Kubernetes Engine resources can be configured above the resource size requested for Data Wrangler, preventing errors caused by user mistakes

  • Integration with various data sources

    - Uses the schema information of connected data sources (Hive schema, RDB schema)
    - Loads data using SQL
    - Uploads target data using the local file feature

  • Various data analysis functions

    - Group functions: count, sum, avg, min, max, first, last, countDistinct, sumDistinct, collect_list, collect_set, etc.
    - Window functions: lag, lead, rank, dense_rank, row_number, etc.
    - A range of default scalar functions, as well as data pre-processing and math functions

  • Job management and monitoring

    - Manage jobs, which apply a recipe to the entire data set, and monitor their execution status
    - View jobs by status and name
    - Monitor details such as the job list, status, and execution time
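
To make the group and window functions listed above more concrete, the following is a minimal plain-Python sketch of what a few of them compute. It is not Data Wrangler's own interface: the sample rows, the column names (region, day, sales), and the helper names group_aggregate and window_functions are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: (region, day, sales). Not real data.
ROWS = [
    ("east", 1, 100),
    ("east", 2, 150),
    ("east", 3, 150),
    ("west", 1, 80),
    ("west", 2, 120),
]


def group_aggregate(rows):
    """Group by region and compute a few of the listed group functions."""
    groups = defaultdict(list)
    for region, _day, sales in rows:
        groups[region].append(sales)

    result = {}
    for region, values in groups.items():
        result[region] = {
            "count": len(values),
            "sum": sum(values),
            "avg": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
            "first": values[0],                    # first value in input order
            "last": values[-1],                    # last value in input order
            "countDistinct": len(set(values)),
            "collect_list": list(values),          # keeps duplicates
            "collect_set": sorted(set(values)),    # drops duplicates
        }
    return result


def window_functions(rows):
    """Partition by region, order by sales descending, and compute
    row_number, rank, dense_rank, lag, and lead for each row."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)

    out = []
    for region, part in partitions.items():
        # sorted() is stable, so ties keep their input order.
        ordered = sorted(part, key=lambda r: -r[2])
        rank = 0
        dense_rank = 0
        previous_sales = None
        for i, (_region, day, sales) in enumerate(ordered):
            if sales != previous_sales:
                rank = i + 1        # rank skips positions after ties
                dense_rank += 1     # dense_rank does not skip
                previous_sales = sales
            out.append({
                "region": region,
                "day": day,
                "sales": sales,
                "row_number": i + 1,
                "rank": rank,
                "dense_rank": dense_rank,
                "lag": ordered[i - 1][2] if i > 0 else None,
                "lead": ordered[i + 1][2] if i + 1 < len(ordered) else None,
            })
    return out


if __name__ == "__main__":
    for region, stats in group_aggregate(ROWS).items():
        print(region, stats)
    for row in window_functions(ROWS):
        print(row)
```

In this sketch, the two tied "east" rows share rank 1; the next row gets rank 3 but dense_rank 2, which is the difference between the two ranking functions.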


Billing

  • CPU usage time for Kubernetes Engine pods used by Data Wrangler
  • Kubernetes Engine, worker node (VM), and storage usage are charged separately
