Dataproc

Dataproc is a fully managed and highly scalable service for running Apache Hadoop, Apache Spark, Apache Flink, Presto, and 30+ open-source tools and frameworks. Use Dataproc for data lake modernization, ETL, and secure data science, at scale, integrated with Google Cloud, at a fraction of the cost.

Pricing:

Requires the serverless method, then the cluster method.

Ref:

Dataproc | Google Cloud

This will require:

BigQuery Storage & Spark DataFrames

  • Practice Dataproc:

Official document: https://cloud.google.com/dataproc-serverless/docs/concepts/metrics

Basic: Calculate the total assets of the banking sector.

SQL Basic + Aggregation Example (98%) + Financial (2%)
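
The basic exercise (total assets of the banking sector) comes down to a filter plus a SUM aggregation. Below is a minimal sketch of that query, run against an in-memory SQLite table so it works without a cluster. The table name `balance_sheet`, its columns, and the sample rows are all hypothetical and invented for illustration; they do not come from any real dataset. On Dataproc, a similar SELECT could probably be passed to Spark SQL once the data is registered as a table or view, with the `?` placeholder replaced by whatever parameter style is used there.

```python
import sqlite3

# Hypothetical sample rows (ticker, industry, total_assets), for illustration only.
ROWS = [
    ("BANK_A", "Banking", 100.0),
    ("BANK_B", "Banking", 250.5),
    ("STEEL_C", "Materials", 80.0),
]

# Basic SQL + aggregation: filter one industry, then sum its assets.
QUERY = """
SELECT industry, SUM(total_assets) AS total_assets
FROM balance_sheet
WHERE industry = ?
GROUP BY industry
"""


def total_assets_for(industry, rows=ROWS):
    """Return the summed total_assets for one industry, or None if it has no rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE balance_sheet "
            "(ticker TEXT, industry TEXT, total_assets REAL)"
        )
        conn.executemany("INSERT INTO balance_sheet VALUES (?, ?, ?)", rows)
        row = conn.execute(QUERY, (industry,)).fetchone()
        # GROUP BY yields no row at all when the filter matches nothing.
        return row[1] if row else None
    finally:
        conn.close()


if __name__ == "__main__":
    print(total_assets_for("Banking"))
```

The GROUP BY is not strictly needed for a single industry, but keeping it makes it easy to drop the WHERE clause and get one total per industry.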