Dataproc
Dataproc is a fully managed and highly scalable service for running Apache Hadoop, Apache Spark, Apache Flink, Presto, and 30+ open-source tools and frameworks. Use Dataproc for data lake modernization, ETL, and secure data science, at scale, integrated with Google Cloud, at a fraction of the cost.
Pricing:
Requires the serverless method first, then the cluster method.
Ref:
Dataproc | Google Cloud
This will require:
BigQuery Storage & Spark DataFrames
- Practice Dataproc:
Official document: https://cloud.google.com/dataproc-serverless/docs/concepts/metrics
Basic: Calculate the total assets of the banking sector.
Basic SQL + Aggregation Example (98%) + Financial (2%)
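The basic exercise comes down to a single SQL aggregate. Below is a minimal sketch of that query, using Python's built-in sqlite3 as a local stand-in so it runs without a cluster. On Dataproc the same SELECT would presumably go through Spark SQL over a DataFrame loaded from BigQuery. The table name `balance_sheet`, its columns, and the sample rows are hypothetical and for illustration only, not real financial data.

```python
import sqlite3

# Hypothetical sample rows: (company, sector, total_assets).
# Names and amounts are made up for illustration.
ROWS = [
    ("Bank A", "Banking", 100.0),
    ("Bank B", "Banking", 250.5),
    ("Insurer C", "Insurance", 80.0),
]


def total_assets_by_sector(rows, sector):
    """Sum total_assets for one sector with a SQL aggregate (SUM + WHERE)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE balance_sheet "
            "(company TEXT, sector TEXT, total_assets REAL)"
        )
        conn.executemany("INSERT INTO balance_sheet VALUES (?, ?, ?)", rows)
        # COALESCE turns the NULL that SUM returns for zero rows into 0.
        (total,) = conn.execute(
            "SELECT COALESCE(SUM(total_assets), 0) "
            "FROM balance_sheet WHERE sector = ?",
            (sector,),
        ).fetchone()
        return total
    finally:
        conn.close()


if __name__ == "__main__":
    print(total_assets_by_sector(ROWS, "Banking"))
```

The sector is passed as a bound parameter rather than formatted into the query string, which keeps the sketch safe to reuse with other sector names.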