How to decide bucket count in Hive
navinkumar
Steps
Calculate Expected Bucket Count:
- Divide the table size by the HDFS block size to get an initial estimate of the number of buckets. Use the same unit (for example, MB) for both values.
Expected Bucket Count = Table Size / Block Size on Hadoop
Find the Next Power of 2:
- Take the base-2 logarithm of the initial estimate and round it up to the next integer. Rounding up, rather than to the nearest integer, keeps the average bucket at or below one block.
n = log(Expected Bucket Count, 2)
Rounded Value = ceil(n)
Calculate Final Bucket Count:
- Raise 2 to the power of the rounded value to get the final bucket count.
Bucket Count = 2 ^ Rounded Value
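The three steps can be sketched as a small Python helper. This is only an illustration, not part of Hive: the function name `estimate_bucket_count` is hypothetical, the 128 MB default block size is an assumption, and the exponent is rounded up so that the average bucket does not exceed one block. Clamping to a single bucket for tables smaller than one block is also an added assumption.

```python
import math


def estimate_bucket_count(table_size_mb: float, block_size_mb: float = 128.0) -> int:
    """Return a power-of-two bucket count for a table of the given size.

    Hypothetical helper: both sizes must use the same unit (MB here).
    """
    if table_size_mb <= 0 or block_size_mb <= 0:
        raise ValueError("table size and block size must be positive")

    # Step 1: initial estimate, roughly one bucket per HDFS block.
    expected = table_size_mb / block_size_mb

    # Step 2: base-2 logarithm, rounded up to the next integer.
    # Assumption: clamp at 0 so a table smaller than one block gets 1 bucket.
    exponent = max(0, math.ceil(math.log2(expected)))

    # Step 3: the final bucket count is 2 raised to that exponent.
    return 2 ** exponent


if __name__ == "__main__":
    print(estimate_bucket_count(2300, 128))
```

A table that already occupies an exact power-of-two number of blocks keeps that number, for example 1024 MB at 128 MB per block stays at 8 buckets.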
Example Calculation:
Assume:
Table Size = 2300 MB
Block Size on Hadoop = 128 MB
Calculate Expected Bucket Count:
Expected Bucket Count = 2300 / 128 = 17.96875
Find the Next Power of 2:
n = log(17.96875, 2) ≈ 4.167418
Round Up to the Next Integer:
Rounded Value = ceil(4.167418) = 5
Calculate Final Bucket Count:
Bucket Count = 2^5 = 32
Result:
Therefore, the bucket count for the given example is 32, which works out to about 72 MB per bucket (2300 / 32).
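To see why the exponent 4.167418 becomes 5 rather than 4, the short Python check below compares both choices for the numbers in this example. It is a rough sketch: `per_bucket_mb` is a hypothetical helper, and it assumes rows hash evenly across buckets, so the figures are averages.

```python
import math

table_size_mb = 2300
block_size_mb = 128

expected = table_size_mb / block_size_mb   # 17.96875
exponent = math.log2(expected)             # roughly 4.1674


def per_bucket_mb(n: int) -> float:
    """Average data per bucket, in MB, when the table is split into 2**n buckets."""
    return table_size_mb / 2 ** n


rounded_down = math.floor(exponent)  # exponent 4, i.e. 16 buckets
rounded_up = math.ceil(exponent)     # exponent 5, i.e. 32 buckets

for n in (rounded_down, rounded_up):
    size = per_bucket_mb(n)
    verdict = "fits in" if size <= block_size_mb else "exceeds"
    print(f"2^{n} = {2 ** n} buckets, {size:.3f} MB each, {verdict} one {block_size_mb} MB block")
```

With 16 buckets the average bucket would be 143.75 MB, larger than a 128 MB block, while 32 buckets brings it down to 71.875 MB.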