Computer Science

Bucket Sort – Sorting Algorithm

Bucket Sort

Bucket Sort is a sorting algorithm that distributes elements into several groups (buckets) based on their values. Each bucket is then sorted individually, and the results are merged to produce a sorted array. It is most effective when data is uniformly distributed over a range and is particularly suitable for sorting floating-point numbers.


1. How It Works

  1. Create Buckets: Divide the range of input data into a fixed number of buckets.
  2. Distribute Elements: Place each element of the input array into its appropriate bucket.
  3. Sort Buckets: Sort the elements within each bucket using another sorting algorithm (e.g., insertion sort or quicksort).
  4. Merge Buckets: Concatenate all sorted buckets to form the final sorted array.

2. Characteristics

  • Time Complexity:
    • Best/Average Case: O(n + k), where n is the number of elements and k is the number of buckets.
    • Worst Case: O(n^2), when all elements land in a single bucket and require a full sort.
  • Space Complexity: O(n + k), due to the extra space needed for buckets.
  • Stability: Not inherently stable but can be made stable with additional effort.
  • Best Use Case:
    • When data is uniformly distributed over a range (e.g., floating-point numbers between 0 and 1).
    • When the dataset is large, and sorting efficiency is critical.

3. Steps with Example

Example Input: [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]

  1. Create Buckets:
    • Divide the range [0, 1] into 10 buckets (e.g., [0, 0.1), [0.1, 0.2), …, [0.9, 1)).
  2. Distribute Elements:
    • Place each element into its corresponding bucket based on its value.
    • Example: Bucket 0: [0.12] Bucket 1: [0.17, 0.21, 0.23] Bucket 2: [0.26] Bucket 3: [0.39] Bucket 6: [0.68] Bucket 7: [0.72, 0.78] Bucket 9: [0.94]
  3. Sort Buckets:
    • Sort each bucket individually. For simplicity, insertion sort is often used.
  4. Merge Buckets:
    • Concatenate sorted buckets to form the final sorted array: [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94].

4. Advantages and Disadvantages

Advantages:

  1. Efficient for Uniform Data: Performs very well when data is evenly distributed across the range.
  2. Parallelizable: Different buckets can be sorted concurrently.

Disadvantages:

  1. Extra Memory Usage: Requires additional space for buckets.
  2. Dependent on Distribution: Performance heavily depends on how uniformly the data is distributed.
  3. Bucket Selection: Choosing the right number and size of buckets is crucial for optimal performance.

5. Java Implementation

import java.util.ArrayList;
import java.util.Collections;

public class BucketSort {

    public static void bucketSort(float[] arr) {
        int n = arr.length;

        // 1. Initialize buckets
        ArrayList<Float>[] buckets = new ArrayList[n];
        for (int i = 0; i < n; i++) {
            buckets[i] = new ArrayList<>();
        }

        // 2. Distribute elements into buckets
        for (float num : arr) {
            int bucketIndex = (int) (num * n); // Calculate bucket index
            buckets[bucketIndex].add(num);
        }

        // 3. Sort each bucket
        for (ArrayList<Float> bucket : buckets) {
            Collections.sort(bucket);
        }

        // 4. Merge buckets
        int index = 0;
        for (ArrayList<Float> bucket : buckets) {
            for (float num : bucket) {
                arr[index++] = num;
            }
        }
    }

    public static void main(String[] args) {
        float[] arr = {0.78f, 0.17f, 0.39f, 0.26f, 0.72f, 0.94f, 0.21f, 0.12f, 0.23f, 0.68f};
        System.out.println("Original array:");
        for (float num : arr) {
            System.out.print(num + " ");
        }

        bucketSort(arr);

        System.out.println("\nSorted array:");
        for (float num : arr) {
            System.out.print(num + " ");
        }
    }
}

6. Python Implementation

def bucket_sort(arr):
    n = len(arr)
    if n <= 0:
        return arr

    # 1. Create buckets
    buckets = [[] for _ in range(n)]

    # 2. Distribute elements into buckets
    for num in arr:
        bucket_index = int(num * n)  # Calculate bucket index
        buckets[bucket_index].append(num)

    # 3. Sort each bucket
    for bucket in buckets:
        bucket.sort()

    # 4. Merge buckets
    sorted_array = []
    for bucket in buckets:
        sorted_array.extend(bucket)

    return sorted_array


# Example usage
if __name__ == "__main__":
    arr = [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]
    print("Original array:", arr)

    sorted_arr = bucket_sort(arr)

    print("Sorted array:", sorted_arr)

7. When to Use Bucket Sort

  1. Floating-Point Numbers:
    • Best suited for sorting real numbers (e.g., 0.78, 0.23).
  2. Uniform Distribution:
    • Performs well when the data is uniformly distributed across a range.
  3. Large Datasets:
    • Particularly useful when sorting large datasets where efficiency is critical.

Bucket Sort is highly efficient for specific scenarios but requires careful consideration of data distribution and bucket configuration for optimal performance.

Aquinas

Hello! I'm Aquinas, a lifelong learner who finds everything in the world fascinating. I can’t ignore my curiosity, and this blog is where I document my journey of learning, exploring, and understanding various topics. I don’t limit myself to a single field—I enjoy diving into science, philosophy, technology, the arts, and more. For me, learning isn’t just about gathering information; it’s about applying knowledge, analyzing it from different perspectives, and discovering new insights along the way. Through this blog, I hope to record my learning experiences, share ideas, and connect with others who have a similar passion for knowledge. Let’s embark on this journey of exploration together! 😊

Recent Posts

What Is EPS(Earnings Per Share)?

When analyzing a stock, one of the first financial indicators you’ll encounter is EPS, or Earnings Per Share. It’s one… Read More

8 months ago

What is Market Capitalization? Everything Investors Need to Know

When you look at a stock’s profile on a financial website, one of the first things you’ll see is its… Read More

8 months ago

The MIT License

In the world of open-source software, simplicity and flexibility are often just as important as legal protection. That’s why the… Read More

9 months ago

Mozilla Public License (MPL)

If you want your software to be open source, but still compatible with commercial use—and not as restrictive as the… Read More

9 months ago

The Apache License 2.0

When it comes to open-source software, developers and businesses alike need licenses that balance freedom, legal clarity, and long-term security.… Read More

9 months ago

BSD (Berkeley Software Distribution) License

If you’re working on open-source projects or choosing third-party libraries for your software, understanding software licenses is essential. Among the… Read More

9 months ago