07/15/2025 | Press release
Object storage performance for small files matters, and chunk store is the key to it. This is a low-level, fairly technical detail, but the concept and its benefits are important to understand as critical AI data pipelines migrate to all-flash object storage.
How many small files are we talking about in today's data pipelines? At scale, there are billions. These files could be metadata generated as unstructured data is processed into semistructured data for Large Language Model (LLM) fine-tuning. Or they may come from a data lakehouse architecture with massive open table databases.
Dell ObjectScale is object storage that's purpose-built for enterprises grappling with the demands of modern data in the AI era. ObjectScale stands apart from the competition in small-file performance, recoverability and durability, dramatically enhancing data storage efficiency. Here are some reasons why.
ObjectScale packs files into 128MB chunks. Those chunks give the system major advantages when dealing with huge numbers of small objects.
For example, take a system with hundreds of millions or billions of very small 10K metadata files. ObjectScale can pack over 10,000 of those files into a single chunk. That chunk is then erasure encoded, and the resulting shards are distributed across racks and nodes for fault tolerance. The chunk is laid out predictably on disk with a clean storage overhead of 25 percent (with 10+2 erasure encoding).
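To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. The 10K object size, 128MB chunk size and 10+2 scheme come from the example above; per-chunk indexing and protection metadata are ignored for simplicity.

```python
# Back-of-the-envelope math for packing small objects into 128MB chunks.
# Figures (10K objects, 128MB chunks, 10 data + 2 coding shards) come from
# the example above; chunk indexing and protection metadata are ignored.

CHUNK_SIZE = 128 * 1024 * 1024       # 128MB chunk
OBJECT_SIZE = 10 * 1024              # 10K metadata file
DATA_SHARDS, CODING_SHARDS = 10, 2   # 10+2 erasure encoding

objects_per_chunk = CHUNK_SIZE // OBJECT_SIZE
raw_expansion = (DATA_SHARDS + CODING_SHARDS) / DATA_SHARDS

print(f"objects packed per chunk : {objects_per_chunk:,}")   # 13,107
print(f"on-disk footprint        : {raw_expansion:.1f}x the logical chunk size")
```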
Contrast this scenario with a system that doesn't use a chunk store. For objects this small, individual erasure encoding is a bad option (it could result in overhead of more than 600%). Such systems usually fall back on double or triple mirroring (200% or 300% overhead). Now multiply that by hundreds of millions or billions of objects.
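For comparison, here is a rough sketch of what per-object protection costs for the same 10K objects. The 4 KiB minimum on-disk allocation per erasure-coded fragment is an assumption made purely for illustration (real systems and their per-fragment metadata vary), but it shows why per-object erasure coding balloons while mirroring still multiplies raw capacity.

```python
# Rough cost of protecting a single 10K object on its own, without a chunk store.
# The 4 KiB minimum allocation per erasure-coded fragment is an illustrative
# assumption; mirroring is shown as simple whole-object copies.

OBJECT_SIZE = 10 * 1024
ALLOC_UNIT = 4 * 1024

def on_disk(size_bytes):
    """Round a fragment up to whole allocation units (ceiling division)."""
    return -(-size_bytes // ALLOC_UNIT) * ALLOC_UNIT

# Per-object 10+2 erasure coding: twelve ~1K fragments, each padded to 4 KiB.
per_object_ec = 12 * on_disk(OBJECT_SIZE // 10)
# Double and triple mirroring: two or three full copies of the object.
double_mirror = 2 * OBJECT_SIZE
triple_mirror = 3 * OBJECT_SIZE

for label, raw in [("per-object 10+2 EC", per_object_ec),
                   ("double mirroring  ", double_mirror),
                   ("triple mirroring  ", triple_mirror)]:
    print(f"{label}: {raw / OBJECT_SIZE:.1f}x the object's logical size")
```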
Next, consider how chunking can determine outcomes in a fault scenario.
On an object system without a chunk store, the failure of a single 61TB NVMe drive means the system has to re-create billions of object shards. We're talking weeks to months of rebuild time for a single drive failure. What if an entire storage node with 24 drives went down? The rebuilds would be a constant drag on the system.
The ObjectScale chunk store reduces the total number of shards that need to be re-created in a fault scenario by orders of magnitude, from billions down to millions. Rebuild times on large NVMe drives can drop from weeks or months to just hours, all while keeping storage overhead low. It is really the only manageable approach for supporting large NVMe drives.
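For a rough sense of scale, here is a sketch of the rebuild arithmetic, assuming ~12.8MB chunk shards (one tenth of a 128MB chunk) versus ~10K per-object fragments on a system without a chunk store; actual shard sizes and drive fill levels will vary.

```python
# Rough count of fragments that must be re-created when a 61TB NVMe drive fails.
# Shard sizes are illustrative: ~12.8MB chunk shards versus ~10K per-object
# fragments; a full drive is assumed.

DRIVE_CAPACITY = 61 * 10**12                 # 61TB NVMe drive
CHUNK_SHARD = (128 * 1024 * 1024) // 10      # one shard of a 10+2 encoded 128MB chunk
OBJECT_FRAGMENT = 10 * 1024                  # per-object copy/fragment

chunk_shards_lost = DRIVE_CAPACITY // CHUNK_SHARD
object_fragments_lost = DRIVE_CAPACITY // OBJECT_FRAGMENT

print(f"chunk-store shards to rebuild   : {chunk_shards_lost:,}")      # ~4.5 million
print(f"per-object fragments to rebuild : {object_fragments_lost:,}")  # ~6.0 billion
```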
Also consider data durability when managing object storage for modern workloads such as AI. To prevent silent data corruption, object storage proactively scans objects, verifying checksums and repairing errors.
If each individual object in a system needs to be checksummed, an active system could easily reach a state where those scans never complete. Some object systems will throttle ingest speeds if checksum scans can't keep up.
ObjectScale, by contrast, checksums individual objects inline before placing them into a chunk. It does not need to re-verify each object in the background, because background verification happens at the chunk segment/stripe level.
By reducing the number of checksums that must be continually validated, ObjectScale massively reduces the associated processing overhead. This frees up CPU cycles so the storage nodes can focus on their main job: reading and writing data.
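Here is a minimal sketch of that verification pattern, assuming hypothetical names (ChunkWriter, put_object, seal_chunk and scrub are illustrative, not ObjectScale APIs): objects are checksummed inline at ingest, and the background scrubber only verifies sealed chunks.

```python
# Illustrative sketch: inline per-object checksums at ingest, background
# verification at the chunk level. Names are hypothetical, not ObjectScale APIs.
import hashlib

class ChunkWriter:
    def __init__(self):
        self.buffer = bytearray()
        self.object_checksums = {}                 # object key -> inline checksum

    def put_object(self, key, data):
        # Checksum the object inline, before it is packed into a chunk.
        self.object_checksums[key] = hashlib.sha256(data).hexdigest()
        self.buffer.extend(data)

    def seal_chunk(self):
        # One checksum per sealed chunk; this is the unit the background
        # scrubber verifies, instead of billions of per-object checksums.
        chunk = bytes(self.buffer)
        return chunk, hashlib.sha256(chunk).hexdigest()

def scrub(chunk, expected_checksum):
    """Background scan: verify a whole chunk against its stored checksum."""
    return hashlib.sha256(chunk).hexdigest() == expected_checksum

writer = ChunkWriter()
writer.put_object("metadata-000001.json", b'{"rows": 42}')
writer.put_object("metadata-000002.json", b'{"rows": 7}')
chunk, digest = writer.seal_chunk()
print("chunk verified:", scrub(chunk, digest))     # True
```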
The powerful chunk store mechanism of Dell ObjectScale directly addresses the challenges of managing billions of small objects. In fact, some of our customers are running ObjectScale environments that include over 100 billion objects in a single bucket. We invite you to reach out and learn more about how ObjectScale offers superior storage efficiency, durability and resiliency, making it an indispensable foundation for high-performance AI and analytics workflows.