One of the things that has often confused me is how little good advice there is for reading large files efficiently when writing code.
Typically most people use whatever the canonical file read suggestion for their language is, until they need to read large files and it’s too slow. Then they google “efficiently reading large files in
However, in Halvar’s recent QCon talk he had several slides on how most code is still written around the old assumptions of spinning disks. A spinning hard drive usually has a single read head, so you can’t do much in parallel. That pushes code to optimise for a single reader, minimal seeks, and large reads of data laid out contiguously on disk. Modern SSDs, by contrast, are much more comfortable with seeks and parallelism.
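To make the contrast concrete, here is a minimal sketch of what reading a file in parallel could look like in Python. It is not taken from the talk; the names `read_chunk` and `parallel_read`, the 4 MiB chunk size and the worker count are all assumptions picked for illustration, and `os.pread` is only available on Unix-like systems. Whether this beats a plain sequential read depends on the drive, the filesystem and the page cache, so measure before relying on it.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_chunk(fd, offset, length):
    """Read up to `length` bytes starting at absolute `offset`."""
    parts = []
    while length > 0:
        # os.pread takes an explicit offset, so threads sharing one file
        # descriptor do not fight over a common file position.
        data = os.pread(fd, length, offset)
        if not data:  # reached end of file
            break
        parts.append(data)
        offset += len(data)
        length -= len(data)
    return b"".join(parts)


def parallel_read(path, chunk_size=4 * 1024 * 1024, workers=8):
    """Hypothetical helper: read a whole file using several concurrent reads."""
    size = os.path.getsize(path)
    offsets = range(0, size, chunk_size)
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map yields results in offset order, so joining them
            # reproduces the original byte sequence.
            chunks = pool.map(lambda off: read_chunk(fd, off, chunk_size), offsets)
            return b"".join(chunks)
    finally:
        os.close(fd)
```

On a spinning disk the concurrent requests would mostly turn into extra seeks, which is exactly the behaviour the older advice tries to avoid. On an SSD the same requests can be serviced concurrently.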