On Thu, 21 Apr 2022 07:12:07 -0700 alice@coakmail.com wrote:
> OP maybe need the streaming IO for reading files.

Which is what they were already doing - they used:

  while (<HD>) { ... }

which, under the hood, uses readline to read a line at a time.
(Here "HD" is their global filehandle - a lexical filehandle would
have been better style, but makes no difference here.)

You can use B::Deparse to see that the above deparses to a use of
readline:

  [davidp@columbia:~]$ cat tmp/readline
  #!/usr/bin/env perl
  while (<STDIN>) {
      print "Line: $_\n";
  }

  [davidp@columbia:~]$ perl -MO=Deparse tmp/readline
  while (defined($_ = readline STDIN)) {
      print "Line: $_\n";
  }
  tmp/readline syntax OK

So they're already reading line-wise; it seems they're just running
into memory usage issues from holding a hash of 80+ million values,
which is not super surprising on a reasonably low-memory box.

Personally, if it were me, I'd go one of two ways:

* just throw some more RAM at it - these days, RAM is far cheaper
  than the programmer time spent trying to squeeze the data into as
  few bytes as possible, especially if it's a quick "Get It Done"
  solution

* hand it off to a tool made for the job - import the data into
  SQLite or some other DB engine and let it do what it's designed
  for; it's likely to be far more efficient than a hand-rolled Perl
  solution. (They already proved that Apache Spark can handle it on
  the same hardware.) A rough sketch of this approach is below.
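For illustration, here's a minimal, untested sketch of the SQLite
route using DBI and DBD::SQLite. It assumes the job is something
like counting how many times each line occurs (a guess at what the
OP's hash held) - the database filename, table, and column names
are all made up, and the ON CONFLICT upsert needs SQLite 3.24 or
newer:

  #!/usr/bin/env perl
  use strict;
  use warnings;
  use DBI;

  # AutoCommit off so we can batch inserts into transactions,
  # which matters a lot for SQLite insert speed
  my $dbh = DBI->connect('dbi:SQLite:dbname=counts.db', '', '',
      { RaiseError => 1, AutoCommit => 0 });

  $dbh->do('CREATE TABLE IF NOT EXISTS counts (
      value TEXT PRIMARY KEY,
      n     INTEGER NOT NULL
  )');

  # Upsert: insert a new value, or bump the count if seen before
  my $sth = $dbh->prepare(
      'INSERT INTO counts (value, n) VALUES (?, 1)
       ON CONFLICT(value) DO UPDATE SET n = n + 1'
  );

  while (my $line = <STDIN>) {
      chomp $line;
      $sth->execute($line);
      $dbh->commit unless $. % 100_000;  # commit every 100k lines
  }
  $dbh->commit;
  $dbh->disconnect;

Once the data is loaded, "how many distinct values" or "top N by
count" is a single SQL query, and the working set lives on disk
rather than in an in-memory Perl hash.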