develooper Front page | perl.beginners | Postings from November 2022

Re: change one line in a large fine

Kang-min Liu
November 20, 2022 14:09
> I have a large file which has millions of lines.
> They are text only lines.
> If I have to change one line in the file, what's the efficient way?
> I don't want to slurp the whole file into memory, change that line and
> write the full content back to disk again.

Since the editing seems to be line-based, I would recommend checking out
'perl -i' for in-place editing, if you are OK with regexp-based
search & replace. It's basically the same as doing it in vim:

    perl -p -i.orig -e "s/hello/wow/" input.txt

This replaces the first match of the regexp hello on each line with "wow",
shifts the remainder of the file correctly, and writes everything back to
input.txt. With -i.orig, perl also keeps a copy of the original file under
the same name with .orig appended.

However, that may match on multiple lines. If you know the line number in
advance, you could check the line number variable $. instead:

    perl -p -i -e 's/^.+$/wow/ if $. == 2' input.txt

Note that doing this would still scan the entire input.txt line by
line. Adding `exit()` or `last` in the body of `-e` would make the
program finish early, but it would also truncate input.txt, because the
lines not yet read never get printed back. That is probably not what we
want.
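
For reference, the one-liner above behaves roughly like the script below,
which may make it clearer why the whole file gets scanned and why leaving
the loop early loses the rest. This is only a sketch: the sub name
replace_line_in_place is made up for illustration, and it counts lines
itself instead of relying on $. so that it can be called more than once.

```perl
use strict;
use warnings;

# Replace line number $target of $path with $text, keeping a backup
# under "$path.orig". Hypothetical helper, not part of any module.
sub replace_line_in_place {
    my ($path, $target, $text) = @_;

    # Setting $^I switches on in-place editing for the <> operator;
    # it is the variable behind the -i command-line switch.
    local $^I   = ".orig";
    local @ARGV = ($path);

    my $lineno = 0;
    while (my $line = <>) {
        $lineno++;
        $line = "$text\n" if $lineno == $target;
        # While in-place editing is active, print writes to the new file,
        # so every line has to be printed, changed or not.
        print $line;
    }
    return;
}
```

Calling replace_line_in_place("input.txt", 2, "wow") should have the same
effect as the command above, plus a backup file.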


Alternatively, if you would rather write the file-handling code yourself
than rely on the "perl" command-line switches, read on...

How efficient it could be depends a little bit on how the target line is
identified and the kind of editing that's required.

For sure you could avoid slurping by making line-based changes:

    while (defined(my $line = <$fh>)) { ... }

If the editing is replacing $line with something that's equal in length,
then it can be pretty efficient -- just print the thing and the file is
modified in place.
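
A minimal sketch of that equal-length case, assuming the target line is
found by its exact content. The sub name overwrite_same_length is
hypothetical, and both strings are assumed to be plain bytes of the same
length:

```perl
use strict;
use warnings;

# Overwrite the first line whose text equals $old with $new, in place.
# Hypothetical helper; $old and $new must have the same length.
sub overwrite_same_length {
    my ($path, $old, $new) = @_;
    die "lengths differ\n" unless length($old) == length($new);

    open my $fh, "+<:raw", $path or die "Cannot open $path: $!";
    my $start = tell($fh);            # offset where the next line begins
    while (defined(my $line = <$fh>)) {
        (my $text = $line) =~ s/\n\z//;
        if ($text eq $old) {
            # Go back to the start of this line and write over it.
            seek($fh, $start, 0) or die "Cannot seek in $path: $!";
            print {$fh} $new;         # the newline after it stays untouched
            close($fh) or die "Cannot close $path: $!";
            return 1;
        }
        $start = tell($fh);
    }
    close($fh);
    return 0;                         # no such line found
}
```

Nothing after the matching line is read or rewritten, which is where the
efficiency comes from.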

If, say, we want to change just the 42nd line in the file, here's how I
would do it:

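    # Open in read-write mode, without truncating the file.
    open my $fh, "+<", "input.txt" or die "Cannot open input.txt: $!";

    # Read the first 41 lines, which leaves the file position at the
    # beginning of the 42nd line.
    my $lineno = 1;
    while (defined(my $line = <$fh>)) {
        $lineno += 1;
        last if $lineno == 42;
    }

    # Seek between reading and writing, to be portable.
    seek($fh, 0, 1);

    # Print the new content at the beginning of the 42nd line.
    # ($newcontent is assumed to be defined earlier.)
    print $fh $newcontent;
    close($fh) or die "Cannot close input.txt: $!";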

This is the most efficient scenario because the program can end here
without reading the remainder of input.txt.

However, if $newcontent is longer than the 42nd line, the program would
still finish without error, but when we inspect the file we'll see that
the text in $newcontent bleeds over into the 43rd line and maybe further
lines.

Similarly, if $newcontent is shorter, the original content of the
42nd line will only be partially replaced.
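
To make both effects concrete, here is a small sketch that overwrites at
the start of a given line without shifting anything. The sub name
overwrite_at_line is made up for this demonstration:

```perl
use strict;
use warnings;

# Write $text at the start of line number $target, leaving every other
# byte of the file where it is. Hypothetical helper for the demo.
sub overwrite_at_line {
    my ($path, $target, $text) = @_;
    open my $fh, "+<:raw", $path or die "Cannot open $path: $!";

    # Skip the first $target - 1 lines.
    for (2 .. $target) {
        defined(my $skipped = <$fh>)
            or die "File has fewer than $target lines\n";
    }

    seek($fh, 0, 1);    # seek between reading and writing, for portability
    print {$fh} $text;
    close($fh) or die "Cannot close $path: $!";
    return;
}
```

With a file containing "aaa\nbbb\nccc\n", writing "XXXXX\n" at line 2
leaves "aaa\nXXXXX\nc\n" (the third line is partly eaten), while writing
"X\n" at line 2 leaves "aaa\nX\nb\nccc\n" (a leftover "b" line appears).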

Most likely that's not the kind of editing we want to be doing.

Meaning, if $newcontent is longer or shorter, the remainder of the file
has to be shifted a few characters forward or backward, so we have to
re-print all of those lines back to $fh. That requires a lot of
bookkeeping code just to get every corner case right.

If the editing we want is rather generic, I'd say we probably want to
write the output to a different file instead of editing in place.

We will still end up reading and rewriting the entire file, but only
keeping one line at a time in memory.
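
A rough sketch of that approach, assuming the line is identified by its
number. The sub name rewrite_line and the ".new" suffix for the second
file are just choices made for this example:

```perl
use strict;
use warnings;

# Copy $path to "$path.new" line by line, replacing line number $target
# with $text, then rename the new file over the original.
# Hypothetical helper; the ".new" suffix is an arbitrary choice.
sub rewrite_line {
    my ($path, $target, $text) = @_;
    my $tmp = "$path.new";

    open my $in,  "<", $path or die "Cannot read $path: $!";
    open my $out, ">", $tmp  or die "Cannot write $tmp: $!";

    while (defined(my $line = <$in>)) {
        # $. is the current line number of the handle last read from.
        $line = "$text\n" if $. == $target;
        print {$out} $line;
    }

    close($in);
    close($out) or die "Cannot close $tmp: $!";
    rename($tmp, $path) or die "Cannot rename $tmp to $path: $!";
    return;
}
```

The new content can be of any length here, and if something goes wrong
before the rename, the original file is still intact.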

Kang-min Liu
