Removing entries in big file (~300k lines)


Adam

May 10, 2022, 6:58:28 PM
to Automate
Hello, I have a very big list of words from which I need to slice off a few entries from the bottom or top.

Currently I have this, and it's been working for me for a long time, but recently I have been using a bigger file of around 300k lines and it no longer works for me (it takes 10+ minutes, and I've never actually seen it finish, so who knows how long it would take).


Read file (reading the initial file) -> output : strToBeReplaced

Variable set (Creating an array with X amount of entries sliced off) 
Variable : slicedList / value : slice(split(strToBeReplaced, "\n"), X)
*Where X is the number of entries that I want to remove from the list.

Variable set (Creating the string that will be written in file)
Variable : fileString / value : null

For each in slicedList with "i" as entry index DO -> 
Variable set 
Variable : fileString -> value : fileString ++ slicedList[i] ++ "\n"

Once the loop is completed there is a final Write file block
content : fileString -> the file that I want to replace

The format of the file is shown below (one word per line), e.g.:
Adam
Jim
Robert


Is there a more optimized way of slicing off lines from a file than this?

Henrik "The Developer" Lindqvist

May 11, 2022, 9:12:17 AM
to Automate
The most "optimized" way is probably:
  1. Flow beginning
  2. File read: lines
  3. File write: content = join(slice(split(lines, "\n"), 0, -X), "\n")
  4. Variable set: lines = null
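
For readers more used to a general-purpose language, the split / slice / join idea can be sketched in Python as below. This is only an illustration, not Automate code: `drop_lines` and its parameters are made-up names, and Python's slicing stands in for Automate's `slice`. The likely reason the original loop is slow is that each `++` builds a new, longer string, so the total work grows roughly with the square of the line count, whereas a single join builds the result once.

```python
def drop_lines(text: str, count: int, from_top: bool = True) -> str:
    """Return `text` with `count` lines removed from the top or the bottom."""
    if count <= 0:
        return text
    lines = text.split("\n")
    if from_top:
        kept = lines[count:]
    else:
        # Clamp at 0 so asking for more lines than exist yields an empty result.
        kept = lines[:max(len(lines) - count, 0)]
    # One join instead of appending to a string inside a loop.
    return "\n".join(kept)


words = "Adam\nJim\nRobert"
print(drop_lines(words, 1))                  # removes "Adam"
print(drop_lines(words, 1, from_top=False))  # removes "Robert"
```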

Adam

May 11, 2022, 3:51:11 PM
to Automate
Thank you very much!

Adam

May 12, 2022, 2:11:22 PM
to Automate
That method you gave me works very well when slicing from the top, although it does not seem to work on larger files when I try to slice off the bottom. (On smaller files it works wonders, top or bottom.)

So I've been using [ File write: content = join(slice(split(lines, "\n"), X), "\n") ] instead.

It suits my needs perfectly and I am very grateful for the response, but out of curiosity, do you have any idea why it doesn't seem to work when slicing from the bottom? Am I doing something wrong and being too dumb to notice, or is it something to do with how the File write block works?

I have attached both a picture of the flow and the flow itself in case you need to take a closer look.
Untitled.png
Untitled.flo

Henrik "The Developer" Lindqvist

May 12, 2022, 4:22:16 PM
to Automate
slice(split(lines, "\n") , X)  will remove the first X lines.
slice(split(lines, "\n"), 0, -X) should remove the last X-1 lines; try slice(split(lines, "\n"), 0, -X-1) or...
  1. Variable set: lines = split(lines, "\n")
  2. File write: join(slice(lines, 0, max(#lines-X, 0)), "\n")
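
One possible explanation for the off-by-one at the bottom, assuming the file ends with a newline: splitting on "\n" then leaves an empty string as the last element, so removing X elements from the end removes that empty entry plus only X-1 real lines. The Python sketch below illustrates the effect and one way around it. `drop_last_lines` is a hypothetical helper, and Python's `split` is assumed here to behave like Automate's.

```python
def drop_last_lines(text: str, count: int) -> str:
    """Remove the last `count` real lines, ignoring one trailing newline."""
    had_newline = text.endswith("\n")
    body = text[:-1] if had_newline else text
    lines = body.split("\n")
    kept = lines[:max(len(lines) - count, 0)]
    result = "\n".join(kept)
    # Restore the trailing newline only if something is left to terminate.
    return result + "\n" if had_newline and kept else result


content = "Adam\nJim\nRobert\n"
# A plain split leaves an empty final element: ['Adam', 'Jim', 'Robert', '']
print(content.split("\n"))
# Dropping one element from that list removes only the empty string.
print(content.split("\n")[:-1])
# The helper removes "Robert" and keeps the file newline-terminated.
print(repr(drop_last_lines(content, 1)))
```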

Ricardo Fernández Serrata

May 16, 2022, 3:04:58 AM
to Automate
Another option is to use the `head` and `tail` shell commands; they have slice modes that work on lines or bytes depending on which arguments you pass. But I would only recommend it if you have VERY BIG files, like 2MB. And remember to use output redirection correctly: `>` overwrites the target file and `>>` appends to it. For more info, run `head --help` and `tail --help`, and search the internet for any details not present in the manual.
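
To show what line-based trimming in the style of `head` and `tail` amounts to, here is a rough Python sketch that streams the file instead of loading it whole. It is not the shell commands themselves: `trim_file`, `top` and `bottom` are hypothetical names, and UTF-8 text is assumed. It writes to a temporary file and then swaps it in, because sending output straight back onto the input file with `>` would empty the file before it is read.

```python
import os
import tempfile
from collections import deque


def trim_file(path: str, top: int = 0, bottom: int = 0) -> None:
    """Drop `top` leading and `bottom` trailing lines from the file at `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Lines wait here until we know they are not among the last `bottom` lines.
    pending = deque()
    with open(path, "r", encoding="utf-8", newline="") as src, \
            tempfile.NamedTemporaryFile("w", encoding="utf-8", newline="",
                                        dir=directory, delete=False) as dst:
        for index, line in enumerate(src):
            if index < top:
                continue  # skip the leading lines
            pending.append(line)
            if len(pending) > bottom:
                dst.write(pending.popleft())
        tmp_name = dst.name
    # Whatever is still pending is the trailing part, so it is simply dropped.
    os.replace(tmp_name, path)
```

For example, `trim_file("words.txt", top=1, bottom=1)` would remove the first and the last line of a hypothetical `words.txt` while holding only one line in memory at a time.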