Renaming files in Drive using a pattern

Peter Smulders

Jul 20, 2018, 4:35:28 PM
to GAM for G Suite
Today, I had the challenge to rename a set of files that follow a pattern. These files are accessible through Google Drive on the web, obviously, and I also have them accessible through Google Drive File Stream on Windows.

The most basic option is to do this by hand in the web UI. Even with efficient copying and pasting and some quick keyboard movements, this is tedious and slow. Not a good option for more than 5 to 10 files.

The same goes for using Windows Explorer. Sure, this is faster because Drive File Stream will do the syncing in the background, but it is still a copy-and-paste fest.

Windows has a command line of sorts (the venerable CMD window, sometimes known as the console). Although there are a few looping constructs, doing stuff to a predictable stream of input (the bread and butter of a UNIX shell) is almost as painful as typing by hand.

I tried the recently released Windows Subsystem for Linux and the bash you can run on Windows. This is an incredible step forward for running UNIX tools on Windows without having to resort to Virtual Machines or ConEmu. (NB: I had this years and years ago with Cygwin and haven't tried it in this case, but that is not the point of this tale).

The problem with Bash on Windows on the Windows Subsystem for Linux is that Google Drive File Stream is not yet 100% perfect in pretending to be a locally available drive: it mounts a virtual drive on (in my case) G:\, which is accessible to Windows Explorer and even the CMD window (go to the G: drive, cd into My Drive, do things like 'dir' and 'dir /s' -- it all works). The bash shell, however, functions by making actual unrecompiled GNU Bash talk to Windows system primitives. Somehow, Google Drive File Stream is not plugged in deep enough, because in bash, although I can do this:

$ cd /mnt/g/
$ cd My\ Drive

I cannot do even a simple 'ls':

$ ls
ls: reading directory '.': Function not implemented

So... my other options were to grab a MacBook with at least High Sierra (to be able to run Google Drive File Stream) and see if its terminal is more cooperative (I expect it is) or give Cygwin a go. The latter option might very well work, but I would have to load Cygwin onto an already groaning laptop. Also, I wanted a solution that is portable, so I made it work in Google Cloud Shell.

The following bit of code does this:
  • List the files in a folder specified by ID
  • Parse the file names and change them (the 'sed' command spits out the new file name)
  • Pipe the output to gam csv. This step aids performance, because the name changes are done in parallel, and it also saves me from having to deal with quotes and spaces.
I provide this here for those among you who are either comfortable with UNIX shell or are intermediate learners, for inspiration and so that you do not have to reinvent the same wheel yourself. That said, these are power tools and can do unpredictable things in inexperienced hands, so make sure you know what you are doing!

gam user john...@yourdomain.com show filelist select id 9adaksdjnv9w7dfs9-wn9cm3c3 depth 0 fields id,title delimiter '|' | tail -n +2 | ( IFS=","; echo "id,newname"; while read USER ID NAME; do echo "${ID},$(echo "$NAME" | sed 's/^Beren-\(.*\)/Het Rijk van de Grizzly \1/')"; done) | gam csv - gam user john...@yourdomain.com move drivefile id ~id newfilename ~newname

If I find I am doing this over and over again, I might script it, but given that the sed command is closely tied to whatever the situation might be and is very sensitive to spaces and quoting, editing it in the command line history is probably the most efficient way to use this.
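
For anyone who does end up scripting it, here is a minimal sketch of the middle step as a shell function. The name make_rename_csv is hypothetical and not part of gam. It assumes the input is the three-column CSV (owner, id, title) with a header line, as produced by the command above, and it does not try to handle quoted CSV fields. The sample data is made up.

```shell
#!/bin/sh
# Hypothetical helper, not part of gam: reads "owner,id,title" CSV (with a
# header line) on stdin and writes "id,newname" CSV on stdout.
# $1 is the sed expression that turns an old title into the new one.
make_rename_csv() {
  sed_expr=$1
  echo "id,newname"
  # Skip the header, then split each line on commas; "name" receives
  # whatever is left of the line after the first two fields.
  tail -n +2 | while IFS=, read -r owner id name; do
    printf '%s,%s\n' "$id" "$(printf '%s\n' "$name" | sed "$sed_expr")"
  done
}

# Dry run on made-up sample data; nothing is sent to gam here.
printf '%s\n' 'Owner,id,title' 'john@example.com,abc123,Beren-1-1.mkv' |
  make_rename_csv 's/^Beren-\(.*\)/Het Rijk van de Grizzly \1/'
```

Running the function without the final gam step gives a preview of the renames. Once the preview looks right, the same output can be piped into the 'gam csv -' part of the command above.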

--peter

Peter Smulders

Jul 20, 2018, 4:39:30 PM
to GAM for G Suite
I just now noticed the 'delimiter' left in from some earlier experimenting. Although documented, it didn't seem to do anything. -peter

Peter Smulders

Jul 20, 2018, 6:16:45 PM
to GAM for G Suite
I just had contact with Ross and he pointed out some things:
  • The 'delimiter' switch is for fields that have multiple values, which will never happen in this output and hence will never have any effect.
  • The gam move command is meant for moving files and folders from one folder to another. In this case, the whole purpose is only to rename files, i.e. only to update the name. Therefore, the 'update' command is more efficient.
  • the syntax 'id ~id' is superfluous: just providing an ID is all that is required.
Furthermore, to make this more generic, it is possible to pass through the user instead of having to modify it twice if you re-use the command line. An updated example command:

gam user john...@yourdomain.com show filelist select id 9adaksdjnv9w7dfs9-wn9cm3c3 depth 0 fields id,title | tail -n +2 | ( IFS=","; echo "user,id,newname"; while read USER ID NAME; do echo "${USER},${ID},$(echo "$NAME" | sed 's/^Beren-\(.*\)/Het Rijk van de Grizzly \1/')"; done) | gam csv - gam user ~user update drivefile ~id newfilename ~newname

NB: take note of the 'IFS' trick: by changing the Internal Field Separator (which normally is space, tab and newline) to a comma, the shell will happily parse lines of CSV output into fields for you (no need for cut or awk), which is why the 'while read' construct does not need to be special in any way. Even better: by doing this in a subshell (caused by the parentheses) you do not need to set IFS back to what it was before and hence you need not even try to remember the original value. This may seem old hat to UNIX veterans, but it was a revelation for me when I learned this, because it makes dealing with CSV in a quick one-liner absolutely trivial.
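
A minimal sketch of the trick in isolation, using a made-up sample line instead of gam output. The function name split_demo and the data are only for illustration.

```shell
# Hypothetical demo function: prints each comma-separated line field by field.
split_demo() {
  (
    # IFS is changed only inside this subshell.
    IFS=","
    while read -r owner id name; do
      echo "owner=${owner} id=${id} name=${name}"
    done
  )
}

printf '%s\n' 'john@example.com,abc123,Beren-1-1.mkv' | split_demo

# Back in the parent shell IFS still has its original value,
# so ordinary word splitting on spaces works as usual.
set -- $(echo "one two")
echo "fields after the subshell: $#"
```

Because 'name' is the last variable, it receives the rest of the line, so a title that itself contains a comma is not cut off. Quoted CSV fields are not handled by this approach.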

enjoy!

--peter



+KimNilsson

Jul 25, 2018, 7:50:20 AM
to GAM for G Suite
Cool, Peter!
But not even semi-intermediate bash learners can read RegEx without a guide. :-)

So your "Parse the file names and change them (the 'sed' command spits out the new file name)" still leaves me wanting more. :-)
I don't really like guessing, so instead I ask.

Which files will be affected, and how will they be renamed?

Peter Smulders

Jul 25, 2018, 8:54:46 AM
to GAM for G Suite
I'll explain. This is going to be tedious, but you asked...

First, the sed command we are talking about (pulled from the long one-liner):

$ sed 's/^Beren-\(.*\)/Het Rijk van de Grizzly \1/'

The sed syntax for the substitute command ('s') is s/[something to match]/[something to replace it with]/.

In this particular case, I had a number of files (from processing DVDs into video files to put onto a device for the holidays, if you must know..) that are all named 'Beren-1-1.mkv', 'Beren-1-2.mkv', etc. Some have different extensions, there are some subtitle files in there, etc. My goal is to replace the leading 'Beren-' of every file name with 'Het Rijk van de Grizzly '. This is what the regex says exactly, bit by bit, for the matching part:

^ --> start matching at the beginning of the string.
Beren- --> match this literally.
\( --> start a grouping.
. --> match anything.
* --> of the previous match operator ('.' in this case), match 0 or more instances.
NB: .* is therefore the very common pattern for 'anything, including nothing and as much of it as possible'.
\) --> close the grouping.

I suppose I could have added a $ to indicate the end of the pattern to match should be the end of the string, but the .* will grab anything to the end of the string anyway, so it would be redundant here.

The replacement part is:

Het Rijk van de Grizzly  --> literal text. (note the intentional space after the y)
\1 --> back reference to the first (and in this case only) group. This will yield as a replacement everything that matched between the \( and \) in the matching regex, so effectively the part of the file name after the initial 'Beren-'.
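
To see the substitution on its own, without gam in the picture, it can be fed a few sample names. The helper name new_name is hypothetical, and the file names are made up to follow the pattern described above.

```shell
# Hypothetical helper: runs one file name through the sed command
# from the one-liner and prints the result.
new_name() {
  printf '%s\n' "$1" | sed 's/^Beren-\(.*\)/Het Rijk van de Grizzly \1/'
}

new_name 'Beren-1-1.mkv'   # prints: Het Rijk van de Grizzly 1-1.mkv
new_name 'Beren-1-2.srt'   # prints: Het Rijk van de Grizzly 1-2.srt
```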

There is a sloppy bit in here: I knew beforehand that all the files going in would match. sed prints a line it cannot match unchanged, so a non-matching file would be 'renamed' to the name it already has. That is presumably harmless but wasteful, and filtering out the non-matching names first would be cleaner.
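
One way to guard against names that do not match, as a sketch only. By default sed prints lines it did not change as they are. With the -n option and the p flag on the substitution, it prints only the lines where the substitution actually happened, so anything that does not start with 'Beren-' produces no output. The helper name matching_new_name and the sample names are hypothetical.

```shell
# Hypothetical helper: prints the new name only if the old one matches.
matching_new_name() {
  printf '%s\n' "$1" | sed -n 's/^Beren-\(.*\)/Het Rijk van de Grizzly \1/p'
}

matching_new_name 'Beren-1-1.mkv'   # prints the new name
matching_new_name 'Holiday.mkv'     # prints nothing: no match, so no output
```

In the loop of the one-liner, a row could then be skipped whenever the result is empty.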


Now .... my point with this whole mini-lecture in sed and regex is that this part will always be highly specific to whatever situation you are dealing with. I actually discussed it with Ross, who hinted at providing something that would accept a regex as an argument and apply it for you.

This is, at least in my humble opinion, a bad idea. It is very difficult to get working properly because regexes are extremely sensitive to quotes, slashes and spaces. If you were to call a hypothetical gam command with a regex, you would run a very high chance of the shell messing it up before passing it on to Python. When it fails, it will be next to impossible to debug (in bash land, the term is 'quoting hell' for a good reason). And this is a hypothetical with an experienced command line user in mind, which most of the people on the list are not. Also, don't even get me started on the idiosyncrasies of the CMD shell or PowerShell in Windows.
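
A small illustration of how the shell can change an expression before a program ever sees it, assuming bash or another POSIX shell. show_arg and the variable are hypothetical; show_arg stands in for whatever command would receive the regex.

```shell
# Hypothetical stand-in for a program that receives a regex argument:
# it prints exactly what the shell handed over.
show_arg() {
  printf '%s\n' "$1"
}

name="Beren"
show_arg 's/^$name-//'   # single quotes: arrives as s/^$name-//
show_arg "s/^$name-//"   # double quotes: arrives as s/^Beren-//
```

The two calls differ only in their quotes, yet the receiving program gets two different expressions.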

Perhaps the even better argument against it would be that this is a slippery slope: the use cases are so incredibly diverse that Ross would have to implement an entire regex engine (or wrap around Python's engine at the least) to cover all the bases. Which comes back to my point: the bit in that one-liner that figures out how to turn a particular file name into the file name you want it to become is intentionally left as an exercise to the reader.

In general, I would always advise against running code you do not understand.

Now, if you are not Kim and have read this far and would like to learn more about scripting and regexes, there are a number of resources available. For shell, there is a decent O'Reilly book, but also many others, and a fair amount of material online, and of course the Bash Reference Manual (which reads just about as easily as bash code itself, imho) and this course. I learned by doing this for 20 years and never accepting that something could not be solved one way or another. NB: learning shell always means learning UNIX in general and learning about a few dozen big and small utilities. You could do worse than referring to UNIX in a Nutshell (incredibly dense, but useful on every single page).

For learning proper regex, there is no better book than Mastering Regular Expressions.

--peter







+KimNilsson

Jul 25, 2018, 9:23:56 AM
to GAM for G Suite
Awesome reply, Peter!
We all love you for it.
Thank you!

Peter Smulders

Jul 25, 2018, 11:33:04 AM
to GAM for G Suite
Thanks! <blush>

Not to pile on even more, but I neglected to mention that O'Reilly has a Regex Cookbook as well, with many 'here is how you do this' examples. Also, for 99 out of 100 of these types of problems, Stack Exchange is your friend, for Bash and UNIX in general as well.

-peter