How to extract parent directory name from .log file in Chrome's File System folder


guest271314

Jun 2, 2024, 11:26:17 AM
to Chromium-discuss
In Chrome, when the [Origin private file system](https://developer.mozilla.org/en-US/docs/Web/API/File_System_API/Origin_private_file_system) (not to be confused with the [File System *Access* API](https://developer.mozilla.org/en-US/docs/Web/API/File_System_API)) is used to create directories and files, a folder is created for each entry under `Default/File System` in the Chrome configuration folder.

The folders are named something like `021` for each entry, so `063`, `067` and so forth can also exist within the parent `File System` directory.

Inside of a given subdirectory there is a folder named `t`, which includes folders named `00` and `Paths`.

`00` contains the files that were written to the origin private file system using `FileSystemWritableFileStream`.

`Paths` contains a file named `000003.log`, which is a `text/x-log` file that includes characters outside of the `0`-`255` code point range.

After removing everything except alphanumeric characters, underscore, dash, forward slash, and dot (the code below also keeps space, `:`, and `@`), the content of `000003.log` looks something like this (https://gist.github.com/guest271314/c930dd00388aab20ab66528fad86d8c3):

```
rM0Ux/LAST_FILE_ID0LAST_INTEGER-1FCHILD_OF:0:persistent-serviceworker1140persistent-serviceworkerUx/LAST_FILE_ID100Ux/TLAST_INTEGER0ZlCHILD_OF:1:README.md224000/00000000README.mdLAST_FILE_ID2KMD140persistent-serviceworkerUx/LAST_INTEGER1fVwCHILD_OF:1:README.md.crswap338400/00000001README.md.crswapLAST_FILE_ID3D140persistent-serviceworkerUx/dCHILD_OF:1:README.md.crswap324000/00000001README.mdInD140persistent-serviceworkerUx/D140persistent-serviceworkerUx/UeKCHILD_OF:1:chromium_extension_web_accessible_resources_iframe_message_event44X@chromium_extension_web_accessible_resources_iframe_message_eventUx....
```

I am able to get the file names in an `Array` using the code below.

```
async function parseChromeDefaultFileSystem(path) {
  // Code points to keep: space, "-", ".", "/", 0-9, ":", "@", A-Z, "_", a-z
  const set = new Set([
    32, 45, 46, 47, 48, 49, 50, 51, 52, 53,
    54, 55, 56, 57, 58, 64, 65, 66, 67, 68,
    69, 70, 71, 72, 73, 74, 75, 76, 77, 78,
    79, 80, 81, 82, 83, 84, 85, 86, 87, 88,
    89, 90, 95, 97, 98, 99, 100, 101, 102, 103,
    104, 105, 106, 107, 108, 109, 110, 111,
    112, 113, 114, 115, 116, 117, 118, 119,
    120, 121, 122,
  ]);

  return fetch(path)
    .then((r) => r.text())
    .then((text) => {
      // Drop every character whose code point is not in the set
      let str = "";
      for (const char of text) {
        if (set.has(char.codePointAt(0))) {
          str += char;
        }
      }
      console.log(str);

      // Match entries shaped like "00000<digits><name>.crswap", deduplicate,
      // then strip the numeric prefix and the ".crswap" suffix
      const matches = [
        ...new Set(str.match(/00000\d+[A-Za-z-_.0-9\s]+\.crswap/g)),
      ].map((s) => s.replace(/00000[\d\s]+|\.crswap/g, ""));

      return matches;
    })
    .catch(console.error);
}

var paths = await parseChromeDefaultFileSystem("file:///home/user/.config/chromium/Default/File\ System/021/t/Paths/000003.log");
console.log(paths);
```

What I am trying to do now is get the parent directory of each file listed in `000003.log`. That means either reading the directory or subdirectory name that follows each `@` symbol, or parsing the `CHILD_OF` entries and similar parts of the string.

How would you go about extracting the parent directories and prepending that to the file names so we can know which flat files in `00` correspond to which directories and subdirectories in the `000003.log` file?
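One possible starting point, sketched under assumptions: the filtered text contains entries shaped like `CHILD_OF:<number>:<name>`, and the number looks like the id of the parent directory (that reading is a guess from the sample above, not something confirmed here). The filtered log has no delimiter between a name and the characters that follow it, so this sketch only handles names without digits. `listChildEntries` is a hypothetical helper name.

```javascript
// Hypothetical helper: list { parentId, name } pairs found after "CHILD_OF:".
// Assumes the text was already reduced to the kept characters, and that
// names contain no digits (a digit ends the name in this sketch).
function listChildEntries(filteredText) {
  const entries = [];
  const pattern = /CHILD_OF:(\d+):([A-Za-z_.-]+)/g;
  for (const [, parentId, name] of filteredText.matchAll(pattern)) {
    entries.push({ parentId: Number(parentId), name });
  }
  return entries;
}

// Excerpt taken from the filtered sample shown earlier in this post.
const sample =
  "FCHILD_OF:0:persistent-serviceworker1140persistent-serviceworkerUx/" +
  "LAST_FILE_ID100Ux/TLAST_INTEGER0ZlCHILD_OF:1:README.md224000/00000000README.md";

console.log(listChildEntries(sample));
```

If the parent-id reading holds, a directory name could then be looked up by id and prepended to each file name; names that contain digits would need the unfiltered bytes to find where a name ends.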

guest271314

Jun 2, 2024, 6:21:30 PM
to Chromium-discuss, guest271314
[SOLVED]

```
async function parseChromeDefaultFileSystem(path) {
  try {
    const set = new Set([
      32, 45, 46, 47, 48, 49, 50, 51, 52, 53,
      54, 55, 56, 57, 58, 64, 65, 66, 67, 68,
      69, 70, 71, 72, 73, 74, 75, 76, 77, 78,
      79, 80, 81, 82, 83, 84, 85, 86, 87, 88,
      89, 90, 95, 97, 98, 99, 100, 101, 102, 103,
      104, 105, 106, 107, 108, 109, 110, 111,
      112, 113, 114, 115, 116, 117, 118, 119,
      120, 121, 122,
    ]);
    const request = await fetch(path);
    const text = (await request.text()).replace(/./g, (s) => set.has(s.codePointAt()) ? s : "");
    const files = [
      ...new Set(
        text.match(
          /00000\d+[A-Za-z-_.0-9\s]+\.crswap/g,
        ),
      ),
    ].map((s) => {
      const dir = [...new Set(text.slice(0, text.indexOf(s)).match(/(?<=[@\s]|CHILD_OF:0:)([\w-_])+(?=Ux)/g).map((d) => d.split(/\d+|D140/) ))].flat().pop();
      const [key] = s.match(/00000[\d\s]+|\.crswap/g);
      return ({ [key]: s.replace(/00000[\d\s]+|\.crswap/g, ""), dir })
    });
    return { name: files[0].dir, files }
  } catch (e) {
    console.error(e);
  }
}

let paths = await parseChromeDefaultFileSystem("file:///home/user/.config/chromium/Default/File\ System/021/t/Paths/000003.log");
console.log(JSON.stringify(paths,null,2));
```

```
{
  "name": "persistent-serviceworker",
  "files": [
    {
      "00000001": "README.md",
      "dir": "persistent-serviceworker"
    },
    {
      "00000003": "background.js",
      "dir": "chromium_extension_web_accessible_resources_iframe_message_event"
    },
    {
      "00000005": "index.html",
      "dir": "chromium_extension_web_accessible_resources_iframe_message_event"
    },
    {
      "00000007": "index.js",
      "dir": "chromium_extension_web_accessible_resources_iframe_message_event"
    },
    {
      "00000009": "manifest.json",
      "dir": "chromium_extension_web_accessible_resources_iframe_message_event"
    },
    {
      "00000011": "index.html",
      "dir": "docs"
    },
    {
      "00000013": "script.js",
      "dir": "docs"
    },
    {
      "00000015": "sw.js",
      "dir": "docs"
    },
    {
      "00000017": "index.html",
      "dir": "docs"
    },
    {
      "00000019": "script.js",
      "dir": "docs"
    },
    {
      "00000021": "sw.js",
      "dir": "docs"
    },
    {
      "00000023": "index.html",
      "dir": "message-event"
    },
    {
      "00000025": "script.js",
      "dir": "message-event"
    },
    {
      "00000027": "sw.js",
      "dir": "message-event"
    },
    {
      "00000029": "index.html",
      "dir": "readablestream-fetch-respondwith"
    },
    {
      "00000031": "script.js",
      "dir": "readablestream-fetch-respondwith"
    },
    {
      "00000033": "sw.js",
      "dir": "readablestream-fetch-respondwith"
    }
  ]
}
```
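To get from the `{ name, files }` result to "which flat file is which path", a small lookup table may be enough. This is only a sketch: `toPathMap` is a hypothetical helper, and it assumes each numeric key (for example `00000001`) is the name of the backing file inside `00`, which the sample log suggests but does not prove.

```javascript
// Hypothetical helper: build a Map of "flat file id" -> "dir/file name"
// from the { name, files } object returned by parseChromeDefaultFileSystem().
function toPathMap(result) {
  const map = new Map();
  for (const { dir, ...rest } of result.files) {
    // Besides "dir", each entry has one property: { "<flat file id>": "<file name>" }.
    const [[id, fileName]] = Object.entries(rest);
    // trim() guards against whitespace captured by the /00000[\d\s]+/ match.
    map.set(id.trim(), `${dir}/${fileName}`);
  }
  return map;
}

// Two entries copied from the output above.
const example = {
  name: "persistent-serviceworker",
  files: [
    { "00000001": "README.md", dir: "persistent-serviceworker" },
    { "00000011": "index.html", dir: "docs" },
  ],
};

console.log(toPathMap(example));
```

A `Map` keyed by the flat file id keeps duplicate file names such as `index.html` apart, since each one sits under a different id.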