Directory Metadata on SSDs and Deletes

gtjones

Jul 3, 2013, 7:55:07 AM
to isilon-u...@googlegroups.com
Two related questions:

1. When running GNA, do the data blocks for directories reside on SSDs? I understand that the inode information does, but what about the data blocks that relate file names to inodes?

2. It's my understanding that inode information is written to both SSDs and HDDs. When performing metadata reads, SSDs get hit, but what about removes (e.g. rm -rf /dir)? Does the remove have to go out and touch the HDDs before it acknowledges the delete has occurred?

I'm running 6.5.5.22.

Thanks

Peter Serocka

Jul 3, 2013, 8:44:41 AM
to isilon-u...@googlegroups.com

On Wed, 3 Jul 2013, at 19:55, gtjones wrote:

> Two related questions:
>
> 1. When running GNA, do the data blocks for directories reside on
> SSDs? I understand that the inode information does, but what about
> the data blocks that relate file names to inodes?

Those are file system metadata as well. Shocking, I know :)

Data = file data, everything else is metadata.



>
> 2. It's my understanding that inode information is written to both
> SSDs and HDDs. When performing metadata reads, SSDs get hit, but
> what about removes (e.g. rm -rf /dir)? Does the remove have to go
> out and touch the HDDs before it acknowledges the delete has
> occurred?

Yes. The information stored on HDD is the authoritative copy.
7.0 has the option to keep metadata solely on SSD, not on HDD.
That is said to consume about 5x more space on SSD, though.
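
If you want to see this for yourself, here is a rough timing sketch
(the path is hypothetical; adjust for your cluster, and repeat the
runs, since OneFS caching will skew single measurements):

  # Create a small tree of empty files, then time the recursive
  # delete. The delete has to update the authoritative on-HDD
  # metadata copies before it returns, so HDD latency shows up here.
  mkdir -p /ifs/test/deltest
  i=0
  while [ $i -lt 1000 ]; do
      touch /ifs/test/deltest/file$i
      i=$((i+1))
  done
  time rm -rf /ifs/test/deltest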

Cheers

Peter

>
> I'm running 6.5.5.22.
>
> Thanks

Keith Nargi

Jul 3, 2013, 9:11:40 AM
to isilon-u...@googlegroups.com
One correction on that: in 7.0 you can do read/write metadata acceleration, which keeps all copies of metadata on SSDs. The space consumption is related to your protection scheme, so it won't be 5x more, but it will consume more than what you have currently on 6.5.5.22.

Cheers
Happy 4th

Peter Serocka

Jul 3, 2013, 9:29:51 AM
to isilon-u...@googlegroups.com
Metadata always gets mirrored at least 3x;
currently that SSD copy counts as 1x of those...
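
Easy to check against any directory's inode, by the way -- both
fields appear in the isi get -DD listing quoted later in this
thread (the path here is hypothetical):

  # Compare the number of inode copies with the protection level.
  isi get -DD /ifs/data/somedir | grep -E "Inode Mirror Count|Current Protection"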

Greetings from Shanghai

Peter

Peter Serocka

Jul 3, 2013, 9:36:18 AM
to isilon-u...@googlegroups.com
Just wondered what made me think 5x in the first place... here it is,
more or less...

Metadata read/write acceleration — Writes file data to HDDs and
metadata to SSDs, when available. This strategy accelerates
metadata writes in addition to reads but requires about four to
five times more SSD storage than the Metadata read acceleration
setting. Enabling GNA does not affect read/write acceleration.

(OneFS Admin guide, SSD pools section)
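
If you want to check which strategy a given file or directory
currently has, the "SSD Strategy" field of isi get -DD reports it
(it reads "metadata" in the listing further down this thread; the
path here is hypothetical):

  isi get -DD /ifs/data/somedir | grep "SSD Strategy"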

gtjones

Jul 3, 2013, 10:39:46 AM
to isilon-u...@googlegroups.com
Pete, 

Thanks, and a follow up question.

For directories, can I see the data blocks I'm referring to with the isi get -DD command? Is that the section labeled "File Data"? I understand that the first line is where my metadata (LIN) information lives.

Here's sample output for a directory.

 POLICY  W  LEVEL PERFORMANCE COAL  ENCODING      FILE              IADDRS
default     4x concurrency on    N/A           ./                <25,19,55893492252672:512>, <27,16,23483564179456:512>, <29,16,6859805179904:512>, <31,5,44048085696512:512> ct: 1297980525 rt: 0
*************************************************
* IFS inode: [ 25,19,6822936064:512, 27,16,2866646016:512, 29,16,837378560:512, 31,5,5376963584:512 ]
*************************************************
*
*  Inode Version:      4
*  Dir Version:        2
*  Inode Revision:     2640
*  Inode Mirror Count: 4
*  Recovered Flag:     0
*  Recovered Groups:   0
*  Link Count:         16
*  Size:               489
*  Mode:               16877
*  Flags:              0x4e0
*  File Data Blocks:   1
*  LIN:                1:024f:0047
*  Restripe Cursor:    -1
*  Abort Cursor:       -1
*  Last Access:        1372777814.095694254
*  Last Modified:      1371560686.477369819
*  Last Inode Change:  1371560686.477369819
*  Create Time:        1297980525
*  Rename Time:        0
*  Write Caching:      Enabled
*  Parent Lin          2
*  Parent Hash:        703899
*  Habanero Inode:     True
*  Snapshot IDs:       None
*  Last Paint ID:          11700
*  Domain IDs:         None
*  LIN needs repair:   False
*  Manually Manage:
*       Access         False
*       Protection     False
*  Protection Policy:  Diskpool default
*  Current Protection: 4x
*  Future Protection:  0x
*  Disk pools:         policy x40066(5) -> target x40066(5)
*  SSD Strategy:       metadata
*  SSD Status:         complete
*  Data Width Device List:
*  Meta Width Device List:
*  (d: 24, b: 1, i: no, drv(250): 00020000
*  (d: 25, b: 1, i: no, drv(250): 00000040
*  (d: 29, b: 1, i: no, drv(250): 00020000
*  (d: 31, b: 1, i: no, drv(250): 00000002
*
*  File Data (32 bytes):
*    Metatree Depth: 1
*    24,17,15077130240:8192
*    25,6,264585084928:8192
*    29,17,1460732805120:8192
*    31,1,1694748631040:8192
*
*  Dynamic Attributes (93 bytes):
*
        ATTRIBUTE                OFFSET SIZE
        New file attribute       0      25
        Last snapshot paint time 25     9
        Alternate data stream    34     9
        Disk pool policy         43     9
        Disk pool target         52     5
        v5 Meta wdl              57     27
        Access time              84     9
*
*************************************************

*  NEW FILE ATTRIBUTES
*  Access attributes:  active
*  Write Cache:  on
*  Access Pattern:  concurrency
*  At_r: 0
*  Protection attributes:  active
*  Protection Policy:  Diskpool default
*  Disk pools:         policy 5
*  SSD Strategy:       metadata
*
*************************************************
*************************************************

SECURITY DESCRIPTOR OWNER/GROUP

*  Owner - UID:0
*  Group - GID:0

*************************************************

Peter Serocka

Jul 4, 2013, 6:32:23 AM
to isilon-u...@googlegroups.com
On 3 Jul 2013, at 22:39, gtjones wrote:

> Pete,
>
> Thanks, and a follow up question.
>
> For directories, can I see the data blocks I'm referring to with
> the isi get -DD command? Is that the section labeled "File Data"?
> I understand that the first line is where my metadata (LIN)
> information lives.

Yes, traditionally a UNIX directory has been a "file" whose content
was interpreted as a table mapping file names to inode numbers.

Now, in OneFS, that table has become a B-tree structure.
You wonder whether it is treated as "file data" or as metadata
(like inode time stamps etc.) when SSD acceleration is considered.

A couple of points:

- how poor would it be if OneFS didn't use
  SSDs for accelerating that B-tree structure... ;)

- if only the per-inode metadata were on SSD,
  then "ls -l" would get accelerated, but a plain "ls" would not.
  Try it out; I've never seen a plain "ls" not benefiting from SSD.
  (Of course "ls" is faster than "ls -l"; you need to
  compare each, once with SSD and once without. And beware the
  global caches in OneFS. A minimal timing sketch follows
  at the end of this message.)

- finally:

*  File Data (32 bytes):
*    Metatree Depth: 1
*    24,17,15077130240:8192
*    25,6,264585084928:8192
*    29,17,1460732805120:8192
*    31,1,1694748631040:8192

That's the four disks onto which your dir
got quadruplicated -- 4x as in the first line of isi get.

Now my prediction (you might check on your cluster):
one of those four disks is an SSD!
And the same will be true for any directory
with SSD acceleration.

For example, the 24 in 24,17,15077130240:8192 is the node's
device ID, not the logical node number (LNN).
Next, 17 is the logical drive number (Lnum), not the bay number.
(The SSDs are in bay 1, but can have different Lnums on the nodes,
so I can't tell where the SSD is in your example.)
I took those relations into account for our X200/SSD pool
and found the prediction to be true for all 200 dirs checked
-- always one SSD per 4x-protected dir.
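
A sketch of the kind of check I ran (the SSD devid,Lnum pairs below
are placeholders -- you would have to collect the real ones from
your own cluster first):

  #!/bin/sh
  # For each directory, pull the block addresses out of the
  # "File Data" section of isi get -DD and test whether one of
  # them lands on a known SSD.
  SSD_PAIRS="1,17 2,17 3,17 4,17"   # hypothetical devid,Lnum pairs

  check_dir() {
      addrs=$(isi get -DD "$1" \
              | sed -n '/File Data/,/Dynamic Attributes/p' \
              | grep -oE '[0-9]+,[0-9]+,[0-9]+:[0-9]+' \
              | cut -d, -f1,2)
      for pair in $SSD_PAIRS; do
          echo "$addrs" | grep -qx "$pair" && {
              echo "$1: one copy on SSD"
              return
          }
      done
      echo "$1: no copy on SSD"
  }

  find /ifs/data -type d | head -200 | while IFS= read -r d; do
      check_dir "$d"
  done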

While it does not immediately become obvious when looking
at the isi get output, it tells a clear story:
4x protection with metadata on SSD means one of the four
copies is on SSD, and that one is used for accelerated access.

(And it applies right away to the "File Data" section ;-)
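
And here is the timing sketch promised above, for comparing a
plain "ls" against "ls -l" (the path is hypothetical; run each
several times, with and without SSD acceleration, because of the
global caches):

  # Plain ls only walks the directory B-tree; ls -l additionally
  # reads one inode per entry. If the B-tree were not on SSD,
  # plain ls would see no benefit from SSD acceleration.
  time ls /ifs/data/bigdir > /dev/null
  time ls -l /ifs/data/bigdir > /dev/null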

Cheers

Peter



Peter Serocka
CAS-MPG Partner Institute for Computational Biology (PICB)
Shanghai Institutes for Biological Sciences (SIBS)
Chinese Academy of Sciences (CAS)
320 Yue Yang Rd, Shanghai 200031, China




