[RFC][PATCH 0/6] swsusp: rework swap handling

2 views
Skip to first unread message

Rafael J. Wysocki

unread,
Oct 29, 2005, 5:00:16 PM10/29/05
to
Hi,

The following series of patches divides swsusp into two functionally
independent subsystems:

- the snapshot-handling part responsible for creating and populating
the snapshot-related data structure which is the list of page backup
entries aka pagedir,

- the swap-handling part responsible for writing the snapshot data to
and reading it from swap.

On suspend the snapshot-handling part creates the system snapshot
and makes the data stored in the snapshot available to the swap-handling
part via an interface function allowing it to transfer the data as
a series of consecutive data pages, in a specific order. The swap-handling
part writes the data pages to a swap partition is such a way that they
can be read in exactly the same order in which they have been saved.

On resume the snapshot-handling part is invoked by the swap-handling
part to create the pagedir. Then, the swap-handling part is allowed to
send it, with the help of an interface function, data pages that are used
to populate the snapshot data structure. It is assumed that the data
pages will be sent in the same order in which they have been received
by the swap-handling part on suspend. Finally, the system state (from
before suspend) is restored by the snaphot-handling part from the
data structure handled by it.

From the point of view of the swap-handling part, the contents of the
data pages provided by the snapshot-handling do not matter at all.
It handles each data page in the same way without analyzing its
contents and the snapshot-handling part is responsible for recognizing
the metadata and using them as appropriate. Consequently, in principle
the swap-handling part can be replaced with a user-space process and
the interface functions used in transferring data between the two parts
of swsusp can be replaced with a relatively simple kernel-user interface
in the future.

The approach used in this series of patches has some additional benefits:
1) the size of the pagedir is reduced by 1/4 which causes some more memory to
be available on resume,
2) the amount of metadata written to swap is reduced by 3/4,
3) the artificial limitation on the pagedir size, imposed by the size of the
swsusp_info structure, is lifted,
4) the size of swsusp_info structure is reduced so it can be merged with the
swsusp_header structure in the future,
5) the swap-handling part does not use any global variables related to the
snapshot data structure,
6) the __nosavedata variables are almost eliminated (on x86-64 the last of them
is the in_suspend variable).

I have divided the changes into some more or less logical steps for clarity.
Although the code has been designed as proof-of-concept, it is functional
and has been tested on x86-64, except for the cryptographic functionality
and error paths.

For your convenience the patches are available from:
http://www.sisk.pl/kernel/patches/2.6.14-rc5-mm1/

I will very much appreciate any comments, remarks and suggestions.

Greetings,
Rafael

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Rafael J. Wysocki

unread,
Oct 29, 2005, 5:00:17 PM10/29/05
to
This is a non-essential step making the next patch possible. No functionality
changes.

Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>

kernel/power/swsusp.c | 13 +++++--------
1 files changed, 5 insertions(+), 8 deletions(-)

Index: linux-2.6.14-rc5-mm1/kernel/power/swsusp.c
===================================================================
--- linux-2.6.14-rc5-mm1.orig/kernel/power/swsusp.c 2005-10-29 13:13:47.000000000 +0200
+++ linux-2.6.14-rc5-mm1/kernel/power/swsusp.c 2005-10-29 13:13:55.000000000 +0200
@@ -644,12 +644,15 @@
{
int error;

+ if ((error = swsusp_swap_check())) {
+ printk(KERN_ERR "swsusp: cannot find swap device, try swapon -a.\n");
+ return error;
+ }
lock_swapdevices();
error = write_suspend_image();
/* This will unlock ignored swap devices since writing is finished */
lock_swapdevices();
return error;
-
}


@@ -672,11 +675,6 @@
goto Enable_irqs;
}

- if ((error = swsusp_swap_check())) {
- printk(KERN_ERR "swsusp: cannot find swap device, try swapon -a.\n");
- goto Power_up;
- }
-
if ((error = save_highmem())) {
printk(KERN_ERR "swsusp: Not enough free pages for highmem\n");
goto Restore_highmem;
@@ -689,7 +687,6 @@
restore_processor_state();
Restore_highmem:
restore_highmem();
-Power_up:
device_power_up();
Enable_irqs:
local_irq_enable();
@@ -916,7 +913,7 @@
* Reset swap signature now.
*/
error = bio_write_page(0, &swsusp_header);
- } else {
+ } else {
return -EINVAL;
}
if (!error)

Rafael J. Wysocki

unread,
Oct 29, 2005, 5:00:17 PM10/29/05
to
This is another preliminary step. It moves the snapshot-handling functions
remaining in swsusp.c to snapshot.c (moving the code without changing
the functionality) and makes the next patch be more clear (in my opinion).

Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>

kernel/power/power.h | 2
kernel/power/snapshot.c | 121 ++++++++++++++++++++++++++++++++++++++++++++++
kernel/power/swsusp.c | 126 ------------------------------------------------
3 files changed, 125 insertions(+), 124 deletions(-)

Index: linux-2.6.14-rc5-mm1/kernel/power/power.h
===================================================================
--- linux-2.6.14-rc5-mm1.orig/kernel/power/power.h 2005-10-28 23:49:15.000000000 +0200
+++ linux-2.6.14-rc5-mm1/kernel/power/power.h 2005-10-28 23:50:11.000000000 +0200
@@ -69,4 +69,6 @@
extern int restore_highmem(void);
extern struct pbe * alloc_pagedir(unsigned nr_pages);
extern void create_pbe_list(struct pbe *pblist, unsigned nr_pages);
+extern int check_pagedir(struct pbe *pblist);
+extern struct pbe * swsusp_pagedir_relocate(struct pbe *pblist);
extern void swsusp_free(void);
Index: linux-2.6.14-rc5-mm1/kernel/power/snapshot.c
===================================================================
--- linux-2.6.14-rc5-mm1.orig/kernel/power/snapshot.c 2005-10-28 23:49:15.000000000 +0200
+++ linux-2.6.14-rc5-mm1/kernel/power/snapshot.c 2005-10-28 23:50:11.000000000 +0200
@@ -423,3 +423,124 @@
printk("swsusp: critical section/: done (%d pages copied)\n", nr_pages);
return 0;
}
+
+/**
+ * On resume, for storing the PBE list and the image,
+ * we can only use memory pages that do not conflict with the pages
+ * which had been used before suspend.
+ *
+ * We don't know which pages are usable until we allocate them.
+ *
+ * Allocated but unusable (ie eaten) memory pages are marked so that
+ * swsusp_free() can release them
+ */
+
+unsigned long get_safe_page(unsigned gfp_mask)
+{
+ unsigned long m;
+
+ do {
+ m = get_zeroed_page(gfp_mask);
+ if (m && PageNosaveFree(virt_to_page(m)))
+ /* This is for swsusp_free() */
+ SetPageNosave(virt_to_page(m));
+ } while (m && PageNosaveFree(virt_to_page(m)));
+ if (m) {
+ /* This is for swsusp_free() */
+ SetPageNosave(virt_to_page(m));
+ SetPageNosaveFree(virt_to_page(m));
+ }
+ return m;
+}
+
+/**
+ * check_pagedir - We ensure here that pages that the PBEs point to
+ * won't collide with pages where we're going to restore from the loaded
+ * pages later
+ */
+
+int check_pagedir(struct pbe *pblist)
+{
+ struct pbe *p;
+
+ /* This is necessary, so that we can free allocated pages
+ * in case of failure
+ */
+ for_each_pbe (p, pblist)
+ p->address = 0UL;
+
+ for_each_pbe (p, pblist) {
+ p->address = get_safe_page(GFP_ATOMIC);
+ if (!p->address)
+ return -ENOMEM;
+ }
+ return 0;
+}
+
+/**
+ * swsusp_pagedir_relocate - It is possible, that some memory pages
+ * occupied by the list of PBEs collide with pages where we're going to
+ * restore from the loaded pages later. We relocate them here.
+ */
+
+struct pbe * swsusp_pagedir_relocate(struct pbe *pblist)
+{
+ struct zone *zone;
+ unsigned long zone_pfn;
+ struct pbe *pbpage, *tail, *p;
+ void *m;
+ int rel = 0;
+
+ if (!pblist) /* a sanity check */
+ return NULL;
+
+ /* Clear page flags */
+
+ for_each_zone (zone) {
+ for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn)
+ if (pfn_valid(zone_pfn + zone->zone_start_pfn))
+ ClearPageNosaveFree(pfn_to_page(zone_pfn +
+ zone->zone_start_pfn));
+ }
+
+ /* Mark orig addresses */
+
+ for_each_pbe (p, pblist)
+ SetPageNosaveFree(virt_to_page(p->orig_address));
+
+ tail = pblist + PB_PAGE_SKIP;
+
+ /* Relocate colliding pages */
+
+ for_each_pb_page (pbpage, pblist) {
+ if (PageNosaveFree(virt_to_page((unsigned long)pbpage))) {
+ m = (void *)get_safe_page(GFP_ATOMIC | __GFP_COLD);
+ if (!m)
+ return NULL;
+ memcpy(m, (void *)pbpage, PAGE_SIZE);
+ if (pbpage == pblist)
+ pblist = (struct pbe *)m;
+ else
+ tail->next = (struct pbe *)m;
+ pbpage = (struct pbe *)m;
+
+ /* We have to link the PBEs again */
+ for (p = pbpage; p < pbpage + PB_PAGE_SKIP; p++)
+ if (p->next) /* needed to save the end */
+ p->next = p + 1;
+
+ rel++;
+ }
+ tail = pbpage + PB_PAGE_SKIP;
+ }
+
+ /* This is for swsusp_free() */
+ for_each_pb_page (pbpage, pblist) {
+ SetPageNosave(virt_to_page(pbpage));
+ SetPageNosaveFree(virt_to_page(pbpage));
+ }
+
+ printk("swsusp: Relocated %d pages\n", rel);
+
+ return pblist;
+}
Index: linux-2.6.14-rc5-mm1/kernel/power/swsusp.c
===================================================================
--- linux-2.6.14-rc5-mm1.orig/kernel/power/swsusp.c 2005-10-28 23:49:15.000000000 +0200
+++ linux-2.6.14-rc5-mm1/kernel/power/swsusp.c 2005-10-28 23:50:11.000000000 +0200
@@ -645,130 +645,6 @@
return error;
}

-/**
- * On resume, for storing the PBE list and the image,
- * we can only use memory pages that do not conflict with the pages
- * which had been used before suspend.
- *
- * We don't know which pages are usable until we allocate them.
- *
- * Allocated but unusable (ie eaten) memory pages are marked so that
- * swsusp_free() can release them
- */
-
-unsigned long get_safe_page(unsigned gfp_mask)
-{
- unsigned long m;
-
- do {
- m = get_zeroed_page(gfp_mask);
- if (m && PageNosaveFree(virt_to_page(m)))
- /* This is for swsusp_free() */
- SetPageNosave(virt_to_page(m));
- } while (m && PageNosaveFree(virt_to_page(m)));
- if (m) {
- /* This is for swsusp_free() */
- SetPageNosave(virt_to_page(m));
- SetPageNosaveFree(virt_to_page(m));
- }
- return m;
-}
-
-/**
- * check_pagedir - We ensure here that pages that the PBEs point to
- * won't collide with pages where we're going to restore from the loaded
- * pages later
- */
-
-static int check_pagedir(struct pbe *pblist)
-{
- struct pbe *p;
-
- /* This is necessary, so that we can free allocated pages
- * in case of failure
- */
- for_each_pbe (p, pblist)
- p->address = 0UL;
-
- for_each_pbe (p, pblist) {
- p->address = get_safe_page(GFP_ATOMIC);
- if (!p->address)
- return -ENOMEM;
- }
- return 0;
-}
-
-/**
- * swsusp_pagedir_relocate - It is possible, that some memory pages
- * occupied by the list of PBEs collide with pages where we're going to
- * restore from the loaded pages later. We relocate them here.
- */
-
-static struct pbe * swsusp_pagedir_relocate(struct pbe *pblist)
-{
- struct zone *zone;
- unsigned long zone_pfn;
- struct pbe *pbpage, *tail, *p;
- void *m;
- int rel = 0;
-
- if (!pblist) /* a sanity check */
- return NULL;
-
- pr_debug("swsusp: Relocating pagedir (%lu pages to check)\n",
- swsusp_info.pagedir_pages);
-
- /* Clear page flags */
-
- for_each_zone (zone) {
- for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn)
- if (pfn_valid(zone_pfn + zone->zone_start_pfn))
- ClearPageNosaveFree(pfn_to_page(zone_pfn +
- zone->zone_start_pfn));
- }
-
- /* Mark orig addresses */
-
- for_each_pbe (p, pblist)
- SetPageNosaveFree(virt_to_page(p->orig_address));
-
- tail = pblist + PB_PAGE_SKIP;
-
- /* Relocate colliding pages */
-
- for_each_pb_page (pbpage, pblist) {
- if (PageNosaveFree(virt_to_page((unsigned long)pbpage))) {
- m = (void *)get_safe_page(GFP_ATOMIC | __GFP_COLD);
- if (!m)
- return NULL;
- memcpy(m, (void *)pbpage, PAGE_SIZE);
- if (pbpage == pblist)
- pblist = (struct pbe *)m;
- else
- tail->next = (struct pbe *)m;
- pbpage = (struct pbe *)m;
-
- /* We have to link the PBEs again */
- for (p = pbpage; p < pbpage + PB_PAGE_SKIP; p++)
- if (p->next) /* needed to save the end */
- p->next = p + 1;
-
- rel++;
- }
- tail = pbpage + PB_PAGE_SKIP;
- }
-
- /* This is for swsusp_free() */
- for_each_pb_page (pbpage, pblist) {
- SetPageNosave(virt_to_page(pbpage));
- SetPageNosaveFree(virt_to_page(pbpage));
- }
-
- printk("swsusp: Relocated %d pages\n", rel);
-
- return pblist;
-}
-
/*
* Using bio to read from swap.
* This code requires a bit more work than just using buffer heads
@@ -1015,6 +891,8 @@

create_pbe_list(p, nr_copy_pages);

+ pr_debug("swsusp: Relocating pagedir (%lu pages to check)\n",
+ swsusp_info.pagedir_pages);
if (!(pagedir_nosave = swsusp_pagedir_relocate(p)))
return -ENOMEM;

Pavel Machek

unread,
Oct 29, 2005, 6:30:06 PM10/29/05
to
On So 29-10-05 22:06:22, Rafael J. Wysocki wrote:
> This is another preliminary step. It moves the snapshot-handling functions
> remaining in swsusp.c to snapshot.c (moving the code without changing
> the functionality) and makes the next patch be more clear (in my opinion).
>
> Signed-off-by: Rafael J. Wysocki <r...@sisk.pl>

ACK and thanks.
Pavel
--
Thanks, Sharp!

Pavel Machek

unread,
Oct 29, 2005, 7:10:06 PM10/29/05
to
Hi!

> I have divided the changes into some more or less logical steps for clarity.
> Although the code has been designed as proof-of-concept, it is functional
> and has been tested on x86-64, except for the cryptographic functionality
> and error paths.

Don't worry about crypto paths too much. It is my fault, I should not
have taken them in the first place. Just ask ast for testing when you
are reasonably confident it will work.
Pavel
--
Thanks, Sharp!

Pavel Machek

unread,
Oct 29, 2005, 7:30:07 PM10/29/05
to
Hi!

> This is a non-essential step making the next patch possible. No functionality
> changes.

If you can push this before 3/6, that would be nice.

Pavel
--
Thanks, Sharp!

Pavel Machek

unread,
Oct 30, 2005, 6:50:11 AM10/30/05
to
Hi!

> > > This is a non-essential step making the next patch possible. No functionality
> > > changes.
> >
> > If you can push this before 3/6, that would be nice.
>

> Sure. I think I'll send the two patches you have already acked and this one
> to Andrew as a separate series. Then I'll get back to the 3/6 etc.

Splitting swsusp.c into swsusp.c and swap.c looked nice, too, btw. We
may finally get rid of "swsusp" name :-). But, in the long term, I'd
like to see swap reading/writing support removed from kernel. That
2.8.0 or something, through...

Rafael J. Wysocki

unread,
Oct 30, 2005, 6:50:12 AM10/30/05
to
Hi,

On Sunday, 30 of October 2005 01:21, Pavel Machek wrote:
> Hi!
>
> > This is a non-essential step making the next patch possible. No functionality
> > changes.
>
> If you can push this before 3/6, that would be nice.

Sure. I think I'll send the two patches you have already acked and this one


to Andrew as a separate series. Then I'll get back to the 3/6 etc.

Greetings,
Rafael

Pavel Machek

unread,
Oct 30, 2005, 7:20:16 AM10/30/05
to
Hi!

> > I have divided the changes into some more or less logical steps for clarity.
> > Although the code has been designed as proof-of-concept, it is functional
> > and has been tested on x86-64, except for the cryptographic functionality
> > and error paths.
>
> Don't worry about crypto paths too much. It is my fault, I should not
> have taken them in the first place. Just ask ast for testing when you
> are reasonably confident it will work.

We exchanged few emails with ast, and he'll probably not kill me if I
remove crypto swsusp support. That's good news ;-).

Reply all
Reply to author
Forward
0 new messages