The learnware paradigm, proposed by Professor Zhi-Hua Zhou in 2016 [1, 2], aims to build a vast model platform system, i.e., a learnware dock system, which systematically accommodates and organizes models shared by machine learning developers worldwide, and can efficiently identify and assemble existing helpful model(s) to solve future tasks in a unified way.
The learnware package provides a fundamental implementation of the central concepts and procedures within the learnware paradigm. Its well-structured design ensures high scalability and facilitates the seamless integration of additional features and techniques in the future.
A learnware consists of a high-performance machine learning model and specifications that characterize the model, i.e., "Learnware = Model + Specification". These specifications, encompassing both semantic and statistical aspects, detail the model's functionality and statistical information, making it easier for future users to identify and reuse these models.
The architecture is designed following the guidelines of decoupling, autonomy, reusability, and scalability. The diagram above illustrates the framework from the perspectives of both modules and workflows.
In the learnware package, besides the base classes, many core functionalities such as "learnware specification generation" and "learnware deployment" rely on the torch library. Users have the option to manually install torch, or they can directly use the following command to install the learnware package:
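```bash
pip install learnware[full]
```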
Note: Due to the potential complexity of the user's local environment, installing learnware[full] does not guarantee that torch will successfully invoke CUDA in the user's local setting.
To facilitate the construction of a learnware, we provide a Learnware Template that users can use as a basis for building their own learnware. We've also detailed the format of the learnware zip package in Learnware Preparation.
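As a rough sketch of what such a zip package typically contains (the authoritative layout is given in Learnware Preparation; the file names below follow the template and may vary):

```
learnware.zip
├── __init__.py        # model code implementing the fit/predict interface
├── learnware.yaml     # learnware configuration: model class name and specification module
├── stat.json          # statistical specification (e.g., RKME) generated from the training data
└── environment.yaml   # conda environment (or requirements.txt) describing model dependencies
```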
Before uploading your learnware to the Learnware Market, you'll need to create a semantic specification, semantic_spec. This involves selecting or inputting values for predefined semantic tags to describe the features of your task and model.
For instance, the following code illustrates the semantic specification for a Scikit-Learn type model. This model is tailored for education scenarios and performs classification tasks on tabular data:
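```python
from learnware.specification import generate_semantic_spec

# A minimal sketch using the package's generate_semantic_spec helper;
# the tag values ("Table", "Classification", "Scikit-learn", "Education")
# are chosen from the market's predefined semantic tags.
semantic_spec = generate_semantic_spec(
    name="demo_learnware",
    description="A demo tabular classification model for education scenarios",
    data_type="Table",
    task_type="Classification",
    library_type="Scikit-learn",
    scenarios="Education",
    license="MIT",
)
```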
To find learnwares that align with your task's purpose, you'll need to provide a semantic specification, user_semantic, that outlines your task's characteristics. The Learnware Market will then perform an initial search using user_semantic, identifying potentially useful learnwares with models that solve tasks similar to your requirements.
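A minimal sketch of this first-stage search, assuming a market instance created via the package's `instantiate_learnware_market` helper:

```python
from learnware.market import BaseUserInfo, instantiate_learnware_market

# Instantiate a learnware market (here the "easy" implementation)
easy_market = instantiate_learnware_market(market_id="demo", name="easy")

# First-stage search using only the semantic specification
user_info = BaseUserInfo(semantic_spec=user_semantic)
search_result = easy_market.search_learnware(user_info)
single_result = search_result.get_single_results()
```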
If you choose to provide your own statistical specification file, stat.json, the Learnware Market can further refine the selection of learnwares from the previous step. This second-stage search leverages statistical information to identify one or more learnwares that are most likely to be beneficial for your task.
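Continuing the sketch above, the statistical specification is loaded from stat.json and attached to the user information (here assumed to be an RKME specification for tabular data):

```python
import learnware.specification as specification

# Load the user's statistical specification from stat.json
user_spec = specification.RKMETableSpecification()
user_spec.load("stat.json")

# Second-stage search using both semantic and statistical specifications
user_info = BaseUserInfo(
    semantic_spec=user_semantic,
    stat_info={"RKMETableSpecification": user_spec},
)
search_result = easy_market.search_learnware(user_info)

single_result = search_result.get_single_results()      # single-learnware recommendations
multiple_result = search_result.get_multiple_results()  # mixture-of-learnwares recommendations
```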
With the list of learnwares, mixture_learnware_list, returned from the previous step, you can readily apply them to make predictions on your own data, bypassing the need to train a model from scratch. We provide two methods for reusing a given list of learnwares: JobSelectorReuser and AveragingReuser. Substitute test_x in the code snippet below with your testing data, and you're all set to reuse learnwares:
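```python
from learnware.reuse import JobSelectorReuser, AveragingReuser

# Job selector: route each test sample to the most suitable learnware
reuse_job_selector = JobSelectorReuser(learnware_list=mixture_learnware_list)
job_selector_predict_y = reuse_job_selector.predict(user_data=test_x)

# Averaging ensemble: combine the predictions of all learnwares
reuse_ensemble = AveragingReuser(learnware_list=mixture_learnware_list)
ensemble_predict_y = reuse_ensemble.predict(user_data=test_x)
```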
We also provide two methods when the user has labeled data for reusing a given list of learnwares: EnsemblePruningReuser and FeatureAugmentReuser. Substitute test_x in the code snippet below with your testing data, and substitute train_x, train_y with your training labeled data, and you're all set to reuse learnwares:
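```python
from learnware.reuse import EnsemblePruningReuser, FeatureAugmentReuser

# Ensemble pruning: select a subset of learnwares using the labeled data
# (mode is "classification" or "regression", matching the task type)
reuse_ensemble = EnsemblePruningReuser(learnware_list=mixture_learnware_list, mode="classification")
reuse_ensemble.fit(train_x, train_y)
ensemble_pruning_predict_y = reuse_ensemble.predict(user_data=test_x)

# Feature augmentation: use learnware outputs as additional input features
reuse_feature_augment = FeatureAugmentReuser(learnware_list=mixture_learnware_list, mode="classification")
reuse_feature_augment.fit(train_x, train_y)
feature_augment_predict_y = reuse_feature_augment.predict(user_data=test_x)
```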
The learnware package also offers automated workflow examples, including preparing learnwares, uploading and deleting learnwares from the market, and searching for learnwares using both semantic and statistical specifications. To experience the basic workflow of the learnware package, run the following from the repository root:
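```bash
python test/test_workflow/test_workflow.py
```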
We build various types of experimental scenarios and conduct an extensive empirical study to evaluate the baseline algorithms for specification generation, learnware identification, and reuse on tabular, image, and text data.
On various tabular datasets, we initially evaluate the performance of identifying and reusing learnwares from the learnware market that share the same feature space as the user's tasks. Additionally, since tabular tasks often come from heterogeneous feature spaces, we also assess the identification and reuse of learnwares from different feature spaces.
Our study utilizes three public datasets in the field of sales forecasting: Predict Future Sales (PFS), M5 Forecasting (M5), and Corporacion. To enrich the data, we apply diverse feature engineering methods to these datasets. We then divide each dataset by store and further split the data for each store into training and test sets. A LightGBM model is trained on each Corporacion and PFS training set, while the test sets and the M5 dataset are reserved to construct user tasks. This results in an experimental market consisting of 265 learnwares, encompassing five types of feature spaces and two types of label spaces. All these learnwares have been uploaded to the Beimingwu system.
In the homogeneous cases, the 53 stores within the PFS dataset function as 53 individual users. Each store uses its own test data as user data and applies the same feature engineering approach used in the learnware market. These users can then search the market for homogeneous learnwares whose feature spaces match their tasks.
We compare different baseline algorithms when users have no labeled data or only limited amounts of labeled data. The average losses over all users are shown in the table below; the unlabeled methods perform much better than randomly choosing and deploying one learnware from the market.
The figure below shows the results for different amounts of labeled data provided by the user. For each user, we repeat the experiments multiple times and compute the mean and standard deviation of the losses; the figure reports the average losses over all users. It illustrates that when users have limited training data, identifying and reusing single or multiple learnwares yields superior performance compared to users' self-trained models.
We consider the 41 stores within the PFS dataset as users, generating their user data with a unique feature engineering approach that differs from the methods employed by the learnwares in the market. As a result, while some learnwares in the market are also designed for the PFS dataset, the feature spaces do not align exactly.
In this experimental setup, we examine various data-free reusers. The results in the following table indicate that even when users lack labeled data, the market exhibits strong performance, particularly with the AverageEnsemble method that reuses multiple learnwares.
We employ three distinct feature engineering methods on all ten stores from the M5 dataset, resulting in a total of 30 users. Although the overall task of sales forecasting aligns with the tasks addressed by the learnwares in the market, no learnware is specifically designed to satisfy the M5 sales forecasting requirements.
In the following figure, we present the loss curves for the user's self-trained model and several learnware reuse methods. It is evident that heterogeneous learnwares prove beneficial with a limited amount of the user's labeled data, facilitating better alignment with the user's specific task.
Second, we assess our algorithms on image datasets. It is worth noting that images of different sizes could be standardized through resizing, eliminating the need to consider heterogeneous feature cases.
We choose the well-known image classification dataset CIFAR-10, which consists of 60000 32x32 color images in 10 classes. A total of 50 learnwares are uploaded: each learnware contains a convolutional neural network trained on an unbalanced subset that includes 12000 samples from four categories with a sampling ratio of 0.4:0.4:0.1:0.1. A total of 100 user tasks are tested; each user task consists of 3000 CIFAR-10 samples from six categories with a sampling ratio of 0.3:0.3:0.1:0.1:0.1:0.1.
We assess the average performance of various methods using 1 - Accuracy as the loss metric. The following table and figure show that when users face a scarcity of labeled data, or possess only a limited amount of it (fewer than 2000 instances), leveraging the learnware market can yield good performance.
We conduct experiments on the well-known text classification dataset 20-newsgroup, which consists of approximately 20000 newsgroup documents partitioned across 20 different newsgroups. Similar to the image experiments, a total of 50 learnwares are uploaded. Each learnware is trained on a subset that includes only half of the samples from three superclasses, and its model is a tf-idf feature extractor combined with a naive Bayes classifier. We define 10 user tasks, each of which encompasses two superclasses.
The results are depicted in the following table and figure. Similarly, even when no labeled data is provided, the performance achieved through learnware identification and reuse can match that of the best learnware in the market. Additionally, utilizing the learnware market can save approximately 2000 labeled samples compared to training models from scratch.
Learnware is still young and may contain bugs and issues. We highly value and encourage contributions from the community. For detailed development guidelines, please consult our Developer Guide. We kindly request that contributors adhere to the provided commit format and pre-commit configuration when participating in the project. Your valuable contributions are greatly appreciated.