Chunmei HU; Jian YANG; Xiangchen YIN

doi:10.21078/JSSI-2023-0134

PDF(1601 KB)

系统科学与信息学报(英文) ›› 2024, Vol. 12 ›› Issue (6) : 758-774. DOI: 10.21078/JSSI-2023-0134

作者信息 +

Optimal Pricing Model of IoT Data Market Based on Profit Maximization

Author information +

文章历史 +

Abstract

The growth of the Internet of Things (IoT) equipment business encourages the collection of large sizes of data. IoT data is being regarded as a new digital asset which contains valuable information. As a result, IoT data transactions are gaining in popularity, and data markets are starting to emerge. To support the smooth flow of data transactions, several academics offer market models and pricing techniques from various perspectives. However, the factors considered in the pricing model are still not comprehensive enough, and the willingness to sell of data providers has been ignored. Therefore, this paper investigates the pricing and profit maximization problems for the IoT data market who considers the willingness of data providers as well as data quality when purchasing data. Firstly, we analyze the factors that impact data providers' willingness to sell and give a definition of the willingness function. Secondly, we propose a data quality evaluation method and define a joint utility function based on data size and data quality. In addition, we build the profit function model of data market and give theoretical analysis. Finally, numerical experiments demonstrate that the suggested pricing mechanism can benefit the data market participants the most.

Key words

IoT data market / utility function / pricing model

引用本文

EndNote

Ris (Procite)

Bibtex

导出引用

Chunmei HU , Jian YANG , Xiangchen YIN. . 系统科学与信息学报(英文), 2024, 12(6): 758-774 https://doi.org/10.21078/JSSI-2023-0134

Chunmei HU , Jian YANG , Xiangchen YIN. Optimal Pricing Model of IoT Data Market Based on Profit Maximization. Journal of Systems Science and Information, 2024, 12(6): 758-774 https://doi.org/10.21078/JSSI-2023-0134

1 Introduction

In recent years, big data is applied to almost all aspects of people's life and work. Data, like land, labor and capital, has become important production factors of the country. Especially with the emergence of IoT technology, many devices in our life can transmit data through the Internet. People expect to use machine learning methods to transform these data into useful information and knowledge, which they use to a variety of fields, e.g., business management, manufacturing control, market analysis, engineering design, and scientific investigation, etc. It has become a trend to accelerate data transactions through a market-oriented approach^[1]. Microsoft Windows Azure data market, Infochimps, and Factual are examples of typical data markets^[2]. Therefore, an IoT data market model and optimal pricing scheme are needed to achieve the optimal utilization of IoT data resources.

Data providers and consumers can engage each other by bringing data to the data market as a commodity, sharing and enhancing the data utility^[3]. To establish an effective data trading market and ensure the interests of all participants, several problems need to be solved, such as determining the appropriate price for the raw data or services. Big data market models and pricing mechanisms have been proposed in a number of studies. The three stakeholders in a typical data market model are the service user, the service provider, and the data provider^{[4, 5]}. Users of the service either purchase data analysis services from service providers or buy the data provided by the data suppliers for use in making decisions. Related researchers have studied different pricing models, e.g., based on data size, data quality, information entropy, or based on auction, game. Furthermore, most studies focus on the value of data and customers' willingness-to-buy (WTB), but they rarely consider data providers' willingness-to-sell (WTS). When the recompense does not meet their expectations as a data provider, they have the right to refuse to sell their own data. Consequently, the WTS of data providers are considered in our work, with the aim of balancing the interests of various market participants, and then develop a model for the IoT data market that maximizes profits. In summary, the following are the primary contributions of this paper:

1)

From the perspective of economics, the transaction behavior of each participant in the IoT data market is analysed. And an interesting functional formula is constructed to describe the WTS of the data provider to solve the problems ignored in the previous work.

2)

Based on the diminishing marginal utility rule, a joint utility function model of data size and data quality is constructed, which lays a foundation for the establishment of pricing model.

3)

We build a profit maximization model for the IoT data market that considers both the WTB of data consumers and the WTS of data providers. According to the numerical experimental data, the model is effective and the optimal solution to the profit function is obtained.

Other sections of the paper are arranged as follows: The associated work is reviewed in Section 2. Next, Section 3 describes the system model in detail. Section 4 discusses the problem of profit maximization and makes a theoretical analysis. The corresponding numerical experiments are carried out in Section 5. The final summary is provided in Section 6.

2 Related Works

How to set an acceptable price for data has become a matter of widespread concern with the rise of the IoT data market. Despite some challenges, there are still studies focusing on pricing problem. Existing pricing approaches are divided into two groups: data attribute-based pricing and market-based pricing.

From a data attribute perspective, for example, data size, data quality, privacy level, etc., relevant scholars have done some research. Niyato, et al.^[6] proposed a candidate utility function model related to data volume and then developed the pricing strategy of IoT. Alsheikh, et al.^[7] also discussed the optimal pricing problem of the IoT. Whether in independent service sales or service bundle sales, service providers will optimize the data size and service subscription fees to maximize their profits. It is unreasonable to only focus on data size to measure data utility. Some researchers began to be interested in the data quality from multiple dimensions, and then gave the pricing model. Data quality and the version control method were both taken into account in the two-level mathematical programming model for data pricing suggested by Yu, et al.^[8]. In order to determine the most effective versioning strategy for information products, Li, et al.^[9] reconsidered two key assumptions, namely the data product's quality and the consumers' self-selection behavior. The above studies are mainly focused on service providers and revolve around the issue of maximizing profits in the data market. The service provider evaluates data quality based on a certain strategy and pricing based on data quality. In order to facilitate transactions, the data market will consider the utility of data consumers and the willingness to buy. But they almost never thinks from the perspective of a data provider. Nowadays, many applications regularly collect and analyze the detailed personal data from large databases through data mining. The application users can sometimes benefit from sharing these data. Shen, et al.^[10] and Yang, et al.^[11] discussed the data pricing problem based on different privacy levels, and gave the corresponding pricing model. At this point, the security and privacy of personal big data have received attention, with the expectation of achieving a balance between data privacy protection and sharing. Nevertheless, the compensation mechanism is still provided by the data platform, which holds the initiative.

From a market perspective, auction pricing is one of the common pricing methods. To optimize the benefits of all stakeholders, Borjigin, et al.^[12] developed a double-auction method which was used to both service function chain routing and NFV pricing adjustment. Zhao, et al.^[13] proposed an efficient auction mechanism for high timeliness data pricing under mobile and wireless networks. Game theory is a mathematical analysis method for studying group behavior, which plays a role in many fields such as decision analysis, economics, sociology, political science, and computer science. In [14], game theory was introduced to solve the problems in edge computing. The aim of the game is to minimize the end-to-end network's energy consumption and increase its resource utilization efficiency. In [15], the researchers established an effective and responsible carbon emissions trading system based on a blockchain framework of game theory. In [16], the paper studied the bank run in game theory. In recent years, game theory has gradually been applied to data market pricing. Stackelberg model is a price leadership model, which can be used to solve market pricing problems. In a cloud scenario, Valerio, et al.^[17] took into account the sale of resources by an IaaS provider to SaaS providers. The authors adopted the Stackelberg game model in the pricing strategy of the second stage. Xiao, et al.^[18] constructed an economic model of the IoT data market consisting of data providers, service providers and service users. In this study, the Stackelberg game method was also used. In [19], a three-layer Stackelberg game among data owner, service provider and data buyer to obtain an optimal pricing strategy in the car sharing data market was formulated. Evolutionary game theory is also often used to solve the pricing problems faced in the data market. Zeng, et al.^[20] proposed a hybrid pricing mechanism for the transaction of the sensed data. They then used evolutionary game theory to analyze the dynamics of user behavior and the evolution of data markets. The main characteristic of game pricing is to consider the interactions among the three parties in the data market. Additionally, some academics have explored the data pricing model from other aspects. The most typical strategies among them are subscription, query-based pricing, bundling, and discrimination^[2]. For example, Miao, et al.^[21] explored the pricing problem for queries over incomplete data. And a sophisticated pricing mechanism was proposed, which takes a series of essential factors into consideration. However, these economic models still pay more attention to the interests of the data market and the WTB of the consumers, and the WTS of data providers has received subtle attention. Therefore, it is essential to develop such a big data market model and the efficient pricing strategy.

3 System Model

This paper discusses the pricing problem based on a typical market framework. This model is applicable in many data service-oriented scenarios, such as data marketplace Infochimps, crowdsourcing services platform Placemeter, IoT data trading platform Thingful, etc^[7]. The specific architecture is shown in Figure 1, consists of three stakeholders: data consumers, data market and data providers.

Figure 1 An IoT data market trading framework

Full size|PPT slide

Data providers: Data providers contribute the raw data generated by different devices, such as the smart phones, IoT gadgets and sensor nodes^[22]. Since data collection and storage need to consume resources, the data provider receives appropriate monetary compensation from data market.

Data market: The data market is a trusted platform by data providers and consumers where data services or datasets of different kinds are traded as assets. As a data platform, Factual cleans, organizes, and layers the collected data, and then provides it to developers, businesses, and organizations in need for use. The core concept of this platform is data neutrality, which means that data comes from everyone and should be accessible to everyone.

Data consumers: The consumers search datasets or services in the market according to their own needs and preferences. A data consumer makes purchase decisions according to service value and WTB.

For the convenience of understanding, Table 1 lists the main parameters of the model.

Table 1 Major symbols

Notation	Definition
$r$	Remuneration for data providers
$a$	Price adjustment factor
$q_{k}$	Quality hierarchy of the data product $k$
$ψ$	Data providers' WTS
$M$	Number of data consumers
$U$	Utility function of data
$p$	Subscription fee for the data product
$n_{k}$	Data size of data product $k$

3.1 Willingness-to-Sell of Data Providers

In an IoT data market, data providers' willingness to sell data is frequently tied to the following criteria in order to protect their own interests. First, the data provider usually sets the limit selling price^[23], i.e., the price offered by the buyer shall not be lower than the minimum selling price. Otherwise the data will not be sold. Second, the higher the reward for the same data resources, the higher the WTS is. Third, the more data items sold, the higher the seller's expected reward.

Based on the work of Benndorf, et al.^[24] and the above principles, we give the WTS function of data providers from the perspective of data size, which is defined as follows.

Definition 1 Willingness-To-Sell (WTS)) The willingness-to-sell of data providers depends on the reward received and the size of data provided. We consider the following fraction-based function:

\begin{array}{rcl} ψ = 1 - \frac{1}{1 + r / n_{k}}, \end{array}

(1)

where

r

is the reward received by the data provider, and

n_{k}

is the data size.

We scale

r

and

n_{k}

to the interval [0, 1]. The function trend of formula (1) is shown in Figure 2. Suppose there are two datasets A and B, and the data size involved in dataset A is larger than that of dataset B. We note that when the data market bids the same for both datasets, the data provider of dataset B has a stronger WTS. In other words, it is easier to buy less data at a cheap price. Furthermore, data size is small (

n = 0.015

), the value of WTS quickly tends to 1. The WTS to 1 increases more and more slowly as the size of data increases, which conforms to the rule of diminishing marginal benefits^[25].

Figure 2 WTS as a function of data reward based on the data size

Full size|PPT slide

3.2 Quality Hierarchy

After collecting the data, the data market often needs to evaluate data quality and determine the compensation amount paid to the data provider. Data are evaluated from multiple quality dimensions, where each dimension refers to a specific aspect of data quality^[26]. Our evaluation framework covers four dimensions: Accuracy, Completeness, Redundancy and Consistency. These quality dimensions have practical significance. In the financial field, the accuracy of stock market data is crucial for investors. If real-time data on stock prices is inaccurate, it may lead investors to make incorrect investment decisions. In medical research, the completeness of patients' electronic health record (EHR) data is crucial for the study. If there is a lack of key information in EHR, such as the patient's medication history or allergic reactions, research results based on these incomplete data may mislead doctors' treatment decisions. In Table 2, we introduce and define these four DQ dimensions^{[27, 28]}.

Table 2 Major symbols

Attributes	Description	Formula
Accuracy	The proportion of the correct number of units in the data source.	$A c c u r a c y = 1 - \frac{N u m b e r o f d a t a u n i t s i n e r r o r}{T o t a l n u m b e r o f d a t a u n i t s}$
Completeness	The proportion of complete elements in the dataset.	$C o m p l e t e n e s s = 1 - \frac{N u m b e r o f i n c o m p l e t e e l e m e n t s}{T o t a l n u m b e r o f e l e m e n t s}$
Consistency	Data in a certain domain is often constrained by rules. These rules can be clearly defined business rules or they can be established at run time as part of the data mining process. For example, the article identifier must satisfy a certain regular expression. The consistency of individual attribute values is assessed by the number and severity of inconsistencies indicated by attribute value rule violations. The consistency of the dataset is given by the consistency mean of the attribute values with consistency rules.	The consistency of an attribute value with m rules is defined as: $C o n s i s t e n c y (ω) = \frac{1}{\sum_{i = 1}^{m} c r (ω) g_{i} + 1}$ , where $g_{i} \in [0, 1]$ is the relative importance of rule cr_i. And cr_i(ω) is the violation of consistency rulecr_i, $c r_{i}, c r_{i} (ω) {\begin{cases} 0 & i f ω s a t i s f i e s c r_{i}, \\ 1 & o t h e r w i s e . \end{cases}$ If there are $n$ attribute values constrained by the consistency rule in the dataset, the consistency of the entire dataset is expressed as: Consistency = $\frac{\sum_{j}^{n} C o n s i s t e n c y (ω_{j})}{n}$ .
Redundancy	The proportion of duplicate records in the data source.	$Redundancy = 1 - \frac{N u m b e r o f d u p l i c a t e r e c o r d s}{N u m b e r o f r o w s}$

The calculation formula of data quality score is expressed as the weighted sum of each quality dimension, which is

\begin{aligned} Q (d) & = \sum_{l = 1}^{4} α_{l} d i m e n s i o n [l] \\ s.t. & α_{1} + α_{2} + α_{3} + α_{4} = 1, \end{aligned}

(2)

where

α_{1}

α_{2}

α_{3}

α_{4}

are the weight factors. The values can be reasonably set by the users. The quality score is specified as a number between 0 and 1 and determines the data quality level in this paper.

3.3 Utility of Data and Willingness-to-Buy of Data Consumers

3.3.1 Utility of Data

Utility is one of the most commonly used concepts in economics, and the degree to which consumers satisfy their desires by owning or consuming goods or services is called the utility of goods or services. In this paper, goods or services refer to data products or services. In marketing, by collecting consumer's purchase history and preference data, consumer's behavior patterns can be analyzed, and more accurate marketing strategies can be formulated. This is the utility of data, which can help companies better understand the market and improve marketing effectiveness. In practice, data consumers primarily utilize data mining or machine learning technology to extract valuable information from acquired data, which is used for customer analysis, business decision-making, etc. The main process of data mining is shown in Figure 3. Model construction and training are important links, and the quality of the model has an important impact on the decision-making results. Generally, the more training data used, the better the performance of the model. With the same data size, the model is more effective when the data quality is higher. From this perspective, the utility of the data can be evaluated by the quality of the model. Yang, et al.^[25] evaluated the data utility through a quality hierarchy. Niyato, et al.^[6] defined the data utility function from the dimension of data size. However, it is unreasonable to consider only one side. In this paper, we define a joint utility function based on the quality level

q

and data size

n

, which is formulated as:

U (q_{k}, n_{k}) = 1 - \frac{x_{1}}{1 + x_{2} q_{k} n_{k}}

(3)

Figure 3 The main process of data mining

Full size|PPT slide

where

x_{1} < 1

x_{2} > 0

x_{1}

x_{2}

is the fitting parameter. We assume that utility function satisfies the law of diminishing marginal utility. In other words, the first-order partial derivative of

U

n_{k}

and

q_{k}

is greater than 0, whereas the second-order partial derivative is less than 0.

Classification problem is the most long history in the field of data mining, and it is also a more thorough problem. This research uses principles learned from machine learning classification algorithms to assess the rationality of our utility function. The performance index of evaluating classification problem is classification accuracy. Therefore, data utility is regarded as the accuracy of the classification model.

Assuming that the training set contains

t_{1}

samples and the test set has

t_{2}

samples, we evaluate the model on the test set after building it on the training set. Each test sample can be expressed as

(v_{1}, v_{2}, \dots, v_{n}; o)

, where

v_{i}

is the attribute value and

o

is the actual category. After the test samples are input into the model, the classification model gives the prediction category

\hat{o}

. A good classification model aims to minimize the error between

o

and

\hat{o}

. In the testing stage, we will get the accuracy of the classification model, i.e., data utility. To determine the fitting parameters in formula (3), we conduct experiments from different perspectives using KNN algorithm based on MNIST dataset. In the experiments, the noise ratio of the data set is used to represent the quality of the data^[15]. We select

X

items at random from the total of

N

items in the training data set, and the original labels of the selected items are changed with a random digit between 0 and 9. Then the noise ratio refers to the proportion of noise data in the dataset:

\begin{array}{rcl} n o i s e r a t i o = \frac{X}{N} . \end{array}

(4)

Firstly, we suppose the quality score of the training set is the same, and adjust the data size of the training set for experiment. A series of test points are represented by

{(n_{1}, a_{1}), (n_{2}, a_{2}), \dots,

(n_{l}, a_{l})}

, where

a_{j}

is the value of accuracy,

n_{j} < n_{j + 1}

and

j = 1, 2, \dots, l

. We fit the utility function

U

by minimizing the square error sum:

\begin{array}{rcl} min \sum_{j = 1}^{l} | | a_{j} - U | |^{2} . \end{array}

(5)

Assuming that training set is noise free, i.e., the noise ratio is 0, and conduct experiments based on data sets of different sizes. Figure 4(a) shows the relationship curve between classification accuracy and utility function. It can be observed that the accuracy of classification will increase as the data size does. In addition, the accuracy increases more and more slowly when the data size is large enough. Secondly, assuming that the data size in the training set is the same, the quality of the training set is changed for the experiment. Figure 4(b) demonstrates the fitting relationship between classification accuracy and utility function at different quality levels. It can be shown that as the noise increases, the classification accuracy decreases. When

x_{1}

is set to 0.92 and

x_{2}

is set to 5.3, the utility function can well approximate the actual results. Figure 4(c) shows the results of classification accuracy under different noise ratio and data size. The fitting image of the utility function under various data sizes and quality levels is shown in Figure 4(d). They verifies the rationality of our proposed model.

Figure 4 Estimation of utility function: (a) estimation of utility function under data size; (b) estimation of utility function under noise ratio; (c) the actual accuracy under different data size and noise ratio; (d) utility function values under different data size and noise ratio.

Full size|PPT slide

3.3.2 Willingness-to-Buy of Data Consumers

Compared with the WTS of data providers, data consumers have the willingness-to-buy (WTB) of different data. WTB indicates the highest price that consumers will pay for a data product^[29]. From an economic perspective, WTB depends on the utility of data. Let

ζ

represent the actual WTB of a consumer for a data product,

ν

denote a nominal WTB. Because different consumers have different demands and preferences for data,

ν

is randomly distributed between 0 and

V

(the maximum nominal WTB). To simplify the problem, we adopt the following function to express the actual WTB of a consumer:

\begin{array}{rcl} ζ = ν U (q_{k}, n_{k}) . \end{array}

(6)

4 The Optimal Pricing of Data Market

4.1 Profit Function

The profit of the IoT data market for the data product depends on difference between revenue and cost. On one hand, the cost of data market stems from obtaining raw data, and on the other hand from data storage, processing, and management. In this paper, we ignore other costs and only consider the cost of purchasing data from data providers. In most data transactions, pricing is often based only on the evaluation results of data quality by the data market. But in fact, each data provider has its own view on the value of the data sold. For example, the willingness of data providers to sell is affected by the data size and the remuneration they receive. Therefore, this paper defines the total purchasing cost

C

of the data product from the two aspects of WTS and quality level

q

, which is as follows:

\begin{array}{rcl} \begin{array}{l} C = a (τ ψ + (1 - τ) q_{k}) \\ \begin{array}{c} s . t . & 0 < τ < 1, \end{array} \end{array} \end{array}

(7)

where

C

is the final cost paid by the data market to the data provider, and

a

is the cost adjustment factor. The

τ

is a weight factor, which can be set by user.

Assuming that every consumer in the data market wants to buy data products with greater value and utility with less money, we consider that consumers' WTB meet the following economic principles: Ⅰ) Consumers have a psychological price ceiling for a data product, that is, they will not purchase the product beyond the price; Ⅱ) When the price of data products with the same quality rises, consumers' WTB decreases; Ⅲ) Data consumers prefer to buy high-quality data products. According to formula (6), we have

ξ = V U (q_{k}, n_{k})

for the maximum actual WTB. If there are

M

consumers in the data market for the data product

k

, the probability density function

f (ζ)

indicates WTB, where

ζ \in [0, ξ]

. Then the revenue of the data market for the data product is:

\begin{array}{rcl} Ω = M p P r (ζ \geq p) = p M \int_{p}^{ξ} f (ζ) d ζ, \end{array}

(8)

where

p

is the subscription fee of the data. We assume that users' WTB follows a uniform distribution, and set

V

is 1. The revenue of the data market is expressed as:

\begin{array}{rcl} Ω = M p (ξ - p) = M p (U (q_{k}, n_{k}) - p) = M p (1 - \frac{x_{1}}{1 + x_{2} q_{k} n_{k}} - p) . \end{array}

(9)

Therefore, the profit function of data market is:

\begin{array}{rcl} δ = Ω - C = M p (1 - \frac{x_{1}}{1 + x_{2} q_{k} n_{k}} - p) - a (τ (1 - \frac{1}{1 + a / n_{k}}) + (1 - τ) q_{k}) . \end{array}

(10)

4.2 Optimal Pricing

In this paper, we assume that the trading entities in the data market are completely rational, without considering the impact of the interaction behavior of each participant on data trading. Next, we discuss the optimal pricing problem from two aspects.

1) The parameters

p

and

q_{k}

are tuned, and the others are reset to their default settings. The profit optimization problem is described as:

\begin{array}{rcl} \begin{array}{l} max δ (p, q_{k}) \\ s . t . \begin{array}{c} C_{1} : p \geq 0, \end{array} C_{2} : q_{k} \geq 0. \end{array} \end{array}

(11)

The two constraints in formula (11) respectively represent that the quality level and subscription fee of data products are nonnegative. By differentiating

δ (p, q_{k})

with respect to

p

and

q_{k}

, we have:

\begin{array}{rcl} \frac{\partial δ (p, q_{k})}{\partial p} = - \frac{M (2 p + x_{1} - x_{2} n_{k} q_{k} + 2 x_{2} n_{k} p q_{k} - 1)}{x_{2} n_{k} q_{k} + 1} = 0, \end{array}

(12)

\begin{array}{rcl} \frac{\partial δ (p, q_{k})}{\partial q_{k}} = \frac{a τ - a + 2 a n_{k} q_{k} τ x_{2} (τ - 1) + a {n_{k}}^{2} {q_{k}}^{2} {x_{2}}^{2} (τ - 1) + M n_{k} p x_{1} x_{2}}{{(n_{k} q_{k} x_{2} + 1)}^{2}} = 0. \end{array}

(13)

The closed solution of

p

and

q_{k}

can be obtained by simultaneous equations (12) and (13). The two roots for

p

are:

\begin{array}{rcl} p_{1} = \frac{2 n_{k} x_{2} σ_{5} - σ_{3} - 1 + σ_{1}}{σ_{2}} \begin{array}{c}  \end{array} p_{2} = \frac{2 n_{k} x_{2} σ_{5} - σ_{3} - 1 - σ_{1}}{σ_{2}}, \end{array}

(14)

where

\begin{array}{rcl} \begin{array}{l} σ_{1} = \sqrt{- (2 n_{k} x_{2} (σ_{7} - σ_{5} - σ_{6}) + σ_{3} + σ_{4} - 1) (2 n_{k} x_{2} (σ_{7} + σ_{5} - σ_{6}) - σ_{3} + σ_{4} - 3)}, \\ σ_{2} = 2 (2 n_{k} x_{2} σ_{6} + {n_{k}}^{2} {x_{2}}^{2} {σ_{6}}^{2} + 1), σ_{3} = {n_{k}}^{2} {x_{2}}^{2} {σ_{5}}^{2}, σ_{4} = 2 {n_{k}}^{2} {x_{2}}^{2} σ_{6} σ_{7}, \\ σ_{5} = \frac{σ_{12}}{3 σ_{14}} + \frac{σ_{8}}{2} + \frac{σ_{9}}{2 σ_{8}} + \frac{\sqrt{3} (σ_{8} - \frac{σ_{9}}{σ_{8}}) i}{2}, σ_{6} = σ_{8} - \frac{σ_{12}}{3 σ_{14}} + \frac{σ_{9}}{σ_{8}}, \\ σ_{7} = \frac{σ_{12}}{3 σ_{14}} + \frac{σ_{8}}{2} + \frac{σ_{9}}{2 σ_{8}} - \frac{\sqrt{3} (σ_{8} - \frac{σ_{9}}{σ_{8}}) i}{2}, \\ σ_{8} = {(\sqrt{{(σ_{11} + \frac{{σ_{12}}^{3}}{27 {σ_{14}}^{3}} + σ_{10})}^{2} - {σ_{9}}^{3}} - \frac{{σ_{12}}^{3}}{27 {σ_{14}}^{3}} - σ_{11} - σ_{10})}^{1 / 3}, \\ σ_{9} = \frac{σ_{13}}{3 σ_{14}} + \frac{{σ_{12}}^{2}}{9 {σ_{14}}^{2}}, σ_{10} = \frac{σ_{12} σ_{13}}{6 {σ_{14}}^{2}}, σ_{11} = \frac{M n_{k} x_{2} {x_{1}}^{2} - M n_{k} x_{2} x_{1} + 2 a - 2 a τ}{2 σ_{14}}, \\ σ_{12} = 6 a {n_{k}}^{2} {x_{2}}^{2} - 6 a {n_{k}}^{2} τ {x_{2}}^{2}, σ_{13} = 6 a n_{k} τ x_{2} - 6 a n_{k} x_{2} + M {n_{k}}^{2} x_{1} {x_{2}}^{2}, \\ σ_{14} = 2 c {n_{k}}^{3} {x_{2}}^{3} - 2 c {n_{k}}^{3} τ {x_{2}}^{3} . \end{array} \end{array}

(15)

And there are two roots for

q_{k}

, namely,

\begin{array}{rcl} q_{k 1} = \frac{σ_{2} σ_{3} + σ_{1}}{2 σ_{2} (σ_{5} - \frac{σ_{10}}{3 σ_{11}} + \frac{σ_{7}}{σ_{5}})}, \begin{array}{c}  \end{array} q_{k 2} = \frac{σ_{2} σ_{3} - σ_{1}}{2 σ_{2} (σ_{5} - \frac{σ_{10}}{3 σ_{11}} + \frac{σ_{7}}{σ_{5}})}, \end{array}

(16)

where

\begin{aligned} σ_{1} = \sqrt{σ_{3}^{2} + 4 (σ_{5} - \frac{σ_{10}}{3 σ_{11}} + \frac{σ_{7}}{σ_{5}}) (\frac{σ_{10}}{3 σ_{11}} + \frac{σ_{5}}{2} + \frac{σ_{7}}{2 σ_{5}} - \frac{\sqrt{3} (σ_{5} - \frac{σ_{7}}{σ_{5}}) i}{2})} σ_{2}, \\ σ_{2} = \frac{σ_{9}}{3 σ_{8}} - \frac{σ_{6}}{σ_{4}} + σ_{4}, σ_{3} = \frac{σ_{10}}{3 σ_{11}} + \frac{σ_{5}}{2} + \frac{σ_{7}}{2 σ_{5}} + \frac{\sqrt{3} (σ_{5} - \frac{σ_{7}}{σ_{5}}) i}{2}, \\ σ_{4} = {(\sqrt{{(\frac{σ_{11}}{2 σ_{8}} - \frac{σ_{9}^{3}}{27 σ_{8}^{3}} + \frac{σ_{10} σ_{9}}{6 σ_{8}^{2}})}^{2} + σ_{6}^{3}} - \frac{σ_{11}}{2 σ_{8}} + \frac{σ_{9}^{3}}{27 σ_{8}^{3}} - \frac{σ_{10} σ_{9}}{6 σ_{8}^{2}})}^{1 / 3}, \\ σ_{5} = {(\sqrt{{(\frac{σ_{8}}{2 σ_{11}} + \frac{σ_{10}^{3}}{27 σ_{11}^{3}} + \frac{σ_{10} σ_{9}}{6 σ_{11}^{2}})}^{2} - σ_{7}^{3}} - \frac{σ_{10}^{3}}{27 σ_{11}^{3}} - \frac{σ_{8}}{2 σ_{11}} - \frac{σ_{10} σ_{9}}{6 σ_{11}^{2}})}^{1 / 3}, \\ σ_{6} = \frac{σ_{10}}{3 σ_{8}} - \frac{σ_{9}^{2}}{9 σ_{8}^{2}}, σ_{7} = \frac{σ_{9}}{3 σ_{11}} + \frac{σ_{10}^{2}}{9 σ_{11}^{2}}, σ_{8} = M n_{k} x_{2} (x_{1}^{2} - x_{1}) + 2 a - 2 a τ, \\ σ_{9} = 6 a n_{k} x_{2} (τ - 1) + M n_{k}^{2} x_{1} x_{2}^{2}, σ_{10} = 6 a n_{k}^{2} x_{2}^{2} (1 - τ), \\ σ_{11} = 2 a n_{k}^{3} x_{2}^{3} (1 - τ) . \end{aligned}

(17)

The optimal quality level and subscription fee are obtained by the following formula:

\begin{array}{rcl} (q_{k}^{*}, p^{*}) = \arg_{(q_{k}, p) \in {q_{k 1}, q_{k 2}} \times {p_{1}, p_{2}}} max δ (p, q_{k}) . \end{array}

(18)

Next, we find the second-order partial derivative of

p

and

q_{k}

\begin{array}{rcl} \frac{\partial δ^{2} (p, q_{k})}{\partial p^{2}} = - 2 M < 0, \end{array}

(19)

\begin{array}{rcl} \frac{\partial δ^{2} (p, q_{k})}{\partial {q_{k}}^{2}} = - \frac{2 M {n_{k}}^{2} p x_{1} {x_{2}}^{2}}{{(n_{k} q_{k} x_{2} + 1)}^{3}} < 0. \end{array}

(20)

Since the second-order partial derivatives are less than 0, it shows that if the value of

p

is fixed, the solution

q_{k}

in formula (11) is globally optimal. On the contrary, if

q_{k}

remains unchanged, the solution

p

in formula (11) is globally optimal.

2) The parameters

p

and

n_{k}

are explored, and the others are reset to their default settings. The problem of profit maximization can be expressed as follows:

\begin{array}{rcl} max δ (p, n_{k}) . \end{array}

(21)

Similarly, to find the first derivative of formula (10) with respect to

p

and

n_{k}

, we have:

\begin{array}{rcl} \frac{\partial δ (p, n_{k})}{\partial p} = - \frac{M (2 p + x_{1} - x_{2} n_{k} q_{k} + 2 x_{2} n_{k} p q_{k} - 1)}{x_{2} n_{k} q_{k} + 1} = 0, \end{array}

(22)

\begin{array}{rcl} \frac{\partial δ (p, n_{k})}{\partial n_{k}} = \frac{τ a^{2} (2 n_{k} q_{k} x_{2} + 1 + {n_{k}}^{2} {q_{k}}^{2} {x_{2}}^{2}) + M p x_{1} q_{k} x_{2} (2 a n_{k} + {n_{k}}^{2} + a^{2})}{{(n_{k} q_{k} x_{2} + 1)}^{2} {(a + n_{k})}^{2}} = 0. \end{array}

(23)

There are closed form solutions for

p

and

n_{k}

. And the two roots of

p

are:

p_{1} = \frac{\begin{matrix} (M x_{1} - M x_{1}^{2} - σ_{19} - σ_{15} + σ_{14} - σ_{31} - σ_{17} - σ_{13} + σ_{30} + σ_{16}) + \\ (- M x_{1} σ_{10} + σ_{9} + σ_{7} + σ_{3} + σ_{2} + σ_{8} - σ_{6} + σ_{1} - σ_{5} + σ_{4}) \end{matrix}}{2 σ_{11}}, p_{2} = \frac{\begin{array}{l} (M x_{1} - M x_{1}^{2} - σ_{19} - σ_{15} + σ_{14} - σ_{31} - σ_{17} - σ_{13} + σ_{30} + σ_{16}) - \\ (- M x_{1} σ_{10} + σ_{9} + σ_{7} + σ_{3} + σ_{2} + σ_{8} - σ_{6} + σ_{1} - σ_{5} + σ_{4}) \end{array},}{2 σ_{11}},

(24)

where

(25)

And the two roots of

n_{k}

are:

\begin{array}{rcl} n_{k 1} = \frac{σ_{2} σ_{3} + σ_{1}}{2 (\frac{σ_{7}}{σ_{5}} - \frac{σ_{10}}{3 σ_{11}} + σ_{5}) σ_{2}}, \begin{array}{c}  \end{array} n_{k 2} = \frac{σ_{2} σ_{3} - σ_{1}}{2 (\frac{σ_{7}}{σ_{5}} - \frac{σ_{10}}{3 σ_{11}} + σ_{5}) σ_{2}}, \end{array}

(26)

where

\begin{aligned} σ_{1} = \sqrt{σ_{3}^{2} + 4 (\frac{σ_{7}}{σ_{5}} - \frac{σ_{10}}{3 σ_{11}} + σ_{5}) (\frac{σ_{7}}{2 σ_{5}} + \frac{σ_{10}}{3 σ_{11}} + \frac{σ_{5}}{2} - \frac{\sqrt{3} (\frac{σ_{7}}{σ_{5}} - σ_{5}) i}{2})} σ_{2}, \\ σ_{2} = \frac{σ_{6}}{σ_{4}} + \frac{σ_{9}}{3 σ_{8}} - σ_{4}, σ_{3} = \frac{σ_{7}}{2 σ_{5}} + \frac{σ_{10}}{3 σ_{11}} + \frac{σ_{5}}{2} + \frac{\sqrt{3} (\frac{σ_{7}}{σ_{5}} - σ_{5}) i}{2}, \\ σ_{4} = {(\sqrt{{(\frac{σ_{11}}{2 σ_{8}} + \frac{σ_{9}^{3}}{27 σ_{8}^{3}} - \frac{σ_{10} σ_{9}}{6 σ_{8}^{2}})}^{2} + σ_{6}^{3}} - \frac{σ_{11}}{2 σ_{8}} - \frac{σ_{9}^{3}}{27 σ_{8}^{3}} + \frac{σ_{10} σ_{9}}{6 σ_{8}^{2}})}^{1 / 3}, \\ σ_{5} = {(\sqrt{{(\frac{σ_{8}}{2 σ_{11}} + \frac{σ_{10}^{3}}{27 σ_{11}^{3}} - \frac{σ_{10} σ_{9}}{6 σ_{11}^{2}})}^{2} - σ_{7}^{3}} - \frac{σ_{8}}{2 σ_{11}} - \frac{σ_{10}^{3}}{27 σ_{11}^{3}} + \frac{σ_{10} σ_{9}}{6 σ_{11}^{2}})}^{1 / 3}, \\ σ_{6} = \frac{σ_{10}}{3 σ_{8}} - \frac{σ_{9}^{2}}{9 σ_{8}^{2}}, σ_{7} = \frac{σ_{10}^{2}}{9 σ_{11}^{2}} - \frac{σ_{9}}{3 σ_{11}}, \\ σ_{8} = - M q_{k} x_{2} a^{2} x_{1}^{2} + M q_{k} x_{2} a^{2} x_{1} + 2 τ a^{2}, \\ σ_{9} = M a^{2} q_{k}^{2} x_{1} x_{2}^{2} + 6 τ a^{2} q_{k} x_{2} - 2 M a q_{k} x_{1}^{2} x_{2} + 2 M a q_{k} x_{1} x_{2}, \\ σ_{10} = 6 τ a^{2} q_{k}^{2} x_{2}^{2} + 2 M a q_{k}^{2} x_{1} x_{2}^{2} - M q_{k} x_{1}^{2} x_{2} + M q_{k} x_{1} x_{2}, \\ σ_{11} = 2 τ a^{2} q_{k}^{3} x_{2}^{3} + M x_{1} q_{k}^{2} x_{2}^{2} . \end{aligned}

(27)

The optimal data size and subscription fee can be obtained by the following formula:

\begin{array}{rcl} ({n_{k}}^{*}, p^{*}) = \arg_{(n_{k}, p) \in {n_{k 1}, n_{k 2}} \times {p_{1}, p_{2}}} max δ (p, n_{k}) . \end{array}

(28)

Similarly, the second-order partial derivative of

p

and

n_{k}

are all less than 0. It shows that if

p

remains unchanged, the solution

n_{k}

in formula (21) is globally optimal. Correspondingly, if

n_{k}

remains unchanged, the solution

p

in formula (21) is globally optimal.

5 Numerical Experiment

In this paper, we set specific parameters for experiments and give representative numerical results of the optimal pricing model. Firstly, the optimal solution of subscription fee

p

and quality level

q

is discussed. We set the parameters as follows: the number of consumers

M

in the data market is 500, the quality level

q

and subscription fee

p

are taken from the interval [0.1, 1], and the parameter

τ

is set to 0.5,

n

is 0.2 and

a

is 0.08. The parameters of utility function

U

use the experimental results in Section 3.3.1.

Figure 5(a) is a 3D surface diagram of

p

q

and profit, showing the trend of profit as

q

and

p

change with a fixed data size. Obviously, when taking the optimal value of data quality level and subscription fee, the data market can obtain the maximum profit. To be more intuitive, we also give the sections of Figure 5(a). Figure 6(a) shows that for data with the same quality, as the subscription fee

p

increases gradually, the profit of the data market increases first and then decreases. So the data market's profit reaches the maximum at a certain subscription fee. In addition, when the data quality is poor, we can see that the market profit decreases quickly with the increase of subscription fee. Since data with poor quality levels is less valuable to customers, they are unwilling to purchase even if the subscription fee is low. The abscissa in Figure 7(a) represents the quality level. When the data quality is better, the maximum profit of the data market also increases, indicating that data products with good quality are more popular.

Next, we discuss the optimal solution of subscription fee

p

and data size

n

. The quality level is taken as a fixed value of 0.6, the data size

n

are taken from the interval [0.1, 1], and other parameter are set to default values. The 3D surface diagram of profit function and the corresponding sections are shown in Figure 5(b), Figure 6(b) and Figure 7(b) respectively.

Figure 5(b) shows that the optimal solutions of

p

and

n

which maximize the profit of the data market are obtained. When

n

is the same, the subscription fee

p

is increased, the profit steadily increases, as shown in Figure 6(b). The profit shows a negative tendency after achieving its maximum value, because consumers' WTB will decrease. Similarly, when data size is small, the WTB of the customers is low. According to Figure 7(b), as the size of data increases, the maximum profit in the market also increases, because consumers consider that datasets with larger sizes of data as more valuable.

6 Conclusion

In this paper, we propose a joint utility function based on data size and data quality, and construct a profit maximization model of the data market, in which the WTS of data providers is considered. Firstly, we define the WTS function of the data provider and give a method to evaluate the data quality. Secondly, a joint utility function based on quality level

q

and data size

n

is proposed, and its parameters are fitted by machine learning algorithm. Furthermore, we define the WTB of data consumers based on the proposed utility function. Finally, we propose the data market pricing model and prove the existence of its optimal solution.

参考文献

原文顺序 | 文献年度倒序 | 文中引用次数倒序

1	Balazinska M, Howe B, Suciu D. Data markets in the cloud: An opportunity for the database community. Very Large Data Bases. VLDB EndowmentPUB4722, 2011. 本文引用 [1]

2	Li X, Yao J, Liu X, et al. A first look at information entropy-based data pricing. IEEE International Conference on Distributed Computing Systems. IEEE, 2017. 本文引用 [2]

3	Liang F, Yu W, An D, et al. A survey on big data market: Pricing, trading and protection. IEEE Access, 2018, 1. 本文引用 [1]

4	Nget R, Cao Y, Yoshikawa M. How to balance privacy and money through pricing mechanism in personal data market. arXiv preprint arXiv: 1705.02982, 2017. 本文引用 [1]

5	Shen B, Shen Y, Ji W. Profit optimization in service-oriented data market: A Stackelberg game approach. Future Generation Computer Systems, 2019, 95(Jun.): 17- 25. 本文引用 [1]

6	Niyato D, Alsheikh M A, Wang P, et al. Market model and optimal pricing scheme of big data and internet of things (IoT). IEEE, 2016. 本文引用 [2]

7	Alsheikh M A, Hoang D T, Niyato D, et al. Optimal pricing of internet of things: A machine learning approach. IEEE Journal on Selected Areas in Communications, 2020. 本文引用 [2]

8	Yu H, Zhang M. Data pricing strategy based on data quality. Computers & Industrial Engineering, 2017, 112, 1- 10. 本文引用 [1]

9	Li M, Feng H, Chen F, et al. Optimal versioning strategy for information products with behavior-based utility function of heterogeneous customers. Computers & Operations Research, 2013, 40(10): 2374- 2386. 本文引用 [1]

10	Shen Y, Guo B, Shen Y, et al. Personal big data pricing method based on differential privacy. Computers & Security, 2022,(113): 113. 本文引用 [1]

11	Yang J, Xing C. Personal data market optimization pricing model based on privacy level. Information (Switzerland), 2019, 10(4): 123. 本文引用 [1]

12	Borjigin W, Ota K, Dong M. In broker we trust: A double-auction approach for resource allocation in NFV markets. IEEE Transactions on Network and Service Management, 2018, PP(4): 1. 本文引用 [1]

13	Zhao Y, Xu K, Yan F, et al. Auction-based high timeliness data pricing under mobile and wireless networks. IEEE, 2020. 本文引用 [1]

14	Diamanti M, Charatsaris P, Tsiropoulou E E, et al. Incentive mechanism and resource allocation for edge-fog networks driven by multi-dimensional contract and game theories. IEEE Open Journal of the Communications Society, 2022, 3(February): 435- 452. 本文引用 [1]

15	Kazi M K, Hasan M M F. Optimal and secure peer-to-peer carbon emission trading: A game theory informed framework on blockchain. Computers and Chemical Engineering, 2024, 180, 108478. 本文引用 [2]

16	Lu S. Bank run model: The application of game theory. BCP Business & Management, 2023. 本文引用 [1]

17	Valerio V D, Cardellini V, Presti F L. Optimal pricing and service provisioning strategies in cloud systems: A stackelberg game approach. IEEE 6th International Conference on Cloud Computing (CLOUD'13). IEEE, 2013. 本文引用 [1]

18	Xiao Z, He D, Du J. A Stackelberg game pricing through balancing trilateral profits in big data market. IEEE Internet of Things Journal, 2020, 8(16): 12658- 12668. 本文引用 [1]

19	Xu C, Zhu K, Yi C, et al. Data pricing for blockchain-based car sharing: A stackelberg game approach. IEEE, 2020. 本文引用 [1]

20	Zeng X, Gao L, Jiang C, et al. A hybrid pricing mechanism for data sharing in P2P-based mobile crowdsensing. 201816th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt). IEEE, 2018. 本文引用 [1]

21	Miao X, Gao Y, Chen L, et al. Towards query pricing on incomplete data. IEEE Transactions on Knowledge and Data Engineering, 2020, 34(8): 4024- 4036. 本文引用 [1]

22	Yang J, Ban X, Xing C. Using greedy random adaptive procedure to solve the user selection problem in mobile crowdsourcing. Sensors (Basel, Switzerland), 2019, 19(14): 3158. 本文引用 [1]

23	Oh H, Park S, Lee G M, et al. Personal data trading scheme for data brokers in IoT data marketplaces. IEEE Access, 2019, 1. 本文引用 [1]

24	Benndorf V, Normann H T. The willingness to sell personal data. Scandinavian Journal of Economics, 2018, 120(4): 1260- 1278. 本文引用 [1]

25	Yang J, Zhao C, Xing C. Big data market optimization pricing model based on data quality. Complexity, 2019, 2019, 1- 10. 本文引用 [2]

26	Stahl F, Vossen G. Fair knapsack pricing for data marketplaces. Springer International Publishing, 2016. 本文引用 [1]

27	Ehrlinger L, Wolfram W. A survey of data quality measurement and monitoring tools. Frontiers in Big Data, 2022, 5, 850611. 本文引用 [1]

28	Vetro A, Canova L, Torchiano M, et al. Open data quality measurement framework: Definition and application to open government data. Government Information Quarterly, 2016, 33(2): 325- 337. 本文引用 [1]

29	Wertenbroch K, Skiera B. Research notes and communications measuring consumers' willingness to pay at the point of purchase. Journal of Marketing Research, 2002, 39(2): 228- 241. 本文引用 [1]

PDF(1601 KB)

1273

Accesses

Citation

Detail

段落导航

Abstract
Key words
引用本文
1 Introduction
2 Related Works
3 System Model
Figure 1 An IoT data market trading framework
Table 1 Major symbols
3.1 Willingness-to-Sell of Data Providers
Figure 2 WTS as a function of data reward based on the data size
3.2 Quality Hierarchy
Table 2 Major symbols
3.3 Utility of Data and Willingness-to-Buy of Data Consumers
3.3.1 Utility of Data
Figure 3 The main process of data mining
Figure 4 Estimation of utility function: (a) estimation of utility function under data size; (b) estimation of utility function under noise ratio; (c) the actual accuracy under different data size and noise ratio; (d) utility function values under different data size and noise ratio.
3.3.2 Willingness-to-Buy of Data Consumers
4 The Optimal Pricing of Data Market
4.1 Profit Function
4.2 Optimal Pricing
5 Numerical Experiment
Figure 5 3D surface diagram
Figure 6 The profile about p
Figure 7 The profile about q, n
6 Conclusion
参考文献

收稿日期	接受日期
2023-09-03	2024-10-29
发布日期
2024-12-20

选择文件类型/文献管理软件名称

选择包含的内容

Abstract

Key words

引用本文

1 Introduction

2 Related Works

3 System Model

Figure 1 An IoT data market trading framework

Table 1 Major symbols

3.1 Willingness-to-Sell of Data Providers

Figure 2 WTS as a function of data reward based on the data size

3.2 Quality Hierarchy

Table 2 Major symbols

3.3 Utility of Data and Willingness-to-Buy of Data Consumers

3.3.1 Utility of Data

Figure 3 The main process of data mining

Figure 4 Estimation of utility function: (a) estimation of utility function under data size; (b) estimation of utility function under noise ratio; (c) the actual accuracy under different data size and noise ratio; (d) utility function values under different data size and noise ratio.

3.3.2 Willingness-to-Buy of Data Consumers

4 The Optimal Pricing of Data Market

4.1 Profit Function

4.2 Optimal Pricing

5 Numerical Experiment

Figure 5 3D surface diagram

Figure 6 The profile about p

Figure 7 The profile about q, n

6 Conclusion

{{custom_sec.title}}

{{custom_sec.title}}

参考文献

{{custom_fnGroup.title_cn}}

脚注

Share

模态框（Modal）标题

选择文件类型/文献管理软件名称

选择包含的内容

Abstract

Key words

引用本文

1 Introduction

2 Related Works

3 System Model

Figure 1 An IoT data market trading framework

Table 1 Major symbols

3.1 Willingness-to-Sell of Data Providers

Figure 2 WTS as a function of data reward based on the data size

3.2 Quality Hierarchy

Table 2 Major symbols

3.3 Utility of Data and Willingness-to-Buy of Data Consumers

3.3.1 Utility of Data

Figure 3 The main process of data mining

Figure 4 Estimation of utility function: (a) estimation of utility function under data size; (b) estimation of utility function under noise ratio; (c) the actual accuracy under different data size and noise ratio; (d) utility function values under different data size and noise ratio.

3.3.2 Willingness-to-Buy of Data Consumers

4 The Optimal Pricing of Data Market

4.1 Profit Function

4.2 Optimal Pricing

5 Numerical Experiment

Figure 5 3D surface diagram

Figure 6 The profile about p

Figure 7 The profile about q, n

6 Conclusion

{{custom_sec.title}}

{{custom_sec.title}}

参考文献

{{custom_fnGroup.title_cn}}

脚注