In-memory database design
I am trying to create an in-memory database using HashMap. I have a struct Person:
struct Person {
    id: i64,
    name: String,
}
impl Person {
    pub fn new(id: i64, name: &str) -> Person {
        Person {
            id,
            name: name.to_string(),
        }
    }
    pub fn set_name(&mut self, name: &str) {
        self.name = name.to_string();
    }
}
I have a struct Database:
use std::collections::HashMap;
use std::sync::Arc;
use std::sync::Mutex;
struct Database {
    db: Arc<Mutex<HashMap<i64, Person>>>,
}
impl Database {
    pub fn new() -> Database {
        Database {
            db: Arc::new(Mutex::new(HashMap::new())),
        }
    }
    pub fn add_person(&mut self, id: i64, person: Person) {
        self.db.lock().unwrap().insert(id, person);
    }
    pub fn get_person(&self, id: i64) -> Option<&mut Person> {
        self.db.lock().unwrap().get_mut(&id)
    }
}
And code to use this database:
let mut db = Database::new();
db.add_person(1, Person::new(1, "Bob"));
I want to change the person's name:
let mut person = db.get_person(1).unwrap();
person.set_name("Bill");
When compiling, I get a problem with Rust lifetimes:
error[E0597]: borrowed value does not live long enough
--> src/main.rs:39:9
|
39 | self.db.lock().unwrap().get_mut(&id)
| ^^^^^^^^^^^^^^^^^^^^^^^ temporary value does not live long enough
40 | }
| - temporary value only lives until here
|
note: borrowed value must be valid for the anonymous lifetime #1 defined on the method body at 38:5...
--> src/main.rs:38:5
|
38 | / pub fn get_person(&self, id: i64) -> Option<&mut Person> {
39 | | self.db.lock().unwrap().get_mut(&id)
40 | | }
| |_____^
How can I implement this method?
The compiler rejects your code because it violates the correctness model enforced by Rust and could cause crashes. For one, if get_person() were allowed to compile, one might call it from two threads and modify the underlying object without the protection of the mutex, causing data races on the String object inside. Worse, one could wreak havoc even in a single-threaded scenario by doing something like:
// Hypothetical: what a get_person() returning &mut Person would allow
let ref1 = db.get_person(1).unwrap();
let ref2 = db.get_person(1).unwrap();
// ERROR - two mutable references to the same object!
let mut vec: Vec<Person> = vec![];
vec.push(*ref1); // move referenced object to the vector
println!("{}", ref2.name); // CRASH - object already moved
To correct the code, you need to adjust your design to satisfy the following constraints:
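- a reference is not allowed to outlive the object it refers to;
- while a mutable reference is alive, no other reference (mutable or immutable) to the same object may exist.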
The add_person method already complies with both rules because it takes ownership of the object you pass it, moving it into the database.
What if we modified get_person() to return an immutable reference?
pub fn get_person(&self, id: i64) -> Option<&Person> {
    self.db.lock().unwrap().get(&id)
}
Even this seemingly innocent version still doesn't compile! That is because it violates the first rule. The reference returned by get() borrows from the MutexGuard temporary created by lock(), and that guard is dropped, unlocking the mutex, as soon as get_person() returns, so the reference would outlive the value it borrows from. But even if it were possible to somehow extend the lifetime of the reference, retaining it after unlocking the mutex would allow data races. There is simply no way to implement a get_person() that returns a plain reference and still retain thread safety.
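A minimal sketch of the same point: a borrow taken through the guard is fine as long as it ends before the guard is dropped. The helpers sample_db and name_len, and the use of a bare Mutex<HashMap> without the Arc and Database wrapper, are illustrative assumptions rather than part of the original code.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

struct Person {
    id: i64,
    name: String,
}

// Hypothetical helper: a one-entry map behind a bare Mutex (no Arc, no Database wrapper).
fn sample_db() -> Mutex<HashMap<i64, Person>> {
    let mut map = HashMap::new();
    map.insert(1, Person { id: 1, name: "Bob".to_string() });
    Mutex::new(map)
}

// Hypothetical helper: reads through a borrow that never outlives the guard.
fn name_len(db: &Mutex<HashMap<i64, Person>>, id: i64) -> Option<usize> {
    // `guard` is a local variable, so the mutex stays locked until it is dropped.
    let guard = db.lock().unwrap();
    // `person` borrows from `guard`; it is only valid while `guard` is alive.
    let person: &Person = guard.get(&id)?;
    // Copy out an owned value. Returning `person` itself would not compile,
    // because `guard` is dropped (and the mutex unlocked) when this function returns.
    Some(person.name.len())
}

fn main() {
    let db = sample_db();
    println!("{:?}", name_len(&db, 1)); // prints Some(3)
    println!("{:?}", name_len(&db, 2)); // prints None
}
```

Everything that needs the reference happens inside the function while the guard is alive; only an owned usize leaves it.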
A thread-safe implementation of a read can opt to return a copy of the data. Person can implement the clone() method and get_person() can invoke it like this:
#[derive(Clone)]
struct Person {
    id: i64,
    name: String,
}
// ...
pub fn get_person(&self, id: i64) -> Option<Person> {
    self.db.lock().unwrap().get(&id).cloned()
}
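To see that the returned value really is an independent copy, here is a self-contained sketch that assembles the Database from the question with the cloning get_person(). Edits to the copy do not reach the stored record; the sample names are illustrative.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

#[derive(Clone)]
struct Person {
    id: i64,
    name: String,
}

struct Database {
    db: Arc<Mutex<HashMap<i64, Person>>>,
}

impl Database {
    fn new() -> Database {
        Database { db: Arc::new(Mutex::new(HashMap::new())) }
    }

    fn add_person(&mut self, id: i64, person: Person) {
        self.db.lock().unwrap().insert(id, person);
    }

    // Returns an owned copy; the temporary guard is dropped before the caller gets it.
    fn get_person(&self, id: i64) -> Option<Person> {
        self.db.lock().unwrap().get(&id).cloned()
    }
}

fn main() {
    let mut db = Database::new();
    db.add_person(1, Person { id: 1, name: "Bob".to_string() });
    let mut copy = db.get_person(1).unwrap();
    copy.name = "Bill".to_string(); // edits only the local copy
    let stored = db.get_person(1).unwrap();
    println!("copy: {}, stored: {}", copy.name, stored.name); // prints copy: Bill, stored: Bob
}
```

The trade-off is one String allocation per read, in exchange for never holding the lock after get_person() returns.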
This kind of change won't work for the other use case of get_person(), where the method is used for the express purpose of obtaining a mutable reference to change the person in the database. Obtaining a mutable reference to a shared resource violates the second rule and could lead to crashes as shown above. There are several ways to make it safe. One is by providing a proxy in the database for setting each Person field:
pub fn set_person_name(&self, id: i64, new_name: String) -> bool {
    // get_mut() already yields &mut Person, so the binding itself need not be `mut`
    match self.db.lock().unwrap().get_mut(&id) {
        Some(person) => {
            person.name = new_name;
            true
        }
        None => false,
    }
}
As the number of fields on Person grows, this would quickly get tedious. It could also get slow, as a separate mutex lock would have to be acquired for each access.
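For reference, the setter-proxy approach can be exercised end to end with a small sketch. The person_name accessor is a hypothetical helper added only so the example can observe the stored value; it is not part of the original design.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

struct Person {
    id: i64,
    name: String,
}

struct Database {
    db: Arc<Mutex<HashMap<i64, Person>>>,
}

impl Database {
    fn new() -> Database {
        Database { db: Arc::new(Mutex::new(HashMap::new())) }
    }

    fn add_person(&mut self, id: i64, person: Person) {
        self.db.lock().unwrap().insert(id, person);
    }

    // Setter proxy: lock, update one field, unlock. Returns false if `id` is absent.
    fn set_person_name(&self, id: i64, new_name: String) -> bool {
        match self.db.lock().unwrap().get_mut(&id) {
            Some(person) => {
                person.name = new_name;
                true
            }
            None => false,
        }
    }

    // Hypothetical helper for this sketch: copies out the stored name.
    fn person_name(&self, id: i64) -> Option<String> {
        self.db.lock().unwrap().get(&id).map(|p| p.name.clone())
    }
}

fn main() {
    let mut db = Database::new();
    db.add_person(1, Person { id: 1, name: "Bob".to_string() });
    println!("{}", db.set_person_name(1, "Bill".to_string())); // prints true
    println!("{}", db.set_person_name(2, "Nobody".to_string())); // prints false
    println!("{:?}", db.person_name(1)); // prints Some("Bill")
}
```

Each call to set_person_name takes and releases the lock separately, which is where the per-field cost mentioned above comes from.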
There is fortunately a better way to implement modification of the entry. Remember that using a mutable reference violates the rules unless Rust can prove that the reference won't "escape" the block where it is being used. This can be ensured by inverting control: instead of a get_person() that returns the mutable reference, we can introduce a modify_person() that passes the mutable reference to a callable, which can do whatever it likes with it while the mutex is held. For example:
pub fn modify_person<F>(&self, id: i64, f: F)
where
    F: FnOnce(Option<&mut Person>),
{
    f(self.db.lock().unwrap().get_mut(&id))
}
Usage would look like this:
fn main() {
    let mut db = Database::new();
    db.add_person(1, Person::new(1, "Bob"));
    assert!(db.get_person(1).unwrap().name == "Bob");
    db.modify_person(1, |person| {
        person.unwrap().set_name("Bill");
    });
}
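Because modify_person() takes &self and runs the closure while the guard is alive, it can also be called from several threads at once. The sketch below uses std::thread::scope (available since Rust 1.63) so the threads can borrow db directly. The append_concurrently helper and the '!'-appending closure are illustrative assumptions, not part of the original answer.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

struct Person {
    id: i64,
    name: String,
}

impl Person {
    fn new(id: i64, name: &str) -> Person {
        Person { id, name: name.to_string() }
    }
}

struct Database {
    db: Arc<Mutex<HashMap<i64, Person>>>,
}

impl Database {
    fn new() -> Database {
        Database { db: Arc::new(Mutex::new(HashMap::new())) }
    }

    fn add_person(&mut self, id: i64, person: Person) {
        self.db.lock().unwrap().insert(id, person);
    }

    // The closure runs while the guard is alive, so it has exclusive access.
    fn modify_person<F>(&self, id: i64, f: F)
    where
        F: FnOnce(Option<&mut Person>),
    {
        f(self.db.lock().unwrap().get_mut(&id))
    }
}

// Hypothetical demo: `n` threads each append '!' to person 1's name via modify_person.
fn append_concurrently(n: usize) -> String {
    let mut db = Database::new();
    db.add_person(1, Person::new(1, "Bob"));
    // Scoped threads may borrow `db`; all are joined before scope() returns.
    thread::scope(|s| {
        for _ in 0..n {
            s.spawn(|| {
                db.modify_person(1, |person| {
                    if let Some(p) = person {
                        p.name.push('!');
                    }
                });
            });
        }
    });
    // Read the result back through the same closure-based API.
    let mut name = String::new();
    db.modify_person(1, |person| {
        if let Some(p) = person {
            name = p.name.clone();
        }
    });
    name
}

fn main() {
    println!("{}", append_concurrently(4)); // prints Bob!!!!
}
```

No update is lost, because each closure sees the name only while holding the lock, and no &mut Person can leak out of modify_person() for another thread to race on.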
Finally, if you're worried about the performance of get_person() cloning Person for the sole reason of inspecting it, it is trivial to create an immutable version of modify_person that serves as a non-copying alternative to get_person():
pub fn read_person<F, R>(&self, id: i64, f: F) -> R
where
    F: FnOnce(Option<&Person>) -> R,
{
    f(self.db.lock().unwrap().get(&id))
}
Besides taking a shared reference to Person, read_person also allows the closure to return a value if it chooses, typically something it picks up from the object it receives. Its usage is similar to that of modify_person, with the added possibility of returning a value:
// if Person had an "age" field, we could obtain it like this:
let person_age = db.read_person(1, |person| person.unwrap().age);
// equivalent to the copying definition of db.get_person():
let person_copy = db.read_person(1, |person| person.cloned());
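A runnable version of those calls, assuming Person gains the hypothetical age field mentioned in the comment above (the field, its u32 type, and the sample_db helper are assumptions for illustration). This sketch uses Option::map instead of unwrap(), so a missing id yields None rather than a panic.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

#[derive(Clone)]
struct Person {
    id: i64,
    name: String,
    age: u32, // hypothetical field, as in the "if Person had an age field" comment
}

struct Database {
    db: Arc<Mutex<HashMap<i64, Person>>>,
}

impl Database {
    fn new() -> Database {
        Database { db: Arc::new(Mutex::new(HashMap::new())) }
    }

    fn add_person(&mut self, id: i64, person: Person) {
        self.db.lock().unwrap().insert(id, person);
    }

    // The closure gets a shared reference while the lock is held and may return a value.
    fn read_person<F, R>(&self, id: i64, f: F) -> R
    where
        F: FnOnce(Option<&Person>) -> R,
    {
        f(self.db.lock().unwrap().get(&id))
    }
}

// Hypothetical helper that builds a one-person database for the example.
fn sample_db() -> Database {
    let mut db = Database::new();
    db.add_person(1, Person { id: 1, name: "Bob".to_string(), age: 42 });
    db
}

fn main() {
    let db = sample_db();
    // Copy out a single field without cloning the whole Person.
    let age = db.read_person(1, |person| person.map(|p| p.age));
    // Borrow the name just long enough to measure it; nothing is cloned.
    let name_len = db.read_person(1, |person| person.map(|p| p.name.len()));
    // Clone the whole record, equivalent to the copying get_person().
    let copy = db.read_person(1, |person| person.cloned());
    let name = copy.map(|p| p.name).unwrap_or_default();
    println!("{:?} {:?} {}", age, name_len, name); // prints Some(42) Some(3) Bob
}
```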