oxigraph::store

Struct BulkLoader

pub struct BulkLoader { /* private fields */ }
A bulk loader that allows loading a lot of data quickly into the store.

The operations provided here are not atomic. If an operation fails partway through, only part of the data may have been written to the store. Results may be inconsistent if you delete data during the loading process.

Memory usage is configurable using with_max_memory_size_in_megabytes, and the number of threads using with_num_threads. By default, the loader uses 2 threads and targets around 2 GB of memory per thread (excluding system and RocksDB internal consumption). These targets apply per loaded file.

Usage example with loading a dataset:

use oxigraph::io::RdfFormat;
use oxigraph::model::*;
use oxigraph::store::Store;

let store = Store::new()?;

// quads file insertion
let file =
    b"<http://example.com> <http://example.com> <http://example.com> <http://example.com> .";
store
    .bulk_loader()
    .load_from_reader(RdfFormat::NQuads, file.as_ref())?;

// we inspect the store contents
let ex = NamedNodeRef::new("http://example.com")?;
assert!(store.contains(QuadRef::new(ex, ex, ex, ex))?);

Implementations§

impl BulkLoader

pub fn with_num_threads(self, num_threads: usize) -> Self

Sets the maximal number of threads to be used by the bulk loader per operation.

This number must be at least 2 (one for parsing and one for loading).

The default value is 2.

pub fn with_max_memory_size_in_megabytes(self, max_memory_size: usize) -> Self

Sets a rough idea of the maximal amount of memory to be used by this operation.

This number must be at least a few megabytes per thread.

Memory used by RocksDB and the system is not taken into account in this limit. Note that, depending on the system behavior, this amount might never be reached or might be exceeded (for example, if the data contains very long IRIs or literals).

By default, the target is 2 GB per thread used.

pub fn on_progress(self, callback: impl Fn(u64) + 'static) -> Self

Adds a callback that is invoked from time to time with the number of loaded triples.

pub fn on_parse_error(self, callback: impl Fn(RdfParseError) -> Result<(), RdfParseError> + 'static) -> Self

Adds a callback that receives every parse error and decides whether parsing should continue (by returning Ok) or fail (by returning Err).

By default the parsing fails.

pub fn load_from_reader(&self, parser: impl Into<RdfParser>, reader: impl Read) -> Result<(), LoaderError>

Loads a file using the bulk loader.

This function is optimized for large dataset loading speed. For small files, Store::load_from_reader might be more convenient.

This method is not atomic. If parsing fails partway through the file, only part of it may have been written to the store. Results may be inconsistent if you delete data during the loading process.

This method is optimized for speed. See the struct documentation for more details.

To get better speed on valid datasets, consider enabling the RdfParser::unchecked option to skip some validations.

Usage example:

use oxigraph::store::Store;
use oxigraph::io::{RdfParser, RdfFormat};
use oxigraph::model::*;

let store = Store::new()?;

// insert a dataset file (former load_dataset method)
let file = b"<http://example.com> <http://example.com> <http://example.com> <http://example.com/g> .";
store.bulk_loader().load_from_reader(
    RdfParser::from_format(RdfFormat::NQuads).unchecked(), // we inject a custom parser with options
    file.as_ref()
)?;

// insert a graph file (former load_graph method)
let file = b"<> <> <> .";
store.bulk_loader().load_from_reader(
    RdfParser::from_format(RdfFormat::Turtle)
        .with_base_iri("http://example.com")?
        .without_named_graphs() // No named graphs allowed in the input
        .with_default_graph(NamedNodeRef::new("http://example.com/g2")?), // we put the file default graph inside of a named graph
    file.as_ref()
)?;

// we inspect the store contents
let ex = NamedNodeRef::new("http://example.com")?;
assert!(store.contains(QuadRef::new(ex, ex, ex, NamedNodeRef::new("http://example.com/g")?))?);
assert!(store.contains(QuadRef::new(ex, ex, ex, NamedNodeRef::new("http://example.com/g2")?))?);

pub fn load_dataset(&self, reader: impl Read, format: impl Into<RdfFormat>, base_iri: Option<&str>) -> Result<(), LoaderError>

Deprecated since 0.4.0: use BulkLoader::load_from_reader instead

Loads a dataset file using the bulk loader.

This function is optimized for large dataset loading speed. For small files, Store::load_dataset might be more convenient.

This method is not atomic. If parsing fails partway through the file, only part of it may have been written to the store. Results may be inconsistent if you delete data during the loading process.

This method is optimized for speed. See the struct documentation for more details.

Usage example:

use oxigraph::io::RdfFormat;
use oxigraph::model::*;
use oxigraph::store::Store;

let store = Store::new()?;

// insertion
let file =
    b"<http://example.com> <http://example.com> <http://example.com> <http://example.com> .";
store
    .bulk_loader()
    .load_dataset(file.as_ref(), RdfFormat::NQuads, None)?;

// we inspect the store contents
let ex = NamedNodeRef::new("http://example.com")?;
assert!(store.contains(QuadRef::new(ex, ex, ex, ex))?);

pub fn load_graph(&self, reader: impl Read, format: impl Into<RdfFormat>, to_graph_name: impl Into<GraphName>, base_iri: Option<&str>) -> Result<(), LoaderError>

Deprecated since 0.4.0: use BulkLoader::load_from_reader instead

Loads a graph file using the bulk loader.

This function is optimized for large graph loading speed. For small files, Store::load_graph might be more convenient.

This method is not atomic. If parsing fails partway through the file, only part of it may have been written to the store. Results may be inconsistent if you delete data during the loading process.

This method is optimized for speed. See the struct documentation for more details.

Usage example:

use oxigraph::io::RdfFormat;
use oxigraph::model::*;
use oxigraph::store::Store;

let store = Store::new()?;

// insertion
let file = b"<http://example.com> <http://example.com> <http://example.com> .";
store.bulk_loader().load_graph(
    file.as_ref(),
    RdfFormat::NTriples,
    GraphName::DefaultGraph,
    None,
)?;

// we inspect the store contents
let ex = NamedNodeRef::new("http://example.com")?;
assert!(store.contains(QuadRef::new(ex, ex, ex, GraphNameRef::DefaultGraph))?);

pub fn load_quads(&self, quads: impl IntoIterator<Item = impl Into<Quad>>) -> Result<(), StorageError>

Adds a set of quads using the bulk loader.

This method is not atomic. If the process fails partway through, only part of the data may have been written to the store. Results may be inconsistent if you delete data during the loading process.

This method is optimized for speed. See the struct documentation for more details.

pub fn load_ok_quads<EI, EO: From<StorageError> + From<EI>>(&self, quads: impl IntoIterator<Item = Result<impl Into<Quad>, EI>>) -> Result<(), EO>

Adds a set of quads using the bulk loader, stopping partway through the process if an error is encountered.

This method is not atomic. If the process fails partway through, only part of the data may have been written to the store. Results may be inconsistent if you delete data during the loading process.

This method is optimized for speed. See the struct documentation for more details.
