pub struct BulkLoader { /* private fields */ }
A bulk loader for quickly loading large amounts of data into the store.
Memory usage is configurable with with_max_memory_size_in_megabytes, and the number of threads with with_num_threads.
By default, the memory consumption target (excluding the system and RocksDB internal consumption) is around 2GB per thread, with 2 threads.
These targets apply per loaded file.
Usage example with loading a dataset:
use oxigraph::io::RdfFormat;
use oxigraph::model::*;
use oxigraph::store::Store;
let store = Store::new()?;
// quads file insertion
let file =
    b"<http://example.com> <http://example.com> <http://example.com> <http://example.com> .";
store
    .bulk_loader()
    .load_from_reader(RdfFormat::NQuads, file.as_ref())?;
// we inspect the store contents
let ex = NamedNodeRef::new("http://example.com")?;
assert!(store.contains(QuadRef::new(ex, ex, ex, ex))?);
Implementations
impl BulkLoader
pub fn with_num_threads(self, num_threads: usize) -> Self
Sets the maximal number of threads to be used by the bulk loader per operation.
This number must be at least 2 (one for parsing and one for loading).
The default value is 2.
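The minimum of two threads reflects a split between a parsing stage and a loading stage. The sketch below illustrates that kind of split with standard-library threads and a channel only. It is not Oxigraph's implementation: `parse_and_load` and its line-based "parser" are hypothetical names made up for this illustration.

```rust
use std::collections::BTreeSet;
use std::sync::mpsc;
use std::thread;

/// Hypothetical two-stage pipeline: one thread "parses" lines and a second
/// thread "loads" them into a set. This is not Oxigraph code.
fn parse_and_load(input: &str) -> BTreeSet<String> {
    let (sender, receiver) = mpsc::channel::<String>();
    let owned_input = input.to_owned();

    // Stage 1: split the input into trimmed, non-empty lines.
    let parser = thread::spawn(move || {
        for line in owned_input.lines() {
            let line = line.trim();
            if !line.is_empty() {
                // `send` only fails if the loading thread has already stopped.
                sender.send(line.to_owned()).expect("loader stopped early");
            }
        }
        // `sender` is dropped here, which closes the channel.
    });

    // Stage 2: collect every item until the channel is closed.
    let loader = thread::spawn(move || receiver.into_iter().collect::<BTreeSet<String>>());

    parser.join().expect("parser thread panicked");
    loader.join().expect("loader thread panicked")
}
```

Calling `parse_and_load("a\nb\na\n")` yields a set with the two distinct lines; the loading stage can work on early lines while later ones are still being parsed.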
pub fn with_max_memory_size_in_megabytes(self, max_memory_size: usize) -> Self
Sets a rough target for the maximal amount of memory to be used by this operation.
This number must be at least a few megabytes per thread.
Memory used by RocksDB and the system is not taken into account in this limit. Note that, depending on the system behavior, this amount might never be reached or might be exceeded (for example if the data contains very long IRIs or literals).
By default, the target is 2GB per thread.
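As a back-of-the-envelope aid, the default target can be worked out from the thread count. This sketch assumes that 1GB means 1024MB and that an explicit limit is shared evenly between threads; neither assumption is confirmed by the text above, and both function names are hypothetical.

```rust
/// Default stated in the text: about 2GB per thread.
/// Treating 1GB as 1024MB is an assumption of this sketch.
const ASSUMED_DEFAULT_MB_PER_THREAD: usize = 2 * 1024;

/// Overall default memory target for a given thread count, in megabytes.
fn default_target_megabytes(num_threads: usize) -> usize {
    ASSUMED_DEFAULT_MB_PER_THREAD * num_threads
}

/// Even per-thread share of an explicit limit, in megabytes (assumed split).
/// Returns `None` for zero threads instead of dividing by zero.
fn per_thread_share_megabytes(max_memory_size: usize, num_threads: usize) -> Option<usize> {
    max_memory_size.checked_div(num_threads)
}
```

With the default of 2 threads this gives a target of 4096MB, and a 1024MB limit spread over 4 threads leaves 256MB per thread, comfortably above "a few megabytes".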
pub fn on_progress(self, callback: impl Fn(u64) + 'static) -> Self
Adds a callback that is called from time to time with the number of loaded triples.
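Because the callback is an `Fn(u64) + 'static`, it cannot borrow local variables or mutate captured state directly; shared ownership with interior mutability is one way to record what it reports. The sketch below shows that pattern against a hypothetical driver, `load_with_progress`, which stands in for the loader and reports a running count after each batch. Both the driver and the running-count behavior are assumptions of this example.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical driver: pretends to load `total` triples in batches of
/// `every` and reports the running count after each batch. The last batch
/// may be smaller. This is not the Oxigraph loader.
fn load_with_progress(total: u64, every: u64, callback: impl Fn(u64) + 'static) {
    let every = every.max(1); // guard against a zero batch size
    let mut loaded: u64 = 0;
    while loaded < total {
        loaded = loaded.saturating_add(every).min(total);
        callback(loaded);
    }
}

/// Collects every reported count through an `Rc<RefCell<_>>`, since an
/// `Fn` closure can only mutate captured state via interior mutability.
fn record_progress(total: u64, every: u64) -> Vec<u64> {
    let seen = Rc::new(RefCell::new(Vec::new()));
    let sink = Rc::clone(&seen);
    load_with_progress(total, every, move |count| sink.borrow_mut().push(count));
    let reports = seen.borrow().clone();
    reports
}
```

In a real program the closure would more typically print or log the count; the `Rc<RefCell<_>>` is only needed when the caller wants the values back.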
pub fn on_parse_error(
    self,
    callback: impl Fn(RdfParseError) -> Result<(), RdfParseError> + 'static,
) -> Self
Adds a callback that catches all parse errors and chooses whether the parsing should continue (by returning Ok) or fail (by returning Err).
By default the parsing fails.
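The `Fn(E) -> Result<(), E>` shape lets a callback act as an error policy: return `Ok(())` to skip the offending input, or hand the error back to abort. The sketch below demonstrates the shape with a hypothetical integer-per-line parser and `ParseIntError` standing in for `RdfParseError`; `parse_numbers` and `tolerate_up_to` are illustrative names, not Oxigraph APIs.

```rust
use std::cell::Cell;
use std::num::ParseIntError;

/// Hypothetical parser loop: reads one integer per line and passes every
/// failure to `on_error`. `Ok(())` skips the line, `Err` aborts parsing.
fn parse_numbers(
    input: &str,
    on_error: impl Fn(ParseIntError) -> Result<(), ParseIntError>,
) -> Result<Vec<i64>, ParseIntError> {
    let mut values = Vec::new();
    for line in input.lines() {
        match line.trim().parse::<i64>() {
            Ok(value) => values.push(value),
            Err(error) => on_error(error)?,
        }
    }
    Ok(values)
}

/// Policy that skips at most `limit` errors and fails on the next one.
/// A `Cell` keeps the counter mutable inside an `Fn` closure.
fn tolerate_up_to(limit: usize) -> impl Fn(ParseIntError) -> Result<(), ParseIntError> {
    let seen = Cell::new(0usize);
    move |error| {
        seen.set(seen.get() + 1);
        if seen.get() <= limit {
            Ok(())
        } else {
            Err(error)
        }
    }
}
```

Passing `|_| Ok(())` skips every bad line, `Err` reproduces the fail-fast default described above, and `tolerate_up_to(n)` sits in between.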
pub fn load_from_reader(
    &self,
    parser: impl Into<RdfParser>,
    reader: impl Read,
) -> Result<(), LoaderError>
Loads a file using the bulk loader.
This method is optimized for loading large datasets quickly. For small files, Store::load_from_reader might be more convenient.
See the struct documentation for more details.
To get better speed on valid datasets, consider enabling the RdfParser::unchecked option to skip some validations.
Usage example:
use oxigraph::store::Store;
use oxigraph::io::{RdfParser, RdfFormat};
use oxigraph::model::*;
let store = Store::new()?;
// insert a dataset file (former load_dataset method)
let file = b"<http://example.com> <http://example.com> <http://example.com> <http://example.com/g> .";
store.bulk_loader().load_from_reader(
    RdfParser::from_format(RdfFormat::NQuads).unchecked(), // we inject a custom parser with options
    file.as_ref()
)?;
// insert a graph file (former load_graph method)
let file = b"<> <> <> .";
store.bulk_loader().load_from_reader(
    RdfParser::from_format(RdfFormat::Turtle)
        .with_base_iri("http://example.com")?
        .without_named_graphs() // No named graphs allowed in the input
        .with_default_graph(NamedNodeRef::new("http://example.com/g2")?), // we put the file default graph inside of a named graph
    file.as_ref()
)?;
// we inspect the store contents
let ex = NamedNodeRef::new("http://example.com")?;
assert!(store.contains(QuadRef::new(ex, ex, ex, NamedNodeRef::new("http://example.com/g")?))?);
assert!(store.contains(QuadRef::new(ex, ex, ex, NamedNodeRef::new("http://example.com/g2")?))?);
pub fn load_dataset(
    &self,
    reader: impl Read,
    format: impl Into<RdfFormat>,
    base_iri: Option<&str>,
) -> Result<(), LoaderError>
Deprecated since 0.4.0: use BulkLoader.load_from_reader instead
Loads a dataset file using the bulk loader.
This method is optimized for loading large datasets quickly. For small files, Store::load_dataset might be more convenient.
See the struct documentation for more details.
Usage example:
use oxigraph::io::RdfFormat;
use oxigraph::model::*;
use oxigraph::store::Store;
let store = Store::new()?;
// insertion
let file =
    b"<http://example.com> <http://example.com> <http://example.com> <http://example.com> .";
store
    .bulk_loader()
    .load_dataset(file.as_ref(), RdfFormat::NQuads, None)?;
// we inspect the store contents
let ex = NamedNodeRef::new("http://example.com")?;
assert!(store.contains(QuadRef::new(ex, ex, ex, ex))?);
pub fn load_graph(
    &self,
    reader: impl Read,
    format: impl Into<RdfFormat>,
    to_graph_name: impl Into<GraphName>,
    base_iri: Option<&str>,
) -> Result<(), LoaderError>
Deprecated since 0.4.0: use BulkLoader.load_from_reader instead
Loads a graph file using the bulk loader.
This method is optimized for loading large graphs quickly. For small files, Store::load_graph might be more convenient.
See the struct documentation for more details.
Usage example:
use oxigraph::io::RdfFormat;
use oxigraph::model::*;
use oxigraph::store::Store;
let store = Store::new()?;
// insertion
let file = b"<http://example.com> <http://example.com> <http://example.com> .";
store.bulk_loader().load_graph(
    file.as_ref(),
    RdfFormat::NTriples,
    GraphName::DefaultGraph,
    None,
)?;
// we inspect the store contents
let ex = NamedNodeRef::new("http://example.com")?;
assert!(store.contains(QuadRef::new(ex, ex, ex, GraphNameRef::DefaultGraph))?);
pub fn load_quads(
    &self,
    quads: impl IntoIterator<Item = impl Into<Quad>>,
) -> Result<(), StorageError>
Adds a set of quads using the bulk loader.
This method is optimized for speed. See the struct documentation for more details.
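The `impl IntoIterator<Item = impl Into<Quad>>` parameter means the caller can pass any iterable whose items convert into a quad, not only a `Vec<Quad>`. The sketch below reproduces that signature shape with a hypothetical `DemoQuad` type and a `collect_quads` function; neither belongs to Oxigraph, and the tuple conversion is invented for this illustration.

```rust
/// Stand-in for a quad. This is not Oxigraph's `Quad` type.
#[derive(Debug, Clone, PartialEq, Eq)]
struct DemoQuad {
    subject: String,
    predicate: String,
    object: String,
    graph: String,
}

/// Hypothetical conversion so that plain string tuples can be passed too.
impl From<(&str, &str, &str, &str)> for DemoQuad {
    fn from((subject, predicate, object, graph): (&str, &str, &str, &str)) -> Self {
        Self {
            subject: subject.to_owned(),
            predicate: predicate.to_owned(),
            object: object.to_owned(),
            graph: graph.to_owned(),
        }
    }
}

/// Same parameter shape as the method above: any iterable of items that
/// convert into a quad. Here the quads are simply collected into a `Vec`.
fn collect_quads(quads: impl IntoIterator<Item = impl Into<DemoQuad>>) -> Vec<DemoQuad> {
    quads.into_iter().map(Into::into).collect()
}
```

Both `collect_quads(vec![("s", "p", "o", "g")])` and `collect_quads(existing_quads)` compile, since every type converts into itself.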
pub fn load_ok_quads<EI, EO: From<StorageError> + From<EI>>(
    &self,
    quads: impl IntoIterator<Item = Result<impl Into<Quad>, EI>>,
) -> Result<(), EO>
Adds a set of quads using the bulk loader, stopping in the middle of the process if an error is encountered.
This method is optimized for speed. See the struct documentation for more details.
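The two type parameters can be read as "input error" (`EI`) and "output error" (`EO`), where `EO` must be constructible from both the input error and the storage error. The sketch below mirrors that bound with hypothetical stand-in types (`DemoStore`, `DemoStorageError`, `DemoInputError`, `DemoLoadError`) to show how `?` performs both conversions and stops at the first failure. It says nothing about what the real store retains after a failed load.

```rust
/// Stand-ins for the error types in the signature. These are not Oxigraph types.
#[derive(Debug, PartialEq)]
struct DemoStorageError;

#[derive(Debug, PartialEq)]
struct DemoInputError(String);

/// Caller-side error playing the `EO` role: both error kinds convert into it.
#[derive(Debug, PartialEq)]
enum DemoLoadError {
    Storage(DemoStorageError),
    Input(DemoInputError),
}

impl From<DemoStorageError> for DemoLoadError {
    fn from(error: DemoStorageError) -> Self {
        Self::Storage(error)
    }
}

impl From<DemoInputError> for DemoLoadError {
    fn from(error: DemoInputError) -> Self {
        Self::Input(error)
    }
}

/// Hypothetical store that refuses inserts once `capacity` is reached.
struct DemoStore {
    items: Vec<u32>,
    capacity: usize,
}

impl DemoStore {
    fn new(capacity: usize) -> Self {
        Self { items: Vec::new(), capacity }
    }

    fn insert(&mut self, value: u32) -> Result<(), DemoStorageError> {
        if self.items.len() >= self.capacity {
            return Err(DemoStorageError);
        }
        self.items.push(value);
        Ok(())
    }
}

/// Same bound shape as the method above: stops at the first `Err` item or
/// the first storage failure, converting either into `EO` through `?`.
fn load_ok_items<EI, EO: From<DemoStorageError> + From<EI>>(
    items: impl IntoIterator<Item = Result<u32, EI>>,
    store: &mut DemoStore,
) -> Result<(), EO> {
    for item in items {
        let value = item?; // input error: EI converts into EO
        store.insert(value)?; // storage error: DemoStorageError converts into EO
    }
    Ok(())
}
```

This shape is convenient when the quads come from a fallible iterator, such as a parser, because the caller does not need to collect and check the results first.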