From Event Streams to NeuroSymbolic Programming
A Research Journey Across Semantic Web, ISO 25964, and Modern NeuroSymbolic AI
Over the course of this research exploration, we investigated a surprisingly coherent trajectory across several areas of computer science:
Event sourcing / Kappa architectures
Semantic Web and stream reasoning
ISO 25964-2 vocabulary interoperability
NeuroSymbolic AI
Modern neurosymbolic programming frameworks like Scallop
The original goal was simple: identify the most serious academic projects that connect these domains.
The result was a much deeper picture of how several research traditions have evolved and partially converged.
This article reconstructs the entire exploration, from early Semantic Web infrastructure projects to the most active NeuroSymbolic AI frameworks today.
1. The Original Question
The starting point of the investigation was:
What are the top-tier academic projects connecting event sourcing (event-driven/Kappa architectures) with Semantic Web technologies and ISO 25964-2 thesaurus interoperability?
The short answer was immediately surprising:
There is no single canonical project combining all three.
Instead, the research landscape splits into two major clusters:
Cluster A — Semantic Event & Stream Processing
RDF stream processing
semantic complex event processing
real-time reasoning over evolving knowledge graphs
Cluster B — Vocabulary Interoperability
SKOS and ISO 25964
thesaurus mapping
controlled vocabulary infrastructure
The overlap between these clusters remains thin, but several projects approach it from different directions.
2. The Foundational Semantic Web Projects
LarKC — The Large Knowledge Collider
One of the most ambitious projects in this space was LarKC, funded under the EU FP7 program between 2008–2011.
LarKC attempted to solve a fundamental challenge:
How do we perform reasoning over web-scale knowledge when the data is noisy, incomplete, heterogeneous, and constantly changing?
Instead of building a single massive reasoner, LarKC proposed a pipeline architecture:
retrieve → abstract → select → reason → decide
This approach introduced several ideas that later became mainstream:
incomplete reasoning
workflow-based reasoning pipelines
pluggable reasoning components
distributed semantic computation
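The retrieve → abstract → select → reason → decide cycle can be sketched as a chain of interchangeable plugins that pass one state object along. This is a minimal illustration in plain Python, not LarKC's actual plugin API. The `State` class, the stage functions, and the toy triples are all hypothetical. The `select` stage deliberately discards data, which mirrors the idea of incomplete reasoning: the reasoner only sees what selection lets through.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Triple = tuple[str, str, str]

@dataclass
class State:
    """Shared state passed from plugin to plugin (hypothetical, for illustration)."""
    subject: str
    target_type: str
    triples: list[Triple] = field(default_factory=list)
    inferred: set[Triple] = field(default_factory=set)
    answer: Optional[bool] = None

Plugin = Callable[[State], State]

def make_retriever(source: list[Triple]) -> Plugin:
    def retrieve(s: State) -> State:
        # Fetch raw, possibly noisy triples from some data source.
        s.triples = list(source)
        return s
    return retrieve

def abstract(s: State) -> State:
    # Normalize heterogeneous predicate names to one internal vocabulary.
    aliases = {"rdf:type": "type", "rdfs:subClassOf": "subClassOf"}
    s.triples = [(a, aliases.get(p, p), b) for a, p, b in s.triples]
    return s

def select(s: State) -> State:
    # Keep only the subject's type assertions and the class hierarchy;
    # everything else is dropped, so later reasoning is knowingly incomplete.
    s.triples = [t for t in s.triples
                 if (t[0] == s.subject and t[1] == "type") or t[1] == "subClassOf"]
    return s

def reason(s: State) -> State:
    # Propagate the subject's types up subClassOf edges until nothing changes.
    types = {o for _, p, o in s.triples if p == "type"}
    parents: dict[str, set[str]] = {}
    for a, p, b in s.triples:
        if p == "subClassOf":
            parents.setdefault(a, set()).add(b)
    frontier = list(types)
    while frontier:
        for sup in parents.get(frontier.pop(), set()):
            if sup not in types:
                types.add(sup)
                frontier.append(sup)
    s.inferred = {(s.subject, "type", t) for t in types}
    return s

def decide(s: State) -> State:
    # Answer the query from whatever the reasoner managed to derive.
    s.answer = (s.subject, "type", s.target_type) in s.inferred
    return s

def run_pipeline(plugins: list[Plugin], state: State) -> State:
    for plugin in plugins:
        state = plugin(state)
    return state

source = [
    ("socrates", "rdf:type", "Human"),
    ("Human", "rdfs:subClassOf", "Mortal"),
    ("plato", "rdf:type", "Human"),
]
result = run_pipeline([make_retriever(source), abstract, select, reason, decide],
                      State("socrates", "Mortal"))
print(result.answer)  # True
```

Because every stage has the same signature, a different selector or reasoner can be swapped in without touching the rest of the chain, which is the kind of composability the pipeline design aimed for.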
LarKC produced a service-oriented platform allowing reasoning plugins to be composed into distributed workflows.
It also inspired research in:
stream reasoning
semantic middleware
hybrid AI architectures
However, LarKC itself did not survive as a production platform.
Its legacy instead lived on in several subfields.
3. The Rise of Semantic Stream Reasoning
The most influential descendant of LarKC was the stream reasoning research field.
This field attempts to answer a central question:
How can machines reason over continuous streams of data rather than static knowledge bases?
Important systems included:
C-SPARQL
Continuous SPARQL queries over RDF streams.
ETALIS
A semantic complex event processing engine.
CityPulse
A large EU project for real-time semantic processing of smart-city data streams.
These systems introduced key ideas:
RDF streams
continuous queries
semantic event detection
real-time reasoning over knowledge graphs
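RDF streams, continuous queries, and windowed evaluation can be illustrated with a small sketch in the spirit of C-SPARQL's `RANGE`/`STEP` windows. It is plain Python, not the C-SPARQL or ETALIS API. The `TimedTriple` class, the `hasSpeed`/`hasStatus` predicates, and the smart-city data are hypothetical. The query is re-evaluated every `step_s` seconds over the triples seen in the last `range_s` seconds, and each evaluation may emit new derived triples (a simple form of semantic event detection).

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass(frozen=True)
class TimedTriple:
    """An RDF-style triple annotated with an arrival timestamp in seconds."""
    s: str
    p: str
    o: float
    ts: int

def sliding_windows(stream: Iterable[TimedTriple], range_s: int,
                    step_s: int) -> Iterator[tuple[int, list[TimedTriple]]]:
    """Yield (t, contents) for each window (t - range_s, t], evaluated every step_s.
    Assumes the stream arrives in timestamp order."""
    buffer: deque = deque()
    t = step_s
    for item in stream:
        while item.ts > t:  # item lies past the current evaluation point
            while buffer and buffer[0].ts <= t - range_s:
                buffer.popleft()  # evict triples that fell out of the range
            yield t, list(buffer)
            t += step_s
        buffer.append(item)
    while buffer and buffer[0].ts <= t - range_s:
        buffer.popleft()
    yield t, list(buffer)  # final window once the input is exhausted

def congestion_events(stream, range_s=60, step_s=30, threshold=20.0):
    """Continuous query: emit a derived triple for every street whose average
    speed inside the current window drops below the threshold."""
    for t, contents in sliding_windows(stream, range_s, step_s):
        speeds: dict = defaultdict(list)
        for tr in contents:
            if tr.p == "hasSpeed":
                speeds[tr.s].append(tr.o)
        for street, values in sorted(speeds.items()):
            if sum(values) / len(values) < threshold:
                yield t, (street, "hasStatus", "Congested")

stream = [
    TimedTriple("mainSt", "hasSpeed", 35.0, 10),
    TimedTriple("mainSt", "hasSpeed", 12.0, 40),
    TimedTriple("mainSt", "hasSpeed", 10.0, 50),
    TimedTriple("parkAve", "hasSpeed", 40.0, 55),
    TimedTriple("mainSt", "hasSpeed", 45.0, 100),
]
for t, triple in congestion_events(stream):
    print(t, triple)
```

Overlapping windows mean the same slow readings can trigger the event at consecutive evaluation times, as happens here for `mainSt`; real engines offer different reporting policies for exactly this reason.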
Although influential academically, many of these systems eventually slowed down or stopped active development.
4. The ISO 25964 Vocabulary Ecosystem
Parallel to the stream reasoning world, another ecosystem evolved around controlled vocabularies.
ISO 25964-2 focuses on interoperability between thesauri and other knowledge organization systems.
Several major infrastructure projects emerged.
Skosmos
A widely used vocabulary publication platform.
Features include:
browsing SKOS vocabularies
SPARQL integration
multilingual thesaurus support
Skosmos remains one of the most actively maintained projects in this ecosystem today.
JSKOS Server
A backend infrastructure for vocabulary storage and mapping.
Key capabilities include:
concept management
mapping between vocabularies
concordances and annotations
streaming change notifications via WebSockets
This makes JSKOS Server one of the rare systems combining vocabulary infrastructure with event-style updates.
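Combining mapping storage with change notifications is essentially an event-sourced design: a client that receives every change can rebuild the current concordance, or any past state, by replaying the changes in order. The sketch below assumes a simplified, hypothetical event format (`ChangeEvent` with `create`/`update`/`delete` actions). It is not JSKOS Server's actual WebSocket message schema, and the vocabulary identifiers are invented.

```python
from dataclasses import dataclass
from typing import Literal, Optional

@dataclass(frozen=True)
class ChangeEvent:
    seq: int                                     # position in the change log
    action: Literal["create", "update", "delete"]
    mapping_id: str
    mapping: Optional[dict] = None               # payload for create/update only

def apply(state: dict[str, dict], event: ChangeEvent) -> dict[str, dict]:
    """Return a new state with one change applied (the old state is untouched)."""
    new_state = dict(state)
    if event.action in ("create", "update"):
        new_state[event.mapping_id] = event.mapping
    else:  # "delete"
        new_state.pop(event.mapping_id, None)
    return new_state

def replay(log: list[ChangeEvent]) -> dict[str, dict]:
    """Rebuild the concordance by folding every event over an empty state."""
    state: dict[str, dict] = {}
    for event in sorted(log, key=lambda e: e.seq):
        state = apply(state, event)
    return state

log = [
    ChangeEvent(1, "create", "m1", {"from": "vocabA:cooking", "to": "vocabB:cookery", "type": "closeMatch"}),
    ChangeEvent(2, "create", "m2", {"from": "vocabA:baking", "to": "vocabB:cookery", "type": "broadMatch"}),
    ChangeEvent(3, "update", "m1", {"from": "vocabA:cooking", "to": "vocabB:cookery", "type": "exactMatch"}),
    ChangeEvent(4, "delete", "m2"),
]
print(replay(log))       # current state: only m1, now an exactMatch
print(replay(log[:2]))   # state as of seq 2: both mappings, m1 still a closeMatch
```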
Cocoda
A collaborative tool for creating mappings between knowledge organization systems.
It enables the practical work required by ISO 25964-2:
vocabulary alignment
mapping creation
cross-domain terminology integration
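ISO 25964-2 distinguishes equivalence, hierarchical, and associative mappings, which correspond roughly to the SKOS mapping properties `skos:exactMatch`/`skos:closeMatch`, `skos:broadMatch`/`skos:narrowMatch`, and `skos:relatedMatch`. A minimal sketch of using such mappings to translate a concept from one vocabulary into another follows. The vocabularies and the preference order (exact, then close, then broader, then related) are illustrative assumptions, not rules taken from the standard or from Cocoda.

```python
from typing import Optional

# Rough correspondence between SKOS mapping properties and ISO 25964-2 mapping types.
ISO_MAPPING_TYPE = {
    "skos:exactMatch": "equivalence",
    "skos:closeMatch": "equivalence",     # inexact equivalence
    "skos:broadMatch": "hierarchical",
    "skos:narrowMatch": "hierarchical",
    "skos:relatedMatch": "associative",
}

# Hypothetical preference order for re-indexing; narrowMatch is skipped because
# mapping to a narrower target would claim more than the source concept says.
PREFERENCE = ["skos:exactMatch", "skos:closeMatch", "skos:broadMatch", "skos:relatedMatch"]

Mapping = tuple[str, str, str]  # (source concept, SKOS property, target concept)

def translate(concept: str, mappings: list[Mapping]) -> Optional[tuple[str, str]]:
    """Pick the best target for `concept`, returning (target, ISO 25964-2 type)."""
    candidates = [(PREFERENCE.index(rel), target, rel)
                  for src, rel, target in mappings
                  if src == concept and rel in PREFERENCE]
    if not candidates:
        return None
    _, target, rel = min(candidates)
    return target, ISO_MAPPING_TYPE[rel]

mappings = [
    ("vocabA:cooking", "skos:exactMatch", "vocabB:cookery"),
    ("vocabA:sourdough", "skos:relatedMatch", "vocabB:fermentation"),
    ("vocabA:sourdough", "skos:broadMatch", "vocabB:bread"),
]
print(translate("vocabA:sourdough", mappings))  # ('vocabB:bread', 'hierarchical')
```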
Together, Skosmos, JSKOS Server, and Cocoda form the most active modern ecosystem for semantic vocabulary interoperability.
5. The Current Semantic Streaming Landscape
Most early stream-reasoning systems are no longer the center of innovation.
The modern ecosystem includes newer projects such as:
RDF-Connect
A framework for RDF-based streaming data pipelines.
OntopStream
A system enabling streaming virtual knowledge graphs using Apache Flink.
Kolibrie
A modern Rust-based RDF stream reasoning engine.
I-DLV-sr
A logic-based system for reasoning over streaming data using Apache Flink.
Among these, Kolibrie is particularly interesting.
6. Kolibrie — A Modern Stream Reasoning Engine
Kolibrie is developed by the Stream Intelligence Lab at KU Leuven.
It focuses on:
continuous SPARQL queries
reasoning over timestamped triples
sliding and tumbling windows
integration with machine learning
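Tumbling and sliding windows differ in how many windows a single timestamped triple belongs to: exactly one for a tumbling window, possibly several for a sliding one. The sketch below shows generic window assignment in plain Python. It does not reflect Kolibrie's own API or exact window semantics, and the function names and sensor triples are hypothetical. Windows are half-open intervals `[start, start + size)` aligned to multiples of the slide, and negative starts are ignored.

```python
from collections import defaultdict
from typing import Optional

Triple = tuple[str, str, str]

def tumbling_start(ts: int, size: int) -> int:
    """Start of the only tumbling window [start, start + size) containing ts."""
    return ts - ts % size

def sliding_starts(ts: int, size: int, slide: int) -> list[int]:
    """Starts of every sliding window [start, start + size) containing ts,
    with a new window opening every `slide` time units."""
    latest = ts - ts % slide
    return sorted(s for s in range(latest, ts - size, -slide) if s >= 0)

def assign(stream: list[tuple[Triple, int]], size: int,
           slide: Optional[int] = None) -> dict[int, list[Triple]]:
    """Group timestamped triples by window start (tumbling if slide is None)."""
    windows: dict[int, list[Triple]] = defaultdict(list)
    for triple, ts in stream:
        starts = [tumbling_start(ts, size)] if slide is None else sliding_starts(ts, size, slide)
        for start in starts:
            windows[start].append(triple)
    return dict(sorted(windows.items()))

stream = [
    (("sensor1", "hasTemp", "21"), 5),
    (("sensor1", "hasTemp", "23"), 35),
    (("sensor1", "hasTemp", "30"), 75),
]
print(assign(stream, size=60))            # two tumbling windows: starts 0 and 60
print(assign(stream, size=60, slide=30))  # three overlapping windows: starts 0, 30, 60
```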
Kolibrie is currently used primarily for:
benchmark evaluation
research experiments
neurosymbolic stream reasoning research
It does not yet power large production systems but is actively used in academic research.
7. The Shift Toward NeuroSymbolic AI
While exploring these systems, the investigation naturally expanded into NeuroSymbolic AI.
NeuroSymbolic systems aim to combine:
neural learning
symbolic reasoning
probabilistic logic
structured knowledge representations
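The division of labor between neural learning and symbolic reasoning shows up clearly in the digit-addition task popularized by DeepProbLog: a neural classifier outputs a probability distribution over digits for each image, and a rule along the lines of `addition(A, B, S) :- digit(A, X), digit(B, Y), S is X + Y` combines them. In the minimal sketch below, hand-written distributions stand in for softmax outputs and the digits are restricted to 0 to 2 for brevity; real frameworks compile the rules and backpropagate through the result automatically.

```python
import math
from collections import defaultdict

def sum_distribution(p1: list[float], p2: list[float]) -> dict[int, float]:
    """P(S = s) for the rule  S = X + Y,  with X ~ p1 and Y ~ p2 independent.
    Each (x, y) pair is one proof of its sum; proofs of the same sum are
    mutually exclusive, so their probabilities simply add up."""
    dist: dict[int, float] = defaultdict(float)
    for x, px in enumerate(p1):
        for y, py in enumerate(p2):
            dist[x + y] += px * py
    return dict(dist)

def nll(p1: list[float], p2: list[float], observed_sum: int) -> float:
    """Training loss when only the sum is labeled, never the individual digits."""
    return -math.log(sum_distribution(p1, p2)[observed_sum])

# Stand-ins for the classifier's softmax outputs on two images (digits 0..2).
p_img1 = [0.1, 0.8, 0.1]
p_img2 = [0.7, 0.2, 0.1]
print(sum_distribution(p_img1, p_img2))
print(nll(p_img1, p_img2, observed_sum=1))
```

Minimizing `nll` shifts probability mass toward digit pairs that explain the observed sum, which is how supervision on the symbolic output reaches the neural classifier.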
Several major frameworks dominate the academic landscape.
8. The Most Active NeuroSymbolic AI GitHub Projects
The most active academic projects today include:
Scallop
A neurosymbolic programming language based on Datalog.
DeepProbLog
A probabilistic logic framework integrating neural networks.
PyNeuraLogic / NeuraLogic
A differentiable logic programming framework.
DeepSeaProbLog
An extension of DeepProbLog to discrete-continuous domains, supporting both discrete and continuous random variables.
DeepStochLog
A framework combining grammars, probabilities, and neural networks.
PEIRCE
An emerging framework combining LLMs with symbolic reasoning.
These systems represent different approaches to the same core problem:
How can symbolic reasoning and neural learning be integrated into a single framework?
Among them, Scallop stands out as one of the most ambitious projects.
9. The History of Scallop
Scallop evolved through several distinct phases.
Phase 1 — Prehistory
Scallop builds on earlier systems like:
TensorLog
DeepProbLog
probabilistic deductive databases
These systems struggled to scale because exact probabilistic inference over all proofs quickly becomes intractable.
Phase 2 — Scallop v1 (2021)
The first Scallop system appeared in a NeurIPS 2021 paper.
Its main innovation was:
scalable differentiable reasoning using Datalog
Instead of evaluating all proofs, Scallop computes the top-k proofs for queries.
Keeping only k proofs per derived fact bounds the work done for each derivation, which dramatically improved scalability.
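The top-k idea can be sketched in plain Python: every derived fact carries at most k proofs (each proof is a set of base facts), conjunction and disjunction truncate back to k, and the fact's probability is computed from the kept proofs only. This is a toy reimplementation for intuition, not Scallop's code or API. The graph, edge probabilities, and helper names are hypothetical, and base facts are assumed independent.

```python
from itertools import combinations
from math import prod

def proof_prob(proof, p):
    # A proof is a frozenset of base-fact ids that must all be true.
    return prod(p[f] for f in proof)

def top_k(proofs, k, p):
    # Keep only the k most probable proofs; ties are broken deterministically.
    ranked = sorted(set(proofs), key=lambda pr: (-proof_prob(pr, p), sorted(pr)))
    return frozenset(ranked[:k])

def disj(x, y, k, p):
    # "or": pool the proofs of both alternatives, then truncate to k.
    return top_k(x | y, k, p)

def conj(x, y, k, p):
    # "and": every pairing of one proof from each side, then truncate to k.
    return top_k({a | b for a in x for b in y}, k, p)

def dnf_prob(proofs, p):
    # Probability that at least one kept proof holds (inclusion-exclusion).
    proofs = list(proofs)
    total = 0.0
    for r in range(1, len(proofs) + 1):
        for subset in combinations(proofs, r):
            total += (-1) ** (r + 1) * proof_prob(frozenset().union(*subset), p)
    return total

def reachability(edges, k):
    # path(X, Y) :- edge(X, Y).   path(X, Z) :- path(X, Y), edge(Y, Z).
    p = {f"{u}->{v}": w for (u, v), w in edges.items()}
    edge_tag = {(u, v): frozenset({frozenset({f"{u}->{v}"})}) for u, v in edges}
    path = dict(edge_tag)
    changed = True
    while changed:  # naive fixpoint iteration
        changed = False
        for (x, y), tag_xy in list(path.items()):
            for (y2, z), tag_yz in edge_tag.items():
                if y != y2:
                    continue
                old = path.get((x, z), frozenset())
                merged = disj(old, conj(tag_xy, tag_yz, k, p), k, p)
                if merged != old:
                    path[(x, z)] = merged
                    changed = True
    return {key: dnf_prob(tag, p) for key, tag in path.items()}

edges = {("a", "b"): 0.9, ("b", "d"): 0.8, ("a", "c"): 0.5,
         ("c", "d"): 0.6, ("a", "d"): 0.2}
print(reachability(edges, k=2)[("a", "d")])  # ~0.804: weakest proof dropped
print(reachability(edges, k=3)[("a", "d")])  # ~0.8432: all three proofs kept
```

With k = 2 the weakest proof of path(a, d) (the direct edge, probability 0.2) is discarded, so the result underestimates the exact value 0.8432 obtained with k = 3. Smaller k trades accuracy for bounded work per fact.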
Phase 3 — Scallop as a Programming Language (2023)
The PLDI 2023 paper reframed Scallop as a full language.
Key ideas:
relations as the core data model
Datalog as the reasoning language
provenance semirings for differentiable reasoning
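The provenance-semiring idea is that one set of Datalog rules can be evaluated under different algebras simply by swapping the operations used for "or" and "and". The sketch below evaluates the same transitive-closure rules under three textbook semirings: Boolean (is there a path?), max-times (probability of the most likely path), and counting (how many paths?). It illustrates the general concept in plain Python and is not Scallop's provenance interface; Scallop's differentiable provenances additionally track gradients, which this sketch omits.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Semiring(Generic[T]):
    zero: T                       # tag of an absent fact (identity for plus)
    plus: Callable[[T, T], T]     # combines alternative derivations ("or")
    times: Callable[[T, T], T]    # combines joined premises ("and")

BOOLEAN = Semiring(False, operator.or_, operator.and_)
MAX_TIMES = Semiring(0.0, max, operator.mul)
COUNTING = Semiring(0, operator.add, operator.mul)

def transitive_closure(edges: dict[tuple[str, str], T],
                       sr: Semiring[T]) -> dict[tuple[str, str], T]:
    """path(X, Z) :- edge(X, Z).   path(X, Z) :- path(X, Y), edge(Y, Z).
    Iterates to a fixpoint; with COUNTING this only terminates on acyclic graphs."""
    nodes = sorted({n for edge in edges for n in edge})
    path = dict(edges)
    while True:
        new_path: dict[tuple[str, str], T] = {}
        for x in nodes:
            for z in nodes:
                tag = edges.get((x, z), sr.zero)
                for y in nodes:
                    if (x, y) in path and (y, z) in edges:
                        tag = sr.plus(tag, sr.times(path[(x, y)], edges[(y, z)]))
                if tag != sr.zero:
                    new_path[(x, z)] = tag
        if new_path == path:
            return path
        path = new_path

probs = {("a", "b"): 0.9, ("b", "d"): 0.8, ("a", "c"): 0.5,
         ("c", "d"): 0.6, ("a", "d"): 0.2}
print(transitive_closure({e: True for e in probs}, BOOLEAN)[("a", "d")])  # True
print(transitive_closure(probs, MAX_TIMES)[("a", "d")])                   # ~0.72
print(transitive_closure({e: 1 for e in probs}, COUNTING)[("a", "d")])    # 3
```

The rules never change; only the semiring does. Differentiable reasoning follows the same pattern with a semiring whose operations are differentiable functions of the input probabilities.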
At this stage, Scallop became a full programming ecosystem including:
compiler
interpreter
REPL
Python/PyTorch bindings
Phase 4 — Education and Community
Scallop became widely used for teaching neurosymbolic programming.
The project hosts tutorials and materials for:
LOG22
SSFT22
PLDI23
SSNP24 summer school
This transformed Scallop from a research prototype into a learning platform for the field.
Phase 5 — Integration with Foundation Models
Recent work extends Scallop to integrate with:
vision models
language models
multimodal foundation models
Plugins such as:
scallop-gpt
scallop-clip
show how symbolic reasoning can be combined with modern AI systems.
10. Where the Field Stands Today
The investigation revealed a fascinating landscape.
Three major ecosystems coexist:
Semantic Stream Reasoning
Focus on real-time reasoning over data streams.
Vocabulary Interoperability
Focus on controlled vocabularies and thesaurus mapping.
NeuroSymbolic AI
Focus on integrating neural learning with symbolic reasoning.
The biggest gap lies at the intersection of all three, built on top of event-driven infrastructure.
There is still no flagship system combining:
event sourcing / Kappa architectures
semantic stream reasoning
ISO 25964 vocabulary interoperability
neurosymbolic reasoning
But the pieces now exist.
11. A Possible Future Architecture
A realistic modern architecture might look like this:
Event Log (Kafka / Kappa architecture)
↓
Semantic Stream Layer
(RDF-Connect / Kolibrie)
↓
NeuroSymbolic Reasoning
(Scallop / DeepProbLog)
↓
Vocabulary & Mapping Layer
(JSKOS Server / Skosmos / Cocoda)
Such a stack could support:
real-time knowledge graph reasoning
semantic interoperability
explainable AI decisions
controlled vocabulary governance
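To make the layering concrete, here is a deliberately tiny end-to-end sketch in plain Python. Every piece is a hypothetical stand-in: an in-memory list plays the Kafka topic, a function turns events into confidence-annotated triples (the stream layer), one probabilistic rule plays the neurosymbolic layer, and a dictionary plays the concordance that JSKOS Server or Cocoda would manage. None of it uses those systems' real APIs. The Kappa-style property it shows is that all derived results come from replaying the log, so reprocessing from a different offset is just another replay.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    offset: int       # position in the append-only log
    payload: dict

# Event log layer: an in-memory stand-in for a Kafka topic.
LOG = [
    Event(0, {"sensor": "s1", "reading": "smoke", "confidence": 0.9}),
    Event(1, {"sensor": "s1", "reading": "heat", "confidence": 0.7}),
    Event(2, {"sensor": "s2", "reading": "heat", "confidence": 0.4}),
]

# Hypothetical concordance from local labels to a shared vocabulary.
CONCORDANCE = {"fire": "sharedVocab:Fire"}

def to_triples(event: Event) -> list[tuple[str, str, str, float]]:
    """Semantic stream layer: lift a raw event into annotated triples."""
    p = event.payload
    return [(p["sensor"], "observes", p["reading"], p["confidence"])]

def infer_fire(facts: list[tuple[str, str, str, float]]) -> dict[str, float]:
    """Reasoning layer for the rule  fire(S) :- observes(S, smoke), observes(S, heat).
    Conjunction multiplies confidences, i.e. the observations are assumed independent."""
    seen: dict[str, dict[str, float]] = {}
    for sensor, _, reading, conf in facts:
        best = seen.setdefault(sensor, {})
        best[reading] = max(conf, best.get(reading, 0.0))
    return {s: obs["smoke"] * obs["heat"]
            for s, obs in seen.items() if "smoke" in obs and "heat" in obs}

def run(log: list[Event], from_offset: int = 0) -> list[tuple[str, str, float]]:
    """Replay the log from an offset through every layer."""
    facts = [t for e in log if e.offset >= from_offset for t in to_triples(e)]
    alerts = infer_fire(facts)
    # Vocabulary layer: publish conclusions under the shared concept.
    return [(s, CONCORDANCE["fire"], round(p, 3)) for s, p in sorted(alerts.items())]

print(run(LOG))                 # [('s1', 'sharedVocab:Fire', 0.63)]
print(run(LOG, from_offset=1))  # [] because the smoke reading is skipped
```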
12. Final Thoughts
This exploration began with a simple question about event-driven semantic architectures.
It ended by mapping an entire research ecosystem spanning:
Semantic Web infrastructure
stream reasoning
controlled vocabulary interoperability
neurosymbolic programming
What emerged is a clear lesson:
The future of intelligent systems is likely hybrid.
Not purely neural.
Not purely symbolic.
Not purely streaming.
But a combination of all three.
And the pieces of that architecture are already being built today.

