This document provides a detailed technical overview of Qwello's Knowledge Graph System, including its structure, entity types, relationship types, entity resolution process, graph merging algorithm, and graph enrichment capabilities.
Graph Structure
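Qwello generates knowledge graphs with a consistent JSON structure made of two arrays: entities and relationships. Each entity has an id, a type, a name, and an attributes object. Each relationship has a source and a target (both entity IDs), a type, and an attributes object. Relationship attributes include: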
description: Textual description of the relationship
mentioned_on_pages: Array of page numbers where the relationship appears
[custom attributes]: Additional attributes specific to the relationship
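As a sketch, this structure can be written down as TypeScript types. The shapes below are inferred from the JSON examples and the merging code in this document rather than taken from a published Qwello schema, and the names GraphAttributes and isKnowledgeGraph are hypothetical. The guard shows the kind of check a consumer might run on parsed JSON before rendering or querying it.

```typescript
// Hypothetical types inferred from the examples in this document; custom attributes are open-ended
interface GraphAttributes {
  description?: string;
  mentioned_on_pages?: number[];
  [key: string]: unknown;
}

interface Entity {
  id: string; // e.g. "e1"
  type: string; // e.g. "concept", "person", "time"
  name: string;
  attributes: GraphAttributes;
}

interface Relationship {
  source: string; // id of the source entity
  target: string; // id of the target entity
  type: string; // e.g. "includes", "affiliated_with"
  attributes: GraphAttributes;
}

interface KnowledgeGraph {
  entities: Entity[];
  relationships: Relationship[];
}

// True for a non-null, non-array object
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical structural check for parsed JSON: required fields are present and
// every relationship points at an entity id that exists in the same graph
function isKnowledgeGraph(value: unknown): value is KnowledgeGraph {
  if (!isRecord(value)) return false;
  const { entities, relationships } = value;
  if (!Array.isArray(entities) || !Array.isArray(relationships)) return false;
  const ids = new Set<string>();
  for (const e of entities) {
    if (!isRecord(e) || typeof e.id !== "string" || typeof e.type !== "string" ||
        typeof e.name !== "string" || !isRecord(e.attributes)) {
      return false;
    }
    ids.add(e.id);
  }
  return relationships.every(
    (r) => isRecord(r) && typeof r.type === "string" && isRecord(r.attributes) &&
      typeof r.source === "string" && typeof r.target === "string" &&
      ids.has(r.source) && ids.has(r.target)
  );
}
```

Relationships whose source or target does not match an entity id are rejected outright; a more lenient consumer could drop them instead, which is what the merging code does with relationships it cannot map.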
Entity Types
Qwello recognizes and classifies entities into various types to provide semantic meaning and enable filtering and visualization.
Core Entity Types
Concept
Represents abstract ideas, theories, principles, or notions.
{
"id": "e1",
"type": "concept",
"name": "Machine Learning",
"attributes": {
"description": "A branch of AI focused on building systems that learn from data",
"mentioned_on_pages": [3, 5, 8],
"subtypes": ["supervised learning", "unsupervised learning", "reinforcement learning"]
}
}
Person
Represents individual people mentioned in the document.
{
"id": "e2",
"type": "person",
"name": "Geoffrey Hinton",
"attributes": {
"description": "Computer scientist known for his work on neural networks",
"mentioned_on_pages": [4, 7],
"affiliation": "University of Toronto",
"role": "researcher"
}
}
Organization
Represents companies, institutions, groups, or other organizational entities.
Time
Represents time periods or eras.
{
"id": "e10",
"type": "time",
"name": "AI Winter",
"attributes": {
"description": "Period of reduced funding and interest in AI research",
"mentioned_on_pages": [2],
"period": "1970s-1980s",
"significance": "slowed progress in AI development"
}
}
Custom Entity Types
Qwello supports the extension of entity types for domain-specific applications. Custom entity types follow the same structure as core types but with specialized attributes relevant to the domain.
Relationship Types
Qwello identifies various types of relationships between entities to represent how they are connected in the document.
Core Relationship Types
Hierarchical Relationships
Represent parent-child or superclass-subclass connections.
{
"source": "e1",
"target": "e11",
"type": "includes",
"attributes": {
"description": "Machine Learning includes Deep Learning as a subfield",
"mentioned_on_pages": [3],
"relationship_strength": "strong"
}
}
Common hierarchical relationship types:
includes: Entity encompasses another entity
is_part_of: Entity is a component of another entity
is_a: Entity is a type or instance of another entity
contains: Entity physically or conceptually contains another entity
Associative Relationships
Represent general connections between entities.
{
"source": "e2",
"target": "e3",
"type": "affiliated_with",
"attributes": {
"description": "Geoffrey Hinton is affiliated with University of Toronto",
"mentioned_on_pages": [4],
"role": "professor",
"period": "current"
}
}
Common associative relationship types:
related_to: General association between entities
affiliated_with: Connection between person and organization
collaborates_with: Collaborative relationship
associated_with: General connection without specific type
Temporal Relationships
Represent time-based connections between entities.
{
"source": "e7",
"target": "e5",
"type": "preceded",
"attributes": {
"description": "ImageNet Competition 2012 preceded the development of Transformer Architecture",
"mentioned_on_pages": [5],
"time_gap": "5 years"
}
}
Common temporal relationship types:
preceded: Entity came before another entity
followed: Entity came after another entity
during: Entity existed or occurred during another entity
coincided_with: Entities existed or occurred at the same time
Causal Relationships
Represent cause-effect connections between entities.
{
"source": "e7",
"target": "e12",
"type": "led_to",
"attributes": {
"description": "ImageNet Competition 2012 led to increased interest in deep learning",
"mentioned_on_pages": [3],
"significance": "major impact"
}
}
Common causal relationship types:
causes: Entity directly causes another entity
led_to: Entity resulted in or contributed to another entity
enables: Entity makes another entity possible
prevents: Entity stops or hinders another entity
Spatial Relationships
Represent location-based connections between entities.
{
"source": "e3",
"target": "e4",
"type": "located_in",
"attributes": {
"description": "DeepMind is located in London",
"mentioned_on_pages": [6],
"specificity": "city"
}
}
Common spatial relationship types:
located_in: Entity is physically located in another entity
near: Entity is geographically close to another entity
originated_from: Entity came from or started in another entity
operates_in: Entity functions or works in another entity
Functional Relationships
Represent connections based on function or purpose.
{
"source": "e5",
"target": "e13",
"type": "used_for",
"attributes": {
"description": "Transformer Architecture is used for natural language processing",
"mentioned_on_pages": [5, 10],
"effectiveness": "high"
}
}
Common functional relationship types:
used_for: Entity is utilized for another entity
enables: Entity makes another entity possible
supports: Entity provides assistance to another entity
implements: Entity puts another entity into practice
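The category lists above can be captured as a lookup table. This is an illustrative sketch (RELATIONSHIP_CATEGORIES and categoriesOf are hypothetical names) that copies the lists verbatim. Because enables appears under both causal and functional relationships, the lookup returns every matching category rather than picking one.

```typescript
type RelationshipCategory =
  | "hierarchical" | "associative" | "temporal" | "causal" | "spatial" | "functional";

// Category membership copied from the lists in this document
const RELATIONSHIP_CATEGORIES: Record<RelationshipCategory, readonly string[]> = {
  hierarchical: ["includes", "is_part_of", "is_a", "contains"],
  associative: ["related_to", "affiliated_with", "collaborates_with", "associated_with"],
  temporal: ["preceded", "followed", "during", "coincided_with"],
  causal: ["causes", "led_to", "enables", "prevents"],
  spatial: ["located_in", "near", "originated_from", "operates_in"],
  functional: ["used_for", "enables", "supports", "implements"],
};

// Returns every category a relationship type belongs to, in table order;
// an empty array means the type is not one of the core types (e.g. a custom type)
function categoriesOf(relType: string): RelationshipCategory[] {
  return (Object.keys(RELATIONSHIP_CATEGORIES) as RelationshipCategory[]).filter(cat =>
    RELATIONSHIP_CATEGORIES[cat].includes(relType)
  );
}
```

An empty result signals a custom relationship type, which the core categories do not cover.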
Custom Relationship Types
Qwello supports the creation of custom relationship types for domain-specific applications. Custom relationship types follow the same structure as core types but represent specialized connections relevant to the domain.
Entity Resolution Process
Entity resolution is the process of identifying when different mentions in a document refer to the same real-world entity. Without it, every mention would become a separate node; with it, all mentions of an entity collapse into one node that carries their combined attributes, relationships, and page references.
Resolution Algorithm
function resolveEntity(masterGraph: KnowledgeGraph, newEntity: Entity): string | null {
// No entities in master graph yet
if (!masterGraph.entities || masterGraph.entities.length === 0) {
return null;
}
// Calculate match scores for each existing entity
const matchScores = masterGraph.entities.map(existingEntity => {
return {
id: existingEntity.id,
score: calculateMatchScore(existingEntity, newEntity)
};
});
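// Find the best match strictly above the threshold (on ties, the earlier entity wins).
// The explicit type argument lets the accumulator hold either null or a real id.
const MATCH_THRESHOLD = 0.5;
const bestMatch = matchScores.reduce<{ id: string | null; score: number }>(
(best, current) => (current.score > best.score ? current : best),
{ id: null, score: MATCH_THRESHOLD }
);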
return bestMatch.id;
}
function calculateMatchScore(entity1: Entity, entity2: Entity): number {
let score = 0;
let factors = 0;
// Name similarity (most important factor)
if (entity1.name && entity2.name) {
const nameSimilarity = calculateStringSimilarity(
entity1.name.toLowerCase(),
entity2.name.toLowerCase()
);
score += nameSimilarity * 2; // Double weight for name
factors += 2;
// Check for acronym match
if (isAcronymMatch(entity1.name, entity2.name)) {
score += 1;
factors += 1;
}
}
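// Type agreement: a mismatch counts as a zero-score factor, so same-name entities of
// different types (e.g., "Apple" the company vs. "apple" the fruit) score lower
if (entity1.type && entity2.type) {
score += entity1.type === entity2.type ? 1 : 0;
factors += 1;
}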
// Description similarity
if (entity1.attributes?.description && entity2.attributes?.description) {
const descSimilarity = calculateStringSimilarity(
entity1.attributes.description.toLowerCase(),
entity2.attributes.description.toLowerCase()
);
score += descSimilarity;
factors += 1;
}
// Additional attribute matching
const attributeMatchScore = compareAttributes(entity1.attributes, entity2.attributes);
if (attributeMatchScore.factors > 0) {
score += attributeMatchScore.score;
factors += attributeMatchScore.factors;
}
// Return normalized score
return factors > 0 ? score / factors : 0;
}
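calculateMatchScore relies on three helpers that this document does not define: calculateStringSimilarity, isAcronymMatch, and compareAttributes. The sketch below shows one plausible implementation of each under stated assumptions. Normalized Levenshtein distance is a swapped-in choice of string similarity, not necessarily the measure Qwello uses, and compareAttributes skips description and mentioned_on_pages on the assumption that the former is already scored separately and the latter records provenance rather than identity.

```typescript
// Hypothetical helper: normalized Levenshtein similarity in [0, 1] (1 = identical)
function calculateStringSimilarity(a: string, b: string): number {
  if (a === b) return 1;
  const maxLen = Math.max(a.length, b.length);
  // Two-row dynamic programming over the edit distance matrix
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return 1 - prev[b.length] / maxLen;
}

// Hypothetical helper: true if the shorter name equals the initials of the longer
// multi-word name, e.g. "AI" / "Artificial Intelligence" or "IBM" / "International Business Machines"
function isAcronymMatch(name1: string, name2: string): boolean {
  const [shortName, longName] = name1.length <= name2.length ? [name1, name2] : [name2, name1];
  const words = longName.split(/[\s-]+/).filter(w => w.length > 0);
  const initials = words.map(w => w[0]).join("").toUpperCase();
  const compact = shortName.replace(/[\s.]/g, "").toUpperCase(); // "A.I." -> "AI"
  // Require at least two words so a single-word name is never its own acronym
  return words.length > 1 && compact === initials;
}

type Attributes = Record<string, unknown> | undefined;

// Hypothetical helper: scores scalar attributes present on both entities
function compareAttributes(a: Attributes, b: Attributes): { score: number; factors: number } {
  const skip = new Set(["description", "mentioned_on_pages", "confidence"]);
  const aa = a ?? {};
  const bb = b ?? {};
  let score = 0;
  let factors = 0;
  for (const key of Object.keys(aa)) {
    if (skip.has(key) || !(key in bb)) continue;
    const va = aa[key];
    const vb = bb[key];
    if (typeof va === "string" && typeof vb === "string") {
      // Fuzzy comparison for text attributes such as affiliation or role
      score += calculateStringSimilarity(va.toLowerCase(), vb.toLowerCase());
      factors += 1;
    } else if (typeof va === "number" || typeof va === "boolean") {
      // Exact comparison for numbers and flags
      score += va === vb ? 1 : 0;
      factors += 1;
    }
  }
  return { score, factors };
}
```

The acronym check is deliberately simple: names containing function words, such as "Massachusetts Institute of Technology" for "MIT", would need stop-word handling that this sketch omits.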
Resolution Challenges
Entity resolution addresses several challenges:
Name Variations: The same entity might be referred to by different names (e.g., "IBM" and "International Business Machines")
Acronyms: Entities might be referred to by acronyms (e.g., "AI" for "Artificial Intelligence")
Partial References: Entities might be referred to by partial names (e.g., "Turing" instead of "Alan Turing")
Ambiguity: Different entities might share the same name (e.g., "Apple" the company vs. "apple" the fruit)
Cross-Page References: The same entity might be mentioned on different pages with different contexts
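Of these challenges, partial references are the easiest to illustrate in code. The hypothetical helper below, which is not part of the documented algorithm, flags a pair such as "Turing" and "Alan Turing" when every token of the shorter name appears, in order, in the longer one. It only proposes a candidate: it cannot tell two different people named Turing apart, which is why contextual analysis and attribute comparison are still needed.

```typescript
// Hypothetical helper: true when one name's tokens are a strict, in-order subset of the
// other's, e.g. "Turing" and "Alan Turing". Names with the same token count return false;
// exact matches are left to ordinary name matching.
function isPartialNameMatch(name1: string, name2: string): boolean {
  const tokenize = (s: string) => s.toLowerCase().split(/\s+/).filter(t => t.length > 0);
  const a = tokenize(name1);
  const b = tokenize(name2);
  const [shorter, longer] = a.length <= b.length ? [a, b] : [b, a];
  if (shorter.length === 0 || shorter.length === longer.length) return false;
  // Walk the longer name once, consuming tokens of the shorter name in order
  let matched = 0;
  for (const token of longer) {
    if (matched < shorter.length && token === shorter[matched]) matched++;
  }
  return matched === shorter.length;
}
```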
Resolution Process Steps
Qwello's entity resolution process involves several steps:
Name Matching: Compare entity names for exact or fuzzy matches
Acronym Resolution: Recognize acronyms and their expanded forms
Contextual Analysis: Use surrounding context to disambiguate similar entities
Attribute Comparison: Compare entity attributes for similarity
Reference Pattern Analysis: Analyze how entities are referenced throughout the document
When entities are resolved as referring to the same concept, their attributes and relationships are merged, creating a more comprehensive representation of the entity.
Graph Merging Algorithm
The graph merging algorithm combines individual page graphs into a unified knowledge graph that represents the entire document.
Merging Implementation
function mergeKnowledgeGraphs(masterGraph: KnowledgeGraph, pageGraph: KnowledgeGraph, pageNum: number): void {
// Track ID mappings from page graph to master graph
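const idMap: Record<string, string> = {};
// Make sure both collections exist before anything is pushed into them
masterGraph.entities ??= [];
masterGraph.relationships ??= [];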
// Process entities
for (const entity of pageGraph.entities || []) {
// Try to resolve this entity to an existing one
const existingId = resolveEntity(masterGraph, entity);
if (existingId) {
// Entity already exists, merge attributes
idMap[entity.id] = existingId;
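const existingEntity = masterGraph.entities.find(e => e.id === existingId)!;
// Initialize attributes before merging so the page-reference step below never reads undefined
existingEntity.attributes ??= {};
// Merge attributes (avoiding duplicates); only genuinely missing keys are filled in,
// so existing falsy values such as 0 or "" are not overwritten
for (const [key, value] of Object.entries(entity.attributes || {})) {
if (existingEntity.attributes[key] === undefined) {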
existingEntity.attributes[key] = value;
} else if (Array.isArray(existingEntity.attributes[key])) {
// If attribute is array, append new values avoiding duplicates
const newValues = Array.isArray(value) ? value : [value];
for (const newValue of newValues) {
if (!existingEntity.attributes[key].includes(newValue)) {
existingEntity.attributes[key].push(newValue);
}
}
}
}
// Add page reference
if (!existingEntity.attributes.mentioned_on_pages) {
existingEntity.attributes.mentioned_on_pages = [];
}
if (!existingEntity.attributes.mentioned_on_pages.includes(pageNum)) {
existingEntity.attributes.mentioned_on_pages.push(pageNum);
}
} else {
// New entity, add to master graph with new ID
const newId = `e${masterGraph.entities.length + 1}`;
idMap[entity.id] = newId;
// Clone entity with new ID and add page reference
const newEntity = {
...entity,
id: newId,
attributes: {
...(entity.attributes || {}),
mentioned_on_pages: [pageNum]
}
};
masterGraph.entities.push(newEntity);
}
}
// Process relationships
for (const rel of pageGraph.relationships || []) {
// Map source and target IDs to master graph IDs
const sourceId = idMap[rel.source];
const targetId = idMap[rel.target];
// Skip if source or target wasn't mapped
if (!sourceId || !targetId) continue;
// Check if this relationship already exists
const existingRel = masterGraph.relationships.find(r =>
r.source === sourceId &&
r.target === targetId &&
r.type === rel.type
);
if (existingRel) {
// Merge attributes if relationship exists; initialize attributes first so the
// page-reference step below is safe even when the page relationship has none
existingRel.attributes ??= {};
for (const [key, value] of Object.entries(rel.attributes || {})) {
if (existingRel.attributes[key] === undefined) {
existingRel.attributes[key] = value;
}
}
// Add page reference
if (!existingRel.attributes.mentioned_on_pages) {
existingRel.attributes.mentioned_on_pages = [];
}
if (!existingRel.attributes.mentioned_on_pages.includes(pageNum)) {
existingRel.attributes.mentioned_on_pages.push(pageNum);
}
} else {
// New relationship, add to master graph
const newRel = {
source: sourceId,
target: targetId,
type: rel.type,
attributes: {
...(rel.attributes || {}),
mentioned_on_pages: [pageNum]
}
};
masterGraph.relationships.push(newRel);
}
}
}
Merging Process Steps
The graph merging process involves several steps:
Entity Mapping: Create a mapping between page-level entity IDs and master graph IDs
Entity Resolution: Determine which entities refer to the same concept
Attribute Merging: Combine attributes from multiple mentions of the same entity
Relationship Consolidation: Merge relationships that connect the same entities
Page Reference Tracking: Maintain references to original page locations
Consistency Checking: Ensure the resulting graph maintains logical consistency
This merging process creates a comprehensive knowledge graph that represents the entire document while preserving the context and provenance of each entity and relationship.
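To make the ID remapping and page tracking concrete, here is a simplified, self-contained sketch of merging a sequence of page graphs. It is not mergeKnowledgeGraphs itself: a hypothetical exact match on lowercased name plus type stands in for resolveEntity, only page references (not other attributes) are merged, and page numbers are assumed to follow the order of the input array.

```typescript
interface Attrs {
  mentioned_on_pages?: number[];
  [key: string]: unknown;
}
interface Entity { id: string; type: string; name: string; attributes: Attrs; }
interface Relationship { source: string; target: string; type: string; attributes: Attrs; }
interface KnowledgeGraph { entities: Entity[]; relationships: Relationship[]; }

// Record a page number on an attributes object without duplicates
function addPage(attrs: Attrs, pageNum: number): void {
  const pages: number[] = attrs.mentioned_on_pages ?? [];
  if (!pages.includes(pageNum)) pages.push(pageNum);
  attrs.mentioned_on_pages = pages;
}

// Hypothetical simplified merge: exact (lowercased) name + type match stands in for resolveEntity
function mergePages(pages: KnowledgeGraph[]): KnowledgeGraph {
  const master: KnowledgeGraph = { entities: [], relationships: [] };
  const keyOf = (e: Entity) => `${e.type}|${e.name.toLowerCase()}`;
  pages.forEach((page, index) => {
    const pageNum = index + 1; // assumption: array order is page order, starting at page 1
    const idMap = new Map<string, string>(); // page-local id -> master id
    for (const entity of page.entities) {
      let match = master.entities.find(e => keyOf(e) === keyOf(entity));
      if (!match) {
        match = { ...entity, id: `e${master.entities.length + 1}`,
                  attributes: { ...entity.attributes, mentioned_on_pages: [] } };
        master.entities.push(match);
      }
      idMap.set(entity.id, match.id);
      addPage(match.attributes, pageNum);
    }
    for (const rel of page.relationships) {
      const source = idMap.get(rel.source);
      const target = idMap.get(rel.target);
      if (!source || !target) continue; // endpoint not defined on this page
      let match = master.relationships.find(
        r => r.source === source && r.target === target && r.type === rel.type
      );
      if (!match) {
        match = { ...rel, source, target, attributes: { ...rel.attributes, mentioned_on_pages: [] } };
        master.relationships.push(match);
      }
      addPage(match.attributes, pageNum);
    }
  });
  return master;
}
```

Page-local IDs are never carried into the master graph; every relationship is rewritten through idMap, so two pages may reuse the same local ID for different entities without conflict.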
Graph Enrichment
After the initial graph is generated and merged, Qwello applies several enrichment processes to enhance the value and utility of the knowledge graph.
Confidence Scoring
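Qwello assigns confidence scores to entities and relationships. For an entity, the score starts at a base of 0.5 and increases with the number of pages it is mentioned on, the number of relationships it takes part in, and the number of descriptive attributes it carries: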
function assignConfidenceScores(graph: KnowledgeGraph): void {
// Assign entity confidence scores
for (const entity of graph.entities) {
entity.attributes.confidence = calculateEntityConfidence(entity, graph);
}
// Assign relationship confidence scores
for (const relationship of graph.relationships) {
relationship.attributes.confidence = calculateRelationshipConfidence(relationship, graph);
}
}
function calculateEntityConfidence(entity: Entity, graph: KnowledgeGraph): number {
let confidence = 0.5; // Base confidence
// Factors that increase confidence
// 1. Mentioned on multiple pages
if (entity.attributes.mentioned_on_pages) {
const pageCount = entity.attributes.mentioned_on_pages.length;
confidence += Math.min(0.3, pageCount * 0.05); // Up to 0.3 for 6+ pages
}
// 2. Has multiple relationships
const relationshipCount = graph.relationships.filter(
r => r.source === entity.id || r.target === entity.id
).length;
confidence += Math.min(0.2, relationshipCount * 0.02); // Up to 0.2 for 10+ relationships
// 3. Has detailed attributes
const attributeCount = Object.keys(entity.attributes).filter(
key => key !== 'mentioned_on_pages' && key !== 'confidence'
).length;
confidence += Math.min(0.1, attributeCount * 0.02); // Up to 0.1 for 5+ attributes
// Cap confidence at 1.0
return Math.min(1.0, confidence);
}
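Written as a formula, calculateEntityConfidence computes the following, where p is the number of pages in mentioned_on_pages, r is the number of relationships in which the entity is the source or target, and a is the number of attributes other than mentioned_on_pages and confidence. As a worked example, the Geoffrey Hinton entity shown earlier has p = 2 (pages 4 and 7) and a = 3 (description, affiliation, role); assuming the affiliated_with relationship shown under Associative Relationships is its only relationship, r = 1.

```latex
c(e) = \min\Big(1,\; 0.5 + \min(0.3,\ 0.05\,p) + \min(0.2,\ 0.02\,r) + \min(0.1,\ 0.02\,a)\Big)

c(\text{Geoffrey Hinton}) = \min\big(1,\; 0.5 + \min(0.3,\ 0.10) + \min(0.2,\ 0.02) + \min(0.1,\ 0.06)\big) = 0.68
```

The base and the three capped bonuses sum to at most 1.1, so the final cap at 1.0 only takes effect once the bonuses together exceed 0.5.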
Topic Clustering
Qwello identifies related groups of entities to form topic clusters:
function identifyTopicClusters(graph: KnowledgeGraph): TopicCluster[] {
// Create a graph representation for clustering
const nodes = graph.entities.map(e => e.id);
const edges = graph.relationships.map(r => ({ source: r.source, target: r.target }));
// Apply community detection algorithm (e.g., Louvain method)
const communities = detectCommunities(nodes, edges);
// Convert communities to topic clusters
return communities.map((communityNodes, index) => {
// Look up each node's entity; the type predicate drops unknown IDs and narrows the array to Entity[]
const entities = communityNodes
.map(nodeId => graph.entities.find(e => e.id === nodeId))
.filter((e): e is Entity => e !== undefined);
// Determine the most common entity types in this cluster
const typeCount = entities.reduce<Record<string, number>>((counts, entity) => {
counts[entity.type] = (counts[entity.type] || 0) + 1;
return counts;
}, {});
const dominantType = Object.entries(typeCount)
.sort((a, b) => b[1] - a[1])
.map(([type]) => type)[0];
// Find the most connected entity as the central entity (null if no member has any connection)
const centralEntity = entities.reduce<{ entity: Entity | null; connections: number }>(
(central, entity) => {
const connectionCount = graph.relationships.filter(
r => r.source === entity.id || r.target === entity.id
).length;
return connectionCount > central.connections
? { entity, connections: connectionCount }
: central;
},
{ entity: null, connections: 0 }
).entity;
// Generate a name for the cluster
const clusterName = centralEntity
? `${centralEntity.name} Cluster`
: `Topic Cluster ${index + 1}`;
return {
id: `cluster_${index + 1}`,
name: clusterName,
entities: entities.map(e => e.id),
centralEntity: centralEntity?.id,
dominantType,
size: entities.length
};
});
}
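detectCommunities is not defined in this document; the comment above names the Louvain method as one option. Louvain is too long for a short example, so the sketch below swaps in label propagation, a simpler community detection technique, returning an array of arrays of entity IDs to match how identifyTopicClusters uses the result. It is a hypothetical stand-in: relationship direction is ignored, nodes are visited in input order, and ties are broken by the smallest label so the output is deterministic.

```typescript
type Edge = { source: string; target: string };

// Hypothetical stand-in for detectCommunities: asynchronous label propagation
function detectCommunities(nodes: string[], edges: Edge[], maxIterations = 20): string[][] {
  // Undirected adjacency list; edges to unknown nodes and self-loops are ignored
  const neighbors = new Map<string, string[]>();
  for (const n of nodes) neighbors.set(n, []);
  for (const { source, target } of edges) {
    if (!neighbors.has(source) || !neighbors.has(target) || source === target) continue;
    neighbors.get(source)!.push(target);
    neighbors.get(target)!.push(source);
  }
  // Every node starts in its own community, labelled by its own ID
  const label = new Map<string, string>();
  for (const n of nodes) label.set(n, n);

  for (let iter = 0; iter < maxIterations; iter++) {
    let changed = false;
    for (const node of nodes) {
      const adj = neighbors.get(node)!;
      if (adj.length === 0) continue; // isolated nodes keep their own label
      // Count how often each label occurs among the neighbours
      const counts = new Map<string, number>();
      for (const m of adj) {
        const l = label.get(m)!;
        counts.set(l, (counts.get(l) ?? 0) + 1);
      }
      // Adopt the most frequent label; break ties by the smallest label
      let best = label.get(node)!;
      let bestCount = counts.get(best) ?? 0;
      for (const [l, c] of Array.from(counts)) {
        if (c > bestCount || (c === bestCount && l < best)) {
          best = l;
          bestCount = c;
        }
      }
      if (best !== label.get(node)) {
        label.set(node, best);
        changed = true;
      }
    }
    if (!changed) break; // converged
  }
  // Group nodes by final label, preserving input order
  const groups = new Map<string, string[]>();
  for (const n of nodes) {
    const l = label.get(n)!;
    if (!groups.has(l)) groups.set(l, []);
    groups.get(l)!.push(n);
  }
  return Array.from(groups.values());
}
```

Label propagation is fast and parameter-free but can produce different partitions from Louvain, which explicitly optimizes modularity; the deterministic tie-break trades label propagation's usual randomness for reproducible clusters.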
Relationship Validation
Qwello validates relationships to ensure they are logically consistent:
function validateRelationships(graph: KnowledgeGraph): void {
// Check for contradictory relationships
const contradictions = findContradictoryRelationships(graph);
// Relationships carry no id field in this schema (none appears in the examples or in
// mergeKnowledgeGraphs), so the winning relationship is recorded as a "source:type:target" key
const relKey = (r: Relationship) => `${r.source}:${r.type}:${r.target}`;
// Resolve contradictions based on confidence scores
for (const contradiction of contradictions) {
if (contradiction.rel1.attributes.confidence > contradiction.rel2.attributes.confidence) {
// Keep rel1, mark rel2 as low confidence
contradiction.rel2.attributes.confidence =
Math.min(0.3, contradiction.rel2.attributes.confidence);
contradiction.rel2.attributes.contradicted_by = relKey(contradiction.rel1);
} else {
// Keep rel2, mark rel1 as low confidence
contradiction.rel1.attributes.confidence =
Math.min(0.3, contradiction.rel1.attributes.confidence);
contradiction.rel1.attributes.contradicted_by = relKey(contradiction.rel2);
}
}
// Check for transitive relationships
const transitiveRelations = findTransitiveRelationships(graph);
// Add inferred relationships with appropriate confidence
for (const transitive of transitiveRelations) {
// Check if the inferred relationship already exists
const existingRel = graph.relationships.find(r =>
r.source === transitive.source &&
r.target === transitive.target &&
r.type === transitive.inferredType
);
if (!existingRel) {
// Add inferred relationship with lower confidence
graph.relationships.push({
source: transitive.source,
target: transitive.target,
type: transitive.inferredType,
attributes: {
description: `Inferred from transitive relationship`,
confidence: 0.4, // Lower confidence for inferred relationships
inferred: true,
inferred_from: [transitive.rel1, transitive.rel2]
}
});
}
}
}
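validateRelationships also depends on two undefined helpers, findContradictoryRelationships and findTransitiveRelationships. The sketch below is one hypothetical implementation whose output shapes match how validateRelationships consumes them ({ rel1, rel2 } and { source, target, inferredType, rel1, rel2 }). Which types count as asymmetric or transitive is an assumption: is_a is left out of the transitive set because, as defined above, it covers both "type of" and "instance of", and chaining two instance-of links is not valid in general.

```typescript
interface Relationship {
  source: string;
  target: string;
  type: string;
  attributes: Record<string, unknown>;
}
interface Contradiction { rel1: Relationship; rel2: Relationship; }
interface TransitiveCandidate {
  source: string;
  target: string;
  inferredType: string;
  rel1: Relationship;
  rel2: Relationship;
}

// Assumption: A -[t]-> B together with B -[t]-> A is contradictory for these types
const ASYMMETRIC_TYPES = new Set(["preceded", "followed", "includes", "is_part_of", "contains"]);
// Assumption: "A preceded B" also contradicts "A followed B"
const TEMPORAL_INVERSE: Record<string, string> = { preceded: "followed", followed: "preceded" };
// Assumption: chaining two links of one of these types yields a valid link of the same type
const TRANSITIVE_TYPES = new Set(["includes", "is_part_of", "contains"]);

function findContradictoryRelationships(graph: { relationships: Relationship[] }): Contradiction[] {
  const rels = graph.relationships;
  const found: Contradiction[] = [];
  for (let i = 0; i < rels.length; i++) {
    for (let j = i + 1; j < rels.length; j++) {
      const a = rels[i];
      const b = rels[j];
      const reversed = a.source === b.target && a.target === b.source && a.source !== a.target;
      const sameDirection = a.source === b.source && a.target === b.target;
      if ((reversed && a.type === b.type && ASYMMETRIC_TYPES.has(a.type)) ||
          (sameDirection && TEMPORAL_INVERSE[a.type] === b.type)) {
        found.push({ rel1: a, rel2: b });
      }
    }
  }
  return found;
}

function findTransitiveRelationships(graph: { relationships: Relationship[] }): TransitiveCandidate[] {
  const found: TransitiveCandidate[] = [];
  for (const rel1 of graph.relationships) {
    if (!TRANSITIVE_TYPES.has(rel1.type)) continue;
    for (const rel2 of graph.relationships) {
      // Chain A -[t]-> B and B -[t]-> C into a candidate A -[t]-> C, skipping cycles back to A
      if (rel2.type === rel1.type && rel2.source === rel1.target && rel2.target !== rel1.source) {
        found.push({ source: rel1.source, target: rel2.target, inferredType: rel1.type, rel1, rel2 });
      }
    }
  }
  return found;
}
```

Inferred candidates that already exist in the graph are filtered out by validateRelationships itself, so the finder does not need to deduplicate against existing relationships.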
This document has covered the technical details of Qwello's Knowledge Graph System: its structure, entity and relationship types, and the algorithms for entity resolution, graph merging, and enrichment. Because entities and relationships share a fixed structure with open-ended attributes, the system can represent diverse knowledge domains while keeping a consistent structure for visualization and querying.