Case-Based Reasoning

Mentor: Dr. Andrew McCallum (CICS, UMass Amherst)

Collaborator: Rajarshi Das, Nicholas Monath, Manzil Zaheer

A knowledge graph (KG) is formed by the entities as nodes and relations between entities as directed edges. KG completion is the problem of answering queries of the type (e, r, ?) i.e. given a query entity e and query relation r, find the entity e_ans that has the relation r with e. As you can imagine, this task is used to complete partial KGs by guessing missing edges.

Given a new problem, a Case-based Reasoning system:

  1. Retrieves similar cases seem in the past
  2. Reuses solutions to previously solved cases
  3. Revises solutions to apply to the current query
  4. Retains modified solutions as new cases

We adapt this system for knowledge graph completion by using the training graph as the store of solved cases. Similar entities are found by nearest neighbors using a simple one-hot representation based on connected relations (this captures the notion of entity types). Solutions are paths (sequence of relations) that reach the answer node. Then starting from the query entity, we traverse the graph using the paths collected. If such a system can achieve high accuracy, it is appealing owing to its simplicity, interpretability, and scalability.

We demonstrated that this simple system is highly performant in A Simple Approach to Case-Based Reasoning in Knowledge Bases.

After performing an error analysis on this system, we then imposed a probabilistic structure to allow the system to weight the answers. Concretely, we use the prior probability of a path reaching the tail entity for the given query relation and the precision of that path type to downweight spurious paths. We use clustering to help aggregate probabilitities. We demonstrated in Probabilistic Case-based Reasoning for Open-World Knowledge Graph Completion that this systems matches/outperforms baselines on FB122, WN18RR and Nell-995. Moreover, we demonstrated that this method can be applied to dynamic growing KGs with minimal overhead whereas the best embedding based methods deteriorate with successive KG updates.

We further entended this strategy to Case-based Reasoning for Natural Language Queries over Knowledge Bases by relying on the strengths of pre-trained generative language models. In this setting, we are expected to generate a SPARQL program given a natural language question. We utilized cases as additional input when conditionally generating the output program (this is reminiscent of prompting and in-context learning that has gained attention in recent years).

The previous approach requires supervision in the form of output SPARQL programs. In Knowledge Base Question Answering by Case-based Reasoning over Subgraphs, we further relax this assumption; we diesn a system to answer natural language questions when only answer annotations are available. We designed a contrastive learning based training scheme for graph neural networks (GNN). The answer nodes are found by matching analogous nodes between query and case subgraphs.

Ameya Godbole
Ameya Godbole
PhD Student

My research interests are reasoning and generalization in NLP.