| Title |
Predicting patent litigation outcomes using large language model-based data extraction and machine learning |
| Translation of Title |
Patentų ginčų baigčių prognozavimas taikant didžiaisiais kalbos modeliais pagrįstą duomenų išgavimą ir mašininį mokymąsi. |
| Authors |
Stoškus, Tomas |
| Pages |
53 |
| Keywords [eng] |
patentų ginčai, rezultatų prognozavimas, dideli kalbos modeliai, informacijos išgavimas, mašininis mokymasis, teisės analizė; patent litigation, outcome prediction, large language models, information extraction, machine learning, legal analytics |
| Abstract [eng] |
Existing patent litigation datasets lack reliable outcome information, limiting research on how cases resolve after filing. This study extends the USPTO Patent Litigation Docket Reports Data by using a large language model to extract litigation outcomes from docket entry text, then links these labeled cases to patent characteristics from PatentsView and constructs attorney experience metrics from the litigation history. The extraction pipeline classified 29,084 single-patent cases filed between 2003 and 2020, achieving 81.1% accuracy when validated against hand-coded labels on a sample of 300 cases. We decompose outcome prediction into three binary classification tasks corresponding to litigation stages: survival (dismissal versus continuation), settlement (settlement versus adjudication), and adjudication (plaintiff versus defendant win). XGBoost achieved the best discrimination for survival (AUC 0.771) and settlement (AUC 0.716), while Random Forest performed best for adjudication (AUC 0.778, 95% CI: 0.718–0.834). Feature importance analysis revealed that attorney characteristics and venue indicators dominated early-stage prediction, patent complexity measures gained prominence at the settlement stage, and technology classification was the strongest predictor at adjudication. Patent text embeddings did not improve prediction at any stage. Temporal validation using a 2016 cutoff showed performance degradation for settlement prediction, reflecting shifts in the litigation landscape during the study period. |
| Dissertation Institution |
Vilniaus universitetas. |
| Type |
Master thesis |
| Language |
English |
| Publication date |
2026 |