CivicLens | About

About CivicLens.

CivicLens aims to put the public back in public commenting by making commenting opportunities more accessible and approachable.

How CivicLens Works

How we find existing public commenting opportunities

Regulations.gov is a federal government site that allows individuals to understand and comment on federal proposed rules, rules, public notices, or other documents.

We pull data on all public commenting opportunities from the regulations.gov API every night. We were inspired by a couple of prior projects (regulations-public-comments and regulations-comments-downloader), and we build on their work by:

adding a guide to how the commenting process works, and
making data from regulations.gov more understandable by improving the search functionality and providing meaningful analysis on existing comments.

How we make public commenting more accessible

We seek to make public commenting easier to understand by synthesizing information from regulations.gov, including:

Document titles: We make regulation document titles more readable and easier to understand, cutting out confusing government jargon.
Unique comments and form letters: We compare all comments posted on a document and identify which comments are unique and which appear to come from form letters.
Representative comments: We group similar comments together and identify comments which “represent” that group. Our goal is to make it easier to quickly understand a range of perspectives on a given topic.
Comment topics and sentiment: We identify the topics addressed in different groups of comments as well as the sentiment (how positive or negative the tone is) of all comments.

We explain our specific methodology in the section below.

How we use AI

CivicLens uses natural language processing and Large Language Models (LLMs) to make finding and commenting on regulations more accessible. These techniques generate useful content and information, but they can be imperfect.

The models we are using are “unsupervised”, meaning that there is no built-in method to verify their accuracy. We have built in safeguards to ensure the content and information we provide is as reliable as possible, but some errors or mis-generated text could occur.

Methodology

Document titles

When you search CivicLens for regulations to comment on, the titles you see for popular regulations are generated by natural language processing. We found that document titles on regulations.gov can be hard to understand and full of government jargon, so we used Google's flan base model to create more understandable titles in plain English based on the document summary.

Unique comments and form letters

When searching for a document or looking at a specific document page, you'll see numbers indicating how many unique comments versus form letter submissions there are. These numbers are estimates created by comparing the semantic similarity of comments. If many comments are at least 99% similar to one another, we assume they are form letters. Otherwise, we count them as unique comments. We measure this similarity using SBERT paraphrase mining, which calculates the cosine similarity of comment pairs.
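The thresholding idea can be illustrated with a minimal sketch. It is not the CivicLens implementation: a toy bag-of-words vector stands in for SBERT embeddings so the example has no dependencies, and the names embed, cosine, and count_unique_and_form are hypothetical. It also assumes that a comment counts as a form letter as soon as it has one near-duplicate, since the exact rule is not spelled out here.

```python
import math
import re
from collections import Counter
from itertools import combinations


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for an SBERT embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def count_unique_and_form(comments, threshold=0.99):
    """Return (unique, form) counts; a comment is 'form' if any other
    comment is at least `threshold` similar to it (an assumed rule)."""
    vectors = [embed(c) for c in comments]
    form = set()
    for i, j in combinations(range(len(comments)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            form.update((i, j))
    return len(comments) - len(form), len(form)


comments = [
    "Please protect our rivers and streams from pollution.",
    "Please protect our rivers and streams from pollution.",
    "Please protect our rivers and streams from pollution!",
    "This rule would raise costs for small farms in my county.",
]
unique_count, form_count = count_unique_and_form(comments)
print(unique_count, form_count)
```

With real sentence embeddings, paraphrased form letters would also score close to each other, which a word-count vector cannot capture.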

Representative comments

On popular document pages, you'll see a display of what we call "representative comments". We group similar comments together and identify comments which “represent” that group. To find these comments, we create a network graph of all the comments and connect them by their similarity score. Then we identify network clusters (Louvain Communities) as groups of similar comments. The most central comment in the cluster is the one that is most semantically similar to all the comments in the cluster. We take all the central comments and display them as representative comments.
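As a rough illustration of the grouping and "most central comment" steps, here is a dependency-free sketch. It makes two loud assumptions: connected components over a similarity threshold stand in for Louvain communities (a simplification, not the same algorithm), and centrality is taken to be the highest total similarity to the other members of the group. The similarity scores and function names are hypothetical.

```python
from collections import defaultdict


def connected_groups(n, similarity, threshold):
    """Group comment indices by linking pairs whose similarity meets the
    threshold. Connected components are a simplification that stands in
    for Louvain community detection."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the group's root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] >= threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return list(groups.values())


def most_central(group, similarity):
    """Return the member with the highest total similarity to the others."""
    return max(group, key=lambda i: sum(similarity[i][j] for j in group if j != i))


# Hypothetical pairwise similarity scores for five comments.
similarity = [
    [1.0, 0.8, 0.6, 0.1, 0.0],
    [0.8, 1.0, 0.9, 0.2, 0.1],
    [0.6, 0.9, 1.0, 0.1, 0.0],
    [0.1, 0.2, 0.1, 1.0, 0.7],
    [0.0, 0.1, 0.0, 0.7, 1.0],
]
groups = connected_groups(5, similarity, threshold=0.5)
representatives = sorted(most_central(g, similarity) for g in groups)
print(groups, representatives)
```

In the first group, comment 1 is the closest to both of its neighbors, so it is the one displayed. In a two-member group the members tie, and this sketch simply keeps the first.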

Comment topics and sentiment

We identify the topics addressed by representative comments and their sentiment. Any document with representative comments on CivicLens will also have a graph that shows the main commenting topics as well as the average sentiment expressed by commenters on each topic.

The topics are created by first clustering common themes in comments through a Hierarchical Dirichlet Process. The highest probability terms for each topic are chosen to build a vector of strings to represent topics for a document. These terms are filtered through a version of Google's flan model fine-tuned to create labels for topics.

Representative comments are additionally analyzed for sentiment using a RoBERTa-base model. Comments are classified as positive, negative, or neutral based on their sentiment. Statistics on topics and their sentiments are calculated by counting the number of representative comments that correspond to a given topic and cross-referencing that count with the representative comments' sentiment.

Document topics are also used as additional search parameters, which allows our site to return better search results for topics that users are interested in than are possible on regulations.gov.
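The counting step for topic and sentiment statistics can be sketched as a simple cross-tabulation. The records, field names, and the function topic_sentiment_counts below are hypothetical. The sketch assumes each representative comment has already been assigned one topic label and one sentiment label by the upstream models.

```python
from collections import Counter, defaultdict

# Hypothetical representative comments, already labeled upstream.
representative_comments = [
    {"topic": "water quality", "sentiment": "positive"},
    {"topic": "water quality", "sentiment": "negative"},
    {"topic": "water quality", "sentiment": "positive"},
    {"topic": "compliance costs", "sentiment": "negative"},
    {"topic": "compliance costs", "sentiment": "neutral"},
]


def topic_sentiment_counts(comments):
    """Count representative comments per topic, broken down by sentiment."""
    counts = defaultdict(Counter)
    for comment in comments:
        counts[comment["topic"]][comment["sentiment"]] += 1
    return {topic: dict(by_sentiment) for topic, by_sentiment in counts.items()}


stats = topic_sentiment_counts(representative_comments)
for topic, by_sentiment in stats.items():
    print(topic, by_sentiment)
```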

About us.

We are a group of University of Chicago Master of Science in Computational Analysis and Public Policy students. We made this site as a class project for Software Engineering for Civic Tech, taught by Professor James Turk.

We embarked on this project because we felt there was an opportunity to make public commenting opportunities more accessible. We felt it was important to be faithful to the underlying data; we have sought to summarize and communicate the documents and comments available on regulations.gov as accurately as possible.

Our code lives on GitHub and our documentation lives on CivicLens Docs. We welcome comments, bug reports, and feedback!

Claire Boyd

Claire is a data scientist and quantitative researcher with a passion for experimenting with new recipes.

Abe Burton

Abe is a data scientist with a background in econometrics and computer science who spends his free time convincing people that disc golf is a real sport.

Gregory Ho

Greg is a researcher with a background in poverty, housing, and urban development, passionate about applying mathematical and computational methods to enhance societal well-being.

Andrew Dunn

Andrew Dunn is a public sector consultant who enjoys solving new problems with data science.

Reza Pratama

Reza is an internal auditor skilled in data science and information systems audit.

Jack Gibson

Jack is a technologist and public policy researcher with dreams of spending more time on his bicycle than his laptop.

John Christenson

John is a political technologist and analytics engineer who desires to utilize technology to make people's lives easier. When he's not working, he can often be found daydreaming that he's chilling in the ocean waves.