CivicLens aims to put the public back in public commenting by making commenting opportunities more accessible and approachable.
Regulations.gov
is a federal government site that allows
individuals to understand and comment on federal proposed
rules, rules, public notices, or other documents.
We pull data on all public commenting opportunities from
the regulations.gov API on a nightly basis. We were inspired
by a couple of prior projects (
regulations-public-comments
and
regulations-comments-downloader
) and we build on their work by:
We seek to make public commenting easier to understand by synthesizing information from regulations.gov, including:
CivicLens uses natural language processing and Large Language Models
(LLMs) to make finding and commenting on regulations more accessible.
These techniques generate useful content and information but they can
be imperfect.
The models we are using are “unsupervised”, meaning that there is no
built-in method to verify their accuracy. We have built in safeguards to
ensure the content and information we provide is as reliable as possible,
but some errors or mis-generated text could occur.
When you search CivicLens for regulations to comment on, the titles you see for popular regulations are generated by natural language processing. We found that document titles on regulations.gov can be hard to understand and full of government jargon, so we used Google's Flan base model to create more understandable titles in plain English based on the document summary.
When searching for a document or looking at a specific document page, you'll see numbers indicating how many comments are unique and how many are form letter submissions. These numbers are estimates created by comparing the semantic similarity of comments. If lots of comments are 99% similar, we assume they are form letters. If not, we count them as unique comments. We measure this similarity using the SBERT paraphrase mining model, which calculates the cosine similarity of comment pairs.
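As a rough illustration of the similarity check described above, here is a minimal sketch in plain Python. It assumes each comment has already been turned into an embedding vector (on CivicLens this would come from the SBERT model; the toy vectors here are made up), and the function names and the rule "flag any comment that has at least one near-duplicate" are assumptions for illustration, not the site's actual code.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def count_form_letters(embeddings, threshold=0.99):
    """Return (unique_count, form_letter_count).

    A comment is treated as a form letter if it is at least `threshold`
    similar to some other comment (a simplifying assumption).
    """
    form_letter_ids = set()
    for i, j in combinations(range(len(embeddings)), 2):
        if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
            form_letter_ids.update((i, j))
    form_count = len(form_letter_ids)
    return len(embeddings) - form_count, form_count


# Hypothetical toy embeddings: the first two comments are near-duplicates.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.999, 0.01, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
unique_count, form_count = count_form_letters(toy_embeddings)
```

Comparing every pair is quadratic in the number of comments, so this only shows the idea on a small sample.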
On popular document pages, you'll see a display of what we call "representative comments". We group similar comments together and identify comments which “represent” that group. To find these comments, we create a network graph of all the comments and connect them by their similarity score. Then we identify network clusters (Louvain Communities) as groups of similar comments. The most central comment in the cluster is the one that is most semantically similar to all the comments in the cluster. We take all the central comments and display them as representative comments.
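The "most central comment" step can be sketched as follows. This example assumes the clusters have already been found (on CivicLens, by Louvain community detection, which is not reimplemented here) and that a pairwise similarity matrix is available. The matrix, the cluster assignments, and the function name `most_central` are all hypothetical.

```python
def most_central(cluster, similarity):
    """Return the comment index with the highest mean similarity
    to the other comments in the same cluster."""

    def mean_similarity(i):
        others = [j for j in cluster if j != i]
        if not others:
            return 0.0  # a single-comment cluster represents itself
        return sum(similarity[i][j] for j in others) / len(others)

    return max(cluster, key=mean_similarity)


# Hypothetical symmetric similarity scores between five comments.
similarity = [
    [1.0, 0.9, 0.8, 0.1, 0.2],
    [0.9, 1.0, 0.7, 0.2, 0.1],
    [0.8, 0.7, 1.0, 0.1, 0.1],
    [0.1, 0.2, 0.1, 1.0, 0.6],
    [0.2, 0.1, 0.1, 0.6, 1.0],
]

# Assumed output of the clustering step: two groups of comment indices.
clusters = [[0, 1, 2], [3, 4]]

representatives = [most_central(c, similarity) for c in clusters]
```

When two comments tie, as in the second cluster here, this sketch keeps whichever appears first in the cluster.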
We identify the topics addressed in representative comments and the sentiment they express. Any document with representative comments on CivicLens will also have a graph that shows the main commenting topics as well as the average sentiment expressed by commenters on each topic. The topics are created by first clustering common themes in comments through a Hierarchical Dirichlet Process. The highest probability terms for each topic are chosen to build a vector of strings to represent topics for a document. These terms are filtered through a version of Google's Flan model fine-tuned to create labels for topics. Representative comments are additionally analyzed for sentiment using a RoBERTa-base model. Comments are classified as either positive, negative, or neutral based on their sentiment. Statistics on topics and their sentiments are calculated by counting the number of representative comments that correspond to a given topic and cross-referencing that with the representative comments' sentiment. Document topics are also used as additional search parameters, which lets our site return better results for the topics users are interested in than are possible on regulations.gov.
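The counting step at the end of this process can be sketched as below. It assumes each representative comment already carries its topic labels and a sentiment label from the earlier modeling steps. The record layout, the sample data, and the function name `topic_sentiment_counts` are hypothetical.

```python
from collections import Counter, defaultdict


def topic_sentiment_counts(comments):
    """Count representative comments per topic, broken down by sentiment."""
    counts = defaultdict(Counter)
    for comment in comments:
        for topic in comment["topics"]:
            counts[topic][comment["sentiment"]] += 1
    return {topic: dict(by_sentiment) for topic, by_sentiment in counts.items()}


# Hypothetical representative comments with labels already assigned.
representative_comments = [
    {"topics": ["water quality"], "sentiment": "negative"},
    {"topics": ["water quality", "cost"], "sentiment": "positive"},
    {"topics": ["cost"], "sentiment": "negative"},
    {"topics": ["water quality"], "sentiment": "negative"},
]

stats = topic_sentiment_counts(representative_comments)
```

A comment tagged with several topics is counted once under each of them, which is one possible reading of "correspond to a given topic".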
We are a group of University of Chicago Master of Science in Computational Analysis and
Public Policy students. We made this site as a class project for
Software Engineering for Civic Tech,
taught by Professor James Turk.
We embarked on this project because we felt there was an opportunity to make public
commenting opportunities more accessible. We felt it was important to be faithful to
the underlying data; we have sought to summarize and communicate the documents and
comments available on regulations.gov as accurately as possible.
Our code lives on
GitHub
and our documentation lives on
CivicLens Docs.
We welcome comments, bug reports, and feedback!