AI-Me

A roadmap for developing an intelligence that answers questions and directs users through this blog, improving search and understanding of me. I call her AI-Me (Amy), a robotic sister and listener.

My Motivation

I’ve gone through job applications and found myself mulling over which relevant projects to choose for my cover letter. Most of the time, I have more than one, and it’s hard to keep things short and simple without ballooning into a two-page cover letter. With the completion of my CoverLetter Template to handle visual style, I’m now looking at ways to improve the structure for choosing project and workplace examples. For comparison, I have a plethora of hand-written cover letters I’ve sent to companies. I spend roughly 15-30 minutes writing each one, and when I don’t get the job, it hurts. The upside: I have plenty of data from my past cover letters, which reflects me and is not AI-generated or scraped. Truly growing from failures.

Audience Personas

While designing this, I keep in mind the types of visitors to my portfolio/personal website and wish for Amy to be able to answer all of their burning questions.

  • Recruiters
  • Friends
  • Wanderers

Recruiters

These visitors find me through the portfolio and resume links I provide when I apply. They’ll be quickly evaluating a candidate and will probably read one or two projects to see if I’m a match. I generally make it a rule to only apply to companies I’m interested in, so when they visit, it’s like catching a big fish.

Their journey focuses on the about page and projects. It’s rare they’ll get to a blog post unless it’s a tie-breaker.

Friends

I write my blog with the intent of helping my friends learn more about what I do. I have a wide range of technical and non-technical friends, from Boomers to Zoomers, and my culture and values fit a mix of both. You’ll find more about this as you read the blogs.

Wanderers

I keep my blog personal and anti-SEO. Ironically, this means I use my SEO skills to develop a website that will not appear high in search results: keeping it minimalist and metadata-free, and locking out web scrapers with a robots.txt. Wanderers find me organically through my GitHub, and only my GitHub, or through a shared friend.

I do this as a filter, so that those who find my site are real people I want to interact with, not some random corporation or scammer using my personal information. I trust all site visitors who land here, and as part of that trust I don’t censor my PII in my resume or blogs.

Release Log

  • Semantic Sheets (Research)
  • Resume Prompts (Sourced)
  • Blog Understandings (Prompted)
  • Confidence Intervals (Evaluation)
  • Tuning with Rules Reranker (Manual)
  • Integration with site (Code)

Semantic Sheets

I’ve looked through many options but decided to go with the simplest choice, Semantic Reactor, for input/response models. There were many other considerations besides Semantic Reactor. My ideal choice would have been a K-Graph (Knowledge Graph) based model to establish connections between the user’s question intent and the knowledge within my resume and blogs. That choice would’ve allowed me to cite the sources the data came from. With Semantic Reactor, no Knowledge Graph is created, so the model instead picks from handwritten responses depending on question intent. It’s nowhere near as cool, but it’s also not as complex (or data intensive). After all, it runs on client-side JavaScript.

Still, this approach is better than a generic 1:1 hardcoded decision tree: if (“this exact question”) then (“this exact answer”). With semantics, it looks more like: if (“this open-ended question”) then (“the most probabilistic answer scoring greater than 0.6”) else (“a generic I don’t know”).
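To make that concrete, here’s a minimal sketch of the matching idea, assuming the Universal Sentence Encoder from TensorFlow.js (the model family behind Semantic Reactor). The 0.6 cutoff and the fallback mirror the description above; the candidate pairs are examples drawn from this post, and the rest is illustrative, not Semantic Reactor’s actual code.

```javascript
// A minimal sketch of intent matching with the Universal Sentence
// Encoder. The candidate list and 0.6 threshold are illustrative.
import '@tensorflow/tfjs';
import * as use from '@tensorflow-models/universal-sentence-encoder';

const candidates = [
  { question: 'Why did you start blogging?',
    answer: 'To help my friends learn more about what I do.' },
  { question: 'How long do you spend on a cover letter?',
    answer: 'Roughly 15-30 minutes per hand-written letter.' },
];

// Plain cosine similarity between two embedding vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function answer(query) {
  const model = await use.load();
  // Embed the query and every candidate question in one batch.
  const tensor = await model.embed([query, ...candidates.map(c => c.question)]);
  const [queryVec, ...candidateVecs] = await tensor.array();

  // Keep the best-scoring candidate; below 0.6, fall back.
  let best = { score: -Infinity, answer: '' };
  candidateVecs.forEach((vec, i) => {
    const score = cosine(queryVec, vec);
    if (score > best.score) best = { score, answer: candidates[i].answer };
  });
  return best.score > 0.6 ? best.answer : "I don't know.";
}
```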

Resume Prompting

My next step is writing questions about my resume based on my interview experience, job descriptions, and what I want to share about my resume. On my resume, I write about my skills, growth, and workplace. Breaking down skills, there are technical and non-technical, but they all boil down to 3 things: Code, Write, Lead. In terms of my growth, I focus on the environment: Academia, Startups, Big Tech. Finally, for workplace, I describe my audiences and industry.

So first I’ll answer questions about myself, load the responses into the Semantic Sheets, and see if the model can properly match the answers.
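As a hedged sketch of that verification pass, I could run a handful of test questions through the answer() matcher sketched under Semantic Sheets and flag any that fall back to “I don’t know”; the questions below are placeholders, not my real prompt set.

```javascript
// Hypothetical verification pass, reusing answer() from the earlier
// sketch. Any question that falls back to the default gets flagged.
const testQuestions = [
  'What are your technical skills?',
  'Have you worked at a startup?',
];

for (const q of testQuestions) {
  const reply = await answer(q); // answer() from the Semantic Sheets sketch
  console.log(reply === "I don't know." ? `MISS: ${q}` : `OK: ${q} -> ${reply}`);
}
```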

Blog Understandings

Once the resume prompting and verification are done, I’ll move on to expanding its knowledge by feeding it information from the blog. This is a major reason why I wanted to get tags running on this site, so I could include tag links for post discovery. I have quite a lot of posts, so I may ask an AI to assist me in condensing this information. I’ll vet the condensed information so it remains purely my own ideas.

For instance, it should be able to answer what was asked in the Codex challenge, how to do cold calls, why I started blogging, and more.

Confidence Intervals

I’m not sure how best to implement this part, but I see two choices: a frequency-based matching algorithm or a transformer model. The frequency-based matching algorithm is similar to what I used for SuppCheck at a hackathon, where we counted keywords and returned the highest number of matches. This can be relatively easy to do with a wordbank of keywords that correlate to each of the 6 distinct tags, but the performance will likely be poor. On the other hand, there are transformer models; I’m looking at BERT or FastText to handle tagging the responses and questions to return.
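A minimal sketch of the frequency-based option, assuming a hand-built wordbank per tag; the tag names and keywords below are placeholders standing in for the site’s six real tags.

```javascript
// Sketch of the frequency-based tagger: count wordbank hits per tag
// and return the tag with the most matches. Placeholder tags/keywords.
const wordbank = {
  code:  ['javascript', 'python', 'bug', 'deploy'],
  write: ['blog', 'post', 'draft', 'cover'],
  lead:  ['team', 'mentor', 'plan', 'manage'],
};

function tagQuery(query) {
  const words = query.toLowerCase().split(/\W+/);
  let best = { tag: null, hits: 0 };
  for (const [tag, keywords] of Object.entries(wordbank)) {
    const hits = words.filter(w => keywords.includes(w)).length;
    if (hits > best.hits) best = { tag, hits };
  }
  return best.tag; // null when nothing in the wordbank matched
}

console.log(tagQuery('How do you deploy a Python bug fix?')); // -> 'code'
```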

Confidence intervals will help secure the right recommendations for a query. So if you ask Amy a question, she can pull the relevant posts to read.

Tuning with Rules Reranker

The nice thing about Semantic Reactor is that fine-tuning can be wholly controlled through a set of rules that I fully control. Any candidate list can have rules attached to it: add another sheet (tab) and name it after your candidate list with “.rules” appended. These let me assign weights to make “I don’t know” responses less common and to tune for added depth in the model.
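To illustrate the reranking idea only (this is not Semantic Reactor’s actual rules format), a rule could map a response pattern to a weight that scales the similarity score before picking the winner; the patterns and weights below are assumptions.

```javascript
// Illustration of rules-based reranking: scale each candidate's
// similarity score by a weight when its response matches a rule.
const rules = [
  { pattern: /i don't know/i, weight: 0.5 }, // demote the fallback
  { pattern: /cover letter/i, weight: 1.2 }, // boost a favored topic
];

function rerank(scored) {
  return scored
    .map(({ answer, score }) => {
      const rule = rules.find(r => r.pattern.test(answer));
      return { answer, score: rule ? score * rule.weight : score };
    })
    .sort((a, b) => b.score - a.score);
}

// e.g. rerank([{ answer: "I don't know.", score: 0.62 },
//              { answer: 'See my cover letter template post.', score: 0.58 }])
// now prefers the cover-letter response (0.696 vs 0.31).
```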

Rules, Disclaimers, Pitfalls

This isn’t perfect, and here’s why.

AI-Me is not Me

Amy will not have any information that I did not share on my blog. For instance, if you ask her a politically charged question, she won’t know how to answer and will either give an “I don’t know” or a wrong answer. Likewise, I’m training the model to detect when an answer is restricted and respond with “I cannot say, but perhaps it is under NDA”.

Time is frozen

My opinions may change over time, but the model is not retrained on the most recent data at every instant of my life. So, don’t ask it time-gated questions; it has no concept of time. This means that as I add more blog posts, it won’t know about the new information until I manually retrain and fine-tune it, which will be rare. I will add a knowledge cut-off disclaimer for Amy when ready.

Its understanding and knowledge are at a fixed point. If events change, such as a software deprecation, the responses become wrong.

If I’m not right, it’s not right

I’m human and can be wrong, so please do not cite Amy as an irrefutable reference. Just like any chat bot out there, it’s not equipped with the knowledge to answer beyond surface-level questions. It uses a BERT-style model instead of GPT, i.e. trained on less data, and is directly optimized to answer focused questions about my experience in my stead. For direct answers, please explore my blog instead.

To further improve the model, I’ll update any content that’s become dated or could be improved to answer more questions.

Progress

When this is ready, I’ll set my sheet to public so you can search the sheet instead of the model.

Semantic Reactor Spreadsheet