Motivation: Asset Idle Time
Asset-sharing companies capture the lost value of idle assets by predicting who will need them and when, saving customers money and skimming a little off the top as profit. The key value these services add is in understanding and predicting demand for these assets.
The D.C. Bike Share system has a similar problem: it wants to predict how many bikes are needed so it can maintain the right number of bikes for the city.
Data: DC Bike Share / Kaggle
Follow along in my Jupyter Notebook here: https://github.com/Ryanglambert/dc_bike_share_analysis/blob/master/Bike_Share_EDA.ipynb
Given date, time, weather, and other variables we will predict how many bikes will be used in a given hour. Let's take a look at the data!
I'll also be using Kaggle.com to check how well our model is doing against the "hold out" test set.
- **datetime** - hourly date + timestamp
- **season** - 1 = spring, 2 = summer, 3 = fall, 4 = winter
- **holiday** - whether the day is considered a holiday
- **workingday** - whether the day is neither a weekend nor holiday
- **weather** -
    - 1: Clear, Few clouds, Partly cloudy
    - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- **temp** - temperature in Celsius
- **atemp** - "feels like" temperature in Celsius
- **humidity** - relative humidity
- **windspeed** - wind speed
- **count** - number of total rentals
Plenty of combinations to consider. Using `seaborn`, let's visualize which variables correlate with bike share use. (I've made dummy variables for all categorical variables.)
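Here's a minimal sketch of that preprocessing step with `pandas` on a made-up slice of data (the values below are illustrative, not the actual Kaggle rows):

```python
import pandas as pd

# Hypothetical slice of the training data, not the real Kaggle rows
df = pd.DataFrame({
    "season": [1, 2, 3, 4],
    "weather": [1, 1, 2, 3],
    "temp": [9.8, 22.1, 18.0, 4.1],
    "count": [40, 210, 120, 8],
})

# One-hot encode the categorical columns so each level gets its own 0/1 column
dummies = pd.get_dummies(df, columns=["season", "weather"], dtype=int)

# Correlation of every feature with the rental count;
# sns.heatmap(dummies.corr()) renders the full matrix as a heatmap
corr = dummies.corr()["count"]
print(corr)
```

With the dummies in place, every categorical level gets its own correlation with `count`, which is what the heatmap below visualizes.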
Categorical Variables' Effect on Bike Share Use
Let's look at how time of day contributes to bike use.
Time of Day Effect On Bike Use
Zooming out a bit, the variance is many multiples of the expected value. I did not expect this at all.
Selecting The Right Link Function: Generalized Linear Models
There is a cousin of the Poisson distribution called the Negative Binomial distribution. It is the same as the Poisson with one small difference: where the Poisson's variance equals its mean, the negative binomial's variance still grows with the mean but can be some multiple of it. That extra spread is exactly the overdispersion we see in this data.
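A quick simulation makes the difference concrete. Both distributions below have mean 10, but the negative binomial's variance is roughly three times its mean (the parameter values here are arbitrary, chosen just to illustrate the point):

```python
import numpy as np

rng = np.random.default_rng(42)

# Poisson: variance equals the mean
pois = rng.poisson(lam=10.0, size=100_000)

# Negative binomial with mean 10: variance = mean + mean^2 / n  (> mean)
n, mean = 5.0, 10.0
p = n / (n + mean)  # numpy's (n, p) parameterization
nb = rng.negative_binomial(n, p, size=100_000)

print(pois.mean(), pois.var())  # both close to 10
print(nb.mean(), nb.var())      # mean close to 10, variance close to 30
```

When the sample variance of your counts is a multiple of the sample mean, as it is here, the negative binomial is the better fit.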
First Model: GLM with Negative Binomial Link Function
Pearson Chi^2: 1320
Our histogram of residuals is approximately normal. From these two plots, I feel confident I've picked the right kind of distribution.
Kaggle Leader Board: 2522
Second Model: GLM Negative Binomial Link + Interaction terms
Aside: Interaction Terms
Interaction terms with dummy variables.
Interaction terms are the pairwise products of our features, ignoring the squared terms. Think of expanding (A + B)^2 = A^2 + 2AB + B^2: we're only interested in the 'AB' cross-term and we'll ignore all the others. Scikit-learn makes this easy for us.
Any cross-term involving a dummy variable equal to 0 simply goes to zero, so the corresponding parameter only contributes when that dummy is active. Here's an example of what that looks like.
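A small sketch with scikit-learn's `PolynomialFeatures` (the dummy names here are hypothetical stand-ins, not the actual model's columns):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Columns: two hypothetical dummies ("workingday", "rush_hour") and "temp"
X = np.array([
    [1, 1, 20.0],
    [1, 0, 25.0],
    [0, 1, 15.0],
])

# interaction_only=True keeps the cross-terms and drops the squared terms
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_inter = poly.fit_transform(X)

# Resulting columns:
# workingday, rush_hour, temp, workingday*rush_hour, workingday*temp, rush_hour*temp
print(X_inter)
```

In the second row, `rush_hour` is 0, so every cross-term involving it is 0 and those parameters drop out of the prediction for that hour.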
Interaction Model Performance
Pearson Chi^2: 599
Kaggle Leaderboard: 1827
What is RMSLE? (Root Mean Squared Log Error)
Like RMSE (Root Mean Squared Error), RMSLE is a fit score, but it compares the log of the model's outputs with the log of the actuals (specifically log(1 + x), so zero counts are handled). The reason for this is the same reason we're using a non-linear link function: we're predicting non-negative count data, where relative error matters more than absolute error.
You can interpret RMSLE roughly as how many factors of the constant e (e ≈ 2.71828, that e) the predictions are off by.
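The metric is a one-liner in numpy (function name mine, not Kaggle's):

```python
import numpy as np

def rmsle(predicted, actual):
    """Root Mean Squared Log Error: RMSE computed on log(1 + x)."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return np.sqrt(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2))

# Predictions off by a constant factor give a roughly constant RMSLE,
# regardless of the size of the counts
print(rmsle([200, 20], [100, 10]))
```

For large counts, being off by a factor of k gives an RMSLE of about ln(k), which is why exponentiating the score below recovers the error factor.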
Back to evaluating the models: recall the scores were 0.7 for the first model and 0.5 for the model with interaction terms.
First model: e^0.7 ≈ 2.01
On average, our model was off by a factor of about 2: if the true count were 100 cyclists, our model would typically predict somewhere between 50 and 200.
Model with interaction terms: e^0.5 ≈ 1.65
On average, our model with interaction terms was off by a factor of about 1.65: if the true count were 100 cyclists, it would typically predict between about 61 and 165.
Something I might try in the future is model stacking, where the predictions of several base models are fed as features into a second-stage model. There is also ensembling, where multiple models are given a weighted vote on the outcome variable.