adv-nlp-3

PropBank and Universal PropBank

Sources:

Both use the same frame files.

Differences:

Universal PropBank is based on universal dependencies, while PropBank is based on constituency.
- In PropBank files, there are parentheses around predicate and argument labels to signify constituency.
- In Universal PropBank, heads (of a phrase/constituent) are labeled.
UP 1 and 2 are different. We’ll use UP 1.
- In UP 2, annotations are done at the span level (BIO labels).
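
To make the difference between head-level and span-level labels concrete, here is a minimal sketch. The sentence, the label names and the helper functions are made up for illustration; this is not the actual UP file layout.

```python
def spans_to_bio(n_tokens, spans):
    """Span-level labels: spans is a list of (start, end_exclusive, label)."""
    labels = ["O"] * n_tokens
    for start, end, label in spans:
        labels[start] = f"B-{label}"
        for i in range(start + 1, end):
            labels[i] = f"I-{label}"
    return labels


def heads_to_labels(n_tokens, heads):
    """Head-level labels: heads is a list of (head_index, label)."""
    labels = ["_"] * n_tokens
    for index, label in heads:
        labels[index] = label
    return labels


tokens = ["The", "old", "dog", "chased", "the", "cat"]

# Span-level: every token of the argument gets a B-/I- label.
span_labels = spans_to_bio(len(tokens), [(0, 3, "ARG0"), (3, 4, "V"), (4, 6, "ARG1")])

# Head-level: only the head of each argument gets a label.
head_labels = heads_to_labels(len(tokens), [(2, "ARG0"), (3, "V"), (5, "ARG1")])

for token, span_label, head_label in zip(tokens, span_labels, head_labels):
    print(f"{token:8}{span_label:10}{head_label}")
```

In the head-level column only "dog", "chased" and "cat" carry a label, while in the span-level column every token of each argument does.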

Universal PropBank

In universal dependencies, arrows run from heads to dependents.
- In the

Assignments

Both due on March 7th. That’s in 3.5 weeks. That’s about 1.5 weeks per assignment.

Assignment 1

Preprocessing

Replicate sentences for every predicate.
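
A minimal sketch of what replicating could look like, assuming each sentence comes with a list of predicate positions and one column of argument labels per predicate. The function name, the dictionary keys and the example labels are my own assumptions, not part of the assignment.

```python
def replicate_per_predicate(tokens, predicate_indices, label_columns):
    """Return one training instance per predicate in the sentence.

    label_columns[i] holds the argument labels that belong to
    predicate_indices[i] (one label per token).
    """
    instances = []
    for predicate_index, labels in zip(predicate_indices, label_columns):
        instances.append(
            {
                "tokens": list(tokens),
                "predicate_index": predicate_index,
                "labels": list(labels),
            }
        )
    return instances


tokens = ["Mary", "wanted", "to", "leave"]
predicate_indices = [1, 3]  # "wanted" and "leave"
label_columns = [
    ["ARG0", "_", "_", "ARG1"],  # arguments of "wanted" (illustrative)
    ["ARG0", "_", "_", "_"],     # arguments of "leave" (illustrative)
]

instances = replicate_per_predicate(tokens, predicate_indices, label_columns)
for instance in instances:
    print(instance["predicate_index"], instance["labels"])
```

A sentence with two predicates becomes two instances with the same tokens but different predicate positions and label columns.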

Features must be self-sufficient! You can’t use anything other than

In the notebook, print the three features in unencoded form
[ ]

Assignment 2

Post-processing: recombine subword token labels!!!
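
A minimal sketch of the recombination step, assuming there is a list that maps every subword to the index of its original word, with None for special tokens (Hugging Face fast tokenizers expose such a mapping through `word_ids()`). Keeping the label of the first subword is one common convention and an assumption here, not necessarily what the assignment requires.

```python
def recombine_subword_labels(word_ids, subword_labels):
    """Collapse subword-level labels to one label per original word.

    The label predicted for the first subword of each word is kept;
    special tokens (word id None) are skipped.
    """
    word_labels = []
    previous_word_id = None
    for word_id, label in zip(word_ids, subword_labels):
        if word_id is None:
            continue
        if word_id != previous_word_id:
            word_labels.append(label)
            previous_word_id = word_id
    return word_labels


# Hypothetical subword split: [CLS] Mary un ##believ ##ably left [SEP]
word_ids = [None, 0, 1, 1, 1, 2, None]
subword_labels = ["O", "ARG0", "ARGM-MNR", "O", "ARGM-MNR", "V", "O"]

word_labels = recombine_subword_labels(word_ids, subword_labels)
print(word_labels)
```

The result has one label per original word, so it can be compared directly against the gold labels.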

Midterm

[2/3] Basics of SRL
[2/3] Differences between PropBank and Universal PropBank (focus mostly on the data structure of Universal PropBank: be able to read the annotation)
Basics of ML for this course
- [1/3] Basics of traditional ML (feature engineering, feature interactions across architectures: logistic regression, SVM, Naive Bayes)
- [2/3] Basics of Transformers/fine-tuning (incl. subword tokenization)

Reflection on midterm

I’m mostly very happy with the ease with which I completed the test. My doubts arose mainly around:

Feature interactions with logistic regression
Dependency paths

Feature interactions concern whether a model can capture the combined effect of two features. The example given in the exam was clarifying: data points for SRL were plotted in a two-dimensional space. The x-axis expressed whether the target word occurred before or after the predicate. The y-axis expressed whether the sentence was in passive or active voice. This example was chosen because the agent (the subject in active voice), and thus ARG-0, likely occurs before the predicate in active voice and after the predicate in passive voice.

However, a logistic regression model that learns weights for these features separately can’t model this relationship, because it can only draw linear decision boundaries. Every feature is mapped to one weight (per class), so each feature contributes to the score independently of the other features. Take, for example, the feature ‘before/after predicate’: the model can’t learn a single useful weight for it for the ARG-0 class, because in passive sentences ‘after’ is more likely to correlate with ARG-0, while in active sentences ‘before’ is more likely to correlate with ARG-0.

To solve this, features with such interactions should be combined into a single new feature during feature engineering.

For example:

Target word occurs before predicate and sentence has active voice: more likely to be ARG-0
Target word occurs before predicate and sentence has passive voice: less likely to be ARG-0
Target word occurs after predicate and sentence has active voice: less likely to be ARG-0
Target word occurs after predicate and sentence has passive voice: more likely to be ARG-0

As the examples above show, in multiclass logistic regression each of the 4 values of this combined feature gets its own weight, so each value has linear predictive power for the class ARG-0.
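
To check this for myself, here is a small sketch with a hand-written binary logistic regression (ARG-0 vs. not ARG-0) on four idealized data points. The data, the function names and the hyperparameters are my own assumptions, not the exam's exact setup.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_and_score(X, y, epochs=200, lr=0.5):
    """Train logistic regression with batch gradient descent.

    Returns the accuracy on the training points themselves.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0 for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


# ((before_predicate, active_voice), is_arg0) for the four idealized cases
points = [((1, 1), 1), ((1, 0), 0), ((0, 1), 0), ((0, 0), 1)]
y = [label for _, label in points]

# Two separate binary features: one weight each.
X_separate = [list(feats) for feats, _ in points]

# One combined feature with 4 values, one-hot encoded: one weight per value.
X_combined = [
    [1 if i == 2 * before + active else 0 for i in range(4)]
    for (before, active), _ in points
]

acc_separate = train_and_score(X_separate, y)
acc_combined = train_and_score(X_combined, y)
print("separate features:", acc_separate)
print("combined feature: ", acc_combined)
```

With the separate features, no linear boundary classifies all four points correctly. With the combined feature, every combination gets its own weight and all four points are separable.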

Note that string features are hashed to numbers and represented as one-hot encodings in a sparse feature space.

Dependency paths

Dependency paths are lists of one or more "". Dependency paths always go from head to dependent.
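
A minimal sketch of following a dependency path downwards, from a head to one of its descendants. It assumes CoNLL-U style head indices (1-based, 0 for the root) and represents each step as a (head, dependent, relation) triple. That representation, the function name and the example parse are my own assumptions.

```python
def path_down(heads, deprels, ancestor, token):
    """Return the steps from ancestor down to token, or None if there is no such path.

    heads[i - 1] is the head of token i (0 means root), deprels[i - 1] its relation.
    """
    steps = []
    current = token
    while current != ancestor:
        head = heads[current - 1]
        if head == 0:
            return None  # reached the root without meeting the ancestor
        steps.append((head, current, deprels[current - 1]))
        current = head
    steps.reverse()  # collected bottom-up, reported head-to-dependent
    return steps


# The(1) old(2) dog(3) chased(4) the(5) cat(6)
heads = [3, 3, 4, 0, 6, 4]
deprels = ["det", "amod", "nsubj", "root", "det", "obj"]

path = path_down(heads, deprels, ancestor=4, token=1)
print(path)
```

Here the path from "chased" to "The" first goes to "dog" (nsubj) and then to "The" (det). Asking for a path from "dog" to "cat" returns None, because "cat" is not below "dog" in the tree.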