[CWB] How to use int() to get all the sentences with a numeric positional attribute higher than some value
Stephanie Evert
stefanML at collocations.de
Tue May 31 19:51:45 CEST 2022
> I have a corpus in which each sentence has a structural attribute s_lsent_linguistic_features that can have a value from 0 to 1.
> I want to filter the sentences with a value lower than 0.3.
> I've seen in the documentation that one can use the int built-in function to make comparisons with values that should be interpreted as numbers.
> I'm using something like this
> <s>[_.s_sent_quality_score = "0\.[1|2|3].*"]
> But I was wondering whether it could be written in a better/easier way using the int built-in function.
Not directly: int() does exactly what its name says and converts the annotated string to an integer if possible. In this case, you'll probably always get a 0 result.
However, you could convert your annotations to a fixed-point representation, e.g. multiply by 1000 and round to integer for three decimal digits of precision. For example, 0.3 would be stored as the string "300" in your s-attribute.
Then your query, which matches values from 0.1 up to (but not including) 0.4, translates to
<s> [int(_.s_sent_quality_score) >= 100 & int(_.s_sent_quality_score) < 400]
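The fixed-point conversion itself would happen when preparing the annotation values, before the corpus is encoded. A minimal sketch in Python, assuming the scores are available as decimal strings such as "0.3"; the function name to_fixed_point and its digits parameter are hypothetical and not part of CWB:

```python
from decimal import Decimal, ROUND_HALF_UP


def to_fixed_point(value: str, digits: int = 3) -> str:
    """Convert a decimal string such as "0.3" to a fixed-point integer string.

    Hypothetical helper, not part of CWB: multiplies by 10**digits and
    rounds to the nearest integer, so "0.3" becomes "300" for digits=3.
    """
    # Decimal avoids binary floating-point artefacts (e.g. 0.29 * 1000).
    scaled = Decimal(value) * (10 ** digits)
    rounded = scaled.quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return str(int(rounded))


if __name__ == "__main__":
    for score in ["0", "0.1", "0.3", "0.1234", "1"]:
        print(score, "->", to_fixed_point(score))
```

The resulting strings are what would be stored as the s-attribute values, so that int() in the query above yields the scaled number.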
Best,
Stephanie